Ingesting Data Using DataHub UI Guidance - Step-by-Step Instructions

Original Slack Thread

<@U06TM7M40RK> How to ingest below data using datahub UI
{
"Table Name": "p13n_data.appeasement_ab_test_analytics",
"Old Assignment ": "Seshanth",
"New_Assignment": "Sivasangari",
"In-Scope CCPA Final": null,
"In-Scope MHMD Final": null,
"Asset Type": "Table",
"Referenced Table": null,
"GCS Location": null,
"In-Scope(CCPA)": "No",
"CCPA Related Attribute": null,
"In-scope(MHMD)": "No",
"MHMD Related Attribute": "CID :yes -cid",
"ML Model Based Table": null,
"Complex Data": null,
"To be deleted": null,
"Progress Status as of May 8 2024": "Completed"
}

To ingest the provided data using the DataHub UI, you can follow these steps:

  1. Navigate to the Ingestion Tab: Ensure you have the necessary privileges (Manage Metadata Ingestion and Manage Secrets) and navigate to the ‘Ingestion’ tab in DataHub.

  2. Create a New Ingestion Source:

    • Click on + Create new source.
    • Select a Custom platform template to construct an ingestion recipe from scratch.
  3. Configure the Ingestion Recipe:

    • Define the ingestion recipe in YAML format. The recipe will include the source type, source configuration, sink type, and sink configuration.

Here is an example YAML configuration for ingesting the provided JSON data:

source:
  type: file
  config:
    path: /path/to/your/data.json

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
  4. Upload the JSON File:

    • Ensure the JSON data is saved in a file (e.g., data.json) and accessible at the specified path.
  5. Run the Ingestion:

    • Execute the ingestion pipeline through the UI by following the prompts to schedule and run the ingestion job.

Example JSON File (data.json):

[
    {
        "Table Name": "p13n_data.appeasement_ab_test_analytics",
        "Old Assignment": "Seshanth",
        "New_Assignment": "Sivasangari",
        "In-Scope CCPA Final": null,
        "In-Scope MHMD Final": null,
        "Asset Type": "Table",
        "Referenced Table": null,
        "GCS Location": null,
        "In-Scope(CCPA)": "No",
        "CCPA Related Attribute": null,
        "In-scope(MHMD)": "No",
        "MHMD Related Attribute": "CID :yes -cid",
        "ML Model Based Table": null,
        "Complex Data": null,
        "To be deleted": null,
        "Progress Status as of May 8 2024": "Completed"
    }
]
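
Before pointing a recipe at the file, it can help to confirm it parses as valid JSON and that no keys carry stray whitespace (the message quoted at the top of this thread had a trailing space in "Old Assignment "). A minimal stdlib-only check, using an inline sample in place of the real data.json:

```python
import json

# Sample mirroring the Slack message; note the trailing space in "Old Assignment "
SAMPLE = '''[
  {
    "Table Name": "p13n_data.appeasement_ab_test_analytics",
    "Old Assignment ": "Seshanth",
    "Asset Type": "Table"
  }
]'''

def whitespace_keys(records):
    """Return any keys that carry leading or trailing whitespace."""
    return [key for record in records for key in record if key != key.strip()]

records = json.loads(SAMPLE)
print(whitespace_keys(records))  # ['Old Assignment ']
```

Replacing SAMPLE with `open("data.json").read()` runs the same check against the actual file before ingestion.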

Additional Notes:

  • Ensure the DataHub server URL (http://localhost:8080) is correctly configured to point to your DataHub instance.
  • The file path in the YAML configuration should be accessible from the environment where the DataHub ingestion job is running.
  • Keep in mind that the file source consumes DataHub metadata events (MCE/MCP JSON produced by a previous ingestion run), not arbitrary key-value records. For tabular metadata like the example above, a source such as csv-enricher may be a better fit.
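
If your DataHub instance sits behind authentication, the datahub-rest sink can also carry an access token. A sketch of the same recipe with placeholder values for the server URL and token:

```
source:
  type: file
  config:
    path: /path/to/your/data.json

sink:
  type: "datahub-rest"
  config:
    server: "https://your-datahub-host:8080"
    token: "<your-personal-access-token>"
```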

Would you like more detailed steps or further assistance on any specific part of the process?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?


To ingest the provided data using the DataHub UI, you can follow these steps:

  1. Ensure Prerequisites: Make sure you have the Manage Metadata Ingestion and Manage Secrets privileges assigned to your account. These can be granted by a <https://github.com/datahub-project/datahub/blob/master/docs/authorization/policies.md|Platform Policy>.

  2. Navigate to the Ingestion Tab: Go to the ‘Ingestion’ tab in DataHub.

  3. Create a New Ingestion Source:

    • Click on + Create new source.
    • Select a Recipe Template. Since you are ingesting a CSV-like structure, you might need to use a custom template or a file-based ingestion source.
  4. Configure the Recipe:

    • Define the ingestion Recipe in YAML format. Here is an example configuration for ingesting a CSV file using the file source:
    source:
      type: file
      config:
        path: /path/to/your/data.csv
    
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
    
    • If you are working from a JSON file instead, note that the file source expects DataHub metadata events (MCE/MCP JSON), so arbitrary JSON records would need to be converted or ingested through a different source.
  5. Run the Ingestion:

    • Save and run the ingestion source.
    • Monitor the ingestion process and check for any errors in the logs.
  6. Validate the Ingestion:

    • After the ingestion process completes, validate that the data has been ingested correctly by checking the dataset location in DataHub.

For more detailed steps and troubleshooting, you can refer to the <https://github.com/datahub-project/datahub/blob/master/docs/ui-ingestion.md|UI-based Ingestion documentation>.

Would you like more detailed steps or further assistance on any specific part of the process?

<@U073WRZ8FM1> Hi Deepak! Please don’t create multiple threads tagging RunLLM for the same question - try limiting your request to one thread at a time