Ingesting Data Using DataHub UI Guidance - Step-by-Step Instructions

Original Slack Thread

<@U06TM7M40RK> How to ingest below data using datahub UI
{
"Table Name": "p13n_data.appeasement_ab_test_analytics",
"Old Assignment ": "Seshanth",
"New_Assignment": "Sivasangari",
"In-Scope CCPA Final": null,
"In-Scope MHMD Final": null,
"Asset Type": "Table",
"Referenced Table": null,
"GCS Location": null,
"In-Scope(CCPA)": "No",
"CCPA Related Attribute": null,
"In-scope(MHMD)": "No",
"MHMD Related Attribute": "CID :yes -cid",
"ML Model Based Table": null,
"Complex Data": null,
"To be deleted": null,
"Progress Status as of May 8 2024": "Completed"
}

To ingest the provided data using the DataHub UI, you can follow these steps:

  1. Navigate to the Ingestion Tab: Ensure you have the necessary privileges (Manage Metadata Ingestion and Manage Secrets) and navigate to the ‘Ingestion’ tab in DataHub.

  2. Create a New Ingestion Source:

    • Click on + Create new source.
    • Select a Custom platform template to construct an ingestion recipe from scratch.
  3. Configure the Ingestion Recipe:

    • Define the ingestion recipe in YAML format. The recipe will include the source type, source configuration, sink type, and sink configuration.

Here is an example YAML configuration for ingesting the provided JSON data:

source:
  type: file
  config:
    path: /path/to/your/data.json

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
  4. Upload the JSON File:

    • Ensure the JSON data is saved in a file (e.g., data.json) and accessible at the specified path.
  5. Run the Ingestion:

    • Execute the ingestion pipeline through the UI by following the prompts to schedule and run the ingestion job.

Example JSON File (data.json):

[
    {
        "Table Name": "p13n_data.appeasement_ab_test_analytics",
        "Old Assignment": "Seshanth",
        "New_Assignment": "Sivasangari",
        "In-Scope CCPA Final": null,
        "In-Scope MHMD Final": null,
        "Asset Type": "Table",
        "Referenced Table": null,
        "GCS Location": null,
        "In-Scope(CCPA)": "No",
        "CCPA Related Attribute": null,
        "In-scope(MHMD)": "No",
        "MHMD Related Attribute": "CID :yes -cid",
        "ML Model Based Table": null,
        "Complex Data": null,
        "To be deleted": null,
        "Progress Status as of May 8 2024": "Completed"
    }
]
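
Before pointing a recipe at the file, it can help to confirm it parses as valid JSON and that no keys carry stray whitespace (the message quoted at the top of this thread had a trailing space in "Old Assignment "). A minimal stdlib-only check, using an inline sample in place of the real data.json:

```python
import json

# Sample mirroring the Slack message; note the trailing space in "Old Assignment "
SAMPLE = '''[
  {
    "Table Name": "p13n_data.appeasement_ab_test_analytics",
    "Old Assignment ": "Seshanth",
    "Asset Type": "Table"
  }
]'''

def whitespace_keys(records):
    """Return any keys that carry leading or trailing whitespace."""
    return [key for record in records for key in record if key != key.strip()]

records = json.loads(SAMPLE)
print(whitespace_keys(records))  # ['Old Assignment ']
```

Replacing SAMPLE with `open("data.json").read()` runs the same check against the actual file before ingestion.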

Additional Notes:

  • Ensure the DataHub server URL (http://localhost:8080) is correctly configured to point to your DataHub instance.
  • The file path in the YAML configuration should be accessible from the environment where the DataHub ingestion job is running.
  • Keep in mind that the file source consumes DataHub metadata events (MCE/MCP JSON produced by a previous ingestion run), not arbitrary key-value records. For tabular metadata like the example above, a source such as csv-enricher may be a better fit.
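
If your DataHub instance sits behind authentication, the datahub-rest sink can also carry an access token. A sketch of the same recipe with placeholder values for the server URL and token:

```
source:
  type: file
  config:
    path: /path/to/your/data.json

sink:
  type: "datahub-rest"
  config:
    server: "https://your-datahub-host:8080"
    token: "<your-personal-access-token>"
```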

Would you like more detailed steps or further assistance on any specific part of the process?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?


To ingest the provided data using the DataHub UI, you can follow these steps:

  1. Ensure Prerequisites: Make sure you have the Manage Metadata Ingestion and Manage Secrets privileges assigned to your account. These can be granted by a <https://github.com/datahub-project/datahub/blob/master/docs/authorization/policies.md|Platform Policy>.

  2. Navigate to the Ingestion Tab: Go to the ‘Ingestion’ tab in DataHub.

  3. Create a New Ingestion Source:

    • Click on + Create new source.
    • Select a Recipe Template. Since you are ingesting a CSV-like structure, you might need to use a custom template or a file-based ingestion source.
  4. Configure the Recipe:

    • Define the ingestion Recipe in YAML format. Here is an example configuration for ingesting a CSV file using the file source:
    source:
      type: file
      config:
        path: /path/to/your/data.csv
    
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
    
    • If you are working from a JSON file instead, note that the file source expects DataHub metadata events (MCE/MCP JSON), so arbitrary JSON records would need to be converted or ingested through a different source.
  5. Run the Ingestion:

    • Save and run the ingestion source.
    • Monitor the ingestion process and check for any errors in the logs.
  6. Validate the Ingestion:

    • After the ingestion process completes, validate that the data has been ingested correctly by checking the dataset location in DataHub.

For more detailed steps and troubleshooting, you can refer to the <https://github.com/datahub-project/datahub/blob/master/docs/ui-ingestion.md|UI-based Ingestion documentation>.

Would you like more detailed steps or further assistance on any specific part of the process?

<@U073WRZ8FM1> Hi Deepak! Please don’t create multiple threads tagging RunLLM for the same question - try limiting your request to one thread at a time