Ingesting Column Descriptions into DataHub from a File

Original Slack Thread

Is it possible to ingest column descriptions from a file, or do I have to enter them manually?

Yes, it is possible to ingest column descriptions from a file into DataHub. You can use the CSV Enricher to bulk upload metadata such as descriptions, tags, glossary terms, and owners. This process involves preparing a CSV file with the necessary metadata and using the DataHub CLI to ingest the file.

According to the CSV Enricher documentation (https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/csv), you create a CSV file with columns such as resource, subresource, and description, then use a YAML recipe to configure the ingestion. Here is a brief overview of the steps:

  1. Prepare the CSV File: Create a CSV file with a header row. The documented header is resource, subresource, glossary_terms, tags, owners, ownership_type, description, domain; columns you don't need can be left empty. A row with an empty subresource describes the dataset itself, while a row whose subresource names a column describes that column. A minimal description-only example is shown below, and a fuller row using the list-valued columns follows it.

    Example CSV:

    resource,subresource,glossary_terms,tags,owners,ownership_type,description,domain
    "urn:li:dataset:(urn:li:dataPlatform:postgres,post.user.table1,PROD)",,,,,,"Table 1 description",
    "urn:li:dataset:(urn:li:dataPlatform:postgres,post.user.table1,PROD)",column1,,,,,"Column 1 description",

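    The same file format carries tags, glossary terms, and owners, so one row can attach several kinds of metadata at once. The row below is a sketch based on the documented format; the tag, term, and owner URNs are placeholders, and list values are wrapped in brackets with entries separated by the array delimiter (| by default):

    "urn:li:dataset:(urn:li:dataPlatform:postgres,post.user.table1,PROD)",column1,[urn:li:glossaryTerm:Classification.Sensitive],[urn:li:tag:PII],[urn:li:corpuser:jdoe],TECHNICAL_OWNER,"Column 1 description",
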
  2. Create a Recipe File: Write a YAML configuration file for the ingestion process. The source block points at the CSV; the sink shown below is a common setup for a local DataHub instance and should be adjusted for your deployment.

    Example Recipe:

    source:
      type: csv-enricher
      config:
        filename: ./path/to/your/file.csv
        write_semantics: PATCH  # or OVERRIDE
        array_delimiter: "|"  # separator for entries inside bracketed list values

    sink:
      type: datahub-rest
      config:
        server: http://localhost:8080

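    With write_semantics: PATCH, the enricher merges the CSV contents into whatever editable metadata already exists; OVERRIDE replaces it. PATCH is generally the safer choice when descriptions may also have been edited in the UI.
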
  3. Run the Ingestion: Use the DataHub CLI to ingest the CSV file.

    datahub ingest -c path/to/your/recipe.yaml
    
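    To check what would be ingested without writing anything, the CLI also supports a dry run (see datahub ingest --help for the options available in your version):

    datahub ingest -c path/to/your/recipe.yaml --dry-run
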

This process will update the descriptions for the specified tables and columns in DataHub.
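
If you'd rather skip the recipe and write descriptions from a script, the DataHub Python SDK can emit the same editable schema metadata directly. The sketch below is an illustration, not the CSV enricher's implementation: it assumes a local DataHub instance at http://localhost:8080 and a hypothetical columns.csv with at least resource, subresource, and description columns, and unlike PATCH semantics it replaces any editable column descriptions already set on each dataset it touches.

    import csv

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        EditableSchemaFieldInfoClass,
        EditableSchemaMetadataClass,
    )

    # Assumed local DataHub endpoint; adjust for your deployment.
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    # Group column-level rows (non-empty subresource) by dataset URN.
    fields_by_dataset = {}
    with open("columns.csv", newline="") as f:  # hypothetical input file
        for row in csv.DictReader(f):
            if row["subresource"]:
                fields_by_dataset.setdefault(row["resource"], []).append(
                    EditableSchemaFieldInfoClass(
                        fieldPath=row["subresource"],
                        description=row["description"],
                    )
                )

    # Caution: this upserts the whole editableSchemaMetadata aspect per dataset,
    # replacing editable column descriptions set earlier (e.g. via the UI).
    for urn, fields in fields_by_dataset.items():
        emitter.emit(
            MetadataChangeProposalWrapper(
                entityUrn=urn,
                aspect=EditableSchemaMetadataClass(editableSchemaFieldInfo=fields),
            )
        )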

Would you like more detailed steps or examples on any part of this process?

Sources:

CSV Enricher | DataHub Docs: https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/csv (this version-pinned link no longer resolves; look for the CSV Enricher page in the current DataHub documentation)