Ingesting XLSX and CSV Files in DataHub

Original Slack Thread

<@U06TM7M40RK> Can I ingest XLSX and CSV files?

DataHub can read CSV files through the CSV Enricher source. It does not catalog the CSV file itself as a dataset; instead, it enriches entities already ingested into DataHub by applying metadata from the file, such as glossary terms, tags, descriptions, owners, and domains. You can find more details in the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/csv-enricher|CSV Enricher documentation>.

For XLSX files, the available data sources make no mention of XLSX ingestion support. If you have XLSX files, you will likely need to convert them to CSV format before ingestion.
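
If conversion is the route you take, a minimal sketch in Python might look like the following. It assumes the third-party openpyxl package is installed and that the worksheet's first row holds the column headers. The function names and file paths are hypothetical and not part of DataHub.

```python
import csv


def write_rows_as_csv(rows, csv_path):
    """Write an iterable of row tuples to a UTF-8 CSV file; return rows written."""
    count = 0
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for row in rows:
            # Empty spreadsheet cells arrive as None; write them as empty strings.
            writer.writerow(["" if cell is None else cell for cell in row])
            count += 1
    return count


def xlsx_to_csv(xlsx_path, csv_path, sheet_name=None):
    """Convert one worksheet of an XLSX file to CSV (assumes openpyxl is installed)."""
    # Imported lazily so the CSV helper above works without openpyxl.
    from openpyxl import load_workbook

    workbook = load_workbook(xlsx_path, read_only=True, data_only=True)
    try:
        # Fall back to the active sheet when no sheet name is given.
        sheet = workbook[sheet_name] if sheet_name else workbook.active
        return write_rows_as_csv(sheet.iter_rows(values_only=True), csv_path)
    finally:
        workbook.close()
```

A call such as `xlsx_to_csv("enrichment.xlsx", "enrichment.csv")` (both file names are placeholders) would then produce a CSV you can point the recipe at.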

Would you like more detailed steps on how to ingest CSV files, or do you have any other questions related to DataHub ingestion?

<@U06TM7M40RK> see below the error

[2024-11-01 19:23:39,475] ERROR    {datahub.ingestion.run.pipeline:739} - Pipeline Error: Ingestion pipeline raised an unexpected exception!Pipeline Error: Ingestion pipeline raised an unexpected exception!: 'resource'

The error message KeyError: 'resource' indicates that the ingestion process is attempting to access a key named “resource” in a dictionary (likely a row from your CSV file), but this key does not exist. This issue is specifically occurring in the get_workunits_internal method of the csv_enricher.py file.
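
One way this can happen even when the column looks present is sketched below, using only the Python standard library. It assumes rows are read into dictionaries keyed by header name, as `csv.DictReader` does; whether csv_enricher.py does exactly this is an assumption here. A byte-order mark written by some editors turns the first header into `\ufeffresource` rather than `resource`, so the lookup fails. The sample data is made up.

```python
import csv
import io

# Hypothetical file contents: a UTF-8 byte-order mark followed by the header row.
raw = b"\xef\xbb\xbfresource,tags\nurn:li:example,[urn:li:tag:Demo]\n"


def first_row(data, encoding):
    """Decode the bytes and return the first data row as a dict keyed by header."""
    reader = csv.DictReader(io.StringIO(data.decode(encoding)))
    return next(reader)


# Plain utf-8 keeps the BOM glued to the first header name.
row_with_bom = first_row(raw, "utf-8")
print("resource" in row_with_bom)  # False

# utf-8-sig strips the BOM, so the expected key is present.
row_clean = first_row(raw, "utf-8-sig")
print(row_clean["resource"])  # urn:li:example
```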

Here are some potential causes and steps to resolve the issue:

  1. CSV File Format: Ensure that your CSV file includes a column named “resource”. This column is mandatory for the csv-enricher to function correctly. Here’s an example of a correctly formatted CSV file:

    resource,subresource,glossary_terms,tags,owners,ownership_type,description,domain
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,datahub.growth.users,PROD)",,[urn:li:glossaryTerm:Users],[urn:li:tag:HighQuality],[urn:li:corpuser:lfoe|urn:li:corpuser:jdoe],TECHNICAL_OWNER,description for users table,urn:li:domain:Engineering
    
  2. CSV File Encoding: Sometimes, CSV files saved from different editors (like Excel) might have encoding issues or hidden characters that can cause parsing problems. Ensure your CSV file is saved with UTF-8 encoding.

  3. CSV Configuration in Recipe: Verify that your ingestion recipe correctly points to the CSV file and uses the correct delimiters. Here is an example of a correct configuration:

    source:
      type: "csv-enricher"
      config:
        filename: /path/to/your/csvfile.csv
        write_semantics: PATCH
        delimiter: ","
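
The checks in the list above can be sketched as a small pre-flight script to run before ingestion. It uses only the Python standard library. The function name `check_enricher_csv` is hypothetical, and the only column treated as required is `resource`, per the explanation above.

```python
import csv


def check_enricher_csv(path, delimiter=","):
    """Return a list of human-readable problems found in the CSV header."""
    problems = []
    try:
        with open(path, newline="", encoding="utf-8") as f:
            # Only the header row is needed for these checks.
            header = next(csv.reader(f, delimiter=delimiter), None)
    except UnicodeDecodeError:
        return ["file is not valid UTF-8; re-save it with UTF-8 encoding"]

    if header is None:
        return ["file is empty"]

    names = list(header)
    # A byte-order mark shows up as an invisible prefix on the first header name.
    if names and names[0].startswith("\ufeff"):
        problems.append("file starts with a byte-order mark; re-save as UTF-8 without BOM")
        names[0] = names[0].lstrip("\ufeff")

    if "resource" not in names:
        problems.append(f"no 'resource' column; header parsed as {names}")
    # A single parsed column usually means the delimiter does not match the file.
    if len(names) == 1:
        problems.append(f"only one column found; is delimiter {delimiter!r} correct?")
    return problems
```

Calling `check_enricher_csv("/path/to/your/csvfile.csv", delimiter=",")` with the same path and delimiter as the recipe returns an empty list when none of these problems is detected. It does not validate the URNs or the other columns.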
    

If you need further assistance or more detailed steps on any of these points, please let me know!


Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps!