Ingesting a CSV file into DataHub using version 0.13 and configuring the ingestion from the DataHub interface

Original Slack Thread

Hello everyone. I want to ingest a CSV file located on my computer into DataHub. What are the steps to follow, and how do I properly configure the ingestion from the DataHub interface? I’m using version 0.13, and the final goal is to build a Data Catalog over BigQuery, but for the sake of this message I just want to ingest a CSV file from my PC into DataHub.

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

It will probably be easier to directly ingest from BigQuery using our BigQuery source (https://datahubproject.io/docs/next/generated/ingestion/sources/bigquery/) than to ingest from a CSV. CSV files don’t contain metadata as rich as databases / data warehouses, so we’d have to do type inference on the data to be able to ingest them properly. We currently don’t have an ingestion source that does this on CSV files.
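For reference, a minimal BigQuery recipe might look something like the sketch below. This is an illustrative example, not from the thread: the project ID and REST server address are placeholder assumptions, and in the UI the sink section is usually filled in for you.

```
source:
  type: bigquery
  config:
    # Placeholder project ID - replace with your own GCP project
    project_ids:
      - my-gcp-project
sink:
  type: datahub-rest
  config:
    # Assumed default for a local quickstart deployment
    server: http://localhost:8080
```

When configuring from the DataHub UI, you would paste only the source portion into the recipe editor and supply credentials separately.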

Actually, you can use the S3 source (https://datahubproject.io/docs/next/generated/ingestion/sources/s3/) to ingest a CSV file. Despite the source name, you can use this one to ingest local files. Specify as follows:

```
type: s3
config:
  path_specs:
    - include: "./relative/directory"
```

to ingest all files in a directory. Or I believe you can specify a single CSV file directly.
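Putting that together, a fuller local-CSV recipe might look like the sketch below. The file path, the `*.csv` glob, and the sink server address are assumptions for illustration; check the S3 source docs for the exact `path_specs` pattern syntax your version supports.

```
source:
  type: s3
  config:
    path_specs:
      # Placeholder local path and glob - adjust to where your CSV lives
      - include: "/home/user/data/*.csv"
sink:
  type: datahub-rest
  config:
    # Assumed default for a local quickstart deployment
    server: http://localhost:8080
```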

Thank you, Andrew! Yes, I know that ingesting from BigQuery is easier and more powerful; I just wanted to try every possibility :slightly_smiling_face: