Ingesting a Google Sheets CSV File into DataHub using the UI

Original Slack Thread

Hello! A newbie here. How do I ingest a Google Sheets file (published to the web as CSV) into DataHub?

  1. I’m using the UI for ingestion;
  2. DataHub v0.13.2;
  3. Source: a Google Sheets file published as CSV.

To ingest a Google Sheets file (published to the web as CSV) into DataHub using the UI, follow these steps:

Summary

  1. Publish Google Sheets as CSV: Ensure your Google Sheets file is published to the web in CSV format.
  2. Create a New Source in DataHub UI: Use the DataHub UI to create a new ingestion source for CSV.
  3. Configure the Source: Provide the URL of the published CSV file in the configuration.

Step-by-Step Instructions

  1. Publish Google Sheets as CSV:

    • Open your Google Sheets document.
    • Go to File > Share > Publish to the web.
    • Choose Comma-separated values (.csv) as the format.
    • Copy the generated URL.
  2. Create a New Source in DataHub UI:

    • Navigate to the DataHub UI and go to the Ingestion tab.
    • Click on + Create new source at the top left of the table.
  3. Configure the Source:

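    • Select CSV from the list of sources (it is usually near the bottom of the list).
    • Fill out the form with the necessary details:
      • Source Type: CSV (in the underlying recipe this is the csv-enricher source).
      • File path / URL: paste the URL of the published Google Sheets CSV file. In the recipe this value goes in the filename option, which accepts either a local file path or a URL.
    • Save and run the ingestion.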
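Before pointing DataHub at the link, it can help to confirm that the URL really returns CSV rather than a sign-in or error page. The sketch below does that with only the Python standard library. The helper names fetch_published_csv and parse_csv_text are hypothetical. The HTML check is a heuristic, based on the assumption that a wrong or unpublished link may return an HTML page instead of CSV.

```python
import csv
import io
import urllib.request


def parse_csv_text(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per data row, keyed by the header row."""
    text = text.lstrip("\ufeff")  # drop a UTF-8 byte-order mark if present
    if text.lstrip().startswith("<"):
        # Heuristic: a wrong or unpublished link may return an HTML page, not CSV.
        raise ValueError("got HTML instead of CSV; is the sheet published as CSV?")
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    if reader.fieldnames is None:
        raise ValueError("CSV is empty")
    return rows


def fetch_published_csv(url: str, timeout: float = 10.0) -> list[dict[str, str]]:
    """Download a published Google Sheets CSV link and parse it (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        text = resp.read().decode("utf-8")
    return parse_csv_text(text)
```

For example, calling fetch_published_csv("<your published URL>") from the same network that runs your DataHub ingestions checks two things at once: that the link is reachable and that its first row is the header you expect.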

Example Configuration
Here is an example of how the configuration might look:

source:
  type: csv-enricher
  config:
    filename: "https://docs.google.com/spreadsheets/d/e/2PACX-1vQ1Y2Y3Z4A5B6C7D8E9F0G1H2I3J4K5L6M7N8O9P0Q1R2S3T4U5V6W7X8Y9Z0/pub?output=csv"
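The same recipe can also be expressed as a Python dict. This is useful if you later want to run it outside the UI with the DataHub Python SDK. The sketch below is built on several assumptions:

  • The option names come from DataHub's csv-enricher documentation (source type csv-enricher, config key filename).
  • The sink is a datahub-rest sink pointing at http://localhost:8080. UI ingestion supplies the sink for you, so adjust this address for your deployment.
  • build_recipe and run_recipe are hypothetical helper names.

```python
from typing import Any


def build_recipe(csv_url: str, server: str = "http://localhost:8080") -> dict[str, Any]:
    """Build a csv-enricher recipe as a dict (option names assumed from the DataHub docs)."""
    return {
        "source": {
            "type": "csv-enricher",
            "config": {"filename": csv_url},  # published Google Sheets CSV URL
        },
        # UI ingestion fills in the sink for you; a standalone run needs one.
        "sink": {
            "type": "datahub-rest",
            "config": {"server": server},  # assumed GMS address; adjust as needed
        },
    }


def run_recipe(recipe: dict[str, Any]) -> None:
    """Run the recipe with the DataHub Python SDK (requires the acryl-datahub package)."""
    # Imported lazily so build_recipe() works without the datahub package installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.pretty_print_summary()
    pipeline.raise_from_status()
```

Keeping the recipe construction separate from the run makes it easy to inspect or log the exact configuration before anything is sent to DataHub.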

Additional Notes

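  • Publishing to the web makes the CSV link readable without signing in. Make sure the sheet stays published and that the machine running the ingestion can reach docs.google.com.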
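  • The CSV source in DataHub (csv-enricher) is meant for adding metadata, such as glossary terms, tags, owners, descriptions and domains, to entities that already exist in DataHub. It does not turn the sheet itself into a dataset. Instead, each row must name an existing entity by its URN in a resource column, alongside columns such as subresource, glossary_terms, tags, owners, ownership_type, description and domain.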
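Because the enricher expects a specific layout, a quick local check of the sheet's header and resource values can catch mistakes before a run. The sketch below is hypothetical (check_enricher_csv is not a DataHub function). The set of known column names is an assumption taken from the csv-enricher docs and may differ in your DataHub version. It also treats any resource value that does not start with urn:li: as a problem.

```python
import csv
import io

# Column names as described in DataHub's csv-enricher docs (assumption: check
# the docs for your DataHub version; newer versions may accept more columns).
KNOWN_COLUMNS = {
    "resource", "subresource", "glossary_terms", "tags",
    "owners", "ownership_type", "description", "domain",
}


def check_enricher_csv(text: str) -> list[str]:
    """Return a list of problems; an empty list means the sheet looks usable."""
    reader = csv.DictReader(io.StringIO(text))
    header = set(reader.fieldnames or [])
    problems: list[str] = []
    unknown = header - KNOWN_COLUMNS
    if "resource" not in header:
        problems.append("missing required 'resource' column")
    if unknown:
        problems.append(f"columns not in the known set: {sorted(unknown)}")
    if "resource" not in header:
        return problems  # row checks are meaningless without the column
    # Spreadsheet row numbers: the header is row 1, so data starts at row 2.
    for row_no, row in enumerate(reader, start=2):
        resource = (row.get("resource") or "").strip()
        if not resource.startswith("urn:li:"):
            problems.append(f"row {row_no}: 'resource' is not a DataHub URN: {resource!r}")
    return problems
```

Note that dataset URNs contain commas, for example urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD). Google Sheets quotes such cells automatically when it exports CSV, so the check above parses them correctly.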
