Ingesting Lineage from BTEQ Queries and Stored Procedures in DataHub

Original Slack Thread

Hi there, I am using DataHub v0.13.2. I can ingest Teradata using the UI, but I am wondering whether it is possible to ingest lineage from BTEQ queries and stored procedures?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?



To help you more effectively, could you please provide the following information?

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using? (You mentioned v0.13.2, please confirm.)
  3. What data source(s) are you integrating with DataHub? (e.g., Teradata, BigQuery)

Based on your initial query, it seems you are interested in ingesting lineage from BTEQ queries and Stored Procedures in Teradata. Here’s some relevant information:

  1. Teradata Ingestion: According to the <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/teradata|DataHub documentation for Teradata>, you can ingest metadata from Teradata, including table-level lineage.
  2. SQL Queries Source: For lineage extraction from SQL queries, you can use the <https://datahubproject.io/docs/next/generated/ingestion/sources/sql-queries/|SQL Queries Source>. This source can help in generating column-level lineage from SQL queries.
  3. Stored Procedures: Currently, there isn’t a direct method to ingest lineage from stored procedures using a file-based approach. However, you can model stored procedures as data jobs and create lineage using the DataHub SDK. An example can be found <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_dataset_job_dataset.py|here>.
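
For option 2, one possible way to get BTEQ scripts into a shape the SQL Queries Source can read is to pull the SQL statements out of each script and write them to a file with one JSON object per line, each carrying a `query` field. The snippet below is only a rough sketch under several assumptions: the helper names (`bteq_to_query_records`, `write_query_file`) and the sample script are made up for illustration, BTEQ control commands are assumed to be the lines starting with a dot, and statements are split naively on `;`, so comments, string literals containing semicolons, and multi-statement requests are not handled.

```python
import json
from typing import Dict, List


def bteq_to_query_records(script: str) -> List[Dict[str, str]]:
    """Naively extract SQL statements from a BTEQ script (hypothetical helper)."""
    sql_lines = []
    for line in script.splitlines():
        stripped = line.strip()
        # Assumption: BTEQ control commands (.LOGON, .IF, .QUIT, ...) start
        # with a dot and carry no lineage, so they are skipped.
        if not stripped or stripped.startswith("."):
            continue
        sql_lines.append(stripped)
    # Naive split on ';'. This breaks on semicolons inside string literals.
    statements = [s.strip() for s in " ".join(sql_lines).split(";")]
    return [{"query": s} for s in statements if s]


def write_query_file(records: List[Dict[str, str]], path: str) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


# Made-up BTEQ script, for illustration only.
EXAMPLE_SCRIPT = """
.LOGON tdprod/etl_user,secret
INSERT INTO sales_db.daily_totals
SELECT order_date, SUM(amount) FROM sales_db.orders GROUP BY order_date;
.IF ERRORCODE <> 0 THEN .QUIT 1
DELETE FROM sales_db.staging_orders;
.QUIT 0
"""

records = bteq_to_query_records(EXAMPLE_SCRIPT)

if __name__ == "__main__":
    for record in records:
        print(json.dumps(record))
```

The resulting file could then be referenced from a `sql-queries` recipe through its `query_file` setting, with `platform` set to `teradata`. Whether the SQL parser handles every Teradata-specific construct in your scripts is something you would need to verify against your own queries.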

Would you like more detailed steps on how to set this up?
