Hi, can anyone who has used the Spark push-based agent help me understand whether it's possible to get column-level lineage information? I can't find anything about it in the documentation.
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Are you using UI or CLI for ingestion?
- Which DataHub version are you using? (e.g. 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions and don’t forget to give its responses a 👍 or 👎 so other folks in the Community know if you’re still looking for help on the topic!
To help you better, could you please provide the following information:
- Are you using the UI or CLI for ingestion?
- Which DataHub version are you using? (e.g., 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g., BigQuery)
Regarding your question about getting column-level lineage from the Spark push-based agent: the documentation does not explicitly mention support for it. However, some related details might help:
- Column-Level Lineage in General: According to the <Lineage | DataHub Lineage API documentation>, column-level lineage can be added programmatically by creating fine-grained lineage objects and emitting them to the DataHub REST API.
- Spark Lineage: There have been community discussions about capturing column-level lineage in Spark jobs. For instance, a <Slack thread> mentions that a newer version of the Spark Lineage library, which captures column-level lineage, is available. You can check the setup instructions for this new plugin <https://github.com/datahub-project/datahub/tree/master/metadata-integration/java/spark-lineage-beta|here>.
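To make the API option more concrete, here is a minimal sketch of what a column-level (fine-grained) lineage payload looks like. It is plain Python with no DataHub dependency and only builds a dict shaped like the `UpstreamLineage` aspect with a `fineGrainedLineages` list, assuming the URN formats `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` and `urn:li:schemaField:(<dataset urn>,<column>)`. The helper functions, the `hive` platform, and the `sales.orders_*` tables are hypothetical.

```python
import json
from typing import Dict, List


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Hypothetical helper: build a DataHub dataset URN."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def schema_field_urn(dataset: str, column: str) -> str:
    """Hypothetical helper: build a schemaField URN for one column of a dataset."""
    return f"urn:li:schemaField:({dataset},{column})"


def upstream_lineage_aspect(upstream: str, downstream: str,
                            column_map: Dict[str, List[str]]) -> dict:
    """Return a dict shaped like an UpstreamLineage aspect for `downstream`.

    column_map maps each downstream column to the upstream columns it is derived from.
    """
    fine_grained = [
        {
            "upstreamType": "FIELD_SET",  # one or more source columns
            "upstreams": [schema_field_urn(upstream, col) for col in src_cols],
            "downstreamType": "FIELD",    # exactly one target column
            "downstreams": [schema_field_urn(downstream, dst_col)],
        }
        for dst_col, src_cols in column_map.items()
    ]
    return {
        # Table-level edge; the column-level edges go in fineGrainedLineages.
        "upstreams": [{"dataset": upstream, "type": "TRANSFORMED"}],
        "fineGrainedLineages": fine_grained,
    }


src = dataset_urn("hive", "sales.orders_raw")
dst = dataset_urn("hive", "sales.orders_clean")
aspect = upstream_lineage_aspect(src, dst, {"total": ["price", "quantity"], "order_id": ["id"]})
print(json.dumps(aspect, indent=2))
```

To actually send lineage, the Lineage API documentation uses the `acryl-datahub` Python SDK (`FineGrainedLineageClass`, `UpstreamLineageClass`, `MetadataChangeProposalWrapper`, `DatahubRestEmitter`), so treat the dict above as an illustration of the shape rather than a drop-in request body.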
Would you like more detailed steps on how to set up and use the new Spark Lineage library to capture column-level lineage?
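As a rough starting point, here is a sketch of the Spark settings the new plugin is configured with, rendered in the "key value" per-line format that a Databricks cluster's Spark config field accepts. It assumes the listener class `datahub.spark.DatahubSparkListener` and the `spark.datahub.rest.server` / `spark.datahub.rest.token` keys from the plugin README; the GMS URL is a placeholder and the helper functions are hypothetical. The plugin jar itself still has to be installed on the cluster, so please check the README for the exact artifact, version, and any Databricks-specific options.

```python
from typing import Dict, Optional


def datahub_spark_conf(gms_url: str, token: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: Spark settings that register the DataHub listener."""
    conf = {
        # Listener class named in the spark-lineage-beta README (assumption: verify for your version).
        "spark.extraListeners": "datahub.spark.DatahubSparkListener",
        # DataHub GMS endpoint that the listener emits lineage to.
        "spark.datahub.rest.server": gms_url,
    }
    if token:
        # Only needed when metadata service authentication is enabled.
        conf["spark.datahub.rest.token"] = token
    return conf


def as_databricks_spark_config(conf: Dict[str, str]) -> str:
    """Render settings as one 'key value' pair per line for the cluster Spark config field."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


text = as_databricks_spark_config(datahub_spark_conf("http://datahub-gms:8080"))
print(text)
```

Putting these in the cluster configuration rather than in notebook code matters because Spark reads `spark.extraListeners` when the driver's SparkContext starts.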
Sources:
- <Lineage | DataHub Lineage API documentation>
- <Slack thread on Spark Lineage>
- https://github.com/datahub-project/datahub/tree/master/metadata-integration/java/spark-lineage-beta
Hint: Mention @runllm in the thread for follow-ups.
- Ingestion: CLI, push-based
- DataHub version: 0.13.1
- Data source: Spark jobs on Databricks