Retrieving Column-Level Lineage Info from DataHub using Python

Original Slack Thread

Hello everyone! I'm trying to read column-level lineage from DataHub via Python. I can see the column lineage in the console, but the doc below only covers table-level lineage, not column-level lineage. Is there any reference for getting column lineage info via Python rather than the console?

https://datahubproject.io/docs/next/api/tutorials/lineage/#read-lineage

My environment is as follows:

  1. DataHub version == 0.12.1.0
  2. data source == BigQuery

Yeah, there are a couple of ways:

• You can fetch the `upstreamLineage` aspect using `DataHubGraph.get_aspect(urn, UpstreamLineageClass)`, then look at the `fineGrainedLineages` property to get all column-level upstreams of that entity. If you need downstreams, you’d have to query for all downstream tables, then fetch each downstream table’s `upstreamLineage` aspect.
• You can use the GraphQL query `searchAcrossLineage` as described in your linked article, except provide a `schemaField` urn to reference a column, to get that column’s upstreams / downstreams.
Will either of these options work for your use case?

Thanks <@U04N9PYJBEW>!!
The first technique fits my case. It was very helpful :slight_smile:
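
For completeness, the second option (the `searchAcrossLineage` query with a `schemaField` urn) could look roughly like the sketch below. The helper names, the server address, and the selected result fields are assumptions for illustration; the query shape follows the linked lineage tutorial and should be checked against the GraphQL schema of the DataHub version in use.

```python
from typing import Any, Dict, List

# Query shape modeled on the lineage tutorial; the selected fields are an assumption.
LINEAGE_QUERY = """
query columnLineage($input: SearchAcrossLineageInput!) {
  searchAcrossLineage(input: $input) {
    searchResults {
      degree
      entity { urn type }
    }
  }
}
"""


def schema_field_urn(dataset_urn: str, field_path: str) -> str:
    """Build the schemaField urn that identifies one column of a dataset."""
    return f"urn:li:schemaField:({dataset_urn},{field_path})"


def lineage_variables(
    column_urn: str, direction: str = "UPSTREAM", count: int = 100
) -> Dict[str, Any]:
    """Build the GraphQL variables for a column-level lineage search."""
    if direction not in ("UPSTREAM", "DOWNSTREAM"):
        raise ValueError(f"unexpected direction: {direction}")
    return {
        "input": {
            "urn": column_urn,
            "direction": direction,
            "query": "*",
            "start": 0,
            "count": count,
        }
    }


def search_column_lineage(
    dataset_urn: str,
    field_path: str,
    direction: str = "UPSTREAM",
    server: str = "http://localhost:8080",  # assumed GMS address
) -> List[str]:
    """Return the urns related to one column in the given lineage direction."""
    # Imported lazily so the pure helpers above work without the SDK installed.
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

    graph = DataHubGraph(DatahubClientConfig(server=server))
    variables = lineage_variables(schema_field_urn(dataset_urn, field_path), direction)
    data = graph.execute_graphql(LINEAGE_QUERY, variables=variables)
    return [r["entity"]["urn"] for r in data["searchAcrossLineage"]["searchResults"]]


# Offline demo of the pure helpers; the dataset urn is made up for illustration.
demo_urn = schema_field_urn(
    "urn:li:dataset:(urn:li:dataPlatform:bigquery,proj.ds.dst,PROD)", "full_name"
)
demo_vars = lineage_variables(demo_urn, direction="DOWNSTREAM")
```

Unlike the aspect-based approach, this route can be pointed in either direction by switching `direction` between `"UPSTREAM"` and `"DOWNSTREAM"`.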