Exploring Column-Level Lineage Support in Snowflake with the Python SDK and the DataHubGraph Client

Original Slack Thread

Hi everyone, I’m very interested in the column-level lineage support for Snowflake. I read this Medium post and was wondering whether there is a way to use the Python SDK to derive column lineage from a string containing SQL queries and have the results be consumable by downstream applications. I’m thinking of a method or object that takes in a list of SQL queries and returns the column lineage information, similar to sqllineage (https://github.com/reata/sqllineage), which is mentioned in the blog post.

The DataHubGraph client has a parse_sql_lineage method - is that what you’re looking for?

Hey Harshal, yes, that is what I’m looking for, but I can’t find any documentation online for that method.

It should be in the docs here: https://datahubproject.io/docs/next/python-sdk/clients/

Note that it is mostly a pass-through to an underlying method, which is documented here: https://github.com/datahub-project/datahub/blob/53c7790f9aa56eeb6695d3fbf602b3b84a7283e4/metadata-ingestion/src/datahub/utilities/sqlglot_lineage.py#L1162
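
For anyone landing here later, a rough sketch of how the "list of SQL queries in, column lineage out" idea from the original question could be wrapped around parse_sql_lineage. This is not taken from the DataHub docs. The helper names (ColumnEdge, collect_column_lineage, connect) are hypothetical, the server URL is a placeholder, and the result fields used here (column_lineage, downstream, upstreams, table, column) are assumptions based on the sqlglot_lineage.py file linked above, so check them against the version you have installed.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional


@dataclass(frozen=True)
class ColumnEdge:
    """One upstream column feeding one downstream column (hypothetical helper type)."""
    upstream_table: str
    upstream_column: str
    downstream_table: Optional[str]
    downstream_column: str


def collect_column_lineage(
    graph: Any,
    queries: Iterable[str],
    platform: str = "snowflake",
    **parse_kwargs: Any,
) -> List[ColumnEdge]:
    """Run each query through graph.parse_sql_lineage and flatten the results.

    `graph` is expected to be a DataHubGraph client. Extra keyword arguments
    (for example default_db or default_schema) are passed through unchanged.
    """
    edges: List[ColumnEdge] = []
    for sql in queries:
        result = graph.parse_sql_lineage(sql, platform=platform, **parse_kwargs)
        # Assumption: column_lineage can be None when the parser finds no
        # column-level information for a statement.
        for info in result.column_lineage or []:
            down_table = info.downstream.table
            for up in info.upstreams:
                edges.append(
                    ColumnEdge(
                        upstream_table=str(up.table),
                        upstream_column=up.column,
                        downstream_table=None if down_table is None else str(down_table),
                        downstream_column=info.downstream.column,
                    )
                )
    return edges


def connect(server: str = "http://localhost:8080") -> Any:
    """Build a DataHubGraph client. The URL is a placeholder for your own instance."""
    # Imported lazily so the helper above can be used and tested without
    # the datahub package installed.
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

    return DataHubGraph(DatahubClientConfig(server=server))


# Example usage against a running DataHub instance (not executed here):
#   graph = connect("http://localhost:8080")
#   edges = collect_column_lineage(
#       graph,
#       ["INSERT INTO dst SELECT price * qty AS total FROM src"],
#       default_db="analytics",
#   )
```

The edges come back as plain dataclasses, so a downstream application can serialize them however it likes. As far as I understand, the parser resolves column names using schema metadata already stored in DataHub, so results may be incomplete for tables that have not been ingested.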