Understanding Data Lineage Storage and Processing in DataHub

Original Slack Thread

Hi everyone, how is data lineage information stored in DataHub? Is it stored in a database (MySQL? Neo4j?)? How is this reflected in the source code? I don’t know which part of the source code to look at.

Hi flash. Personally, I think there are several places in the source code to look at regarding how lineage data is stored and processed in DataHub.
• The metadata-ingestion module, which includes all the supported sources that can be ingested into DataHub. From my understanding, lineage-related data such as table lineage and column lineage is processed there. The core component used in this module is SQLAlchemy, which is the basis for the other SQL-based sources (an example class is “SQLAlchemySource”).
• The metadata-io module, which has different implementations of the Java interface ‘GraphService’ located under the metadata-services module. So depending on the implementation, the storage layer for the lineage data can be ES (Elasticsearch) or Neo4j…
That’s what I know currently. Hope this helps.
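
To make the "one interface, several storage backends" idea concrete, here is a minimal sketch in Python. It is not DataHub's code: the names LineageGraphStore, InMemoryGraphStore, add_edge, and get_downstreams are hypothetical and only illustrate how lineage edges can sit behind an interface that different backends implement, roughly the role the thread describes for ‘GraphService’. The dataset names are made up as well.

```python
from abc import ABC, abstractmethod
from collections import defaultdict, deque
from typing import Dict, List, Set


class LineageGraphStore(ABC):
    """Hypothetical storage interface (not DataHub's actual GraphService)."""

    @abstractmethod
    def add_edge(self, upstream: str, downstream: str) -> None:
        """Record that `downstream` is derived from `upstream`."""

    @abstractmethod
    def get_downstreams(self, node: str) -> List[str]:
        """Return every node reachable downstream of `node`."""


class InMemoryGraphStore(LineageGraphStore):
    """Toy backend; a real one might write to Elasticsearch or Neo4j instead."""

    def __init__(self) -> None:
        self._edges: Dict[str, Set[str]] = defaultdict(set)

    def add_edge(self, upstream: str, downstream: str) -> None:
        self._edges[upstream].add(downstream)

    def get_downstreams(self, node: str) -> List[str]:
        # Breadth-first traversal so nearer descendants come first.
        visited = {node}
        result: List[str] = []
        queue = deque([node])
        while queue:
            current = queue.popleft()
            # Sort neighbours to keep the output order deterministic.
            for nxt in sorted(self._edges.get(current, ())):
                if nxt not in visited:
                    visited.add(nxt)
                    result.append(nxt)
                    queue.append(nxt)
        return result


# Callers depend only on the interface, so the backend can be swapped.
store: LineageGraphStore = InMemoryGraphStore()
store.add_edge("raw_orders", "staging_orders")
store.add_edge("staging_orders", "orders_report")
print(store.get_downstreams("raw_orders"))
```

Because callers only see the interface, changing where lineage is stored means providing another implementation rather than changing the code that reads or writes lineage.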

Your answer is very helpful. Thank you very much.