Capturing Lineage of Actual Data in DataHub

Original Slack Thread

Hi team, newbie here. Can we use DataHub to capture the lineage of the actual data (individual records) instead of just metadata-level lineage?

DataHub generally doesn’t store your actual data, so it has no way of representing record-level lineage. For large datasets, this would also be a massive amount of metadata, since we would be tracking which statement (for example, which SQL query) produced each individual data point. However, DataHub does store column-level lineage, so if you want to know how each column of a table was constructed, that can be modeled in DataHub.
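The difference between the two kinds of lineage is easier to see in a small sketch. The example below is plain Python, not DataHub's SDK. Every dataset, column, and function name in it (`Column`, `LINEAGE`, `upstream_columns`, `raw.orders`, and so on) is hypothetical. It models column-level lineage as a mapping from each downstream column to the upstream columns it was derived from, then walks that graph to find everything a column depends on. No row values appear anywhere, so the graph grows with the number of columns and transformations rather than with the number of records.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column identified by (hypothetical) dataset name and column name."""
    dataset: str
    name: str


# Hypothetical column-level lineage: downstream column -> upstream columns.
# Only column references are stored, never the data in the rows.
LINEAGE: dict[Column, set[Column]] = {
    Column("analytics.revenue", "total"): {
        Column("staging.orders", "amount"),
        Column("staging.orders", "tax"),
    },
    Column("staging.orders", "amount"): {Column("raw.orders", "amount_cents")},
    Column("staging.orders", "tax"): {Column("raw.orders", "tax_cents")},
}


def upstream_columns(col: Column, lineage: dict[Column, set[Column]]) -> set[Column]:
    """Return every column that `col` transitively depends on (breadth-first)."""
    seen: set[Column] = set()
    queue = deque(lineage.get(col, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; avoids loops in a cyclic graph
        seen.add(current)
        queue.extend(lineage.get(current, ()))
    return seen


if __name__ == "__main__":
    deps = upstream_columns(Column("analytics.revenue", "total"), LINEAGE)
    for c in sorted(deps, key=lambda c: (c.dataset, c.name)):
        print(c.dataset, c.name)
    # raw.orders amount_cents
    # raw.orders tax_cents
    # staging.orders amount
    # staging.orders tax
```

Within DataHub itself, this kind of information is recorded as fine-grained lineage on a dataset's upstream lineage metadata, not in a structure like the one above. The sketch only illustrates why column-level lineage stays small where record-level lineage would not.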

