Understanding Classification and Terminology of Lineage Types

Original Slack Thread

Hi, team!
I want to understand the general classification of lineage types. On the page <https://datahubproject.io/docs/next/generated/lineage/lineage-feature-guide/|About Datahub Lineage> I see scattered information about lineage classification. For example, we can speak about table-level lineage or column-level lineage. That is clear. But I also see definitions such as ‘incremental_lineage’, or ‘Pipeline Lineage’ in the A.K.A. column of the Types of Lineage Connections table. Sometimes it is tricky to understand what exactly a definition means. Is there a general classification of lineage, or a naming convention for it?

Unfortunately I’m not sure we have a doc page dedicated to this, but I agree we should have one.
In the meantime, if you list what is unclear here, I can answer your questions.

It would be great if we could fill the gaps in this table and give a short description of each lineage type:

<@UV14447EU> I ask because I believe all of them could be called pipeline lineage (if lineage exists, all of its parts are elements of that pipeline).

I don’t think we have a name for each of these, and I don’t know why we marked Pipeline Lineage as a special case.
The above list shows how you can set lineage edges between different entity types.
There is one additional concept, incremental lineage, which some sources support.
Incremental lineage means you don’t have to set all the lineage at once; you can add lineage edges incrementally. If you set lineage non-incrementally, the most recently ingested lineage always becomes the lineage, and the previous lineage is replaced.
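To make the difference between the two write modes concrete, here is a minimal sketch of the replace-vs-merge semantics described above. It is not DataHub code: the in-memory `store`, the function `write_lineage`, and the dataset names are hypothetical, chosen only to illustrate the behavior.

```python
from __future__ import annotations


def write_lineage(store: dict[str, set[str]], entity: str,
                  upstreams: set[str], incremental: bool) -> None:
    """Record upstream edges for `entity` in a hypothetical in-memory store."""
    if incremental:
        # Incremental: new edges are merged with whatever is already stored.
        store.setdefault(entity, set()).update(upstreams)
    else:
        # Non-incremental: the latest ingestion replaces the old lineage entirely.
        store[entity] = set(upstreams)


if __name__ == "__main__":
    store: dict[str, set[str]] = {}
    write_lineage(store, "orders_clean", {"orders_raw"}, incremental=False)
    write_lineage(store, "orders_clean", {"customers_raw"}, incremental=True)
    print(sorted(store["orders_clean"]))  # ['customers_raw', 'orders_raw']
    write_lineage(store, "orders_clean", {"orders_v2"}, incremental=False)
    print(sorted(store["orders_clean"]))  # ['orders_v2']
```

The incremental write keeps the earlier `orders_raw` edge, while the final non-incremental write discards everything that was there before.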

<@UV14447EU> Hi!
Why does schema_classes have only UpstreamClass and no ‘DownstreamClass’ for lineage?

I don’t know tbh

Who can I ask? I want to clarify whether I can use this class to build both lineage directions (upstream and downstream) or only one (as the name suggests).

With UpstreamClass you should be able to model your whole dependency hierarchy: wherever you set an upstream on an entity, that entity automatically becomes the downstream of the upstream entity.
This makes lineage simpler to set and leaves less room for inconsistency (for example, setting an upstream on one side but failing to set the matching downstream on the other side).
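The idea that storing only upstream edges is enough can be sketched as follows: every downstream relationship is derived by flipping the stored upstream edges. This is a hypothetical illustration, not how DataHub computes it internally; the `UPSTREAMS` mapping, the function `invert_to_downstreams`, and the dataset names are assumptions made for the example.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical lineage records: each entity lists only its upstreams,
# mirroring how an upstream entry is attached to the downstream entity.
UPSTREAMS: dict[str, list[str]] = {
    "orders_clean": ["orders_raw"],
    "revenue_report": ["orders_clean", "customers_raw"],
}


def invert_to_downstreams(upstreams: dict[str, list[str]]) -> dict[str, list[str]]:
    """Derive downstream edges by flipping every stored upstream edge."""
    downstreams: dict[str, list[str]] = defaultdict(list)
    for entity, parents in upstreams.items():
        for parent in parents:
            # `entity` declared `parent` as upstream, so `entity` is downstream of `parent`.
            downstreams[parent].append(entity)
    return dict(downstreams)


if __name__ == "__main__":
    for node, children in sorted(invert_to_downstreams(UPSTREAMS).items()):
        print(node, "->", children)
```

Because only one direction is ever written, the two directions cannot disagree: the downstream view is always computed from the same upstream records.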

Yes, that is clear and logical! Thank you!