How to Emit Airflow Task-to-Task Lineage in DataHub?

Original Slack Thread

Hi team, how can we emit Airflow task -> Airflow task lineage in DataHub? Is there support for this? I know inlets and outlets work for Airflow task -> dataset lineage.

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)
  1. Custom code
  2. 0.10.5
  3. Airflow

Hello Preskha,

I wanted to share an example demonstrating how to create a DataFlow/DataJob and emit it using the DataHub API. Our Airflow plugin uses the same API. You can find the example scripts at the following links:

  1. [Simple DataFlow Example](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_job_dataflow_new_api_simple.py)
  2. [Verbose DataFlow Example](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_job_dataflow_new_api_verbose.py)
For more detailed information, you can refer to the Lineage section of the DataHub documentation. Additionally, a more complex, fine-grained example is available [here](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_emitter_datajob_finegrained.py).
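
For task -> task lineage specifically, the relevant piece in those examples is the `upstream_urns` list on `DataJob`. Here is a minimal sketch based on the simple example linked above: the GMS endpoint, DAG id, and task names are placeholders, and the exact constructor arguments may differ slightly between DataHub versions.

```python
# Minimal sketch: emit a DataFlow (the DAG) and two DataJobs (the tasks),
# and link them with upstream_urns to get task -> task lineage.
# The endpoint, flow id, and job ids below are placeholders.
from datahub.api.entities.datajob import DataFlow, DataJob
from datahub.emitter.rest_emitter import DatahubRestEmitter

emitter = DatahubRestEmitter("http://localhost:8080")  # your DataHub GMS URL

# One DataFlow represents the Airflow DAG.
flow = DataFlow(orchestrator="airflow", cluster="prod", id="example_dag")
flow.emit(emitter)

# Each DataJob represents a task within that DAG.
extract_task = DataJob(flow_urn=flow.urn, id="extract", name="Extract Task")
extract_task.emit(emitter)

transform_task = DataJob(flow_urn=flow.urn, id="transform", name="Transform Task")
# Task -> task lineage: point the downstream job at the upstream job's URN.
transform_task.upstream_urns.append(extract_task.urn)
transform_task.emit(emitter)
```

This is the same pattern the Airflow plugin follows when it mirrors a DAG's task dependencies into DataHub.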

Feel free to explore these resources, and let me know if you have any questions.

Best Regards, Avani

Thank you!