Using Airflow OpenLineage Plugin for Lineage Information: A Comparison to DataHub Airflow Plugin

Original Slack Thread

Hi team, a couple of questions:

  1. Can I use the Airflow OpenLineage plugin to publish lineage information to DataHub?
  2. How does the DataHub Airflow plugin stack up against the OpenLineage plugin?

<@U01GZEETMEZ> can you confirm? To the best of my knowledge:

  1. Yes, you can use the Airflow OpenLineage plugin for publishing lineage information to DataHub
  2. The DataHub Airflow plugin specifically caters to DataHub’s metadata model, providing features like automatic lineage extraction, task run information, and the ability to manually annotate lineage

<@U01GZEETMEZ> can you confirm the above? If publishing is possible, can you share any documentation for it? Also, what goodies would I miss if I use the Airflow OpenLineage plugin rather than the DataHub Airflow plugin?

The biggest differences are:

  1. The DataHub Airflow plugin has significantly better automatic lineage extraction.
  2. It does a better job of representing DAGs and tasks in ways that feel native in the DataHub UI.

Docs are here: https://datahubproject.io/docs/next/lineage/airflow
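
For the manual lineage annotation mentioned earlier in the thread: with the DataHub plugin this is done by attaching inlets and outlets to a task, and each one ends up as a DataHub dataset URN. The snippet below is only a rough, standard-library sketch of that idea, not the plugin's API. `DatasetRef`, `task_lineage`, the task id, and the Snowflake table names are all hypothetical; the linked docs describe the real entity classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical stand-in for a dataset listed in a task's inlets/outlets."""

    platform: str
    name: str
    env: str = "PROD"

    @property
    def urn(self) -> str:
        # DataHub dataset URNs combine a platform URN, a dataset name, and an environment.
        return (
            f"urn:li:dataset:(urn:li:dataPlatform:{self.platform},"
            f"{self.name},{self.env})"
        )


def task_lineage(task_id, inlets, outlets):
    """Pair every inlet with every outlet to get the edges one task contributes."""
    return {
        "task": task_id,
        "edges": [(i.urn, o.urn) for i in inlets for o in outlets],
    }


# Hypothetical task reading one table and writing another.
lineage = task_lineage(
    "run_data_task",
    inlets=[DatasetRef("snowflake", "mydb.schema.tableA")],
    outlets=[DatasetRef("snowflake", "mydb.schema.tableD")],
)

for upstream, downstream in lineage["edges"]:
    print(f"{upstream} -> {downstream}")
```

In a real DAG the same information would go in the operator's `inlets` and `outlets` arguments, and the plugin would emit it to DataHub.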