Troubleshooting Airflow Lineage Emission with DataHub Package Versions

<@U01GZEETMEZ> we were able to upgrade the plugin to 0.11.0.1, but the inlets are still not working for us. We are still on DataHub 0.10.5. The inlets are emitted for the task itself and show up in the Properties tab in the UI, but they do not show up in the Lineage or Runs tabs.

Huh - they’re supposed to show up in properties, lineage tab, and runs tab

What version of airflow are you using?

Airflow is at 2.5.3. The emission logs show the inlets populated for the DataJob, but not the DataProcess. I’m guessing the Properties tab gets them from the DataJob emission and the other tabs get them from the DataProcess emission. Outlets show up in all tabs.

So outlets appear as you’d expect, but inlets are missing from the data processes? This sounds like the original issue, which should’ve been fixed by this PR https://github.com/datahub-project/datahub/pull/8631 - kinda makes me suspect that the airflow plugin may not have gotten upgraded to 0.11.0.1

If not, it’d be useful to see the code you’re using to declare the inlets/outlets
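One quick way to rule out a failed upgrade is to ask the Airflow environment itself which versions are installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration, and it assumes the script runs with the same Python interpreter that the Airflow scheduler and workers use.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Package names taken from the thread; run this inside the Airflow environment.
    names = ["apache-airflow", "acryl-datahub", "acryl-datahub-airflow-plugin"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```

If the plugin reports something older than 0.11.0.1 here, pip most likely backtracked to an earlier release during installation.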

Has anyone else run into the issue of the --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-3.10.txt" file for Airflow blocking the datahub package from being installed? I use Airflow 2.6.3 with this constraints file, and the dependency issue is with typing-extensions, which causes pip to install acryl-datahub-airflow-plugin==0.10.2.3, which is way lower than I need…


```
The conflict is caused by:
    acryl-datahub 0.11.0.1 depends on typing-extensions<4.6.0 and >=3.10.0.2; python_version >= "3.8"
    The user requested (constraint) typing-extensions==4.7.1

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
```

I’m working on removing this constraint so that we’re installable with typing-extensions >= 4.6
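To make the conflict concrete: the Airflow constraints file pins typing-extensions to 4.7.1, while the package requires a version below 4.6.0, so no release can satisfy both. Here is a rough sketch of that check using only the standard library. The `satisfies` helper is hypothetical and handles plain numeric versions only; it is not how pip resolves dependencies, and a real tool should use the `packaging` library instead.

```python
import operator
import re

# Comparison operators that can appear in a simple version specifier.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def as_tuple(version, width=4):
    """Turn '4.7.1' into (4, 7, 1, 0) so versions compare numerically."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(pinned, specifier):
    """Check a pinned version against a comma-separated specifier."""
    for clause in specifier.split(","):
        match = re.fullmatch(r"\s*(<=|>=|==|!=|<|>)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not OPS[op](as_tuple(pinned), as_tuple(bound)):
            return False
    return True


# The pin from the Airflow 2.6.3 constraints file vs. the requirement in the error.
print(satisfies("4.7.1", "<4.6.0,>=3.10.0.2"))  # False
```

The same check applies to any other pin in a constraints file: if the pinned version falls outside the range a package declares, pip either fails or backtracks to an older release of that package.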

Awesome <@U01GZEETMEZ>! Do you have an ETA for the release with the fix? :pray:

I think this is already in v0.11.0.2

Also as demo’d during the town hall, in the next version, we have an airflow plugin v2 that supports automatically extracting lineage from various airflow operators. New docs: https://datahubproject.io/docs/next/lineage/airflow/

I don’t think it has been released yet:

```
...
INFO: pip is looking at multiple versions of acryl-datahub-airflow-plugin to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install acryl-datahub-airflow-plugin==0.11.0.2 because these package versions have conflicting dependencies.

The conflict is caused by:
    acryl-datahub-airflow-plugin 0.11.0.2 depends on typing-extensions<4.6.0 and >=3.10.0.2; python_version >= "3.8"
    The user requested (constraint) typing-extensions==4.7.1
```

This works:
pip install 'acryl-datahub-airflow-plugin[plugin-v2]' --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.7.1/constraints-3.11.txt
And the result is:

```
acryl-datahub==0.10.2.2
acryl-datahub-airflow-plugin==0.10.2.2
```

I see 0.11.0.3 has been released. But there are still conflicts with the Airflow 2.7.1 constraints:

```
...
INFO: pip is looking at multiple versions of acryl-datahub[datahub-rest] to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install acryl-datahub-airflow-plugin==0.11.0.3 and acryl-datahub[datahub-rest]==0.11.0.3 because these package versions have conflicting dependencies.

The conflict is caused by:
    acryl-datahub-airflow-plugin 0.11.0.3 depends on pydantic>=1.5.1
    acryl-datahub[datahub-rest] 0.11.0.3 depends on pydantic!=1.10.3, <2 and >=1.5.1
    The user requested (constraint) pydantic==2.3.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
```

Replied on the GitHub issue https://github.com/datahub-project/datahub/issues/8892