Troubleshooting Lineage Creation Issue After Airflow and DataHub Upgrade

Original Slack Thread

Hi all,
After upgrading Airflow from 2.5.3 to 2.8.4 (Python 3.11.8) and DataHub from 0.11 to 0.13.1 with the acryl-datahub-airflow-plugin[plugin-v2]==0.13.1.3 plugin, incoming and outgoing lineage for tasks stopped being created. I use context['ti'].task.inlets and context['ti'].task.outlets in a custom operator. The incoming and outgoing entities are created before I try to set the lineage. Before the Airflow and DataHub upgrade, everything worked correctly.

With v2, all information about DAG and task runs is sent to DataHub, but the lineage is not. If I switch to [plugin-v1] (with the same config), the lineage appears, but task run information does not show up in the Runs tab in DataHub.
In the Airflow task run logs the inlet and outlet entities are present, and the log output looks the same for v1 and v2:



```
[2024-05-03, 12:24:46 UTC] {_lineage_core.py:64} INFO - Emitted from Lineage: DataJob(id='OZON_TRAFFIC_SOURCE', urn=DataJobUrn(urn:li:dataJob:(urn:li:dataFlow:(airflow,Ozon_Loader_from_CA,prod),OZON_TRAFFIC_SOURCE)), flow_urn=DataFlowUrn(urn:li:dataFlow:(airflow,Ozon_Loader_from_CA,prod)), name=None, description=None, properties={'depends_on_past': 'False', 'email': "['a???????p']", 'label': "'OZON_TRAFFIC_SOURCE'", 'execution_timeout': 'None', 'sla': 'None', 'task_id': "'OZON_TRAFFIC_SOURCE'", 'trigger_rule': "<TriggerRule.ALL_SUCCESS: 'all_success'>", 'wait_for_downstream': 'False', 'downstream_task_ids': '[]', 'inlets': "[Dataset(platform='ozon', name='CA_vendor_traffic', env='PROD', platform_instance=None)]", 'outlets': "[Dataset(platform='postgres', name='dwh.sa.ozon_ext_traffic_source', env='PROD', platform_instance=None)]"}, url='<http://airflow-dwh.apps.dmz.carely.group/taskinstance/list/?flt1_dag_id_equals=Ozon_Loader_from_CA&_flt_3_task_id=OZON_TRAFFIC_SOURCE>', tags={'loaders', 'ozon'}, owners={'A??????v'}, group_owners=set(), inlets=[DatasetUrn(urn:li:dataset:(urn:li:dataPlatform:ozon,CA_vendor_traffic,PROD))], outlets=[DatasetUrn(urn:li:dataset:(urn:li:dataPlatform:postgres,dwh.sa.ozon_ext_traffic_source,PROD))], fine_grained_lineages=[], upstream_urns=[])
```
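To check that the v1 and v2 logs really carry the same lineage, rather than comparing them by eye, the dataset URNs can be pulled out of each "Emitted from Lineage" line and diffed. The following is a minimal sketch that assumes the log format shown above; `extract_lineage` and the shortened `sample` string are illustrative names, not part of the DataHub plugin.

```python
import re

# Top-level "inlets=[...]" / "outlets=[...]" fields of the DataJob repr.
# The copies inside `properties` are written as "'inlets': ..." and do not match.
FIELD_RE = re.compile(r"\b(inlets|outlets)=\[(.*?)\]")
# A single DatasetUrn(...) entry; captures the URN without the wrapper.
URN_RE = re.compile(r"DatasetUrn\((urn:li:dataset:\([^()]*\))\)")


def extract_lineage(log_line: str) -> dict:
    """Return the inlet and outlet dataset URNs found in one log line."""
    result = {"inlets": [], "outlets": []}
    for field, body in FIELD_RE.findall(log_line):
        result[field].extend(URN_RE.findall(body))
    return result


# Shortened version of the log line above (assumption: same field layout).
sample = (
    "INFO - Emitted from Lineage: DataJob(id='OZON_TRAFFIC_SOURCE', "
    "properties={'inlets': \"[Dataset(platform='ozon', name='CA_vendor_traffic', "
    "env='PROD', platform_instance=None)]\"}, "
    "inlets=[DatasetUrn(urn:li:dataset:(urn:li:dataPlatform:ozon,CA_vendor_traffic,PROD))], "
    "outlets=[DatasetUrn(urn:li:dataset:(urn:li:dataPlatform:postgres,"
    "dwh.sa.ozon_ext_traffic_source,PROD))], "
    "fine_grained_lineages=[], upstream_urns=[])"
)

lineage = extract_lineage(sample)
print(lineage["inlets"])
print(lineage["outlets"])
```

Running this on the matching line from a v1 run and from a v2 run and comparing the two dictionaries shows whether the plugin sees the same inlets and outlets in both cases. If they match, the difference is more likely in what is emitted to DataHub than in what the operator sets.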

And when using plugin v1, I get this Airflow error:
Broken plugin: [datahub_airflow_plugin.datahub_plugin] No module named 'cattr'
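A quick way to confirm the "No module named 'cattr'" error outside of Airflow is to ask the interpreter whether the module can be found at all. This is a small sketch using only the standard library; `missing_modules` is a hypothetical helper name. As far as I know, the `cattr` module is provided by the `cattrs` distribution on PyPI, but treat that as an assumption to verify for your environment.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Run this with the same interpreter (virtualenv or image) that Airflow uses.
print(missing_modules(["cattr"]))
```

An empty list means the module can be found by that interpreter. A list containing `cattr` means the dependency is missing from the environment Airflow runs in.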

We’re aware of an issue specific to Airflow 2.8 with Python 3.11. Downgrading either one seems to fix it, but we’re still looking into it.

I tried downgrading Python to 3.10 and then to 3.9: same issue.
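Because the behaviour depends on the exact combination of Python, Airflow, and plugin versions, it helps to record what is actually installed after each downgrade attempt. A minimal sketch using the standard library follows; `installed_versions` is a hypothetical helper, and the distribution names are the PyPI names I would expect for this setup.

```python
import sys
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Run with the interpreter that Airflow itself uses.
report = {"python": sys.version.split()[0]}
report.update(
    installed_versions(
        ["apache-airflow", "acryl-datahub", "acryl-datahub-airflow-plugin"]
    )
)
print(report)
```

Posting this output alongside the symptom makes it easier to tell whether a downgrade really took effect in the scheduler and worker environments.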