<@U06TM7M40RK> Does the Airflow platform support platform instances? What is the best way to ingest lineage from multiple Airflow instances?
It seems likely that Airflow does not natively support the concept of “platform instances” in the same way that some other platforms do. However, you can manage multiple Airflow instances in DataHub by using the cluster configuration option to differentiate between them. This approach allows you to separate metadata from different Airflow environments within DataHub.
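To illustrate why a distinct cluster value keeps instances apart, here is a minimal sketch. It assumes the usual DataFlow URN layout, urn:li:dataFlow:(orchestrator,flow_id,cluster); the helper name dataflow_urn, the DAG id, and the cluster names are hypothetical, and the plugin builds these URNs itself.

```python
def dataflow_urn(dag_id: str, cluster: str, orchestrator: str = "airflow") -> str:
    """Format a DataFlow URN string (assumed layout: orchestrator, flow id, cluster)."""
    return f"urn:li:dataFlow:({orchestrator},{dag_id},{cluster})"


# The same DAG id deployed on two Airflow instances (hypothetical names).
urn_team_a = dataflow_urn("daily_sales", "airflow_team_a")
urn_team_b = dataflow_urn("daily_sales", "airflow_team_b")

# Different cluster values give different URNs, so the two DAGs
# show up as separate entities instead of overwriting each other.
print(urn_team_a)
print(urn_team_b)
```

If both instances used the same cluster value, the two DAGs would resolve to the same URN and their metadata would be merged.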
To ingest lineage from multiple Airflow instances, you can follow these steps:
- Install the DataHub Airflow Plugin: Depending on your Airflow version, you can use either the v1 or v2 plugin. For Airflow 2.3+ and Python 3.8+, the v2 plugin is recommended. You can install it using:
    pip install 'acryl-datahub-airflow-plugin[plugin-v2]'
- Set Up a DataHub Connection in Airflow: Configure an Airflow connection for DataHub using either REST or Kafka-based connections. For example, for REST-based:
    airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host 'http://datahub-gms:8080' --conn-password '<optional datahub auth token>'
- Configure the Plugin: Update the airflow.cfg file to enable the plugin and set the necessary configurations. Use the cluster parameter to differentiate between Airflow instances:
    [datahub]
    enabled = true
    conn_id = datahub_rest_default
    # Use a unique name for each Airflow instance
    cluster = <unique_cluster_name>
- Validate the Plugin: Ensure the plugin is loaded in Airflow by checking the Admin → Plugins menu. Run an Airflow DAG and check the task logs for DataHub-related log messages like “Emitting DataHub …”.
- Automatic Lineage Extraction: The v2 plugin supports automatic lineage extraction for various operators. Ensure that enable_extractors is set to true in the configuration.
- Manual Lineage Annotation: For operators that do not support automatic lineage extraction, you can manually annotate lineage by setting inlets and outlets on your Airflow operators.
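As a rough sketch of the per-instance configuration step, the snippet below renders one [datahub] section per Airflow instance using only the Python standard library. The function name datahub_section and the instance names are hypothetical; the option names (enabled, conn_id, cluster, enable_extractors) are the ones mentioned in the steps above.

```python
import configparser
import io


def datahub_section(cluster: str, conn_id: str = "datahub_rest_default") -> str:
    """Render a [datahub] section for airflow.cfg with a per-instance cluster name."""
    parser = configparser.ConfigParser()
    parser["datahub"] = {
        "enabled": "true",
        "conn_id": conn_id,
        "cluster": cluster,  # must be unique for each Airflow instance
        "enable_extractors": "true",
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical instance names; each one gets its own cluster value.
instances = ["airflow_team_a", "airflow_team_b"]
sections = {name: datahub_section(name) for name in instances}

for name in instances:
    print(f"# airflow.cfg fragment for {name}")
    print(sections[name])
```

Airflow also reads any option from an environment variable named AIRFLOW__{SECTION}__{KEY}, so setting AIRFLOW__DATAHUB__CLUSTER on each deployment is an alternative to editing airflow.cfg.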
For more detailed guidance, you can refer to the <https://datahubproject.io/docs/0.14.0/lineage/airflow|DataHub Airflow documentation>.
Would you like more detailed steps or examples on any of these points?
Sources:
- <https://datahubproject.io/docs/0.14.0/lineage/airflow|DataHub Airflow documentation>
Hint: Mention @runllm in the thread for followups.
Hi <@U07EH72TCBU>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><