I have connected DataHub with Airflow using the DataHub REST API plugin in Airflow. DAGs that I deleted in Airflow still exist in DataHub. How can I keep the DAGs consistent across both platforms?
Hey there!
<@U06TM7M40RK> Can you help with the questions in this thread?
Please continue to tag RunLLM (e.g., <@U06TM7M40RK>) to have it continue generating responses!
To ensure that deleted DAGs in Airflow are also removed from DataHub, you can follow a cleanup process. According to the <Airflow Integration | DataHub documentation>, there is no automatic mechanism to delete metadata for removed DAGs in Airflow. However, you can manually clean up obsolete pipelines and tasks from DataHub.
Here are the steps you can follow:
- Create a Cleanup DAG: You can create a DAG in Airflow specifically for cleaning up obsolete data in DataHub. This DAG will identify and remove pipelines and tasks that no longer exist in Airflow but are still present in DataHub.
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# The DAG name "Datahub_Cleanup" is what marks this as the cleanup DAG.
with DAG(
    "Datahub_Cleanup",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,  # run only when triggered manually
    catchup=False,
) as dag:
    # The task itself only prints a message.
    task = BashOperator(
        task_id="cleanup_obsolete_data",
        dag=dag,
        bash_command="echo 'cleaning up the obsolete data from datahub'",
    )
```
This DAG, once ingested, will remove all obsolete pipelines and tasks from DataHub based on the cluster value set in airflow.cfg.
- Manual Deletion: If you prefer a manual approach, you can use the DataHub CLI to delete the metadata associated with the DAGs. First, identify the URNs of the DAGs you want to delete, and then use the datahub delete command to perform a soft or hard delete.
For example, to perform a soft delete:
datahub delete --urn "<urn-of-your-dag>" --soft
To perform a hard delete:
datahub delete --urn "<urn-of-your-dag>" --hard
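To make the manual route a bit more concrete, here is a minimal sketch that works out which DAGs are obsolete and prints the matching delete commands. It assumes the URN format urn:li:dataFlow:(airflow,<dag_id>,<cluster>) with the cluster set to prod; check both against your own instance. The function names and the sample DAG ids are hypothetical, and the two id lists are placeholders you would fill in from DataHub and Airflow yourself.

```python
from typing import Iterable, List


def make_dag_urn(dag_id: str, cluster: str = "prod") -> str:
    """Format a DataHub dataFlow URN for an Airflow DAG (assumed URN layout)."""
    return f"urn:li:dataFlow:(airflow,{dag_id},{cluster})"


def obsolete_dag_ids(
    datahub_dag_ids: Iterable[str], airflow_dag_ids: Iterable[str]
) -> List[str]:
    """Return DAG ids that are in DataHub but no longer in Airflow, sorted."""
    return sorted(set(datahub_dag_ids) - set(airflow_dag_ids))


def delete_commands(
    dag_ids: Iterable[str], cluster: str = "prod", hard: bool = False
) -> List[str]:
    """Build one `datahub delete` command line per DAG id."""
    mode = "--hard" if hard else "--soft"
    return [
        f'datahub delete --urn "{make_dag_urn(dag_id, cluster)}" {mode}'
        for dag_id in dag_ids
    ]


if __name__ == "__main__":
    # Hypothetical sample data: replace with the ids from your own systems.
    in_datahub = ["sales_etl", "old_report", "user_sync"]
    in_airflow = ["sales_etl", "user_sync"]

    for command in delete_commands(obsolete_dag_ids(in_datahub, in_airflow)):
        print(command)
    # prints: datahub delete --urn "urn:li:dataFlow:(airflow,old_report,prod)" --soft
```

The sketch only prints the commands, so you can review them before running anything. Starting with soft deletes is the safer choice, since a hard delete cannot be undone. The tasks of a DAG are separate entities in DataHub and may need to be deleted on their own.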
Would you like more detailed steps or information on a specific part of the process?
Sources:
- <Airflow Integration | DataHub documentation on using inlets and outlets>
Hi, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><