Managing Deleted DAG Metadata in DataHub with Airflow Integration

Original Slack Thread

Hi team,
I integrated Airflow with DataHub.
Question: How can we delete the metadata for a DAG that has been removed from Airflow? Or is it removed automatically?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?



To help you effectively, I need a bit more information:

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using (e.g., 0.12.0)?
  3. What data source(s) are you integrating with DataHub (e.g., BigQuery)?

Regarding your question about deleting metadata for a DAG that has been removed in Airflow, based on the available data sources, there isn’t an automatic mechanism to delete metadata for removed DAGs in Airflow. You would need to manually delete the metadata using the DataHub CLI.

Here are the steps you can follow to delete the metadata for a specific DAG:

  1. Identify the URN: First, identify the URN of the DAG you want to delete. You can list all the entities and find the specific URN for your DAG.

  2. Delete the Metadata: Use the datahub delete command to delete the metadata associated with the DAG. You can perform a soft delete or a hard delete depending on your requirements.

For example, to perform a soft delete:

datahub delete --urn "<urn-of-your-dag>" --soft

To perform a hard delete:

datahub delete --urn "<urn-of-your-dag>" --hard
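
To make the "identify the URN" step more concrete, here is a minimal Python sketch that builds a candidate URN for an Airflow DAG and assembles the matching delete command. It assumes the DAG is stored as a dataFlow entity with a URN of the form urn:li:dataFlow:(airflow,<dag_id>,<cluster>), and that the cluster is "prod"; both are assumptions, so check the URN shown for the DAG in your DataHub instance before running anything. The function names and the DAG id "my_removed_dag" are hypothetical.

```python
import shlex
from typing import List


def airflow_dag_urn(dag_id: str, cluster: str = "prod") -> str:
    """Build a candidate dataFlow URN for an Airflow DAG.

    Assumption: the URN layout and the default cluster value may differ
    in your deployment; verify against the URN shown in DataHub.
    """
    return f"urn:li:dataFlow:(airflow,{dag_id},{cluster})"


def delete_command(urn: str, hard: bool = False) -> List[str]:
    """Assemble the `datahub delete` invocation from the thread as an argument list."""
    mode = "--hard" if hard else "--soft"
    return ["datahub", "delete", "--urn", urn, mode]


if __name__ == "__main__":
    urn = airflow_dag_urn("my_removed_dag")  # hypothetical DAG id
    # Print the command for review instead of executing it.
    print(shlex.join(delete_command(urn)))
```

The sketch only prints the command so it can be reviewed first. Note that the tasks of a DAG are typically separate dataJob entities, so deleting the DAG's URN alone may leave them behind.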

For more detailed information, you can refer to the DataHub documentation on deleting metadata: https://datahubproject.io/docs/0.13.1/how/delete-metadata

Would you like more detailed steps or information on a specific part of the process?


<@U06TM7M40RK>

  1. I’m using the UI
  2. DataHub version: 0.12.0
  3. Datasource: Airflow

<@U06TM7M40RK> here.

Thank you for the additional information.

Since you are using the UI for ingestion and DataHub version 0.12.0 with Airflow as the data source, here are the steps to manually delete the metadata for a DAG that has been removed in Airflow:

  1. Identify the URN: You need to identify the URN of the DAG you want to delete. This can be done by listing all the entities in DataHub and finding the specific URN for your DAG.

  2. Delete the Metadata: Although you are using the UI for ingestion, the deletion process requires using the DataHub CLI. You can perform a soft delete or a hard delete depending on your requirements.

For example, to perform a soft delete:

datahub delete --urn "<urn-of-your-dag>" --soft

To perform a hard delete:

datahub delete --urn "<urn-of-your-dag>" --hard

Unfortunately, there isn’t an automatic mechanism to delete metadata for removed DAGs in Airflow. You would need to manually delete the metadata using the DataHub CLI.
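
Since the cleanup is manual, one way to keep it manageable is to compare the DAG ids DataHub knows about with the DAG ids that still exist in Airflow, and generate soft-delete commands for the difference. The Python sketch below is only an illustration: how you obtain the two lists is left open, the function names and sample ids are hypothetical, and the URN layout urn:li:dataFlow:(airflow,<dag_id>,<cluster>) with cluster "prod" is an assumption you should verify against your instance.

```python
from typing import Iterable, List


def stale_dag_ids(datahub_dag_ids: Iterable[str], airflow_dag_ids: Iterable[str]) -> List[str]:
    """Return DAG ids present in DataHub but no longer present in Airflow."""
    return sorted(set(datahub_dag_ids) - set(airflow_dag_ids))


def soft_delete_commands(dag_ids: Iterable[str], cluster: str = "prod") -> List[str]:
    """Render one soft-delete command per stale DAG.

    Assumption: DAGs are dataFlow entities with this URN layout.
    """
    return [
        f'datahub delete --urn "urn:li:dataFlow:(airflow,{dag_id},{cluster})" --soft'
        for dag_id in dag_ids
    ]


if __name__ == "__main__":
    # Hypothetical inputs; replace with lists gathered from your own systems.
    in_datahub = ["daily_sales", "old_export", "user_sync"]
    in_airflow = ["daily_sales", "user_sync"]
    for command in soft_delete_commands(stale_dag_ids(in_datahub, in_airflow)):
        print(command)
```

Soft delete is used here because it can be reversed; review the printed commands before running them.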

For more detailed information, you can refer to the DataHub documentation on deleting metadata: https://datahubproject.io/docs/0.13.1/how/delete-metadata

Would you like more detailed steps or information on a specific part of the process?
