Troubleshooting Tag Deletion Issue in DataHub CLI and UI

Original Slack Thread

Hi team!
I’ve deleted a tag via the DataHub CLI with `datahub delete --urn 'urn:li:tag:PII' --hard`, but I can still see the tag on datasets and columns in the UI. I can even access the tag at <my_host>/tag/urn:li:tag:PII. I have also tried deleting the tag via GraphQL, with no luck:

```
mutation {
  deleteTag(urn: "urn:li:tag:PII")
}
```
Is there something I need to refresh? I'm using DataHub v0.13.2. Thanks in advance!

To address the issue of the tag still appearing in the UI after deletion, you may need to perform additional steps to ensure the metadata is fully removed and the UI is refreshed. Here’s a summary of the steps you can take:

  1. Check Elasticsearch Indices: Ensure that the Elasticsearch indices are updated to reflect the deletion.
  2. Restart DataHub Services: Sometimes, restarting the DataHub services, especially GMS (the Generalized Metadata Service), can help clear any cached data.
  3. Manual Reindexing: Perform a manual reindex to ensure that the search indices are in sync with the database.

Step-by-Step Breakdown

  1. Check Elasticsearch Indices:

    • DataHub uses Elasticsearch for search functionality. If the indices are not updated, the deleted tag might still appear in search results.
    • You can manually drop the Elasticsearch index for tags. DataHub can recreate the empty index, but its contents have to be rebuilt from the database, for example with RestoreIndices (step 3).
  2. Restart DataHub Services:

    • Restarting the DataHub GMS container can help clear any cached data that might be causing the tag to still appear.
    • According to an earlier Slack discussion, restarting the GMS container has resolved similar issues for other users.
  3. Manual Reindexing:

    • You can perform a manual reindex to ensure that the search indices are in sync with the database. This can be done using the RestoreIndices functionality described on the "DataHub Upgrade Docker Image" page of the DataHub documentation.

Example Commands

Dropping Elasticsearch Indices
You can use the Elasticsearch API to drop the indices manually. Here’s an example using curl:

curl -X DELETE "http://localhost:9200/tagindex_v2"
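
Before dropping the whole index, it may be worth checking whether the deleted tag is still indexed at all. The sketch below is only an illustration: it assumes Elasticsearch is reachable at http://localhost:9200 (overridable through a hypothetical ELASTICSEARCH_URL variable) and that documents in tagindex_v2 carry the URN in a keyword field named `urn`. The function names are made up for this example.

```shell
# Build an Elasticsearch query body that matches one exact URN.
# Assumption: the index stores the URN in a keyword field called "urn".
urn_query() {
  printf '{"query":{"term":{"urn":"%s"}}}' "$1"
}

# Ask Elasticsearch how many documents in the tag index match the URN.
# Prints the raw JSON response from the _count API.
count_tag_docs() {
  es="${ELASTICSEARCH_URL:-http://localhost:9200}"
  curl -s -X POST "$es/tagindex_v2/_count" \
    -H 'Content-Type: application/json' \
    -d "$(urn_query "$1")"
}

# Usage:
#   count_tag_docs 'urn:li:tag:PII'
```

If the response reports a count of 0, the tag index itself is probably not the source of the stale tag, and dropping it is unlikely to help.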

Restarting DataHub GMS
If you are using Docker, you can restart the GMS container with:

docker restart datahub-gms

Manual Reindexing
Refer to the "DataHub Upgrade Docker Image" page of the DataHub documentation for detailed steps on how to perform a manual reindex.
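
As a rough sketch of what such a run might look like with Docker: the helper below only prints a command so it can be reviewed before running. It assumes the acryldata/datahub-upgrade image, an env file holding your database and Elasticsearch settings, and the `urnLike` argument to limit the reindex to tag URNs. The env file path, the image tag, and the function name are placeholders; please verify the arguments against the documentation for your version.

```shell
# Print (not run) a docker command for a RestoreIndices job.
#   $1: path to an env file with DB/Elasticsearch settings (placeholder)
#   $2: SQL LIKE pattern selecting the URNs to reindex
#   $3: image tag, defaulting to the version mentioned in the thread
restore_indices_cmd() {
  env_file="$1"; urn_like="$2"; tag="${3:-v0.13.2}"
  printf 'docker run --env-file %s acryldata/datahub-upgrade:%s -u RestoreIndices -a urnLike=%s\n' \
    "$env_file" "$tag" "$urn_like"
}

# Usage: review the printed command, then run it yourself.
#   restore_indices_cmd ./datahub-upgrade.env 'urn:li:tag:%'
```

Depending on your setup, the container may also need to join the same Docker network as the DataHub services.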


The UI will generate a page for a tag URN even if the tag doesn’t actually exist, so being able to open <my_host>/tag/urn:li:tag:PII does not prove the tag is still there. The tag shown on datasets and columns may be a cached result, or its URN may differ in case from the one that was deleted. To check whether a tag actually exists, use the OpenAPI interface or curl: the existence-check endpoint returns 204 if the tag exists and 404 if it does not.
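
A minimal sketch of that existence check with curl. It assumes the OpenAPI v2 entity endpoint (a HEAD request to /openapi/v2/entity/tag/<urn>) is available in your deployment, that GMS is reachable at http://localhost:8080, and that an access token is stored in DATAHUB_TOKEN. The environment variable names and function names are illustrative and not from the thread.

```shell
# Percent-encode the colons in a URN so it is safe as a URL path segment.
encode_urn() {
  printf '%s' "$1" | sed 's/:/%3A/g'
}

# Translate the HTTP status of the existence check into a readable answer.
interpret_status() {
  case "$1" in
    204) echo "exists" ;;
    404) echo "missing" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Send a HEAD request for the tag and report whether it exists.
# Assumption: HEAD /openapi/v2/entity/tag/<urn> is the existence endpoint.
tag_exists() {
  host="${DATAHUB_GMS_URL:-http://localhost:8080}"
  code=$(curl -s -o /dev/null -I -w '%{http_code}' \
    -H "Authorization: Bearer ${DATAHUB_TOKEN:-}" \
    "$host/openapi/v2/entity/tag/$(encode_urn "$1")")
  interpret_status "$code"
}

# Usage (check each spelling you see in the UI, since case may differ):
#   tag_exists 'urn:li:tag:PII'
#   tag_exists 'urn:li:tag:pii'
```

If the deleted URN reports "missing" while a differently cased variant reports "exists", the tag still attached to datasets and columns is most likely that other URN.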