Is there a way to delete a specific Aspect from an Entity?
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Are you using UI or CLI for ingestion?
- Which DataHub version are you using? (e.g. 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
DataHub currently has only limited support for deleting a specific aspect from an entity. Here’s a summary of the situation:
- Timeseries Aspects: You can delete timeseries aspects using the CLI. This involves specifying the aspect name and the time range for the data you want to delete.
- Non-Timeseries Aspects: Deleting non-timeseries aspects directly is not supported yet. The recommended approach is to delete the entire entity or use a rollback if the aspect was created by a single ingestion run.
Detailed Steps and Reasoning
- Deleting Timeseries Aspects:
According to the <https://datahubproject.io/docs/0.13.1/how/delete-metadata|DataHub CLI documentation>, you can delete a range of timeseries aspect data for an entity without deleting the entire entity. Here’s an example command:
    datahub delete --urn "<my urn>" --aspect <aspect name> --start-time '-30 days' --end-time '-7 days'
This command deletes the specified timeseries aspect data within the given time range.
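If you would rather script this from Python than type it by hand, the command above can be assembled as an argument list. This is a minimal sketch: it uses only the flags shown in the command above, the helper name `build_timeseries_delete_cmd` is hypothetical, and the URN and aspect name in the usage are placeholders.

```python
import shlex


def build_timeseries_delete_cmd(urn, aspect, start_time, end_time):
    """Build the argv list for the timeseries delete command shown above.

    Hypothetical helper; it only uses flags quoted in this thread.
    """
    return [
        "datahub", "delete",
        "--urn", urn,
        "--aspect", aspect,
        "--start-time", start_time,
        "--end-time", end_time,
    ]


if __name__ == "__main__":
    cmd = build_timeseries_delete_cmd(
        urn="<my urn>",          # placeholder
        aspect="<aspect name>",  # placeholder
        start_time="-30 days",
        end_time="-7 days",
    )
    # Print the shell-quoted command so it can be reviewed before anything
    # destructive is run.
    print(shlex.join(cmd))
```

Keeping the command as a list means it can later be handed to `subprocess.run(cmd, check=True)` without any shell-quoting problems, provided the `datahub` CLI is on the PATH.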
- Deleting Non-Timeseries Aspects:
As per the <Slack discussion>, there is currently no support for deleting non-timeseries aspects directly. The backend does not support this yet, and the recommended workaround is to delete the entire entity or use a rollback. The relevant code snippet from the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/cli/delete_cli.py|delete_cli.py> file confirms this limitation:
    elif aspect_name:
        # Hard delete of non-timeseries aspect.
        # TODO: The backend doesn't support this yet.
        raise NotImplementedError(
            "Delete by aspect is not supported yet for non-timeseries aspects. Please delete the full entity or use rollback instead."
        )
Example CLI Commands
- Soft Delete an Entity:
    datahub delete --urn "<urn>" --soft
- Hard Delete an Entity:
    datahub delete --urn "<urn>" --hard
Additional Information Needed
To provide more specific guidance, please share:
- Are you using the UI or CLI for ingestion?
- Which DataHub version are you using?
- What data source(s) are you integrating with DataHub?
Sources
- <https://github.com/datahub-project/datahub/blob/master/metadata-service/openapi-servlet/src/main/java/io/datahubproject/openapi/v2/controller/EntityController.java|EntityController.java>
- <https://datahubproject.io/docs/0.13.1/how/delete-metadata|Removing Metadata from DataHub>
- <Slack Discussion on Aspect Deletion>
Is there a reason deleting non-timeseries aspects hasn’t been implemented, or is it just a matter of developer bandwidth?
The CLI might not support (hard) deleting a single aspect of an entity, but it should be possible through the REST API…
You can check out the REST API of the demo using this link: https://demo.datahubproject.io/openapi/swagger-ui/index.html (you might have to access the normal demo site first before accessing the API documentation: https://demo.datahubproject.io/)
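To make the REST suggestion a bit more concrete, here is a rough sketch of what such a call might look like from Python. It assumes the OpenAPI v2 entity endpoint accepts `DELETE /openapi/v2/entity/{entityName}/{urn}/{aspectName}`. That path shape, the bearer-token header, and the server address, URN, and aspect name in the usage are all assumptions that are not confirmed in this thread, so check them against the Swagger UI for your DataHub version first. The helper name is hypothetical, and the sketch only builds the request without sending it.

```python
from urllib.parse import quote
from urllib.request import Request


def build_aspect_delete_request(base_url, entity_name, urn, aspect_name, token=None):
    """Build (but do not send) a DELETE request for one aspect of one entity.

    ASSUMPTION: the path /openapi/v2/entity/{entityName}/{urn}/{aspectName}
    is not confirmed in this thread; verify it in the Swagger UI first.
    """
    # URNs contain ':', '(', ')' and ',', so percent-encode the whole URN
    # before placing it in the path.
    encoded_urn = quote(urn, safe="")
    url = (
        f"{base_url.rstrip('/')}/openapi/v2/entity/"
        f"{entity_name}/{encoded_urn}/{aspect_name}"
    )
    headers = {}
    if token:
        # ASSUMPTION: the server expects a bearer token for authentication.
        headers["Authorization"] = f"Bearer {token}"
    return Request(url, headers=headers, method="DELETE")


if __name__ == "__main__":
    req = build_aspect_delete_request(
        base_url="http://localhost:8080",  # placeholder server address
        entity_name="dataset",
        urn="urn:li:dataset:(urn:li:dataPlatform:hive,example_table,PROD)",  # placeholder
        aspect_name="ownership",  # placeholder
    )
    # Inspect the request before sending anything destructive.
    print(req.get_method(), req.full_url)
```

Once the path has been confirmed for your deployment, the request could be sent with `urllib.request.urlopen(req)`; trying it on a throwaway entity first seems prudent, since a hard delete cannot be undone.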