Hi! I want to clean up a platform and I am planning to use
datahub delete --platform X
Will this command cause problems for future ingestion on this platform?
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Are you using UI or CLI for ingestion?
- Which DataHub version are you using? (e.g. 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions, and don’t forget to give its responses a thumbs up or thumbs down so other folks in the Community know if you’re still looking for help on the topic!
To clean up a platform using the command datahub delete --platform X, you should be aware of a few important details:
- Soft vs. Hard Delete: By default, the datahub delete command performs a soft delete, which means the entities are marked as removed but not permanently deleted. This can be undone if needed. If you use the --hard flag, it will permanently delete the entities, which cannot be undone.
- Impact on Future Ingestion:
  - Soft Delete: If you perform a soft delete, the entities will be hidden from the UI but still exist in the backend. Future ingestions should not be affected as the entities can be re-ingested and will appear again.
  - Hard Delete: If you perform a hard delete, the entities are permanently removed. Future ingestions will need to re-ingest all the metadata from scratch, which could be time-consuming depending on the volume of data.
- Command Usage: According to the <https://github.com/datahub-project/datahub/blob/master/docs/how/delete-metadata.md|DataHub documentation>, you can use the following command to delete everything from a specific platform:
  datahub delete --platform X
- Confirmation and Dry Run: It is recommended to use the --dry-run flag to see what entities will be deleted before actually performing the delete operation. Additionally, the --force flag can be used to skip confirmation prompts.
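As a rough sketch of the points above, the preview, soft-delete, and hard-delete variants could be wrapped like this. The DATAHUB_CLI variable and the function names are made up for illustration and are not part of DataHub; only the datahub delete flags mentioned in this thread are used.

```shell
# Sketch only: DATAHUB_CLI and the function names are illustrative, not part of DataHub.
DATAHUB_CLI="${DATAHUB_CLI:-datahub}"

# Preview which entities a platform-wide delete would touch, without deleting anything.
preview_platform_delete() {
  "$DATAHUB_CLI" delete --platform "$1" --dry-run
}

# Soft delete (the default): entities are marked as removed and hidden from the UI.
soft_delete_platform() {
  "$DATAHUB_CLI" delete --platform "$1"
}

# Hard delete: entities are permanently removed and this cannot be undone.
hard_delete_platform() {
  "$DATAHUB_CLI" delete --platform "$1" --hard
}
```

Running preview_platform_delete X first and checking its output before calling either delete function is probably the safest order.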
Would you like more detailed information on how to perform this operation safely, or do you have any specific concerns about the deletion process?
Sources:
- <https://github.com/datahub-project/datahub/blob/master/docs/how/delete-metadata.md|DataHub Delete Metadata Documentation>
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/cli/delete_cli.py|DataHub CLI Delete Command>
No, it does not affect future ingestions. The only thing to consider is whether you use soft delete (the default) or hard delete.
If a soft-deleted entity gets re-ingested, all modifications etc. will appear again as they were before deletion.
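For illustration, a soft delete followed by a fresh ingestion run might look roughly like the sketch below. DATAHUB_CLI, the function name, and the recipe path passed to it are placeholders (assumptions, not anything from this thread); the recipe would be whatever file you already use for that platform.

```shell
# Sketch only: DATAHUB_CLI, the function name, and the recipe path are illustrative.
DATAHUB_CLI="${DATAHUB_CLI:-datahub}"

# Soft-delete everything on a platform (skipping the confirmation prompt),
# then re-run ingestion from an existing recipe file.
soft_delete_then_reingest() {
  local platform="$1"
  local recipe="$2"
  "$DATAHUB_CLI" delete --platform "$platform" --force
  "$DATAHUB_CLI" ingest -c "$recipe"
}
```

Entities that the new run emits again should reappear; anything the source no longer produces stays soft-deleted.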