How to Delete a Specific Aspect from an Entity in DataHub

Original Slack Thread

Is there a way to delete a specific Aspect from an Entity?



DataHub currently offers only limited support for deleting a single aspect from an entity. In summary:

  1. Timeseries Aspects: You can delete timeseries aspects using the CLI. This involves specifying the aspect name and the time range for the data you want to delete.
  2. Non-Timeseries Aspects: Deleting non-timeseries aspects directly is not supported yet. The recommended approach is to delete the entire entity or use a rollback if the aspect was created by a single ingestion run.

Detailed Steps and Reasoning

  1. Deleting Timeseries Aspects:
    According to the <https://datahubproject.io/docs/0.13.1/how/delete-metadata|DataHub CLI documentation>, you can delete a range of timeseries aspect data for an entity without deleting the entire entity. Here’s an example command:

    datahub delete --urn "<my urn>" --aspect <aspect name> --start-time '-30 days' --end-time '-7 days'
    

    This command deletes the specified timeseries aspect data within the given time range.

  2. Deleting Non-Timeseries Aspects:
    As noted in an earlier Slack discussion, the backend does not yet support deleting non-timeseries aspects directly. The recommended workaround is to delete the entire entity or, if the aspect was created by a single ingestion run, to roll back that run.

    The relevant code snippet from the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/cli/delete_cli.py|delete_cli.py> file confirms this limitation:

    elif aspect_name:
        # Hard delete of non-timeseries aspect.
        # TODO: The backend doesn't support this yet.
        raise NotImplementedError(
            "Delete by aspect is not supported yet for non-timeseries aspects. Please delete the full entity or use rollback instead."
        )
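
For the timeseries case in step 1, the command can also be assembled from a script. The following is a minimal sketch, not an official DataHub helper: the function names, the example URN, and the `datasetProfile` aspect name are illustrative assumptions, and the `--dry-run` flag is assumed to be available in your installed CLI version (check `datahub delete --help`).

```python
import shlex
import subprocess
from typing import List


def build_timeseries_delete_cmd(
    urn: str,
    aspect: str,
    start_time: str,
    end_time: str,
    dry_run: bool = True,
) -> List[str]:
    """Build the argv for deleting a time range of one timeseries aspect."""
    cmd = [
        "datahub", "delete",
        "--urn", urn,
        "--aspect", aspect,
        "--start-time", start_time,
        "--end-time", end_time,
    ]
    if dry_run:
        # Assumption: the installed CLI accepts --dry-run on `datahub delete`.
        cmd.append("--dry-run")
    return cmd


def run(cmd: List[str]) -> None:
    """Echo the command, then execute it; raises if the CLI exits non-zero."""
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Calling `run(build_timeseries_delete_cmd("<my urn>", "datasetProfile", "-30 days", "-7 days"))` would first preview the deletion; passing `dry_run=False` builds the same command as the one shown in step 1.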
    

Example CLI Commands

  • Soft Delete an Entity:

    datahub delete --urn "<urn>" --soft
    
  • Hard Delete an Entity:

    datahub delete --urn "<urn>" --hard
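
For the rollback workaround mentioned above, the usual sequence is to find the ingestion run, inspect it, and then roll it back. The sketch below only builds the three CLI calls; the function name and the run ID are hypothetical placeholders. Note that a rollback applies to the whole run rather than to one aspect, so it is only a reasonable substitute when the unwanted aspect came from a run you are willing to undo entirely.

```python
from typing import List


def rollback_workflow_cmds(run_id: str) -> List[List[str]]:
    """Return the CLI calls to inspect and then roll back one ingestion run."""
    return [
        # 1. List recent ingestion runs to find the one that wrote the aspect.
        ["datahub", "ingest", "list-runs"],
        # 2. Show what that run touched before removing anything.
        ["datahub", "ingest", "show", "--run-id", run_id],
        # 3. Roll back everything the run ingested.
        ["datahub", "ingest", "rollback", "--run-id", run_id],
    ]
```

Each inner list can be passed to `subprocess.run(cmd, check=True)`; it is worth reviewing the output of the `show` step before running the rollback.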
    



Is there any reason deleting non-timeseries Aspects hasn’t been implemented? Or is it just a matter of developer bandwidth?

There might be no support using the CLI to (hard) delete a single aspect of an entity, but using the REST API it should be possible… [attachment: image.png]

You can check out the REST API of the demo using this link: https://demo.datahubproject.io/openapi/swagger-ui/index.html (you might have to access the normal demo site first before accessing the API documentation: https://demo.datahubproject.io/)
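
As a rough illustration of the REST route, the sketch below builds (but does not send) a `DELETE` request for one aspect. The path layout `/openapi/v2/entity/{entityType}/{urn}/{aspect}` is an assumption and may differ between DataHub versions, so confirm the exact endpoint in the Swagger UI of your own instance first. The function name, the `globalTags` aspect, and the token handling are illustrative only.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def build_delete_aspect_request(
    base_url: str,
    entity_type: str,
    urn: str,
    aspect: str,
    token: Optional[str] = None,
    api_prefix: str = "/openapi/v2/entity",  # assumed path; verify in Swagger UI
) -> Request:
    """Build a DELETE request targeting a single aspect of one entity."""
    # URNs contain ':', '(', ')' and ',', so percent-encode the whole URN.
    encoded_urn = quote(urn, safe="")
    url = f"{base_url.rstrip('/')}{api_prefix}/{entity_type}/{encoded_urn}/{aspect}"
    headers = {"Accept": "application/json"}
    if token:
        # Personal access token, if the instance has authentication enabled.
        headers["Authorization"] = f"Bearer {token}"
    return Request(url, method="DELETE", headers=headers)
```

Once the endpoint is confirmed, the request could be sent with `urllib.request.urlopen(req)`. Since this is a hard delete, trying it against a test entity first is advisable.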