Hello, how to clean up DataHub from dropped tables <@U06TM7M40RK>
To clean up DataHub from dropped tables, you can use stateful ingestion to automatically handle the removal of stale entities. Here are the steps you can follow:
- Enable Stateful Ingestion: Configure your ingestion recipe to enable stateful ingestion. This will allow DataHub to automatically soft-delete tables and views that were present in a previous run but are missing in the current run.
- Configuration Example: Below is a sample configuration for enabling stateful ingestion in a Snowflake source. You can adapt this configuration to your specific data source.
source:
  type: "snowflake"
  config:
    username: <user_name>
    password: <password>
    host_port: <host_port>
    warehouse: <ware_house>
    role: <role>
    include_tables: True
    include_views: True
    # Rest of the source specific params ...

    ## Stateful Ingestion config ##
    stateful_ingestion:
      enabled: True # False by default
      remove_stale_metadata: True # default value
      ## Default state_provider configuration ##
      # state_provider:
      #   type: "datahub" # default value
      #   This section is needed if the pipeline-level `datahub_api` is not configured.
      #   config: # default value
      #     datahub_api:
      #       server: "http://localhost:8080"

# The pipeline_name is mandatory for stateful ingestion and the state is tied to this.
# If this is changed after using with stateful ingestion, the previous state will not be available to the next run.
pipeline_name: "my_snowflake_pipeline_1"

# Pipeline-level datahub_api configuration.
datahub_api: # Optional. But if provided, this config will be used by the "datahub" ingestion state provider.
  server: "http://localhost:8080"

sink:
  type: "datahub-rest"
  config:
    server: 'http://localhost:8080'
- Run the Ingestion: Execute the ingestion job with the above configuration. This will ensure that any tables or views that were present in the last successful run but are missing in the current run will be soft-deleted.
- Manual Cleanup (if needed): If you need to manually delete metadata, you can use the DataHub CLI. For example, to soft delete a specific entity, you can use the following command:
datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:snowflake,my_table,PROD)" --soft
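To picture what the steps above amount to, here is a minimal Python sketch, not DataHub's actual implementation. It treats stale detection as a set difference between the URNs seen in the previous run and those seen in the current run, then formats one soft-delete command per stale URN in the same shape as the command above. The helper names (`dataset_urn`, `find_stale_urns`, `soft_delete_commands`) and the table names are hypothetical, and nothing is executed against DataHub.

```python
import shlex
from typing import Iterable, List, Set


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN in the same shape as the example command above."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def find_stale_urns(previous_run: Iterable[str], current_run: Iterable[str]) -> Set[str]:
    """Return URNs seen in the previous run that are absent from the current run."""
    return set(previous_run) - set(current_run)


def soft_delete_commands(urns: Iterable[str]) -> List[List[str]]:
    """Build one `datahub delete --urn ... --soft` argv per URN (nothing is run)."""
    return [["datahub", "delete", "--urn", urn, "--soft"] for urn in sorted(urns)]


if __name__ == "__main__":
    # Hypothetical example data: "my_table" was dropped between the two runs.
    previous = {
        dataset_urn("snowflake", "my_table"),
        dataset_urn("snowflake", "other_table"),
    }
    current = {dataset_urn("snowflake", "other_table")}

    for argv in soft_delete_commands(find_stale_urns(previous, current)):
        # shlex.join quotes the URN so the printed line is safe to paste into a shell.
        print(shlex.join(argv))
```

With stateful ingestion enabled this bookkeeping is handled for you; the sketch is only meant to show which entities end up soft-deleted and how the manual command relates to them.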
For more detailed information, you can refer to the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/dev_guides/stateful.md|stateful ingestion documentation> and the <https://github.com/datahub-project/datahub/blob/master/docs/how/delete-metadata.md|metadata deletion guide>.
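Since the recipe comments note that `pipeline_name` is mandatory for stateful ingestion and that `enabled` is False by default, a small pre-flight check can catch a recipe that would silently skip stale-entity removal. The sketch below is an assumption-laden illustration: it works on a recipe that has already been loaded into a plain dict, and the function name `stateful_ingestion_problems` is hypothetical, not part of DataHub.

```python
from typing import Any, Dict, List


def stateful_ingestion_problems(recipe: Dict[str, Any]) -> List[str]:
    """Return human-readable problems that would prevent stale-entity removal."""
    problems: List[str] = []

    # The state is tied to pipeline_name, so it must be present and non-empty.
    if not recipe.get("pipeline_name"):
        problems.append("pipeline_name is missing; stateful ingestion state is tied to it")

    source_config = (recipe.get("source") or {}).get("config") or {}
    stateful = source_config.get("stateful_ingestion") or {}

    # Stateful ingestion is off unless explicitly enabled.
    if not stateful.get("enabled", False):
        problems.append("source.config.stateful_ingestion.enabled is not True (False by default)")

    # remove_stale_metadata defaults to True; only an explicit False disables it.
    if stateful.get("remove_stale_metadata", True) is False:
        problems.append("remove_stale_metadata is False, so dropped tables are not soft-deleted")

    return problems


if __name__ == "__main__":
    # Hypothetical recipe mirroring the Snowflake example, already parsed into a dict.
    recipe = {
        "pipeline_name": "my_snowflake_pipeline_1",
        "source": {
            "type": "snowflake",
            "config": {"stateful_ingestion": {"enabled": True}},
        },
    }
    for problem in stateful_ingestion_problems(recipe) or ["recipe looks ready for stateful ingestion"]:
        print(problem)
```

Running a check like this before each scheduled ingestion is one way to notice if someone renames the pipeline or disables stateful ingestion by accident.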
Would you like more details on any specific part of this process?
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Are you using UI or CLI for ingestion?
- Which DataHub version are you using? (e.g. 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions and don’t forget to give its responses a thumbs-up or thumbs-down so other folks in the Community know if you’re still looking for help on the topic!
Sure, I’d be happy to help! To assist you more effectively, could you please provide the following information:
- Are you using the UI or CLI for ingestion?
- Which DataHub version are you using? (e.g., 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g., BigQuery, Snowflake, PowerBI, etc.)
This information will help me provide you with the most relevant guidance and resources.
Sources:
- https://datahubproject.io/docs/0.13.1/quick-ingestion-guides/snowflake/configuration
- https://datahubproject.io/docs/0.13.1/quick-ingestion-guides/powerbi/configuration
- <https://github.com/datahub-project/datahub/blob/master/docs/how/delete-metadata.md|Metadata Deletion Guide>
how to automate this process <@U06TM7M40RK>
Sure thing! I’m looking through the available data to answer your question. Hold on a second…