Can we edit the tags that are assigned during ingestion (through the recipe) in DataHub? If not, is it possible to make them editable after they have been assigned through ingestion?
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Are you using UI or CLI for ingestion?
- Which DataHub version are you using? (e.g. 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
Please continue to tag RunLLM (e.g. <@U06TM7M40RK>) to have it continue generating responses! We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66|> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions, and don’t forget to give its responses a thumbs up or thumbs down so other folks in the Community know if you’re still looking for help on the topic and to help us improve the assistant!
To provide you with the most accurate and helpful response, could you please share the following details?
- Are you using the UI or CLI for ingestion?
- Which DataHub version are you using? (e.g., 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g., BigQuery)
This information will help tailor the response to your specific setup.
Sources:
- https://github.com/datahub-project/datahub/blob/master/docs/features/feature-guides/properties.md
- https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/transformer/dataset_transformer.md
<@U06TM7M40RK> We are using the UI for ingestion and adding the tags through the transformers section of the recipe, for any source type.
Also, we are using version 0.13.3.
Thank you for the details. Based on your setup, here is how you can manage tags assigned during ingestion and make them editable afterward:
Editing Tags Assigned During Ingestion
By default, tags assigned during ingestion can overwrite existing tags. However, you can configure your ingestion recipe so that tags added through the UI are not overwritten by subsequent ingestions. This can be achieved using the simple_add_dataset_tags transformer with the semantics: PATCH configuration.
Configuration Example
Here’s how you can configure your ingestion recipe to add tags while preserving existing tags:
transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:NeedsDocumentation"
        - "urn:li:tag:Legacy"
      semantics: PATCH  # This ensures existing tags are preserved
Steps to Implement
- Modify Your Ingestion Recipe: Update your ingestion recipe to include the simple_add_dataset_tags transformer with the semantics: PATCH configuration.
- Run the Ingestion: Execute the ingestion process through the UI as usual.
- Edit Tags in the UI: After ingestion, you can manually add or edit tags through the DataHub UI. The PATCH semantics ensure that these manual changes are not overwritten by subsequent ingestions.
Example Recipe
Here’s a complete example of an ingestion recipe that pulls metadata from a source and applies tags while preserving existing ones:
source:
  type: your_source_type
  config:
    # Your source configuration here
transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:NeedsDocumentation"
        - "urn:li:tag:Legacy"
      semantics: PATCH  # Preserve existing tags
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
Additional Information
- Behavior of UPSERT: According to <Slack discussion>, the default behavior of ingestion is to overwrite existing tags. Using the PATCH semantics in your transformer configuration will prevent this.
- Handling Multiple Ingestions: If you have multiple ingestion jobs, ensure that all of them use the PATCH semantics to maintain consistency across your DataHub instance.
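To make the difference between the two behaviors concrete, here is a minimal Python sketch of a toy model. It is not DataHub code: the function apply_tag_transformer, the mode names as string arguments, and the urn:li:tag:PII tag are all hypothetical, and the real transformer may differ in details such as ordering or de-duplication.

```python
def apply_tag_transformer(server_tags, tag_urns, semantics="OVERWRITE"):
    """Toy model: return the tags an entity ends up with after one ingestion run."""
    if semantics == "PATCH":
        # Keep whatever is already on the server, then append recipe tags that are missing
        merged = list(server_tags)
        for urn in tag_urns:
            if urn not in merged:
                merged.append(urn)
        return merged
    # Overwrite behavior: only the tags listed in the recipe survive
    return list(tag_urns)


ui_tags = ["urn:li:tag:PII"]  # hypothetical tag added by hand in the UI
recipe_tags = ["urn:li:tag:NeedsDocumentation", "urn:li:tag:Legacy"]

overwrite_result = apply_tag_transformer(ui_tags, recipe_tags, "OVERWRITE")
patch_result = apply_tag_transformer(ui_tags, recipe_tags, "PATCH")

print(overwrite_result)  # the UI tag is gone
print(patch_result)      # the UI tag is kept alongside the recipe tags
```

In this model the hand-added tag only survives the run when the transformer merges instead of replacing, which is the reason for setting semantics: PATCH in every recipe that touches the same entities.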
Would you like more detailed instructions or have any other specific questions?
Sources:
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/transformer/dataset_transformer.md|DataHub Transformer Documentation>
- <Slack Discussion on Tag Overwriting>
<@U06TM7M40RK> We are not able to delete tags on a column or dataset from the UI when they were assigned through transformers during ingestion. How can we make these tags editable (for example, removable) from the UI?
To make tags assigned through transformers in the ingestion editable from the UI (including the ability to remove them), you need to ensure that UI changes are not overwritten by subsequent ingestions. This can be achieved by using the simple_add_dataset_tags transformer with the semantics: PATCH configuration.
Configuration Example
Here’s how you can configure your ingestion recipe to add tags while preserving existing tags:
transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:NeedsDocumentation"
        - "urn:li:tag:Legacy"
      semantics: PATCH  # This ensures existing tags are preserved
Steps to Implement
- Modify Your Ingestion Recipe: Update your ingestion recipe to include the simple_add_dataset_tags transformer with the semantics: PATCH configuration.
- Run the Ingestion: Execute the ingestion process through the UI as usual.
- Edit Tags in the UI: After ingestion, you can manually add, edit, or remove tags through the DataHub UI. The PATCH semantics ensure that these manual changes are not overwritten by subsequent ingestions.
Example Recipe
Here’s a complete example of an ingestion recipe that pulls metadata from a source and applies tags while preserving existing ones:
source:
  type: your_source_type
  config:
    # Your source configuration here
transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:NeedsDocumentation"
        - "urn:li:tag:Legacy"
      semantics: PATCH  # Preserve existing tags
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
Additional Information
- Behavior of UPSERT: According to <Slack discussion>, the default behavior of ingestion is to overwrite existing tags. Using the PATCH semantics in your transformer configuration will prevent this.
- Handling Multiple Ingestions: If you have multiple ingestion jobs, ensure that all of them use the PATCH semantics to maintain consistency across your DataHub instance.
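One caveat worth checking on your own instance: if PATCH works as an additive merge (keep what is on the server, add what the recipe lists), then a tag removed in the UI would come back on the next run for as long as it is still listed under tag_urns. The Python sketch below is only a toy model of that assumption, not DataHub code; run_ingestion is a hypothetical helper, and the actual behavior in 0.13.3 should be verified before relying on it.

```python
def run_ingestion(server_tags, tag_urns):
    """Toy PATCH-style merge: keep server tags, append recipe tags that are missing."""
    return list(server_tags) + [urn for urn in tag_urns if urn not in server_tags]


recipe_tags = ["urn:li:tag:NeedsDocumentation", "urn:li:tag:Legacy"]

# First ingestion run on a dataset that has no tags yet
tags = run_ingestion([], recipe_tags)

# Someone removes the Legacy tag by hand in the UI
tags_after_ui_edit = [t for t in tags if t != "urn:li:tag:Legacy"]

# Next run with the unchanged recipe: the merge adds Legacy back
tags_after_rerun = run_ingestion(tags_after_ui_edit, recipe_tags)
print(tags_after_rerun)

# Same run after dropping Legacy from tag_urns in the recipe: the removal sticks
tags_after_recipe_change = run_ingestion(
    tags_after_ui_edit, ["urn:li:tag:NeedsDocumentation"]
)
print(tags_after_recipe_change)
```

Under this model, a removal made in the UI only stays in place if the same tag is also taken out of the recipe's tag_urns list.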
Would you like more detailed instructions or have any other specific questions?
Sources:
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/transformer/dataset_transformer.md|DataHub Transformer Documentation>
- <Slack Discussion on Tag Overwriting>