Hello, when running the ingestion recipe in , none of the ingested items are tagged with the domain specified in the Transformer. Could anybody help me troubleshoot? Thank you.
Hey there!
<@U06TM7M40RK> Can you help with the questions in this thread?
Please continue to tag RunLLM (e.g., <@U06TM7M40RK>) to have it continue generating responses!
To troubleshoot the issue of your ingested items not being tagged with the specified domain in the transformer, let’s go through some potential steps and configurations based on the provided data sources.
Steps to Troubleshoot and Resolve
- Verify Ingestion Recipe Configuration:
Ensure that your ingestion recipe is correctly configured. Here is an example configuration for using the `pattern_add_dataset_domain` transformer:
```
pipeline_name: "my_dbt_pipeline"  # This is mandatory when stateful ingestion is enabled
source:
  type: dbt
  config:
    manifest_path: "<path_to_your_manifest.json>"
    catalog_path: "<path_to_your_catalog.json>"
    sources_path: "<path_to_your_sources.json>"
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true  # This will help in removing metadata of deleted models
transformers:
  - type: "pattern_add_dataset_domain"
    config:
      semantics: OVERWRITE
      domain_pattern:
        rules:
          'urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.n.*': ["hr"]
          'urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.t.*': ["urn:li:domain:finance"]
sink:
  type: "datahub-rest"
  config:
    server: 'http://localhost:8080'
```
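To see which datasets a `domain_pattern` rule would touch, the regexes can be tried against sample URNs locally before running the ingestion. This is a minimal sketch, assuming the rules behave as regular expressions matched from the start of the dataset URN (DataHub's own matching may differ in details). The helper `domains_for` and the sample URNs are made up for illustration.

```python
import re

# Rules copied from the example recipe above: regex -> list of domains.
RULES = {
    r"urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.n.*": ["hr"],
    r"urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.t.*": ["urn:li:domain:finance"],
}


def domains_for(urn: str, rules: dict[str, list[str]] = RULES) -> list[str]:
    """Return the domains of every rule whose regex matches the start of the URN.

    Hypothetical helper for local experiments; it does not call DataHub.
    """
    matched: list[str] = []
    for pattern, domains in rules.items():
        if re.match(pattern, urn):
            matched.extend(domains)
    return matched


# Hypothetical sample URNs: one on the platform the rules target, one on another platform.
postgres_urn = "urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.names,PROD)"
looker_urn = "urn:li:dataset:(urn:li:dataPlatform:looker,sales.explore.orders,PROD)"

print(domains_for(postgres_urn))
print(domains_for(looker_urn))
```

If a URN from your source returns an empty list here, none of the pattern rules would assign it a domain, which is one possible reason for items showing up untagged.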
- Check for Entity Deletion:
If the entities were previously deleted, they might still be marked as removed (soft-deleted) in DataHub. You can undelete these entities using the DataHub Python library. Here is an example function to undelete an entity:
```
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import StatusClass


def undelete_entity(entity_urn: str):
    emitter = DatahubRestEmitter(gms_server='http://localhost:8080')
    # Setting removed=False on the status aspect clears the soft-delete flag
    mcp_wrapper = MetadataChangeProposalWrapper(
        entityUrn=entity_urn,
        aspect=StatusClass(removed=False),
        changeType="UPSERT",
    )
    emitter.emit(mcp_wrapper)
    print(f"Entity {entity_urn} has been unmarked for deletion.")


# Example usage
undelete_entity("urn:li:dataset:(urn:li:dataPlatform:dbt,example_dataset,PROD)")
```
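Before emitting a status change, it can help to confirm that the URN points at the entity you expect, since a URN that differs in platform, name, or environment refers to a different entity. This is a minimal sketch that splits a dataset URN into its parts using only the standard library. `split_dataset_urn` is a hypothetical helper, not part of the DataHub library, and it assumes the usual `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` layout.

```python
import re

# Assumed layout: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:(?P<platform>[^,]+),"
    r"(?P<name>.+),(?P<env>[^,)]+)\)$"
)


def split_dataset_urn(urn: str) -> dict[str, str]:
    """Split a dataset URN into platform, name, and env (hypothetical helper)."""
    match = DATASET_URN.match(urn)
    if match is None:
        raise ValueError(f"Not a dataset URN: {urn!r}")
    return match.groupdict()


parts = split_dataset_urn("urn:li:dataset:(urn:li:dataPlatform:dbt,example_dataset,PROD)")
print(parts["platform"], parts["name"], parts["env"])
```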
- Run DataHub CLI with Debug Logs:
Running the DataHub CLI with debug logs enabled can provide more insights into what might be going wrong. Set the `DATAHUB_DEBUG` environment variable to `true` and run the ingestion command.
```
export DATAHUB_DEBUG=true
datahub ingest -c your_ingestion_recipe.yml
```
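If you prefer to capture the debug output in a file for sharing in the thread, the same command can be launched from Python. This is a minimal sketch, assuming the `datahub` CLI is on your `PATH`. The function names and the default log file name `ingest_debug.log` are made up for illustration.

```python
import os
import subprocess


def build_debug_run(recipe_path: str) -> tuple[list[str], dict[str, str]]:
    """Return the CLI command and an environment with DATAHUB_DEBUG enabled."""
    env = dict(os.environ)  # copy, so the current process environment is unchanged
    env["DATAHUB_DEBUG"] = "true"
    command = ["datahub", "ingest", "-c", recipe_path]
    return command, env


def run_with_debug(recipe_path: str, log_path: str = "ingest_debug.log") -> int:
    """Run the ingestion, write stdout and stderr to log_path, return the exit code."""
    command, env = build_debug_run(recipe_path)
    with open(log_path, "w") as log_file:
        result = subprocess.run(
            command, env=env, stdout=log_file, stderr=subprocess.STDOUT
        )
    return result.returncode
```

Calling `run_with_debug("your_ingestion_recipe.yml")` would then leave the full debug log in `ingest_debug.log`, where you can search for the transformer name to check whether it was applied.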
```
source:
  type: "looker"
  config:
    # Coordinates
    base_url: "https://looker-api.<domain>"
    # Credentials
    client_id: "<redacted>"
    client_secret: "<redacted>"
    external_base_url: "https://looker.<domain>"
    skip_personal_folders: true
transformers:
  - type: "simple_add_dataset_domain"
    config:
      semantics: OVERWRITE
      domains:
        - "MyDomain"
```
- Inspect DBT Artifacts:
Ensure that the `manifest.json`, `catalog.json`, and `run_results.json` files are correctly generated and contain the expected metadata. You can manually inspect these files to ensure they include the necessary information about your models.
- Check for Version Compatibility:
Ensure that you are using compatible versions of DataHub and dbt. Sometimes, changes in either tool can cause issues with metadata ingestion.
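For the artifact inspection step, a quick script can be easier than reading the JSON by hand. This is a minimal sketch, assuming `manifest.json` has a top-level `nodes` mapping whose entries carry a `resource_type` field. `summarize_manifest` is a hypothetical helper name.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_manifest(manifest_path: str) -> Counter:
    """Count the entries under "nodes" in a dbt manifest.json by resource_type."""
    manifest = json.loads(Path(manifest_path).read_text())
    nodes = manifest.get("nodes", {})
    # Entries without a resource_type are grouped under "unknown".
    return Counter(node.get("resource_type", "unknown") for node in nodes.values())
```

Running `summarize_manifest("<path_to_your_manifest.json>")` and finding no `model` entries would suggest the artifact was not generated from the project you expected.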
Additional Resources
- <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dbt|DataHub DBT Ingestion Config Details>
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/transformer/dataset_transformer.md|Pattern Add Dataset Domain Transformer Documentation>
Would you like more detailed steps on any of these points or further assistance on a specific part of the process?
Sources:
- https://datahubspace.slack.com/archives/CUMUWQU66/p1725306020.518589
- https://datahubspace.slack.com/archives/C029A3M079U/p1723214252.167059
<@U06ERQ9DH45> please have a look at this when you get a minute - thank you!
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><