Hi team. Does anyone know how to ingest data domains (data_domain) with dbt + DataHub? Is this the correct approach?
dbt + data domain + recipe
Hey there!
<@U06TM7M40RK> Can you help with the questions in this thread?
Please continue to tag RunLLM (e.g., <@U06TM7M40RK>) to have it continue generating responses!
To ingest data domains using DataHub with dbt, you need to configure your ingestion recipe correctly. The dbt meta automated mappings feature does not support direct domain mappings, but you can achieve this using transformers in your ingestion recipe.
Here are the steps to configure your ingestion recipe to include domain mappings:
- Ingestion Recipe Configuration:
You need to use the pattern_add_dataset_domain transformer to map datasets to domains based on patterns in their URNs. Below is an example configuration:
source:
  type: dbt
  config:
    manifest_path: '${DBT_PROJECT_ROOT}/target/manifest.json'
    catalog_path: '${DBT_PROJECT_ROOT}/target/catalog.json'
    sources_path: '${DBT_PROJECT_ROOT}/target/sources.json'
    test_results_path: '${DBT_PROJECT_ROOT}/target/run_results.json'
    env: DEV
    target_platform: postgres
transformers:
  - type: "pattern_add_dataset_domain"
    config:
      semantics: OVERWRITE
      domain_pattern:
        rules:
          'urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.n.*': ["hr"]
          'urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.t.*': ["urn:li:domain:finance"]
- Verify Domain Provisioning:
Ensure that the domains you are trying to use are provisioned on DataHub. You can verify this by searching for the domain in the DataHub UI or using the GraphQL API to fetch the domain details.
- Run the Ingestion:
Execute the ingestion using the DataHub CLI with your configured recipe.
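Before running the ingestion, the two rule patterns in the recipe above can be sanity-checked locally. The following is a minimal sketch using only the Python standard library; it assumes the transformer matches each rule key against the dataset URN from the start of the string (as `re.match` does), which should be confirmed against the DataHub transformer documentation. The sample URNs and the `domains_for` helper are hypothetical and exist only for illustration.

```python
import re
from typing import Dict, List

# Rules copied from the recipe: keys are regexes over dataset URNs,
# values are the domains to attach when the regex matches.
RULES: Dict[str, List[str]] = {
    r"urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.n.*": ["hr"],
    r"urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.t.*": [
        "urn:li:domain:finance"
    ],
}


def domains_for(urn: str) -> List[str]:
    """Collect the domains of every rule whose regex matches the start of the URN."""
    matched: List[str] = []
    for pattern, domains in RULES.items():
        if re.match(pattern, urn):
            matched.extend(domains)
    return matched


# Hypothetical dataset URNs, not taken from any real deployment.
SAMPLE_URNS = [
    "urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.names,DEV)",
    "urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.transactions,DEV)",
    "urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.orders,DEV)",
]

for urn in SAMPLE_URNS:
    print(urn, "->", domains_for(urn))
```

A URN that returns an empty list here would most likely be left without a domain by these rules, so this is a quick way to spot a pattern that is too narrow before ingesting.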
Would you like more detailed instructions or have any other specific questions?
Sources:
- Dataset | DataHub transformer documentation
<@U06TM7M40RK> here
To ingest data domains using DataHub with dbt, you need to configure your ingestion recipe correctly. The dbt meta automated mappings feature does not support direct domain mappings, but you can achieve this using transformers in your ingestion recipe.
Here are the steps to configure your ingestion recipe to include domain mappings:
- Ingestion Recipe Configuration:
You need to use the pattern_add_dataset_domain transformer to map datasets to domains based on patterns in their URNs. Below is an example configuration:
pipeline_name: "my_dbt_pipeline" # This is mandatory when stateful ingestion is enabled
source:
  type: dbt
  config:
    manifest_path: "${DBT_PROJECT_ROOT}/target/manifest.json"
    catalog_path: "${DBT_PROJECT_ROOT}/target/catalog.json"
    sources_path: "${DBT_PROJECT_ROOT}/target/sources.json"
    test_results_path: "${DBT_PROJECT_ROOT}/target/run_results.json"
    env: DEV
    target_platform: postgres
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true # This will help in removing metadata of deleted models
transformers:
  - type: "pattern_add_dataset_domain"
    config:
      semantics: OVERWRITE
      domain_pattern:
        rules:
          'urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.n.*': ["hr"]
          'urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.t.*': ["urn:li:domain:finance"]
sink:
  type: "datahub-rest"
  config:
    server: 'http://localhost:8080'
- Verify Domain Provisioning:
Ensure that the domains you are trying to use are provisioned on DataHub. You can verify this by searching for the domain in the DataHub UI or using the GraphQL API to fetch the domain details.
- Run the Ingestion:
Execute the ingestion using the DataHub CLI with your configured recipe.
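For the domain verification step, a GraphQL lookup could be sketched as below. This is a rough illustration, not a confirmed recipe: the endpoint URL (`/api/graphql` on the same server as the sink), the shape of the `domain` query, and the optional bearer token are assumptions to check against your DataHub version, and the helper names `build_domain_request` and `domain_exists` are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the GraphQL endpoint is exposed on the same server as the sink above.
GRAPHQL_URL = "http://localhost:8080/api/graphql"

# Assumption: the schema exposes a `domain(urn:)` query with `properties.name`.
DOMAIN_QUERY = """
query getDomain($urn: String!) {
  domain(urn: $urn) {
    urn
    properties { name }
  }
}
"""


def build_domain_request(domain_urn: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request that looks up one domain by URN."""
    payload = json.dumps(
        {"query": DOMAIN_QUERY, "variables": {"urn": domain_urn}}
    ).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Only needed when the server enforces token authentication.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(GRAPHQL_URL, data=payload, headers=headers, method="POST")


def domain_exists(domain_urn: str, token: Optional[str] = None) -> bool:
    """Send the lookup and report whether the server returned a domain object."""
    with urllib.request.urlopen(build_domain_request(domain_urn, token)) as resp:
        body = json.load(resp)
    # `data` may be null when the server reports errors, so guard before reading it.
    return (body.get("data") or {}).get("domain") is not None
```

Calling `domain_exists("urn:li:domain:finance")` against a running instance would then indicate whether the domain referenced in the recipe has been created; if it has not, create it in the DataHub UI before ingesting.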
Would you like more detailed instructions or have any other specific questions?
Sources:
- Dataset | DataHub transformer documentation
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><