<@U06TM7M40RK> what takes precedence in the dbt recipe? is it always the meta_mapping configuration, or can it also be the transformers configuration?
It seems likely that in a dbt ingestion recipe, both the `meta_mapping` and `transformers` configurations can be used, but they serve different purposes and may not directly conflict with each other. According to the <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/dbt#dbt-meta-automated-mappings|dbt meta automated mappings documentation>, `meta_mapping` is used to define actions such as adding tags, terms, or owners based on the meta properties defined in dbt models. Transformers, on the other hand, are used to modify data before it reaches the ingestion sinks, allowing for more complex transformations and custom logic.
The precedence between `meta_mapping` and `transformers` is not explicitly mentioned in the provided data sources. However, it is likely that they operate independently, with `meta_mapping` handling specific meta property mappings and transformers applying additional transformations as needed.
If you have specific scenarios where you think there might be a conflict or need clarification on how they interact, please let me know, and I can provide more detailed guidance.
Sources:
- <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/dbt#dbt-meta-automated-mappings|dbt meta automated mappings documentation>
Hint: Mention @runllm in the thread for followups.
<@U06TM7M40RK> I want to know the order of execution once the ingestion starts
The DataHub documentation I have ingested does not mention the specific order of execution once the ingestion starts. You might want to provide more details about the type of ingestion (e.g., UI, CLI, or SDK-based) and the specific data source you are using, as this could affect the execution order. If this doesn’t help, please reach out to the DataHub support team for further assistance.
```
source:
  type: dbt
  config:
    meta_mapping:
      data_domain:
        match: ".*"
        operation: "add_tag"
        config:
          tag: "{{ $match }}"
transformers:
  - type: "domain_mapping_based_on_tags"
    config:
      domain_mapping:
        'behaviour': "urn:li:domain:behaviour"
        'marketing': "urn:li:domain:marketing"
        'sales': "urn:li:domain:sales"
        'subscription': "urn:li:domain:subscription"
        'tv': "urn:li:domain:tv"
        'video': "urn:li:domain:video"
pipeline_name: dbt_snowflake_pipeline_ingestion
sink:
  type: datahub-rest
  config:
    server: ""
```
does the meta mapping always take precedence? <@U06TM7M40RK>
The DataHub documentation I have ingested does not mention how the execution order or precedence between `meta_mapping` and `transformers` is determined in a dbt ingestion pipeline. You might want to consider specifying the expected behavior or any specific conditions under which you are observing precedence issues. If this doesn’t help, please reach out to the DataHub support team for further assistance.
<@U06TM7M40RK> I configured `data_domain` under `meta:`. To map it to a domain in DataHub, we first need to create a tag, which we can then map to a domain.
To map a `data_domain` defined under `meta:` in dbt to a domain in DataHub, you can follow a two-step process:
- Create a Tag from the Meta Property: Use the `meta_mapping` configuration to create a tag based on the `data_domain` meta property. This involves matching the `data_domain` value and creating a tag with the same name.
- Map the Tag to a Domain: Use the `domain_mapping_based_on_tags` transformer to map the created tag to a specific domain in DataHub.
Here’s how you can configure this in your ingestion recipe:
- Meta Mapping Configuration: This configuration will create a tag from the `data_domain` meta property.
```
source:
  type: dbt
  config:
    meta_mapping:
      data_domain:
        match: ".*"
        operation: "add_tag"
        config:
          tag: "{{ $match }}"
```
- Transformer Configuration: This configuration will map the created tag to a domain.
```
transformers:
  - type: "domain_mapping_based_on_tags"
    config:
      domain_mapping:
        'behaviour': "urn:li:domain:behaviour"
        'marketing': "urn:li:domain:marketing"
        'sales': "urn:li:domain:sales"
        'subscription': "urn:li:domain:subscription"
        'tv': "urn:li:domain:tv"
        'video': "urn:li:domain:video"
```
This setup ensures that the `data_domain` meta property is first converted into a tag, and then the tag is used to map the dataset to the appropriate domain in DataHub.
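For context, the dbt side of this setup might look roughly like the sketch below. The model name `customer_orders` and the value `marketing` are hypothetical, chosen only so the value lines up with one of the `domain_mapping` keys in the recipe. Depending on the dbt version, `meta` may instead need to sit under `config:`.

```
# models/schema.yml (hypothetical file and model name, for illustration only)
version: 2
models:
  - name: customer_orders
    meta:
      # Read by the meta_mapping rule keyed on "data_domain";
      # match ".*" accepts any value and "{{ $match }}" is replaced by it.
      data_domain: marketing
```

Since the `domain_mapping` keys have to line up with the tag that ends up on the dataset, it is worth checking in the DataHub UI what the resulting tag is actually called after a run.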
Would you like more detailed instructions or have any other specific questions?
Sources:
- <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/dbt#dbt-meta-automated-mappings|dbt meta automated mappings documentation>
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/transformer/dataset_transformer.md|domain_mapping_based_on_tags transformer documentation>
<@U06TM7M40RK> I just want to make sure that the tag is added first and the transformer logic is executed afterwards
The DataHub documentation I have ingested does not mention the specific execution order between adding tags via `meta_mapping` and executing transformer logic. You might want to provide more details about any observed behavior or specific configurations that could affect the order of operations. If this doesn’t help, please reach out to the DataHub support team for further assistance.