Precedence of `meta_mapping` and `transformers` in dbt ingestion pipeline configuration

Original Slack Thread

<@U06TM7M40RK> what takes precedence in the dbt recipe? is it always the meta_mapping configuration, or can it also be the transformers configuration?

It seems likely that in a dbt ingestion recipe, both meta_mapping and transformers configurations can be used, but they serve different purposes and may not directly conflict with each other. According to the <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/dbt#dbt-meta-automated-mappings|dbt meta automated mappings documentation>, meta_mapping is used to define actions such as adding tags, terms, or owners based on the meta properties defined in dbt models. On the other hand, transformers are used to modify data before it reaches the ingestion sinks, allowing for more complex transformations and custom logic.

The precedence between meta_mapping and transformers is not explicitly mentioned in the provided data sources. However, it is likely that they operate independently, with meta_mapping handling specific meta property mappings and transformers applying additional transformations as needed.

If you have specific scenarios where you think there might be a conflict or need clarification on how they interact, please let me know, and I can provide more detailed guidance.

<@U06TM7M40RK> I want to know the order of execution once the ingestion starts

The DataHub documentation I have ingested does not mention the specific order of execution once the ingestion starts. You might want to provide more details about the type of ingestion (e.g., UI, CLI, or SDK-based) and the specific data source you are using, as this could affect the execution order. If this doesn’t help, please reach out to the DataHub support team for further assistance.

```
source:
  type: dbt
  config:
    meta_mapping:
      data_domain:
        match: ".*"
        operation: "add_tag"
        config:
          tag: "{{ $match }}"

transformers:
  - type: "domain_mapping_based_on_tags"
    config:
      domain_mapping:
        'behaviour': "urn:li:domain:behaviour"
        'marketing': "urn:li:domain:marketing"
        'sales': "urn:li:domain:sales"
        'subscription': "urn:li:domain:subscription"
        'tv': "urn:li:domain:tv"
        'video': "urn:li:domain:video"

pipeline_name: dbt_snowflake_pipeline_ingestion

sink:
  type: datahub-rest
  config:
    server: ""
```
does the meta mapping always take precedence? <@U06TM7M40RK>

The DataHub documentation I have ingested does not mention how the execution order or precedence between meta_mapping and transformers is determined in a dbt ingestion pipeline. You might want to describe the behavior you expect, or the specific conditions under which you are observing precedence issues. If this doesn’t help, please reach out to the DataHub support team for further assistance.

<@U06TM7M40RK> I configured data_domain under meta: in dbt. To map it to a domain in DataHub, we first need to create a tag, which we can then map to a domain.

To map a data_domain defined under meta: in dbt to a domain in DataHub, you can follow a two-step process:

  1. Create a Tag from the Meta Property: Use the meta_mapping configuration to create a tag based on the data_domain meta property. This involves matching the data_domain value and creating a tag with the same name.

  2. Map the Tag to a Domain: Use the domain_mapping_based_on_tags transformer to map the created tag to a specific domain in DataHub.
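For reference, step 1 assumes the dbt model itself carries a data_domain value under meta. A minimal sketch of what that might look like on the dbt side; the file path and the model name `orders` are hypothetical and not taken from this thread:

```yaml
# models/schema.yml (hypothetical path)
version: 2

models:
  - name: orders            # hypothetical model name
    meta:
      data_domain: sales    # the value that meta_mapping matches with ".*"
```

With `match: ".*"` and `tag: "{{ $match }}"`, the matched value (`sales` in this sketch) is what becomes the tag name, so it should line up with one of the keys in domain_mapping.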

Here’s how you can configure this in your ingestion recipe:

  1. Meta Mapping Configuration:
    This configuration will create a tag from the data_domain meta property.

    source:
      type: dbt
      config:
        meta_mapping:
          data_domain:
            match: ".*"
            operation: "add_tag"
            config:
              tag: "{{ $match }}"
    
  2. Transformer Configuration:
    This configuration will map the created tag to a domain.

    transformers:
      - type: "domain_mapping_based_on_tags"
        config:
          domain_mapping:
            'behaviour': "urn:li:domain:behaviour"
            'marketing': "urn:li:domain:marketing"
            'sales': "urn:li:domain:sales"
            'subscription': "urn:li:domain:subscription"
            'tv': "urn:li:domain:tv"
            'video': "urn:li:domain:video"
    

This setup ensures that the data_domain meta property is first converted into a tag, and then the tag is used to map the dataset to the appropriate domain in DataHub.
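Put together, the two fragments above might sit in a single recipe like the sketch below. The manifest_path, catalog_path, and server values are placeholders I am assuming, and target_platform is guessed from the pipeline name; none of them come from this thread. In DataHub's general pipeline model, records flow from the source through the transformers to the sink, so tags created by meta_mapping are produced inside the dbt source step. Whether domain_mapping_based_on_tags picks up those tags within the same run is not confirmed here and is worth checking on a test run.

```yaml
source:
  type: dbt
  config:
    # Placeholder paths (assumptions); point these at your dbt artifacts.
    manifest_path: "./target/manifest.json"
    catalog_path: "./target/catalog.json"
    target_platform: snowflake   # assumed from the pipeline name
    # Step 1: turn the data_domain meta property into a tag.
    meta_mapping:
      data_domain:
        match: ".*"
        operation: "add_tag"
        config:
          tag: "{{ $match }}"

# Step 2: map tags to domains.
# Assumption to verify: the dbt source has a tag_prefix option (default "dbt:").
# If it applies to meta_mapping tags, the keys below must match the prefixed names.
transformers:
  - type: "domain_mapping_based_on_tags"
    config:
      domain_mapping:
        'behaviour': "urn:li:domain:behaviour"
        'marketing': "urn:li:domain:marketing"
        'sales': "urn:li:domain:sales"
        'subscription': "urn:li:domain:subscription"
        'tv': "urn:li:domain:tv"
        'video': "urn:li:domain:video"

pipeline_name: dbt_snowflake_pipeline_ingestion

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"   # placeholder (assumption)
```

One way to check the ordering in practice is to run the recipe against a test instance and look at a single dataset: confirm the tag is present first, then confirm the domain was assigned.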

Would you like more detailed instructions or have any other specific questions?

<@U06TM7M40RK> i just want to make sure that first the tag is added and then the transformer logic is executed

The DataHub documentation I have ingested does not mention the specific execution order between adding tags via meta_mapping and executing transformer logic. You might want to provide more details about any observed behavior or specific configurations that could affect the order of operations. If this doesn’t help, please reach out to the DataHub support team for further assistance.
