Troubleshooting Transformers Not Being Applied in DataHub CLI Ingestion

Original Slack Thread

Hi all - transformers are not being applied during ingestion via the CLI.
I run the following recipe using the DataHub 0.12.1 CLI, but I don’t see the transformers being sent as API calls in the debug logs (and I also don’t see them in the UI against the datasets). Any advice on what I’m missing?

```yaml
source:
    type: snowflake
    config:
        account_id: <redacted>
        include_table_lineage: false
        include_view_lineage: false
        convert_urns_to_lowercase: true
        include_usage_stats: false
        profiling:
            enabled: false
            profile_table_level_only: true
            turn_off_expensive_profiling_metrics: true
        warehouse: <redacted>
        username: <redacted>
        role: <redacted>
        password: '${SNOWFLAKE_DATAHUB}'

transformers:
    - type: "simple_add_dataset_ownership"
      config:
          owner_urns:
              - "urn:li:corpGroup:data-usa"
              - "urn:li:corpGroup:squad-team-usa-analytics"
          ownership_type: TECHNICAL_OWNER
```
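For reference, this transformer configuration asks for every dataset emitted by the Snowflake source to end up with both groups as technical owners. The Python sketch below is a toy model of that expected result, not DataHub's implementation of simple_add_dataset_ownership; the dict layout loosely follows the owners/owner/type fields of DataHub's Ownership aspect, and the dataset URN is a hypothetical example.

```python
# Toy model only: NOT DataHub's implementation of simple_add_dataset_ownership.
# It shows the shape of ownership data the transformer is expected to attach.

OWNER_URNS = [
    "urn:li:corpGroup:data-usa",
    "urn:li:corpGroup:squad-team-usa-analytics",
]
OWNERSHIP_TYPE = "TECHNICAL_OWNER"


def add_ownership(dataset_urns: list[str], owner_urns: list[str],
                  ownership_type: str) -> dict[str, dict]:
    """Map each dataset URN to an ownership-like dict.

    The layout ("owners" -> list of {"owner", "type"}) is a simplified
    stand-in for DataHub's Ownership aspect.
    """
    result: dict[str, dict] = {}
    for urn in dataset_urns:
        result[urn] = {
            "owners": [{"owner": o, "type": ownership_type} for o in owner_urns]
        }
    return result


if __name__ == "__main__":
    # Hypothetical dataset URN, for illustration only.
    datasets = ["urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.table,PROD)"]
    for urn, aspect in add_ownership(datasets, OWNER_URNS, OWNERSHIP_TYPE).items():
        print(urn)
        for owner in aspect["owners"]:
            print("  ", owner["owner"], owner["type"])
```

If the transformer is applied, each Snowflake dataset in the UI would be expected to list both groups as Technical Owners.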

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?


:information_source: We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions and don’t forget to give its responses a :thumbsup: or :thumbsdown: so other folks in the Community know if you’re still looking for help on the topic!

To address your issue with transformers not being applied during ingestion via CLI, let’s break down the potential causes and solutions based on the provided data sources.

Potential Issues and Solutions

  1. Recipe Structure:
    Ensure that the transformers block is defined at the top level of the recipe, at the same level as source (and sink, if you define one). It should not be nested under source or under source.config.

    source:
        type: snowflake
        config:
            account_id: <redacted>
            include_table_lineage: false
            include_view_lineage: false
            convert_urns_to_lowercase: true
            include_usage_stats: false
            profiling:
                enabled: false
                profile_table_level_only: true
                turn_off_expensive_profiling_metrics: true
            warehouse: <redacted>
            username: <redacted>
            role: <redacted>
            password: '${SNOWFLAKE_DATAHUB}'
    
    transformers:
        - type: "simple_add_dataset_ownership"
          config:
              owner_urns:
                  - "urn:li:corpGroup:data-usa"
                  - "urn:li:corpGroup:squad-team-usa-analytics"
              ownership_type: TECHNICAL_OWNER
    
  2. Custom Transformer Issues:
    If you are using a custom transformer (the recipe above uses only the built-in simple_add_dataset_ownership), ensure that it is correctly registered and importable. According to a GitHub issue in datahub-project/datahub, custom transformers need to be placed in the same directory as your recipe or otherwise properly referenced.

  3. Version Compatibility:
    Ensure that you are using a compatible version of DataHub. Some transformer issues have been fixed in later releases; for example, a GitHub issue in datahub-project/datahub mentions fixes in version 0.9.3.2. Since 0.12.1 is newer than 0.9.3.2, that particular fix should already be included, but it is still worth upgrading to the latest stable version.

  4. Debugging Logs:
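    Check the debug logs for errors or warnings related to the transformers. Running the CLI with the global debug flag (datahub --debug ingest -c recipe.yml) produces more verbose output that can show where the transformer is or is not being picked up.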
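The structure and version points above can be checked before running ingestion. The sketch below is a hypothetical pre-flight helper, not part of the DataHub CLI. It assumes the recipe has already been parsed into a Python dict (for example with PyYAML's yaml.safe_load, which is left out here to keep the example standard-library only), and its version comparison only handles purely numeric dotted versions.

```python
# Hypothetical pre-flight checks for a DataHub ingestion recipe.
# Not part of the DataHub CLI; assumes the recipe is already a Python dict.
from typing import Any


def check_recipe(recipe: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed recipe (empty if none)."""
    problems: list[str] = []

    source = recipe.get("source")
    if not isinstance(source, dict):
        problems.append("missing or malformed top-level 'source' key")
        source = {}
    config = source.get("config")
    if not isinstance(config, dict):
        config = {}

    # Transformers belong at the top level, next to 'source', not inside it.
    if "transformers" in source:
        problems.append("'transformers' is nested under 'source'; move it to the top level")
    if "transformers" in config:
        problems.append("'transformers' is nested under 'source.config'; move it to the top level")

    transformers = recipe.get("transformers")
    if transformers is None:
        problems.append("no top-level 'transformers' key")
    elif not isinstance(transformers, list):
        problems.append("'transformers' should be a list of transformer entries")
    else:
        for i, entry in enumerate(transformers):
            if not isinstance(entry, dict) or "type" not in entry:
                problems.append(f"transformer #{i} has no 'type'")
    return problems


def version_tuple(version: str) -> tuple[int, ...]:
    """Convert a dotted numeric version such as '0.12.1' into (0, 12, 1)."""
    return tuple(int(part) for part in version.split("."))


def at_least(installed: str, required: str) -> bool:
    """Return True if installed >= required (numeric dotted versions only)."""
    return version_tuple(installed) >= version_tuple(required)


if __name__ == "__main__":
    # Abbreviated version of the recipe from this thread, as a Python dict.
    recipe = {
        "source": {"type": "snowflake", "config": {"account_id": "<redacted>"}},
        "transformers": [
            {
                "type": "simple_add_dataset_ownership",
                "config": {
                    "owner_urns": [
                        "urn:li:corpGroup:data-usa",
                        "urn:li:corpGroup:squad-team-usa-analytics",
                    ],
                    "ownership_type": "TECHNICAL_OWNER",
                },
            }
        ],
    }
    print(check_recipe(recipe) or "recipe structure looks fine")
    print("0.12.1 >= 0.9.3.2:", at_least("0.12.1", "0.9.3.2"))
```

With the recipe from this thread, the check reports no structural problems and confirms that 0.12.1 is newer than 0.9.3.2, which suggests looking beyond recipe layout and version when the ownership still does not appear.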

Would you like more detailed steps on any of these points?
