Troubleshooting dbt Models Not Combining with Snowflake Tables in Datahub Production Instance

user-1 · March 4, 2024, 3:37pm

I’ve got an issue with some dbt models not combining with their respective Snowflake tables. The issue only affects some tables (e.g. it works for RAW_ANSWERS but not DIM_ANSWERS in the screenshot). It also only affects our production instance of DataHub; it works fine on our dev instance, which ingests the same data and uses the same server version (0.10.2).

I can’t see why certain tables wouldn’t combine, and I don’t have many ideas for debugging. Does anyone have ideas for how to correct this? I’ve tried rerunning the ingestions with dbt first and with Snowflake first, but saw no change. I see no difference in naming or URNs between our prod and dev instances.

datahub_team · March 4, 2024, 3:37pm

Hi @user-1! My first thought is that it might be an issue with URN casing. Can you share your recipes for both sources? Please omit any sensitive info.

user-3 · March 4, 2024, 3:37pm

Are you able to upgrade to the newest version, 0.10.5? We had a bug with the creation of siblings (combining entities) that was fixed somewhat recently

user-4 · March 4, 2024, 3:37pm

Have you tried using the same casing for both the dbt and Snowflake datasets? URNs are case-sensitive.
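A quick way to check for a casing difference is to compare the name part of the two URNs directly. The snippet below is only a minimal sketch in plain Python, not part of DataHub: the helper names `split_dataset_urn` and `compare_dataset_names` are made up for illustration, and the uppercase dbt URN is an invented example, not one taken from this thread.

```python
def split_dataset_urn(urn: str) -> tuple[str, str, str]:
    """Split urn:li:dataset:(<platform urn>,<name>,<env>) into its three parts."""
    prefix = "urn:li:dataset:("
    if not (urn.startswith(prefix) and urn.endswith(")")):
        raise ValueError(f"not a dataset URN: {urn!r}")
    inner = urn[len(prefix):-1]
    # The platform URN comes first and the env last; the name is what is left.
    platform_urn, rest = inner.split(",", 1)
    name, env = rest.rsplit(",", 1)
    return platform_urn, name, env


def compare_dataset_names(urn_a: str, urn_b: str) -> str:
    """Compare the name and env of two dataset URNs, ignoring the platform."""
    _, name_a, env_a = split_dataset_urn(urn_a)
    _, name_b, env_b = split_dataset_urn(urn_b)
    if (name_a, env_a) == (name_b, env_b):
        return "match"
    if (name_a.lower(), env_a.lower()) == (name_b.lower(), env_b.lower()):
        return "case-mismatch"
    return "different"


# Hypothetical pair: the dbt URN here is invented to show a casing difference.
snowflake_urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.myschema.dim_answers,PROD)"
dbt_urn = "urn:li:dataset:(urn:li:dataPlatform:dbt,MYDB.MYSCHEMA.DIM_ANSWERS,PROD)"
print(compare_dataset_names(snowflake_urn, dbt_urn))
```

A result of "case-mismatch" would mean the two names differ only in casing, which is the situation being asked about here.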

user-1 · March 4, 2024, 3:37pm

I’d already checked the URNs and they match:

```
urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.myschema.dim_answers,PROD)
```
This is especially confusing because the URNs and, from what I can tell, all the other configs seem to be the same on our dev instance of DataHub, yet the dim_answers table and model do combine there.

Unfortunately, I can’t easily upgrade the server version right now. The recipes are below, but I think the only thing that changes between the runs is the REST endpoint.

dbt Cloud
```yaml
source:
  type: "dbt-cloud"
  config:
    token: x
    account_id: x
    project_id: x
    job_id: x
    metadata_endpoint: https://metadata.emea.dbt.com/graphql
    target_platform: snowflake
    stateful_ingestion:
      enabled: false
    env: PROD
```
Snowflake
```yaml
pipeline_name: snowflake__policy_domain

source:
  type: snowflake
  config:
    account_id: x
    warehouse: x
    username: x
    password: x
    role: x
    database_pattern:
      allow:
        - ^...$
        - ...
    schema_pattern:
      allow:
        - ...
    profiling:
      enabled: false
      turn_off_expensive_profiling_metrics: true
    ignore_start_time_lineage: true
    stateful_ingestion:
      enabled: false
      remove_stale_metadata: false
    env: PROD

transformers:
    - type: pattern_add_dataset_terms
      config:
        semantics: PATCH
        term_pattern:
          rules:
            ...
```
