Troubleshooting dbt Test Results Ingestion for BigQuery Alignment

Original Slack Thread

Hi folks,
I’m trying to ingest dbt test results into BigQuery sources, but instead of updating the BigQuery source, it creates a dbt source in DataHub. As a result, “composition of” is not working as expected.
I’m not sure what I’m missing; any help would be appreciated!

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: Please continue to tag RunLLM (e.g. <@U06TM7M40RK>) to have it continue generating responses!

To address the issue of dbt test results creating a separate dbt source instead of updating the existing BigQuery source in DataHub, it’s important to ensure that the configuration is set up correctly to align the metadata from dbt with the existing BigQuery datasets. Here are some steps and considerations:

  1. Target Platform Configuration: Ensure that the target_platform in your dbt ingestion configuration is set to bigquery. This helps in aligning the dbt metadata with the BigQuery datasets.

    source:
      type: dbt
      config:
        manifest_path: "path/to/manifest.json"
        catalog_path: "path/to/catalog.json"
        run_results_paths: ["path/to/run_results.json"]
        target_platform: "bigquery"
    
  2. URN Consistency: Make sure that the URNs (Uniform Resource Names) generated by dbt match those of the BigQuery datasets. This can be affected by casing issues. According to this GitHub issue (https://github.com/datahub-project/datahub/issues/7377), dbt ingestion might lowercase table identifiers, which can cause mismatches if BigQuery URNs are not lowercased. Consider using the convert_urns_to_lowercase option in both the dbt and BigQuery ingestion configurations to keep them consistent; a sketch of both recipes follows this list.

  3. Sibling Relationships: DataHub supports creating “sibling” relationships between dbt nodes and the target platform nodes. This should automatically align the dbt models with the BigQuery tables if the URNs match. Ensure that both dbt and BigQuery ingestions are configured to recognize these relationships.

  4. Ingestion Order: The order of ingestion can sometimes affect how metadata is linked. Try ingesting the BigQuery metadata first, followed by the dbt metadata, to see if this resolves the issue.

  5. Check for Duplicates: If you have multiple ingestion sources writing to the same URNs, ensure that they are not overwriting each other’s metadata. As noted in a related Slack discussion, having two sources writing the same aspect can cause conflicts.
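As a rough sketch of how the two recipes can be kept consistent, see below. The project name and file paths are placeholders, and the lowercasing option is shown only on the BigQuery side since that is the option discussed above; check your version’s source docs for any dbt-side equivalent.

# Hedged sketch: two separate recipes that should yield matching dataset names
# and therefore sibling relationships. All names and paths are placeholders.
source:                                   # BigQuery recipe
  type: bigquery
  config:
    project_ids: ["my-gcp-project"]       # placeholder; field name may differ by version
    convert_urns_to_lowercase: true       # keep URN casing consistent across sources
---
source:                                   # dbt recipe (run as a separate ingestion)
  type: dbt
  config:
    manifest_path: "path/to/manifest.json"
    catalog_path: "path/to/catalog.json"
    run_results_paths: ["path/to/run_results.json"]
    target_platform: "bigquery"           # must match the warehouse source's platform
    # env and any platform instance must also match the BigQuery recipe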

Would you like more detailed steps or further assistance on any specific part of this process?

<@U06TM7M40RK> This is my recipe output from the UI:

source:
  type: dbt
  config:
    manifest_path: /tmp/dbt_runsx_95thwq/data_quality/target/manifest.json
    catalog_path: /tmp/dbt_runsx_95thwq/data_quality/target/catalog.json
    test_results_path: /tmp/dbt_runsx_95thwq/data_quality/target/run_results.json
    target_platform: bigquery
  extractor: generic
  extractor_config: {}
sink:
  type: datahub-rest
  config:
    server: 'http://11.11.11.11:31048'
    token: '********'
transformers: null
flags:
  generate_browse_path_v2: true
  generate_browse_path_v2_dry_run: false
  generate_memory_profiles: null
  set_system_metadata: true
  set_system_metadata_pipeline_name: true
reporting:
  -
    type: datahub
    config: null
    required: false
run_id: dbt-2025_01_29-14_34_13-376gij
datahub_api: null
pipeline_name: null
failure_log:
  enabled: false
  log_config: null
Also, these are the URNs for the same table from different sources:
http://11.11.11.11:32601/dataset/urn:li:dataset:(urn:li:dataPlatform:bigquery,hb-dat[…]_table_2025,PROD)/Schema?is_lineage_mode=false&schemaFilter=

http://11.11.11.11:32601/dataset/urn:li:dataset:(urn:li:dataPlatform:dbt,hb-datalake[…]_table_2025,PROD)/Schema?is_lineage_mode=false&schemaFilter=

Based on the provided recipe and URNs, it seems that the issue arises from the mismatch in the URNs between the dbt and BigQuery sources. Here are some steps to address this:

  1. Ensure URN Consistency: The URNs that the two sources produce for the same table need to line up so that DataHub can connect them. In your case, the URNs are:

    • BigQuery: urn:li:dataset:(urn:li:dataPlatform:bigquery,hb-datalake..._table_2025,PROD)
    • dbt: urn:li:dataset:(urn:li:dataPlatform:dbt,hb-datalake..._table_2025,PROD)

    The dbt ingestion always emits its own nodes under the dbt platform, so a dbt-platform URN like the second one is expected. What your target_platform: bigquery setting controls is the BigQuery URN that each dbt node is linked to as a sibling. For that sibling link to form, the name portion of the URN (project.dataset.table), the environment (PROD), and any platform instance must match the BigQuery ingestion’s URN exactly; if they differ, for example in casing, the two entities stay separate, which matches what you are seeing.

  2. Check for Case Sensitivity: URNs are case-sensitive. Ensure that the case used in both dbt and BigQuery configurations is consistent. If necessary, use the convert_urns_to_lowercase option in both configurations to ensure consistency.

  3. Platform Instance: If you have multiple instances or environments, consider using the platform_instance field to differentiate them. If your BigQuery ingestion sets a platform_instance, the dbt ingestion must point at the same instance so the URNs line up; see the sketch after this list.

  4. Review Naming Conventions: Ensure that the naming conventions used in dbt and BigQuery are consistent. This includes database, schema, and table names.

  5. Ingestion Order: Try ingesting the BigQuery metadata first, followed by the dbt metadata. This can sometimes help in aligning the metadata correctly.

  6. Debugging: Enable debug logs during ingestion to get more insights into what might be going wrong. This can help identify any discrepancies in the URNs or other configurations.
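For point 3 above, here is a minimal, hedged sketch of how the instance fields would need to line up. The instance name "prod-bq" is a placeholder, and both settings should simply be omitted if you do not use platform instances:

# Hedged sketch: the dbt source's target_platform_instance must equal the
# BigQuery source's platform_instance, otherwise the generated URNs differ.
source:                                  # BigQuery recipe
  type: bigquery
  config:
    platform_instance: "prod-bq"         # placeholder instance name
---
source:                                  # dbt recipe (run separately)
  type: dbt
  config:
    target_platform: "bigquery"
    target_platform_instance: "prod-bq"  # must match the BigQuery platform_instance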

Would you like more detailed steps or further assistance on any specific part of this process?

<@U06TM7M40RK> But I am using target_platform as bigquery; everything else is handled by the ingestion backend.
This is my ingestion code:
from datahub.ingestion.run.pipeline import Pipeline

try:
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "dbt",
                "config": {
                    "manifest_path": dbt_dst_path + "/target/manifest.json",
                    "catalog_path": dbt_dst_path + "/target/catalog.json",
                    "test_results_path": dbt_dst_path + "/target/run_results.json",  # optional for recording dbt test results after running dbt test
                    "target_platform": "bigquery",
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": host, "token": token},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()
except Exception:
    # handle or log ingestion failures here
    raise

The URN needs to be on the bigquery platform, but it’s not.

If the URN is not reflecting the bigquery platform despite setting target_platform to bigquery in your dbt ingestion configuration, there might be a few areas to investigate:

  1. Configuration Verification: Double-check that the target_platform is correctly set to bigquery in your dbt ingestion configuration. It seems correct in your provided code, but ensure there are no overrides or misconfigurations elsewhere in your setup.

  2. DataHub Version: Ensure you are using a compatible version of DataHub that supports the target_platform configuration for dbt. Sometimes, bugs or missing features in older versions can cause such issues.

  3. Ingestion Logs: Enable detailed logging for the ingestion process to see if there are any warnings or errors related to the platform setting. This can provide insights into why the URN is not being set as expected.

  4. URN Mapping Logic: Review the logic in the DataHub ingestion framework that maps dbt nodes to DataHub datasets. According to the DataHub documentation (https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dbt#config-details), the target_platform should dictate the platform part of the warehouse URN that dbt nodes are linked to. If this is not happening, there might be an issue in the mapping logic; a small verification sketch follows this list.

  5. Custom Mapping: If the default behavior is not aligning the URNs as expected, consider implementing a custom mapping logic in your ingestion pipeline. This can be done by modifying the ingestion source code or using a custom ingestion recipe.

  6. Community and Support: If the issue persists, consider reaching out to the DataHub community or support channels. There might be known issues or patches available that address this specific problem.
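To make point 4 concrete, here is a small, hedged sketch that builds both candidate URNs for one table and asks DataHub whether they exist. The server address, token, and table name are placeholders, and the exact graph-client helpers may vary slightly between acryl-datahub versions:

# Hedged sketch: check which URNs actually exist on the server, to see where
# the dbt and BigQuery ingestions disagree. All literal values are placeholders.
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

graph = DataHubGraph(DatahubClientConfig(server="http://11.11.11.11:31048", token="..."))

# BigQuery-style dataset names are "<project>.<dataset>.<table>"; the casing must
# match whatever the BigQuery ingestion emitted.
name = "my-gcp-project.my_dataset.my_table_2025"
bq_urn = make_dataset_urn(platform="bigquery", name=name, env="PROD")
dbt_urn = make_dataset_urn(platform="dbt", name=name, env="PROD")

print("bigquery urn exists:", graph.exists(bq_urn))
print("dbt urn exists:", graph.exists(dbt_urn))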

Would you like more detailed steps or further assistance on any specific part of this process?

<@U06TM7M40RK> What is the suitable version of acryl-datahub[dbt] for DataHub 14.0.1, dbt-core==1.8.2, and dbt-bigquery==1.8.1?

I’m not able to find any data sources to answer your question. You might want to specify the compatibility requirements or constraints for the versions you are using. If this doesn’t help, please reach out to the DataHub support team for further assistance.

<@U06TM7M40RK> Can I add debug logging to the dbt ingestion?

Yes, you can enable debug logging for dbt ingestion in DataHub to gain more insights into the ingestion process. Here’s how you can do it:

  1. Enable Debug Logs in the CLI: When running the DataHub CLI for ingestion, you can enable debug logs by setting the DATAHUB_DEBUG environment variable to true. This will provide detailed logging output during the ingestion process. (If you run ingestion from Python rather than the CLI, see the sketch after this list.)

    export DATAHUB_DEBUG=true
    datahub ingest -c your_ingestion_recipe.yml
    
  2. Check Logs for Errors: Once debug logging is enabled, you can check the logs for any errors or warnings that might indicate issues with the ingestion process. This can help you identify and resolve any problems.

  3. Advanced Debugging: If you are using the DataHub UI, there might be an option to enable advanced or debug mode for ingestion. This can also provide more detailed logs.

  4. Review Ingestion Configuration: Ensure that your ingestion configuration is correctly set up. Sometimes, misconfigurations can lead to issues that are revealed through debug logs.
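Since your ingestion runs through the Python Pipeline API rather than the CLI, one hedged alternative is to raise the log level with Python’s standard logging module before creating the pipeline; nothing here is DataHub-specific beyond the logger name:

# Hedged sketch: enable verbose logging for a Python-based ingestion run.
import logging

logging.basicConfig(level=logging.DEBUG)               # global debug output
logging.getLogger("datahub").setLevel(logging.DEBUG)   # or limit it to DataHub's loggers

# ...then create and run the Pipeline exactly as in your existing code.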

Would you like more detailed steps or further assistance on any specific part of this process?

Sources:

  • Slack discussion on debugging

<@U06TM7M40RK> Can I add it to the recipe?

To add debug logging directly to your ingestion recipe, you can include specific configurations that enable detailed logging. However, the primary way to enable debug logging is through environment variables or command-line options, as mentioned earlier. If you want to include logging configurations in your recipe, you can try the following approach:

  1. Add Debug Flag in Recipe: While the recipe itself doesn’t have a direct debug flag, you can use the flags section to include any custom flags or configurations that your ingestion setup might recognize.

  2. Use Environment Variables: As mentioned, setting the DATAHUB_DEBUG environment variable to true is the most straightforward way to enable debug logging.

  3. Custom Logging Configuration: If your ingestion setup supports custom logging configurations, you can specify them in the recipe. This might involve adding a section for logging configuration if your ingestion framework supports it.

Here’s an example of how you might structure your recipe to include custom flags:

source:
  type: dbt
  config:
    manifest_path: "path/to/manifest.json"
    catalog_path: "path/to/catalog.json"
    run_results_paths: ["path/to/run_results.json"]
    target_platform: "bigquery"

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
    token: "your-auth-token"

flags:
  custom_debug: true  # Custom flag for enabling debug, if supported

Please note that the custom_debug flag is hypothetical and would only work if your ingestion setup is configured to recognize and act on it.

Would you like more detailed steps or further assistance on any specific part of this process?

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps! ><