Hi folks,
I’m trying to ingest dbt test results into BigQuery sources, but instead of updating the BigQuery source, it creates a dbt source in DataHub. As a result, “composition of” is not working as expected.
I’m not sure what I’m missing—any help would be appreciated!
Hey there! 
<@U06TM7M40RK> Can you help with the questions in this thread?
Please continue to tag RunLLM (e.g. <@U06TM7M40RK>) to have it continue generating responses!
To address the issue of dbt test results creating a separate dbt source instead of updating the existing BigQuery source in DataHub, it’s important to ensure that the configuration is set up correctly to align the metadata from dbt with the existing BigQuery datasets. Here are some steps and considerations:
- Target Platform Configuration: Ensure that the `target_platform` in your dbt ingestion configuration is set to `bigquery`. This helps in aligning the dbt metadata with the BigQuery datasets:
```
source:
  type: dbt
  config:
    manifest_path: "path/to/manifest.json"
    catalog_path: "path/to/catalog.json"
    run_results_paths: ["path/to/run_results.json"]
    target_platform: "bigquery"
```
- URN Consistency: Make sure that the URNs (Uniform Resource Names) generated by dbt match those of the BigQuery datasets. This can be affected by casing issues. According to <https://github.com/datahub-project/datahub/issues/7377|this GitHub issue>, dbt ingestion might lowercase table identifiers, which can cause mismatches if the BigQuery URNs are not lowercased. Consider using the `convert_urns_to_lowercase` option in both the dbt and BigQuery ingestion configurations to ensure consistency (see the sketch after this list).
- Sibling Relationships: DataHub supports creating "sibling" relationships between dbt nodes and the target platform nodes. This should automatically align the dbt models with the BigQuery tables if the URNs match. Ensure that both the dbt and BigQuery ingestions are configured to recognize these relationships.
- Ingestion Order: The order of ingestion can sometimes affect how metadata is linked. Try ingesting the BigQuery metadata first, followed by the dbt metadata, to see if this resolves the issue.
- Check for Duplicates: If you have multiple ingestion sources writing to the same URNs, ensure that they are not overwriting each other's metadata. As noted in a related Slack discussion, having two sources writing the same aspect can cause conflicts.
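To make the casing advice above concrete, here is a minimal sketch using the Python ingestion API (mirroring the code shared later in this thread). The paths, server, and project ID are placeholders, and it assumes `convert_urns_to_lowercase` is available on both connectors in your CLI version, per the issue linked above:
```
from datahub.ingestion.run.pipeline import Pipeline

# Placeholder sink; point it at your own DataHub instance.
sink = {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}}

# BigQuery side: lowercase the URNs it emits.
bigquery_pipeline = Pipeline.create(
    {
        "source": {
            "type": "bigquery",
            "config": {
                "project_ids": ["your-gcp-project"],  # placeholder project
                "convert_urns_to_lowercase": True,
            },
        },
        "sink": sink,
    }
)

# dbt side: same setting, so both sources generate identical dataset URNs.
dbt_pipeline = Pipeline.create(
    {
        "source": {
            "type": "dbt",
            "config": {
                "manifest_path": "path/to/manifest.json",  # placeholder paths
                "catalog_path": "path/to/catalog.json",
                "run_results_paths": ["path/to/run_results.json"],
                "target_platform": "bigquery",
                "convert_urns_to_lowercase": True,  # assumed available; see linked issue
            },
        },
        "sink": sink,
    }
)

for pipeline in (bigquery_pipeline, dbt_pipeline):
    pipeline.run()
    pipeline.raise_from_status()
```
The same two settings can of course go into YAML recipes instead; the only requirement is that both sources agree on casing.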
Would you like more detailed steps or further assistance on any specific part of this process?
Sources:
- <https://github.com/datahub-project/datahub/issues/7377|this GitHub issue>
<@U06TM7M40RK> this is my recipe output from the UI:
```
source:
  type: dbt
  config:
    manifest_path: /tmp/dbt_runsx_95thwq/data_quality/target/manifest.json
    catalog_path: /tmp/dbt_runsx_95thwq/data_quality/target/catalog.json
    test_results_path: /tmp/dbt_runsx_95thwq/data_quality/target/run_results.json
    target_platform: bigquery
  extractor: generic
  extractor_config: {}
sink:
  type: datahub-rest
  config:
    server: 'http://11.11.11.11:31048'
    token: '********'
transformers: null
flags:
  generate_browse_path_v2: true
  generate_browse_path_v2_dry_run: false
  generate_memory_profiles: null
  set_system_metadata: true
  set_system_metadata_pipeline_name: true
reporting:
  - type: datahub
    config: null
    required: false
run_id: dbt-2025_01_29-14_34_13-376gij
datahub_api: null
pipeline_name: null
failure_log:
  enabled: false
  log_config: null
```
Also, these are the URNs for the same table in different sources:
<http://11.11.11.11:32601/dataset/urn:li:dataset:(urn:li:dataPlatform:bigquery,hb-dat[…]_table_2025,PROD)/Schema?is_lineage_mode=false&schemaFilter=>
<http://11.11.11.11:32601/dataset/urn:li:dataset:(urn:li:dataPlatform:dbt,hb-datalake[…]_table_2025,PROD)/Schema?is_lineage_mode=false&schemaFilter=>
Based on the provided recipe and URNs, it seems that the issue arises from the mismatch in the URNs between the dbt and BigQuery sources. Here are some steps to address this:
- Ensure URN Consistency: The URNs for the same table in different sources should match in terms of platform and naming conventions. In your case, the URNs are:
  - BigQuery: `urn:li:dataset:(urn:li:dataPlatform:bigquery,hb-datalake..._table_2025,PROD)`
  - dbt: `urn:li:dataset:(urn:li:dataPlatform:dbt,hb-datalake..._table_2025,PROD)`
  The platform part of the URN should be consistent. Since you want to align dbt metadata with BigQuery, ensure that `target_platform` in your dbt configuration is set to `bigquery`, which you have already done. However, the platform in the URN still shows `dbt`, which indicates that the dbt ingestion is not aligning with the BigQuery URN.
- Check for Case Sensitivity: URNs are case-sensitive. Ensure that the casing used in both the dbt and BigQuery configurations is consistent. If necessary, use the `convert_urns_to_lowercase` option in both configurations to ensure consistency.
- Platform Instance: If you have multiple instances or environments, consider using the `platform_instance` field to differentiate them. This can help align the metadata correctly (see the sketch after this list).
- Review Naming Conventions: Ensure that the naming conventions used in dbt and BigQuery are consistent, including database, schema, and table names.
- Ingestion Order: Try ingesting the BigQuery metadata first, followed by the dbt metadata. This can sometimes help align the metadata correctly.
- Debugging: Enable debug logs during ingestion to get more insight into what might be going wrong. This can help identify discrepancies in the URNs or other configurations.
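If you do use platform instances, here is a minimal sketch of the alignment described above. The instance name, project, and paths are placeholders; the key point is that the dbt source's `target_platform_instance` must match the `platform_instance` of the BigQuery source, or the two will generate different URNs:
```
# Hypothetical instance name; both source configs below must agree on it.
INSTANCE = "my_bq_instance"

bigquery_source = {
    "type": "bigquery",
    "config": {
        "project_ids": ["your-gcp-project"],  # placeholder project
        "platform_instance": INSTANCE,
    },
}

dbt_source = {
    "type": "dbt",
    "config": {
        "manifest_path": "path/to/manifest.json",  # placeholder paths
        "catalog_path": "path/to/catalog.json",
        "target_platform": "bigquery",
        # Must equal the BigQuery source's platform_instance above.
        "target_platform_instance": INSTANCE,
    },
}
```
If you do not use platform instances anywhere, simply omit both fields on both sides.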
Would you like more detailed steps or further assistance on any specific part of this process?
Sources:
- https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dbt#stripping-out-leading--sign
- https://datahubspace.slack.com/archives/CUMUWQU66/p1700709176.002999
<@U06TM7M40RK> But I am using `target_platform` as `bigquery`; all other things are handled on the ingestion backend.
This is my ingestion code:
```
from datahub.ingestion.run.pipeline import Pipeline  # import implied by the snippet

try:
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "dbt",
                "config": {
                    "manifest_path": dbt_dst_path + "/target/manifest.json",
                    "catalog_path": dbt_dst_path + "/target/catalog.json",
                    # optional, for recording dbt test results after running `dbt test`
                    "test_results_path": dbt_dst_path + "/target/run_results.json",
                    "target_platform": "bigquery",
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": host, "token": token},
            },
        }
    )
    # The paste ends at Pipeline.create(); a run step like the following is assumed:
    pipeline.run()
    pipeline.raise_from_status()
except Exception:
    raise
```
The URN needs to be on the bigquery platform, but it's not.
If the URN is not reflecting the `bigquery` platform despite setting `target_platform` to `bigquery` in your dbt ingestion configuration, there are a few areas to investigate:
- Configuration Verification: Double-check that `target_platform` is correctly set to `bigquery` in your dbt ingestion configuration. It looks correct in your provided code, but ensure there are no overrides or misconfigurations elsewhere in your setup.
- DataHub Version: Ensure you are using a version of DataHub that supports the `target_platform` configuration for dbt. Bugs or missing features in older versions can cause such issues.
- Ingestion Logs: Enable detailed logging for the ingestion process to see whether there are any warnings or errors related to the platform setting. This can provide insight into why the URN is not being set as expected.
- URN Mapping Logic: Review how the DataHub ingestion framework maps dbt nodes to DataHub datasets. According to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dbt#config-details|DataHub documentation>, `target_platform` should dictate the platform part of the URN. If this is not happening, there may be an issue in the mapping logic (see the sibling-check sketch after this list).
- Custom Mapping: If the default behavior is not aligning the URNs as expected, consider implementing custom mapping logic in your ingestion pipeline, either by modifying the ingestion source code or by using a custom ingestion recipe.
- Community and Support: If the issue persists, consider reaching out to the DataHub community or support channels. There may be known issues or patches that address this specific problem.
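One note that may explain what you are seeing: the dbt source emits its own nodes under the `dbt` platform and connects them to the `target_platform` datasets through sibling relationships, so a dbt-platform URN is expected as long as the sibling link points at the matching BigQuery dataset. As a sketch (server, token, and URN are placeholders), you can inspect that link with the DataHub Python SDK:
```
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.schema_classes import SiblingsClass

# Placeholder server/token; use your own DataHub endpoint.
graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080", token="***"))

# Placeholder URN for one of your dbt models.
dbt_urn = "urn:li:dataset:(urn:li:dataPlatform:dbt,your_project.your_table,PROD)"

# If ingestion linked things correctly, the siblings aspect should list the
# matching urn:li:dataPlatform:bigquery dataset.
siblings = graph.get_aspect(entity_urn=dbt_urn, aspect_type=SiblingsClass)
print(siblings.siblings if siblings else "no siblings aspect found")
```
If no siblings aspect comes back, the URN mismatch (casing, naming, or platform instance) discussed above is the most likely cause.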
Would you like more detailed steps or further assistance on any specific part of this process?
Sources:
- <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dbt#config-details|DataHub dbt Ingestion Configuration>
<@U06TM7M40RK> What is the suitable version of acryl-datahub[dbt] for datahub 14.0.1 with dbt-core==1.8.2 and dbt-bigquery==1.8.1?
I'm not able to find any data sources to answer your question. You might want to specify the compatibility requirements or constraints for the versions you are using. If this doesn't help, please reach out to the DataHub support team for further assistance.
<@U06TM7M40RK> Can I add debug on dbt ingestion?
Yes, you can enable debug logging for dbt ingestion in DataHub to gain more insights into the ingestion process. Here’s how you can do it:
- Enable Debug Logs in the CLI: When running the DataHub CLI for ingestion, you can enable debug logs by setting the `DATAHUB_DEBUG` environment variable to `true`. This provides detailed logging output during the ingestion process:
```
export DATAHUB_DEBUG=true
datahub ingest -c your_ingestion_recipe.yml
```
- Check Logs for Errors: Once debug logging is enabled, check the logs for any errors or warnings that might indicate issues with the ingestion process. This can help you identify and resolve problems.
- Advanced Debugging: If you are using the DataHub UI, there may be an option to enable an advanced or debug mode for ingestion, which can also provide more detailed logs.
- Review Ingestion Configuration: Ensure that your ingestion configuration is correctly set up. Misconfigurations are often revealed through debug logs. Since you run ingestion from Python, see the logging sketch after this list.
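Since you run ingestion from Python rather than the CLI, here is a minimal sketch of the equivalent, assuming the framework's loggers propagate to the root logger as standard Python logging does (paths and server are placeholders):
```
import logging

from datahub.ingestion.run.pipeline import Pipeline

# Raise the log level before creating the pipeline; this should surface the
# same detail the CLI shows with DATAHUB_DEBUG=true.
logging.basicConfig(level=logging.DEBUG)

pipeline = Pipeline.create(
    {
        "source": {
            "type": "dbt",
            "config": {
                "manifest_path": "path/to/manifest.json",  # placeholder paths
                "catalog_path": "path/to/catalog.json",
                "test_results_path": "path/to/run_results.json",
                "target_platform": "bigquery",
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
)
pipeline.run()
pipeline.pretty_print_summary()
```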
Would you like more detailed steps or further assistance on any specific part of this process?
<@U06TM7M40RK> Can I add it to the recipe?*
To add debug logging directly to your ingestion recipe, you can include specific configurations that enable detailed logging. However, the primary way to enable debug logging is through environment variables or command-line options, as mentioned earlier. If you want to include logging configurations in your recipe, you can try the following approach:
- Add Debug Flag in Recipe: While the recipe itself doesn't have a direct debug flag, you can use the `flags` section to include any custom flags or configurations that your ingestion setup might recognize.
- Use Environment Variables: As mentioned, setting the `DATAHUB_DEBUG` environment variable to `true` is the most straightforward way to enable debug logging.
- Custom Logging Configuration: If your ingestion setup supports custom logging configurations, you can specify them in the recipe. This might involve adding a section for logging configuration if your ingestion framework supports it.
Here’s an example of how you might structure your recipe to include custom flags:
```
source:
  type: dbt
  config:
    manifest_path: "path/to/manifest.json"
    catalog_path: "path/to/catalog.json"
    run_results_paths: ["path/to/run_results.json"]
    target_platform: "bigquery"
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
    token: "your-auth-token"
flags:
  custom_debug: true  # Custom flag for enabling debug, if supported
```
Please note that the `custom_debug` flag is hypothetical and would only work if your ingestion setup is configured to recognize and act on it.
Would you like more detailed steps or further assistance on any specific part of this process?
Sources:
- https://datahubspace.slack.com/archives/C029A3M079U/p1725483331.450899
- https://runllm.slack.com/archives/C06SQ733HGQ/p1717518371.919619
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><