Ingesting Data Domains with DataHub and dbt: Configuration and Recipe Guidance

Original Slack Thread

Hi team. Does anyone know how to ingest a data domain (data_domain)? Is this correct? dbt + DataHub
dbt + data domain + recipe

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?


To ingest data domains using DataHub with dbt, you need to configure your ingestion recipe correctly. The dbt meta automated mappings feature does not support direct domain mappings, but you can achieve this using transformers in your ingestion recipe.

Here are the steps to configure your ingestion recipe to include domain mappings:

  1. Ingestion Recipe Configuration:
    You need to use the pattern_add_dataset_domain transformer to map datasets to domains based on patterns in their URNs. Below is an example configuration:
source:
  type: dbt
  config:
    manifest_path: '${DBT_PROJECT_ROOT}/target/manifest.json'
    catalog_path: '${DBT_PROJECT_ROOT}/target/catalog.json'
    sources_path: '${DBT_PROJECT_ROOT}/target/sources.json'
    test_results_path: '${DBT_PROJECT_ROOT}/target/run_results.json'
    env: DEV
    target_platform: postgres

transformers:
  - type: "pattern_add_dataset_domain"
    config:
      semantics: OVERWRITE
      domain_pattern:
        rules:
          'urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.n.*': ["hr"]
          'urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.t.*': ["urn:li:domain:finance"]
  2. Verify Domain Provisioning:
    Ensure that the domains you are trying to use are provisioned on DataHub. You can verify this by searching for the domain in the DataHub UI or using the GraphQL API to fetch the domain details.

  3. Run the Ingestion:
    Execute the ingestion using the DataHub CLI with your configured recipe, for example: datahub ingest -c recipe.yaml (the file name is whatever you saved the recipe as).
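
Before running the ingestion, it can help to check which dataset URNs the domain_pattern rules would actually catch. The following is a minimal sketch, not DataHub's own code: it assumes each rule key is a regular expression matched from the start of the URN, and the sample URNs (table names, DEV environment) are hypothetical.

```python
import re

# Rules copied from the recipe's domain_pattern: regex on the dataset URN -> domains.
RULES = {
    r"urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.n.*": ["hr"],
    r"urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.t.*": ["urn:li:domain:finance"],
}


def domains_for(urn, rules=RULES):
    """Collect the domains of every rule whose regex matches from the start of the URN."""
    matched = []
    for pattern, domains in rules.items():
        if re.match(pattern, urn):
            matched.extend(domains)
    return matched


# Hypothetical URNs, for illustration only.
SAMPLE_URNS = [
    "urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.names,DEV)",
    "urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.transactions,DEV)",
    "urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.orders,DEV)",
    "urn:li:dataset:(urn:li:dataPlatform:dbt,postgres.public.names,DEV)",
]

for urn in SAMPLE_URNS:
    print(urn, "->", domains_for(urn) or "no domain rule matches")
```

In this sketch the last sample URN sits on the dbt platform, so neither rule matches it, because both rules name the postgres platform explicitly. If entities on another platform should also receive a domain, a rule covering that platform's URNs would likely be needed.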

Would you like more detailed instructions or have any other specific questions?

Sources:

  • <Dataset | DataHub transformer documentation>

<@U06TM7M40RK> here

To ingest data domains using DataHub with dbt, you need to configure your ingestion recipe correctly. The dbt meta automated mappings feature does not support direct domain mappings, but you can achieve this using transformers in your ingestion recipe.

Here are the steps to configure your ingestion recipe to include domain mappings:

  1. Ingestion Recipe Configuration:
    You need to use the pattern_add_dataset_domain transformer to map datasets to domains based on patterns in their URNs. Below is an example configuration:
pipeline_name: "my_dbt_pipeline"  # This is mandatory when stateful ingestion is enabled

source:
  type: dbt
  config:
    manifest_path: "${DBT_PROJECT_ROOT}/target/manifest.json"
    catalog_path: "${DBT_PROJECT_ROOT}/target/catalog.json"
    sources_path: "${DBT_PROJECT_ROOT}/target/sources.json"
    test_results_path: "${DBT_PROJECT_ROOT}/target/run_results.json"
    env: DEV
    target_platform: postgres
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true  # This will help in removing metadata of deleted models

transformers:
  - type: "pattern_add_dataset_domain"
    config:
      semantics: OVERWRITE
      domain_pattern:
        rules:
          'urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.n.*': ["hr"]
          'urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.t.*': ["urn:li:domain:finance"]

sink:
  type: "datahub-rest"
  config:
    server: 'http://localhost:8080'
  2. Verify Domain Provisioning:
    Ensure that the domains you are trying to use are provisioned on DataHub. You can verify this by searching for the domain in the DataHub UI or using the GraphQL API to fetch the domain details.

  3. Run the Ingestion:
    Execute the ingestion using the DataHub CLI with your configured recipe.
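
The domain check in step 2 can be sketched as a small GraphQL call. This is an illustration under stated assumptions, not an official client: it assumes the GraphQL schema exposes a domain(urn: ...) query with a properties.name field, that the endpoint is reachable at <server>/api/graphql, and that a personal access token is passed as a Bearer token when authentication is enabled. The function names are hypothetical.

```python
import json
import urllib.request

# Assumed query shape: look up a single domain by its URN.
DOMAIN_QUERY = """
query getDomain($urn: String!) {
  domain(urn: $urn) {
    urn
    properties {
      name
    }
  }
}
"""


def build_payload(domain_urn):
    """Build the JSON body for the GraphQL request."""
    return {"query": DOMAIN_QUERY, "variables": {"urn": domain_urn}}


def fetch_domain(server, domain_urn, token=None):
    """POST the query and return the domain object, or None if it was not found."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        f"{server}/api/graphql",  # assumed endpoint path
        data=json.dumps(build_payload(domain_urn)).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # "data" may be null when the server reports errors.
    return (body.get("data") or {}).get("domain")


# Example call (requires a running DataHub instance):
# fetch_domain("http://localhost:8080", "urn:li:domain:finance")
```

A return value of None would suggest the domain is not provisioned yet, in which case it would need to be created in DataHub before the transformer can attach datasets to it.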

Would you like more detailed instructions or have any other specific questions?

Sources:

  • <Dataset | DataHub transformer documentation>
