Adding Constraints with Primary Key in dbt for DataHub Integration

Original Slack Thread

<@U06TM7M40RK> Could you help me identify how to add constraints with primary_key in dbt, and then create a recipe so that the primary key constraints show up in DataHub?

To add constraints with a primary key in dbt and then create a recipe to see these constraints in DataHub, you can follow these steps:

  1. Define Primary Key in dbt: In dbt, a primary key is commonly expressed through tests rather than a database-level constraint. Applying both the unique and not_null tests to a column asserts the two properties a primary key must have. Here’s an example of how you might define this in a model’s properties file (for example, schema.yml):

    models:
      - name: your_model_name
        columns:
          - name: your_primary_key_column
            tests:
              - unique
              - not_null
    
  2. Ingest dbt Metadata into DataHub: To ingest dbt metadata, including the tests defined above, into DataHub, use the dbt ingestion source. Configure the ingestion recipe to point to your dbt manifest and catalog files, and set target_platform to the warehouse dbt runs against (for example, snowflake or bigquery). Here’s an example of a dbt ingestion recipe:

    source:
      type: "dbt"
      config:
        manifest_path: "/path/to/your/manifest.json"
        catalog_path: "/path/to/your/catalog.json"
        target_platform: "your_target_platform"
    
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
    
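  3. View Constraints in DataHub: Once the metadata is ingested, open the dataset for your model in DataHub. dbt tests such as unique and not_null are generally ingested as assertions attached to the dataset, so expect to find them alongside the dataset’s schema and lineage; they may not appear as a primary key marker on the column itself.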
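
If the goal is a declared primary key rather than a tested one, dbt 1.5 and later also accept a constraints block on the columns of a model with an enforced contract. The following is a minimal sketch under that assumption; the model name, column name, and data type are placeholders, and whether the DataHub dbt source surfaces this constraint as a primary key is not confirmed in this thread and should be checked against the ingestion documentation.

```yaml
# schema.yml sketch, assuming dbt >= 1.5; all names are placeholders
models:
  - name: your_model_name
    config:
      contract:
        enforced: true          # constraints require an enforced contract
    columns:
      - name: your_primary_key_column
        data_type: integer      # contracts require a declared data_type per column
        constraints:
          - type: not_null
          - type: primary_key
```

Many warehouses record primary keys as informational only and do not enforce them, so keeping the unique and not_null tests alongside the constraint is still a reasonable choice.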

For more detailed information on how to set up dbt ingestion, you can refer to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dbt|dbt ingestion documentation>.

Would you like more detailed guidance on any of these steps?


How to add constraints in dbt for DataHub

<@U06TM7M40RK> How to add constraints in dbt for DataHub

It seems likely that to add constraints in dbt for DataHub, you would typically define these constraints using dbt’s testing framework, which can then be ingested into DataHub as part of the metadata. Here’s a general approach:

  1. Define Constraints in dbt: In dbt, constraints such as uniqueness, not-null, and relationships can be defined using tests. These tests can be tagged to indicate they are part of a data contract or constraint. For example:

    models:
      - name: your_model_name
        columns:
          - name: your_column_name
            tests:
              - unique
              - not_null
    

    You can also use custom tags to indicate that these tests are part of a data contract:

    models:
      - name: your_model_name
        columns:
          - name: your_column_name
            tests:
              - unique:
                  tags: ['contract']
              - not_null
    
  2. Ingest dbt Metadata into DataHub: To ingest these constraints into DataHub, you need to configure a dbt ingestion source in your DataHub ingestion recipe. This involves pointing to the dbt manifest and catalog files, which contain the metadata about your dbt models and tests.

    source:
      type: "dbt"
      config:
        manifest_path: "/path/to/your/manifest.json"
        catalog_path: "/path/to/your/catalog.json"
        target_platform: "your_target_platform"
    
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
    
  3. View Constraints in DataHub: Once the metadata is ingested, the tests defined in dbt should be visible on the corresponding dataset in DataHub, generally as assertions shown alongside the dataset’s schema and lineage information.
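
Because the constraints here are modeled as dbt tests, it can help to make test ingestion explicit in the recipe. The sketch below assumes the dbt source's entities_enabled option with a test_definitions key, which is believed to default to enabled; treat the option name and value as assumptions to verify against the dbt source configuration reference for your DataHub version.

```yaml
# Recipe sketch: same dbt source, with test definitions explicitly enabled
source:
  type: "dbt"
  config:
    manifest_path: "/path/to/your/manifest.json"
    catalog_path: "/path/to/your/catalog.json"
    target_platform: "your_target_platform"
    entities_enabled:
      test_definitions: "YES"   # assumed option; emits dbt tests as assertions

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```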

For more detailed information on integrating dbt with DataHub, you can refer to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dbt|dbt ingestion documentation>.

Would you like more detailed guidance on any of these steps?


<@U06TM7M40RK> How do I create a recipe in dbt to ingest constraints (primary_key) into DataHub?

The recipe itself is a DataHub ingestion recipe rather than something defined in dbt. To ingest constraints such as primary keys from dbt into DataHub, you can follow these steps:

  1. Define Constraints in dbt: Use dbt’s testing framework to define constraints like primary keys. This is typically done using the unique and not_null tests, which together can represent a primary key constraint. Here’s an example of how you might define this in a dbt model:

    models:
      - name: your_model_name
        columns:
          - name: your_primary_key_column
            tests:
              - unique
              - not_null
    
  2. Configure dbt Ingestion Recipe for DataHub: You need to set up a DataHub ingestion recipe that points to your dbt project’s manifest and catalog files. Running dbt docs generate writes both manifest.json and catalog.json to the project’s target directory. These files contain the metadata about your dbt models and tests, including the tests you’ve defined as constraints.

    Here’s an example of a dbt ingestion recipe:

    source:
      type: "dbt"
      config:
        manifest_path: "/path/to/your/manifest.json"
        catalog_path: "/path/to/your/catalog.json"
        target_platform: "your_target_platform"
    
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
    
  3. Run the Ingestion: Use the DataHub CLI to run the ingestion process with your configured recipe. This will ingest the dbt metadata, including the constraints, into DataHub.

    datahub ingest -c your_ingestion_recipe.yml
    
  4. Verify in DataHub: After ingestion, open the dataset for your model in DataHub and check that the unique and not_null tests are reflected there, generally as assertions that sit alongside the dataset’s schema and metadata.
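
One way to check what the dbt source actually emits, before anything reaches the server, is to run the same source against DataHub's file sink and inspect the result. This is a minimal sketch: the source settings repeat the example recipe above, and the output filename is a hypothetical placeholder.

```yaml
# Inspection recipe sketch: metadata is written to a local JSON file
source:
  type: "dbt"
  config:
    manifest_path: "/path/to/your/manifest.json"
    catalog_path: "/path/to/your/catalog.json"
    target_platform: "your_target_platform"

sink:
  type: "file"
  config:
    filename: "./dbt_metadata_output.json"   # hypothetical output path
```

Running this recipe with datahub ingest -c and searching the output file for the test names (unique, not_null) or for assertion URNs shows whether the tests were picked up.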

For more detailed information on setting up dbt ingestion, you can refer to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dbt|dbt ingestion documentation>.

Would you like more detailed guidance on any of these steps?
