Troubleshooting Missing Domain Tags in a Data Ingestion Recipe

Original Slack Thread

Hello, when running the ingestion recipe in :thread:, none of the ingested items are tagged with the domain specified in the transformer. Could anybody help me troubleshoot? Thank you.

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: Please continue to tag RunLLM (e.g. <@U06TM7M40RK>) to have it continue generating responses!

To troubleshoot why your ingested items are not being tagged with the domain specified in the transformer, let’s go through some steps and configurations to check.

Steps to Troubleshoot and Resolve

  1. Verify Ingestion Recipe Configuration:
    Ensure that your ingestion recipe is correctly configured. Here is an example configuration using the pattern_add_dataset_domain transformer (a quick way to test its regex rules is sketched after this step list):

    pipeline_name: "my_dbt_pipeline"  # This is mandatory when stateful ingestion is enabled
    
    source:
      type: dbt
      config:
        manifest_path: "<path_to_your_manifest.json>"
        catalog_path: "<path_to_your_catalog.json>"
        sources_path: "<path_to_your_sources.json>"
        stateful_ingestion:
          enabled: true
          remove_stale_metadata: true  # This will help in removing metadata of deleted models
    
    transformers:
      - type: "pattern_add_dataset_domain"
        config:
          semantics: OVERWRITE
          domain_pattern:
            rules:
              'urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.n.*': ["hr"]
              'urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.t.*': ["urn:li:domain:finance"]
    
    sink:
      type: "datahub-rest"
      config:
        server: 'http://localhost:8080'
    
  2. Check for Entity Deletion:
    If the entities were previously deleted, they might be marked as deleted in DataHub. You can undelete these entities using the DataHub Python library. Here is an example function to undelete an entity:

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import StatusClass
    
    def undelete_entity(entity_urn: str):
        emitter = DatahubRestEmitter(gms_server='http://localhost:8080')
        mcp_wrapper = MetadataChangeProposalWrapper(
            entityUrn=entity_urn,
            aspect=StatusClass(removed=False),
            changeType="UPSERT",
        )
        emitter.emit(mcp_wrapper)
        print(f"Entity {entity_urn} has been unmarked for deletion.")
    
    # Example usage
    undelete_entity("urn:li:dataset:(urn:li:dataPlatform:dbt,example_dataset,PROD)")
    
  3. Run DataHub CLI with Debug Logs:
    Running the DataHub CLI with debug logs enabled can provide more insights into what might be going wrong. Set the DATAHUB_DEBUG environment variable to true and run the ingestion command.

    export DATAHUB_DEBUG=true
    datahub ingest -c your_ingestion_recipe.yml
    
  type: "looker"
  config:
    # Coordinates
    base_url: "<https://looker-api>.&lt;domain&gt;"

    # Credentials
    client_id: "&lt;redacted&gt;"
    client_secret: "&lt;redacted&gt;"

    external_base_url: "<https://looker>.&lt;domain&gt;"
    skip_personal_folders: true

transformers:
  - type: "simple_add_dataset_domain"
    config:
      semantics: OVERWRITE
      domains:
        - "MyDomain"```
  4. Inspect DBT Artifacts:
    Ensure that the manifest.json, catalog.json, and run_results.json files are correctly generated and contain the expected metadata. You can manually inspect these files to confirm they include the necessary information about your models (a small script for listing the entries in manifest.json is sketched after this list).

  5. Check for Version Compatibility:
    Ensure that you are using compatible versions of DataHub and dbt; changes in either tool can sometimes break metadata ingestion (a version-check snippet is sketched after this list).
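
To follow up on step 1: a common reason no domain gets applied by pattern_add_dataset_domain is that the regex rules never match the URNs the source actually emits. Here is a minimal sketch (not the transformer's exact matching code) that tests the example rules against a sample URN; the sample URN is a placeholder, so substitute one copied from the DataHub UI:

    import re

    # Rules copied from the pattern_add_dataset_domain example above.
    # The keys are regular expressions, which is why the parentheses and dots are escaped.
    rules = {
        r"urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.n.*": ["hr"],
        r"urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.t.*": ["urn:li:domain:finance"],
    }

    # Placeholder URN -- replace with a real dataset URN from your instance.
    sample_urn = "urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.nurses,PROD)"

    for pattern, domains in rules.items():
        if re.match(pattern, sample_urn):
            print(f"MATCH    {pattern!r} -> would apply domains {domains}")
        else:
            print(f"NO MATCH {pattern!r}")

If none of the rules match any of your dataset URNs, the transformer has nothing to apply, which would explain the missing domain tags.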
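
For the simple_add_dataset_domain recipe above: as far as I understand, the transformer resolves the configured value against domains that already exist in DataHub, so if "MyDomain" does not resolve to an existing domain, nothing gets attached. Below is a minimal sketch using the DataHub Python SDK to check whether a domain URN resolves. It assumes GMS is reachable at http://localhost:8080, and the URN urn:li:domain:MyDomain is only a guess -- copy the exact URN from the domain's page in the DataHub UI:

    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import DomainPropertiesClass

    # Assumption: GMS runs at the same address used by the datahub-rest sink.
    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    # Hypothetical URN -- domains created in the UI are often keyed by a UUID,
    # so verify the real URN before relying on this check.
    domain_urn = "urn:li:domain:MyDomain"

    props = graph.get_aspect(entity_urn=domain_urn, aspect_type=DomainPropertiesClass)
    if props is None:
        print(f"{domain_urn} was not found; create the domain or use its exact URN.")
    else:
        print(f"Found domain: {props.name}")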
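
For step 4, a quick way to see which models and sources dbt exported (and therefore which dataset names end up in DataHub) is to list the entries in manifest.json. A small sketch, assuming the default target/manifest.json location:

    import json

    # Placeholder path -- use the same manifest_path configured in the recipe.
    with open("target/manifest.json") as f:
        manifest = json.load(f)

    # dbt keeps models/seeds/snapshots under "nodes" and source tables under "sources".
    for section in ("nodes", "sources"):
        for unique_id, entry in manifest.get(section, {}).items():
            print(unique_id, "->", entry.get("schema"), entry.get("name"))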
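
For step 5, the installed versions can be read straight from package metadata. The distribution names below (acryl-datahub and dbt-core) are assumptions for a typical setup; add your dbt adapter package as needed:

    from importlib.metadata import PackageNotFoundError, version

    # Print the versions of the relevant packages so they can be compared against
    # the compatibility notes in the DataHub and dbt release notes.
    for dist in ("acryl-datahub", "dbt-core"):
        try:
            print(dist, version(dist))
        except PackageNotFoundError:
            print(dist, "is not installed")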

Would you like more detailed steps on any of these points or further assistance on a specific part of the process?

<@U06ERQ9DH45> please have a look at this when you get a minute - thank you!

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps! ><