Troubleshooting Steps for Ingesting Local DBT Metadata

Original Slack Thread

<@U06TM7M40RK> please share a recipe to ingest local DBT metadata

To ingest local DBT metadata into DataHub, you can use the following starter recipe. This recipe assumes that you have the necessary DBT artifact files (manifest.json, catalog.json, sources.json, and run_results.json) generated in the target directory of your DBT project. Here’s a basic configuration:

source:
  type: "dbt"
  config:
    # Paths to DBT artifact files
    manifest_path: "${DBT_PROJECT_ROOT}/target/manifest.json"
    catalog_path: "${DBT_PROJECT_ROOT}/target/catalog.json"
    sources_path: "${DBT_PROJECT_ROOT}/target/sources.json"  # optional for freshness
    test_results_path: "${DBT_PROJECT_ROOT}/target/run_results.json"  # optional for recording dbt test results

    # Target platform
    target_platform: "your_target_platform"  # e.g., postgres, bigquery, etc.

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"  # Replace with your DataHub server URL

Key Points:

  • DBT Artifacts: Ensure that the paths to your DBT artifact files are correctly specified. These files contain the necessary metadata for ingestion.
  • Target Platform: Specify the target platform you are using (e.g., postgres, bigquery).
  • DataHub Server: Update the server URL to point to your DataHub instance.
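A minimal sketch of running the recipe with the DataHub CLI (assuming the recipe is saved as dbt_recipe.yaml; the `acryl-datahub` package name and `datahub ingest` command are from the DataHub CLI, but verify against your installed version):

```shell
# Set the variable the recipe references (example path -- use your own dbt project root):
export DBT_PROJECT_ROOT="$HOME/git/dbt"

# Install the CLI with dbt support, then run the recipe:
#   pip install 'acryl-datahub[dbt]'
#   datahub ingest -c dbt_recipe.yaml
echo "recipe will read artifacts under $DBT_PROJECT_ROOT/target"
```

The `${DBT_PROJECT_ROOT}` placeholders in the recipe are resolved from the environment, so the variable must be exported in the shell (or container) that runs the ingestion.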

For more detailed information on configuring DBT ingestion, you can refer to the DataHub DBT Ingestion Documentation.

Would you like more detailed steps or further assistance on any specific part of this process?

Sources:

  • DataHub DBT Ingestion Documentation

<@U06TM7M40RK> I am getting the error No such file or directory: '/Users/frabacal/git/dbt/target/manifest.json' but I can see the file is there

The error message No such file or directory: '/Users/frabacal/git/dbt/target/manifest.json' suggests that the path specified in your configuration might not be accessible from the environment where the ingestion is being executed. Here are a few steps to troubleshoot and resolve this issue:

  1. Verify File Path: Double-check the file path to ensure there are no typos or incorrect directory names. You can do this by navigating to the directory in a terminal and using the ls command to list the files.

  2. Check Permissions: Ensure that the user running the ingestion process has the necessary permissions to access the file. You can check and modify permissions using the chmod and chown commands.

  3. Environment Context: If you are running the ingestion from a different environment (e.g., a Docker container), ensure that the file path is accessible within that environment. You might need to mount the directory containing the file into the container.

  4. Absolute vs. Relative Path: Make sure you are using the correct path type. If the ingestion is running in a different working directory, an absolute path is recommended.

  5. Debugging: Run the ingestion command with additional logging or debugging options to get more insights into what might be going wrong. For example, if using the DataHub CLI, you can add the --debug flag.
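The file checks above can be sketched as a small shell helper (the path is an example; substitute your own dbt target directory):

```shell
# Report whether a dbt artifact file exists and is readable by the current user.
check_artifact() {
  if [ -r "$1" ]; then
    echo "ok: $1"
  else
    echo "missing or unreadable: $1"
  fi
}

# Example (hypothetical path -- use your own dbt target directory):
check_artifact "$HOME/git/dbt/target/manifest.json"
```

If the file checks out on the host but ingestion still fails with the same error, the ingestion is likely running in a different environment (for example, a container) where that host path does not exist.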

If you continue to face issues, please provide more details about your setup, such as whether you are using a Docker container or a virtual environment, and any additional error logs.

Would you like more detailed steps or further assistance on any specific part of this process?

<@U06TM7M40RK> which docker container runs the ingestion?

The ingestion process in DataHub is typically handled by the datahub-actions container. This container is responsible for executing ingestion jobs when they are triggered via the UI or scheduled.

If you are using Docker, you can verify that the datahub-actions container is running by executing the following command:

docker ps | grep datahub-actions

This command will list all running containers and filter for the datahub-actions container. If the container is not running, you may need to start it to ensure that ingestion jobs can be executed.
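As a sketch, you can either copy the artifacts into the running container or mount the directory into it; the container name below matches the default quickstart setup, and the destination path is an arbitrary example:

```shell
# Host directory containing manifest.json, catalog.json, etc. (example path):
DBT_TARGET="$HOME/git/dbt/target"
# Destination inside the datahub-actions container (arbitrary choice):
CONTAINER_DIR="/tmp/dbt-artifacts"

# Option 1: copy the files into the running container:
#   docker cp "$DBT_TARGET/." "datahub-actions:$CONTAINER_DIR"
# Option 2: mount the directory via a docker-compose volume entry:
#   - ~/git/dbt/target:/tmp/dbt-artifacts:ro
# Either way, point manifest_path, catalog_path, etc. in the recipe at $CONTAINER_DIR.
echo "artifacts will be read from $CONTAINER_DIR inside the container"
```

A read-only mount is usually preferable to `docker cp`, since refreshed dbt artifacts become visible in the container without recopying.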

For more information on setting up and troubleshooting the datahub-actions container, you can refer to the <https://github.com/datahub-project/datahub/blob/master/docs/ui-ingestion.md|UI Ingestion Documentation>.

Would you like more detailed steps or further assistance on any specific part of this process?

I did copy the dbt files into the datahub-actions container and it worked :ok_hand: