Troubleshooting Tokens and Secrets Issue in DataHub Ingestion

Original Slack Thread

Hello everyone, everything good?
I’m having problems with tokens and secrets in ingestion. Today, I’m using DataHub in Kubernetes, and every day at ingestion time, DataHub can’t find the previously created secrets or the token, giving a connection error. However, the connection usually works by recreating the token and re-registering the secret.

I’m using DataHub 0.13.0. The ingestion was done via CLI and UI; the error is the same daily.

Does anyone know how to solve it or at least how to perform troubleshooting?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

Hello! Let’s try to troubleshoot your issue with tokens and secrets in DataHub ingestion.

Key Points to Address:

  1. Version and Environment:

    • You’re using DataHub version 0.13.0.
    • Deployed on Kubernetes.
    • Ingestion is done via both CLI and UI.
  2. Problem Description:

    • Daily ingestion fails to find previously created secrets or tokens.
    • Recreating the token and re-registering the secret temporarily resolves the issue.
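
Because the failure recurs on a daily cycle and goes away once the token is recreated, one thing worth ruling out is token expiry. The sketch below is a minimal illustration, assuming the access token is a JWT with an optional `exp` claim. It reads the payload without verifying the signature, so it is only a diagnostic aid. The function name `token_expiry` is hypothetical and not part of any DataHub API.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Optional


def token_expiry(token: str) -> Optional[datetime]:
    """Return the expiry time from a JWT's `exp` claim, or None if absent.

    The signature is NOT verified; this only decodes the payload.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated parts")
    payload_b64 = parts[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = claims.get("exp")
    if exp is None:
        return None  # no expiry claim in the payload
    return datetime.fromtimestamp(exp, tz=timezone.utc)
```

If the returned time falls roughly a day after the token was created, the token's validity period is a likely suspect. If it returns None or a far-future date, the cause is more likely elsewhere, for example in how secrets or signing keys survive redeployments.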

Possible Causes and Solutions:

  1. Secret Persistence:

    • According to <Slack discussion>, secrets might not persist correctly across pod restarts or redeployments. That discussion suggests the secrets are held in the container rather than in a database, in which case they would be lost on redeployment.
    • Solution: Ensure that secrets are properly managed and persisted across deployments. You might need to use a more robust secret management system integrated with Kubernetes, such as Kubernetes Secrets or an external secret manager like HashiCorp Vault.
  2. Environment Variables:

    • If environment variables are being used for secrets and tokens, ensure they are correctly passed to all relevant pods.
    • Solution: Verify that the environment variables are correctly set in your Helm charts and Kubernetes manifests. Check that these variables are available in all pods involved in the ingestion process.
  3. Token and Secret Configuration:

    • Ensure that the token and secrets are correctly configured in your ingestion recipes.
    • Solution: Follow the <Ingestion | DataHub documentation> for creating and referencing secrets in your recipes. Make sure the token is included in the datahub-rest sink configuration.
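
The environment-variable check in point 2 can be scripted for CLI runs. The sketch below assumes the recipe references variables with `${NAME}` placeholders and that the CLI resolves them from the process environment; in UI ingestion the same syntax refers to DataHub secrets, so this check does not apply there. The helper name `missing_variables` is hypothetical.

```python
import os
import re
from typing import List, Mapping

# Matches ${NAME} placeholders as used in ingestion recipes.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_variables(recipe_text: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the sorted placeholder names that are unset or empty in `env`."""
    names = sorted(set(_PLACEHOLDER.findall(recipe_text)))
    return [name for name in names if not env.get(name)]
```

Running this inside the pod or job that executes the recipe (for example, against the recipe file's contents) shows which variables never reached that container. An empty list means every placeholder has a non-empty value there.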

Steps to Troubleshoot:

  1. Verify Secret Creation and Reference:

    • Ensure secrets are created and referenced correctly. Refer to the <Ingestion | DataHub a Secret> section.
    • Example:
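      # ${...} placeholders resolve to DataHub secrets in UI ingestion
      # and to environment variables when the recipe runs through the CLI.
      source:
        type: mysql
        config:
          host_port: 'localhost:3306'
          database: my_db
          username: ${MYSQL_USERNAME}
          password: ${MYSQL_PASSWORD}
      sink:
        type: datahub-rest
        config:
          server: 'http://datahub-gms:8080'
          # Access token for GMS; needed when metadata service authentication is enabled.
          token: ${DATAHUB_TOKEN}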
      
  2. Check Kubernetes Secrets:

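    • If using Kubernetes Secrets, ensure the Secret exists in the same namespace as the ingestion pods and is exposed to them, either as environment variables or as mounted files.
    • Example Secret manifest:
      apiVersion: v1
      kind: Secret
      metadata:
        name: my-secret
      type: Opaque
      # Values under data must be base64-encoded; the entries below are placeholders.
      data:
        MYSQL_USERNAME: base64-encoded-username
        MYSQL_PASSWORD: base64-encoded-password
        DATAHUB_TOKEN: base64-encoded-token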
      
  3. Inspect Logs:

    • Check the logs of the DataHub ingestion executor and related pods for any errors or warnings that might indicate why secrets are not being resolved.
    • You can use kubectl logs <pod-name> to inspect logs.
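
For the Kubernetes Secret in step 2, the values under `data` have to be base64-encoded. A common mistake is encoding a value together with a trailing newline, which then becomes part of the password or token. The sketch below shows one way to produce the encoded values; the helper name `encode_secret_data` and the sample value are illustrative only.

```python
import base64
from typing import Dict


def encode_secret_data(values: Dict[str, str]) -> Dict[str, str]:
    """Base64-encode each value exactly as given, with no newline appended."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }


# Example with a made-up value:
print(encode_secret_data({"MYSQL_USERNAME": "admin"}))
# → {'MYSQL_USERNAME': 'YWRtaW4='}
```

Alternatively, the Secret's `stringData` field accepts plain-text values and leaves the encoding to Kubernetes.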

Would you like more detailed steps or specific configurations for any of these points?

Sources:

  • <Slack Discussion on Secret Persistence>
  • <Ingestion | DataHub UI Ingestion Documentation>