Troubleshooting Unauthorized Error in DataHub Metadata Service Ingestion

Original Slack Thread

<@U06TM7M40RK> Help me with this error while running an ingestion: 401 Client Error: Unauthorized for url: http://datahub-datahub-gms:8080

The “401 Client Error: Unauthorized” error typically indicates an issue with authentication when trying to communicate with the DataHub Metadata Service (GMS). Here are some steps to troubleshoot and resolve this issue:

  1. Ensure Metadata Service Authentication is Enabled:
    Make sure that the METADATA_SERVICE_AUTH_ENABLED environment variable is set to “true” for both the datahub-gms and datahub-frontend containers. This can also be set in the application.yml configuration file.

    authentication:
      enabled: true
    

    According to the <https://github.com/datahub-project/datahub/blob/master/docs/authentication/introducing-metadata-service-authentication.md|DataHub documentation>, this is necessary to enforce authentication.

  2. Configure the Token in the Ingestion Recipe:
    Ensure that the token is correctly configured in your ingestion recipe. The token should be included in the datahub-rest sink configuration.

    sink:
      type: datahub-rest
      config:
        server: 'http://datahub-gms:8080'
        token: '<your-personal-access-token>'
    

    This is also mentioned in the <https://github.com/datahub-project/datahub/blob/master/docs/authentication/introducing-metadata-service-authentication.md|DataHub documentation>.

  3. Check the Authentication Chain:
    A 401 response means GMS rejected the credentials it received, so the authentication chain did not accept the request. Ensure that the Authorization header is sent in the form shown below and that the token is still valid (not expired or revoked).

    Authorization: Bearer <your-personal-access-token>
    

    This is necessary as per the <https://github.com/datahub-project/datahub/blob/master/docs/authentication/introducing-metadata-service-authentication.md|DataHub authentication guide>.

  4. Verify Kubernetes Deployment:
    Ensure that your Kubernetes deployment is correctly configured to pass the necessary environment variables and configurations to the DataHub containers.
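
As a rough illustration of step 2, the sink section of a recipe can be checked before running the ingestion. This is a minimal sketch, not part of DataHub: the function name check_sink and the idea of passing the recipe as an already-parsed dict are assumptions made for the example. It only looks for the problems described above (wrong sink type, missing or placeholder token) and, optionally, whether the configured server matches the host shown in the error message.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Placeholder text copied from the recipe example in step 2.
PLACEHOLDER = "<your-personal-access-token>"


def check_sink(recipe: dict, failing_url: Optional[str] = None) -> List[str]:
    """Return human-readable problems found in the recipe's sink section.

    `recipe` is the ingestion recipe already parsed into a dict
    (hypothetical helper; how you load the YAML is up to you).
    """
    problems: List[str] = []
    sink = recipe.get("sink") or {}
    config = sink.get("config") or {}

    if sink.get("type") != "datahub-rest":
        problems.append("sink.type is not 'datahub-rest'")

    token = config.get("token")
    if not token:
        problems.append("sink.config.token is missing or empty")
    elif token.strip() == PLACEHOLDER:
        problems.append("sink.config.token still holds the placeholder value")

    # Optional: compare the configured server with the URL from the 401 message.
    server = config.get("server")
    if failing_url and server:
        if urlparse(server).netloc != urlparse(failing_url).netloc:
            problems.append("sink.config.server does not match the host in the error")

    return problems


recipe = {
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://datahub-gms:8080", "token": PLACEHOLDER},
    }
}
for problem in check_sink(recipe, failing_url="http://datahub-datahub-gms:8080"):
    print(problem)
```

An empty list means none of these particular problems were found; it does not prove the token itself is accepted by GMS.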

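Here is an example of how your application.yml might look. The nesting below is illustrative; step 1 shows a top-level authentication.enabled key, so check which layout your DataHub version expects: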

datahub:
  gms:
    authentication:
      enabled: true
  frontend:
    authentication:
      enabled: true

And your Kubernetes deployment might include environment variables like this:

env:
  - name: METADATA_SERVICE_AUTH_ENABLED
    value: "true"
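
Once these settings are in place, one way to confirm that the token is accepted is to send a single request with the Authorization header from step 3 and look at the status code. The following is a minimal sketch using only the Python standard library. The helper names (build_request, status_of, explain) are made up for this example, and the path argument is an assumption: you need to supply an endpoint on your GMS that actually requires authentication.

```python
import urllib.error
import urllib.request


def build_request(server: str, path: str, token: str) -> urllib.request.Request:
    """Build a GET request carrying the 'Authorization: Bearer <token>' header."""
    url = server.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def status_of(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code, including error codes."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is what we want to inspect.
        return err.code


def explain(status: int) -> str:
    """Translate a status code into a short hint for this troubleshooting flow."""
    if status == 401:
        return "rejected: check the token and the Authorization header"
    if status < 400:
        return "accepted"
    return f"status {status}: the failure is not a 401"


# Usage (not executed here because it needs a reachable GMS; the path is a
# hypothetical placeholder for an endpoint that requires authentication):
# req = build_request("http://datahub-datahub-gms:8080", "/some/authenticated/endpoint", "<token>")
# print(explain(status_of(req)))
```

If this still returns 401 with a freshly generated token, the problem is more likely on the server side (steps 1 and 4) than in the recipe.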

Would you like more detailed steps or specific configurations for any of these points?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?


I’m here to help with questions specifically about DataHub. Could you please provide more details about your issue? Specifically:

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using (e.g., 0.12.0)?
  3. What data source(s) are you integrating with DataHub (e.g., BigQuery)?

This information will help me assist you more effectively.