<@U06TM7M40RK>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/acryl/executor/dispatcher/default_dispatcher.py", line 34, in dispatch_async
    res = executor.execute(request)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/acryl/executor/execution/reporting_executor.py", line 89, in execute
    self._datahub_graph.emit_mcp(kickoff_mcp, async_flag = False)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/emitter/rest_emitter.py", line 266, in emit_mcp
    self._emit_generic(url, payload)
datahub.configuration.common.OperationalError: ('Unable to emit metadata to DataHub GMS', {'message': '401 Client Error: Unauthorized for url: http://datahub-datahub-gms:8080/aspects?action=ingestProposal'})
2024-11-14T11:45:45.621213009Z
The error message you are encountering, 401 Client Error: Unauthorized, indicates that your request to the DataHub GMS (Global Metadata Service) server is not authenticated. This typically happens when the required authentication token is missing or incorrect.
To resolve this issue, you need to ensure that you are using a valid Personal Access Token (PAT) in your request headers. Here are the steps to generate and use a PAT:
Generate a Personal Access Token:
Navigate to ‘Settings’ > ‘Access Tokens’ in the DataHub UI.
Click on ‘Generate new token’ and fill out the required information.
Save the generated token securely.
Use the Access Token in Your Requests:
Include the generated token in the Authorization header of your HTTP requests as follows:
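For example, a minimal curl call against your GMS (the endpoint path here is just a placeholder; the important part is the Authorization header, and the token value is the PAT you generated):
curl -H 'Authorization: Bearer <your-personal-access-token>' \
  'http://datahub-datahub-gms:8080/<endpoint>'
For UI- or CLI-driven ingestion, the same token typically goes into the token field of the recipe's datahub-rest sink, or into the DATAHUB_GMS_TOKEN environment variable for the CLI.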
Where your configuration file is saved depends on how you have set up your DataHub environment. Here are some common locations and methods to find it:
Environment Variables:
If you have set environment variables to specify a different configuration path, check those variables (see the example commands after this list). The relevant environment variables include DATAHUB_CONFIG_PATH and CONDENSED_DATAHUB_CONFIG_PATH.
Docker or Kubernetes Deployments:
If you are using Docker or Kubernetes, the configuration might be mounted as a volume or specified in a values.yaml file for Helm deployments. For Docker Compose, you might have a docker-compose.yml file that specifies the configuration file path.
Custom Paths:
If you have customized the location of your configuration file, you would need to refer to your deployment scripts or documentation to find the exact path.
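To quickly check whether any of these variables are set, here is a simple illustrative shell check (assuming the datahub CLI, whose default config file is usually ~/.datahubenv):
printenv | grep -i datahub
ls -l ~/.datahubenv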
If you need more detailed steps on how to locate or configure your setup, please let me know!
When using Helm to deploy DataHub on Kubernetes, you typically configure your settings in the values.yaml file. This file is used to customize the deployment of various components of DataHub. Here’s how you can configure it:
Locate the values.yaml File:
This file is part of the Helm chart you are using to deploy DataHub. It contains configuration settings for all components of the DataHub deployment.
Configure the DataHub GMS (Global Metadata Service):
If you need to configure authentication or other settings for the DataHub GMS, you would typically do this under the datahub section in the values.yaml file.
Example Configuration:
Here is an example of how you might configure the DataHub GMS with a personal access token for authentication:
datahub:
  gms:
    enabled: true
    image:
      repository: acryldata/datahub-gms
      tag: "v0.8.40"
    config:
      # Add your configuration here
      authentication:
        enabled: true
        token: "<your-personal-access-token>"
Apply the Configuration:
After updating the values.yaml file, you need to apply the changes by upgrading your Helm release:
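For example, assuming your release is named datahub and the chart comes from the official repo added as datahub (adjust the names and namespace to your installation):
helm upgrade --install datahub datahub/datahub --values values.yaml -n <your-namespace>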
Check the status of your Kubernetes pods to ensure that the changes have been applied successfully:
kubectl get pods
This setup ensures that your DataHub deployment is configured with the necessary settings for authentication and other components. If you need more detailed steps or have specific configurations in mind, feel free to ask!
If you want to configure DataHub to allow open access without requiring authentication tokens, you can disable authentication in your deployment. Here’s how you can achieve this when using Helm in a Kubernetes environment:
Disable Authentication in values.yaml:
You need to modify the values.yaml file to disable authentication for the DataHub GMS (Global Metadata Service). This can be done by setting the authentication.enabled flag to false.
Example Configuration:
Here is an example of how you might configure the values.yaml file to disable authentication:
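Following the same layout as the earlier example (this is a sketch; the exact keys can differ between chart versions, so double-check against the values.yaml shipped with your chart):
datahub:
  gms:
    enabled: true
    config:
      authentication:
        enabled: false
Apply the change by upgrading your Helm release as shown earlier.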
Check the status of your Kubernetes pods to ensure that the changes have been applied successfully:
kubectl get pods
By disabling authentication, you allow all users to access DataHub without needing to provide tokens. This setup is suitable for environments where security is not a concern, such as development or testing environments.
If you need further assistance or have specific configurations in mind, feel free to ask!
By default, authentication for the DataHub Metadata Service is disabled. This means that, out of the box, DataHub does not require authentication tokens for accessing its services; this default is intended to simplify initial setup and development.
According to the <Metadata Service Authentication | DataHub documentation>, authentication is disabled by default. It is only enabled if you explicitly set the METADATA_SERVICE_AUTH_ENABLED environment variable to true in the datahub-gms container or change the authentication.enabled setting to true in your DataHub Metadata Service configuration (application.yml).
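For a Docker Compose deployment, the equivalent would look roughly like this (illustrative excerpt; the service name may differ in your compose file):
services:
  datahub-gms:
    environment:
      - METADATA_SERVICE_AUTH_ENABLED=true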
Would you like more detailed information on how to configure or change these settings?
When a DataHub ingestion source remains in a “pending” state, it typically indicates that there might be an issue with the configuration or connectivity. Here are some steps you can take to troubleshoot and resolve this issue:
Check Logs:
Start by checking the logs of the DataHub ingestion pod to see if there are any error messages or warnings that can provide more context about the issue. You can do this by running:
kubectl logs <ingestion-pod-name>
Look for any errors related to network connectivity, authentication, or configuration.
Verify Looker Configuration:
Ensure that the Looker source configuration in your ingestion recipe is correct. This includes verifying the Looker base URL, client ID, and client secret (a sample recipe is shown after this list).
Ensure that your Kubernetes cluster can reach the Looker instance. This might involve checking network policies, firewall rules, or any proxies that might be in place.
Check Kubernetes Resources:
Verify that all necessary Kubernetes resources (pods, services, etc.) are running correctly. You can do this by running:
kubectl get pods
kubectl get services
Ensure that there are no pending or failed pods that might indicate resource constraints or misconfigurations.
Resource Allocation:
Check if the ingestion pod has sufficient resources (CPU, memory) allocated. Sometimes, insufficient resources can cause the pod to remain in a pending state.
Helm Chart Version:
Ensure that you are using a compatible version of the DataHub Helm chart with your Looker setup. Sometimes, upgrading to a newer version can resolve compatibility issues.
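For reference, a minimal Looker recipe looks roughly like the following (a sketch; the host and credential values are placeholders, and the sink URL assumes the GMS service name from your error message):
source:
  type: looker
  config:
    base_url: "https://<your-company>.looker.com"
    client_id: "<looker-client-id>"
    client_secret: "<looker-client-secret>"
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"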
If you need more detailed steps or have specific configurations in mind, feel free to ask!
If you are unable to see the specific pod related to your DataHub ingestion job, it might be due to several reasons such as namespace issues, pod naming conventions, or the pod not being created due to configuration errors. Here are some steps to help you identify the pod:
Check the Namespace:
Ensure that you are looking in the correct Kubernetes namespace. By default, the kubectl command operates in the default namespace unless specified otherwise. If your DataHub deployment is in a different namespace, you need to specify it using the -n flag.
List all namespaces to find the correct one:
kubectl get namespaces
Once you know the correct namespace, list the pods in that namespace:
kubectl get pods -n <your-namespace>
Filter Pods by Name:
If you know part of the pod name, you can filter the pods using grep:
kubectl get pods -n <your-namespace> | grep datahub
Check for Pending Pods:
Sometimes, pods might be in a pending state due to resource constraints or scheduling issues. You can list all pods and check their status:
kubectl get pods -n <your-namespace> -o wide
Describe the Deployment:
If you still cannot find the pod, you can describe the deployment to see if there are any issues preventing the pod from being created:
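For example (the deployment name below is a placeholder; list the deployments first to find the right one):
kubectl get deployments -n <your-namespace>
kubectl describe deployment <datahub-deployment-name> -n <your-namespace>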
<@U06TM7M40RK>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/acryl/executor/dispatcher/default_dispatcher.py", line 34, in dispatch_async
    res = executor.execute(request)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/acryl/executor/execution/reporting_executor.py", line 89, in execute
    self._datahub_graph.emit_mcp(kickoff_mcp, async_flag = False)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/emitter/rest_emitter.py", line 266, in emit_mcp
    self._emit_generic(url, payload)
datahub.configuration.common.OperationalError: ('Unable to emit metadata to DataHub GMS', {'message': '401 Client Error: Unauthorized for url: http://datahub-datahub-gms:8080/aspects?action=ingestProposal'})
2024-11-14T11:45:45.621213009Z
what is this error?
I checked and I don't have authentication enabled, so what else can it be?
If you are encountering a 401 Client Error: Unauthorized error even though you have confirmed that authentication is disabled, there might be other factors at play. Here are some potential causes and steps to troubleshoot the issue:
Configuration Mismatch:
Double-check your values.yaml or application.yml configuration files to ensure that the authentication.enabled setting is indeed set to false for both the DataHub GMS and any other relevant components.
Environment Variables:
Verify that there are no environment variables overriding your configuration settings. Environment variables can sometimes take precedence over configuration files.
Network Issues:
Ensure that there are no network issues preventing the ingestion service from reaching the DataHub GMS. This includes checking for any firewalls, proxies, or network policies that might be blocking access.
Service Availability:
Confirm that the DataHub GMS service is running and accessible (a direct connectivity check is shown after this list). You can start by checking that the GMS pod is in a Running state:
kubectl get pods -n <your-namespace>
Logs and Events:
Check the logs of the DataHub GMS pod for any error messages or warnings that might provide more context about the issue:
kubectl logs <gms-pod-name> -n <your-namespace>
Additionally, check Kubernetes events for any issues related to pod scheduling or resource allocation:
kubectl get events -n <your-namespace>
Ingress and Service Configuration:
Ensure that the ingress and service configurations are correctly set up to route traffic to the DataHub GMS. Misconfigurations here can lead to connectivity issues.
Helm Chart Version:
Make sure you are using a compatible version of the DataHub Helm chart. Sometimes, upgrading to a newer version can resolve underlying issues.
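As a direct connectivity check (mentioned above under Service Availability), you can port-forward the GMS service and hit its health endpoint; the service name below matches the URL in your error message, and the namespace is a placeholder:
kubectl port-forward svc/datahub-datahub-gms 8080:8080 -n <your-namespace>
curl -i http://localhost:8080/health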
If you need further assistance or have specific configurations in mind, feel free to ask!
Sources:
<Slack Discussion on 401 Unauthorized Error>
<@U06TM7M40RK>
here is a more detailed log:
2024/11/12 14:50:27 Ready: http://datahub-datahub-gms:8080/health.
No user action configurations found. Not starting user actions.
ls: cannot access '/etc/datahub/actions/conf/': No such file or directory
[2024-11-12 14:50:28,486] INFO {datahub_actions.cli.actions:80} - DataHub Actions version: unavailable (installed editable via git)
[2024-11-12 14:50:28,536] INFO {datahub_actions.plugin.action.propagation.docs.propagation_action:192} - Doc Propagation Config action configured with enabled=True columns_enabled=True datasets_enabled=False
[2024-11-12 14:50:28,541] WARNING {datahub_actions.plugin.action.propagation.docs.propagation_action:434} - Error fetching global settings for doc propagation. Will try again in 1 minute.
Traceback (most recent call last):
  File "/actions-src/src/datahub_actions/plugin/action/propagation/docs/propagation_action.py", line 401, in refresh_config
    entity_dict = self.ctx.graph.graph.get_entity_raw(
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/graph/client.py", line 481, in get_entity_raw
    response.raise_for_status()
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: http://datahub-datahub-gms:8080/entitiesV2/urn%3Ali%3AglobalSettings%3A0?aspects=List(globalSettingsInfo)
[2024-11-12 14:50:28,944] WARNING {datahub_actions.cli.actions:107} - Skipping pipeline datahub_slack_action as it is not enabled
[2024-11-12 14:50:28,947] WARNING {datahub_actions.cli.actions:107} - Skipping pipeline datahub_teams_action as it is not enabled
[2024-11-12 14:50:28,947] INFO {datahub_actions.cli.actions:123} - Action Pipeline with name 'datahub_doc_propagation_action' is now running.
[2024-11-12 14:50:28,947] INFO {datahub_actions.cli.actions:123} - Action Pipeline with name 'ingestion_executor' is now running.
[2024-11-12 14:56:20,589] WARNING {datahub_actions.plugin.action.propagation.docs.propagation_action:434} - Error fetching global settings for doc propagation. Will try again in 1 minute.
Traceback (most recent call last):
  File "/actions-src/src/datahub_actions/plugin/action/propagation/docs/propagation_action.py", line 401, in refresh_config
    entity_dict = self.ctx.graph.graph.get_entity_raw(
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/graph/client.py", line 481, in get_entity_raw
    response.raise_for_status()
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: http://datahub-datahub-gms:8080/entitiesV2/urn%3Ali%3AglobalSettings%3A0?aspects=List(globalSettingsInfo)
[2024-11-12 15:06:25,117] WARNING {datahub_actions.plugin.action.propagation.docs.propagation_action:434} - Error fetching global settings for doc propagation. Will try again in 1 minute.
Traceback (most recent call last):
  File "/actions-src/src/datahub_actions/plugin/action/propagation/docs/propagation_action.py", line 401, in refresh_config
    entity_dict = self.ctx.graph.graph.get_entity_raw(
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/graph/client.py", line 481, in get_entity_raw
    response.raise_for_status()
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: http://datahub-datahub-gms:8080/entitiesV2/urn%3Ali%3AglobalSettings%3A0?aspects=List(globalSettingsInfo)
[2024-11-12 15:06:27,778] DEBUG {acryl.executor.dispatcher.default_dispatcher:75} - Started thread <Thread(Thread-3 (dispatch_async), started 139690220959296)> for 188df420-78b6-48e8-824a-078c8ea6d1f9
[2024-11-12 15:06:27,781] ERROR {acryl.executor.dispatcher.default_dispatcher:37} - Failed dispatch for 188df420-78b6-48e8-824a-078c8ea6d1f9: Traceback (most recent call last):
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/emitter/rest_emitter.py", line 306, in _emit_generic
    response.raise_for_status()
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: http://datahub-datahub-gms:8080/aspects?action=ingestProposal
The detailed logs you provided indicate that the 401 Client Error: Unauthorized error is occurring when trying to access certain endpoints on the DataHub GMS (Global Metadata Service). Here are some steps to troubleshoot and resolve this issue:
Verify Authentication Settings:
Double-check that authentication is indeed disabled in your configuration. Ensure that the METADATA_SERVICE_AUTH_ENABLED environment variable is not set to true in your Helm chart or Kubernetes deployment configuration.
Check for Environment Variable Overrides:
Ensure that there are no environment variables set in your Kubernetes deployment that might override the configuration files. You can check this by inspecting the environment variables of the running pods (see the commands after this list).
Inspect Helm Values:
Review your values.yaml file to ensure that authentication is not enabled. Look for any settings related to authentication and ensure they are set to false.
Network and Connectivity:
Verify that there are no network issues preventing the ingestion service from reaching the DataHub GMS. This includes checking for any firewalls, proxies, or network policies that might be blocking access.
Service and Ingress Configuration:
Ensure that the service and ingress configurations are correctly set up to route traffic to the DataHub GMS. Misconfigurations here can lead to connectivity issues.
Check Logs for Additional Context:
Continue to monitor the logs for any additional error messages or warnings that might provide more context about the issue. This can help identify specific problems with the service startup or network configuration.
Restart Services:
Sometimes, simply restarting the services can resolve transient issues. You can restart the DataHub GMS and related services to see if the issue persists.
Review Documentation:
Refer to the <Metadata Service Authentication | DataHub documentation on authentication> to ensure that all necessary steps have been followed to disable authentication.
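To rule out the overrides mentioned above, you can dump the environment of the running GMS pod and the effective Helm values (the pod name, release name, and namespace below are placeholders for your setup):
kubectl exec -n <your-namespace> <gms-pod-name> -- printenv | grep -i auth
helm get values datahub -n <your-namespace>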
If you need further assistance or have specific configurations in mind, feel free to ask!
The METADATA_SERVICE_AUTH_ENABLED environment variable should be defined in the configuration for the DataHub GMS (Global Metadata Service) component. This setting controls whether authentication is enabled for the GMS, which is the backend service responsible for storing and serving metadata.
Here’s how you can configure it:
DataHub GMS Configuration:
Ensure that the METADATA_SERVICE_AUTH_ENABLED environment variable is set in the deployment configuration for the DataHub GMS. This can be done in the values.yaml file if you are using Helm for deployment.
Example Configuration in values.yaml:
You can specify this environment variable under the GMS configuration section in your values.yaml file:
datahub:
  gms:
    enabled: true
    image:
      repository: acryldata/datahub-gms
      tag: "v0.8.40"
    env:
      - name: METADATA_SERVICE_AUTH_ENABLED
        value: "true"  # Set to "false" if you want to disable authentication
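Note that in recent versions of the official DataHub Helm chart this toggle is usually exposed under the global section rather than as a raw environment variable; check the values.yaml of your chart version, where it typically looks like:
global:
  datahub:
    metadata_service_authentication:
      enabled: true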
Apply the Configuration:
After updating the values.yaml file, apply the changes by upgrading your Helm release:
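For example (using the same placeholder release, chart, and namespace as earlier):
helm upgrade datahub datahub/datahub --values values.yaml -n <your-namespace>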
Check the status of your Kubernetes pods to ensure that the changes have been applied successfully:
kubectl get pods
By setting the METADATA_SERVICE_AUTH_ENABLED environment variable in the GMS component, you control whether authentication is required for accessing the metadata service.
If you need further assistance or have specific configurations in mind, feel free to ask!