<@U06TM7M40RK> I am trying to run an ingestion from the UI, but I cannot see it run in the UI even though I get the message that the ingestion source was successfully created. Why is this happening?
It seems like your ingestion request is being submitted but not executed, which can be due to several reasons. Here are some steps to troubleshoot the issue:
- Check the datahub-actions Container: The datahub-actions container is responsible for executing ingestion requests. If this container is down or not functioning correctly, it could cause the ingestion request to fail. You can verify the health of the container using docker ps or by checking the logs of the datahub-actions container. According to the <https://github.com/datahub-project/datahub/blob/master/docs/ui-ingestion.md|DataHub UI Ingestion documentation>, if you see ‘N/A’ and the ingestion run state never changes to ‘Running’, this may mean that your executor (datahub-actions) container is down.
- Inspect Logs: Check the logs of the datahub-actions container for any errors or warnings. This can provide more context about what might be going wrong. You can do this by running: docker logs <container-id>. Replace <container-id> with the actual ID of the datahub-actions container.
- Verify Configuration: Ensure that your ingestion configuration is correct. Double-check fields like account_id, username, password, role, and warehouse if applicable.
- Network Connectivity: Ensure that your DataHub instance can reach the data source. Network issues can cause the connection to fail. Verify DNS resolution and network policies that might be blocking the connection.
- Resource Allocation: Ensure that your environment has sufficient resources allocated to the DataHub components, especially the datahub-actions pod.
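As a rough illustration of the first check, here is a minimal Python sketch that asks the local Docker CLI for running containers and looks for the executor. The helper names are hypothetical, and matching on the substring datahub-actions is an assumption, since the actual container name depends on how DataHub was deployed.

```python
import subprocess


def parse_container_names(docker_ps_output: str) -> list[str]:
    """Split the output of `docker ps --format '{{.Names}}'` into names."""
    return [line.strip() for line in docker_ps_output.splitlines() if line.strip()]


def actions_container_running(names: list[str], needle: str = "datahub-actions") -> bool:
    """Return True if any running container name contains `needle`.

    Substring matching is an assumption: Compose-based deployments usually
    add a project prefix and an index suffix to the service name.
    """
    return any(needle in name for name in names)


def running_container_names() -> list[str]:
    """Ask the local Docker CLI for the names of running containers."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,  # raise if the docker command itself fails
    )
    return parse_container_names(result.stdout)
```

Calling actions_container_running(running_container_names()) on the Docker host returns False when no matching container is up, which would be consistent with a run that stays at ‘N/A’ and never reaches ‘Running’.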
Would you like more detailed steps on any of these troubleshooting points or further assistance on a specific part of the process?
Sources:
- <https://github.com/datahub-project/datahub/blob/master/docs/ui-ingestion.md|DataHub UI Ingestion documentation>
<@U06TM7M40RK> This seems to be the error when I check the datahub-actions logs:
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='datahub-gms', port=8080): Max retries exceeded with url: /aspects?action=ingestProposal (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f94086385e0>: Failed to establish a new connection: [Errno -2] Name or service not known'))
The error message you’re encountering indicates that the datahub-actions container is unable to connect to the DataHub GMS (Generalized Metadata Service) at datahub-gms:8080. The "Name or service not known" part means the hostname could not be resolved, which is usually due to network connectivity issues or incorrect configuration. Here are some steps to resolve this issue:
- Check Hostname Resolution: The error suggests that the hostname datahub-gms cannot be resolved. Ensure that the hostname is correctly configured and that your network setup allows for proper DNS resolution. If you’re running DataHub in a Docker environment, make sure that the Docker network is set up correctly and that the datahub-gms service is accessible from the datahub-actions container.
- Verify Service Availability: Ensure that the DataHub GMS service is running and accessible. You can check this by trying to access the GMS service from within the datahub-actions container. Open a shell in the container with docker exec -it <datahub-actions-container-id> /bin/sh, then use a tool like curl or ping to test connectivity, for example: curl http://datahub-gms:8080/aspects
- Update Configuration: If you’re running DataHub outside of Docker or in a different network setup, you might need to update the configuration to use the correct hostname or IP address. For example, if you’re running the ingestion from your local machine, you might need to use localhost:8080 instead of datahub-gms:8080. According to the <Issues · datahub-project/datahub · GitHub GitHub issue>, the sink configuration should point to localhost:8080 if the ingestion is running outside of Docker.
- Check Network Policies: Ensure that there are no network policies or firewalls blocking the connection between the datahub-actions container and the GMS service.
- Restart Services: Sometimes, simply restarting the services can resolve transient network issues. Try restarting the DataHub services and see if the issue persists.
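To separate a DNS problem from a service that is down, the two checks above can be sketched in Python using only the standard library. This is a minimal sketch, not part of DataHub: the function name diagnose_gms is hypothetical, and it assumes you run it from inside the datahub-actions container (or wherever the failing request originates) and that Python is available there.

```python
import socket


def diagnose_gms(host: str = "datahub-gms", port: int = 8080, timeout: float = 3.0) -> str:
    """Classify reachability of host:port as 'dns_failure', 'connect_failure', or 'ok'."""
    try:
        # Step 1: name resolution. A failure here corresponds to the
        # "[Errno -2] Name or service not known" part of the traceback.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    try:
        # Step 2: TCP connection. The name resolved, so a failure here points
        # at the service being down or at a firewall or network policy.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError:
        return "connect_failure"
```

A result of dns_failure suggests the two containers do not share a network in which datahub-gms is a known name, while connect_failure suggests the name resolves but GMS is not accepting connections on that port.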
Would you like more detailed guidance on any of these steps or further assistance with your setup?
Sources:
- <Issues · datahub-project/datahub · GitHub GitHub issue>
<@U06TM7M40RK> While ingesting from the CLI, I had to specify a token in the sink config for the ingestion to work. But while ingesting from the UI, I followed the on-screen prompts and there was no prompt that asked for a token, so I did not provide one. Is that what’s causing the above error?
When ingesting data using the DataHub UI, the system is designed to handle authentication internally, and you typically do not need to manually provide a token. The UI should automatically manage the authentication process, including the use of tokens, as part of its internal configuration. This is different from CLI-based ingestion, where you need to explicitly specify a token in the sink configuration if metadata service authentication is enabled.
However, if you are encountering connectivity issues like the one you described earlier, it is unlikely that the absence of a token is the direct cause, since the UI should handle this aspect. A missing or invalid token would normally surface as an HTTP 401 response from GMS, whereas this error occurs before any connection is established. The issue is instead related to network connectivity or service configuration, as indicated by the MaxRetryError and the inability to resolve the datahub-gms hostname.
Here are some additional considerations:
- UI Configuration: Ensure that the DataHub UI is correctly configured to communicate with the GMS service. This includes verifying that the environment variables and network settings are correctly set up to allow the UI to access the GMS.
- Service Health: Double-check that all DataHub services, including the GMS and datahub-actions, are running and healthy. Any service downtime or misconfiguration could lead to connectivity issues.
- Logs and Debugging: Continue to monitor the logs for any additional error messages or warnings that might provide more context about the issue. This can help identify whether the problem is related to authentication, network, or service configuration.
- Token Management: If you suspect that authentication might still be an issue, you can manually add a token to the UI ingestion configuration as a workaround. This involves editing the YAML configuration in the UI to include a sink section with the token, similar to how you would configure it for CLI ingestion. However, this should not be necessary if the UI is functioning as expected.
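For the token workaround, the shape of the sink section can be sketched as a Python dict that mirrors the recipe's YAML structure. This is only an illustration under stated assumptions: the helper with_rest_sink is hypothetical, the source section is a placeholder, and the server URL and token must be replaced with your own values.

```python
import copy


def with_rest_sink(recipe: dict, server: str, token: str) -> dict:
    """Return a copy of `recipe` with a datahub-rest sink that carries a token."""
    updated = copy.deepcopy(recipe)  # leave the caller's recipe untouched
    updated["sink"] = {
        "type": "datahub-rest",
        "config": {"server": server, "token": token},
    }
    return updated


# Hypothetical recipe: the source type and its config are placeholders.
recipe = {"source": {"type": "snowflake", "config": {}}}
patched = with_rest_sink(recipe, "http://datahub-gms:8080", "<your-access-token>")
```

In the UI's YAML editor the same structure would appear as a top-level sink key with type and config entries beneath it. Note that a token does not help while datahub-gms cannot be resolved, because the request never reaches GMS.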
If you would like more detailed guidance on any of these points or further assistance with your setup, please let me know.
Sources:
- <Slack Slack Discussion>
- <https://github.com/datahub-project/datahub/blob/master/docs/authentication/introducing-metadata-service-authentication.md|DataHub Authentication Guide>
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><