Resolving Connection Timeout Issues in AWS Glue when Connecting to DataHub and Identifying Correct GMS URL

Original Slack Thread

<@U06TM7M40RK> I have DataHub deployed on AWS and can access its pods and Swagger UI. I’m trying to connect to it from AWS Glue using the Python SDK with a generated token, but I’m getting timeout errors when using datahub.emitter.rest_emitter.DataHubRestEmitter. I’m not sure about the correct GMS URL - could someone help me identify it and resolve these connection timeout issues?

To resolve the connection timeout issues when using DataHubRestEmitter from AWS Glue, you need to ensure that the GMS URL is correctly configured and accessible from your AWS Glue environment. Here are some steps and considerations:

  1. Identify the Correct GMS URL:

    • If your DataHub is deployed on AWS, the GMS URL should be accessible from the AWS Glue environment. Typically, the GMS URL would be something like http://<your-datahub-gms-host>:8080 or https://<your-datahub-gms-host>:8443 if SSL is enabled.
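    • If DataHub is deployed within a Kubernetes cluster, the in-cluster service name, such as http://datahub-datahub-gms.datahub.svc.cluster.local:8080 as mentioned in the DataHub documentation (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/sink_docs/datahub.md), only resolves from inside the cluster. AWS Glue jobs run outside the cluster, so from Glue you will most likely need the address of a load balancer or ingress that exposes the GMS service instead.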
  2. Network Configuration:

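    • Ensure that the AWS Glue job has network access to the DataHub GMS endpoint. If GMS is only reachable inside a VPC, the job typically needs a Glue connection attached that places it in a subnet with a route to GMS, and the security groups in front of GMS must allow inbound traffic from the job on the GMS port. A request that hangs until it times out, rather than being refused or rejected, usually points to this kind of network blocking.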
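  3. Timeout Settings:

    • If the endpoint is reachable but slow to respond, raise the emitter's timeouts. DataHubRestEmitter accepts connect_timeout_sec and read_timeout_sec arguments (check the names against your SDK version). Longer timeouts will not help if the endpoint is unreachable because of network configuration.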
  4. Verify Token and Headers:

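    • Ensure that the token is sent as a bearer token, that is, in an Authorization: Bearer <token> header. With the Python SDK, pass it as the token argument of DataHubRestEmitter. The token should be a valid access token generated from your DataHub instance. A bad token normally produces an authentication error rather than a timeout, so rule out the network first.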
  5. Test Connectivity:

    • Before running the ingestion job, test the connectivity from a similar environment (e.g., an EC2 instance in the same VPC) to ensure that the GMS URL is reachable.
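
The checks above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names (gms_host_port, can_reach, build_emitter) are hypothetical, the timeout values are arbitrary, and the connect_timeout_sec and read_timeout_sec arguments are assumed to be available in your installed SDK version. The TCP check uses only the standard library, so it can run in a Glue job or on an EC2 instance before the SDK is involved.

```python
import socket
from typing import Tuple
from urllib.parse import urlparse


def gms_host_port(gms_url: str) -> Tuple[str, int]:
    """Split a GMS URL into (host, port), defaulting the port from the scheme."""
    parsed = urlparse(gms_url)
    if not parsed.hostname:
        raise ValueError(f"No host found in GMS URL: {gms_url!r}")
    default_port = 443 if parsed.scheme == "https" else 80
    return parsed.hostname, parsed.port or default_port


def can_reach(gms_url: str, timeout_sec: float = 5.0) -> bool:
    """Return True if a plain TCP connection to the GMS host and port succeeds."""
    host, port = gms_host_port(gms_url)
    try:
        with socket.create_connection((host, port), timeout=timeout_sec):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def build_emitter(gms_url: str, token: str):
    """Create an emitter with explicit timeouts and verify it can talk to GMS."""
    # Imported lazily so the network check above works without the SDK installed.
    from datahub.emitter.rest_emitter import DataHubRestEmitter

    emitter = DataHubRestEmitter(
        gms_server=gms_url,
        token=token,  # sent to GMS as a bearer token
        connect_timeout_sec=10,  # assumed values; tune for your environment
        read_timeout_sec=60,
    )
    emitter.test_connection()  # raises if GMS does not respond as expected
    return emitter
```

If can_reach returns False from the Glue job but True from an EC2 instance in the same VPC, the problem is most likely the Glue job's network configuration rather than the URL or the token. If it returns True and build_emitter still fails, the error raised by test_connection should help separate a wrong URL path from an authentication problem.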

If you need more detailed steps or further assistance, please let me know!
