Troubleshooting AWS Glue Ingestion Connectivity Issue

Original Slack Thread

Hi,

When trying to ingest AWS Glue using the UI, I get the following error:

File "/tmp/datahub/ingest/venv-glue-03575587e416950c/lib/python3.10/site-packages/botocore/httpsession.py", line 499, in send
    raise ConnectTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://glue.eu-central-1.amazonaws.com/"

I am using DataHub CLI version: 0.13.2.4.

Any ideas why this is happening?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?


To address the ConnectTimeoutError when trying to ingest AWS Glue using the UI, follow these steps:

  1. Check AWS Region Configuration: Ensure that the AWS region specified in your configuration matches the region of your Glue service. For example, if your Glue service is in eu-central-1, your configuration should reflect that.

  2. Network Connectivity: Verify that your network allows outbound connections to the AWS Glue endpoint. Sometimes, firewall rules or network policies can block such connections.

  3. Increase Timeout Settings: You can increase the timeout settings to give more time for the connection to establish. This can be done by modifying the ingestion recipe to include a higher timeout value.

  4. AWS Credentials: Ensure that your AWS credentials (access key and secret key) are correctly configured and have the necessary permissions to access the Glue service.

Here is an example of how you can modify your ingestion recipe to include a higher timeout setting:

source:
  type: glue
  config:
    aws_region: "eu-central-1"
    aws_access_key_id: "<YOUR_ACCESS_KEY>"
    aws_secret_access_key: "<YOUR_SECRET_KEY>"
    options:
      connect_args:
        connect_timeout: 30  # Increase the timeout value as needed

If you are still facing issues, refer to the DataHub documentation for Glue configuration for more details.

Example Configuration

source:
  type: glue
  config:
    aws_region: "eu-central-1"
    aws_access_key_id: "<YOUR_ACCESS_KEY>"
    aws_secret_access_key: "<YOUR_SECRET_KEY>"
    options:
      connect_args:
        connect_timeout: 30  # Increase the timeout value as needed

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"

Additional Steps

  • Check AWS Glue Endpoint: Ensure that the endpoint URL is correct and accessible.
  • Use UI Ingestion: If possible, use the UI ingestion feature, which runs within the DataHub actions pod and may handle network configurations more gracefully.

For more detailed troubleshooting, you can refer to the Slack threads where similar issues have been discussed.

Sources:

  • Slack Thread 1
  • Slack Thread 2
  • Slack Thread 3
  • DataHub documentation for Glue configuration

<@U06TM7M40RK> This did not help me

I apologize for the inconvenience. Let’s try to narrow down the issue further. Here are some additional steps and checks you can perform:

1. Verify AWS Glue Endpoint and Region
Ensure that the AWS Glue endpoint and region are correctly specified in your configuration. The error message indicates a timeout on the endpoint URL https://glue.eu-central-1.amazonaws.com/. Double-check that this is the correct region for your Glue service.
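One way to do this check is to pull the region out of the endpoint URL in the error message and compare it with the aws_region value in the recipe. The following is a minimal sketch, assuming the standard glue.<region>.amazonaws.com hostname pattern; custom endpoints and other partitions are not handled, and the function names are illustrative rather than part of DataHub or botocore.

```python
from typing import Optional
from urllib.parse import urlparse


def region_from_glue_endpoint(url: str) -> Optional[str]:
    """Return the region embedded in a standard Glue endpoint URL, else None."""
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    # Expected shape: glue.<region>.amazonaws.com
    if len(parts) == 4 and parts[0] == "glue" and parts[2:] == ["amazonaws", "com"]:
        return parts[1]
    return None


def region_matches(endpoint_url: str, configured_region: str) -> bool:
    """True when the endpoint's region equals the region set in the recipe."""
    return region_from_glue_endpoint(endpoint_url) == configured_region
```

For example, region_matches("https://glue.eu-central-1.amazonaws.com/", "eu-central-1") compares the endpoint from the traceback with the region in the recipe. If the two agree, the region setting is unlikely to be the cause of the timeout.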

2. Network Connectivity
Ensure that your network allows outbound connections to the AWS Glue endpoint. Sometimes, firewall rules or network policies can block such connections. You can test connectivity to the Glue endpoint using a simple curl command from the machine where the ingestion is running:

curl https://glue.eu-central-1.amazonaws.com/
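If curl is not available where the ingestion runs, the same reachability test can be sketched with Python's standard library. This is a rough equivalent, not a DataHub utility: the check_tcp name and its return strings are made up for illustration, and it only tests whether a TCP connection to port 443 can be opened.

```python
import socket


def check_tcp(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Try to open a TCP connection and classify the outcome.

    Returns "ok", "timeout", "dns_error", or "error: <detail>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except (socket.timeout, TimeoutError):
        # No answer within the timeout: typically traffic is being dropped.
        return "timeout"
    except socket.gaierror:
        # The hostname could not be resolved.
        return "dns_error"
    except OSError as exc:
        # Anything else, e.g. connection refused or network unreachable.
        return f"error: {exc}"
```

Calling check_tcp("glue.eu-central-1.amazonaws.com") from the same machine or pod that runs the ingestion shows whether the endpoint is reachable at all. A "timeout" result is consistent with the ConnectTimeoutError above and points at the network path (firewall rules, network policies) rather than at credentials, since the connection fails before any request is sent.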

3. Increase Timeout Settings
You can increase the timeout settings to give more time for the connection to establish. This can be done by modifying the ingestion recipe to include a higher timeout value.
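For reference, at the botocore level the connection timeout is controlled through botocore.config.Config. The sketch below is independent of DataHub: it assumes boto3 is installed and that credentials are available in the environment, and the helper names are illustrative. Whether and how a recipe exposes these settings depends on the DataHub version, so check the Glue source documentation before relying on any particular recipe key.

```python
from typing import Any, Dict


def timeout_settings(connect_timeout: float = 60, read_timeout: float = 60,
                     max_attempts: int = 3) -> Dict[str, Any]:
    """Build keyword arguments for botocore.config.Config."""
    return {
        "connect_timeout": connect_timeout,
        "read_timeout": read_timeout,
        "retries": {"max_attempts": max_attempts, "mode": "standard"},
    }


def make_glue_client(region: str, **kwargs: Any):
    """Create a Glue client with explicit timeouts (requires boto3)."""
    # Imported lazily so timeout_settings() can be used without boto3.
    import boto3
    from botocore.config import Config

    return boto3.client("glue", region_name=region,
                        config=Config(**timeout_settings(**kwargs)))
```

A call such as make_glue_client("eu-central-1", connect_timeout=10).get_databases(MaxResults=1) makes a quick end-to-end test. Note that a larger timeout only helps when the endpoint is reachable but slow; if the connection is blocked, the call still fails, just later.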

4. AWS Credentials and Permissions
Ensure that your AWS credentials (access key and secret key) are correctly configured and have the necessary permissions to access the Glue service.

5. Check the UI Ingestion Environment
If you are running the ingestion from the UI, ensure that the DataHub actions pod has the necessary network access and permissions. You can also try running the ingestion from the CLI to see if the issue persists.

Example Configuration with Increased Timeout
Here is an example of how you can modify your ingestion recipe to include a higher timeout setting:

source:
  type: glue
  config:
    aws_region: "eu-central-1"
    aws_access_key_id: "<YOUR_ACCESS_KEY>"
    aws_secret_access_key: "<YOUR_SECRET_KEY>"
    options:
      connect_args:
        connect_timeout: 60  # Increase the timeout value as needed

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"

Additional Debugging Steps

  1. Check Logs: Review the logs for any additional error messages or clues. You can find detailed logs in the DataHub UI under the ingestion run details.
  2. Test with Different Region: If possible, try configuring the ingestion for a different AWS region to see if the issue is specific to the eu-central-1 region.
  3. AWS Support: If the issue persists, consider reaching out to AWS support to check if there are any known issues with the Glue service in your region.

It clearly seems like the DataHub actions pod cannot connect to the Glue endpoint.
Please check the network rules, security groups, etc., and make sure the pod can connect to this endpoint.