Resolving ReadTimeoutError on Long-running Ingestions After Upgrading to v1.0.0

Original Slack Thread

<@U06TM7M40RK> After upgrading to v1.0.0 we are seeing ReadTimeoutError on long-running ingestions.

The ReadTimeoutError you're seeing on long-running ingestions after upgrading to v1.0.0 could be related to the default timeout settings in your ingestion configuration. The error is raised when a request takes longer than the configured timeout, which can happen once an ingestion step runs past the default.

According to the <https://github.com/datahub-project/datahub/issues/8986|GitHub issue>, a similar problem was resolved by increasing the read timeout in the aws_common.py file. The solution involved modifying the configuration to include a read_timeout parameter with a higher value, such as 1800 seconds.
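As a rough sketch of what that could look like in a recipe rather than in the library file: the fragment below assumes your DataHub version exposes a read_timeout field on the AWS connection settings (the aws_config block) of the S3 source. That field name is an assumption taken from the fix described above, and the bucket, path, and region are placeholders, so check the config reference for your version before relying on it.

```yaml
source:
  type: s3
  config:
    path_specs:
      - include: "s3://my-bucket/data/*.parquet"  # hypothetical bucket and path
    aws_config:
      aws_region: us-east-1   # hypothetical region
      read_timeout: 1800      # seconds; assumes this field exists in your version
```

If the field is not recognized, the recipe should fail validation at startup, which is a quick way to tell whether your version supports it.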

Here’s a step-by-step approach to address this issue:

  1. Modify Timeout Settings: Increase the timeout settings in your ingestion configuration. If you are using AWS S3 as a source, the relevant setting is the read_timeout parameter handled in aws_common.py, as described in the GitHub issue above; raise it to a higher value such as 1800 seconds.

  2. Edit Ingestion Recipe: If you are using a recipe for ingestion, ensure that the timeout_sec parameter is set to a higher value in the sink configuration. This parameter specifies the per-HTTP request timeout in seconds.

    sink:
      type: datahub-rest
      config:
        server: '<URL_of_your_Datahub_GMS>'
        timeout_sec: 1800  # Set the timeout to 1800 seconds (30 minutes)
    
  3. Check for Global Timeout Settings: If raising timeout_sec has no effect, check whether a timeout on the ingestion job as a whole is in play, especially if you are running the ingestion in a container or on Kubernetes.

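  4. Kubernetes Job Timeout: If you are using Kubernetes, the activeDeadlineSeconds field in the Job or CronJob specification sets the maximum time the job may run. It caps total runtime rather than extending per-request timeouts, so make sure it is large enough for your longest ingestion.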
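
For the Kubernetes case, here is a minimal sketch of a Job with an explicit deadline. The Job name, ConfigMap name, recipe path, and image tag are all placeholders, and it assumes the recipe is mounted from a ConfigMap and that the image provides the datahub CLI.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: datahub-ingestion            # hypothetical name
spec:
  activeDeadlineSeconds: 7200        # Job is terminated if it runs longer than 2 hours
  backoffLimit: 0                    # do not retry a failed ingestion automatically
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: ingest
          image: acryldata/datahub-ingestion:<tag>   # assumed image; pin your own tag
          command: ["datahub", "ingest", "-c", "/recipes/recipe.yml"]  # assumed recipe path
          volumeMounts:
            - name: recipes
              mountPath: /recipes
      volumes:
        - name: recipes
          configMap:
            name: ingestion-recipes  # hypothetical ConfigMap holding the recipe
```

For a CronJob, the same field goes under spec.jobTemplate.spec. Pick a deadline comfortably above the longest ingestion you expect, since hitting it kills the run regardless of the HTTP timeouts set in the recipe.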

Would you like more detailed steps on any of these points or further assistance on a specific part of the process?
