<@U06TM7M40RK> After upgrading to v1.0.0 we are seeing ReadTimeoutError on long-running ingestions.
The issue you’re experiencing with ReadTimeoutError on long-running ingestions after upgrading to v1.0.0 could be related to the default timeout settings in your ingestion configuration. This is a common issue when the ingestion process takes longer than the default timeout period.
According to the <https://github.com/datahub-project/datahub/issues/8986|GitHub issue>, a similar problem was resolved by increasing the read timeout in the `aws_common.py` file. The solution involved modifying the configuration to include a `read_timeout` parameter with a higher value, such as 1800 seconds.
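If your DataHub version already exposes this setting in the recipe, editing library code may not be necessary. The fragment below is only a sketch: it assumes the S3 source accepts a `read_timeout` field under `aws_config`, which you should confirm against the source documentation for your release. The region is a placeholder, and other required source fields are omitted.

```yaml
source:
  type: s3
  config:
    aws_config:
      aws_region: us-east-1   # placeholder region
      # Assumed field name, taken from the fix described in the linked issue;
      # verify that your DataHub version supports it before relying on it.
      read_timeout: 1800      # seconds
    # path_specs and any other required source settings are omitted here
```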
Here’s a step-by-step approach to address this issue:
- Modify Timeout Settings: You can increase the timeout settings in your ingestion configuration. If you are using AWS S3 as a source, you can modify the `aws_common.py` file to include a `read_timeout` parameter.
- Edit Ingestion Recipe: If you are using a recipe for ingestion, ensure that the `timeout_sec` parameter is set to a higher value in the sink configuration. This parameter specifies the per-HTTP request timeout in seconds.
    sink:
      type: datahub-rest
      config:
        server: '<URL_of_your_Datahub_GMS>'
        timeout_sec: 1800 # Set the timeout to 1800 seconds (30 minutes)
- Check for Global Timeout Settings: If the `timeout_sec` parameter is not effective, consider setting a global timeout for the entire ingestion job, especially if you are running the ingestion in a container or Kubernetes.
- Kubernetes Job Timeout: If you are using Kubernetes, you can set a timeout for the job by specifying the `activeDeadlineSeconds` field in the `Job` or `CronJob` specification.
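For the Kubernetes case, a minimal `Job` sketch might look like the following. The job name, image tag, recipe path, and the 7200-second deadline are illustrative assumptions, not values from your setup, and the volume that would mount the recipe file is left out for brevity.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: datahub-ingestion            # hypothetical name
spec:
  # Upper bound on the Job's total runtime, in seconds. Kubernetes terminates
  # the Job's pods and marks the Job failed once this deadline passes.
  activeDeadlineSeconds: 7200
  backoffLimit: 0                    # do not retry a failed ingestion run
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: ingestion
          image: acryldata/datahub-ingestion:v1.0.0   # assumed image and tag
          command: ["datahub", "ingest", "-c", "/recipes/recipe.yaml"]  # assumed recipe path
```

Note that `activeDeadlineSeconds` only caps how long the job may run; it does not lengthen any HTTP read timeout, so it complements `timeout_sec` rather than replacing it. For a `CronJob`, the same field goes under `spec.jobTemplate.spec`.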
Would you like more detailed steps on any of these points or further assistance on a specific part of the process?
Sources:
- <https://github.com/datahub-project/datahub/issues/8986|GitHub issue on ReadTimeoutError>
Hint: Mention @runllm in the thread for followups.
Hi <@U06PJCYPCJX>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><