Configuring Retry Mechanism for DataHub Plugin in Airflow

Original Slack Thread

Hi Team, I encountered an issue where my Airflow-to-DataHub plugin retried for over 40 minutes before failing because the DataHub instance was down. Is there a way to prevent this long-running retry for the DataHub plugin? The retry delay prevents downstream tasks in Airflow from starting. I am using DataHub 0.13.0 and Airflow 2.9.1.

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?



To address the issue of long-running retries for the DataHub plugin in Airflow, you can configure the retry mechanism to limit the number of retries and the interval between retries. This can help prevent the delay in downstream tasks when the DataHub instance is down.

Based on the provided data sources, you can adjust the following configurations in your Airflow setup:

  1. Max Retries: Set the maximum number of retries for failed requests.
  2. Retry Interval: Set the interval between retries.

According to the <https://github.com/datahub-project/datahub/blob/master/metadata-integration/java/spark-lineage-beta/README.md|DataHub Spark Lineage README>, the Spark integration exposes these settings through the following properties:

  • spark.datahub.rest.rest.max_retries: Number of times a request is retried if it fails.
  • spark.datahub.rest.rest.retry_interval: Number of seconds to wait between retries.

Here is an example configuration:

spark.conf.set("spark.datahub.rest.rest.max_retries", 3)
spark.conf.set("spark.datahub.rest.rest.retry_interval", 10)

This configuration will limit the retries to 3 attempts with a 10-second interval between each retry.
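
Since you are using the Airflow plugin rather than the Spark integration, the properties above won't apply directly to your setup. As a rough illustration of the same two knobs on the Python side, here is a minimal sketch using DataHub's Python REST emitter. It assumes the metadata ultimately goes out over DataHub's REST API and that your installed acryl-datahub version exposes the retry and timeout parameters shown (check datahub.emitter.rest_emitter in your environment); the server URL is a placeholder.

from datahub.emitter.rest_emitter import DatahubRestEmitter

# Sketch only: parameter availability depends on your acryl-datahub version.
emitter = DatahubRestEmitter(
    gms_server="http://localhost:8080",  # placeholder; use your DataHub GMS URL
    connect_timeout_sec=5,    # fail fast if the instance is unreachable
    read_timeout_sec=15,      # cap how long a single request may hang
    retry_max_times=3,        # analogous to max_retries above
)

# Surfaces connectivity problems immediately instead of retrying for a long time.
emitter.test_connection()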

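Separately, if the DataHub emission runs as its own Airflow task (for example, a scheduled metadata ingestion task) rather than only through the lineage listener, you can also bound it with standard Airflow task arguments so a DataHub outage cannot hold up downstream tasks for 40 minutes. A minimal sketch; the callable name and DAG id are hypothetical:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def emit_to_datahub():
    ...  # hypothetical placeholder for the code that talks to DataHub


with DAG(
    dag_id="datahub_emission_guarded",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    PythonOperator(
        task_id="emit_to_datahub",
        python_callable=emit_to_datahub,
        retries=1,                               # one retry instead of many
        retry_delay=timedelta(seconds=30),       # short pause between attempts
        execution_timeout=timedelta(minutes=5),  # hard cap on task runtime
    )
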
Would you like more detailed instructions on how to apply these settings in your Airflow setup, or do you have any other specific questions?

Sources: