Original Slack Thread
<@U06TM7M40RK> The LDAP source recipe creates a lot of logs; the log file is more than 8 GB.
The recipe is called from an Airflow DAG.
Airflow is unable to render logs of this size.
What can be done?
Here’s how the recipe is called:
pipeline = Pipeline.create(pipeline_config)
pipeline.run()
pipeline.pretty_print_summary()
pipeline.raise_from_status()
The large log size you’re seeing from the LDAP source recipe in your Airflow DAG is likely caused by verbose logging. Lowering the logging level or adjusting the logging configuration should reduce the amount of log data generated.
Based on the <Slack discussion>, one approach is to modify the logback configuration file used by DataHub. You can replace the logback configuration file in the Docker image or supply a new one at runtime by setting the JAVA_OPTS environment variable to include -Dlogback.configurationFile=/mycustom/logback.xml. Note that logback governs DataHub’s Java services; a recipe run through Pipeline.create(...) inside an Airflow task is Python code, so its output is governed by Python’s logging configuration rather than logback.
Additionally, you can consider the following steps:
- Reduce Log Level: Change the log level to a less verbose level such as WARN or ERROR to reduce the amount of log data generated. This can be done by modifying the logging configuration in your custom logback file.
- Log Rotation: Implement log rotation to manage log file sizes. This can be configured in your logging setup to archive old logs and keep the log file size manageable.
- Filter Logs: If possible, filter out unnecessary log messages that are not critical for your monitoring or debugging purposes.
- Use Airflow’s Log Management: Ensure that Airflow’s log management settings are optimized for handling large logs. This might include configuring Airflow to store logs in a more scalable storage solution if necessary.
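For a recipe that runs as Python inside the Airflow task, the first three steps above can be sketched with the standard logging module. This is a minimal sketch, not DataHub's own mechanism: it assumes the ingestion code logs under a "datahub" logger namespace and that the volume comes from log records rather than plain prints. The helper name quiet_logging, its parameters, and the size limits are hypothetical and chosen only for illustration.

```python
import logging
import logging.handlers


class DropMessagesContaining(logging.Filter):
    """Reject any record whose rendered message contains one of the substrings."""

    def __init__(self, substrings):
        super().__init__()
        self.substrings = tuple(substrings)

    def filter(self, record):
        message = record.getMessage()
        return not any(s in message for s in self.substrings)


def quiet_logging(
    logger_name="datahub",        # assumption: namespace used by the ingestion code
    level=logging.WARNING,
    log_path=None,
    max_bytes=50 * 1024 * 1024,   # hypothetical limit: rotate after roughly 50 MB
    backup_count=3,               # hypothetical: keep three rotated files
    drop_substrings=(),
    keep_in_task_log=True,
):
    """Raise the log threshold and optionally divert output to a rotating file."""
    logger = logging.getLogger(logger_name)
    # Child loggers without an explicit level inherit this threshold.
    logger.setLevel(level)

    handler = None
    if log_path is not None:
        handler = logging.handlers.RotatingFileHandler(
            log_path, maxBytes=max_bytes, backupCount=backup_count
        )
        handler.setLevel(level)
        # A filter on the handler also sees records from child loggers.
        handler.addFilter(DropMessagesContaining(drop_substrings))
        logger.addHandler(handler)

    # With propagate off, records stop here and never reach the handlers
    # attached higher up (such as the ones that write the Airflow task log).
    logger.propagate = keep_in_task_log
    return handler
```

Calling quiet_logging() in the task callable before Pipeline.create(pipeline_config) would apply the higher threshold for that run. Passing log_path together with keep_in_task_log=False sends the remaining records to the rotating file only, which keeps the task log small enough to render.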
Would you like more detailed guidance on any of these steps or further assistance with configuring the logging setup?
Sources:
- <Slack discussion>
<@U06TM7M40RK> Recipe is called using
pipeline = Pipeline.create(pipeline_config)
pipeline.run()
pipeline.pretty_print_summary()
pipeline.raise_from_status()