<@U06TM7M40RK> when we ingest from RDS, we see a huge spike in memory usage on our database. When we check the DataHub logs, they show a lot of profiling on the tables. We don’t have table profiling enabled. Is there a way to reduce the load on our database when ingesting from RDS?
It seems likely that the memory spike you’re experiencing during ingestion from RDS is related to the profiling process, even if you believe table profiling is not enabled. Profiling can be resource-intensive, and there are a few steps you can take to mitigate this issue:
- Verify Profiling Configuration: Double-check your ingestion recipe to ensure that profiling is indeed disabled. According to the <https://datahubproject.io/docs/0.14.0/metadata-ingestion/docs/dev_guides/sql_profiles|DataHub documentation>, profiling can be disabled by setting the `profiling.enabled` parameter to `false`.
- Limit Profiling Scope: If profiling is necessary for some tables, consider limiting the scope by using the `profile_pattern` configuration to specify only the tables that require profiling. This can help reduce the load on your database.
- Adjust Batch Size and Delay: You might also consider adjusting the `batchSize` and `batchDelayMs` parameters to reduce the load on the system. This can help manage the memory usage more effectively.
- Increase Memory Allocation: If the above steps do not resolve the issue, you may need to increase the memory allocation for the `datahub-actions` container. According to <Slack Slack discussion>, increasing the memory allocation to 12-16GB can help resolve memory-related issues.
- Memory Profiling: To better understand the memory usage, you can perform memory profiling of the ingestion process. The <https://datahubproject.io/docs/0.14.0/metadata-ingestion/docs/dev_guides/profiling_ingestions|DataHub profiling guide> provides instructions on how to generate a memory dump of your ingestion process and analyze it using tools like `memray`.
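As a rough illustration of the first two points, here is a minimal Python sketch of a recipe expressed as a dict, with `profiling.enabled` set to `false` and an optional `profile_pattern` allow-list. The source type (`postgres`), host, database name, and table pattern are assumptions made up for the example, since the thread does not say which RDS engine is in use. The helper functions are hypothetical and are not part of DataHub.

```python
import copy

# Hypothetical recipe for a Postgres-backed RDS instance. The host, database,
# and source type are placeholders, not values taken from this thread.
recipe = {
    "source": {
        "type": "postgres",
        "config": {
            "host_port": "my-rds-host:5432",
            "database": "mydb",
            # Explicitly turn profiling off rather than relying on the default.
            "profiling": {"enabled": False},
        },
    },
}


def profiling_enabled(recipe: dict) -> bool:
    """Return True only if the recipe sets profiling.enabled to a truthy value."""
    config = recipe.get("source", {}).get("config", {})
    return bool(config.get("profiling", {}).get("enabled", False))


def with_limited_profiling(recipe: dict, allow_patterns: list) -> dict:
    """Return a copy of the recipe that profiles only tables matching allow_patterns."""
    limited = copy.deepcopy(recipe)
    config = limited["source"]["config"]
    config["profiling"] = {"enabled": True}
    # profile_pattern takes regex allow/deny lists; the pattern passed in here
    # is an illustrative table name, not a real one.
    config["profile_pattern"] = {"allow": list(allow_patterns)}
    return limited


limited_recipe = with_limited_profiling(recipe, [r"mydb\.public\.orders"])
print(profiling_enabled(recipe), profiling_enabled(limited_recipe))
```

The same keys (`profiling.enabled` and `profile_pattern.allow`) would sit under `source.config` in a YAML recipe. Checking the recipe that is actually deployed for these keys is a quick way to confirm whether profiling is really off.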
Would you like more detailed steps on any of these points or further assistance on a specific part of the process?
Sources:
- <https://datahubproject.io/docs/0.14.0/metadata-ingestion/docs/dev_guides/sql_profiles|DataHub documentation>
- <https://datahubproject.io/docs/0.14.0/metadata-ingestion/docs/dev_guides/profiling_ingestions|DataHub profiling guide>
Hi <@U08B11U7F6V>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.