Addressing Elasticsearch Circuit Breaking Errors in DataHub Production Cluster

Original Slack Thread

Good morning, I have a question regarding the Elasticsearch cluster, specifically ES circuit-breaking errors. Our production DataHub is currently serving constant 500 errors when attempting to load the UI because of this. Other threads on this Slack seem to indicate that either increasing the JVM heap or the container memory will fix the issue, so while we’re trying to reconfigure that, I had some other questions in thread. We are using DH v0.11.0 with the default helm chart deployment.

  1. Is the amount of data in the ES query directly related to the number of entities in our catalog? The error seems to be thrown when the UI attempts to load either the front page or the sidebar, suggesting that those queries are attempting to cache some or all of the catalog for easy searching.
  2. If yes to the above, how does that scale with the number of entities? As our catalog grows, will we need to monitor and scale out our ES container in addition to our MySQL container? We currently have approximately 2 million entities in our prod catalog. Does the DH implementation of ES allow us to just add more ES nodes rather than vertically scaling our containers?
    The specific error we’re getting is:

    {
      "error": {
        "root_cause": [
          {
            "type": "circuit_breaking_exception",
            "reason": "[parent] Data too large, data for [indices:data/read/search[phase/query]] would be [387076296/369.1mb], which is larger than the limit of [382520524/364.7mb], real usage: [387075816/369.1mb], new bytes reserved: [480/480b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=480/480b, model_inference=0/0b, eql_sequence=0/0b, accounting=6653368/6.3mb]",
            "bytes_wanted": 387076296,
            "bytes_limit": 382520524,
            "durability": "PERMANENT"
          }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
          {
            "shard": 0,
            "index": "containerindex_v2_1701706359277",
            "node": "HFmWtb91QgG2Na3m19ffEg",
            "reason": {
              "type": "circuit_breaking_exception",
              "reason": "[parent] Data too large, data for [indices:data/read/search[phase/query]] would be [387076296/369.1mb], which is larger than the limit of [382520524/364.7mb], real usage: [387075816/369.1mb], new bytes reserved: [480/480b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=480/480b, model_inference=0/0b, eql_sequence=0/0b, accounting=6653368/6.3mb]",
              "bytes_wanted": 387076296,
              "bytes_limit": 382520524,
              "durability": "PERMANENT"
            }
          }
        ]
      },
      "status": 429
    }

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Which DataHub version are you using? (e.g. 0.12.0)
  2. Please post any relevant error logs on the thread!

<@UV5UEC3LN> might be able to help you here!
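
To close the loop on the reconfiguration mentioned at the top of the thread: with the default helm deployment, raising the heap and container memory, and adding nodes (question 2), would look roughly like the sketch below. It assumes the datahub-prerequisites chart wraps the official Elasticsearch chart, so esJavaOpts, resources, and replicas are the relevant values; the 1 GB heap and 2 Gi memory figures are illustrative starting points, not tuned recommendations.

    # Sketch only: raise the ES JVM heap and container memory, and scale out
    # to three nodes, via the DataHub prerequisites chart. Release and chart
    # names follow the standard DataHub helm setup; adjust to match yours.
    helm upgrade prerequisites datahub/datahub-prerequisites \
      --set elasticsearch.esJavaOpts="-Xms1g -Xmx1g" \
      --set elasticsearch.resources.requests.memory=2Gi \
      --set elasticsearch.resources.limits.memory=2Gi \
      --set elasticsearch.replicas=3

As a rule of thumb, keep the JVM heap at no more than half the container memory so Elasticsearch retains room for off-heap usage and the filesystem cache.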