<@U06TM7M40RK> My UI freezes when I run multiple searches at once. I have about 90,000 data items in DataHub in total. How can I improve search and UI performance?
To improve search and UI performance in DataHub, especially when dealing with a large number of data items (90,000 in your case), you can take several steps. Here are some recommendations based on the provided data sources:
- Scale Elasticsearch/OpenSearch:
  - Vertical Scaling: Increase the instance size of your OpenSearch nodes to handle more queries and data.
  - Horizontal Scaling: Add more data nodes to distribute the load. Ensure you have enough replicas to handle read requests efficiently.
  - Shard Configuration: Adjust the shard size and number of shards to optimize search performance. According to <Slack thread>, increasing the shard size can help with performance.
- Enable Search Cache:
  - Enabling the search cache can significantly improve search performance. You can enable it by setting the `SEARCH_SERVICE_ENABLE_CACHE=true` environment variable on the GMS Docker container. More details can be found <https://datahubspace.slack.com/archives/C029A3M079U/p1674721050.312649|here>.
- Scale GMS (Generalized Metadata Service):
  - Increase the number of replicas for the GMS service to handle more concurrent requests. Ensure that each replica has sufficient CPU and memory resources. As per <Slack thread>, scaling GMS pods can help with search slowness issues.
- Optimize Elasticsearch Queries:
  - Use the `scrollAcrossEntities` GraphQL API for queries that return more than 10k entities. This API paginates through large result sets efficiently. More details can be found <https://datahubproject.io/docs/0.14.0/how/search#searching-at-scale|here>.
- Review and Optimize Ingestion Processes:
  - Review your ingestion recipes and configurations to ensure they are optimized for performance. Adjust the number of threads for ingestion if necessary.
- Monitor and Adjust Resource Allocation:
  - Ensure that your Elasticsearch and GMS services have adequate CPU and memory resources. Monitor their usage and adjust as needed. For example, you can increase the memory limit for Elasticsearch as mentioned <https://datahubspace.slack.com/archives/C029A3M079U/p1674721050.312649|here>.
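For the monitoring recommendation, here is a minimal sketch of a heap check against the Elasticsearch/OpenSearch `_nodes/stats/jvm` endpoint. The cluster URL, the 75% threshold, and the helper names are assumptions for illustration, not values from the linked threads. Adjust them to your deployment and add authentication if your cluster requires it.

```python
import json
import urllib.request


def fetch_jvm_stats(es_url: str) -> dict:
    """GET <es_url>/_nodes/stats/jvm and return the parsed JSON body."""
    with urllib.request.urlopen(f"{es_url}/_nodes/stats/jvm", timeout=10) as resp:
        return json.load(resp)


def nodes_over_heap_threshold(stats: dict, threshold_pct: int = 75) -> list:
    """Return (node_name, heap_used_percent) pairs at or above the threshold,
    highest heap usage first. The default threshold is an assumed value."""
    hot = []
    for node in stats.get("nodes", {}).values():
        pct = node["jvm"]["mem"]["heap_used_percent"]
        if pct >= threshold_pct:
            hot.append((node.get("name", "unknown"), pct))
    return sorted(hot, key=lambda pair: pair[1], reverse=True)


def main() -> None:
    # Hypothetical local cluster address; replace with your own endpoint.
    es_url = "http://localhost:9200"
    for name, pct in nodes_over_heap_threshold(fetch_jvm_stats(es_url)):
        print(f"{name}: heap at {pct}%")
```

Calling `main()` while you run several searches at once shows whether the search nodes are under memory pressure when the UI freezes. If any nodes are listed, the memory and scaling items above are the first things to revisit.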
Would you like more detailed steps on any of these approaches?
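As a starting point for the `scrollAcrossEntities` recommendation, here is a minimal sketch of paging through a large result set using only the Python standard library. The GraphQL endpoint URL, the access token, the page size, and the helper names `make_post` and `scroll_urns` are assumptions for illustration. The query shape follows the linked search documentation, so check it against the GraphQL schema of your DataHub version.

```python
import json
import urllib.request
from typing import Callable, Iterator, List, Optional

# Query shape based on the DataHub "searching at scale" docs; verify the
# field names against the GraphQL schema of your deployed version.
SCROLL_QUERY = """
query scroll($input: ScrollAcrossEntitiesInput!) {
  scrollAcrossEntities(input: $input) {
    nextScrollId
    count
    searchResults { entity { urn type } }
  }
}
"""


def make_post(graphql_url: str, token: str) -> Callable[[dict], dict]:
    """Build a function that POSTs a GraphQL payload and returns parsed JSON."""
    def post(payload: dict) -> dict:
        req = urllib.request.Request(
            graphql_url,
            data=json.dumps(payload).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {token}",
            },
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)
    return post


def scroll_urns(
    post: Callable[[dict], dict],
    types: List[str],
    query: str = "*",
    page_size: int = 500,  # assumed page size; tune for your cluster
) -> Iterator[str]:
    """Yield entity URNs page by page until no nextScrollId is returned."""
    scroll_id: Optional[str] = None
    while True:
        scroll_input = {"types": types, "query": query, "count": page_size}
        if scroll_id is not None:
            scroll_input["scrollId"] = scroll_id
        body = post({"query": SCROLL_QUERY, "variables": {"input": scroll_input}})
        if body.get("errors"):
            raise RuntimeError(f"GraphQL errors: {body['errors']}")
        page = body["data"]["scrollAcrossEntities"]
        for result in page["searchResults"]:
            yield result["entity"]["urn"]
        scroll_id = page.get("nextScrollId")
        if not scroll_id:
            break
```

A call might look like `scroll_urns(make_post("http://localhost:8080/api/graphql", "<token>"), ["DATASET"])`, where both the URL and the token are placeholders. Because the function is a generator, the caller can process each page as it arrives instead of holding all results in memory.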
Sources:
- <Slack Thread 1>
- <Slack Thread 2>
- <Slack Thread 3>
- <Search | DataHub Search Documentation>
Hint: Mention @runllm in the thread for followups.