Strategies to Improve Search and UI Performance with 90,000 Data Items

user-2 · October 7, 2024, 12:02am

<@U06TM7M40RK> My UI freezes when I run multiple searches at once. I have a total of 90,000 data items in DataHub. How can I improve search and UI performance?

user-1 · October 7, 2024, 12:02am

To improve search and UI performance in DataHub, especially when dealing with a large number of data items (90,000 in your case), you can take several steps. Here are some recommendations based on the provided data sources:

Scale Elasticsearch/OpenSearch:
- Vertical Scaling: Increase the instance size of your OpenSearch nodes to handle more queries and data.
- Horizontal Scaling: Add more data nodes to distribute the load. Ensure you have enough replicas to handle read requests efficiently.
- Shard Configuration: Adjust the shard size and number of shards to optimize search performance. According to <Slack thread>, increasing the shard size can help with performance.
Enable Search Cache:
- Enabling the search cache can significantly improve search performance. You can enable it by setting the SEARCH_SERVICE_ENABLE_CACHE=true environment variable on the GMS Docker container. More details can be found here: https://datahubspace.slack.com/archives/C029A3M079U/p1674721050.312649
Scale GMS (Generalized Metadata Service):
- Increase the number of replicas for the GMS service to handle more concurrent requests. Ensure that each replica has sufficient CPU and memory resources. As per <Slack thread>, scaling GMS pods can help with search slowness issues.
Optimize Elasticsearch Queries:
- Use the scrollAcrossEntities GraphQL API for queries that return more than 10k entities. This API helps in paginating through large result sets efficiently. More details can be found here: https://datahubproject.io/docs/0.14.0/how/search#searching-at-scale
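To make the scrollAcrossEntities suggestion concrete, here is a minimal Python sketch of scroll-based pagination. It is not an official client: the helper names (`make_http_executor`, `scroll_urns`), the page size, and the endpoint URL in the usage note are assumptions for illustration, and the input type name `ScrollAcrossEntitiesInput` should be checked against the GraphQL schema of your DataHub version.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator, Optional, Sequence

# GraphQL document; the input type name is assumed from the DataHub schema.
SCROLL_QUERY = """
query scroll($input: ScrollAcrossEntitiesInput!) {
  scrollAcrossEntities(input: $input) {
    nextScrollId
    searchResults { entity { urn type } }
  }
}
"""

# An executor takes (query, variables) and returns the GraphQL "data" object.
Executor = Callable[[str, Dict], Dict]


def make_http_executor(graphql_url: str, token: str) -> Executor:
    """Build an executor that POSTs to a GraphQL endpoint (hypothetical helper)."""

    def execute(query: str, variables: Dict) -> Dict:
        body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
        request = urllib.request.Request(
            graphql_url,
            data=body,
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {token}",
            },
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=60) as response:
            payload = json.load(response)
        if payload.get("errors"):
            raise RuntimeError(f"GraphQL errors: {payload['errors']}")
        return payload["data"]

    return execute


def scroll_urns(
    execute: Executor,
    query: str = "*",
    types: Sequence[str] = ("DATASET",),
    page_size: int = 500,
) -> Iterator[str]:
    """Yield entity URNs page by page, following nextScrollId until it is empty."""
    scroll_id: Optional[str] = None
    while True:
        search_input = {"query": query, "types": list(types), "count": page_size}
        if scroll_id:
            search_input["scrollId"] = scroll_id
        page = execute(SCROLL_QUERY, {"input": search_input})["scrollAcrossEntities"]
        for result in page["searchResults"]:
            yield result["entity"]["urn"]
        scroll_id = page.get("nextScrollId")
        if not scroll_id:
            break
```

Usage would look like `for urn in scroll_urns(make_http_executor("http://localhost:8080/api/graphql", token)): ...`, where the URL and token are placeholders for your own deployment. Because the function is a generator, results are consumed one page at a time instead of being loaded in a single large request.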
Review and Optimize Ingestion Processes:
- Review your ingestion recipes and configurations to ensure they are optimized for performance. Adjust the number of threads for ingestion if necessary.
Monitor and Adjust Resource Allocation:
- Ensure that your Elasticsearch and GMS services have adequate CPU and memory resources. Monitor their usage and adjust as needed. For example, you can increase the memory limit for Elasticsearch as mentioned here: https://datahubspace.slack.com/archives/C029A3M079U/p1674721050.312649
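As a starting point for the monitoring step, the following Python sketch reads the Elasticsearch/OpenSearch `_cluster/health` endpoint and flags a few conditions that often go along with slow search. The function names and the specific checks are assumptions chosen for illustration, not DataHub tooling, and the sketch assumes the cluster is reachable without authentication.

```python
import json
import urllib.request
from typing import Dict, List


def fetch_cluster_health(base_url: str) -> Dict:
    """GET <base_url>/_cluster/health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=10) as response:
        return json.load(response)


def health_warnings(health: Dict) -> List[str]:
    """Turn a _cluster/health response into a list of human-readable warnings."""
    warnings: List[str] = []
    status = health.get("status")
    if status != "green":
        warnings.append(f"cluster status is {status}")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        warnings.append(f"{unassigned} unassigned shard(s)")
    if health.get("number_of_data_nodes", 0) < 2:
        # A single data node cannot host replicas, so reads are not spread out.
        warnings.append("fewer than two data nodes")
    return warnings
```

For example, `health_warnings(fetch_cluster_health("http://localhost:9200"))` returns an empty list for a green multi-node cluster; the URL is a placeholder for your own Elasticsearch or OpenSearch endpoint.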

Would you like more detailed steps on any of these approaches?

Sources:

<Slack Thread 1>
<Slack Thread 2>
<Slack Thread 3>
<Search | DataHub Search Documentation>
