Strategies to Improve Search and UI Performance with 90,000 Data Items

Original Slack Thread

<@U06TM7M40RK> My UI freezes when I run multiple searches at once. I have a total of 90,000 data items in DataHub. How can I improve search and UI performance?

To improve search and UI performance in DataHub, especially when dealing with a large number of data items (90,000 in your case), you can take several steps. Here are some recommendations based on the provided data sources:

  1. Scale Elasticsearch/OpenSearch:

    • Vertical Scaling: Increase the instance size of your OpenSearch nodes to handle more queries and data.
    • Horizontal Scaling: Add more data nodes to distribute the load. Ensure you have enough replicas to handle read requests efficiently.
    • Shard Configuration: Adjust the shard size and number of shards to optimize search performance. According to <Slack thread>, increasing the shard size can help with performance.
  2. Enable Search Cache:

  3. Scale GMS (Generalized Metadata Service):

    • Increase the number of replicas for the GMS service to handle more concurrent requests. Ensure that each replica has sufficient CPU and memory resources. As per <Slack thread>, scaling GMS pods can help with search slowness issues.
  4. Optimize Elasticsearch Queries:

  5. Review and Optimize Ingestion Processes:

    • Review your ingestion recipes and configurations to ensure they are optimized for performance. Adjust the number of threads for ingestion if necessary.
  6. Monitor and Adjust Resource Allocation:
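
To make the shard configuration point in step 1 more concrete, here is a minimal Python sketch of the arithmetic behind choosing a primary shard count. The helper name `estimate_primary_shards` and the 30 GB default target are assumptions for illustration, not values from the thread or from DataHub. Check the sizing guidance for your Elasticsearch/OpenSearch version before applying any number.

```python
import math


def estimate_primary_shards(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Estimate how many primary shards keep each shard near target_shard_gb.

    Hypothetical helper: the 30 GB default is an assumed target, not a
    DataHub setting. Always returns at least one shard.
    """
    if index_size_gb < 0:
        raise ValueError("index_size_gb must be non-negative")
    if target_shard_gb <= 0:
        raise ValueError("target_shard_gb must be positive")
    # Round up so no shard exceeds the target, and never go below one shard.
    return max(1, math.ceil(index_size_gb / target_shard_gb))


if __name__ == "__main__":
    # Example sizes are made up; measure your own index sizes first.
    for size_gb in (5, 90, 91):
        print(size_gb, "GB ->", estimate_primary_shards(size_gb), "primary shard(s)")
```

With these assumptions, a small index needs only a single primary shard, so for a deployment of this size adding replicas or GMS capacity may matter more than adding shards. That is a judgment call to verify against your own index sizes and query load.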

Would you like more detailed steps on any of these approaches?

Sources:

  • <Slack Thread 1>
  • <Slack Thread 2>
  • <Slack Thread 3>
  • <Search | DataHub Search Documentation>