Improving Re-indexing Speed and Addressing Bottlenecks in Elasticsearch Deployment

Original Slack Thread

Hi everyone! We frequently need to run a re-index when we notice the data is not correct in the UI, or when we need to restore from a backup due to a problem with Docker (we use the quickstart/docker-compose deployment on a VM).
But the re-index is very slow: it can take 2 days. We have about 8k data assets in the DB (BigQuery + Tableau). How can I speed up the re-index? I noticed that the Elasticsearch container only uses 1GB of memory, even though I configured it to use up to 2GB, so memory doesn't seem to be the bottleneck. What else could it be? What is usually the bottleneck of the indexing process?
I appreciate any suggestions! :pray::skin-tone-4: :slightly_smiling_face:
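(Editor's note on the memory observation above: in the stock docker-compose setup, the Elasticsearch JVM heap is set via `ES_JAVA_OPTS`, so raising the container memory limit alone does not raise the heap. A minimal sketch for checking the heap actually in use, assuming the quickstart cluster is reachable on localhost:9200 without authentication:)

```shell
# Show the JVM heap Elasticsearch is actually running with.
# Assumes localhost:9200 and no auth; adjust host/credentials as needed.
curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.max,heap.current'

# If heap.max is ~1GB, raise it in docker-compose via ES_JAVA_OPTS,
# e.g. under the elasticsearch service:
#   environment:
#     - ES_JAVA_OPTS=-Xms2g -Xmx2g
```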

<@U05SKM6KGGK> might be able to speak to this!

There’s a section in the Elasticsearch manual about tuning for indexing speed: https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html
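(A couple of the knobs from that tuning guide can be applied directly with curl. A sketch, assuming the cluster is on localhost:9200; `my_index` is a placeholder index name, not the actual DataHub index:)

```shell
# Pause refreshes and drop replicas on the target index while
# re-indexing, which reduces indexing overhead considerably.
curl -s -X PUT 'localhost:9200/my_index/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"refresh_interval": "-1", "number_of_replicas": 0}}'

# ... run the re-index ...

# Restore the defaults once the re-index completes.
curl -s -X PUT 'localhost:9200/my_index/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"refresh_interval": "1s", "number_of_replicas": 1}}'
```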

But the limiting factor here is likely the deployment itself. You might want to look at the options for Docker storage drivers, or something along those lines.
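(To see which storage driver the Docker daemon is currently using, assuming a standard Docker install:)

```shell
# Print the Docker storage driver; overlay2 is generally the
# recommended driver on modern Linux hosts.
docker info --format '{{.Driver}}'
```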

You could consider running Elasticsearch externally as well (e.g. on AWS)

And of course Acryl Data provides a fully managed setup where we manage these services for you

Nice, thank you <@U04QRNY4ZHA> and <@U05SKM6KGGK>. I’ll look at the Docker storage driver options and try to figure it out. Thank you!