Optimizing Elasticsearch Bulk API Calls for Quicker Reindexing

Original Slack Thread

Hi, I ran the restore indices job, but the reindexing is taking a lot of time. The MAE consumer is processing around 20-30 events per Elasticsearch bulk API call. Is there a way to make it quicker?

Since the job is already running, it's probably not a good idea to restart it just to add this, but the urnBasedPagination option should speed up the job for larger data sets:
https://github.com/datahub-project/datahub/blob/master/docker/datahub-upgrade/README.md

In general, RestoreIndices is a slow-running job meant to fix up corruption issues and is not intended to be run regularly. If you know that only a smaller subset of data needs to be fixed up, you can specify which aspects or entities to run it for rather than running it for everything.
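
To make the two suggestions above concrete, here is a minimal Python sketch that assembles a RestoreIndices command line with urnBasedPagination enabled and the run limited to a single aspect. The helper name restore_indices_command is hypothetical, and the script path, the `-u` / `-a key=value` argument style, and the aspectName and batchSize arguments are assumptions based on the datahub-upgrade README linked above, so check them against the README for your DataHub version before running anything.

```python
import shlex


def restore_indices_command(script="./datahub-upgrade.sh", **args):
    """Build an argv list for a RestoreIndices run (hypothetical helper).

    Each keyword argument becomes one `-a key=value` pair. The script path
    and the argument names passed in are assumptions to verify against the
    datahub-upgrade README.
    """
    argv = [script, "-u", "RestoreIndices"]
    for key, value in args.items():
        # Render Python booleans as lowercase true/false for the command line.
        if isinstance(value, bool):
            value = str(value).lower()
        argv += ["-a", f"{key}={value}"]
    return argv


# Example: paginate by urn and only restore one aspect instead of everything.
cmd = restore_indices_command(
    urnBasedPagination=True,
    aspectName="datasetProperties",  # example aspect; substitute the one you need
    batchSize=1000,                  # assumed tuning knob, value is illustrative
)
print(shlex.join(cmd))
```

The sketch only prints the command so it can be reviewed first. Narrowing the run to one aspect is usually the bigger win, since the job then skips everything that does not need to be fixed up.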