Handling Pagination of Large Data in DataHub

Original Slack Thread

Hey all! I have a question about how DataHub handles paginating through large amounts of data. The "Getting Started With GraphQL" page in the DataHub docs says:

> Note that by default Elasticsearch only allows pagination through 10,000 entities via the search API. If you need to paginate through more, you can change the default value for the index.max_result_window setting in Elasticsearch, or using the scroll API to read from the index directly.
I’m curious how DataHub displays more than 10,000 results in the UI if index.max_result_window is left at the default. Does it just display "10,000+", or does it retrieve the actual total count at the expense of performance?
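
For reference, raising the limit mentioned in the quoted note comes down to a single index settings update in Elasticsearch. Below is a minimal Python sketch of that call, assuming an unsecured cluster at http://localhost:9200 and a hypothetical index name example_index (neither comes from this thread; substitute the index your DataHub deployment actually uses). The helper name is made up for illustration, and it only builds the request so the payload can be inspected before anything is sent.

```python
import json
import urllib.request


def build_max_result_window_request(es_url, index, window):
    """Build (but do not send) a PUT <index>/_settings request.

    es_url and index are placeholders supplied by the caller; they are
    assumptions for illustration, not values taken from the thread.
    """
    body = json.dumps({"index": {"max_result_window": window}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical host, index name, and window size.
    req = build_max_result_window_request(
        "http://localhost:9200", "example_index", 50000
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply the setting, send the request:
    # urllib.request.urlopen(req)
```

Keep in mind that deep from/size pagination gets more expensive as the window grows, which is why the note also points to the scroll API as an alternative.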


<@U01GCJKA8P9> Can you jump in here?

I believe I have found the answer to my question here: https://github.com/datahub-project/datahub/issues/5928

It looks like this was fixed with the scrollAcrossEntities GraphQL endpoint, but the UI will still show "10,000+". Correct me if I’m wrong here :slightly_smiling_face:

This is correct and correct!
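
To make the confirmed answer concrete, here is a rough Python sketch of paging past 10,000 entities with scrollAcrossEntities. It rests on a few assumptions that are not from this thread: that the GraphQL endpoint is reachable at a URL such as http://localhost:9002/api/graphql, that a personal access token is sent as a Bearer header, and that the input and result fields (types, query, count, scrollId, nextScrollId, searchResults) match your DataHub version. Check the GraphQL schema of your deployment before relying on it. The helper names scroll_all and make_fetcher are made up for illustration.

```python
import json
import urllib.request

# Query shape as I understand the scrollAcrossEntities endpoint; verify the
# field names against your own DataHub GraphQL schema.
SCROLL_QUERY = """
query scroll($input: ScrollAcrossEntitiesInput!) {
  scrollAcrossEntities(input: $input) {
    nextScrollId
    count
    searchResults { entity { urn type } }
  }
}
"""


def make_fetcher(url, token, entity_types=("DATASET",), query="*"):
    """Return a fetch_page(scroll_id, count) function that calls DataHub.

    url and token are deployment-specific assumptions supplied by the caller.
    """
    def fetch_page(scroll_id, count):
        variables = {
            "input": {"types": list(entity_types), "query": query, "count": count}
        }
        if scroll_id:
            # Only send scrollId after the first page.
            variables["input"]["scrollId"] = scroll_id
        payload = json.dumps({"query": SCROLL_QUERY, "variables": variables})
        req = urllib.request.Request(
            url,
            data=payload.encode("utf-8"),
            method="POST",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {token}",
            },
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["data"]["scrollAcrossEntities"]

    return fetch_page


def scroll_all(fetch_page, page_size=100):
    """Yield every entity URN, following nextScrollId until it is empty."""
    scroll_id = None
    while True:
        page = fetch_page(scroll_id, page_size)
        for result in page["searchResults"]:
            yield result["entity"]["urn"]
        scroll_id = page.get("nextScrollId")
        if not scroll_id:
            break
```

Usage would look something like `for urn in scroll_all(make_fetcher(url, token)): ...`. Passing the fetch function in separately keeps the paging loop easy to try against canned pages before pointing it at a real instance.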