Handling Pagination of Large Data in DataHub

user-2 · March 4, 2024, 3:09pm

Hey All! I have a question about how DataHub handles paginating through large amounts of data. The "Getting Started With GraphQL" page in the DataHub docs says:

> Note that by default Elasticsearch only allows pagination through 10,000 entities via the search API. If you need to paginate through more, you can change the default value for the index.max_result_window setting in Elasticsearch, or using the scroll API to read from the index directly.

I’m curious how DataHub displays more than 10,000 results in the UI if index.max_result_window is left at the default. Does it just display "10,000+", or does it retrieve the actual total count at the expense of performance?

Thanks!

datahub_team · March 4, 2024, 3:09pm

<@U01GCJKA8P9> Can you jump in here?

user-2 · March 4, 2024, 3:09pm

I believe I have found the answer to my question here: https://github.com/datahub-project/datahub/issues/5928

It looks like this was addressed by the scrollAcrossEntities GraphQL endpoint, but the UI will still show "10,000+". Correct me if I’m wrong here.

user-1 · March 4, 2024, 3:09pm

This is correct and correct!
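
For anyone landing here later, this is a rough sketch of how paging past 10,000 entities with scrollAcrossEntities might look from Python. It is not taken from the DataHub docs: the helper name `scroll_urns`, the `execute` callable, and the page size are made up for illustration, and the input and result field names (`query`, `count`, `scrollId`, `nextScrollId`, `searchResults`) are my understanding of the schema, so please check them against the GraphQL schema of your own deployment.

```python
from typing import Callable, Iterator, Optional

# GraphQL document. Field names are assumed from the scrollAcrossEntities
# schema as I understand it; verify them against your DataHub version.
SCROLL_QUERY = """
query scroll($input: ScrollAcrossEntitiesInput!) {
  scrollAcrossEntities(input: $input) {
    nextScrollId
    count
    total
    searchResults { entity { urn type } }
  }
}
"""


def scroll_urns(
    execute: Callable[[str, dict], dict],
    page_size: int = 500,
) -> Iterator[str]:
    """Yield entity URNs page by page until no scroll ID is returned.

    `execute` is a hypothetical callable that sends a GraphQL document plus
    variables to DataHub and returns the response's `data` dictionary.
    """
    scroll_id: Optional[str] = None
    while True:
        scroll_input = {"query": "*", "count": page_size}
        if scroll_id is not None:
            # Only send scrollId after the first page.
            scroll_input["scrollId"] = scroll_id
        data = execute(SCROLL_QUERY, {"input": scroll_input})
        page = data["scrollAcrossEntities"]
        for result in page["searchResults"]:
            yield result["entity"]["urn"]
        scroll_id = page.get("nextScrollId")
        if not scroll_id:
            # No further pages to fetch.
            break
```

In practice `execute` would typically be a small wrapper that POSTs the query and variables as JSON to your instance's GraphQL endpoint with an access token. Passing it in as a parameter keeps the paging loop independent of the HTTP client and easy to try against canned responses.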
