Troubleshooting `topSqlQueries` Issue Causing 500 Error on Dataset Detail Pages

Original Slack Thread

:hello-dog: We have been impacted with an issue related to topSqlQueries in dataset_datasetusagestatisticsaspect_v1 index that caused a 500 on every dataset detail page, thus basically making datahub unusable.
The exception thrown was com.linkedin.data.template.RequiredFieldNotPresentException: Field "value" is required but it is not present and that can be tracked down to a problem in ES

│  SORTED_SET]). Re-index with correct docvalues type.```
We have found out some reports of the same issue <https://datahubspace.slack.com/archives/C029A3M079U/p1692958291125559|1>, <https://datahubspace.slack.com/archives/C029A3M079U/p1692202286014619|2>, <https://datahubspace.slack.com/archives/C029A3M079U/p1683949129461639|3> without any real solution.
We tried truncating the index, but that failed with a `com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Async reindex failed`.
Our only solution has been manually deleting and recreating the index. We have backed up the data and would very much like to be able to restore it if we can find the underlying issue.

v0.11.0 and this was out of the blue. I’m assuming some ingestion cronjob added some bad data, or we hit some kind of limit, but I don’t really know

Hey David, many apologies for the delayed response here… were you able to get this resolved? If not, <@U03MF8MU5P0> might be able to help out

No we did not :crying_cat_face: we had to delete the index. It would be great if we could find the issue so we don’t run into it again, and maybe even restore some of the data