Troubleshooting Slow Elasticsearch Setup with UI Errors

Original Slack Thread

Hi All,

Would like some assistance in figuring out where the problem could be -

We have had several issues with our Elasticsearch setup: sometimes page reloads are very slow, and at other times the search query throws a 500 error and “Something went wrong” shows up on the UI. We have experienced slowness when searching, ingesting, and updating datasets, and they don't show up on the UI.
We have tried the below -

  1. Moved away from the DataHub-provided Elasticsearch Helm setup with 3 master and data nodes (3 nodes doing both jobs)
  2. Set up our own ES cluster on physical nodes with 3 master and 4 data nodes. This has proved to be somewhat successful.
    But recently we had another issue where navigating on the UI throws a 500 error, and in the backend we find GraphQL errors for the getSearchResultsForMultiple query. Firstly, it would help if DataHub could log more details about what's wrong with the GraphQL query rather than just a 500 SERVER ERROR.
    We spent a long time figuring it out, but in the end we tried -
  3. Redeployment of the DataHub Helm chart (didn't work)
  4. Restore Indices k8s job and via curl API (didn't work)
  5. Wiped out the entire metadata storage and started a fresh installation
    In all these cases we observed that the metadata starts to update well, but after some time it completely stops for the whole day. RestoreIndices doesn't trigger the update again. Our ES has plenty of resources available, but there is no movement on the indices. The entity counts on the UI don't match the metadata DB.
    We are at a loss about what happens between GMS, the metadata DB, and ES, and how the index update process works. What is the order of dataset platform count updates? Once it starts, why doesn't the restore indices process run to completion until all counts are updated?
    Can someone help us? FYI, the DataHub release is v0.13.0, but the version doesn't seem to matter: we have had this since 0.9.6.1.
    <@U03BEML16LB> <@U0348BYAS56> <@U03MF8MU5P0> <@U01GCJKA8P9> <@U01GZEETMEZ> <@U05C3CJDPD4>
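
One way to check whether the indices have really stopped moving is to sample per-index document counts from Elasticsearch twice and compare the two samples. The following is a minimal Python sketch and not DataHub tooling. It assumes the cluster is reachable at a hypothetical `ES_URL` without authentication and relies only on the standard `_cat/indices?format=json` endpoint.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: replace with your cluster address


def fetch_cat_indices(es_url=ES_URL):
    """Return the parsed rows of GET _cat/indices?format=json."""
    url = f"{es_url}/_cat/indices?format=json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def doc_counts(cat_rows):
    """Map index name -> document count from _cat/indices rows."""
    counts = {}
    for row in cat_rows:
        # docs.count arrives as a string; treat a missing or null value as 0
        raw = row.get("docs.count")
        counts[row["index"]] = int(raw) if raw is not None else 0
    return counts


def stalled_indices(before, after):
    """Return sorted names of indices whose count is identical in both samples."""
    return sorted(name for name, n in after.items() if before.get(name) == n)
```

Calling `doc_counts(fetch_cat_indices())` a few minutes apart while ingestion or RestoreIndices is running, then passing both results to `stalled_indices`, shows which indices are not receiving writes. An unchanged count is only a hint, since an index may simply have no pending updates.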

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Which DataHub version are you using? (e.g. 0.12.0)
  2. Please post any relevant error logs on the thread!

<@U06MTUH6WPL> <@U068BKF6G84> <@U054C8Q7B2M> : following up on our last discussion, this is one of the main topics/challenges we are currently experiencing with our DataHub setup.

We would need actual logs in order to help here. In particular, debug logs which are stored in /tmp/datahub inside the GMS pod.
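
A possible way to pull that directory out of the pod is `kubectl cp`, sketched below in Python. The namespace, pod name, and destination directory are hypothetical placeholders, and `kubectl cp` only works when the `tar` binary is present in the container image, so this may not apply to every deployment.

```python
import subprocess


def build_log_copy_command(namespace, pod, dest_dir, container=None, log_dir="/tmp/datahub"):
    """Build the kubectl cp argument list that copies log_dir out of the pod."""
    cmd = ["kubectl", "cp", f"{namespace}/{pod}:{log_dir}", dest_dir]
    if container:
        # Only needed when the pod runs more than one container
        cmd += ["-c", container]
    return cmd


def copy_gms_logs(namespace, pod, dest_dir, container=None):
    """Run the copy; raises CalledProcessError if kubectl exits non-zero."""
    subprocess.run(build_log_copy_command(namespace, pod, dest_dir, container), check=True)
```

For example, `copy_gms_logs("datahub", "datahub-gms-0", "./gms-logs")` would copy the directory to `./gms-logs`, assuming those names match the actual deployment.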

We are unable to extract logs from our GMS pod. But an example of the situation is this - I tried to change the entities displayed per page from 10 to 100 in the Domain page. The UI kept loading and in the end it threw the 500 Error message as below -
I could not find this in the GMS log, but I can give you the network call details with the GraphQL query and inputs.
{
  "input": {
    "types": [],
    "query": "*",
    "start": 0,
    "count": 100,
    "orFilters": [
      {
        "and": [
          {
            "field": "domains",
            "values": [
              "urn:li:domain:Analytics_NO"
            ]
          }
        ]
      }
    ],
    "searchFlags": {
      "skipCache": true
    }
  }
}

<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 503 Service Unavailable</title>
</head>
<body><h2>HTTP ERROR 503 Service Unavailable</h2>
<table>
<tr><th>URI:</th><td>/api/graphql</td></tr>
<tr><th>STATUS:</th><td>503</td></tr>
<tr><th>MESSAGE:</th><td>Service Unavailable</td></tr>
<tr><th>SERVLET:</th><td>apiServlet</td></tr>
</table>
<hr/><a href="https://eclipse.org/jetty">Powered by Jetty:// 11.0.19</a><hr/>

</body>
</html>
Note: a few times we have seen the getSearchResultsForMultiple query fail without any explanation and throw the 500 SERVER ERROR. At the same time, other actions like navigating through Airflow or Spark pipelines work fine.
Question from my side - do the GraphQL queries call the ES service or the backend metadata DB storage? <@U05SKM6KGGK>
[attachment: image.png]
[attachment: image.png]
[attachment: bad_graphql_query (plain text snippet, 1722 lines; preview truncated)]

query getSearchResultsForMultiple($input: SearchAcrossEntitiesInput!) {
  searchAcrossEntities(input: $input) {
    ...searchResults
    __typename
  }
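
To reproduce the failing call outside the UI, the same input can be posted directly to `/api/graphql`. The sketch below is only an illustration under stated assumptions: the reduced selection set (`start`, `count`, `total`, `entity { urn type }`) stands in for the `...searchResults` fragment, whose definition is not shown in the thread, and `base_url` and the bearer `token` are hypothetical values for your deployment.

```python
import json
import urllib.error
import urllib.request

# Assumption: a reduced selection set replaces the ...searchResults fragment.
QUERY = """
query getSearchResultsForMultiple($input: SearchAcrossEntitiesInput!) {
  searchAcrossEntities(input: $input) {
    start
    count
    total
    searchResults {
      entity {
        urn
        type
      }
    }
  }
}
"""


def build_payload(domain_urn, count=10, start=0):
    """Build a request body with the same input shape as the failing UI call."""
    return {
        "operationName": "getSearchResultsForMultiple",
        "query": QUERY,
        "variables": {
            "input": {
                "types": [],
                "query": "*",
                "start": start,
                "count": count,
                "orFilters": [{"and": [{"field": "domains", "values": [domain_urn]}]}],
                "searchFlags": {"skipCache": True},
            }
        },
    }


def run_search(base_url, token, payload, timeout=120):
    """POST the payload to <base_url>/api/graphql and return (status, body text)."""
    req = urllib.request.Request(
        f"{base_url}/api/graphql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Error responses such as the 503 page still carry a body worth inspecting
        return err.code, err.read().decode("utf-8", errors="replace")
```

Running `run_search` with `build_payload("urn:li:domain:Analytics_NO", count=n)` for increasing values of `n` (for example 10, 50, 100) may help show whether the 503 depends on page size, and the timestamps of each call make it easier to find the matching entries in the GMS and Elasticsearch logs.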

  • logs from the PODs [attachment: datahub_logs.zip]