Troubleshooting Elasticsearch Index Creation Issues in a DataHub Deployment on AWS

Original Slack Thread

Hello, I moved our POC DataHub deployment to AWS-managed services (RDS, OpenSearch, MSK), but I’m getting the following error inside the upgrade job:

```
org.opensearch.OpenSearchStatusException: OpenSearch exception [type=index_not_found_exception, reason=no such index [datahubpolicyindex_v2]]
```

I’m running DataHub v0.12.0. Any ideas? I’ve seen this error before, but in those cases it was resolved by the datahub-update routine, not originating there.

(attachment)

I also made sure I had USE_AWS_ELASTICSEARCH=true set on the elasticsearch-setup job. (attachment)

What is your deployment framework for DataHub?
Have you copied your Elasticsearch data to the new deployment (OpenSearch)?
If not, you may have to run the restore-indices job once:
https://datahubproject.io/docs/how/restore-indices/

With Kubernetes:
https://datahubproject.io/docs/how/restore-indices/#all-things-deployment

We’re using Helm on k8s. Since we’re in POC, I was just starting things with fresh DBs.

I just tried the restore-indices job and it does not solve the issue. My understanding was that this restores the content of the indices by republishing MAE events; it doesn’t seem to ensure the indices exist.

Yes, restoring indices won’t create indices. You may have to run the elasticsearch-setup, kafka-setup (https://hub.docker.com/r/acryldata/datahub-kafka-setup/), and mysql-setup jobs.

Those all run successfully, but the indices aren’t created. I posted the elasticsearch-setup log above.

With a Helm deployment, enable the respective sections in the chart’s values.yaml:
https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/values.yaml
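
The sections in question look roughly like this (a sketch only; the key names follow the datahub-helm chart linked above and should be verified against your chart version):

```yaml
# Prerequisite setup jobs in the datahub-helm values.yaml. These create the
# Elasticsearch indices, Kafka topics, and database schema before GMS starts.
elasticsearchSetupJob:
  enabled: true
kafkaSetupJob:
  enabled: true
mysqlSetupJob:
  enabled: true
```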

How about the datahub-system-update-job?

That is the first error log

<@U0667UL20SD> you have shared the datahubUpgrade log (the first one), not the datahub-system-update-job log.
datahub-system-update-job is where the Elasticsearch indices are created:
https://github.com/acryldata/datahub-helm/blob/f04bcdeced701edca2e7fe5bfc5a8095415bab6e/charts/datahub/values.yaml#L351

Sorry, the first log I posted is actually from the system-update job. I named it based on the Docker image, before I realized that multiple jobs use that image. I just re-ran my deployment and the datahub-system-update-job failed with that same error.

Here is the full log. Pretty much the same as last time.

When I run /_cat/indices on my OpenSearch cluster, only 2 indices are there:

```
green open .kibana_1                  Eh6BSDbYSbemo6VLfpVaYQ 1 1 0 0 416b 208b
green open datahub_usage_event-000001 aJmlTza_Tr6mOQgDsiXihw 5 1 0 0  2kb  1kb
```

(attachment: datahub-system-update-job.log)
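
To see at a glance which of the expected indices are missing, the `_cat/indices` output can be diffed against the names the system-update job should create. A minimal sketch (the `EXPECTED` set below is illustrative, not the full set of indices DataHub manages):

```python
# Sketch: diff _cat/indices output against an (illustrative) expected set.
# In the plain-text _cat/indices format, the index name is the third column.
EXPECTED = {"datahubpolicyindex_v2", "graph_service_v1", "datahub_usage_event-000001"}

def missing_indices(cat_indices_output: str, expected=EXPECTED) -> set:
    """Return expected index names absent from `_cat/indices` output."""
    present = set()
    for line in cat_indices_output.splitlines():
        cols = line.split()
        if len(cols) >= 3:
            present.add(cols[2])
    return expected - present

cat_output = """\
green open .kibana_1                  Eh6BSDbYSbemo6VLfpVaYQ 1 1 0 0 416b 208b
green open datahub_usage_event-000001 aJmlTza_Tr6mOQgDsiXihw 5 1 0 0  2kb  1kb"""

print(sorted(missing_indices(cat_output)))
# -> ['datahubpolicyindex_v2', 'graph_service_v1']
```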

Yeah, somehow indices are not created, checking …

<@U0667UL20SD> It looks like the reason for the actual failure is

```
java.lang.RuntimeException: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-0 [ACTIVE]
	at com.linkedin.metadata.search.elasticsearch.indexbuilder.EntityIndexBuilders.reindexAll(EntityIndexBuilders.java:34)
	at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.configure(ElasticSearchService.java:45)
	at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.reindexAll(ElasticSearchService.java:55)
	at com.linkedin.datahub.upgrade.system.elasticsearch.steps.BuildIndicesStep.lambda$executable$0(BuildIndicesStep.java:36)
	at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeStepInternal(DefaultUpgradeManager.java:110)
	at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:68)
	at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:42)
	at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.execute(DefaultUpgradeManager.java:33)
	at com.linkedin.datahub.upgrade.UpgradeCli.run(UpgradeCli.java:80)
	at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:768)
	at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:752)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:314)
	at org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:164)
	at com.linkedin.datahub.upgrade.UpgradeCliApplication.main(UpgradeCliApplication.java:23)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
	at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
	at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
	at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:65)
Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-0 [ACTIVE]
	at org.opensearch.client.RestClient.extractAndWrapCause(RestClient.java:936)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:332)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:320)
	at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1918)
	at org.opensearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1901)
	at org.opensearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1865)
	at org.opensearch.client.IndicesClient.create(IndicesClient.java:159)
	at com.linkedin.metadata.search.elasticsearch.indexbuilder.ESIndexBuilder.createIndex(ESIndexBuilder.java:495)
	at com.linkedin.metadata.search.elasticsearch.indexbuilder.ESIndexBuilder.buildIndex(ESIndexBuilder.java:191)
	at com.linkedin.metadata.search.elasticsearch.indexbuilder.EntityIndexBuilders.reindexAll(EntityIndexBuilders.java:32)
	... 21 common frames omitted
```
This indicates Elasticsearch is not responding to create-index requests; there must be some problem on the ES side. Please take a look at the ES logs while running this deployment.
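
For context on what the stack trace means: a `SocketTimeoutException` here says the connection was established and the request sent, but no response arrived within the client's 30-second read timeout. A generic, self-contained illustration of that failure mode (not DataHub code; a local server accepts the connection but never replies, so the client's read timeout fires):

```python
# Illustration: a read timeout fires when the server accepts the connection
# but never sends a response (analogous to OpenSearch accepting the
# create-index request and then not answering within the client timeout).
import socket
import threading

held = []  # keep a reference to the accepted connection so it stays open

def silent_server(sock):
    conn, _ = sock.accept()  # accept the connection, but never respond
    held.append(conn)

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=silent_server, args=(srv,), daemon=True).start()

# Client with a short read timeout (30 s in the real stack trace).
cli = socket.create_connection(srv.getsockname(), timeout=0.2)
cli.sendall(b"PUT /datahubpolicyindex_v2 HTTP/1.1\r\n\r\n")
try:
    cli.recv(1024)  # blocks until the 0.2 s read timeout fires
    timed_out = False
except socket.timeout:
    timed_out = True
print(timed_out)  # True
```

The practical takeaway is that the request reached the cluster but stalled there, which is why ES-side logs (and cluster load/health) are the next place to look.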

That doesn’t fully make sense, since the graph_service_v1 index is successfully created somehow. I’m also able to run `curl -XPUT <eshost>/test_index` from the GMS pod to create a test index.

Managed to pull this log. (attachment)

What OpenSearch version are you using?

Support for OpenSearch 2.x was only added recently; I would check whether that’s available in v0.12.0.

<@U05SKM6KGGK> I’m using the Elasticsearch_7.10 engine version on AWS OpenSearch. Do you know if the USE_AWS_ELASTICSEARCH flag should still be set in that case?
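
For reference, the flag mentioned above is typically injected into the elasticsearch-setup job through the chart's `extraEnvs` list. A sketch, assuming the datahub-helm values layout (verify the key names against your chart version; the flag reportedly switches the setup script to AWS-compatible behavior, e.g. OpenSearch ISM instead of ES ILM for index lifecycle policies):

```yaml
# Sketch: passing USE_AWS_ELASTICSEARCH to the elasticsearch-setup job
# via the datahub-helm chart's extraEnvs mechanism.
elasticsearchSetupJob:
  enabled: true
  extraEnvs:
    - name: USE_AWS_ELASTICSEARCH
      value: "true"
```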