Troubleshooting Elasticsearch Index Creation Issues in a DataHub Deployment on AWS

Original Slack Thread

Hello, I moved our POC DataHub deployment to AWS-managed services (RDS, OpenSearch, MSK), but I’m getting the following error inside the upgrade job:

```
org.opensearch.OpenSearchStatusException: OpenSearch exception [type=index_not_found_exception, reason=no such index [datahubpolicyindex_v2]]
```

I’m running DataHub v0.12.0. Any ideas? I’ve seen this error before, but in those cases it was resolved by the datahub-update routine, not originating there.

(attachment)

I also made sure I had USE_AWS_ELASTICSEARCH=true set on the elasticsearch-setup job. (attachment)

What is your deployment framework for DataHub?
Have you copied your Elasticsearch data to the new deployment (OpenSearch)?
If not, you may have to run the restore-indices job once:
https://datahubproject.io/docs/how/restore-indices/

With Kubernetes:
https://datahubproject.io/docs/how/restore-indices/#all-things-deployment

We’re using Helm on k8s. Since we’re in POC, I was just starting things with fresh DBs.

I just tried the restore-indices job and it does not solve the issue. My understanding was that this restores the content of the indices by republishing MAE events; it doesn’t seem to ensure the indices exist.

Yes, restoring indices won’t create indices. You may have to run the elasticsearch-setup, kafka-setup (https://hub.docker.com/r/acryldata/datahub-kafka-setup/), and mysql-setup jobs.

Those all run successfully, but the indices aren’t created. I posted the elasticsearch-setup log above.

With a Helm deployment, enable the respective sections in the chart’s values.yaml:
https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/values.yaml
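
The sections in question look roughly like this (a sketch only; the key names follow the datahub-helm chart linked above and should be verified against your chart version):

```yaml
# Prerequisite setup jobs in the datahub-helm values.yaml. These create the
# Elasticsearch indices, Kafka topics, and database schema before GMS starts.
elasticsearchSetupJob:
  enabled: true
kafkaSetupJob:
  enabled: true
mysqlSetupJob:
  enabled: true
```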

How about the datahub-system-update-job?

That is the first error log

<@U0667UL20SD> you have shared the datahubUpgrade log (the first one), not the datahub-system-update-job log.
datahub-system-update-job is where the Elasticsearch indices are created:
https://github.com/acryldata/datahub-helm/blob/f04bcdeced701edca2e7fe5bfc5a8095415bab6e/charts/datahub/values.yaml#L351

Sorry, the first log I posted is actually from the system-update job. I named it based on the Docker image, before I realized that multiple jobs use that image. I just re-ran my deployment and the datahub-system-update-job failed with that same error.

Here is the full log. Pretty much the same as last time.

When I run /_cat/indices on my OpenSearch cluster, only 2 indices are there:

```
green open .kibana_1                  Eh6BSDbYSbemo6VLfpVaYQ 1 1 0 0 416b 208b
green open datahub_usage_event-000001 aJmlTza_Tr6mOQgDsiXihw 5 1 0 0  2kb  1kb
```

(attachment: datahub-system-update-job.log)
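
To see at a glance which of the expected indices are missing, the `_cat/indices` output can be diffed against the names the system-update job should create. A minimal sketch (the `EXPECTED` set below is illustrative, not the full set of indices DataHub manages):

```python
# Sketch: diff _cat/indices output against an (illustrative) expected set.
# In the plain-text _cat/indices format, the index name is the third column.
EXPECTED = {"datahubpolicyindex_v2", "graph_service_v1", "datahub_usage_event-000001"}

def missing_indices(cat_indices_output: str, expected=EXPECTED) -> set:
    """Return expected index names absent from `_cat/indices` output."""
    present = set()
    for line in cat_indices_output.splitlines():
        cols = line.split()
        if len(cols) >= 3:
            present.add(cols[2])
    return expected - present

cat_output = """\
green open .kibana_1                  Eh6BSDbYSbemo6VLfpVaYQ 1 1 0 0 416b 208b
green open datahub_usage_event-000001 aJmlTza_Tr6mOQgDsiXihw 5 1 0 0  2kb  1kb"""

print(sorted(missing_indices(cat_output)))
# -> ['datahubpolicyindex_v2', 'graph_service_v1']
```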

Yeah, somehow indices are not created, checking …

<@U0667UL20SD> It looks like the reason for the actual failure is

```
java.lang.RuntimeException: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-0 [ACTIVE]
	at com.linkedin.metadata.search.elasticsearch.indexbuilder.EntityIndexBuilders.reindexAll(EntityIndexBuilders.java:34)
	at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.configure(ElasticSearchService.java:45)
	at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.reindexAll(ElasticSearchService.java:55)
	at com.linkedin.datahub.upgrade.system.elasticsearch.steps.BuildIndicesStep.lambda$executable$0(BuildIndicesStep.java:36)
	at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeStepInternal(DefaultUpgradeManager.java:110)
	at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:68)
	at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:42)
	at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.execute(DefaultUpgradeManager.java:33)
	at com.linkedin.datahub.upgrade.UpgradeCli.run(UpgradeCli.java:80)
	at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:768)
	at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:752)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:314)
	at org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:164)
	at com.linkedin.datahub.upgrade.UpgradeCliApplication.main(UpgradeCliApplication.java:23)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
	at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
	at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
	at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:65)
Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-0 [ACTIVE]
	at org.opensearch.client.RestClient.extractAndWrapCause(RestClient.java:936)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:332)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:320)
	at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1918)
	at org.opensearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1901)
	at org.opensearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1865)
	at org.opensearch.client.IndicesClient.create(IndicesClient.java:159)
	at com.linkedin.metadata.search.elasticsearch.indexbuilder.ESIndexBuilder.createIndex(ESIndexBuilder.java:495)
	at com.linkedin.metadata.search.elasticsearch.indexbuilder.ESIndexBuilder.buildIndex(ESIndexBuilder.java:191)
	at com.linkedin.metadata.search.elasticsearch.indexbuilder.EntityIndexBuilders.reindexAll(EntityIndexBuilders.java:32)
	... 21 common frames omitted
```
This indicates Elasticsearch is not responding to create-index requests; there must be some problem on the ES side. Please take a look at the ES logs while running this deployment.
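
For context on what the stack trace means: a `SocketTimeoutException` here says the connection was established and the request sent, but no response arrived within the client's 30-second read timeout. A generic, self-contained illustration of that failure mode (not DataHub code; a local server accepts the connection but never replies, so the client's read timeout fires):

```python
# Illustration: a read timeout fires when the server accepts the connection
# but never sends a response (analogous to OpenSearch accepting the
# create-index request and then not answering within the client timeout).
import socket
import threading

held = []  # keep a reference to the accepted connection so it stays open

def silent_server(sock):
    conn, _ = sock.accept()  # accept the connection, but never respond
    held.append(conn)

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=silent_server, args=(srv,), daemon=True).start()

# Client with a short read timeout (30 s in the real stack trace).
cli = socket.create_connection(srv.getsockname(), timeout=0.2)
cli.sendall(b"PUT /datahubpolicyindex_v2 HTTP/1.1\r\n\r\n")
try:
    cli.recv(1024)  # blocks until the 0.2 s read timeout fires
    timed_out = False
except socket.timeout:
    timed_out = True
print(timed_out)  # True
```

The practical takeaway is that the request reached the cluster but stalled there, which is why ES-side logs (and cluster load/health) are the next place to look.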

That doesn’t fully make sense, since the graph_service_v1 index is successfully created somehow. I’m also able to run `curl -XPUT <eshost>/test_index` from the GMS pod to create a test index.

Managed to pull this log. (attachment)

What OpenSearch version are you using?

Support for OpenSearch 2.x was only added recently; I would check whether that’s available in v0.12.0.

<@U05SKM6KGGK> I’m using the Elasticsearch_7.10 engine version on AWS OpenSearch. Do you know if the USE_AWS_ELASTICSEARCH flag should still be set in that case?
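
For reference, the flag mentioned above is typically injected into the elasticsearch-setup job through the chart's `extraEnvs` list. A sketch, assuming the datahub-helm values layout (verify the key names against your chart version; the flag reportedly switches the setup script to AWS-compatible behavior, e.g. OpenSearch ISM instead of ES ILM for index lifecycle policies):

```yaml
# Sketch: passing USE_AWS_ELASTICSEARCH to the elasticsearch-setup job
# via the datahub-helm chart's extraEnvs mechanism.
elasticsearchSetupJob:
  enabled: true
  extraEnvs:
    - name: USE_AWS_ELASTICSEARCH
      value: "true"
```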