Good morning everyone,
we are migrating from 0.10.5
to 0.12
and we are facing some issues with the re-indexing of ElasticSearch. We migrated successfully in all environments, except one. The dataprocessinstanceindex_v2
seems to have almost 93 million entities. We tried several runs of the upgrade job and it fails at different points of reindexing. We saw it fail at 80%, 60%, 5%, 27%, 38%, etc… The job will restart from the beginning after each failure.
From the logs I cannot find anything that explains the failure.
"2024-01-28 02:04:53,526 [main] INFO c.l.d.u.impl.DefaultUpgradeReport:16 - Exiting upgrade SystemUpdate with failure."
"2024-01-28 02:04:53,526 [main] INFO c.l.d.u.impl.DefaultUpgradeReport:16 - Failed Step 2/6: BuildIndicesStep. Failed after 0 retries."
"java.lang.RuntimeException: java.lang.RuntimeException: Reindex from dataprocessinstanceindex_v2 to dataprocessinstanceindex_v2_1706378590787 failed. Document count 92367364 != 28309669\n\tat com.linkedin.metadata.search.elasticsearch.indexbuilder.ESIndexBuilder.buildIndex(ESIndexBuilder.java:227)\n\tat com.linkedin.metadata.search.elasticsearch.indexbuilder.EntityIndexBuilders.reindexAll(EntityIndexBuilders.java:32)\n\tat com.linkedin.metadata.search.elasticsearch.ElasticSearchService.configure(ElasticSearchService.java:45)\n\tat com.linkedin.metadata.search.elasticsearch.ElasticSearchService.reindexAll(ElasticSearchService.java:55)\n\tat com.linkedin.datahub.upgrade.system.elasticsearch.steps.BuildIndicesStep.lambda$executable$0(BuildIndicesStep.java:36)\n\tat com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeStepInternal(DefaultUpgradeManager.java:110)\n\tat com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:68)\n\tat com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:42)\n\tat com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.execute(DefaultUpgradeManager.java:33)\n\tat com.linkedin.datahub.upgrade.UpgradeCli.run(UpgradeCli.java:80)\n\tat org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:768)\n\tat org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:752)\n\tat org.springframework.boot.SpringApplication.run(SpringApplication.java:314)\n\tat org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:164)\n\tat com.linkedin.datahub.upgrade.UpgradeCliApplication.main(UpgradeCliApplication.java:23)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)\n\tat org.springframework.boot.loader.Launcher.launch(Launcher.java:108)\n\tat org.springframework.boot.loader.Launcher.launch(Launcher.java:58)\n\tat org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:65)\nCaused by: java.lang.RuntimeException: Reindex from dataprocessinstanceindex_v2 to dataprocessinstanceindex_v2_1706378590787 failed. Document count 92367364 != 28309669\n\tat com.linkedin.metadata.search.elasticsearch.indexbuilder.ESIndexBuilder.reindex(ESIndexBuilder.java:352)\n\tat com.linkedin.metadata.search.elasticsearch.indexbuilder.ESIndexBuilder.buildIndex(ESIndexBuilder.java:225)\n\t... 22 common frames omitted"
"2024-01-28 02:04:53,521 [main] ERROR c.l.d.u.s.e.steps.BuildIndicesStep:39 - BuildIndicesStep failed."
"WARN [RestClient] request [DELETE <http://gott-datahub-deps:9200/dataprocessinstanceindex_v2_1706378590787?master_timeout=30s&ignore_unavailable=false&expand_wildcards=open%2Cclosed&allow_no_indices=true&ignore_throttled=false&timeout=30s>] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 \"[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.\"]"
"[2024-01-28T02:04:51,428][INFO ][o.e.c.m.MetadataDeleteIndexService] [gott-datahub-deps-68bb56544-7p86j] [dataprocessinstanceindex_v2_1706378590787/_NxHG_PpQRSRGPAHFjJ3cw] deleting index"
"2024-01-28 02:04:51,372 [main] ERROR c.l.m.s.e.i.ESIndexBuilder:357 - Failed to reindex dataprocessinstanceindex_v2 to dataprocessinstanceindex_v2_1706378590787: Exception java.lang.RuntimeException: Reindex from dataprocessinstanceindex_v2 to dataprocessinstanceindex_v2_1706378590787 failed. Document count 92367364 != 28309669"
"2024-01-28 02:04:51,371 [main] ERROR c.l.m.s.e.i.ESIndexBuilder:349 - Index: dataprocessinstanceindex_v2 - Post-reindex document count is different, source_doc_count: 92367364 reindex_doc_count: 28309669"
"2024-01-28 02:03:51,369 [main] WARN c.l.m.s.e.i.ESIndexBuilder:321 - Task: RQTILb-VQ_OIK4SK5SE89g:1686 - Document counts do not match 92367364 != 28309669. Complete: 30.648994%"
"WARN [RestClient] request [POST <http://gott-datahub-deps:9200/dataprocessinstanceindex_v2_1706378590787/_count?ignore_throttled=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true>] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 \"[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.\"]"
"WARN [RestClient] request [POST <http://gott-datahub-deps:9200/dataprocessinstanceindex_v2/_count?ignore_throttled=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true>] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 \"[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.\"]"
"WARN [RestClient] request [POST <http://gott-datahub-deps:9200/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true>] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 \"[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.\"]"
"WARN [RestClient] request [POST <http://gott-datahub-deps:9200/dataprocessinstanceindex_v2_1706378590787/_count?ignore_throttled=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true>] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 \"[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.\"]"
"WARN [RestClient] request [POST <http://gott-datahub-deps:9200/dataprocessinstanceindex_v2/_count?ignore_throttled=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true>] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 \"[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.\"]"
"WARN [RestClient] request [POST <http://gott-datahub-deps:9200/dataprocessinstanceindex_v2_1706378590787/_count?ignore_throttled=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true>] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 \"[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.\"]"
"WARN [RestClient] request [POST <http://gott-datahub-deps:9200/dataprocessinstanceindex_v2/_count?ignore_throttled=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true>] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 \"[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.\"]"
"2024-01-28 02:02:51,341 [main] INFO c.l.m.s.e.i.ESIndexBuilder:307 - Task: RQTILb-VQ_OIK4SK5SE89g:1686 - Reindexing from dataprocessinstanceindex_v2 to dataprocessinstanceindex_v2_1706378590787 in progress..."
"2024-01-28 02:01:51,340 [main] WARN c.l.m.s.e.i.ESIndexBuilder:321 - Task: RQTILb-VQ_OIK4SK5SE89g:1686 - Document counts do not match 92367364 != 28196785. Complete: 30.526785%"
"WARN [RestClient] request [POST <http://gott-datahub-deps:9200/dataprocessinstanceindex_v2_1706378590787/_count?ignore_throttled=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true>] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 \"[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.\"]"
"WARN [RestClient] request [POST <http://gott-datahub-deps:9200/dataprocessinstanceindex_v2/_count?ignore_throttled=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true>] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 \"[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.\"]"
"WARN [RestClient] request [POST <http://gott-datahub-deps:9200/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true>] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 \"[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.\"]"```
Can some more familiar with re-index process give any idea of what might be happening?