Upgrading from 0.9.x to 0.10.x and troubleshooting Elasticsearch compatibility

Original Slack Thread

Has anyone here successfully upgraded from 0.9.x to 0.10.x?

There have definitely been some successful upgrades in the community. Are you running into any particular issue?

I tried upgrading from 0.9.6 to 0.10.4. I’m using ECS for the deploy, so I don’t have the luxury of the Helm setup.
I tried the following:
• Update the upgrade container to 0.10.4
• Shut down GMS temporarily
• Run the systemUpdate job via the upgrade container, which ran successfully (see the one-off ECS task sketch below)
• Then I spun up the new version of GMS (0.10.4), and did the same for the frontend
• I started seeing errors similar to this earlier thread (https://datahubspace.slack.com/archives/C029A3M079U/p1680091806766439): something to do with the policy index being malformed, or not having the structure GMS expects

Should I be stopping the old version of GMS while I’m running the systemUpdate job, and only launch the 0.10.4 version after it is done?
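
For reference, the systemUpdate step on ECS can be run as a one-off task before rolling GMS forward. The sketch below is only an illustration: the region, cluster, task definition, container name, and network settings are placeholders for your own setup, and it assumes the task definition points at the 0.10.4 `datahub-upgrade` image, which (as far as I know) takes `-u SystemUpdate` as its arguments, the same upgrade ID the Helm-based setup runs.

```python
# Hypothetical sketch: launch the DataHub systemUpdate job as a one-off ECS task
# and wait for it to finish before deploying the new GMS version.
# All resource names below are placeholders.
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")  # placeholder region

response = ecs.run_task(
    cluster="datahub-cluster",                 # placeholder cluster name
    taskDefinition="datahub-system-update",    # placeholder task definition (0.10.4 upgrade image)
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],     # placeholder subnet
            "securityGroups": ["sg-0123456789abcdef0"],  # placeholder security group
            "assignPublicIp": "DISABLED",
        }
    },
    overrides={
        "containerOverrides": [
            {
                "name": "datahub-upgrade",          # placeholder container name
                "command": ["-u", "SystemUpdate"],  # upgrade ID run by the stock upgrade container
            }
        ]
    },
)

task_arn = response["tasks"][0]["taskArn"]
print(f"Started systemUpdate task: {task_arn}")

# Block until the one-off task stops, then roll out the new GMS/frontend tasks.
ecs.get_waiter("tasks_stopped").wait(cluster="datahub-cluster", tasks=[task_arn])
```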

I just tried the upgrade steps via a local docker compose setup and got the same issue:

```
com.datahub.util.exception.ESQueryException: Search query failed:
        at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:99)
        at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.search(ESSearchDAO.java:202)
        at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:123)
        at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:113)
        at com.linkedin.metadata.client.JavaEntityClient.search(JavaEntityClient.java:304)
        at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:51)
        at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:43)
        at com.datahub.authorization.DataHubAuthorizer$PolicyRefreshRunnable.run(DataHubAuthorizer.java:223)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=x_content_parse_exception, reason=[1:938] [bool] failed to parse field [must]]
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911)
        at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1888)
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1645)
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1602)
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1572)
        at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1088)
        at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:92)
        ... 13 common frames omitted
        Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://elasticsearch:9200], URI [/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"parsing_exception","reason":"[match_phrase_prefix] query does not support [zero_terms_query]","line":1,"col":938}],"type":"x_content_parse_exception","reason":"[1:938] [bool] failed to parse field [must]","caused_by":{"type":"x_content_parse_exception","reason":"[1:938] [bool] failed to parse field [should]","caused_by":{"type":"x_content_parse_exception","reason":"[1:938] [bool] failed to parse field [should]","caused_by":{"type":"parsing_exception","reason":"[match_phrase_prefix] query does not support [zero_terms_query]","line":1,"col":938}}}},"status":400}
```
The Elasticsearch version is `7.9.3`.
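
For anyone who wants to confirm whether their cluster accepts what the 0.10.x search layer now sends, a minimal way to reproduce this outside DataHub is to post a `match_phrase_prefix` query that includes `zero_terms_query` directly to the policy index. This is just a sketch: it assumes an unauthenticated cluster at `localhost:9200`, the `datahubpolicyindex_v2` index from the error above, and an illustrative field name. On 7.9.x it should come back with the same 400 parsing error; on 7.10+ the query parses.

```python
# Reproduction sketch: send a match_phrase_prefix clause with zero_terms_query
# (the construct the error above complains about) straight to Elasticsearch.
# Host, index, and field name are assumptions for illustration.
import requests

ES_HOST = "http://localhost:9200"
INDEX = "datahubpolicyindex_v2"

query = {
    "query": {
        "match_phrase_prefix": {
            "urn": {                         # illustrative field name
                "query": "urn:li:dataHubPolicy",
                "zero_terms_query": "NONE",  # rejected by Elasticsearch < 7.10 for this query type
            }
        }
    }
}

resp = requests.post(f"{ES_HOST}/{INDEX}/_search", json=query, timeout=10)
if resp.status_code == 400:
    reason = resp.json()["error"]["root_cause"][0]["reason"]
    print(f"Rejected (likely pre-7.10 cluster): {reason}")
else:
    print(f"Query parsed, status {resp.status_code}")
```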

Unfortunately, 0.10.1+ is incompatible with Elasticsearch 7.9; you’ll need to upgrade your ES backend to 7.10 before upgrading DataHub: https://github.com/datahub-project/datahub/releases/tag/v0.10.1
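
Given that requirement, a cheap pre-flight check before running systemUpdate is to read the cluster version from the root endpoint and bail out if it is below 7.10. A minimal sketch, assuming the cluster is reachable without authentication:

```python
# Pre-flight sketch: verify the Elasticsearch backend meets the 7.10+ requirement
# for DataHub 0.10.1+ before starting the upgrade. Assumes no auth on the cluster.
import requests

ES_HOST = "http://localhost:9200"  # adjust to your cluster

version = requests.get(ES_HOST, timeout=10).json()["version"]["number"]
major, minor = (int(part) for part in version.split(".")[:2])

if (major, minor) < (7, 10):
    raise SystemExit(f"Elasticsearch {version} is too old for DataHub 0.10.1+; upgrade to 7.10 or newer")
print(f"Elasticsearch {version} meets the 7.10+ requirement")
```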

Ah, I see. I was using 7.9.3 for my local setup, but in production we are on 7.17.5.

I’m guessing 7.17.5 is not supported? Trying 7.10.1 locally seems to work, thanks.

Just as an update here: in production we were using a much newer version of Elasticsearch, 7.17.5, which was working fine until we switched to the 0.10.x release. We weren’t getting the exact same error as above in production, but it was still a 400-type error when fetching the policy index. Perhaps some features that 0.10.x uses were deprecated in Elasticsearch 7.17.5. I ended up switching to 7.10.1 and the upgrade worked okay, though I did have to do a full reindex in my case.
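
On the “full reindex” point: if that meant repopulating a fresh cluster from the primary store, the stock upgrade container also exposes a RestoreIndices upgrade ID for that, and on ECS it can be launched the same way as the systemUpdate sketch earlier in the thread, just with a different command. Again, every name below is a placeholder.

```python
# Sketch: the same one-off ECS task pattern as the systemUpdate example above,
# but running the RestoreIndices upgrade to rebuild search indices from the
# primary store. All resource names are placeholders.
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")      # placeholder region

ecs.run_task(
    cluster="datahub-cluster",                          # placeholder cluster name
    taskDefinition="datahub-restore-indices",           # placeholder task definition
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],    # placeholder subnet
            "securityGroups": ["sg-0123456789abcdef0"], # placeholder security group
            "assignPublicIp": "DISABLED",
        }
    },
    overrides={
        "containerOverrides": [
            {
                "name": "datahub-upgrade",              # placeholder container name
                "command": ["-u", "RestoreIndices"],    # upgrade ID that rebuilds the indices
            }
        ]
    },
)
```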

Hmm, we definitely have deployments working with 7.17, so that’s interesting :thinking_face: What was the exact error there?