Hi all,
I’m new to DataHub and have spent the last week stuck on a problem that started after upgrading to 0.12.0 with ArgoCD. I’m using all defaults for the config, except that we set replicas to 2 to stop the ES cluster failing to reach quorum. At the moment, datahub-gms is unhealthy and I can’t figure out why. I’ll post some logs in a thread under this. Many thanks in advance!
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Which DataHub version are you using? (e.g. 0.12.0)
- Please post any relevant error logs on the thread!
```
Warnings: [Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security., [ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.]
{"error":{"root_cause":[{"type":"query_shard_exception","reason":"[simple_query_string] analyzer [query_word_delimited] not found","index_uuid":"O6R7NsmaQjCFNZSUGrmkTg","index":"datahubpolicyindex_v2"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"datahubpolicyindex_v2","node":"-8stBPQaRGWJ5ZkBIwW4yA","reason":{"type":"query_shard_exception","reason":"[simple_query_string] analyzer [query_word_delimited] not found","index_uuid":"O6R7NsmaQjCFNZSUGrmkTg","index":"datahubpolicyindex_v2"}}]},"status":400}
at org.opensearch.client.RestClient.convertResponse(RestClient.java:375)
at org.opensearch.client.RestClient.performRequest(RestClient.java:345)
at org.opensearch.client.RestClient.performRequest(RestClient.java:320)
at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1918)
... 17 common frames omitted
2023-11-15 21:58:20,443 [pool-8-thread-1] ERROR c.d.authorization.DataHubAuthorizer:252 - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
com.datahub.util.exception.ESQueryException: Search query failed:
at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:106)
at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.search(ESSearchDAO.java:203)
at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:121)
at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:112)
at com.linkedin.metadata.client.JavaEntityClient.search(JavaEntityClient.java:336)
at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:51)
at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:43)
at com.datahub.authorization.DataHubAuthorizer$PolicyRefreshRunnable.run(DataHubAuthorizer.java:245)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.opensearch.OpenSearchStatusException: OpenSearch exception [type=search_phase_execution_exception, reason=all shards failed]
at org.opensearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:209)
at org.opensearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2235)
at org.opensearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2212)
at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1931)
at org.opensearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1884)
at org.opensearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1852)
at org.opensearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1095)
at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:99)
... 13 common frames omitted
Suppressed: org.opensearch.client.ResponseException: method [POST], host [http://elasticsearch-master:9200], URI [/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
Warnings: [Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security., [ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.]
{"error":{"root_cause":[{"type":"query_shard_exception","reason":"[simple_query_string] analyzer [query_word_delimited] not found","index_uuid":"O6R7NsmaQjCFNZSUGrmkTg","index":"datahubpolicyindex_v2"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"datahubpolicyindex_v2","node":"-8stBPQaRGWJ5ZkBIwW4yA","reason":{"type":"query_shard_exception","reason":"[simple_query_string] analyzer [query_word_delimited] not found","index_uuid":"O6R7NsmaQjCFNZSUGrmkTg","index":"datahubpolicyindex_v2"}}]},"status":400}
at org.opensearch.client.RestClient.convertResponse(RestClient.java:375)
at org.opensearch.client.RestClient.performRequest(RestClient.java:345)
at org.opensearch.client.RestClient.performRequest(RestClient.java:320)
at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1918)
... 17 common frames omitted
2023-11-15 21:58:56,101 [R2 Nio Event Loop-1-1] WARN c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
Caused by: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:829)
```
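For what it’s worth, the root cause buried in that 400 response is the index missing the custom `query_word_delimited` analyzer. A quick way to confirm (just a sketch: the host and index names are copied from the log above, so adjust them for your environment and run this from somewhere that can reach Elasticsearch):

```shell
# Dump the settings of the failing index and count occurrences of the
# analyzer the query expects. A count of 0 means the analyzer is not
# defined on the index, i.e. the index settings were never updated for
# the new DataHub version.
curl -s 'http://elasticsearch-master:9200/datahubpolicyindex_v2/_settings?pretty' \
  | grep -c 'query_word_delimited'
```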
Did the datahub system-update job succeed for you when you synced in Argo? `Failed to create channel...` is the log line that stands out to me, and if you search for it in this Slack, it’s mostly issues relating to the system update not succeeding.
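One way to check that from the cluster side (a sketch: the `datahub` namespace is an assumption, and the exact job name depends on your release name, so the name below is a placeholder):

```shell
# List the jobs created by the chart's hooks; look for a *system-update* entry.
kubectl get jobs -n datahub

# Then check its logs, substituting the job name found above:
kubectl logs -n datahub job/<system-update-job-name> --tail=100
```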
Thanks for your assistance <@U05JJ9WESHL>. I didn’t know about the datahub-upgrade job: https://datahubproject.io/docs/docker/datahub-upgrade
I can’t, however, see any such job at all in ArgoCD. I can see it in the helm charts though. What am I missing?
Is it enabled in your values file? Here’s what my entry looks like in values.yaml:
```
enabled: true
image:
  repository: acryldata/datahub-upgrade
podSecurityContext: {}
securityContext: {}
annotations:
  helm.sh/hook: pre-install,pre-upgrade
  helm.sh/hook-weight: "-4"
  helm.sh/hook-delete-policy: before-hook-creation
podAnnotations: {}
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 300m
    memory: 256Mi
extraSidecars: []
extraInitContainers: []
```
And the `datahub-system-update-job.yml` template (charts/datahub/templates/datahub-upgrade/datahub-system-update-job.yml at commit 6311fce06c11ce21c5c3edabf68b71fff88c027c in acryldata/datahub-helm) also requires it to be enabled under `global`:
```
...
datahub:
  systemUpdate:
    enabled: true
...
```
Thanks <@U05JJ9WESHL> I see those settings in the values file in version 0.3.10 of the chart, and then in our local values file:
```
enabled: true
image:
  repository: acryldata/datahub-upgrade
  tag: "v0.12.0"
noCodeDataMigration:
  sqlDbType: "MYSQL"
```
and no overrides in our local `global` section
I just tried syncing again and, yeah, there’s no system-update or upgrade job to be seen.
Do the other jobs run? I have five: system-update, elasticsearch setup, kafka setup, nocode migration, postgres setup.
Oh, hang on, I’m blind. It’s called `datahub-datahub-upgrade-job`.
It says it’s healthy
Mind you, there don’t seem to be any logs for it since April.