Troubleshooting AWS Load Balancer and Frontend Connectivity Issues in DataHub on EKS with Error Logs

Original Slack Thread

Hi, we recently installed DataHub (version 0.12.0) on EKS in AWS. We have been running into multiple issues while trying to get the AWS load balancer up and running, and we could not connect to the frontend. I am attaching the error logs from the GMS pod and from the web UI (frontend) instance. We would appreciate any help fixing these errors.

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Which DataHub version are you using? (e.g. 0.12.0)
  2. Please post any relevant error logs on the thread!

DataHub version: 0.12.0

We ran into these errors while trying to access the frontend:

Logs from GMS:
2024-02-14 14:46:15,044 [pool-17-thread-1] WARN org.opensearch.client.RestClient:85 - request [POST http://elasticsearch-master:9200/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=false] returned 1 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."

Logs from the frontend instance:
23:34:49 [Thread-8] WARN n.g.e.SimpleDataFetcherExceptionHandler - Exception while fetching data (/corpUser) : java.lang.RuntimeException: Failed to retrieve entities of type CorpUser
java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to retrieve entities of type CorpUser
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to retrieve entities of type CorpUser
at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$68(GmsGraphQLEngine.java:521)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
… 1 common frames omitted
Caused by: java.lang.RuntimeException: Failed to batch load Datasets
at com.linkedin.datahub.graphql.types.corpuser.CorpUserType.batchLoad(CorpUserType.java:67)
at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$68(GmsGraphQLEngine.java:519)
… 2 common frames omitted
Caused by: com.linkedin.r2.RemoteInvocationException: com.linkedin.r2.RemoteInvocationException: Received error 401 from server for URI http://datahub-datahub-gms:8080/corpUsers
at com.linkedin.restli.internal.client.ExceptionUtil.wrapThrowable(ExceptionUtil.java:135)
at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponseImpl(ResponseFutureImpl.java:130)
at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponse(ResponseFutureImpl.java:94)
at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponseEntity(ResponseFutureImpl.java:173)
at com.linkedin.BatchGetUtils.batchGet(BatchGetUtils.java:54)
at com.linkedin.identity.client.CorpUsers.batchGet(CorpUsers.java:71)
at com.linkedin.datahub.graphql.types.corpuser.CorpUserType.batchLoad(CorpUserType.java:57)
… 3 common frames omitted
Caused by: com.linkedin.r2.RemoteInvocationException: Received error 401 from server for URI http://datahub-datahub-gms:8080/corpUsers
at com.linkedin.restli.internal.client.ExceptionUtil.exceptionForThrowable(ExceptionUtil.java:98)
at com.linkedin.restli.client.RestLiCallbackAdapter.convertError(RestLiCallbackAdapter.java:66)
at com.linkedin.common.callback.CallbackAdapter.onError(CallbackAdapter.java:86)
at com.linkedin.r2.message.timing.TimingCallback.onError(TimingCallback.java:81)
at com.linkedin.r2.transport.common.bridge.client.TransportCallbackAdapter.onResponse(TransportCallbackAdapter.java:47)
at com.linkedin.r2.filter.transport.FilterChainClient.lambda$createWrappedClientTimingCallback$0(FilterChainClient.java:113)
at com.linkedin.r2.filter.transport.ResponseFilter.onRestError(ResponseFilter.java:79)
at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
at com.linkedin.r2.filter.transport.ClientRequestFilter.lambda$createCallback$0(ClientRequestFilter.java:102)
at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:82)
at com.linkedin.r2.transport.http.client.rest.ExecutionCallback.lambda$onResponse$0(ExecutionCallback.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
… 1 common frames omitted
Caused by: com.linkedin.r2.message.rest.RestException: Received error 401 from server for URI http://datahub-datahub-gms:8080/corpUsers
at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:76)
… 4 common frames omitted
23:35:57 [Thread-37] WARN n.g.e.SimpleDataFetcherExceptionHandler - Exception while fetching data (/search) : java.lang.RuntimeException: Failed to execute search: entity type DATASET, query *, filters: , start: 0, count: 20
java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to execute search: entity type DATASET, query *, filters: , start: 0, count: 20
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to execute search: entity type DATASET, query *, filters: , start: 0, count: 20
at com.linkedin.datahub.graphql.resolvers.search.SearchResolver.lambda$get$1(SearchResolver.java:62)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
… 1 common frames omitted
Caused by: com.linkedin.r2.RemoteInvocationException: com.linkedin.r2.RemoteInvocationException: Received error 401 from server for URI http://datahub-datahub-gms:8080/datasets
at com.linkedin.restli.internal.client.ExceptionUtil.wrapThrowable(ExceptionUtil.java:135)
at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponseImpl(ResponseFutureImpl.java:130)
at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponse(ResponseFutureImpl.java:94)
at com.linkedin.dataset.client.Datasets.search(Datasets.java:100)
at com.linkedin.dataset.client.Datasets.search(Datasets.java:85)
at com.linkedin.datahub.graphql.types.dataset.DatasetType.search(DatasetType.java:103)
at com.linkedin.datahub.graphql.resolvers.search.SearchResolver.lambda$get$1(SearchResolver.java:53)
… 2 common frames omitted
Caused by: com.linkedin.r2.RemoteInvocationException: Received error 401 from server for URI http://datahub-datahub-gms:8080/datasets
at com.linkedin.restli.internal.client.ExceptionUtil.exceptionForThrowable(ExceptionUtil.java:98)
at com.linkedin.restli.client.RestLiCallbackAdapter.convertError(RestLiCallbackAdapter.java:66)
at com.linkedin.common.callback.CallbackAdapter.onError(CallbackAdapter.java:86)
at com.linkedin.r2.message.timing.TimingCallback.onError(TimingCallback.java:81)
at com.linkedin.r2.transport.common.bridge.client.TransportCallbackAdapter.onResponse(TransportCallbackAdapter.java:47)
at com.linkedin.r2.filter.transport.FilterChainClient.lambda$createWrappedClientTimingCallback$0(FilterChainClient.java:113)
at com.linkedin.r2.filter.transport.ResponseFilter.onRestError(ResponseFilter.java:79)
at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
at com.linkedin.r2.filter.transport.ClientRequestFilter.lambda$createCallback$0(ClientRequestFilter.java:102)
at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:82)
at com.linkedin.r2.transport.http.client.rest.ExecutionCallback.lambda$onResponse$0(ExecutionCallback.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
… 1 common frames omitted
Caused by: com.linkedin.r2.message.rest.RestException: Received error 401 from server for URI http://datahub-datahub-gms:8080/datasets
at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:76)
… 4 common frames omitted

So, in a nutshell: the load balancer was deployed, but we could not access the frontend because of a "backend service does not exist" error.

Looks like it might be unrelated to the load balancer:

> Received error 401 from server for URI
The token the frontend uses to communicate with GMS is not being accepted by GMS (HTTP 401 Unauthorized).
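To confirm that the 401s are the only GMS failures in a long frontend log, the root-cause lines can be counted with a small script. This is a minimal Python sketch, not a DataHub tool; the function names are hypothetical. It relies on one observation from the pasted trace: each failed call repeats the "Received error ... for URI ..." message in several wrapper exceptions, but only once on the innermost `RestException` line, so matching that line counts each failed request once.

```python
import re
from collections import Counter

# Innermost "Caused by" line of a failed rest.li call, e.g.
# "...message.rest.RestException: Received error 401 from server for URI http://datahub-datahub-gms:8080/corpUsers".
# The outer RemoteInvocationException lines repeat the same text, so matching
# only "RestException:" avoids counting one request several times.
ROOT_CAUSE_RE = re.compile(
    r"RestException: Received error (\d{3}) from server for URI (\S+)"
)


def summarize_gms_errors(log_text: str) -> Counter:
    """Count failed frontend-to-GMS calls, keyed by (HTTP status, URI)."""
    return Counter(
        (int(m.group(1)), m.group(2)) for m in ROOT_CAUSE_RE.finditer(log_text)
    )


def auth_failures(counts: Counter) -> Counter:
    """Keep only 401 (Unauthorized) and 403 (Forbidden) results."""
    return Counter({key: n for key, n in counts.items() if key[0] in (401, 403)})
```

Usage: read the saved output of `kubectl logs <frontend-pod>` into a string and pass it to `summarize_gms_errors`. If every entry is a 401 against the GMS service URL, the frontend is reaching GMS over the network and being rejected at authentication, which matches the diagnosis above.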

Thank you <@U05SKM6KGGK>, I will take a look. Any additional insights would help too.

Check whether anything changed in the frontend deployment, for example whether DATAHUB_SECRET changed.

These two as well: https://github.com/acryldata/datahub-helm/blob/ecb168d9255435476311c6d6808bf214f587e72c/charts/datahub/subcharts/datahub-frontend/templates/deployment.yaml#L196-L202
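The checks suggested above (did DATAHUB_SECRET change, and do the frontend's GMS credentials match what GMS expects) amount to comparing environment variables between two Deployment specs. This is a minimal Python sketch under stated assumptions: it works on parsed `kubectl get deployment <name> -o json` output, and the variable names in `SYSTEM_AUTH_KEYS` are an assumption about what the linked Helm template lines define, not something confirmed in this thread.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# ASSUMPTION: names of the system-client variables the frontend uses to
# authenticate to GMS. Verify against the deployment.yaml lines linked above.
SYSTEM_AUTH_KEYS = ("DATAHUB_SYSTEM_CLIENT_ID", "DATAHUB_SYSTEM_CLIENT_SECRET")


def container_env(deployment: Dict[str, Any], container: Optional[str] = None) -> Dict[str, Any]:
    """Map env var name -> literal value, or its valueFrom reference dict.

    `deployment` is a parsed Kubernetes Deployment (e.g. from
    `kubectl get deployment <name> -o json`). Uses the named container,
    or the first container when no name is given.
    """
    containers = deployment["spec"]["template"]["spec"]["containers"]
    if container is not None:
        containers = [c for c in containers if c["name"] == container]
    if not containers:
        raise ValueError(f"container {container!r} not found in deployment")
    env: Dict[str, Any] = {}
    for var in containers[0].get("env", []):
        # Secret-backed vars are kept as their reference ({"secretKeyRef": {...}}),
        # so two deployments pointing at different Secrets/keys compare as different.
        env[var["name"]] = var["valueFrom"] if "valueFrom" in var else var.get("value", "")
    return env


def env_mismatches(
    a: Dict[str, Any], b: Dict[str, Any], keys: Iterable[str]
) -> List[Tuple[str, Any, Any]]:
    """Return (key, value_in_a, value_in_b) for each key set differently (None = unset)."""
    return [(k, a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)]
```

Usage: load the JSON for the frontend and GMS deployments with `json.load`, pass each through `container_env`, then call `env_mismatches(frontend_env, gms_env, SYSTEM_AUTH_KEYS)`. To check whether DATAHUB_SECRET changed, compare the current frontend deployment against a saved copy of the earlier one with `keys=("DATAHUB_SECRET",)`. Note that this only compares how each value is wired: if both sides reference the same Secret and key, they match even if that Secret's contents were rotated, so a clean result does not rule out a changed secret value.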

Thanks, I will check these out.