Troubleshooting ingestion sources retrieval issue in Datahub-GMS

Original Slack Thread

Hey team :wave:
I’m having an issue in datahub-gms.
Failed to retrieve ingestion sources! Skipping updating schedule cache until next refresh. start: 0, count: 30
I was running datahub-gms v0.10.4, which is where the issue started. I've since upgraded to the latest version (just in case), but I'm still getting the same error. This is the full log:
Thanks!

```
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 2023-11-02 14:54:43.134:INFO:oejshC.ROOT:main: 1 Spring WebApplicationInitializers detected on classpath
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 2023-11-02 14:54:43.158:INFO:oejs.session:main: DefaultSessionIdManager workerName=node0
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 2023-11-02 14:54:43.158:INFO:oejs.session:main: No SessionScavenger set, using defaults
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 2023-11-02 14:54:43.160:INFO:oejs.session:main: node0 Scavenging every 600000ms
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 2023-11-02 14:54:43.168:INFO:oejshC.ROOT:main: Initializing Spring root WebApplicationContext
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 2023-11-02 14:55:56,675 [pool-23-thread-1] ERROR c.d.m.ingestion.IngestionScheduler:244 - Failed to retrieve ingestion sources! Skipping updating schedule cache until next refresh. start: 0, count: 30
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms com.linkedin.r2.RemoteInvocationException: com.linkedin.r2.RemoteInvocationException: Failed to get response from server for URI http://localhost:8080/entities
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	at com.linkedin.restli.internal.client.ExceptionUtil.wrapThrowable(ExceptionUtil.java:135)
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponseImpl(ResponseFutureImpl.java:130)
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponse(ResponseFutureImpl.java:94)
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	at com.linkedin.common.client.BaseClient.sendClientRequest(BaseClient.java:55)
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	at com.linkedin.entity.client.RestliEntityClient.list(RestliEntityClient.java:393)
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	at com.datahub.metadata.ingestion.IngestionScheduler$BatchRefreshSchedulesRunnable.run(IngestionScheduler.java:220)
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	at java.base/java.lang.Thread.run(Thread.java:829)
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms Caused by: com.linkedin.r2.RemoteInvocationException: Failed to get response from server for URI http://localhost:8080/entities
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:67)
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	at com.linkedin.r2.transport.http.client.rest.ExecutionCallback.lambda$onResponse$0(ExecutionCallback.java:64)
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms 	... 3 common frames omitted
datahub-datahub-gms-5f6cf9db66-6bld2 datahub-gms Caused by: com.linkedin.r2.RetriableRequestException: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
```

This is my current configuration:

```
  enabled: true
  image:
    repository: linkedin/datahub-gms
    tag: "v0.12.0"
  service:
    type: ClusterIP
  extraVolumes:
    - name: logback-configmap-volume
      configMap:
        name: logback-configmap
  extraVolumeMounts:
    - name: logback-configmap-volume
      mountPath: /config/logback.xml
      subPath: logback.xml
  extraEnvs:
    - name: JAVA_OPTS
      value: "-Dlogback.configurationFile=/config/logback.xml"
```

It seems that a few people are reporting similar issues. Did anybody find any clue about this? :pray::skin-tone-2:
https://github.com/datahub-project/datahub/issues/7877
https://github.com/datahub-project/datahub/issues/7287

The workaround is to downgrade DataHub to v0.9.6.1, as shixiutao mentions in this comment:
https://github.com/datahub-project/datahub/issues/7877#issuecomment-1792165537
Sad not to be able to use the latest version :cry:
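
For reference, a minimal sketch of that downgrade, assuming the same values layout as the configuration above and that only the GMS image tag changes. Whether the other DataHub components or chart settings also need to be pinned is not covered in this thread, so treat it as a starting point rather than a verified fix:

```yaml
  image:
    repository: linkedin/datahub-gms
    # Assumption: pin GMS to the version reported as working in the linked comment
    tag: "v0.9.6.1"
```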