Troubleshooting GMS Pods Restarting and Not Reaching "Ready" State

Original Slack Thread


We are seeing some strange behavior with our DataHub (v0.12.0).
First of all, the GMS pods in our OpenShift cluster keep restarting even though they have enough resources to do their work:

```- resources:
    limits:
      cpu: '4'
      memory: 16Gi
    requests:
      cpu: 400m
      memory: 16Gi```
But the main problem is that after the GMS pods restart, they never reach the "Ready" state (the status is 0/1 in OpenShift).
In the logs of these GMS pods I see that the pod cannot resolve the *datahub-gms-v0-12-0-hazelcast-svc* service:
```WARNING: []:5701 [dev] [5.3.1] DNS lookup for serviceDns 'datahub-gms-v0-12-0-hazelcast-svc' failed: unknown host```
But this service exists:
```oc get service | grep datahub-gms-v0-12-0-hazelcast-svc

NAME                                TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
datahub-gms-v0-12-0-hazelcast-svc   ClusterIP   None         <none>        5701/TCP   10d```
I also see that after a restart the GMS pods have only 2 open ports (in the normal state, port 8080 is open as well):
```netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 :::4318                 :::*                    LISTEN      33/java
tcp        0      0 :::5701                 :::*                    LISTEN      33/java```
So, what can I do in this situation?

Your cluster is likely missing a DNS provider such as CoreDNS, which provides name resolution within the cluster. Check whether you can enable DNS for your cluster, because the pods rely on it to communicate with each other.
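
One way to check whether in-cluster DNS is working is to try resolving the service name from a pod in the same namespace. The snippet below is a minimal sketch, not an official DataHub tool. It assumes Python 3 is available in the pod you run it from (the GMS image itself may not include it), that the cluster uses the default `cluster.local` domain, and `my-namespace` is a placeholder for your own project name. The helper `resolve_service` is a hypothetical name chosen for this example.

```python
import socket


def resolve_service(hostname, port=5701):
    """Return the sorted list of IP addresses that hostname resolves to.

    An empty list means the lookup failed, which corresponds to the
    'unknown host' warning in the Hazelcast log.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # The first name is taken from the log line in the thread.
    # "my-namespace" and the "cluster.local" suffix are assumptions;
    # replace them with the values for your cluster.
    for name in (
        "datahub-gms-v0-12-0-hazelcast-svc",
        "datahub-gms-v0-12-0-hazelcast-svc.my-namespace.svc.cluster.local",
        "kubernetes.default.svc.cluster.local",
    ):
        addresses = resolve_service(name)
        print(f"{name}: {addresses if addresses else 'lookup failed'}")
```

If every lookup fails, including `kubernetes.default.svc.cluster.local`, that would point to cluster DNS being unavailable, as suggested above. If only the Hazelcast service name fails, the issue is probably specific to that service rather than to DNS as a whole.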