Troubleshooting issues with a k8s deployment and helm upgrade on EKS

Original Slack Thread

Hi Team! We have been having some issues with a k8s deployment on EKS running version 0.9.5. I am attempting to upgrade in the hope that it resolves some of the ingestion problems we are having, but I am hitting a few roadblocks with the helm upgrade. Any insight here would be great.

Current Version: v0.9.5
Target Version: v0.9.6.1

Helm debug logs show that the following run successfully:

```
datahub-kafka-setup-job
datahub-postgresql-setup-job
```
It seems to be stuck on:
```datahub-system-update```
Logs from the datahub-system-update pod show the following errors:
```
ERROR 1 --- [ool-10-thread-1] c.l.m.s.elasticsearch.query.ESSearchDAO  : Search query failed
ERROR 1 --- [ool-10-thread-1] c.d.authorization.DataHubAuthorizer      : Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30

com.datahub.util.exception.ESQueryException: Search query failed
```
These are followed by a shutdown, and then the pod cycles through the same errors again.

The datahub-gms container is showing:
```ERROR c.l.m.d.producer.KafkaHealthChecker:67 - Kafka Health Check Failed.```
but I think it's unrelated.
Any insight here would be great.

Are you by any chance using your own helm values?

I was facing this exact issue last week. To fix it, I made sure to:
• run helm repo update datahub
• update my helm values so they include the new fields from the v13 update for both the prereq and datahub deployments (e.g. the new changes to the kafka values: https://github.com/acryldata/datahub-helm/blob/master/charts/prerequisites/values.yaml)
• helm uninstall, then reinstall
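
The three steps above can be sketched as a shell script. This is a minimal sketch, not the exact commands used in this thread: the release names (`prerequisites`, `datahub`), the chart names (`datahub/datahub-prerequisites`, `datahub/datahub`), and the values file names are assumptions to adjust for your setup. The `run` helper is a hypothetical wrapper that only prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
set -eu

# Assumed release and values file names; override via environment variables.
PREREQ_RELEASE="${PREREQ_RELEASE:-prerequisites}"
DATAHUB_RELEASE="${DATAHUB_RELEASE:-datahub}"
PREREQ_VALUES="${PREREQ_VALUES:-prerequisites-values.yaml}"
DATAHUB_VALUES="${DATAHUB_VALUES:-datahub-values.yaml}"

# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

reinstall_datahub() {
  # 1. Refresh the local copy of the chart repo (assumed to be added as "datahub").
  run helm repo update datahub
  # 2. Remove the existing releases, application first, then its prerequisites.
  run helm uninstall "$DATAHUB_RELEASE"
  run helm uninstall "$PREREQ_RELEASE"
  # 3. Reinstall with the updated values files, prerequisites first.
  run helm install "$PREREQ_RELEASE" datahub/datahub-prerequisites --values "$PREREQ_VALUES"
  run helm install "$DATAHUB_RELEASE" datahub/datahub --values "$DATAHUB_VALUES"
}

reinstall_datahub
```

Run as-is, the script only prints the five helm commands so they can be reviewed; setting `DRY_RUN=0` executes them. Uninstalling removes the releases, so it is worth checking what happens to persisted data in your environment before running it for real.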

Or, since it looks like you want to deploy an earlier version of DataHub, make sure you are using the correct kafka chart version (https://github.com/acryldata/datahub-helm/blob/master/charts/prerequisites/Chart.yaml)
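
One way to check chart versions before upgrading is sketched below. It assumes the repo was added locally as `datahub` and that the charts are named `datahub/datahub` and `datahub/datahub-prerequisites`; the function names and the `run` helper are hypothetical, and the chart version that matches a given DataHub release is not stated in this thread, so it is passed in as an argument rather than hard-coded.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Show which chart versions are currently deployed, then list every published
# chart version together with the app version it ships.
inspect_versions() {
  run helm list
  run helm search repo datahub/datahub --versions
}

# Print Chart.yaml for one prerequisites chart version; its dependencies
# section lists the pinned sub-chart versions (kafka among them).
# $1 = prerequisites chart version
show_prereq_chart() {
  run helm show chart datahub/datahub-prerequisites --version "$1"
}

# Upgrade a release to an explicitly pinned chart version.
# $1 = release name, $2 = chart, $3 = chart version, $4 = values file
pinned_upgrade() {
  run helm upgrade --install "$1" "$2" --version "$3" --values "$4"
}

# Example (chart version is a placeholder to fill in):
#   pinned_upgrade datahub datahub/datahub <chart-version> datahub-values.yaml
```

Pinning `--version` explicitly avoids silently picking up the latest chart, which may bundle a different DataHub or kafka version than the one you intend to deploy.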

Yes, we are using our own helm values, and yeah, we are doing a minor update. Hmmm, I’ll compare the kafka chart versions to see if that is the issue. Thank you, and I will post my results.

Looks like that was it. I was using the wrong chart version for my upgrade! My ingestion pipeline still isn’t working, but I can rule out upgrading as the fix! Thank you again!