Troubleshooting ElasticsearchSetupJob on AWS OpenSearch and Terraforming Policy Creation

Do you think we will have to bump it to 7.10?

ES 6 is definitely too old to run DataHub these days. Currently you can run Elasticsearch 7.10+ and OpenSearch 1.x. I am also working on supporting OpenSearch 2.x in this PR (https://github.com/datahub-project/datahub/pull/8852), planned for the next release.

Got it. I have upgraded our AWS ES to 7.10

and bumped the upgrade and update resources to fix the OOMs.

I will keep you guys posted once it's done.

Bumping the resources and updating the ES version fixed the issue. Thanks for your help <@UV5UEC3LN> and <@U02TYQ4SPPD>

Just curious regarding aws_es_ism_policy.json (https://github.com/datahub-project/datahub/blob/99d7eb756c09a3313a4c1bda6f96a0953004b58c/metadata-service/restli-servlet-impl/src/main/resources/index/usage-event/aws_es_ism_policy.json):
Is there a way we can terraform this? Our internal infra relies on Terraform to deploy any policies.

You’ll have to create the resources with Terraform, not just the policy. The index template would need to be created too. Then set DATAHUB_ANALYTICS_ENABLED=false for the elasticsearch-setup job only, and keep it enabled for the other pods.

Yeah, the resources were created already and I am able to get the ES instance provisioned. Now I will add the policy and the index template to it via Terraform as well.

Any specific reason why the flag should stay enabled for the other pods?

Disabling it there would stop tracking events from being written to that index, and the dashboards would show no data.

Got it. Thank you :+1:
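
For anyone replicating the policy and index template outside the setup job, it may help to see which REST calls the Terraform resources ultimately have to reproduce. The sketch below only builds the requests and does not send them. The policy and template names are hypothetical placeholders, and whether the template is a legacy (`_template`) or composable (`_index_template`) one is not stated in this thread, so it is left as a parameter. The only assumptions taken as fact are that Open Distro on AWS Elasticsearch 7.10 serves ISM under `_opendistro` and that OpenSearch serves it under `_plugins`.

```python
import json

# Hypothetical names for illustration; use whatever your deployment expects.
POLICY_ID = "datahub_usage_event_policy"
TEMPLATE_NAME = "datahub_usage_event_index_template"

# ISM lives under a different plugin prefix depending on the engine.
ISM_PREFIX = {"elasticsearch": "_opendistro", "opensearch": "_plugins"}


def ism_policy_path(policy_id, engine):
    """Return the ISM policy endpoint for 'elasticsearch' (Open Distro) or 'opensearch'."""
    return f"/{ISM_PREFIX[engine]}/_ism/policies/{policy_id}"


def build_requests(policy, template, engine, template_api="_template"):
    """Build (method, path, body) tuples for the ISM policy and the index template.

    template_api is '_template' (legacy) or '_index_template' (composable);
    pick the one that matches the template JSON you are deploying.
    """
    if template_api not in ("_template", "_index_template"):
        raise ValueError("template_api must be '_template' or '_index_template'")
    return [
        ("PUT", ism_policy_path(POLICY_ID, engine), json.dumps(policy)),
        ("PUT", f"/{template_api}/{TEMPLATE_NAME}", json.dumps(template)),
    ]


if __name__ == "__main__":
    # Placeholder bodies; in practice these come from the JSON files in the DataHub repo.
    for method, path, _body in build_requests({"policy": {}}, {"index_patterns": []}, "elasticsearch"):
        print(method, path)
```

Authentication (SigV4 or basic auth on AWS) is deliberately left out; a Terraform provider would handle that part itself.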

<@U05KVNAL068>, could you share any custom settings you are using in the ElasticSearchSetupJob? I am specifically trying to configure the number of nodes and shards in OpenSearch on AWS.

          resources:
            {{- toYaml .Values.elasticsearchSetupJob.resources | nindent 12 }}
        {{- with .Values.elasticsearchSetupJob.extraSidecars }}
          {{- toYaml . | nindent 8 }}
        {{- end }}
      {{- with .Values.elasticsearchSetupJob.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.elasticsearchSetupJob.affinity }}

Thanks!

<@U03E5L238D8> - The number of nodes in OpenSearch is not controlled by DataHub when deploying on AWS managed OpenSearch; only the shards are controllable, via Helm, here: https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/values.yaml#L347 (just making sure these threads are tied up in case someone else comes by)

<@U03E5L238D8> our OpenSearch provisioning happens outside of the DataHub installation, via Terraform.

Let me know if you want me to share the Terraform we use to create the resource.

Thanks <@U03MF8MU5P0>! I was able to recover from the “999 out of 1000 shards in use” error. My AWS team deployed another node to our single-node managed OpenSearch cluster and I was finally able to get the DataHubSystemUpdateJob to run successfully. I then checked OpenSearch and saw about 260 shards. <@U05KVNAL068>, thanks for also following up! I would appreciate the Terraform script so I can share it with my AWS team. Thanks!
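
A note on why adding a node cleared the shard error: the cluster-wide limit is `cluster.max_shards_per_node` (default 1000) multiplied by the number of data nodes, so a single-node cluster tops out at 1000 open shards and a second node raises the ceiling to 2000. A minimal sketch of that arithmetic, assuming the default setting is in effect (the function name is made up for illustration):

```python
DEFAULT_MAX_SHARDS_PER_NODE = 1000  # default value of cluster.max_shards_per_node


def shard_headroom(data_nodes, open_shards, max_per_node=DEFAULT_MAX_SHARDS_PER_NODE):
    """Return how many more shards can be opened before hitting the cluster-wide limit."""
    if data_nodes < 1:
        raise ValueError("data_nodes must be at least 1")
    return data_nodes * max_per_node - open_shards


if __name__ == "__main__":
    print(shard_headroom(1, 999))  # 1: one node, almost at the limit
    print(shard_headroom(2, 999))  # 1001: a second node doubles the ceiling
```

Adding nodes is one way out; lowering the shard count per index or cleaning up unused indices reduces `open_shards` instead.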