Discussion on Upgrading DataHub from Elasticsearch 7.x to Elasticsearch 8

Original Slack Thread

Hello community members,
Reposting this question.
I’m reaching out to ask whether anyone in the community has experience deploying DataHub (v0.10 to v0.12) with Elasticsearch 8. We’re exploring an upgrade and would greatly appreciate any insights, tips, or guidance. The reason for this inquiry is that Elasticsearch 7.x will no longer be supported once Elasticsearch 9 is released, and our organization’s security stance requires all deployed products to be fully supported.
<@U03MF8MU5P0> - are you able to share your input?

Bumping this question again.

Hi Junjie! <@U05SKM6KGGK> might be able to provide some guidance here

Unfortunately, Elasticsearch 8.x is not supported. We use the OpenSearch client, which claims compatibility only up to Elasticsearch 7.

Then what is the recommendation for the community? It is very likely that ES 7.x will reach EOL soon, maybe as early as the first quarter of this year.

For now that’s more of a user decision at this point as it depends on your policies. OpenSearch could be used instead, or choose a vendor who can provide longer term support for Elasticsearch 7.x, etc.
On the DataHub side, there’s currently no timeline for Elasticsearch 8+ support afaik. We would need to support OpenSearch as well, and it doesn’t look like there’s an OpenSearch client compatible with Elasticsearch 8+. We might end up having two different clients.

> For now that’s more of a user decision at this point as it depends on your policies.
The current Quickstart setup in the Docker folder and the Helm chart for DataHub still use Elasticsearch. It would be beneficial for DataHub to offer clearer guidance on this matter. New users may not be aware of alternatives such as OpenSearch, and relying on the default configuration provided by DataHub could lead to concerns when they discover that support for their stack is reaching end-of-life soon.

Support for Elasticsearch 8+ is something that we would definitely accept as a community contribution or support at some point. We do not currently have guidance on production deployments of Elasticsearch; that’s hopefully something that will be covered in the future on the topic of deploying DataHub in production. Also, fwiw, quickstart is recommended for local deployment only, so the EOL impact is smaller.

Would suggest creating an issue at https://github.com/datahub-project/datahub/issues requesting Elasticsearch 8+ support so we can better track and gauge interest from the community.

> We do not currently have guidance on production deployments of Elasticsearch
The Helm chart for DataHub prerequisites uses ES7. It’s actually the OpenSearch integration that’s lacking guidance.

Just to summarize - right now the guidance is that you can use any version of Elasticsearch, MySQL, or Kafka that is compatible with the clients that DataHub currently supports. For Elasticsearch, the supported versions are any compatible with Elasticsearch OSS 7 and OpenSearch 1.x and 2.x.
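
For anyone who wants to check an existing cluster against the versions listed above, here is a rough sketch. It assumes the cluster's root endpoint returns the usual JSON with a `version.number` field, and that OpenSearch additionally reports `version.distribution` as `"opensearch"`. The helper names (`classify`, `is_supported`, `fetch_cluster_info`) and the `SUPPORTED` table are made up for illustration and are not part of DataHub; the version table only mirrors what was stated in this thread and may be out of date.

```python
import json
import urllib.request
from typing import Tuple

# Assumption: major versions taken from the summary in this thread
# (Elasticsearch 7, OpenSearch 1.x and 2.x). Not an official DataHub list.
SUPPORTED = {
    "elasticsearch": {7},
    "opensearch": {1, 2},
}


def classify(info: dict) -> Tuple[str, int]:
    """Return (distribution, major_version) from a cluster root response."""
    version = info["version"]
    # OpenSearch reports "distribution": "opensearch"; when the field is
    # absent we assume the cluster is Elasticsearch.
    distribution = version.get("distribution", "elasticsearch")
    major = int(version["number"].split(".")[0])
    return distribution, major


def is_supported(info: dict) -> bool:
    """Check the cluster against the (assumed) SUPPORTED table."""
    distribution, major = classify(info)
    return major in SUPPORTED.get(distribution, set())


def fetch_cluster_info(base_url: str) -> dict:
    """GET the cluster root endpoint and parse the JSON body.

    No authentication or TLS handling here; add whatever your cluster needs.
    """
    with urllib.request.urlopen(base_url, timeout=10) as resp:
        return json.load(resp)
```

Usage would look something like `is_supported(fetch_cluster_info("http://localhost:9200"))`, where the URL is only a placeholder for your own cluster. This only compares the reported major version; it does not prove that every API DataHub relies on behaves the same on a given vendor's build.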

The Elasticsearch versions used in quickstart and prerequisites do NOT mean that’s the Elasticsearch vendor or version that we recommend - these are meant for testing DataHub locally.

We have users using Elastic Cloud, Managed Elasticsearch on AWS, Amazon OpenSearch, etc. Which Elasticsearch vendor to use is your choice/whatever suits your requirements best - it just needs to be compatible with the client libraries that DataHub uses.

If you would like us to support some specific Elasticsearch OSS version, please file an issue at https://github.com/datahub-project/datahub/issues