Do I Need the Kafka Schema Registry Component for DataHub Ingestion on Kubernetes with Strimzi Operator?

Original Slack Thread

Is the Kafka schema registry component strictly required? It seems I’m able to successfully ingest from multiple sources (Glue, Redshift) using the datahub-rest sink without it, although I still get tons of logs in datahub-gms about Avro deserialization failing. I’m deploying on Kubernetes and using the Strimzi operator to define Kafka. Strimzi has no official schema registry support, so if I can avoid installing something like the Confluent schema registry, I will.
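For reference, the REST-based ingestion described in the question can be sketched as a recipe. This is a minimal illustration, not the poster's actual configuration: the GMS URL and AWS region are placeholder assumptions, and `build_recipe` is a hypothetical helper name.

```python
import json


def build_recipe(gms_url: str, aws_region: str) -> dict:
    """Build a DataHub ingestion recipe that sends Glue metadata to GMS over REST."""
    return {
        # Source side: the Glue connector. The region is a placeholder assumption.
        "source": {"type": "glue", "config": {"aws_region": aws_region}},
        # Sink side: datahub-rest posts metadata to GMS over HTTP
        # rather than producing messages to Kafka.
        "sink": {"type": "datahub-rest", "config": {"server": gms_url}},
    }


# Placeholder values; substitute the GMS service URL and region for your cluster.
recipe = build_recipe("http://datahub-gms:8080", "us-east-1")
print(json.dumps(recipe, indent=2))
```

With the `acryl-datahub` package installed, a dict of this shape can be handed to `Pipeline.create(recipe)` from `datahub.ingestion.run.pipeline` and then run, or written out as the equivalent YAML recipe for the `datahub ingest` CLI.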

similar issue, interested in the solution to your problem

<@U04UKA5L5LK> might be able to speak to this!

Hi, we do use the schema registry for the setup step, e.g. creating the Elasticsearch indices. It is also used in the upgrade step if the schema changes. So I think you may not need it for product usage, but it is used during the setup / deployment step.

Got it - thanks for the response! The Helm charts reference INTERNAL, KAFKA and AWS_GLUE registry options. Is it possible to select INTERNAL or AWS_GLUE specifically to avoid the Confluent schema registry? What is the difference here?

The last two releases have issues with the INTERNAL and AWS_GLUE options, but AWS Glue support will be fixed in the next release!

That’s another thing, I’m on chart release 0.2.179 - is that version affected by these issues? datahub-gms version in that chart is 0.2.150

Yes, that version is affected

I think this one should work: https://github.com/acryldata/datahub-helm/commit/f5d8134b0e978d30f9d30437b771af29f9cd1e7c

I just want to call out that this would be v0.10.3 of datahub and we are on v0.10.5 now. So you would not have the features from the last two releases

Gotcha. To clarify - if the INTERNAL and AWS_GLUE registries could be used, then datahub would not be using the confluent (or other kafka) schema registry component?

That’s right!
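As a rough sketch of what selecting one of these registry types might look like, the snippet below builds a Helm values override. The key path `global.kafka.schemaregistry.type` and the `glue.region` / `glue.registry` keys are assumptions about the chart layout and should be checked against the `values.yaml` of the chart version in use; `schema_registry_values` is a hypothetical helper, and the region and registry name are placeholders.

```python
import json

# The three registry options named in this thread.
REGISTRY_TYPES = {"INTERNAL", "KAFKA", "AWS_GLUE"}


def schema_registry_values(registry_type, glue_region=None, glue_registry=None):
    """Return a nested dict usable as a Helm values override (key names are assumed)."""
    if registry_type not in REGISTRY_TYPES:
        raise ValueError(f"unknown schema registry type: {registry_type!r}")
    registry = {"type": registry_type}
    if registry_type == "AWS_GLUE":
        # Assumed key names for the Glue registry settings; verify against the chart.
        registry["glue"] = {"region": glue_region, "registry": glue_registry}
    return {"global": {"kafka": {"schemaregistry": registry}}}


# Placeholder region and registry name.
override = schema_registry_values("AWS_GLUE", "us-east-1", "datahub")
print(json.dumps(override, indent=2))
```

The printed JSON is meant to be translated into the chart's values file. Helm parses values files as YAML, so JSON output like this should generally be accepted by `-f` as well, but that is worth confirming against your Helm version.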

Thanks! I see that chart version 0.2.186 (https://github.com/acryldata/datahub-helm/releases/tag/datahub-0.2.186) was released, but I can’t seem to find a changelog. Any chance you could link me to one?

This is the change: https://github.com/acryldata/datahub-helm/pull/366. It sets the application version in the chart to 0.11.0. Here are the release notes for that release: https://github.com/datahub-project/datahub/releases

Specifically you may be interested in the “Important Bug Fixes” section:
• Glue Schema Registry fixed
🙂

Excellent, thank you!