Repopulating Kafka Data and Schema in DataHub After VPC Changes

Original Slack Thread

Hello Team, we have DataHub deployed in AWS and, after some VPC changes, we have lost all Kafka data and all the schemas in the schema registry.

We use the Confluent schema registry (set as type=KAFKA in values.yaml) and, as you may know, it stores the schemas in the `_schemas` topic in Kafka. That’s why we have lost all the schemas as well.

The data we can’t recover is the schemas. Any idea how to trigger DataHub to populate the schemas again?

DataHub version is 0.11.


You can reinstall the same version of DataHub using the same Helm chart you used to install it. This will preserve all components and will run the `kafka-setup` job, which recreates the topics. As messages are produced to the recreated topics, the schema registry will repopulate the `_schemas` topic. The key point is to use the same chart version and the same values. You can find the values and chart version that were used by inspecting the Helm release secrets in your Kubernetes namespace, or by using the Helm CLI. Refer to the Helm documentation.
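As an illustration of the settings that matter here, a minimal excerpt of the values file might look like the following. The key names are assumed from the DataHub Helm chart and the service URL is a placeholder; verify everything against the values actually recorded in your release rather than copying this.

```yaml
# Hypothetical excerpt of the values.yaml used for the original install.
# Key names follow the DataHub Helm chart conventions; verify against your release.
global:
  kafka:
    schemaregistry:
      # Kafka-backed (Confluent) schema registry, as in the original setup
      type: KAFKA
      # Placeholder URL; use the address of your own schema registry service
      url: "http://prerequisites-cp-schema-registry:8081"

# The kafka-setup job recreates DataHub's topics on (re)install
kafkaSetupJob:
  enabled: true
```

To recover the exact chart version and values that were used, `helm list -n <namespace>` shows the installed chart version, and `helm get values <release> -n <namespace>` prints the user-supplied values, which you can save to a file and reuse with `helm upgrade --install`.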