Updating DataHub Version Using Helm Charts to Avoid Database Schema Breakage

Original Slack Thread

<@U06TM7M40RK> What is the best way to update the DataHub version when I deploy it myself with Helm charts, without breaking the database schemas due to major version updates?

To update your DataHub version using Helm charts without breaking the database schemas, you should follow a structured approach. Here are the key steps to ensure a smooth upgrade:

  1. Review Breaking Changes and Deprecations: Before starting the upgrade, review the <https://github.com/datahub-project/datahub/blob/master/docs/how/updating-datahub.md|breaking changes and deprecations> between your current version and the target version. This will help you understand any potential issues and prepare accordingly.

  2. Back Up Existing Data: Back up your existing data, including snapshots of your databases and any other persistent storage used by DataHub. This is crucial in case you need to roll back.

  3. Update Helm Charts: Ensure that you are using the latest Helm charts for both datahub and datahub-prerequisites. According to the <https://datahubproject.io/docs/0.13.1/advanced/no-code-upgrade/#helm|No Code Upgrade guide>, it is crucial to deploy the latest Helm charts to support the newer release.

  4. Deploy Latest Container Images: Pull and deploy the latest container images for DataHub components. This can be done by updating your Helm deployment:

    helm upgrade datahub datahub/datahub --set global.datahub.version=<new_version>
    
  5. Execute Migration Jobs: The migration job will be handled by the Helm upgrade process itself. Ensure that the migration job completes successfully. You can verify the status of the migration job by checking the logs of the upgrade job:

    kubectl logs job/datahub-upgrade-job
    
  6. Reindex Indices: As part of the upgrade, you might need to reindex your Elasticsearch indices. This is mentioned in the <https://github.com/datahub-project/datahub/blob/master/docs/how/updating-datahub.md|updating DataHub documentation>. Ensure that the reindexing process completes successfully.

  7. Verify Deployment: After the upgrade, verify that all DataHub components are running correctly. Check the logs for any errors and ensure that you can access the DataHub UI and perform basic operations like searching and browsing metadata.

  8. Clean Up: Once you have verified that the upgrade is successful and everything is working as expected, you can clean up any old resources or data that are no longer needed. This might include old indices or database tables.
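
The steps above can be sketched as a small dry-run script. It is only an illustration, not an official DataHub procedure: the release name `datahub`, the namespace `datahub`, the `values.yaml` file, and the job name `datahub-upgrade-job` (taken from step 5) are assumptions that should be checked against your own cluster, for example with `kubectl get jobs`. By default it only prints the commands. It does not cover the database backup from step 2, which depends on how your SQL store is hosted.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the upgrade sequence described above.
# Every name below (release, namespace, values file, job name) is an
# assumption; replace it with the value from your own cluster.
set -eu

RELEASE="${RELEASE:-datahub}"
NAMESPACE="${NAMESPACE:-datahub}"
VALUES_FILE="${VALUES_FILE:-values.yaml}"
UPGRADE_JOB="${UPGRADE_JOB:-datahub-upgrade-job}"  # name from step 5; verify with `kubectl get jobs`
DRY_RUN="${DRY_RUN:-1}"                            # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Show the current revision history and value overrides,
# so a rollback target is known before anything changes.
pre_upgrade() {
  run helm history "$RELEASE" -n "$NAMESPACE"
  run helm get values "$RELEASE" -n "$NAMESPACE" -o yaml
}

# Refresh the chart repositories and upgrade the release.
upgrade() {
  run helm repo update
  run helm upgrade "$RELEASE" datahub/datahub -n "$NAMESPACE" -f "$VALUES_FILE"
}

# Wait for the migration job, show its logs, and list the pods.
post_upgrade() {
  run kubectl wait --for=condition=complete "job/$UPGRADE_JOB" -n "$NAMESPACE" --timeout=900s
  run kubectl logs "job/$UPGRADE_JOB" -n "$NAMESPACE"
  run kubectl get pods -n "$NAMESPACE"
}

pre_upgrade
upgrade
post_upgrade
```

Run it once as-is to review the printed commands, then rerun with `DRY_RUN=0` to execute them. If the upgrade misbehaves, `helm rollback <release> <revision>` returns the Helm release to an earlier revision listed by `helm history`, but it does not undo schema or data changes already applied, which is why the backup in step 2 matters.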

Would you like more detailed steps or information on any specific part of the upgrade process?
