Using External PostgreSQL During DataHub Installation

Original Slack Thread

<@U06TM7M40RK> use external postgresql during installation

To use an external PostgreSQL database during DataHub installation, you need to configure your deployment (whether using Helm on Kubernetes or Docker Compose) to point to your external PostgreSQL instance instead of deploying the default internal one.

Helm/Kubernetes Installation

  1. Disable Internal PostgreSQL:
    In your values.yaml for the prerequisites chart, set postgresql.enabled to false so the chart does not deploy its own PostgreSQL instance.
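
A minimal sketch of that prerequisites override, assuming the chart exposes top-level postgresql.enabled and mysql.enabled toggles (the file name is hypothetical, and the mysql entry is only relevant if your chart version enables a bundled MySQL by default; check the values.yaml shipped with your chart version):

```yaml
# prerequisites-values.yaml (hypothetical file name)
postgresql:
  enabled: false   # do not deploy the bundled PostgreSQL
mysql:
  enabled: false   # assumption: only needed if your chart version enables MySQL by default
```

Leaving any bundled database disabled keeps the cluster from running a SQL store that DataHub will never connect to.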

  2. Configure External PostgreSQL Connection:
    In your main DataHub values.yaml, under the global.sql.datasource section, provide the connection details for your external PostgreSQL:

    global:
      sql:
        datasource:
          host: "<EXTERNAL_PG_HOST>:5432"
          hostForpostgresqlClient: "<EXTERNAL_PG_HOST>"
          port: "5432"
          url: "jdbc:postgresql://<EXTERNAL_PG_HOST>:5432/datahub?user=<PG_USER>&password=<PG_PASSWORD>&sslmode=require"
          driver: "org.postgresql.Driver"
          username: "<PG_USER>"
          password:
            value: "<PG_PASSWORD>"
    

    Replace <EXTERNAL_PG_HOST>, <PG_USER>, and <PG_PASSWORD> with your actual PostgreSQL host, username, and password.
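
If you would rather not keep the password in plain text in values.yaml, the password can usually be read from a Kubernetes Secret. The fragment below is a sketch that assumes the chart accepts secretRef and secretKey under password, and that a Secret with the placeholder name postgresql-secrets and key postgres-password already exists in the release namespace:

```yaml
global:
  sql:
    datasource:
      username: "<PG_USER>"
      password:
        # Placeholder names: point these at a Secret you created yourself.
        secretRef: postgresql-secrets
        secretKey: postgres-password
```

Only the password block changes; the remaining datasource keys stay as in the example above.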

  3. Apply the Configuration:
    Deploy or upgrade DataHub with your updated values:

    helm upgrade --install datahub datahub/datahub --values values.yaml
    

This will configure DataHub to use your external PostgreSQL instance as its storage backend.

  4. (Optional) PostgreSQL Setup Job:
    If you want DataHub to initialize the schema in your external database, ensure the postgresqlSetupJob is enabled and points to the same connection details. If your database is already initialized, you can disable this job.
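
As a sketch, the setup-job toggles in the main DataHub values.yaml might look like the following, assuming your chart version exposes postgresqlSetupJob and mysqlSetupJob as top-level keys with an enabled flag:

```yaml
postgresqlSetupJob:
  enabled: true    # set to false if the external database is already initialized
mysqlSetupJob:
  enabled: false   # assumption: the MySQL setup job is not needed with PostgreSQL
```

The setup job connects with the global.sql.datasource settings, so the configured user needs permission to create the schema.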


Docker Compose Installation

If you are using Docker Compose, you can use docker-compose.postgres.yml (https://github.com/datahub-project/datahub/blob/master/docker/postgres/docker-compose.postgres.yml) as a reference. You will need to:

  • Remove the internal PostgreSQL service from your compose file.
  • Set the environment variables for GMS and other services to point to your external PostgreSQL instance.
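
The two changes above can be sketched as a compose override. This assumes the GMS service is named datahub-gms and reads its SQL settings from EBEAN_DATASOURCE_* environment variables, as in the reference compose setup; verify both against the compose and env files for your DataHub version. The override file name is hypothetical:

```yaml
# docker-compose.override.yml (hypothetical file name)
services:
  datahub-gms:
    environment:
      EBEAN_DATASOURCE_HOST: "<EXTERNAL_PG_HOST>:5432"
      EBEAN_DATASOURCE_URL: "jdbc:postgresql://<EXTERNAL_PG_HOST>:5432/datahub"
      EBEAN_DATASOURCE_DRIVER: "org.postgresql.Driver"
      EBEAN_DATASOURCE_USERNAME: "<PG_USER>"
      EBEAN_DATASOURCE_PASSWORD: "<PG_PASSWORD>"
```

Any other service that talks to the database directly needs the same connection values, and any depends_on entries that reference the removed PostgreSQL service must be dropped as well.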



Would you like a step-by-step example for a specific deployment method (Helm or Docker Compose), or do you need help with a particular cloud provider or environment?

