Configuring External Kafka and PostgreSQL with Helm Chart via Custom Values File

Original Slack Thread

Hey team (if I should ask this somewhere else, please let me know),
Is it possible to bring your own external Kafka and PostgreSQL and link them up with the Helm chart (instead of using the prerequisites chart)? If so, can someone point me to the docs on how I would configure the values files?

```
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
datahub         default         9               2024-01-05 16:17:51.626536499 +0000 UTC deployed        datahub-0.3.23                  0.12.1
prerequisites   default         1               2024-01-03 19:54:53.274859431 +0000 UTC deployed        datahub-prerequisites-0.1.6
```

You can disable the Kafka and PostgreSQL modules in the DataHub prerequisites Helm chart's values.yaml by setting `enabled: false` for each:
https://github.com/acryldata/datahub-helm/blob/master/charts/prerequisites/values.yaml
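As a sketch, the override in the prerequisites values.yaml would look something like the following. The key names here reflect the chart at the time of writing and may differ between chart versions, so check the linked values.yaml for your version:

```yaml
# prerequisites values.yaml — disable the bundled brokers/databases
kafka:
  enabled: false        # bring your own Kafka instead

postgresql:
  enabled: false        # bring your own PostgreSQL instead

mysql:
  enabled: false        # disable the default MySQL if using PostgreSQL

elasticsearch:
  enabled: true         # keep the prerequisite Elasticsearch, if desired
```

Then apply it with something like `helm upgrade prerequisites datahub/datahub-prerequisites -f values.yaml`.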

For the GMS and frontend deployments, set the environment variables KAFKA_BOOTSTRAP_SERVER and KAFKA_SCHEMAREGISTRY_URL, plus EBEAN_DATASOURCE_HOST, EBEAN_DATASOURCE_URL, and EBEAN_DATASOURCE_DRIVER, to point at your installation of Kafka and PostgreSQL.
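A minimal sketch of injecting those variables through the DataHub chart's values file, via `extraEnvs` on the GMS deployment. All hostnames, ports, and the database name below are placeholders for your own infrastructure (the chart also exposes `global.kafka.*` and `global.sql.datasource.*` values that can populate these for you; check the chart's values.yaml):

```yaml
# datahub chart values.yaml — point GMS at external Kafka and PostgreSQL
datahub-gms:
  extraEnvs:
    - name: KAFKA_BOOTSTRAP_SERVER
      value: "my-kafka.example.com:9092"                        # placeholder
    - name: KAFKA_SCHEMAREGISTRY_URL
      value: "http://my-schema-registry.example.com:8081"       # placeholder
    - name: EBEAN_DATASOURCE_HOST
      value: "my-postgres.example.com:5432"                     # placeholder
    - name: EBEAN_DATASOURCE_URL
      value: "jdbc:postgresql://my-postgres.example.com:5432/datahub"
    - name: EBEAN_DATASOURCE_DRIVER
      value: "org.postgresql.Driver"
```

The frontend deployment would take the Kafka variables the same way under its own `extraEnvs` block.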

Are the prerequisites considered production-ready/capable? We are bringing external PostgreSQL and Kafka, but are thinking about using the prerequisite Elasticsearch. Is that safe to use in a production workload?

Yes. The default values are just templates; you have the option to change the default configuration to specify the resource capacity and any other customization you need in the resource block:

```yaml
elasticsearch:
  enabled: true   # set this to false, if you want to provide your own ES instance.

  replicas: 1
  minimumMasterNodes: 1
  antiAffinity: "soft"
  clusterHealthCheckParams: "wait_for_status=yellow&timeout=1s"

  esJavaOpts: "-Xmx512m -Xms512m"
  resources:
    requests:
      cpu: "100m"
      memory: "1024M"
    limits:
      cpu: "1000m"
      memory: "1024M"
```
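For a more production-leaning configuration of the prerequisite Elasticsearch, the same block can be scaled up. The sizes below are illustrative assumptions, not tested recommendations:

```yaml
elasticsearch:
  enabled: true
  replicas: 3                  # multiple master-eligible/data nodes
  minimumMasterNodes: 2        # quorum for a 3-node cluster
  antiAffinity: "hard"         # force pods onto separate k8s nodes
  clusterHealthCheckParams: "wait_for_status=green&timeout=1s"

  esJavaOpts: "-Xmx2g -Xms2g"  # heap sized to roughly half the memory limit
  resources:
    requests:
      cpu: "1000m"
      memory: "4Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"
```

Even so, this does not by itself cover backup/restore, which is one of the reliability concerns raised below.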

I would not consider the prerequisites production-ready. They are a way to deploy to Kubernetes, but the underlying datastores are not redundant or meant for production use. Typical production setups rely on production-grade data stores instead of the prerequisites.

So you don't consider the prerequisite Elasticsearch chart (even when configured via the prerequisite values file for Helm) safe for a production workload, because of storage reliability concerns?

It depends on your specific requirements. Most production environments are concerned with performance and reliability. On the performance side, large-scale deployments typically involve multiple data nodes and master nodes. On the reliability side, multiple replicas are used, and there are typically backup requirements to ensure data can be restored if a disaster occurs or the cluster is lost. It is possible to configure the Elasticsearch Helm charts to handle all of these concerns in your environment, but I think a lot of folks use services like OpenSearch on AWS or Elastic Cloud to provide this infrastructure in production. Alternatively, some organizations have teams that run and maintain their own infrastructure meeting those requirements.
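If you do go the managed-service route, pointing DataHub at an external Elasticsearch/OpenSearch endpoint is again a values-file change: disable `elasticsearch` in the prerequisites chart and set the search backend in the DataHub chart. The endpoint below is a placeholder; the exact keys live under `global.elasticsearch` in the chart's values.yaml, so verify them for your chart version:

```yaml
# datahub chart values.yaml — use an externally managed search cluster
global:
  elasticsearch:
    host: "my-domain.es.amazonaws.com"   # placeholder managed endpoint
    port: "443"
    useSSL: "true"
```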