Connecting DataHub to Multiple OpenSearch Nodes: Configuration Guide

Original Slack Thread

I am using DataHub 2.11.0 with an OpenSearch cluster that has multiple hosts. I want DataHub to connect to all of the nodes.

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?


To connect DataHub to multiple OpenSearch nodes, configure the connection settings in the values.yaml file (or the equivalent configuration file) used to deploy DataHub. The general steps are:

  1. Update OpenSearch Configuration: In your values.yaml file, the OpenSearch connection is specified under the elasticsearch configuration. The data sources provided do not describe a way to list multiple hosts there directly. Typically, you specify a single endpoint: either a load balancer, or one node of the cluster, since any OpenSearch node that receives a request can coordinate it across the other nodes. Pointing at a single node makes that node a single point of failure for DataHub, so a load balancer is the safer choice.

  2. Use a Load Balancer: If OpenSearch is set up with a load balancer, you can point DataHub to the load balancer’s endpoint. This way, the load balancer will distribute the requests across the available OpenSearch nodes.

  3. Environment Variables: Ensure that any necessary environment variables are set to enable communication with OpenSearch. For example, if using AWS OpenSearch, you might need to set OPENSEARCH_USE_AWS_IAM_AUTH=true, as mentioned in the "Deploying to AWS" guide in the DataHub documentation.

  4. Verify Configuration: After updating the configuration, deploy DataHub and verify that it can connect to OpenSearch. You can check the logs to ensure there are no connection errors.
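As a rough illustration of the verification step, the sketch below queries the OpenSearch `_cluster/health` REST endpoint through the single address DataHub is pointed at and checks how many nodes the cluster reports. It is only a sketch under assumptions: the helper names (`build_url`, `fetch_health`, `summarize_health`) and the expected node count are hypothetical, and it assumes the endpoint is reachable without authentication. A secured cluster would need credentials or TLS settings added.

```python
import json
import urllib.request


def build_url(host: str, port: int, path: str, use_ssl: bool = False) -> str:
    """Build the URL for an OpenSearch REST path on the given endpoint."""
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}/{path.lstrip('/')}"


def fetch_health(host: str, port: int, use_ssl: bool = False,
                 timeout: float = 5.0) -> dict:
    """Call GET _cluster/health and return the parsed JSON body.

    Assumes no authentication is required (an assumption, not a
    statement about any particular deployment).
    """
    url = build_url(host, port, "_cluster/health", use_ssl)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_health(health: dict, expected_nodes: int) -> str:
    """Compare the node count reported by the cluster with the expected one."""
    status = health.get("status", "unknown")
    seen = health.get("number_of_nodes", 0)
    if seen < expected_nodes:
        return f"WARN: {seen}/{expected_nodes} nodes visible, status={status}"
    return f"OK: {seen} nodes visible, status={status}"
```

For example, with a placeholder load balancer address, `summarize_health(fetch_health("opensearch-lb.example.internal", 9200), expected_nodes=3)` would report whether all three nodes are visible through that endpoint. If fewer nodes appear than expected, the problem is more likely in the cluster or the load balancer than in the DataHub configuration.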

If you need more detailed steps or specific configuration examples, please let me know!
