Setting up DataHub and Connecting to a Local Postgres Server using Docker on Mac OS

Original Slack Thread

Hello, I need a simple local setup for a client demo tomorrow (12pm) that runs DataHub and connects to a local Postgres server on Mac OS. I’m using the provided Docker version of DataHub and am having trouble connecting to the local Postgres server. I’m running Docker Desktop from https://www.docker.com/products/docker-desktop/ Do I need to make any adjustments?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

To assist you effectively, I need a bit more information:

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g., 0.12.0)?
  3. What data source(s) are you integrating with DataHub? (e.g., BigQuery)?

However, based on the information you provided, here are some general steps and adjustments you might need to make to connect DataHub to a local Postgres server using Docker on Mac OS:

Steps to Connect DataHub to Local Postgres

  1. Ensure Postgres is Running Locally:

    • Make sure your Postgres server is running and accessible on the default port (5432).
  2. Modify Docker Compose for DataHub:

  3. Update DataHub Configuration:

    • Update the DataHub configuration to point to your local Postgres instance. Here is a sample configuration:
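# docker-compose.override.yml
# Note: the EBEAN_* variables set the database that DataHub GMS uses for its own
# metadata storage; they are not needed just to ingest metadata from Postgres.
version: '2.1'
services:
  datahub-gms:
    environment:
      EBEAN_DATASOURCE_USERNAME: 'your_postgres_username'
      EBEAN_DATASOURCE_PASSWORD: 'your_postgres_password'
      # host.docker.internal resolves to the Mac host from inside the container
      EBEAN_DATASOURCE_URL: 'jdbc:postgresql://host.docker.internal:5432/your_database_name'
      # Assumption: a Postgres JDBC driver must be selected alongside a jdbc:postgresql URL
      EBEAN_DATASOURCE_DRIVER: 'org.postgresql.Driver'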
  4. Run Docker Compose:

    • Use the following command to start DataHub with the modified configuration:
docker-compose -f docker-compose.yml -f docker-compose.override.yml up

Troubleshooting Connection Issues

  • Connection Refused Error: If you encounter a Connection refused error, make sure the DataHub containers can reach your local Postgres instance. Inside a container, localhost refers to the container itself; on Docker Desktop for Mac, use host.docker.internal to reach services running on the host.

  • Port Conflicts: Ensure no other services are conflicting with the ports used by DataHub and Postgres.
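
A quick way to separate a networking problem from a credentials problem is to check whether the Postgres port is reachable at all. The sketch below is a minimal, hypothetical helper (can_reach is not part of DataHub) that uses only the Python standard library and assumes Postgres listens on port 5432 as described above.

```python
import socket


def can_reach(host: str, port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

Calling can_reach("localhost") on the Mac itself and can_reach("host.docker.internal") from a container that has Python available should show on which side of the Docker network boundary the connection fails. A True result only means the port is open, not that the username and password are valid.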

Example Configuration for Ingestion

If you are using the CLI for ingestion, here is an example configuration for connecting to Postgres:

source:
  type: postgres
  config:
    host_port: localhost:5432
    database: your_database_name
    username: your_postgres_username
    password: your_postgres_password
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'
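
Which host name belongs in host_port depends on where the ingestion runs: a CLI run directly on the Mac can use localhost, while a run that executes inside a Docker container (as UI-triggered ingestion typically does) would need host.docker.internal instead. The sketch below builds the same recipe as a Python dictionary for either case. The function name and the in_container flag are hypothetical; the field names are taken from the example above.

```python
import json


def postgres_recipe(database, username, password, in_container=False,
                    port=5432, gms_server="http://localhost:8080"):
    """Build a Postgres-to-DataHub recipe as a plain dictionary."""
    # Inside a container, localhost is the container itself, so on Docker
    # Desktop for Mac the host is reached through host.docker.internal.
    host = "host.docker.internal" if in_container else "localhost"
    return {
        "source": {
            "type": "postgres",
            "config": {
                "host_port": f"{host}:{port}",
                "database": database,
                "username": username,
                "password": password,
            },
        },
        # gms_server must also be reachable from wherever the recipe runs;
        # the default simply mirrors the example above.
        "sink": {"type": "datahub-rest", "config": {"server": gms_server}},
    }


recipe = postgres_recipe("your_database_name", "your_postgres_username",
                         "your_postgres_password", in_container=True)
print(json.dumps(recipe, indent=2))
```

The printed structure can be compared field by field with the YAML recipe above; only host_port changes between the two cases.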

Would you like more detailed steps or help with a specific part of the setup?

I am using the UI with DataHub version v0.13.3rc1 and want to connect to a local Postgres database

mladen - I don’t know if it’s useful, but I found that the DataHub Docker containers couldn’t ‘see’ the Postgres server because of the way Docker does networking.
When you create the Postgres container you can add it to the DataHub network with --network datahub or something like that (run docker network ls to find the actual name).
Anyway, it’s a bit of a hack, but that’s how I got a local demo going quickly.

I’m looking at my log and I ran:
docker run -d --network a74974753377 --name pg_chinook -p 5432:5432 pg_chinook
where pg_chinook is my postgres image name and container name.
a74974753377 is the id of the network that I got from docker network ls

you can then probably use pg_chinook as the hostname from the DataHub containers (localhost only works from the Mac itself, through the published port 5432)

If anyone wants to chime in with the correct way to do this feel free

Anyway this is a docker thing basically
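
The network lookup described above can be scripted. The sketch below is a minimal example that assumes the Docker CLI is on the PATH and that the DataHub network's name contains "datahub" (an assumption; check the output of docker network ls). The helper names are hypothetical, and pg_chinook is the image and container name from the message above.

```python
import subprocess


def datahub_networks(names_output: str) -> list:
    """Pick network names containing 'datahub' from newline-separated output."""
    return [name.strip() for name in names_output.splitlines()
            if "datahub" in name.lower()]


def list_network_names() -> str:
    """Return the names printed by `docker network ls`, one per line."""
    result = subprocess.run(
        ["docker", "network", "ls", "--format", "{{.Name}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def run_postgres_command(network: str, name: str = "pg_chinook",
                         image: str = "pg_chinook") -> list:
    """Build the `docker run` command from the thread for a given network."""
    return ["docker", "run", "-d", "--network", network,
            "--name", name, "-p", "5432:5432", image]
```

For example, run_postgres_command(datahub_networks(list_network_names())[0]) yields the argument list to hand to subprocess.run. For a container that already exists, docker network connect followed by the network and container names attaches it without recreating it.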