Troubleshooting Redshift Data Ingestion Issue Behind Proxy

Original Slack Thread

Hey team!

I’m having some issues ingesting Redshift data that’s configured behind our proxy and am hoping to get some assistance here:
Unable to connect: Failed to connect to Redshift. Please verify your connection details.Unable to connect: Failed to connect to Redshift. Please verify your connection details.: BrokenPipe: server socket closed. Please check that client side networking configurations such as Proxies, firewalls, VPN, etc. are not affecting your network connection.

Some additional details:
• Running this on minikube with these configurations: minikube start --cpus=8 --memory=16384 --cni=bridge --driver=docker --profile=datahub --network=host
• Can confirm that my minikube pods are connected to the VPN
• Our proxy is set up to allow connections from our VPN
Will provide the YAML file in the thread.

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?


:information_source: Please continue to tag RunLLM (e.g <@U06TM7M40RK>) to have it continue generating responses! We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66|> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions and don’t forget to provide its responses a :+1: or :-1: so other folks in the Community know if you’re still looking for help on the topic and to help us improve the assistant!

To assist you effectively, could you please provide the following information:

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using (e.g., 0.13.1.2)?
  3. What data source(s) are you integrating with DataHub (e.g., Redshift)?

Based on the error message you provided, the server side closed the socket while the client was connecting (BrokenPipe: server socket closed). That points to a network configuration problem, most likely in the proxy settings. Here are some steps you can take to troubleshoot and resolve the issue:

Steps to Troubleshoot and Resolve the Issue:

  1. Verify Proxy Configuration:
    Ensure that your proxy allows connections from your VPN, and route ingestion through it by pointing host_port in your recipe.dhub.yaml file at the proxy's hostname and port instead of the Redshift cluster endpoint. Here is an example Redshift recipe configured this way:

    source:
        type: redshift
        config:
            host_port: 'my-proxy-hostname:5439'
            database: 'DB_NAME'
            username: 'user'
            password: 'pass'
            options:
                connect_args:
                    sslmode: prefer
                    sslrootcert: null
            profiling:
                enabled: true
                profile_table_level_only: true
            stateful_ingestion:
                enabled: true
    sink:
        type: datahub-rest
        config:
            server: '${DATAHUB_GMS_URL}'
            token: '${DATAHUB_GMS_TOKEN}'
    
  2. Check Network Connectivity:
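    Ensure that the environment where ingestion actually runs can reach the Redshift cluster through the proxy. With CLI ingestion this is the machine running datahub ingest, not the Minikube pods; with UI ingestion it is the datahub-actions pod. Tools such as telnet or nc -zv my-proxy-hostname 5439, run from that environment, can confirm that a TCP connection to the proxy port opens.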

  3. Review Logs for Detailed Errors:
    Review the logs from the DataHub ingestion executor and DataHub GMS for more specific error messages explaining why the connection fails. For UI ingestion, open them with the Details button of the failed run; for CLI ingestion they are printed to the terminal, and running datahub --debug ingest -c recipe.dhub.yaml produces more verbose output.

  4. Update DataHub and Dependencies:
    Make sure the DataHub server and CLI are on the latest versions and kept in step with each other; upgrading can resolve compatibility issues between them.

    pip install --upgrade 'acryl-datahub[redshift]'
    
  5. Disable Stateful Ingestion:
    Stateful ingestion is unlikely to cause a network error by itself, but temporarily disabling it removes one variable while you isolate the problem:

    stateful_ingestion:
        enabled: false
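
The TCP check from step 2 can also be scripted. Below is a minimal sketch using only Python's standard library; can_reach is a hypothetical helper, and the host and port are placeholders for whatever host_port your recipe uses. Run it from the environment that actually performs ingestion. A True result only shows that the proxy port accepts TCP connections, not that the proxy forwards traffic to Redshift.

```python
"""Minimal TCP reachability check for a Redshift proxy (hypothetical helper)."""
import socket
import sys


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections, and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    # Usage: python can_reach.py <proxy-host> <port>, e.g. my-proxy-hostname 5439
    if len(sys.argv) == 3:
        print(can_reach(sys.argv[1], int(sys.argv[2])))
```

If this prints False from your machine but True from inside a pod (or the other way round), the ingestion is running on the side of the network that cannot reach the proxy.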
    

Would you like more detailed steps on any of these potential solutions?

```
source:
    type: redshift
    config:
        host_port: satori-host:5439
        database: pantheon
        username: '*'
        password: '*'

        table_lineage_mode: stl_scan_based
        include_table_lineage: true
        include_tables: true
        include_views: true
        include_usage_statistics: false
        schema_pattern:
            allow:
                - 'pantheon.credit'
            deny: []
            ignoreCase: true
        match_fully_qualified_names: true

        profiling:
            enabled: false
            profile_table_level_only: false
        stateful_ingestion:
            enabled: false

sink:
    type: "datahub-rest"
    config:
        server: "http://localhost:8080"
        token: "*"
```
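
This recipe points host_port at satori-host:5439, so traffic should go through the proxy rather than straight to the cluster. A quick sanity check on host_port can confirm that a recipe does not bypass the proxy. check_host_port is a hypothetical helper, not part of DataHub, and it assumes AWS Redshift endpoints end in .redshift.amazonaws.com (provisioned) or .redshift-serverless.amazonaws.com (Serverless).

```python
"""Sanity-check a recipe's host_port value (hypothetical helper, not part of DataHub)."""

# Assumption: AWS-managed Redshift endpoints use these DNS suffixes.
REDSHIFT_SUFFIXES = (".redshift.amazonaws.com", ".redshift-serverless.amazonaws.com")


def check_host_port(host_port: str) -> tuple:
    """Split 'host:port' and report whether it goes through a proxy.

    Returns (host, port, via_proxy). via_proxy is False when the host looks
    like an AWS Redshift endpoint, i.e. the proxy would be bypassed.
    """
    host, sep, port_text = host_port.rpartition(":")
    if not sep or not host or not port_text.isdigit():
        raise ValueError(f"expected 'host:port', got {host_port!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    direct = host.lower().endswith(REDSHIFT_SUFFIXES)
    return host, port, not direct


if __name__ == "__main__":
    host, port, via_proxy = check_host_port("satori-host:5439")
    print(f"{host}:{port} ->", "proxy" if via_proxy else "direct to Redshift")
```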
  1. CLI for ingestion
  2. v0.13.2
  3. Redshift

I’ve taken the steps that RunLLM suggested, but it’s still not working.
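
Since ingestion here runs from the CLI against a v0.13.2 deployment, it is worth confirming that the installed acryl-datahub package matches the server release. The sketch below compares the first three version components. The helper names are hypothetical, and server_version assumes the GMS /config endpoint returns the server version under the keys versions, acryldata/datahub, version; that layout may differ across DataHub releases.

```python
"""Compare the installed DataHub CLI version with the server version (sketch)."""
import json
import re
import urllib.request
from importlib import metadata


def release_tuple(version: str) -> tuple:
    """First three numeric components, e.g. 'v0.13.2' -> (0, 13, 2)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def cli_version() -> str:
    """Version string of the installed acryl-datahub package."""
    return metadata.version("acryl-datahub")


def server_version(gms_url: str, token: str = "") -> str:
    """Fetch the server version from GMS's /config endpoint.

    Assumption: the JSON response has versions -> "acryldata/datahub" -> version.
    """
    request = urllib.request.Request(f"{gms_url.rstrip('/')}/config")
    if token:
        # Supply the recipe's token if your deployment requires authentication.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        config = json.load(response)
    return config["versions"]["acryldata/datahub"]["version"]


def versions_match(cli: str, server: str) -> bool:
    """True when CLI and server share the same major.minor.patch release."""
    return release_tuple(cli) == release_tuple(server)
```

For example, versions_match(cli_version(), server_version("http://localhost:8080")) returns True when a 0.13.2.x CLI talks to a server reporting v0.13.2.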

Hi <@U07D6AG6S3X>, would you mind sharing more detailed logs showing what is currently failing?
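
If the logs show nothing beyond the generic BrokenPipe message, a lower-level probe can help locate where the connection is dropped. Redshift speaks the PostgreSQL wire protocol, in which a client may open with an SSLRequest message and the server answers with a single byte, S or N. The sketch below sends that message and classifies the reply. probe_ssl is a hypothetical helper, and it assumes the proxy (Satori here) answers or forwards the PostgreSQL handshake like a regular Redshift endpoint.

```python
"""Probe how a proxy answers a PostgreSQL-protocol SSLRequest (hypothetical helper).

Redshift uses the PostgreSQL wire protocol. A compatible endpoint answers an
SSLRequest with one byte: b"S" (TLS available) or b"N" (no TLS). An empty read
means the peer closed the socket, the same symptom as
"BrokenPipe: server socket closed".
"""
import socket
import struct
import sys

# SSLRequest = Int32 length (8) + Int32 request code 80877103, big-endian.
SSL_REQUEST = struct.pack("!ii", 8, 80877103)


def probe_ssl(host: str, port: int, timeout: float = 5.0) -> str:
    """Send an SSLRequest and classify the reply.

    Returns "tls-supported", "tls-not-supported", "closed", "timeout", or
    "unexpected:<byte>". Raises OSError if the TCP connection itself fails.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            sock.sendall(SSL_REQUEST)
            reply = sock.recv(1)
        except (BrokenPipeError, ConnectionResetError):
            # The peer tore the connection down mid-handshake.
            return "closed"
        except socket.timeout:
            return "timeout"
    if reply == b"S":
        return "tls-supported"
    if reply == b"N":
        return "tls-not-supported"
    if reply == b"":
        return "closed"
    return f"unexpected:{reply!r}"


if __name__ == "__main__":
    # Usage: python probe_ssl.py <proxy-host> <port>, e.g. satori-host 5439
    if len(sys.argv) == 3:
        print(probe_ssl(sys.argv[1], int(sys.argv[2])))
```

Roughly: "closed" suggests the proxy rejects the client before any database handshake (for example, a source-address policy), while "tls-supported" or "tls-not-supported" suggests the network path works and the problem lies later, in SSL or credential settings.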