Troubleshooting MongoDB Ingestion Connection Refused Error

Original Slack Thread

Hi,
I have just set up DataHub on my local system by following the getting started guide. From the UI, I tried adding a MongoDB ingestion source, but I get a connection refused error when the ingestion runs. My MongoDB instance is also running in a local Docker container, and I can access it with MongoDB Compass using the connection string below. Please help me get it working.
mongodb://collectorusr:collectorpass@localhost:27018/?authSource=admin&readPreference=secondary&directConnection=true&ssl=false

To help you more effectively, could you please provide the following details:

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g., 0.13.1.2)
  3. What data source(s) are you integrating with DataHub? (e.g., MongoDB, BigQuery)?

Additionally, based on the connection refused error, here are a few troubleshooting steps you can try:

  1. Check MongoDB Configuration: Ensure that your MongoDB instance is configured to accept connections from the DataHub container. Sometimes, MongoDB might be configured to listen only on localhost within its container.

  2. Network Configuration: Since both DataHub and MongoDB are running in Docker containers, ensure that they are on the same Docker network. You can create a custom Docker network and attach both containers to it.

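  3. Use Docker Internal Hostname: Inside a container, localhost refers to the container itself, not to your machine, so the ingestion process cannot reach a MongoDB port published on the host through localhost. Replace localhost in your connection string with host.docker.internal, a special DNS name that resolves to the internal IP address used by the host. (It is available by default on Docker Desktop; on Linux it may need to be mapped explicitly, for example with --add-host=host.docker.internal:host-gateway.)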

  4. Port Exposure: Ensure that the MongoDB container’s ports are correctly exposed and mapped to the host machine.
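The checks above can be narrowed down with a quick TCP probe run from inside the container that executes the ingestion. The following is a minimal sketch, not part of DataHub: the helper name can_connect is hypothetical, and it only tests whether the port accepts a connection, not whether MongoDB authentication succeeds.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it does not speak the MongoDB
    protocol, it only checks that something is listening on the port.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, DNS resolution failures, and timeouts.
        return False
```

For example, if can_connect("localhost", 27018) returns False inside the container while can_connect("host.docker.internal", 27018) returns True, the connection string is pointing at the container itself rather than at the host.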

Here is an example configuration for MongoDB ingestion that you can use:

source:
    type: mongodb
    config:
        connect_uri: 'mongodb://collectorusr:collectorpass@host.docker.internal:27018/?authSource=admin&readPreference=secondary&directConnection=true&ssl=false'
        username: 'collectorusr'
        password: 'collectorpass'
        authMechanism: 'SCRAM-SHA-1'
        enableSchemaInference: true
        useRandomSampling: true
        maxSchemaSize: 300
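The connect_uri in the configuration above differs from the original connection string only in its host. As a rough sketch of that substitution, the snippet below swaps a localhost host for host.docker.internal while leaving credentials, port, and query options untouched. The function name rewrite_host is hypothetical, and the sketch assumes a single-host URI like the one in this thread (not a comma-separated replica set seed list).

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_host(uri: str, new_host: str = "host.docker.internal") -> str:
    """Replace a localhost host in a single-host MongoDB URI with new_host.

    Hypothetical helper for illustration; URIs that do not point at
    localhost or 127.0.0.1 are returned unchanged.
    """
    parts = urlsplit(uri)
    if parts.hostname not in ("localhost", "127.0.0.1"):
        return uri
    # netloc looks like "user:pass@host:port"; split off the credentials first.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    # Keep the ":port" suffix (if any) and drop only the old host name.
    _, colon, port = hostport.partition(":")
    new_netloc = f"{userinfo}{at}{new_host}{colon}{port}"
    return urlunsplit(parts._replace(netloc=new_netloc))
```

Passing the original connection string from this thread through rewrite_host yields the same URI with host.docker.internal:27018 as the host and port.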

For more detailed steps on debugging ingestion issues, refer to the DataHub Ingestion documentation on debugging a failed ingestion run.

Would you like more detailed instructions on any of these steps?

Sources:

  • Slack thread
  • Ingestion | DataHub documentation on debugging ingestion

> raise PipelineInitError(f"Failed to {step}: {e}") from e
> datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the source (mongodb): localhost:27018: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 6682b5a72675e87570806e36, topology_type: Single, servers: [<ServerDescription ('localhost', 27018) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27018: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]
>
> source:
>     type: mongodb
>     config:
>         connect_uri: 'mongodb://collectorusr:collectorpass@localhost:27018/?authSource=admin&readPreference=secondary&directConnection=true&ssl=false'
>         enableSchemaInference: true
>         useRandomSampling: true
>         maxSchemaSize: 300
> sink:
>     type: datahub-rest
>     config:
>         server: 'http://datahub-gms:8080'

The provided config worked. Thanks a lot