Hi
I have just set up DataHub on my local system by following the getting started guide. From the UI, I tried adding a MongoDB ingestion source, but I get a connection refused error when the ingestion runs. My Mongo instance is also running in Docker on the same machine, and I can access it from MongoDB Compass using the connection string below. Please help me get it working.
mongodb://collectorusr:collectorpass@localhost:27018/?authSource=admin&readPreference=secondary&directConnection=true&ssl=false
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Are you using UI or CLI for ingestion?
- Which DataHub version are you using? (e.g. 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
To help you more effectively, could you please provide the following details:
- Are you using the UI or CLI for ingestion?
- Which DataHub version are you using? (e.g., 0.13.1.2)
- What data source(s) are you integrating with DataHub? (e.g., MongoDB, BigQuery)
Additionally, based on the connection refused error, here are a few troubleshooting steps you can try:
- Check MongoDB configuration: Ensure that your MongoDB instance is configured to accept connections from the DataHub container. Sometimes, MongoDB might be configured to listen only on localhost within its container.
- Network configuration: Since both DataHub and MongoDB are running in Docker containers, ensure that they are on the same Docker network. You can create a custom Docker network and attach both containers to it.
- Use Docker internal hostname: Instead of using localhost in your connection string, try using host.docker.internal, which is a special DNS name that resolves to the internal IP address used by the host.
- Port exposure: Ensure that the MongoDB container's ports are correctly exposed and mapped to the host machine.
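To narrow down which of the steps above applies, it can help to test plain TCP reachability from inside the container that runs the ingestion. The following is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper name, and port 27018 is taken from the connection string in this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False


# Example usage (run inside the container doing the ingestion):
#   can_connect("localhost", 27018)
#   can_connect("host.docker.internal", 27018)
```

If the check fails for `localhost` but succeeds for `host.docker.internal`, the connection string is pointing at the container itself rather than at the host. On Linux, `host.docker.internal` may not resolve unless the container is started with a host-gateway mapping.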
Here is an example configuration for MongoDB ingestion that you can use:
source:
  type: mongodb
  config:
    connect_uri: 'mongodb://collectorusr:collectorpass@host.docker.internal:27018/?authSource=admin&readPreference=secondary&directConnection=true&ssl=false'
    username: 'collectorusr'
    password: 'collectorpass'
    authMechanism: 'SCRAM-SHA-1'
    enableSchemaInference: true
    useRandomSampling: true
    maxSchemaSize: 300
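The only change to the connection string in this configuration is the hostname. As an illustration, here is a small sketch that performs the same substitution on a single-host MongoDB URI; `rewrite_local_host` and `LOCAL_HOSTS` are hypothetical names, not part of DataHub or pymongo, and multi-host URIs are returned unchanged.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames that point back at the container itself when used inside Docker.
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def rewrite_local_host(uri: str, new_host: str = "host.docker.internal") -> str:
    """Swap a local hostname in a single-host URI for new_host."""
    parts = urlsplit(uri)
    if parts.hostname not in LOCAL_HOSTS:
        return uri  # nothing to rewrite
    # netloc looks like "user:pass@host:port"; keep credentials and port as-is.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    _, colon, port = hostport.partition(":")
    new_netloc = f"{userinfo}{at}{new_host}{colon}{port}"
    return urlunsplit(parts._replace(netloc=new_netloc))


# Example usage:
#   rewrite_local_host("mongodb://usr:pass@localhost:27018/?authSource=admin")
```

Query parameters such as `authSource` and `directConnection` pass through untouched, so the rewritten URI differs from the one that works in Compass only in where it points.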
For more detailed steps on debugging ingestion issues, refer to the DataHub documentation on debugging a failed ingestion run.
Would you like more detailed instructions on any of these steps?
Sources:
- Slack thread
- DataHub documentation on debugging ingestion
> raise PipelineInitError(f"Failed to {step}: {e}") from e
> datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the source (mongodb): localhost:27018: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 6682b5a72675e87570806e36, topology_type: Single, servers: [<ServerDescription ('localhost', 27018) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27018: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]
>
> source:
>   type: mongodb
>   config:
>     connect_uri: 'mongodb://collectorusr:collectorpass@localhost:27018/?authSource=admin&readPreference=secondary&directConnection=true&ssl=false'
>     enableSchemaInference: true
>     useRandomSampling: true
>     maxSchemaSize: 300
> sink:
>   type: datahub-rest
>   config:
>     server: 'http://datahub-gms:8080'
The provided config worked. Thanks a lot!