Configuring DataHub YAML for Ingesting MongoDB as a Source

Original Slack Thread

<@U06TM7M40RK> can you give me the YAML for ingesting MongoDB as a source in DataHub?

To ingest MongoDB as a source in DataHub, you can use the following YAML recipe as a template. It pairs the MongoDB source with the DataHub REST sink. Replace the placeholders with your actual MongoDB connection details and DataHub server URL, and drop the username, password, or token lines if your setup does not require them.

source:
  type: mongodb
  config:
    connect_uri: "mongodb://localhost:27017"  # Replace with your MongoDB URI
    username: "your_username"  # Replace with your MongoDB username
    password: "your_password"  # Replace with your MongoDB password
    authMechanism: "DEFAULT"  # Authentication mechanism, adjust if needed
    enableSchemaInference: true
    useRandomSampling: true
    maxSchemaSize: 300
    schemaSamplingSize: 1000  # Number of documents to sample for schema inference
    database_pattern:
      allow:
        - "your_database_name"  # Replace with your database name

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"  # Replace with your DataHub server URL
    token: "your_auth_token"  # Replace with your DataHub auth token if required
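
If you would rather not store credentials in the recipe file, a variant along these lines may work. It is a sketch that assumes DataHub's `${VAR}` environment-variable expansion in recipes. The variable names MONGO_USERNAME, MONGO_PASSWORD, and DATAHUB_TOKEN are placeholders chosen for illustration, not names DataHub requires.

```yaml
source:
  type: mongodb
  config:
    connect_uri: "mongodb://localhost:27017"  # Replace with your MongoDB URI
    username: "${MONGO_USERNAME}"  # Read from the environment at run time (hypothetical variable name)
    password: "${MONGO_PASSWORD}"  # Read from the environment at run time (hypothetical variable name)
    database_pattern:
      allow:
        - "your_database_name"  # Replace with your database name

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"  # Replace with your DataHub server URL
    token: "${DATAHUB_TOKEN}"  # Hypothetical variable name; omit if no token is required
```

Export the variables in your shell, save the recipe (for example as mongodb_recipe.yaml, a hypothetical filename), and run it with `datahub ingest -c mongodb_recipe.yaml`.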

Key Configuration Details:

  • connect_uri: The MongoDB connection URI. Adjust the host and port as needed.
  • username and password: Credentials for accessing your MongoDB instance.
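  • authMechanism: The MongoDB authentication mechanism. The template sets it to “DEFAULT”, which lets the driver negotiate the mechanism with the server; remove or change it if your deployment needs a specific one.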
  • enableSchemaInference: Enables schema inference for MongoDB collections.
  • useRandomSampling: When true, the documents used for schema inference are sampled randomly rather than read from the start of the collection.
  • maxSchemaSize: Maximum number of fields to include in the schema.
  • schemaSamplingSize: Number of documents to sample for schema inference.
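  • database_pattern: Regex allow/deny patterns that select which databases to ingest. Patterns are matched from the start of the name, so anchor them (for example “^your_database_name$”) if you need an exact match.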
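
As a rough illustration of the filtering options, the source section below anchors the database pattern and also filters collections. It is a sketch under assumptions: scratch_db and the tmp_ prefix are made-up names, and I believe collection_pattern is matched against the full database.collection name, which is worth confirming against the documentation for your DataHub version.

```yaml
source:
  type: mongodb
  config:
    connect_uri: "mongodb://localhost:27017"  # Replace with your MongoDB URI
    database_pattern:
      allow:
        - "^your_database_name$"  # Anchored so that similarly named databases are not picked up
      deny:
        - "^scratch_db$"  # Hypothetical database to exclude
    collection_pattern:
      deny:
        - '.*\.tmp_.*'  # Hypothetical: skip collections whose name starts with tmp_ (assumes db.collection matching)
```

The sink section stays the same as in the template above. Single quotes are used for the collection regex so that the backslash reaches the regex engine unchanged.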

This configuration is based on the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/mongodb#config-details|MongoDB ingestion source documentation>.

Would you like more detailed instructions on any specific part of the process?

Hi <@U087HP4P21X>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.