Troubleshooting MongoDB Data Ingestion Warning in DataHub v0.14.0

Original Slack Thread

Hi <@U06TM7M40RK> I'm getting this warning when ingesting MongoDB in DataHub v0.14.0:
An unexpected issue occurred
Unrecognized column types found: field type <class 'bytes'>

The warning message "Unrecognized column types found: field type <class 'bytes'>" indicates that schema inference sampled at least one document containing a field whose Python type is bytes, and that the MongoDB source has no mapping for that type. This typically happens when a collection stores binary data, which the MongoDB driver returns as Python bytes values.
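To illustrate how a warning like this can arise, here is a minimal sketch of type-based schema inference. It is not DataHub's actual implementation: `KNOWN_TYPES` and `unrecognized_types` are hypothetical names, and the mapping is a simplified stand-in for whatever table the ingestion source uses internally.

```python
from typing import Any, Dict, List, Set

# Hypothetical type table for illustration only; not DataHub's real mapping.
KNOWN_TYPES: Dict[type, str] = {
    str: "string",
    int: "number",
    float: "number",
    bool: "boolean",
    dict: "record",
    list: "array",
    type(None): "null",
}


def unrecognized_types(docs: List[Dict[str, Any]]) -> Set[type]:
    """Collect the Python types of top-level values that have no mapping."""
    unknown: Set[type] = set()
    for doc in docs:
        for value in doc.values():
            if type(value) not in KNOWN_TYPES:
                unknown.add(type(value))
    return unknown


# A sampled document with a binary field, as a driver might return it.
sample = [{"name": "a", "payload": b"\x00\x01"}]
for t in unrecognized_types(sample):
    print(f"Unrecognized column types found: field type {t}")
# prints: Unrecognized column types found: field type <class 'bytes'>
```

The point of the sketch is that the warning is driven by the values found in sampled documents, so a single binary field in one collection is enough to trigger it.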

To address this issue, you can try the following steps:

  1. Check Data Types: Identify which collections and fields store binary data. The bytes type is not recognized by schema inference in this version, so if you control the data you may need to store those values in a supported type, such as a string (for example, base64-encoded).

  2. Schema Inference Configuration: You can adjust the schema inference settings in your MongoDB ingestion configuration. For example, you can enable or disable schema inference (enableSchemaInference), switch random sampling on or off (useRandomSampling), or cap the number of inferred fields (maxSchemaSize).

  3. Custom Schema Handling: If the default schema inference does not work for your use case, you might need to implement custom logic to handle specific data types. This can involve writing custom ingestion code or modifying the existing ingestion source code.
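As a starting point for step 1, the sketch below scans a batch of documents and reports which field paths hold binary values. The helper names `find_bytes_fields` and `bytes_to_base64` are hypothetical, and the sample documents are made up for illustration. It assumes documents arrive as plain Python dicts, which is how PyMongo returns them.

```python
import base64
from typing import Any, Dict, Iterable, Set


def find_bytes_fields(docs: Iterable[Dict[str, Any]]) -> Set[str]:
    """Return dotted paths of every field that holds a bytes value."""
    found: Set[str] = set()

    def walk(value: Any, path: str) -> None:
        if isinstance(value, (bytes, bytearray)):
            found.add(path)
        elif isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else str(key))
        elif isinstance(value, list):
            # Array elements are reported under the array's own path.
            for child in value:
                walk(child, path)

    for doc in docs:
        walk(doc, "")
    return found


def bytes_to_base64(value: bytes) -> str:
    """One possible string representation for a binary value."""
    return base64.b64encode(value).decode("ascii")


# Made-up sample documents standing in for a query result.
docs = [
    {"_id": 1, "name": "a", "blob": b"\x00\x01"},
    {"_id": 2, "meta": {"thumb": b"\xff"}, "tags": ["x", b"y"]},
]
print(sorted(find_bytes_fields(docs)))  # ['blob', 'meta.thumb', 'tags']
print(bytes_to_base64(b"\x00\x01"))     # AAE=
```

Against a live database, the same function could be fed a limited cursor, for example the result of `collection.find().limit(1000)` from PyMongo, to find the offending fields before deciding whether to convert them.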

Here is an example configuration for MongoDB ingestion with schema inference settings:

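source:
    type: mongodb
    config:
        # Put credentials either in the URI or in username/password, not both.
        connect_uri: 'mongodb://<host>:<port>'
        username: '<your-username>'
        password: '<your-password>'
        authMechanism: 'DEFAULT'
        enableSchemaInference: true
        useRandomSampling: true
        maxSchemaSize: 300
        # TLS settings are not top-level fields; they are passed to
        # pymongo.MongoClient through `options`. The names below assume
        # PyMongo 4, where the client file holds both certificate and key.
        options:
            tls: true
            tlsCAFile: '/path/to/ca.pem'
            tlsCertificateKeyFile: '/path/to/client.pem'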

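If the warning persists, rerun the ingestion with debug logging (for example, datahub --debug ingest -c <recipe.yml>) to see which collection is being processed when it appears, then adjust the configuration or the data accordingly.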

