Ingesting Kafka Messages without Schema Registry in DataHub

Original Slack Thread

Hi team!
We have a Kafka cluster without a schema registry and want to ingest it somehow :slightly_smiling_face:
We are trying to use an ingestion config with type: "kafka". Topic names are extracted correctly and displayed in DataHub, but as expected the schema is empty for every topic.

My question:
Has anyone had a similar problem and found their own way to ingest Kafka without a registry?

• All of our Kafka messages are JSON objects, so in theory it’s possible to infer field types from the key/value pairs.
• We are using the CLI for ingestion
• DataHub version: 0.13.0
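
To make the first bullet concrete, here is a minimal sketch of inferring field types from a single JSON message. It uses only the Python standard library; the function names `infer_type` and `infer_fields` and the type labels are made up for illustration and are not part of DataHub.

```python
import json
from typing import Any, Dict


def infer_type(value: Any) -> str:
    """Map a decoded JSON value to a coarse type label (labels are illustrative)."""
    if value is None:
        return "null"
    # bool must be checked before int/float because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "record"
    return "unknown"


def infer_fields(message: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Flatten one JSON object into {dotted.field.path: type label}."""
    fields: Dict[str, str] = {}
    for key, value in message.items():
        path = f"{prefix}{key}"
        fields[path] = infer_type(value)
        if isinstance(value, dict):
            # Recurse into nested objects so their keys get dotted paths
            fields.update(infer_fields(value, prefix=f"{path}."))
    return fields


sample = json.loads('{"id": 7, "user": {"name": "ada", "active": true}, "tags": ["a"]}')
inferred = infer_fields(sample)
```

This only describes the one message it was given. It says nothing about optional fields or about other messages on the same topic.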

I know that the Kafka ingestion source supports a custom schema registry. Has anyone already implemented something similar and can share details or experience?

It looks like you’re on the right track. <@UV14447EU> do you have any additional guidance?

Inferring schemas from messages would force DataHub to create a consumer for every topic and read messages from it. Messages on a Kafka topic without a Schema Registry can also have different structures, so which schema would it choose? I don’t think inferring the schema from messages on a topic is a good idea.
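
A small sketch of the ambiguity described here, using two made-up messages from the same hypothetical topic. The helper names are illustrative only; the point is that sampled messages can disagree on both field types and field presence, so there is no single obvious schema to pick.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set


def collect_field_types(messages: List[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Record every Python type name seen for each top-level field."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for message in messages:
        for key, value in message.items():
            seen[key].add(type(value).__name__)
    return dict(seen)


def find_conflicts(messages: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Return fields that have more than one type or are missing from some messages."""
    conflicts: Dict[str, List[str]] = {}
    for field, types in collect_field_types(messages).items():
        present_in = sum(1 for message in messages if field in message)
        if len(types) > 1 or present_in < len(messages):
            conflicts[field] = sorted(types)
    return conflicts


# Two hypothetical messages from one topic with no registry enforcing a shape
samples = [
    {"id": 1, "amount": 9.5},
    {"id": "1", "amount": 3, "currency": "EUR"},
]
conflicts = find_conflicts(samples)
```

Here every field ends up flagged: `id` and `amount` change type between messages, and `currency` appears in only one of them.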

<@U06U8TD6509> If I were you, I would store my schemas elsewhere and build a job to push them to GMS myself.
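
A rough sketch of what such a job might look like with the `acryl-datahub` Python package. Several things here are assumptions: `STORED_SCHEMAS` is a hypothetical stand-in for wherever the schemas actually live, the three type labels are invented, the dataset URN assumes the `kafka` platform with the `PROD` environment and no platform instance, and the emitter and schema class names are as I understand the SDK, so check them against the version you have installed.

```python
import json
from typing import Dict, List

# Hypothetical schema store: topic name -> {field name: type label}
STORED_SCHEMAS: Dict[str, Dict[str, str]] = {
    "orders": {"id": "number", "customer": "string", "paid": "boolean"},
}


def to_field_specs(fields: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn a stored {field: type} mapping into field descriptors sorted by name."""
    return [
        {"fieldPath": name, "nativeDataType": type_name}
        for name, type_name in sorted(fields.items())
    ]


def push_schema(topic: str, fields: Dict[str, str], gms_server: str) -> None:
    """Emit a schemaMetadata aspect for one Kafka topic to GMS."""
    # Imported lazily so the rest of the module works without acryl-datahub installed
    from datahub.emitter.mce_builder import make_data_platform_urn, make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        BooleanTypeClass,
        NumberTypeClass,
        OtherSchemaClass,
        SchemaFieldClass,
        SchemaFieldDataTypeClass,
        SchemaMetadataClass,
        StringTypeClass,
    )

    # Assumed mapping from our own type labels to DataHub type classes
    type_classes = {
        "string": StringTypeClass,
        "number": NumberTypeClass,
        "boolean": BooleanTypeClass,
    }
    schema = SchemaMetadataClass(
        schemaName=topic,
        platform=make_data_platform_urn("kafka"),
        version=0,
        hash="",
        platformSchema=OtherSchemaClass(rawSchema=json.dumps(fields)),
        fields=[
            SchemaFieldClass(
                fieldPath=spec["fieldPath"],
                type=SchemaFieldDataTypeClass(type=type_classes[spec["nativeDataType"]]()),
                nativeDataType=spec["nativeDataType"],
            )
            for spec in to_field_specs(fields)
        ],
    )
    # Assumes the URN form used for topics ingested without a platform instance
    urn = make_dataset_urn(platform="kafka", name=topic, env="PROD")
    emitter = DatahubRestEmitter(gms_server=gms_server)
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=urn, aspect=schema))


# Example call (the GMS address is a placeholder):
# for topic, fields in STORED_SCHEMAS.items():
#     push_schema(topic, fields, gms_server="http://localhost:8080")
```

The URN has to match the one the Kafka ingestion creates for the topic, otherwise the schema lands on a separate dataset entity instead of the one already showing the topic name.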