Ingesting Kafka Messages Without a Schema Registry in DataHub

Original Slack Thread

Hi team!
We have a Kafka cluster without a schema registry and want to ingest it somehow :slightly_smiling_face:
We are trying to use an ingestion config with type: "kafka". Topic names are extracted correctly and displayed in DataHub, but obviously the schema is empty for each topic.

My question:
Has anyone run into a similar problem and found their own solution to ingest Kafka without a registry?

FYI:
• All of our Kafka messages are JSON objects, so theoretically it's possible to infer types from the key/value pairs (see the sketch after this list).
• We are using the CLI for ingestion
• DataHub version: 0.13.0
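
For illustration, here is a minimal sketch of what that kind of inference could look like, using the confluent-kafka client (this is not part of any DataHub recipe): it samples a single message from a topic and maps the JSON value types to rough schema types. The broker address and topic name are placeholders, and a real job would have to handle nested objects, nulls, and messages whose structures disagree with each other.

```python
import json

from confluent_kafka import Consumer

# Placeholder broker and topic; adjust for your cluster.
consumer = Consumer(
    {
        "bootstrap.servers": "localhost:9092",
        "group.id": "schema-sniffer",
        "auto.offset.reset": "earliest",
    }
)
consumer.subscribe(["my_topic"])

# Rough mapping from Python JSON types to schema type names.
TYPE_NAMES = {str: "string", int: "long", float: "double", bool: "boolean"}

# Sample one message and derive field types from its key/value pairs.
msg = consumer.poll(timeout=10.0)
if msg is not None and msg.error() is None:
    payload = json.loads(msg.value())
    inferred = {k: TYPE_NAMES.get(type(v), "record") for k, v in payload.items()}
    print(inferred)  # e.g. {"user_id": "string", "amount": "double"}

consumer.close()
```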

I know that the Kafka ingestor supports a custom schema registry. Maybe someone has already implemented a similar case and can share details/experience?
Thanks!
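
For context, the kafka source's config exposes a `schema_registry_class` option (it defaults to the Confluent implementation), so one approach is to plug in your own class that returns a schema per topic. Below is a rough, hypothetical sketch; the base class name, its import path, and the exact method signature vary across DataHub versions, so check the kafka source code for 0.13.0 before copying anything:

```python
from typing import Optional

# Import path is version-dependent; in some releases the base class lives in
# datahub.ingestion.source.kafka_schema_registry_base -- verify for 0.13.0.
from datahub.ingestion.source.kafka_schema_registry_base import KafkaSchemaRegistryBase
from datahub.metadata.schema_classes import (
    OtherSchemaClass,
    SchemaFieldClass,
    SchemaFieldDataTypeClass,
    SchemaMetadataClass,
    StringTypeClass,
)


class MyJsonRegistry(KafkaSchemaRegistryBase):
    """Hypothetical registry serving hand-maintained schemas per topic."""

    def get_schema_metadata(
        self, topic: str, platform_urn: str
    ) -> Optional[SchemaMetadataClass]:
        # Look up the schema for `topic` wherever you keep it (a file, a
        # database, an internal service) and translate it. Hard-coded here.
        return SchemaMetadataClass(
            schemaName=topic,
            platform=platform_urn,
            version=0,
            hash="",
            platformSchema=OtherSchemaClass(rawSchema=""),
            fields=[
                SchemaFieldClass(
                    fieldPath="user_id",
                    type=SchemaFieldDataTypeClass(type=StringTypeClass()),
                    nativeDataType="string",
                )
            ],
        )
```

The recipe would then reference the class by its fully qualified name, e.g. `schema_registry_class: "my_module.MyJsonRegistry"`, in place of the default Confluent one.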

It looks like you’re on the right track. <@UV14447EU> do you have any additional guidance?

This would force DataHub to create a consumer for every topic and consume messages from each of them. Messages on a Kafka topic without a schema registry could also have different structures; which schema would it choose then? I don't think it's a good idea to infer the schema from the messages on a topic.

<@U06U8TD6509> If I were you, I would store my schemas elsewhere and build a job to push them to GMS myself.
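
To make that concrete, here is a minimal sketch of such a job using the DataHub Python SDK's REST emitter. It assumes GMS at http://localhost:8080 and a hypothetical topic my_topic with a single string field; a real job would loop over your externally stored schemas:

```python
from datahub.emitter.mce_builder import make_data_platform_urn, make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    OtherSchemaClass,
    SchemaFieldClass,
    SchemaFieldDataTypeClass,
    SchemaMetadataClass,
    StringTypeClass,
)

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Build a schemaMetadata aspect for one topic (hypothetical field list).
schema = SchemaMetadataClass(
    schemaName="my_topic",
    platform=make_data_platform_urn("kafka"),
    version=0,
    hash="",
    platformSchema=OtherSchemaClass(rawSchema=""),
    fields=[
        SchemaFieldClass(
            fieldPath="user_id",
            type=SchemaFieldDataTypeClass(type=StringTypeClass()),
            nativeDataType="string",
        )
    ],
)

# Attach the aspect to the dataset that the kafka ingestion already created.
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=make_dataset_urn(platform="kafka", name="my_topic", env="PROD"),
        aspect=schema,
    )
)
```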