Ingesting Metadata from Kafka Producer Without Using Kafka Connect

Original Slack Thread

Hello, Good morning, Good afternoon, Good evening.
I have a question.
I understand we can ingest topics and schemas from the Kafka broker (https://datahubproject.io/docs/generated/ingestion/sources/kafka).
How can we also ingest metadata about which producers (e.g., team name or ID) send messages to a topic, without using Kafka Connect?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

Hello! To help you more effectively, could you please provide the following information?

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g., 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g., BigQuery)

Regarding your question about ingesting producer information (e.g., team name or ID) for Kafka topics: DataHub’s Kafka ingestion source focuses on extracting topic names from the broker and schemas from the schema registry. According to the Kafka ingestion documentation (https://datahubproject.io/docs/generated/ingestion/sources/kafka), it does not natively extract producer metadata from Kafka messages.

However, you can enrich DataHub metadata with additional information through the source configuration. For example, the meta_mapping option maps specific properties in Avro schemas to DataHub metadata such as owners, tags, and glossary terms. This approach is described in the same Kafka ingestion documentation.
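
As a rough illustration of the meta_mapping approach, the sketch below builds a Kafka ingestion recipe as a Python dict and runs it with the DataHub Python package (`acryl-datahub`). The `owning_team` and `has_pii` schema properties, the regex, the tag name, and the server addresses are assumptions made up for this example; please check the option names against the Kafka source documentation for your DataHub version.

```python
from typing import Any, Dict


def build_kafka_recipe(
    bootstrap: str, schema_registry_url: str, gms_server: str
) -> Dict[str, Any]:
    """Build a Kafka recipe that maps Avro schema properties to owners and tags."""
    return {
        "source": {
            "type": "kafka",
            "config": {
                "connection": {
                    "bootstrap": bootstrap,
                    "schema_registry_url": schema_registry_url,
                },
                "enable_meta_mapping": True,
                "meta_mapping": {
                    # Hypothetical schema property, e.g. "owning_team": "@payments".
                    "owning_team": {
                        "match": "^@(.*)",
                        "operation": "add_owner",
                        "config": {"owner_type": "group"},
                    },
                    # Hypothetical boolean schema property, e.g. "has_pii": true.
                    "has_pii": {
                        "match": True,
                        "operation": "add_tag",
                        "config": {"tag": "has_pii"},
                    },
                },
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": gms_server}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe; needs the DataHub package and reachable Kafka and GMS."""
    # Imported lazily so the recipe can be built without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

Calling `run_recipe(build_kafka_recipe("localhost:9092", "http://localhost:8081", "http://localhost:8080"))` would run the ingestion against a local setup. This only helps if the producing team is recorded as a property in the Avro schema itself.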

If you need to ingest producer metadata that is not part of the schema, you might need to implement a custom solution. One approach could be to create a custom ingestion source that reads this metadata from an external system or directly from Kafka messages and then ingests it into DataHub using the GMS REST APIs.
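
A minimal sketch of such a custom solution, assuming you already keep a mapping from topic name to producing teams somewhere outside Kafka (the mapping, the helper names, and the choice to model producers as group owners are all assumptions for illustration). It uses the DataHub Python REST emitter to attach an ownership aspect to each Kafka topic dataset.

```python
from typing import Dict, List


def topic_urn(topic: str, env: str = "PROD") -> str:
    """Dataset URN for a Kafka topic."""
    return f"urn:li:dataset:(urn:li:dataPlatform:kafka,{topic},{env})"


def team_urn(team: str) -> str:
    """Group URN for a producing team (hypothetical modeling choice)."""
    return f"urn:li:corpGroup:{team}"


def plan_ownership(producers_by_topic: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each topic URN to a sorted, de-duplicated list of team URNs."""
    return {
        topic_urn(topic): sorted({team_urn(team) for team in teams})
        for topic, teams in producers_by_topic.items()
    }


def emit_ownership(plan: Dict[str, List[str]], gms_server: str) -> None:
    """Send one ownership aspect per topic to DataHub GMS over REST."""
    # Imported lazily so the planning helpers work without DataHub installed.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        OwnerClass,
        OwnershipClass,
        OwnershipTypeClass,
    )

    emitter = DatahubRestEmitter(gms_server=gms_server)
    for dataset_urn, owner_urns in plan.items():
        ownership = OwnershipClass(
            owners=[
                OwnerClass(owner=urn, type=OwnershipTypeClass.TECHNICAL_OWNER)
                for urn in owner_urns
            ]
        )
        emitter.emit(
            MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=ownership)
        )
```

For example, `emit_ownership(plan_ownership({"orders": ["payments-team"]}), "http://localhost:8080")` would mark a hypothetical `payments-team` group as owner of the `orders` topic. Note that emitting an ownership aspect replaces the owners already stored on that dataset, so you may want to read and merge existing owners first.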

Would you like more detailed guidance on how to implement such a custom solution?

I am new to DataHub.

  1. I don’t know.
  2. The newest.
  3. I want to ingest metadata from Kafka producers, Kafka topics, the Schema Registry (topic schemas), and Kafka consumers into DataHub.