Ingesting Kafka Schemas with Meta Mapping and Schema Registry Integration

Original Slack Thread

Hi datahub, I am not having success using meta_mapping as instructed in the docs. I am working on CLI-based ingestion for Kafka, running v0.13. I am modifying the Avro schema with the examples provided in the docs, but the changes don't get stored in Confluent's Schema Registry (the open-source version), so the ingestion won't see the new tags/fields. When I build the schemas locally, the .java/.class files do contain the tags and metadata attributes, though. Any help would be appreciated. Thanks!

Here is an example of the schema changes as per the docs:

  {
    "name": "vat_number",
    "tags": ["test-avro-tag"],
    "type": "string"
  },
  {
    "name": "fiscal_code",
    "type": [
      ...
    ],
    "gdpr": {
      "pii": true
    }
  }
.java file


I’m a bit confused about what you’re trying to do - the meta mapping piece only works with the Python ingestion source

What’s the Python source? The documentation provides the configuration for the CLI YAML. I am trying to add meta attributes to my Avro schemas as stated in DataHub’s documentation and use the meta mapping to enrich DataHub. Is that not what the documentation says?
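Editor's note: the CLI YAML side being discussed looks roughly like the sketch below, based on the Kafka source's meta mapping config. URLs, the subject, and the tag name are placeholders, not values from this thread.

```yaml
# Sketch of a CLI ingestion recipe with field-level meta mapping
# (bootstrap/registry/server URLs and the tag are placeholders).
source:
  type: kafka
  config:
    connection:
      bootstrap: "localhost:9092"
      schema_registry_url: "http://localhost:8081"
    # Turn a custom "gdpr.pii: true" attribute on schema fields
    # into a DataHub tag on the corresponding field.
    field_meta_mapping:
      gdpr.pii:
        match: true
        operation: add_tag
        config:
          tag: pii
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```

This only takes effect if the attributes are present in the schema document that the registry actually serves, which is the crux of the thread.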

That is - I guess I’m confused where that .java file fits into this?

That’s the source/compiled code for the Avro classes generated from the schemas.
Avro schema >> Avro artefact >> Kafka producer >> Schema Registry

I see - if they’re not in the schema registry, then our ingestion source can’t see them

So how are those examples meant to be sent to the Schema Registry? Via the REST API?

Yup - most folks tend to already have some mechanism for moving their avro/json schemas into the schema registry, since they need to do that for their operational use cases anyways.
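Editor's note: a minimal sketch of registering a schema through the registry's REST API (POST /subjects/&lt;subject&gt;/versions, schema passed as a JSON-escaped string). The registry URL and subject name are placeholders; sending the request is left to the caller.

```python
import json
import urllib.request

def register_schema_request(registry_url: str, subject: str, schema: dict) -> urllib.request.Request:
    """Build the Schema Registry request that registers a new schema
    version under a subject (POST /subjects/<subject>/versions).
    The schema is sent as a JSON-escaped string in the request body."""
    body = json.dumps({"schema": json.dumps(schema)}).encode()
    return urllib.request.Request(
        url=f"{registry_url}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )

# Hypothetical registry URL and subject; to actually send it:
#   urllib.request.urlopen(req)
req = register_schema_request(
    "http://localhost:8081",
    "customers-value",
    {
        "type": "record",
        "name": "Customer",
        "fields": [
            {"name": "vat_number", "type": "string", "tags": ["test-avro-tag"]},
        ],
    },
)
print(req.full_url)  # -> http://localhost:8081/subjects/customers-value/versions
```

Because the whole schema document is registered verbatim, any custom attributes in it (like the tags above) reach the registry and become visible to ingestion.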

Hi <@U01GZEETMEZ>, just tried with our GitOps pipeline to register the schema, and it got added to the schema in the Schema Registry, which DataHub was then able to ingest successfully. I think it’d be worth improving the documentation to mention that this is a feature of the Schema Registry and not of Avro schemas per se, which may also be sent to Kafka without the Schema Registry. The docs say:
Avro schemas are permitted to have additional attributes not defined by the specification as arbitrary metadata.
Which is quite misleading, as Kafka producers don’t push those attributes to the Schema Registry, and in fact it isn’t documented anywhere in the Avro specification. This should be a sub-section within the Schema Registry docs. Is this something you’d be able to raise, or should I raise it somewhere else?

Thanks a mil for your help btw.