Ingesting Kafka Schemas with Meta Mapping and Schema Registry Integration

Original Slack Thread

Hi datahub, I am not having success using meta_mapping as instructed in the docs. I am working on CLI-based ingestion for Kafka, running v0.13. I am modifying the Avro schema with the examples provided in the docs, but the changes don't get stored in Confluent's Schema Registry (the open-source version), so the ingestion won't see the new tags/fields. When I build the schemas locally, the .java/.class files do contain the tags and metadata attributes, though. Any help would be appreciated. Thanks!

Here is an example of the schema changes as per the docs:

  {
    "name": "vat_number",
    "tags": ["test-avro-tag"],
    "type": "string"
  },
  {
    "name": "fiscal_code",
    "type": [
      ...
    ],
    "gdpr": {
      "pii": true
    }
  }
.java file


I’m a bit confused about what you’re trying to do - the meta mapping piece only works with the Python ingestion source

What’s the Python source? The documentation provides the configuration for the CLI YAML. I am trying to add meta attributes to my Avro schemas as stated in DataHub’s documentation and use the meta mapping to enrich DataHub. Is that not what the documentation says?
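Editor's note: the CLI YAML side being discussed looks roughly like the sketch below, based on the Kafka source's meta mapping config. URLs, the subject, and the tag name are placeholders, not values from this thread.

```yaml
# Sketch of a CLI ingestion recipe with field-level meta mapping
# (bootstrap/registry/server URLs and the tag are placeholders).
source:
  type: kafka
  config:
    connection:
      bootstrap: "localhost:9092"
      schema_registry_url: "http://localhost:8081"
    # Turn a custom "gdpr.pii: true" attribute on schema fields
    # into a DataHub tag on the corresponding field.
    field_meta_mapping:
      gdpr.pii:
        match: true
        operation: add_tag
        config:
          tag: pii
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```

This only takes effect if the attributes are present in the schema document that the registry actually serves, which is the crux of the thread.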

That is - I guess I’m confused where that .java file fits into this?

That’s the source/compiled code for the Avro classes generated from the schemas.
Avro schema >> Avro artefact >> Kafka producer >> Schema Registry

I see - if they’re not in the schema registry, then our ingestion source can’t see them

So how are those examples meant to be sent to the Schema Registry? Via the REST API?

Yup - most folks tend to already have some mechanism for moving their avro/json schemas into the schema registry, since they need to do that for their operational use cases anyways.
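Editor's note: a minimal sketch of registering a schema through the registry's REST API (POST /subjects/&lt;subject&gt;/versions, schema passed as a JSON-escaped string). The registry URL and subject name are placeholders; sending the request is left to the caller.

```python
import json
import urllib.request

def register_schema_request(registry_url: str, subject: str, schema: dict) -> urllib.request.Request:
    """Build the Schema Registry request that registers a new schema
    version under a subject (POST /subjects/<subject>/versions).
    The schema is sent as a JSON-escaped string in the request body."""
    body = json.dumps({"schema": json.dumps(schema)}).encode()
    return urllib.request.Request(
        url=f"{registry_url}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )

# Hypothetical registry URL and subject; to actually send it:
#   urllib.request.urlopen(req)
req = register_schema_request(
    "http://localhost:8081",
    "customers-value",
    {
        "type": "record",
        "name": "Customer",
        "fields": [
            {"name": "vat_number", "type": "string", "tags": ["test-avro-tag"]},
        ],
    },
)
print(req.full_url)  # -> http://localhost:8081/subjects/customers-value/versions
```

Because the whole schema document is registered verbatim, any custom attributes in it (like the tags above) reach the registry and become visible to ingestion.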

Hi <@U01GZEETMEZ>, just tried with our GitOps pipeline to register the schema, and it got added to the schema in the Schema Registry, which DataHub was then able to ingest successfully. I think it’d be worth improving the documentation to mention that this is a feature of the Schema Registry and not of Avro schemas per se, which may also be sent to Kafka without the Schema Registry. The docs say:
Avro schemas are permitted to have additional attributes not defined by the specification as arbitrary metadata.
Which is quite misleading, as Kafka producers don’t push those attributes to the Schema Registry, and in fact it isn’t documented anywhere in the Avro specification. This should be a sub-section within the Schema Registry docs. Is this something you’d be able to raise, or should I raise it somewhere else?

Thanks a mil for your help btw.