Generating Avro Schema Files for Metadata Emitter in Rust

Original Slack Thread

Can anyone tell me how to generate, or where to find, these Avro schema files?

Hi <@U05PNDPTAFP>: curious why these specific files got your attention :slightly_smiling_face:

These files are generated as part of the build from the source-of-truth models in .pdl files

e.g. you’ll find one of them at ./metadata-models/src/mainGeneratedAvroSchema/avro/com/linkedin/mxe/MetadataChangeEvent.avsc

after running ./gradlew :metadata-models:build -x test


I’m wanting to write a Rust metadata emitter :wink:

unless one already exists

I hate python

it’s my one fatal flaw XD

ooh interesting … cc <@U01GZEETMEZ>

I’m the lead architect for the digital engineering group at the Idaho National Laboratory, a DOE national lab. I’m the head architect for a multi-lab project called Alexandria, a data management platform across the NA-22 data ecosystem. This multi-million dollar project is set to unify the data experience across the labs and make it easy for all employees to access, catalog, and compute on data. We’re currently evaluating DataHub as our data catalog of choice. We do a lot with Rust here, so we’re looking at how much we can use our existing skillset for metadata emission from our HPC system for bespoke file types like TDMS etc.

That sounds pretty cool!

In case it’s helpful, the python codegen is mainly driven by an Avro schema -> code generator, the code of which lives here. However, the python code ultimately serializes everything as JSON when pushing to DataHub’s backend, so having codegen’d models is convenient but not strictly necessary to get something working. If you run the python emitter with debug logging enabled, it will print curl commands equivalent to each API call it makes.
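As a rough sketch of what that REST path could look like from Rust: the payload is plain JSON, so no Avro codegen is needed. The field names and endpoint mentioned in the comments below are assumptions for illustration, not a verified API contract — the curl commands printed by the python emitter under debug logging show the real shape.

```rust
// Hedged sketch: build a JSON metadata payload for DataHub's REST backend.
// Field names ("entityUrn", "aspectName", "aspect") and the endpoint path in
// the comment below are assumptions; compare against the curl commands the
// python emitter logs in debug mode before relying on them.
fn build_payload(entity_urn: &str, aspect_name: &str, aspect_json: &str) -> String {
    format!(
        r#"{{"entityUrn":"{}","aspectName":"{}","aspect":{}}}"#,
        entity_urn, aspect_name, aspect_json
    )
}

fn main() {
    let body = build_payload(
        "urn:li:dataset:(urn:li:dataPlatform:hdfs,my_dataset,PROD)",
        "datasetProperties",
        r#"{"description":"TDMS capture from the HPC system"}"#,
    );
    // An HTTP client (e.g. the `reqwest` crate) would POST `body` to the
    // DataHub GMS ingestion endpoint, with Content-Type: application/json.
    println!("{}", body);
}
```

In a real emitter you would build the JSON with `serde` types rather than `format!`, but the point is that the REST path only needs an HTTP client and JSON serialization.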

the kafka emitter is all avro though right?

That’s true - for the kafka emitter, as long as you can serialize avro using those schemas and write to kafka, you should be good to go.

that’s the plan, rust has some solid support