Understanding Metadata Change Events in Ingestion Process and DataHub's Architecture

Original Slack Thread

Hello all! I have a general question regarding the ingestion process. It’s not clear to me where the Metadata Change Events (that are generated by the ingestion recipes) are sent when I trigger the ingestion. Currently we are doing ingestion via the UI, using the default sink. Does that mean that when a particular ingestion (say, Athena) runs, a lot of MCEs are sent to GMS, which then makes the corresponding changes to the MySQL DB? Where does Kafka play into this? Is Kafka only used when I set it as a sink?

Thank you :slightly_smiling_face:

EDIT: Additional question: When I run the ingestion from the UI, the ingestion runs in the actions container, right?

Yup - ingestion runs in the actions container, and by default uses a REST sink to talk to GMS
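
To make that concrete, here is roughly what the actions container does under the hood: it builds a pipeline from your recipe and pushes metadata to GMS over REST. This is only a sketch run programmatically with the Python SDK; the Athena config values and the GMS address are placeholders, not a working setup.

```python
# Rough sketch: run a recipe programmatically with the default REST sink.
# The Athena source config values and the GMS server URL are placeholders.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "athena",
            "config": {
                "aws_region": "us-east-1",                            # placeholder
                "work_group": "primary",                              # placeholder
                "query_result_location": "s3://my-athena-results/",   # placeholder
            },
        },
        # The default sink: emit metadata change proposals to GMS over its REST API.
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()
```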

There are multiple Kafka topics at play here, so for more details I’d recommend reading this doc: https://datahubproject.io/docs/0.10.5/what/mxe/#metadata-change-log-mcl

In broad strokes: GMS can receive MCEs/MCPs over either REST or Kafka. It then writes the change to MySQL and emits a Metadata Change Log (MCL) event to Kafka.
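
For example, emitting a single MCP to GMS over REST (the same path the default sink takes) looks roughly like this with the Python SDK. The dataset URN and server address are just placeholders.

```python
# Sketch: send one metadata change proposal directly to GMS over REST.
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

# Point the emitter at GMS (placeholder address).
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# A single aspect change for an illustrative Athena dataset URN.
mcp = MetadataChangeProposalWrapper(
    entityUrn="urn:li:dataset:(urn:li:dataPlatform:athena,sample_db.sample_table,PROD)",
    aspect=DatasetPropertiesClass(
        description="Emitted over REST; GMS persists it to MySQL and produces an MCL event"
    ),
)

emitter.emit_mcp(mcp)
```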

The MCL events are used to update DataHub’s search and graph indices, and can also be used to run change-based actions via the actions framework
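
If you want to hook into those MCL events yourself, a custom action has roughly this shape. This is a sketch assuming the datahub-actions package's documented Action interface (import paths and attribute names may differ slightly between versions); the class name and the logging it does are just for illustration.

```python
# Sketch of a change-based action that the actions framework would feed with
# MCL events consumed from Kafka. Interface assumed from the datahub-actions docs.
from datahub_actions.action.action import Action
from datahub_actions.event.event_envelope import EventEnvelope
from datahub_actions.pipeline.pipeline_context import PipelineContext


class LogMclAction(Action):
    """Hypothetical action that just logs every Metadata Change Log event it sees."""

    @classmethod
    def create(cls, config_dict: dict, ctx: PipelineContext) -> "Action":
        # Config parsing would go here; this sketch ignores the config.
        return cls()

    def act(self, event: EventEnvelope) -> None:
        # Log the event type and payload of each incoming change event.
        print(f"Received {event.event_type}: {event.event}")

    def close(self) -> None:
        pass
```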

Thanks a lot for this explanation!