Understanding Metadata Change Events in Ingestion Process and DataHub's Architecture

Original Slack Thread

Hello all! I have a general question regarding the ingestion process. It’s not clear to me where the Metadata Change Events (that are generated by the ingestion recipes) are sent when I trigger the ingestion. Currently we are doing ingestion via the UI, using the default sink. Does that mean that when a particular ingestion (say, Athena) runs, a lot of MCEs are sent to GMS, which then makes the corresponding changes to the MySQL DB? Where does Kafka play into this? Is Kafka only used when I set it as a sink?

Thank you :slightly_smiling_face:

EDIT: Additional question: When I run the ingestion from the UI, the ingestion runs in the actions container, right?

Yup - ingestion runs in the actions container, and by default uses a REST sink to talk to GMS
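
To make that concrete, here is roughly what the actions container does under the hood: it builds a pipeline from your recipe and pushes metadata to GMS over REST. This is only a sketch run programmatically with the Python SDK; the Athena config values and the GMS address are placeholders, not a working setup.

```python
# Rough sketch: run a recipe programmatically with the default REST sink.
# The Athena source config values and the GMS server URL are placeholders.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "athena",
            "config": {
                "aws_region": "us-east-1",                            # placeholder
                "work_group": "primary",                              # placeholder
                "query_result_location": "s3://my-athena-results/",   # placeholder
            },
        },
        # The default sink: emit metadata change proposals to GMS over its REST API.
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()
```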

There are multiple Kafka topics at play here, so for more details I’d recommend reading this doc: https://datahubproject.io/docs/0.10.5/what/mxe/#metadata-change-log-mcl

In broad strokes: GMS can receive MCEs/MCPs over either REST or Kafka. It then writes the change to MySQL and emits a Metadata Change Log (MCL) event to Kafka.
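
For example, emitting a single MCP to GMS over REST (the same path the default sink takes) looks roughly like this with the Python SDK. The dataset URN and server address are just placeholders.

```python
# Sketch: send one metadata change proposal directly to GMS over REST.
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

# Point the emitter at GMS (placeholder address).
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# A single aspect change for an illustrative Athena dataset URN.
mcp = MetadataChangeProposalWrapper(
    entityUrn="urn:li:dataset:(urn:li:dataPlatform:athena,sample_db.sample_table,PROD)",
    aspect=DatasetPropertiesClass(
        description="Emitted over REST; GMS persists it to MySQL and produces an MCL event"
    ),
)

emitter.emit_mcp(mcp)
```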

The MCL events are used to update DataHub’s search and graph indices, and can also be used to run change-based actions via the actions framework
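
If you want to hook into those MCL events yourself, a custom action has roughly this shape. This is a sketch assuming the datahub-actions package's documented Action interface (import paths and attribute names may differ slightly between versions); the class name and the logging it does are just for illustration.

```python
# Sketch of a change-based action that the actions framework would feed with
# MCL events consumed from Kafka. Interface assumed from the datahub-actions docs.
from datahub_actions.action.action import Action
from datahub_actions.event.event_envelope import EventEnvelope
from datahub_actions.pipeline.pipeline_context import PipelineContext


class LogMclAction(Action):
    """Hypothetical action that just logs every Metadata Change Log event it sees."""

    @classmethod
    def create(cls, config_dict: dict, ctx: PipelineContext) -> "Action":
        # Config parsing would go here; this sketch ignores the config.
        return cls()

    def act(self, event: EventEnvelope) -> None:
        # Log the event type and payload of each incoming change event.
        print(f"Received {event.event_type}: {event.event}")

    def close(self) -> None:
        pass
```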

Thanks a lot for this explanation!