<@U06TM7M40RK> can you please give me an overview of the architecture of DataHub?
DataHub’s architecture is designed to support data discovery, collaboration, governance, and observability within the modern data stack. Here are the key components and highlights of its architecture:
- Schema-first Metadata Modeling: DataHub employs a model-first approach using a serialization-agnostic language. It supports both REST and GraphQL APIs, and an Avro-based API over Kafka for communicating and subscribing to metadata changes. This allows for interoperability between different tools and systems (<https://datahubproject.io/docs/0.13.1/architecture/architecture|DataHub Architecture Overview>).
- Stream-based Real-time Metadata Platform: The platform is stream-oriented, so metadata changes are communicated and reflected across the platform in real time. This makes it possible to build systems that react to metadata changes as they happen, such as access control systems (<https://datahubproject.io/docs/0.13.1/architecture/architecture|DataHub Architecture Overview>).
- Federated Metadata Serving: DataHub supports federated metadata services, allowing different teams to own and operate their metadata services. These services communicate with a central search index and graph using Kafka, supporting global search and discovery while enabling decoupled ownership of metadata (<https://datahubproject.io/docs/0.13.1/architecture/architecture|DataHub Architecture Overview>).
- Metadata Storage: Metadata is stored in a document store, typically a relational database such as MySQL or Postgres; Cassandra is also supported. The Metadata Service exposes REST and GraphQL APIs for CRUD operations, search, and graph queries (<https://datahubproject.io/docs/0.14.0/architecture/metadata-serving|DataHub Serving Architecture>).
- Metadata Change Log Stream (MCL): When a metadata change is committed to storage, a Metadata Change Log event is emitted over Kafka. This allows external systems to react in real time to metadata changes (<https://datahubproject.io/docs/0.14.0/architecture/metadata-serving|DataHub Serving Architecture>).
- Metadata Index Applier: The Metadata Change Log events are consumed by a job that applies changes to the graph and search index, ensuring metadata changes are processed in the correct order (<https://datahubproject.io/docs/0.14.0/architecture/metadata-serving|DataHub Serving Architecture>).
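To make the write path in the last three points more concrete, here is a minimal toy sketch of the flow: commit to storage, emit a change log event, then apply events in order to a search index and a graph. This is not DataHub's code or API. The classes `ChangeLogEvent`, `ToyMetadataService`, and `ToyIndexApplier` are hypothetical names invented for illustration, an in-memory queue stands in for the Kafka topic, and the example URNs and field names are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ChangeLogEvent:
    """Hypothetical stand-in for a Metadata Change Log event."""
    offset: int
    urn: str
    aspect: str
    value: dict


@dataclass
class ToyMetadataService:
    """Toy write path: a dict as the primary store, a deque as the event stream."""
    store: dict = field(default_factory=dict)  # (urn, aspect) -> latest value
    log: deque = field(default_factory=deque)  # stand-in for the Kafka topic
    next_offset: int = 0

    def write_aspect(self, urn: str, aspect: str, value: dict) -> None:
        # Commit to the primary store first ...
        self.store[(urn, aspect)] = value
        # ... and only then emit the change log event.
        self.log.append(ChangeLogEvent(self.next_offset, urn, aspect, value))
        self.next_offset += 1


@dataclass
class ToyIndexApplier:
    """Toy consumer that keeps a search index and a graph in sync with the log."""
    search_index: dict = field(default_factory=dict)  # urn -> searchable text
    graph: set = field(default_factory=set)  # (source urn, relationship, target urn)

    def drain(self, log: deque) -> None:
        # Events are consumed in the order they were emitted (FIFO),
        # so a later change always overwrites an earlier one.
        while log:
            self.apply(log.popleft())

    def apply(self, event: ChangeLogEvent) -> None:
        if event.aspect == "datasetProperties":
            self.search_index[event.urn] = event.value["description"]
        elif event.aspect == "ownership":
            # Replace any existing ownership edges for this entity.
            self.graph = {
                edge for edge in self.graph
                if not (edge[0] == event.urn and edge[1] == "OwnedBy")
            }
            for owner in event.value["owners"]:
                self.graph.add((event.urn, "OwnedBy", owner))


service = ToyMetadataService()
applier = ToyIndexApplier()
urn = "urn:li:dataset:(urn:li:dataPlatform:hive,example.table,PROD)"

service.write_aspect(urn, "datasetProperties", {"description": "first draft"})
service.write_aspect(urn, "datasetProperties", {"description": "final description"})
service.write_aspect(urn, "ownership", {"owners": ["urn:li:corpuser:jdoe"]})

applier.drain(service.log)
print(applier.search_index[urn])  # final description
```

Because the applier reads events in emission order, the search index ends up with the second description rather than the first. In a real deployment the ordering guarantee would come from the stream itself rather than from a local queue.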
Would you like more detailed information on any specific aspect of DataHub’s architecture?
Sources:
- <https://datahubproject.io/docs/0.13.1/architecture/architecture|DataHub Architecture Overview>
- <https://datahubproject.io/docs/0.14.0/architecture/metadata-serving|DataHub Serving Architecture>
Hint: Mention @runllm in the thread for followups.
Hi <@U08A2TALZE1>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><