Understanding Metadata Change Events in DataHub

Original Slack Thread


Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?



To help you more effectively, could you please provide the following information?

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using (e.g., 0.12.0)?
  3. What data source(s) are you integrating with DataHub (e.g., BigQuery)?

Based on your initial question, it seems like you might need to implement a custom ingestion source to handle the XML metadata from the API endpoint. For more details on how to add a custom ingestion source, you can refer to the <https://datahubproject.io/docs/0.13.1/how/add-custom-ingestion-source|DataHub documentation>.
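
As a rough illustration of the parsing step such a custom source would need, here is a minimal Python sketch using only the standard library. The XML layout (`<datasets>`, `<dataset>`, `<description>`), the attribute names, and the helper names `dataset_urn` and `parse_datasets` are all assumptions made up for this example, since the actual API response is not shown in the thread. A real custom source would wrap logic like this in the source class described in the linked documentation.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string in the urn:li:dataset:(...) form."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def parse_datasets(xml_text: str) -> List[Dict[str, str]]:
    """Turn a hypothetical <datasets> XML payload into plain records."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.iter("dataset"):
        records.append(
            {
                "urn": dataset_urn(
                    node.get("platform", "unknown"),
                    node.get("name", ""),
                    node.get("env", "PROD"),
                ),
                # findtext returns the default when the child element is absent.
                "description": node.findtext("description", default=""),
            }
        )
    return records


# Hypothetical payload; the real endpoint's schema will differ.
SAMPLE_XML = """
<datasets>
  <dataset platform="hive" name="SampleHiveDataset" env="PROD">
    <description>Example table</description>
  </dataset>
</datasets>
"""

if __name__ == "__main__":
    for record in parse_datasets(SAMPLE_XML):
        print(record)
```

Keeping the XML parsing separate from any DataHub-specific code makes it easy to test against saved API responses before wiring it into an ingestion recipe.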

Would you like more detailed guidance on how to proceed with creating a custom ingestion source?

I’m using DataHub 0.13, with the UI for ingestion.

<@U06TM7M40RK> What is a MetadataChangeEvent?

A Metadata Change Event (MCE) is an event in DataHub that represents a request to change multiple aspects of the same entity. It relies on the deprecated concept of a Snapshot, which is a strongly typed list of aspects for one entity. An MCE is a “proposal” for a set of metadata changes, whereas a Metadata Audit Event (MAE) conveys a committed change. Only MCEs that are successfully accepted and processed lead to the emission of a corresponding MAE or Metadata Change Log (MCL).

Key Points about MCE:

  • Emission: MCEs may be emitted by clients of DataHub’s low-level ingestion APIs during the process of metadata ingestion.
  • Consumption: DataHub’s storage layer actively listens for new Metadata Change Events and attempts to apply the requested changes to the Metadata Graph.
  • Schema: The structure of an MCE is defined by its PDL schema.
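
To make the proposal-versus-committed distinction concrete, here is a toy in-memory model in Python. It is only a sketch and not how DataHub is implemented: the class `ToyMetadataStore`, the simplified proposal shape, and the validation rule are all hypothetical. It shows one idea from the points above: a consumer tries to apply each proposal, and a change log entry is recorded only when the proposal is accepted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyMetadataStore:
    """Toy stand-in for the storage layer; not DataHub's implementation."""

    graph: Dict[str, Dict[str, dict]] = field(default_factory=dict)
    change_log: List[dict] = field(default_factory=list)

    def consume(self, proposal: dict) -> bool:
        """Try to apply one proposal; log a committed change only on success."""
        urn = proposal.get("urn", "")
        aspects = proposal.get("aspects", {})
        # Hypothetical validation rule, for illustration only.
        if not urn.startswith("urn:li:") or not aspects:
            return False
        # Apply every proposed aspect to the entity, then record each commit.
        self.graph.setdefault(urn, {}).update(aspects)
        for name in aspects:
            self.change_log.append({"urn": urn, "aspect": name})
        return True


store = ToyMetadataStore()
accepted = store.consume(
    {
        "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
        "aspects": {"ownership": {"owners": ["urn:li:corpuser:jdoe"]}},
    }
)
rejected = store.consume({"urn": "not-a-urn", "aspects": {"ownership": {}}})
```

In this sketch the rejected proposal leaves both the graph and the change log untouched, which mirrors the statement that only accepted MCEs produce a committed event.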

Example of an MCE:
The following MCE is emitted to change the ‘ownership’ aspect of a dataset entity:

{
  "proposedSnapshot": {
    "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot": {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
      "aspects": [
        {
          "com.linkedin.pegasus2avro.common.Ownership": {
            "owners": [
              {
                "owner": "urn:li:corpuser:jdoe",
                "type": "DATAOWNER",
                "source": null
              },
              {
                "owner": "urn:li:corpuser:datahub",
                "type": "DATAOWNER",
                "source": null
              }
            ],
            "lastModified": {
              "time": 1581407189000,
              "actor": "urn:li:corpuser:jdoe",
              "impersonator": null
            }
          }
        }
      ]
    }
  }
}
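
If you need to produce payloads shaped like the JSON above from code, a small helper can assemble the nested structure. The sketch below uses only Python's standard library and copies the key names from the example. The function name `build_ownership_mce` and its parameters are hypothetical, and it only builds a dictionary; it does not validate against the PDL schema or send anything to DataHub.

```python
import json
from typing import List

# Key names copied from the example payload above.
SNAPSHOT_KEY = "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot"
OWNERSHIP_KEY = "com.linkedin.pegasus2avro.common.Ownership"


def build_ownership_mce(
    dataset_urn: str, owner_urns: List[str], actor_urn: str, time_ms: int
) -> dict:
    """Assemble an ownership MCE dictionary mirroring the example's layout."""
    owners = [
        {"owner": urn, "type": "DATAOWNER", "source": None} for urn in owner_urns
    ]
    ownership = {
        "owners": owners,
        "lastModified": {"time": time_ms, "actor": actor_urn, "impersonator": None},
    }
    return {
        "proposedSnapshot": {
            SNAPSHOT_KEY: {"urn": dataset_urn, "aspects": [{OWNERSHIP_KEY: ownership}]}
        }
    }


mce = build_ownership_mce(
    "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
    ["urn:li:corpuser:jdoe", "urn:li:corpuser:datahub"],
    actor_urn="urn:li:corpuser:jdoe",
    time_ms=1581407189000,
)
# json.dumps renders Python None as JSON null, matching the example.
print(json.dumps(mce, indent=2))
```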

For more detailed information, you can refer to the “Metadata Events | DataHub” Change Event documentation.

Would you like more detailed guidance on how to work with MCEs or any other specific aspect of DataHub?

Sources:

  • Metadata Events | DataHub Change Event documentation