Using MetadataChangeLogEvent for Tracking Dataset Creation and Ownership Changes in DataHub

Original Slack Thread

Will a <https://datahubproject.io/docs/advanced/mcp-mcl/|MetadataChangeLogEvent> tell me if a totally new dataset was created in DataHub or if an owner was added/removed?

In this case, ingestion uses the Python API to emit to DataHub.

Yes, a MetadataChangeLogEvent (MCL) can tell you both, although you usually have to infer it from the event contents rather than from changeType alone.

  1. New Dataset Creation:

    • According to the <https://datahubproject.io/docs/advanced/mcp-mcl/|Metadata Change Log Event documentation>, an MCL is emitted whenever any aspect on the DataHub Metadata Graph changes, which includes the first write for a new entity. An event with entityType set to dataset and changeType set to CREATE indicates a creation, but CREATE only appears when the emitter explicitly requests it. The Python emitter sends UPSERT by default, so in that case a new dataset typically shows up as an UPSERT with no previousAspectValue (for example on the datasetKey aspect).
  2. Owner Added/Removed:

    • Ownership changes arrive as events with aspectName set to ownership. Adding an owner is an UPSERT that carries the new owner list. Removing a single owner is typically also an UPSERT, with a shorter owner list; DELETE appears only when the ownership aspect itself is deleted. To tell exactly which owner was added or removed, compare the owners in aspect with those in previousAspectValue.
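
A minimal sketch of how an owner diff could be computed in Python. It assumes the event has already been decoded into a dict and carries a previousAspectValue field next to aspect, using the same JSON-string encoding as the sample events in this answer. The helper names (owner_urns, diff_owners) and the asmith URN are hypothetical, and how you consume the events (Kafka, the Actions framework, etc.) is out of scope here:

```python
import json


def owner_urns(aspect_field):
    """Return the set of owner URNs in an MCL aspect field (empty if absent)."""
    if not aspect_field or not aspect_field.get("value"):
        return set()
    payload = json.loads(aspect_field["value"])
    return {owner["owner"] for owner in payload.get("owners", [])}


def diff_owners(event):
    """Return (added, removed) owner URNs for an ownership event."""
    if event.get("aspectName") != "ownership":
        return set(), set()
    before = owner_urns(event.get("previousAspectValue"))
    after = owner_urns(event.get("aspect"))
    return after - before, before - after


# Hypothetical event: jdoe was replaced by asmith as owner.
event = {
    "entityUrn": "urn:li:dataset:abc",
    "entityType": "dataset",
    "changeType": "UPSERT",
    "aspectName": "ownership",
    "aspect": {
        "value": json.dumps({"owners": [{"owner": "urn:li:corpuser:asmith", "type": "DATAOWNER"}]}),
        "contentType": "application/json",
    },
    "previousAspectValue": {
        "value": json.dumps({"owners": [{"owner": "urn:li:corpuser:jdoe", "type": "DATAOWNER"}]}),
        "contentType": "application/json",
    },
}

added, removed = diff_owners(event)
print("added:", sorted(added))
print("removed:", sorted(removed))
```

Because the diff is computed from the two owner lists, it does not depend on the changeType: an event with no previous value reports every owner as added, and an event with no current value reports every owner as removed.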

Here are examples of how these events might look (all values are placeholders):

New Dataset Creation Event:

{
  "entityUrn": "urn:li:dataset:abc",
  "entityType": "dataset",
  "changeType": "CREATE",
  "aspectName": "datasetProperties",
  "aspect": {
    "value": "{\"name\": \"Sample Dataset\"}",
    "contentType": "application/json"
  },
  "created": {
    "time": 1649953100653,
    "actor": "urn:li:corpuser:jdoe"
  }
}

Owner Added Event:

{
  "entityUrn": "urn:li:dataset:abc",
  "entityType": "dataset",
  "changeType": "UPSERT",
  "aspectName": "ownership",
  "aspect": {
    "value": "{\"owners\": [{\"owner\": \"urn:li:corpuser:jdoe\", \"type\": \"DATAOWNER\"}]}",
    "contentType": "application/json"
  },
  "created": {
    "time": 1649953100653,
    "actor": "urn:li:corpuser:jdoe"
  }
}

Owner Removed Event:

{
  "entityUrn": "urn:li:dataset:abc",
  "entityType": "dataset",
  "changeType": "DELETE",
  "aspectName": "ownership",
  "aspect": {
    "value": "{\"owners\": [{\"owner\": \"urn:li:corpuser:jdoe\", \"type\": \"DATAOWNER\"}]}",
    "contentType": "application/json"
  },
  "created": {
    "time": 1649953100653,
    "actor": "urn:li:corpuser:jdoe"
  }
}
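
If it helps, here is a rough Python sketch of how a consumer might label events shaped like the samples above. It assumes each event is already a decoded dict; the label strings and the classify function are hypothetical, and treating an UPSERT of the datasetKey aspect with no previousAspectValue as a creation is an assumption you should verify against your own deployment:

```python
def classify(event):
    """Label an MCL-shaped dict as a dataset creation, an ownership change, or other."""
    if event.get("entityType") != "dataset":
        return "other"

    change = event.get("changeType")
    aspect = event.get("aspectName")

    # Assumption: the first write of the key aspect has no previous value.
    first_key_write = (
        aspect == "datasetKey"
        and change == "UPSERT"
        and not event.get("previousAspectValue")
    )
    if change == "CREATE" or first_key_write:
        return "dataset_created"

    if aspect == "ownership":
        # DELETE means the ownership aspect was removed; anything else
        # means the owner list was written and needs a diff to interpret.
        return "ownership_deleted" if change == "DELETE" else "ownership_changed"

    return "other"


# Trimmed-down versions of the sample events in this thread.
created = {"entityType": "dataset", "changeType": "CREATE", "aspectName": "datasetProperties"}
owner_upsert = {"entityType": "dataset", "changeType": "UPSERT", "aspectName": "ownership"}
owner_delete = {"entityType": "dataset", "changeType": "DELETE", "aspectName": "ownership"}

for sample in (created, owner_upsert, owner_delete):
    print(sample["changeType"], sample["aspectName"], "->", classify(sample))
```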

Would you like more detailed information on how to configure or use these events?
