Creating a Container Class in DataHub Using the Python SDK

Original Slack Thread

<@U06TM7M40RK> how to create container class?

To create a container in DataHub, use the Python SDK to define the container entity and emit it. The steps are:

  1. Define the container URN: Call make_container_urn with an identifier string (a GUID) to build the container's URN, which has the form urn:li:container:<id>.
  2. Create the container properties: Describe the container with a ContainerPropertiesClass aspect; name is required, while description and customProperties are optional.
  3. Emit the container: Wrap the aspect in a MetadataChangeProposalWrapper and send it to DataHub with an emitter.

Here is an example code snippet to create and emit a container:

import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ContainerPropertiesClass, ChangeTypeClass

# Define the container URN. The argument is the container's identifier (GUID),
# not its display name; the result is "urn:li:container:your_container_name".
container_urn = builder.make_container_urn("your_container_name")

# Define the container properties (the display name goes here)
container_properties = ContainerPropertiesClass(
    name="Your Container Name",
    description="Description of your container",
    customProperties={"key1": "value1", "key2": "value2"},
)

# Create a Metadata Change Proposal (MCP) that carries the properties aspect
mcp = MetadataChangeProposalWrapper(
    entityType="container",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=container_urn,
    aspect=container_properties,
)

# Create an emitter and send the MCP to DataHub.
# Assumption: DataHub GMS is reachable at http://localhost:8080; replace this
# with your own server URL (and pass token=... if authentication is enabled).
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
emitter.emit(mcp)

This code snippet demonstrates how to create a container with specific properties and emit it to DataHub.
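
Because the string passed to make_container_urn becomes the container's identity, it helps if that string comes out the same on every run, so that re-emitting updates the existing container instead of creating a new one. Below is a minimal, standard-library-only sketch of one way to derive such an identifier by hashing the fields that identify the container. The helper names and the key fields (platform, instance, database, schema) are hypothetical, and this illustrates the general idea rather than the SDK's own implementation.

```python
import hashlib
import json


def stable_container_guid(key_fields: dict) -> str:
    """Hash a dict of identifying fields into a repeatable 32-character hex id."""
    # Sorting the keys makes the result independent of dict insertion order.
    canonical = json.dumps(key_fields, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def container_urn_from_guid(guid: str) -> str:
    """Build a URN of the form urn:li:container:<id>."""
    return f"urn:li:container:{guid}"


# Hypothetical identifying fields for a schema-level container.
key = {
    "platform": "snowflake",
    "instance": "prod",
    "database": "analytics",
    "schema": "public",
}
guid = stable_container_guid(key)
urn = container_urn_from_guid(guid)
print(urn)
```

In the example above, the resulting guid could be passed to builder.make_container_urn(guid) in place of the hard-coded "your_container_name".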

Would you like more detailed instructions or have any specific questions about this process?

Sources: