Configuring mcp_builder.gen_containers() for Human-Readable Container URNs

Original Slack Thread

<@U06TM7M40RK> Hello, I am ingesting lots of data from dbt using a combination of the dbt ingestion and a custom ingester. Since dbt does not recognise databases, the custom ingester creates database container entities. The URN of each entity is a hash value rather than clear text. How do I configure mcp_builder.gen_containers() so that the container URN is not a hash, as is possible via the GUI?

mcp_builder.gen_containers() accepts a qualified_name parameter, which sets a human-readable qualified name on the container's properties. Note, however, that the container URN itself appears to be derived from the container key (container_key.as_urn() produces a GUID), so qualified_name on its own is unlikely to change the URN.

Here’s a general approach based on the available data sources:

  1. Use gen_containers: The gen_containers function in the DataHub Python SDK (datahub.emitter.mcp_builder) takes a container key plus various parameters for the container, including qualified_name, description, owner_urn, external_url, and more.

  2. Specify qualified_name: Providing a qualified_name attaches a readable name to the container's properties. The URN still appears to come from the container key rather than from this value, so treat qualified_name as a label, not as the identifier.

  3. Example Code:

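    from datahub.emitter.mcp_builder import DatabaseKey, gen_containers
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    # Connection used to emit metadata (the GMS address is a placeholder)
    conn = DatahubRestEmitter(gms_server="http://localhost:8080")

    # gen_containers() expects a container key object, not a plain string;
    # the platform and database values here are examples
    container_key = DatabaseKey(platform="dbt", database="your_database")
    name = "Your Container Name"
    sub_types = ["Database"]  # Example subtype
    qualified_name = "your_readable_name"

    # Generate the container work units with a qualified name
    containers = gen_containers(
        container_key=container_key,
        name=name,
        sub_types=sub_types,
        qualified_name=qualified_name,
    )

    # Each work unit wraps a metadata change proposal; emit that payload
    for container in containers:
        conn.emit(container.metadata)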
    
  4. Emit the Container: Send the metadata of each generated work unit to DataHub through an emitter such as DatahubRestEmitter (conn in the example above).

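This approach gives the container a readable qualified name. As far as I can tell, the URN itself remains a GUID derived from the container key, so a fully readable URN would require constructing the URN yourself rather than relying on gen_containers(). If you need more detailed instructions or have specific questions about this process, feel free to ask!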
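To make the difference between a hashed URN and a readable one concrete, here is a minimal standard-library sketch. It assumes the SDK builds the container GUID by hashing the serialized key fields; the exact serialization inside DataHub may differ, and the function names and the "analytics_db" value are hypothetical, chosen only for illustration.

```python
import hashlib
import json


def hashed_container_urn(key_fields: dict) -> str:
    """Illustrative only: turn key fields into a GUID-style container URN."""
    # Drop unset fields, then serialize deterministically before hashing
    cleaned = {k: v for k, v in key_fields.items() if v is not None}
    payload = json.dumps(cleaned, separators=(",", ":"), sort_keys=True)
    guid = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return f"urn:li:container:{guid}"


def readable_container_urn(container_id: str) -> str:
    """A container URN whose id is a plain, human-chosen string."""
    return f"urn:li:container:{container_id}"


# Hypothetical key fields for a database container on the dbt platform
key = {"platform": "dbt", "instance": None, "database": "analytics_db"}
hashed = hashed_container_urn(key)
readable = readable_container_urn("analytics_db")

print(hashed)    # opaque 32-character hex id
print(readable)  # id is the readable string itself
```

The hashed form is stable for the same key fields, which is why re-running an ingester keeps pointing at the same container, but it carries no readable information. The qualified name is stored alongside the entity and does not enter this computation.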

Sources:

  • Builder | DataHub Python SDK Builder

<@U06TM7M40RK> Can you give me an example of how to pass a string to qualified_name so that the URN is not a GUID?

I’m not able to find any data sources to answer your question. You might want to specify the context or format of the string you intend to use for the qualified_name. If this doesn’t help, please reach out to the DataHub support team for further assistance.

Hello, I am ingesting lots of data from dbt using a combination of the dbt ingestion and a custom ingester. Since dbt does not recognise databases, the custom ingester creates database container entities. The URN of each entity is a hash value rather than clear text. How do I configure mcp_builder.gen_containers() so that the container URN is not a hash, as is possible via the GUI? Give me an example with real string values. <@U06TM7M40RK>

You can pass the qualified_name parameter to mcp_builder.gen_containers() to attach a human-readable name to the container. Be aware that this value is stored in the container's properties; the URN itself appears to be generated from the container key, so it will most likely still be a GUID.

Here’s an example of how you can achieve this:

    from datahub.emitter.mcp_builder import DatabaseKey, gen_containers
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    # Initialize the DataHub REST emitter
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    # gen_containers() expects a container key object rather than a plain string;
    # the platform and database values here are examples
    container_key = DatabaseKey(platform="dbt", database="my_database")
    name = "MyDatabaseContainer"
    sub_types = ["Database"]  # Example subtype
    qualified_name = "my_database_container"  # Human-readable name

    # Generate the container work units with a qualified name
    containers = gen_containers(
        container_key=container_key,
        name=name,
        sub_types=sub_types,
        qualified_name=qualified_name,
    )

    # Each work unit wraps a metadata change proposal; emit that payload
    for container in containers:
        emitter.emit(container.metadata)

    # The URN is derived from the container key, so print it to see the actual value
    print(f"Container emitted with URN: {container_key.as_urn()}")

Explanation:

  • Container Key: A key object (for example DatabaseKey) whose fields, such as platform and database, uniquely identify the container. The container URN is derived from this key.
  • Name: The display name for the container.
  • Sub Types: A list of subtypes for the container, such as “Database”.
  • Qualified Name: A human-readable, descriptive name stored in the container's properties. It does not appear to replace the GUID in the URN.

Setting qualified_name therefore makes the container easier to identify and manage, but on its own it is unlikely to turn the URN into readable text; the URN continues to be based on the container key.

Would you like more detailed steps or troubleshooting tips?
