Changing Entity Browse Path in DataHub using Python SDK

Original Slack Thread

<@U06TM7M40RK> could you please provide an example of changing the browse path of an entity using the Python SDK?

To change the browse path of an entity using the Python SDK, emit a MetadataChangeProposalWrapper that carries a browsePaths aspect for that entity. Here’s an example:

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.metadata.schema_classes import BrowsePathsClass
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

# Configure the DataHub client
gms_endpoint = "http://localhost:8080"  # Replace with your DataHub GMS endpoint
graph = DataHubGraph(DatahubClientConfig(server=gms_endpoint))

# Define the new browse path
new_path = BrowsePathsClass(paths=["/level1/level2/"])

# Specify the URN of the entity you want to update
urn = "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"

# Create a MetadataChangeProposalWrapper with the new browse path
mcp = MetadataChangeProposalWrapper(
    aspect=new_path,
    entityType="dataset",
    changeType="UPSERT",
    entityUrn=urn,
    aspectName="browsePaths",
)

# Emit the MCP to update the browse path
graph.emit(mcp)

In this example:

  • Replace gms_endpoint with the URL of your DataHub GMS endpoint.
  • Replace urn with the URN of the entity you want to update.
  • Modify new_path to reflect the desired browse path structure.
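The URN in the example follows the dataset pattern urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>). As a minimal sketch, assuming only that pattern, the helper below assembles such a string from its parts. The name build_dataset_urn is hypothetical and not part of the SDK; in real code the SDK's own make_dataset_urn from datahub.emitter.mce_builder is likely the better choice.

```python
def build_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Assemble a dataset URN string from its parts (illustrative helper, not an SDK function)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Rebuild the URN used in the example above from its platform, name, and environment.
urn = build_dataset_urn("hdfs", "SampleHdfsDataset")
print(urn)
```

The resulting string can be passed as entityUrn in the example above.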

This code updates the browse path of the specified entity in DataHub. Because the change type is UPSERT, the emitted aspect replaces any browsePaths value the entity already has.

And using BrowsePathsV2Class? <@U06TM7M40RK>

To change the browse path with BrowsePathsV2Class, follow the same approach as in the previous example, but build the path as a list of BrowsePathEntryClass entries and emit it as the browsePathsV2 aspect. Here’s how you can do it:

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.metadata.schema_classes import BrowsePathsV2Class, BrowsePathEntryClass
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

# Configure the DataHub client
gms_endpoint = "http://localhost:8080"  # Replace with your DataHub GMS endpoint
graph = DataHubGraph(DatahubClientConfig(server=gms_endpoint))

# Define the new browse path using BrowsePathsV2Class
new_path_v2 = BrowsePathsV2Class(
    path=[
        BrowsePathEntryClass(id="level1"),
        BrowsePathEntryClass(id="level2"),
    ]
)

# Specify the URN of the entity you want to update
urn = "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"

# Create a MetadataChangeProposalWrapper with the new browse path
mcp = MetadataChangeProposalWrapper(
    aspect=new_path_v2,
    entityType="dataset",
    changeType="UPSERT",
    entityUrn=urn,
    aspectName="browsePathsV2",
)

# Emit the MCP to update the browse path
graph.emit(mcp)

In this example:

  • Replace gms_endpoint with the URL of your DataHub GMS endpoint.
  • Replace urn with the URN of the entity you want to update.
  • Modify new_path_v2 to reflect the desired browse path structure using BrowsePathsV2Class.
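If you already have a slash-delimited path such as the "/level1/level2/" string from the first example, the segment ids for the V2 entries can be derived from it. The following is only a sketch: split_browse_path is a hypothetical helper, not an SDK function, and it assumes that no segment contains a slash of its own.

```python
from typing import List


def split_browse_path(legacy_path: str) -> List[str]:
    """Split a slash-delimited path into its non-empty segments (illustrative helper)."""
    # Leading, trailing, and doubled slashes produce empty strings, which are dropped.
    return [segment for segment in legacy_path.split("/") if segment]


segment_ids = split_browse_path("/level1/level2/")
print(segment_ids)  # ['level1', 'level2']
```

Each resulting id can then be wrapped in a BrowsePathEntryClass(id=...) entry, in the same order, when building the path list shown above.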

This code updates the browse path of the specified entity in DataHub using BrowsePathsV2Class.

Within the BrowsePathsV2Class is it possible to use URNs? <@U06TM7M40RK>

Yes, BrowsePathsV2Class can use URNs. Each BrowsePathEntryClass in the path has an optional urn field. Populate it when that level of the browse path corresponds to an entity in your DataHub instance, such as a container. This is particularly useful for representing hierarchies in which the entity sits inside a container or another entity.

Here’s an example of how you can use URNs within BrowsePathsV2Class:

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.metadata.schema_classes import BrowsePathsV2Class, BrowsePathEntryClass
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

# Configure the DataHub client
gms_endpoint = "http://localhost:8080"  # Replace with your DataHub GMS endpoint
graph = DataHubGraph(DatahubClientConfig(server=gms_endpoint))

# Define the new browse path using BrowsePathsV2Class with URNs
new_path_v2 = BrowsePathsV2Class(
    path=[
        BrowsePathEntryClass(id="level1", urn="urn:li:container:level1Container"),
        BrowsePathEntryClass(id="level2", urn="urn:li:container:level2Container"),
    ]
)

# Specify the URN of the entity you want to update
urn = "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"

# Create a MetadataChangeProposalWrapper with the new browse path
mcp = MetadataChangeProposalWrapper(
    aspect=new_path_v2,
    entityType="dataset",
    changeType="UPSERT",
    entityUrn=urn,
    aspectName="browsePathsV2",
)

# Emit the MCP to update the browse path
graph.emit(mcp)

In this example:

  • The urn field in BrowsePathEntryClass is used to specify the URN of the container or entity that is part of the browse path.
  • This allows the browse path to be more semantically meaningful and linked to actual entities within DataHub.
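Because urn is optional, a path can mix levels that are backed by an entity with levels that are plain names. As a rough sketch, the hypothetical helper below (entry_kwargs is not an SDK function) turns (id, urn) pairs into keyword arguments and leaves urn out when none is given. The container URN is the same placeholder used in the example above.

```python
from typing import Dict, List, Optional, Tuple


def entry_kwargs(levels: List[Tuple[str, Optional[str]]]) -> List[Dict[str, str]]:
    """Turn (id, urn) pairs into keyword arguments, omitting urn when it is None."""
    result = []
    for level_id, level_urn in levels:
        kwargs = {"id": level_id}
        if level_urn is not None:
            kwargs["urn"] = level_urn
        result.append(kwargs)
    return result


levels = [
    ("level1", "urn:li:container:level1Container"),  # level backed by a container entity
    ("level2", None),  # plain name with no entity behind it
]
print(entry_kwargs(levels))
```

Each dictionary can be unpacked into an entry with BrowsePathEntryClass(**kwargs) before the list is passed to BrowsePathsV2Class(path=...).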
