Creating Custom Browse Paths for Datasets in DataHub using Python Codes

user-4 · September 9, 2024, 12:04am

This message was deleted. attachment

datahub_team · September 9, 2024, 12:04am

Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!

<@U06TM7M40RK> Can you help with the questions in this thread?

_Please continue to tag RunLLM (e.g <@U06TM7M40RK>) to have it continue generating responses!

user-1 · September 9, 2024, 12:04am

To address the issue of displaying only the table name in the UI instead of the complete path, you can customize the browse paths aspect in DataHub. This involves modifying the metadata ingestion process to set custom browse paths for your datasets.

Based on the provided data sources, you can achieve this by emitting a custom browse path aspect from your ingestion source code. Here is a Python code snippet that demonstrates how to set a custom browse path for a dataset:

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.metadata.schema_classes import MetadataChangeProposalWrapper, BrowsePathsClass

# Define the custom browse path
browse_path = BrowsePathsClass(
    paths=["/sql_server/instance_name"]
)

# Create a MetadataChangeProposalWrapper for the dataset
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType="UPSERT",
    entityUrn=make_dataset_urn(platform="sql_server", name="table_name", env="PROD"),
    aspectName="browsePaths",
    aspect=browse_path,
)

# Emit the MetadataChangeProposalWrapper
# Assuming you have an emitter instance
emitter.emit(mcp)

In this example:

Replace "sql_server" with your actual platform name.
Replace "instance_name" with your actual instance name.
Replace "table_name" with your actual table name.

This code sets the browse path to /sql_server/instance_name, which should display only the table name in the UI under the specified path.

Would you like more details on any specific part of this process?

Sources:

<Issues · datahub-project/datahub · GitHub files as datasets in Datahub>
<https://github.com/datahub-project/datahub/blob/master/docs/advanced/browse-paths-upgrade.md|Browse Paths Upgrade (August 2022)> 0 button 0 button Hint: Mention @runllm in the thread for followups.

user-1 · September 9, 2024, 12:04am

Certainly! Below is a full sample code for setting custom browse paths for a dataset on the MSSQL platform. This example demonstrates how to configure the ingestion process to display only the table name in the UI.

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.metadata.schema_classes import MetadataChangeProposalWrapper, BrowsePathsClass
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Initialize the DataHub REST emitter
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Define the custom browse path
browse_path = BrowsePathsClass(
    paths=["/mssql/instance_name"]
)

# Create a MetadataChangeProposalWrapper for the dataset
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType="UPSERT",
    entityUrn=make_dataset_urn(platform="mssql", name="table_name", env="PROD"),
    aspectName="browsePaths",
    aspect=browse_path,
)

# Emit the MetadataChangeProposalWrapper
emitter.emit(mcp)

print("Custom browse path set successfully.")

Steps to Run the Code:

Install DataHub Python Client:
Ensure you have the DataHub Python client installed. You can install it using pip:
```
pip install acryl-datahub
```
Configure the Emitter:
Update the gms_server URL to point to your DataHub instance. The default is http://localhost:8080.
Customize the Browse Path:
- Replace "mssql" with your actual platform name.
- Replace "instance_name" with your actual instance name.
- Replace "table_name" with your actual table name.
Run the Script:
Execute the script to set the custom browse path for your dataset.

This code sets the browse path to /mssql/instance_name, which should display only the table name in the UI under the specified path.

Would you like more detailed steps or troubleshooting tips?

Sources:

<Issues · datahub-project/datahub · GitHub files as datasets in Datahub> 0 button 0 button Hint: Mention @runllm in the thread for followups.

user-2 · September 9, 2024, 12:04am

<@U06TM7M40RK> - can you please help to create python code for showing the Sql Server on UI as below screenshot?

![attachment]({‘ID’: ‘F07L2QGBHTK’, ‘EDITABLE’: False, ‘IS_EXTERNAL’: False, ‘USER_ID’: ‘U07K90G6431’, ‘CREATED’: ‘2024-09-03 14:38:17+00:00’, ‘PERMALINK’: ‘Slack’, ‘EXTERNAL_TYPE’: ‘’, ‘TIMESTAMPS’: ‘2024-09-03 14:38:17+00:00’, ‘MODE’: ‘hosted’, ‘DISPLAY_AS_BOT’: False, ‘PRETTY_TYPE’: ‘JPEG’, ‘NAME’: ‘Image from iOS.jpg’, ‘IS_PUBLIC’: True, ‘PREVIEW_HIGHLIGHT’: None, ‘MIMETYPE’: ‘image/jpeg’, ‘PERMALINK_PUBLIC’: ‘https://slack-files.com/TUMKD5EGJ-F07L2QGBHTK-cfce9ea789’, ‘FILETYPE’: ‘jpg’, ‘EDIT_LINK’: None, ‘URL_PRIVATE’: ‘Slack’, ‘HAS_RICH_PREVIEW’: False, ‘TITLE’: ‘Image from iOS’, ‘IS_STARRED’: False, ‘PREVIEW_IS_TRUNCATED’: None, ‘URL_PRIVATE_DOWNLOAD’: ‘Slack’, ‘PREVIEW’: None, ‘PUBLIC_URL_SHARED’: False, ‘MESSAGE_TS’: ‘1725374301.444579’, ‘PARENT_MESSAGE_TS’: ‘1725370434.848879’, ‘MESSAGE_CHANNEL_ID’: ‘CUMUWQU66’, ‘_FIVETRAN_DELETED’: False, ‘LINES_MORE’: None, ‘LINES’: None, ‘SIZE’: 26528, ‘_FIVETRAN_SYNCED’: ‘2024-09-08 08:21:53.414000+00:00’})

user-1 · September 9, 2024, 12:04am

Sure thing! I’m looking through the available data to answer your question. Hold on a second…

user-2 · September 9, 2024, 12:04am

<@U06TM7M40RK>
Can you Please help to give a complete python code sample for Sql Server ingestion and show it properly on UI with platform instance along with schema and database?

user-1 · September 9, 2024, 12:04am

Certainly! Below is a complete Python code sample for ingesting metadata from a SQL Server (MSSQL) database into DataHub and configuring it to display properly on the UI with platform instance, schema, and database.

Step-by-Step Guide:

Install Required Packages:
Ensure you have the DataHub Python client and MSSQL ingestion plugin installed:
```
pip install acryl-datahub acryl-datahub[mssql]
```

Create the Ingestion Recipe:
Create a YAML file (mssql_ingestion_recipe.yaml) to configure the ingestion process.

source:
  type: mssql
  config:
    host_port: "localhost:1433"  # Update with your SQL Server host and port
    database: "DemoDatabase"     # Update with your database name
    username: "user"             # Update with your username
    password: "pass"             # Update with your password
    options:
      connect_args:
        TrustServerCertificate: "Yes"
        Encrypt: "yes"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"  # Update with your DataHub server URL

Run the Ingestion:
Use the DataHub CLI to run the ingestion:
```
datahub ingest -c mssql_ingestion_recipe.yaml
```

Customize Browse Paths:
After ingestion, you can customize the browse paths to display the platform instance, schema, and database properly.

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.metadata.schema_classes import MetadataChangeProposalWrapper, BrowsePathsClass
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Initialize the DataHub REST emitter
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Define the custom browse path
browse_path = BrowsePathsClass(
    paths=["/mssql/DemoDatabase/schema_name/table_name"]
)

# Create a MetadataChangeProposalWrapper for the dataset
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType="UPSERT",
    entityUrn=make_dataset_urn(platform="mssql", name="schema_name.table_name", env="PROD"),
    aspectName="browsePaths",
    aspect=browse_path,
)

# Emit the MetadataChangeProposalWrapper
emitter.emit(mcp)

print("Custom browse path set successfully.")

Full Python Code Sample:

user-1 · September 9, 2024, 12:04am

import datahub.emitter.mce_builder as builder
from datahub.metadata.schema_classes import MetadataChangeProposalWrapper, BrowsePathsClass
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Initialize the DataHub REST emitter
emitter = DatahubRestEmitter(gms_server="<http://localhost:8080>")

# Define the custom browse path
browse_path = BrowsePathsClass(
    paths=["/mssql/DemoDatabase/schema_name/table_name"]
)

# Create a MetadataChangeProposalWrapper for the dataset
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType="UPSERT",
    entityUrn=builder.make_dataset_urn(platform="mssql", name="schema_name.table_name", env="PROD"),
    aspectName="browsePaths",
    aspect=browse_path,
)

# Emit the MetadataChangeProposalWrapper
emitter.emit(mcp)

print("Custom browse path set successfully.")

Explanation:

Ingestion Recipe: The YAML file configures the MSSQL source and DataHub REST sink.
Custom Browse Path: The Python script sets a custom browse path to display the platform instance, schema, and database properly in the UI.

Would you like more detailed steps or troubleshooting tips?

Sources:

<https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/mssql|Microsoft SQL Server Ingestion Documentation> 0 button 0 button Hint: Mention @runllm in the thread for followups.

user-1 · September 9, 2024, 12:04am

To insert a dataset with platform, database, and schema information and display it properly in the DataHub UI, you need to follow these steps:

Ingest Metadata from SQL Server: Use the DataHub ingestion framework to ingest metadata from your SQL Server database.
Customize Browse Paths: Set custom browse paths to ensure the platform, database, and schema are displayed at the top level in the UI.

Below is a complete Python code sample that demonstrates how to achieve this.

Step-by-Step Guide

Step 1: Install Required Packages

Ensure you have the DataHub Python client and MSSQL ingestion plugin installed:

pip install acryl-datahub acryl-datahub[mssql]

Step 2: Create the Ingestion Recipe

Create a YAML file (mssql_ingestion_recipe.yaml) to configure the ingestion process.

source:
  type: mssql
  config:
    host_port: "localhost:1433"  # Update with your SQL Server host and port
    database: "DemoDatabase"     # Update with your database name
    username: "user"             # Update with your username
    password: "pass"             # Update with your password
    options:
      connect_args:
        TrustServerCertificate: "Yes"
        Encrypt: "yes"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"  # Update with your DataHub server URL

Step 3: Run the Ingestion

Use the DataHub CLI to run the ingestion:

datahub ingest -c mssql_ingestion_recipe.yaml

Step 4: Customize Browse Paths

After ingestion, you can customize the browse paths to display the platform, database, and schema properly.

import datahub.emitter.mce_builder as builder
from datahub.metadata.schema_classes import MetadataChangeProposalWrapper, BrowsePathsClass
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Initialize the DataHub REST emitter
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Define the custom browse path
browse_path = BrowsePathsClass(
    paths=["/mssql/DemoDatabase/schema_name/table_name"]
)

# Create a MetadataChangeProposalWrapper for the dataset
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType="UPSERT",
    entityUrn=builder.make_dataset_urn(platform="mssql", name="schema_name.table_name", env="PROD"),
    aspectName="browsePaths",
    aspect=browse_path,
)

# Emit the MetadataChangeProposalWrapper
emitter.emit(mcp)

print("Custom browse path set successfully.")

Full Python Code Sample

user-1 · September 9, 2024, 12:04am

import datahub.emitter.mce_builder as builder
from datahub.metadata.schema_classes import MetadataChangeProposalWrapper, BrowsePathsClass
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Initialize the DataHub REST emitter
emitter = DatahubRestEmitter(gms_server="<http://localhost:8080>")

# Define the custom browse path
browse_path = BrowsePathsClass(
    paths=["/mssql/DemoDatabase/schema_name/table_name"]
)

# Create a MetadataChangeProposalWrapper for the dataset
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType="UPSERT",
    entityUrn=builder.make_dataset_urn(platform="mssql", name="schema_name.table_name", env="PROD"),
    aspectName="browsePaths",
    aspect=browse_path,
)

# Emit the MetadataChangeProposalWrapper
emitter.emit(mcp)

print("Custom browse path set successfully.")

Explanation

Ingestion Recipe: The YAML file configures the MSSQL source and DataHub REST sink.
Custom Browse Path: The Python script sets a custom browse path to display the platform, database, and schema properly in the UI.

Summary

Configure Ingestion: Create a YAML configuration file with the necessary connection details and options for SQL Server.
Run Ingestion: Use the DataHub CLI to run the ingestion.
Customize Browse Paths: Use the provided Python script to set custom browse paths for your datasets.

Would you like more detailed steps or troubleshooting tips?

Sources:

https://datahubspace.slack.com/archives/CUMUWQU66/p1721125582.597279|https://datahubspace.slack.com/archives/CUMUWQU66/p1721125582.597279
https://datahubspace.slack.com/archives/CUMUWQU66/p1718256813.887969|https://datahubspace.slack.com/archives/CUMUWQU66/p1718256813.887969 0 button 0 button Hint: Mention @runllm in the thread for followups.

user-1 · September 9, 2024, 12:04am

To create a URN (Uniform Resource Name) with platform, database, and container/schema in DataHub, you need to follow the URN format specified by DataHub. The URN format for a dataset typically includes the platform, the dataset name (which can include the database and schema), and the environment.

URN Format

The general format for a dataset URN in DataHub is:

urn:li:dataset:(urn:li:dataPlatform:<platform>,<database>.<schema>.<table>,<env>)

Example

For a SQL Server (MSSQL) dataset, the URN might look like this:

urn:li:dataset:(urn:li:dataPlatform:mssql,DemoDatabase.schema_name.table_name,PROD)

Python Code to Create URN

Here is a Python code snippet to create a URN with platform, database, and schema:

from datahub.emitter.mce_builder import make_dataset_urn

# Define the platform, database, schema, table, and environment
platform = "mssql"
database = "DemoDatabase"
schema = "schema_name"
table = "table_name"
env = "PROD"

# Create the dataset URN
dataset_urn = make_dataset_urn(platform=platform, name=f"{database}.{schema}.{table}", env=env)

print(f"Dataset URN: {dataset_urn}")

Explanation

Platform: The data platform (e.g., “mssql” for SQL Server).
Database: The name of the database (e.g., “DemoDatabase”).
Schema: The schema or container name (e.g., “schema_name”).
Table: The table name (e.g., “table_name”).
Environment: The environment (e.g., “PROD” for production).

Full Example

Here is a complete example that includes creating the URN and emitting it to DataHub:

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.metadata.schema_classes import MetadataChangeProposalWrapper, BrowsePathsClass
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Initialize the DataHub REST emitter
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Define the platform, database, schema, table, and environment
platform = "mssql"
database = "DemoDatabase"
schema = "schema_name"
table = "table_name"
env = "PROD"

# Create the dataset URN
dataset_urn = make_dataset_urn(platform=platform, name=f"{database}.{schema}.{table}", env=env)

# Define the custom browse path
browse_path = BrowsePathsClass(
    paths=[f"/{platform}/{database}/{schema}/{table}"]
)

# Create a MetadataChangeProposalWrapper for the dataset
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType="UPSERT",
    entityUrn=dataset_urn,
    aspectName="browsePaths",
    aspect=browse_path,
)

# Emit the MetadataChangeProposalWrapper
emitter.emit(mcp)

print(f"Dataset URN: {dataset_urn}")
print("Custom browse path set successfully.")

user-1 · September 9, 2024, 12:04am

Explanation

Dataset URN: The URN is created using the make_dataset_urn function with the platform, database, schema, table, and environment.
Custom Browse Path: The browse path is set to display the platform, database, schema, and table properly in the UI.
MetadataChangeProposalWrapper: This wrapper is used to emit the custom browse path to DataHub.

Would you like more detailed steps or troubleshooting tips?

Sources:

<https://github.com/datahub-project/datahub/blob/master/docs/what/urn.md|URN Documentation> 0 button 0 button Hint: Mention @runllm in the thread for followups.

Topic		Replies	Views
Changing Entity Browse Path in DataHub using Python SDK ingestion	7	25	March 3, 2025
How to Set Custom BrowsePathV2 on Ingestion Using Python SDK in DataHub 0.11.0 and Plans for Transformer Update ingestion	2	47	March 4, 2024
Creating a Custom Ingestion Source in DataHub with CSV Data ingestion	5	52	August 12, 2024
Adding Metadata about Data Source URL to Datasets during Ingestion ingestion	3	123	July 1, 2024
Transforming UI with BrowsePathsV2Class in DataHub ingestion	7	8	October 7, 2024

Creating Custom Browse Paths for Datasets in DataHub using Python Codes

Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!

<@U06TM7M40RK> Can you help with the questions in this thread?

Related topics