This message was deleted.
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
<@U06TM7M40RK> Can you help with the questions in this thread?
_Please continue to tag RunLLM (e.g <@U06TM7M40RK>) to have it continue generating responses!
To address the issue of displaying only the table name in the UI instead of the complete path, you can customize the browse paths aspect in DataHub. This involves modifying the metadata ingestion process to set custom browse paths for your datasets.
Based on the provided data sources, you can achieve this by emitting a custom browse path aspect from your ingestion source code. Here is a Python code snippet that demonstrates how to set a custom browse path for a dataset:
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.metadata.schema_classes import MetadataChangeProposalWrapper, BrowsePathsClass
# Define the custom browse path
browse_path = BrowsePathsClass(
paths=["/sql_server/instance_name"]
)
# Create a MetadataChangeProposalWrapper for the dataset
mcp = MetadataChangeProposalWrapper(
entityType="dataset",
changeType="UPSERT",
entityUrn=make_dataset_urn(platform="sql_server", name="table_name", env="PROD"),
aspectName="browsePaths",
aspect=browse_path,
)
# Emit the MetadataChangeProposalWrapper
# Assuming you have an emitter instance
emitter.emit(mcp)
In this example:
- Replace
"sql_server"
with your actual platform name. - Replace
"instance_name"
with your actual instance name. - Replace
"table_name"
with your actual table name.
This code sets the browse path to /sql_server/instance_name
, which should display only the table name in the UI under the specified path.
Would you like more details on any specific part of this process?
Sources:
- <Issues · datahub-project/datahub · GitHub files as datasets in Datahub>
- <https://github.com/datahub-project/datahub/blob/master/docs/advanced/browse-paths-upgrade.md|Browse Paths Upgrade (August 2022)> 0 button 0 button Hint: Mention @runllm in the thread for followups.
Certainly! Below is a full sample code for setting custom browse paths for a dataset on the MSSQL platform. This example demonstrates how to configure the ingestion process to display only the table name in the UI.
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.metadata.schema_classes import MetadataChangeProposalWrapper, BrowsePathsClass
from datahub.emitter.rest_emitter import DatahubRestEmitter
# Initialize the DataHub REST emitter
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
# Define the custom browse path
browse_path = BrowsePathsClass(
paths=["/mssql/instance_name"]
)
# Create a MetadataChangeProposalWrapper for the dataset
mcp = MetadataChangeProposalWrapper(
entityType="dataset",
changeType="UPSERT",
entityUrn=make_dataset_urn(platform="mssql", name="table_name", env="PROD"),
aspectName="browsePaths",
aspect=browse_path,
)
# Emit the MetadataChangeProposalWrapper
emitter.emit(mcp)
print("Custom browse path set successfully.")
Steps to Run the Code:
-
Install DataHub Python Client:
Ensure you have the DataHub Python client installed. You can install it using pip:pip install acryl-datahub
-
Configure the Emitter:
Update thegms_server
URL to point to your DataHub instance. The default ishttp://localhost:8080
. -
Customize the Browse Path:
- Replace
"mssql"
with your actual platform name. - Replace
"instance_name"
with your actual instance name. - Replace
"table_name"
with your actual table name.
- Replace
-
Run the Script:
Execute the script to set the custom browse path for your dataset.
This code sets the browse path to /mssql/instance_name
, which should display only the table name in the UI under the specified path.
Would you like more detailed steps or troubleshooting tips?
Sources:
- <Issues · datahub-project/datahub · GitHub files as datasets in Datahub> 0 button 0 button Hint: Mention @runllm in the thread for followups.
<@U06TM7M40RK> - can you please help to create python code for showing the Sql Server on UI as below screenshot?
![attachment]({‘ID’: ‘F07L2QGBHTK’, ‘EDITABLE’: False, ‘IS_EXTERNAL’: False, ‘USER_ID’: ‘U07K90G6431’, ‘CREATED’: ‘2024-09-03 14:38:17+00:00’, ‘PERMALINK’: ‘Slack’, ‘EXTERNAL_TYPE’: ‘’, ‘TIMESTAMPS’: ‘2024-09-03 14:38:17+00:00’, ‘MODE’: ‘hosted’, ‘DISPLAY_AS_BOT’: False, ‘PRETTY_TYPE’: ‘JPEG’, ‘NAME’: ‘Image from iOS.jpg’, ‘IS_PUBLIC’: True, ‘PREVIEW_HIGHLIGHT’: None, ‘MIMETYPE’: ‘image/jpeg’, ‘PERMALINK_PUBLIC’: ‘https://slack-files.com/TUMKD5EGJ-F07L2QGBHTK-cfce9ea789’, ‘FILETYPE’: ‘jpg’, ‘EDIT_LINK’: None, ‘URL_PRIVATE’: ‘Slack’, ‘HAS_RICH_PREVIEW’: False, ‘TITLE’: ‘Image from iOS’, ‘IS_STARRED’: False, ‘PREVIEW_IS_TRUNCATED’: None, ‘URL_PRIVATE_DOWNLOAD’: ‘Slack’, ‘PREVIEW’: None, ‘PUBLIC_URL_SHARED’: False, ‘MESSAGE_TS’: ‘1725374301.444579’, ‘PARENT_MESSAGE_TS’: ‘1725370434.848879’, ‘MESSAGE_CHANNEL_ID’: ‘CUMUWQU66’, ‘_FIVETRAN_DELETED’: False, ‘LINES_MORE’: None, ‘LINES’: None, ‘SIZE’: 26528, ‘_FIVETRAN_SYNCED’: ‘2024-09-08 08:21:53.414000+00:00’})
Sure thing! I’m looking through the available data to answer your question. Hold on a second…
<@U06TM7M40RK>
Can you Please help to give a complete python code sample for Sql Server ingestion and show it properly on UI with platform instance along with schema and database?
Certainly! Below is a complete Python code sample for ingesting metadata from a SQL Server (MSSQL) database into DataHub and configuring it to display properly on the UI with platform instance, schema, and database.
Step-by-Step Guide:
-
Install Required Packages:
Ensure you have the DataHub Python client and MSSQL ingestion plugin installed:pip install acryl-datahub acryl-datahub[mssql]
-
Create the Ingestion Recipe:
Create a YAML file (mssql_ingestion_recipe.yaml
) to configure the ingestion process.source: type: mssql config: host_port: "localhost:1433" # Update with your SQL Server host and port database: "DemoDatabase" # Update with your database name username: "user" # Update with your username password: "pass" # Update with your password options: connect_args: TrustServerCertificate: "Yes" Encrypt: "yes" sink: type: "datahub-rest" config: server: "http://localhost:8080" # Update with your DataHub server URL
-
Run the Ingestion:
Use the DataHub CLI to run the ingestion:datahub ingest -c mssql_ingestion_recipe.yaml
-
Customize Browse Paths:
After ingestion, you can customize the browse paths to display the platform instance, schema, and database properly.from datahub.emitter.mce_builder import make_dataset_urn from datahub.metadata.schema_classes import MetadataChangeProposalWrapper, BrowsePathsClass from datahub.emitter.rest_emitter import DatahubRestEmitter # Initialize the DataHub REST emitter emitter = DatahubRestEmitter(gms_server="http://localhost:8080") # Define the custom browse path browse_path = BrowsePathsClass( paths=["/mssql/DemoDatabase/schema_name/table_name"] ) # Create a MetadataChangeProposalWrapper for the dataset mcp = MetadataChangeProposalWrapper( entityType="dataset", changeType="UPSERT", entityUrn=make_dataset_urn(platform="mssql", name="schema_name.table_name", env="PROD"), aspectName="browsePaths", aspect=browse_path, ) # Emit the MetadataChangeProposalWrapper emitter.emit(mcp) print("Custom browse path set successfully.")
Full Python Code Sample:
import datahub.emitter.mce_builder as builder
from datahub.metadata.schema_classes import MetadataChangeProposalWrapper, BrowsePathsClass
from datahub.emitter.rest_emitter import DatahubRestEmitter
# Initialize the DataHub REST emitter
emitter = DatahubRestEmitter(gms_server="<http://localhost:8080>")
# Define the custom browse path
browse_path = BrowsePathsClass(
paths=["/mssql/DemoDatabase/schema_name/table_name"]
)
# Create a MetadataChangeProposalWrapper for the dataset
mcp = MetadataChangeProposalWrapper(
entityType="dataset",
changeType="UPSERT",
entityUrn=builder.make_dataset_urn(platform="mssql", name="schema_name.table_name", env="PROD"),
aspectName="browsePaths",
aspect=browse_path,
)
# Emit the MetadataChangeProposalWrapper
emitter.emit(mcp)
print("Custom browse path set successfully.")
Explanation:
- Ingestion Recipe: The YAML file configures the MSSQL source and DataHub REST sink.
- Custom Browse Path: The Python script sets a custom browse path to display the platform instance, schema, and database properly in the UI.
Would you like more detailed steps or troubleshooting tips?
Sources:
- <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/mssql|Microsoft SQL Server Ingestion Documentation> 0 button 0 button Hint: Mention @runllm in the thread for followups.
To insert a dataset with platform, database, and schema information and display it properly in the DataHub UI, you need to follow these steps:
- Ingest Metadata from SQL Server: Use the DataHub ingestion framework to ingest metadata from your SQL Server database.
- Customize Browse Paths: Set custom browse paths to ensure the platform, database, and schema are displayed at the top level in the UI.
Below is a complete Python code sample that demonstrates how to achieve this.
Step-by-Step Guide
Step 1: Install Required Packages
Ensure you have the DataHub Python client and MSSQL ingestion plugin installed:
pip install acryl-datahub acryl-datahub[mssql]
Step 2: Create the Ingestion Recipe
Create a YAML file (mssql_ingestion_recipe.yaml
) to configure the ingestion process.
source:
type: mssql
config:
host_port: "localhost:1433" # Update with your SQL Server host and port
database: "DemoDatabase" # Update with your database name
username: "user" # Update with your username
password: "pass" # Update with your password
options:
connect_args:
TrustServerCertificate: "Yes"
Encrypt: "yes"
sink:
type: "datahub-rest"
config:
server: "http://localhost:8080" # Update with your DataHub server URL
Step 3: Run the Ingestion
Use the DataHub CLI to run the ingestion:
datahub ingest -c mssql_ingestion_recipe.yaml
Step 4: Customize Browse Paths
After ingestion, you can customize the browse paths to display the platform, database, and schema properly.
import datahub.emitter.mce_builder as builder
from datahub.metadata.schema_classes import MetadataChangeProposalWrapper, BrowsePathsClass
from datahub.emitter.rest_emitter import DatahubRestEmitter
# Initialize the DataHub REST emitter
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
# Define the custom browse path
browse_path = BrowsePathsClass(
paths=["/mssql/DemoDatabase/schema_name/table_name"]
)
# Create a MetadataChangeProposalWrapper for the dataset
mcp = MetadataChangeProposalWrapper(
entityType="dataset",
changeType="UPSERT",
entityUrn=builder.make_dataset_urn(platform="mssql", name="schema_name.table_name", env="PROD"),
aspectName="browsePaths",
aspect=browse_path,
)
# Emit the MetadataChangeProposalWrapper
emitter.emit(mcp)
print("Custom browse path set successfully.")
Full Python Code Sample
import datahub.emitter.mce_builder as builder
from datahub.metadata.schema_classes import MetadataChangeProposalWrapper, BrowsePathsClass
from datahub.emitter.rest_emitter import DatahubRestEmitter
# Initialize the DataHub REST emitter
emitter = DatahubRestEmitter(gms_server="<http://localhost:8080>")
# Define the custom browse path
browse_path = BrowsePathsClass(
paths=["/mssql/DemoDatabase/schema_name/table_name"]
)
# Create a MetadataChangeProposalWrapper for the dataset
mcp = MetadataChangeProposalWrapper(
entityType="dataset",
changeType="UPSERT",
entityUrn=builder.make_dataset_urn(platform="mssql", name="schema_name.table_name", env="PROD"),
aspectName="browsePaths",
aspect=browse_path,
)
# Emit the MetadataChangeProposalWrapper
emitter.emit(mcp)
print("Custom browse path set successfully.")
Explanation
- Ingestion Recipe: The YAML file configures the MSSQL source and DataHub REST sink.
- Custom Browse Path: The Python script sets a custom browse path to display the platform, database, and schema properly in the UI.
Summary
- Configure Ingestion: Create a YAML configuration file with the necessary connection details and options for SQL Server.
- Run Ingestion: Use the DataHub CLI to run the ingestion.
- Customize Browse Paths: Use the provided Python script to set custom browse paths for your datasets.
Would you like more detailed steps or troubleshooting tips?
Sources:
- https://datahubspace.slack.com/archives/CUMUWQU66/p1721125582.597279|https://datahubspace.slack.com/archives/CUMUWQU66/p1721125582.597279
- https://datahubspace.slack.com/archives/CUMUWQU66/p1718256813.887969|https://datahubspace.slack.com/archives/CUMUWQU66/p1718256813.887969 0 button 0 button Hint: Mention @runllm in the thread for followups.
To create a URN (Uniform Resource Name) with platform, database, and container/schema in DataHub, you need to follow the URN format specified by DataHub. The URN format for a dataset typically includes the platform, the dataset name (which can include the database and schema), and the environment.
URN Format
The general format for a dataset URN in DataHub is:
urn:li:dataset:(urn:li:dataPlatform:<platform>,<database>.<schema>.<table>,<env>)
Example
For a SQL Server (MSSQL) dataset, the URN might look like this:
urn:li:dataset:(urn:li:dataPlatform:mssql,DemoDatabase.schema_name.table_name,PROD)
Python Code to Create URN
Here is a Python code snippet to create a URN with platform, database, and schema:
from datahub.emitter.mce_builder import make_dataset_urn
# Define the platform, database, schema, table, and environment
platform = "mssql"
database = "DemoDatabase"
schema = "schema_name"
table = "table_name"
env = "PROD"
# Create the dataset URN
dataset_urn = make_dataset_urn(platform=platform, name=f"{database}.{schema}.{table}", env=env)
print(f"Dataset URN: {dataset_urn}")
Explanation
- Platform: The data platform (e.g., “mssql” for SQL Server).
- Database: The name of the database (e.g., “DemoDatabase”).
- Schema: The schema or container name (e.g., “schema_name”).
- Table: The table name (e.g., “table_name”).
- Environment: The environment (e.g., “PROD” for production).
Full Example
Here is a complete example that includes creating the URN and emitting it to DataHub:
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.metadata.schema_classes import MetadataChangeProposalWrapper, BrowsePathsClass
from datahub.emitter.rest_emitter import DatahubRestEmitter
# Initialize the DataHub REST emitter
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
# Define the platform, database, schema, table, and environment
platform = "mssql"
database = "DemoDatabase"
schema = "schema_name"
table = "table_name"
env = "PROD"
# Create the dataset URN
dataset_urn = make_dataset_urn(platform=platform, name=f"{database}.{schema}.{table}", env=env)
# Define the custom browse path
browse_path = BrowsePathsClass(
paths=[f"/{platform}/{database}/{schema}/{table}"]
)
# Create a MetadataChangeProposalWrapper for the dataset
mcp = MetadataChangeProposalWrapper(
entityType="dataset",
changeType="UPSERT",
entityUrn=dataset_urn,
aspectName="browsePaths",
aspect=browse_path,
)
# Emit the MetadataChangeProposalWrapper
emitter.emit(mcp)
print(f"Dataset URN: {dataset_urn}")
print("Custom browse path set successfully.")
Explanation
- Dataset URN: The URN is created using the
make_dataset_urn
function with the platform, database, schema, table, and environment. - Custom Browse Path: The browse path is set to display the platform, database, schema, and table properly in the UI.
- MetadataChangeProposalWrapper: This wrapper is used to emit the custom browse path to DataHub.
Would you like more detailed steps or troubleshooting tips?
Sources:
- <https://github.com/datahub-project/datahub/blob/master/docs/what/urn.md|URN Documentation> 0 button 0 button Hint: Mention @runllm in the thread for followups.