<@U06TM7M40RK> hello, provide an example of a python script emitter that soft deletes an entire database in datahub, with comments besides the code explaining what it does
To soft delete an entire database in DataHub using a Python script, you can use the DataHub Python SDK. Below is an example script that shows how to do this. The script uses the `DatahubRestEmitter` to send a soft delete request for a database entity. Comments are included to explain each part of the code.
```python
import logging

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import StatusClass

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize the DataHub REST emitter
# Replace 'http://localhost:8080' with your DataHub GMS endpoint
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Define the URN for the database you want to soft delete.
# DataHub models a database as a container entity, so its URN has the form
# urn:li:container:<guid>. Copy the exact URN from the database's page in the
# DataHub UI; the guid below is a placeholder.
database_urn = "urn:li:container:<database-container-guid>"

# Create a MetadataChangeProposalWrapper object for the soft delete
# The Status aspect with removed=True marks the entity as soft deleted
mcp = MetadataChangeProposalWrapper(
    entityUrn=database_urn,
    aspect=StatusClass(removed=True),
)

# Emit the metadata change proposal to DataHub
# This marks the database as soft deleted (its metadata is kept, only hidden)
emitter.emit(mcp)
logger.info(f"Soft deleted database with URN: {database_urn}")
```
Explanation:
- Logging Setup: Initializes logging to capture information about the script’s execution.
- DataHub REST Emitter Initialization: Sets up the `DatahubRestEmitter` to communicate with the DataHub instance. The `gms_server` parameter should point to your DataHub GMS endpoint.
- Database URN Definition: Defines the URN of the database you want to soft delete. It must match the URN DataHub already stores for that database.
- MetadataChangeProposalWrapper Creation: Creates a `MetadataChangeProposalWrapper` object with the `StatusClass` aspect set to `removed=True`, indicating a soft delete.
- Emit Metadata Change Proposal: Sends the metadata change proposal to DataHub, marking the database as soft deleted.
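Under the hood, `emitter.emit(mcp)` sends the proposal to GMS over REST. The sketch below uses only the standard library to build a JSON body shaped like a Rest.li `ingestProposal` request that upserts the `status` aspect, which shows what the soft delete amounts to. The payload layout is based on DataHub's Rest.li examples, and `build_soft_delete_proposal` is a hypothetical helper, not part of the SDK. Check the layout against your GMS version before posting it anywhere (for example to `http://localhost:8080/aspects?action=ingestProposal`).

```python
import json


def build_soft_delete_proposal(entity_urn: str, removed: bool = True) -> dict:
    """Build a Rest.li ingestProposal body that upserts the `status` aspect.

    Sketch only: the field layout follows DataHub's Rest.li ingestProposal
    examples; verify it against your GMS version.
    """
    # The entity type is the third colon-separated part of the URN,
    # e.g. "container" in "urn:li:container:<guid>".
    entity_type = entity_urn.split(":")[2]
    return {
        "proposal": {
            "entityType": entity_type,
            "entityUrn": entity_urn,
            "changeType": "UPSERT",
            "aspectName": "status",
            # The aspect value is itself a JSON-encoded string.
            "aspect": {
                "contentType": "application/json",
                "value": json.dumps({"removed": removed}),
            },
        }
    }


if __name__ == "__main__":
    body = build_soft_delete_proposal("urn:li:container:<database-container-guid>")
    print(json.dumps(body, indent=2))
```

Passing `removed=False` builds the proposal that makes a soft-deleted entity visible again, which is what makes a soft delete reversible.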
Would you like more detailed information on any specific part of this process?
Sources:
- https://datahubproject.io/docs/0.14.0/api/tutorials/datasets#expected-outcomes-of-creating-dataset
- https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/delete_dataset.py
Hint: Mention @runllm in the thread for followups.
thank you <@U06TM7M40RK>, provide the same example code for a postgres database, and a postgres database called “test_db” and two schemas, one called “testing” and the other “testing_1”
The DataHub documentation I have ingested does not describe how to soft delete a PostgreSQL database and its schemas with a Python script. However, you can adapt the general approach for soft deleting a database by constructing the appropriate URNs for the PostgreSQL database and its schemas. Make sure those URNs match the ones stored in your DataHub instance. If this doesn’t help, please reach out to the DataHub support team for further assistance.
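Because the `Status` aspect is set per entity, soft deleting the database container does not automatically mark its schemas and tables as removed. Here is a minimal sketch of collecting every URN under a database. It assumes you have already gathered the parent-to-children relationships into a dictionary, for example by browsing the database in the DataHub UI. The `children` mapping, the GUIDs, and the table name are hypothetical placeholders, and `collect_urns` is not an SDK function.

```python
from typing import Dict, List


def collect_urns(root: str, children: Dict[str, List[str]]) -> List[str]:
    """Return `root` and everything under it, deepest entities first.

    `children` maps a container URN to the URNs it directly contains.
    Children are listed before their parent, so if a run stops midway
    the database is still visible while its contents are being hidden.
    """
    ordered: List[str] = []

    def visit(urn: str) -> None:
        for child in children.get(urn, []):
            visit(child)
        ordered.append(urn)

    visit(root)
    return ordered


# Hypothetical hierarchy for test_db with schemas "testing" and "testing_1".
children = {
    "urn:li:container:test_db-guid": [
        "urn:li:container:testing-guid",
        "urn:li:container:testing_1-guid",
    ],
    "urn:li:container:testing-guid": [
        "urn:li:dataset:(urn:li:dataPlatform:postgres,test_db.testing.orders,PROD)",
    ],
}

if __name__ == "__main__":
    for urn in collect_urns("urn:li:container:test_db-guid", children):
        print(urn)
```

The children-first ordering is a choice made in this sketch. The resulting list can be used as the set of URNs to soft delete, one `Status` aspect per entity.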
By the way, I found these links that might be relevant to what you’re looking for. Let me know if this helps:
- Removing Metadata from DataHub
- DataHub CLI
- Migrate Data from One DataHub Instance to Another
thank you <@U06TM7M40RK>, provide the same example code for a postgres database, and a postgres database called “test_db”
The DataHub documentation I have ingested does not describe how to soft delete a PostgreSQL database with a Python script. However, you can adapt the general approach for soft deleting a database by constructing the appropriate URN for the PostgreSQL database. Make sure that URN matches the one stored in your DataHub setup. If this doesn’t help, please reach out to the DataHub support team for further assistance.
thank you <@U06TM7M40RK>, provide the same example code for a postgres database called “test_db” and two schemas, one called “testing” and the other “testing_1”
You can adapt the general approach for soft deleting a database and its schemas in DataHub using the Python SDK. Below is an example script that soft deletes a PostgreSQL database named “test_db” and its schemas “testing” and “testing_1”. The script uses the `DatahubRestEmitter` to send a soft delete request for each entity.
```python
import logging

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import StatusClass

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize the DataHub REST emitter
# Replace 'http://localhost:8080' with your DataHub GMS endpoint
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Define the URNs for the database and schemas you want to soft delete.
# DataHub models PostgreSQL databases and schemas as container entities
# (urn:li:container:<guid>), not datasets. Copy each exact URN from the
# entity's page in the DataHub UI; the guids below are placeholders.
database_urn = "urn:li:container:<test_db-container-guid>"
schema_urn_1 = "urn:li:container:<testing-container-guid>"
schema_urn_2 = "urn:li:container:<testing_1-container-guid>"

# List of URNs to soft delete
urns_to_delete = [database_urn, schema_urn_1, schema_urn_2]

# Iterate over each URN and emit a soft delete request
for urn in urns_to_delete:
    # Create a MetadataChangeProposalWrapper object for the soft delete
    mcp = MetadataChangeProposalWrapper(
        entityUrn=urn,
        aspect=StatusClass(removed=True),
    )
    # Emit the metadata change proposal to DataHub
    # This marks the entity as soft deleted
    emitter.emit(mcp)
    logger.info(f"Soft deleted entity with URN: {urn}")
```
thank you <@U06TM7M40RK>, provide the same example code for a postgres database called “test_db”
The DataHub documentation I have ingested does not describe how to soft delete a PostgreSQL database with a Python script. However, you can adapt the general approach for soft deleting a database by constructing the appropriate URN for the PostgreSQL database. Note that DataHub models a PostgreSQL database as a container, whose URN has the form urn:li:container:<guid>. Dataset URNs, such as urn:li:dataset:(urn:li:dataPlatform:postgres,test_db.<schema>.<table>,<ENV>), identify the tables inside it. Make sure the URN you use matches the one in your DataHub setup. If this doesn’t help, please reach out to the DataHub support team for further assistance.
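To check a table URN before emitting anything, a small parser can split it into platform, name, and environment. This sketch assumes the `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` layout and PostgreSQL's three-part `database.schema.table` names. `parse_dataset_urn` and `is_postgres_table` are hypothetical helpers, not part of the DataHub SDK.

```python
import re
from typing import NamedTuple

# Layout: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
_DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),([^,]+),([^,)]+)\)$"
)


class DatasetRef(NamedTuple):
    platform: str
    name: str
    env: str


def parse_dataset_urn(urn: str) -> DatasetRef:
    """Split a dataset URN into its parts; raise ValueError if malformed."""
    match = _DATASET_URN.match(urn)
    if match is None:
        raise ValueError(f"Not a dataset URN: {urn}")
    return DatasetRef(*match.groups())


def is_postgres_table(urn: str) -> bool:
    """True for postgres dataset URNs whose name is database.schema.table."""
    try:
        ref = parse_dataset_urn(urn)
    except ValueError:
        # Container URNs and malformed strings are not table URNs
        return False
    return ref.platform == "postgres" and len(ref.name.split(".")) == 3
```

A database-level URN such as `urn:li:dataset:(urn:li:dataPlatform:postgres,test_db,PROD)` parses, but it fails the three-part check. That suggests it does not point at an entity your PostgreSQL ingestion created.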
Explanation:
- Logging Setup: Initializes logging to capture information about the script’s execution.
- DataHub REST Emitter Initialization: Sets up the `DatahubRestEmitter` to communicate with the DataHub instance. The `gms_server` parameter should point to your DataHub GMS endpoint.
- URN Definitions: Defines the URNs of the PostgreSQL database and its two schemas.
- Soft Delete Process: Iterates over each URN, creates a `MetadataChangeProposalWrapper` object with the `StatusClass` aspect set to `removed=True`, and sends the metadata change proposal to DataHub, marking each entity as soft deleted.
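If you also want the tables inside `testing` and `testing_1` hidden, each table needs its own `Status` aspect. This sketch builds dataset URNs in the string format that `make_dataset_urn` produces for PostgreSQL tables, whose names are `database.schema.table`. The table names are hypothetical placeholders, and `table_urns` is an illustrative helper. In real code, calling `make_dataset_urn(platform="postgres", name=..., env="PROD")` is preferable to formatting the string yourself.

```python
from typing import Dict, List


def table_urns(
    database: str,
    tables_by_schema: Dict[str, List[str]],
    platform: str = "postgres",
    env: str = "PROD",
) -> List[str]:
    """Build dataset URNs for database.schema.table names."""
    urns = []
    for schema, tables in tables_by_schema.items():
        for table in tables:
            name = f"{database}.{schema}.{table}"
            # Same layout as make_dataset_urn(platform, name, env)
            urns.append(f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})")
    return urns


# Hypothetical table names; list the real ones from DataHub or PostgreSQL.
tables = {"testing": ["orders", "customers"], "testing_1": ["events"]}

if __name__ == "__main__":
    for urn in table_urns("test_db", tables):
        print(urn)
```

If you append these URNs to `urns_to_delete` before the loop, the tables are soft deleted along with the database and schemas.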
Would you like more detailed information on any specific part of this process?
Sources:
- https://datahubproject.io/docs/0.13.1/api/tutorials/datasets#expected-outcomes-of-creating-dataset
- https://datahubproject.io/docs/0.14.0/api/tutorials/datasets#expected-outcomes-of-creating-dataset
Note: I wasn’t able to find highly relevant data sources, but above are a few potentially relevant links.
thank you, now <@U06TM7M40RK> provide another example of doing the same thing, soft deleting an entire database, but now using a different class, because the one that you sent is not used anymore
To soft delete an entire database in DataHub using a different approach, you can use the `DataHubGraph` class from the DataHub Python SDK. This class provides methods to interact with the DataHub metadata graph, including the ability to soft delete entities. Below is an example script that shows how to soft delete a PostgreSQL database named “test_db” using the `DataHubGraph` class.
```python
import logging

from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize the DataHubGraph client
# Replace 'http://localhost:8080' with your DataHub GMS endpoint
graph = DataHubGraph(
    config=DatahubClientConfig(
        server="http://localhost:8080",
    )
)

# Define the URN for the database you want to soft delete.
# DataHub models a PostgreSQL database as a container entity
# (urn:li:container:<guid>). Copy the exact URN of "test_db" from its
# page in the DataHub UI; the guid below is a placeholder.
database_urn = "urn:li:container:<test_db-container-guid>"

# Soft delete the database: hard=False marks it as removed instead of purging it
graph.delete_entity(urn=database_urn, hard=False)
logger.info(f"Soft deleted database with URN: {database_urn}")
```
Explanation:
- Logging Setup: Initializes logging to capture information about the script’s execution.
- DataHubGraph Client Initialization: Sets up the `DataHubGraph` client to communicate with the DataHub instance. The `server` parameter should point to your DataHub GMS endpoint.
- Database URN Definition: Defines the URN of the PostgreSQL database to soft delete.
- Soft Delete Process: Calls the `delete_entity` method on the `DataHubGraph` client with the `hard=False` parameter to perform a soft delete on the database.
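When soft deleting several entities in one run, a dry run that only lists what would be removed is a cheap safeguard. Here is a minimal sketch, assuming a delete callable that accepts the same `urn` and `hard` keywords as `graph.delete_entity`. `soft_delete_all` is a hypothetical helper. The recording function in the example stands in for a real `DataHubGraph`, so the snippet runs without a server.

```python
from typing import Callable, Iterable, List


def soft_delete_all(
    urns: Iterable[str],
    delete_fn: Callable[..., None],
    dry_run: bool = True,
) -> List[str]:
    """Soft delete each URN via delete_fn(urn=..., hard=False).

    With dry_run=True nothing is called; the URNs are only returned,
    so you can review them before deleting anything.
    """
    processed: List[str] = []
    for urn in urns:
        if not dry_run:
            # hard=False keeps the metadata and only marks it as removed
            delete_fn(urn=urn, hard=False)
        processed.append(urn)
    return processed


if __name__ == "__main__":
    calls = []

    # Stand-in for graph.delete_entity that records its arguments
    def fake_delete(urn: str, hard: bool) -> None:
        calls.append((urn, hard))

    urns = ["urn:li:container:<test_db-container-guid>"]
    print(soft_delete_all(urns, fake_delete))  # dry run: no calls made
    soft_delete_all(urns, fake_delete, dry_run=False)
    print(calls)
```

With a real client, you would pass `graph.delete_entity` as `delete_fn` and switch to `dry_run=False` once the printed list looks right.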
Would you like more detailed information on any specific part of this process?
Sources:
- https://datahubspace.slack.com/archives/CUMUWQU66/p1723460687.521639
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><