Rollback Ingestion Run Using Python SDK and REST API

Original Slack Thread

Is it possible to roll back an ingestion run with the Python SDK without using the DataHub CLI?


Yes, it is possible to roll back an ingestion run using the Python SDK without using the DataHub CLI. The Python SDK’s DataHubGraph client supports various delete operations, including rolling back an ingestion run.

To roll back an ingestion run programmatically, you can use the rollback method provided by the SDK. Here is a step-by-step guide:

  1. Initialize the DataHubGraph client: First, initialize the DataHubGraph client.
  2. Execute the rollback: Call the rollback method, specifying the run_id of the ingestion run.

Here is an example code snippet:

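from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

# Initialize the DataHubGraph client (replace the server URL with your GMS host)
client = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

# Roll back the ingestion run
# NOTE: as reported in the follow-up below, DataHubGraph has no rollback method,
# so this call fails; the REST API approach later in the thread is the one to use.
run_id = "<your_run_id>"
client.rollback(run_id=run_id)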

This will roll back the specified ingestion run, deleting the entities created by the run and reverting the updated aspects.

Would you like more details on any specific part of this process?


@RunLLM This doesn’t work; there is no rollback method.

You are correct. The provided data sources do not mention a rollback method in the Python SDK. The rollback functionality is primarily documented for use via the DataHub CLI and the REST API.

To roll back an ingestion run programmatically, you need to call the REST API directly. Here is how to do it using Python’s requests library:

  1. Set up the necessary imports and client configuration.
  2. Send a POST request to the rollback endpoint with the required payload.

Here is an example code snippet:

import requests

# Configuration
gms_host = "http://localhost:8080"  # Replace with your DataHub GMS host
run_id = "<your_run_id>"  # Replace with your ingestion run ID
headers = {
    "Content-Type": "application/json",
    # Replace with your access token if needed
    "Authorization": "Bearer <your_access_token>",
}

# Payload for the rollback request
payload = {
    "runId": run_id,
    "dryRun": False,  # Set to True if you want to perform a dry run
    "safe": True,  # Set to False if you want to perform an unsafe rollback
}

# Send the rollback request; json= serializes the payload, and the timeout
# keeps the call from hanging indefinitely if GMS is unreachable
response = requests.post(
    f"{gms_host}/runs?action=rollback",
    headers=headers,
    json=payload,
    timeout=30,
)

# Check the response
if response.status_code == 200:
    print("Rollback successful")
else:
    print(f"Failed to roll back: {response.status_code} - {response.text}")

This code sends a POST request to the /runs?action=rollback endpoint on DataHub GMS to roll back the specified ingestion run.
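
The request above can also be wrapped in small helpers so that a dry run is the default and the real rollback has to be requested explicitly. The following is only a minimal sketch, not an official DataHub helper: the function names build_rollback_request and rollback_run are hypothetical, the endpoint and payload fields are the ones used in the snippet above, and nothing is assumed about the shape of the response body. It uses the standard library's urllib instead of requests so that it has no third-party dependency.

```python
import json
import urllib.request
from typing import Optional


def build_rollback_request(
    gms_host: str,
    run_id: str,
    token: Optional[str] = None,
    dry_run: bool = True,
    safe: bool = True,
) -> urllib.request.Request:
    """Build (but do not send) the rollback POST request. Hypothetical helper."""
    url = f"{gms_host.rstrip('/')}/runs?action=rollback"
    payload = {"runId": run_id, "dryRun": dry_run, "safe": safe}
    headers = {"Content-Type": "application/json"}
    if token:
        # Only attach the header when a token is supplied
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def rollback_run(
    gms_host: str,
    run_id: str,
    token: Optional[str] = None,
    dry_run: bool = True,
    timeout: float = 30.0,
) -> str:
    """Send the rollback request and return the raw response body as text.

    urlopen raises urllib.error.HTTPError on a 4xx/5xx response, so a normal
    return means the server accepted the request.
    """
    request = build_rollback_request(gms_host, run_id, token=token, dry_run=dry_run)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

One way to use it: call rollback_run(gms_host, run_id) first, which sends dryRun set to True, review the returned text, and then call it again with dry_run=False to perform the actual rollback.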

Would you like more details on any specific part of this process?

