Making Bulk Changes to Parameters in Multiple DataHub Recipes

Original Slack Thread

<@U06TM7M40RK> How can I make a bulk change, for example to one parameter in 5 recipes?

To make bulk changes to a parameter across multiple ingestion recipes in DataHub, you can use a script to programmatically update the recipes. This can be done using the DataHub Python client or by directly manipulating the YAML files of the recipes.

Here’s a step-by-step guide on how to achieve this:

  1. Set Up Your Environment: Ensure you have a Python environment set up with the necessary dependencies for DataHub.

  2. Load and Update Recipes: Write a script to load the YAML files of the recipes, update the desired parameter, and save the changes.

  3. Deploy Updated Recipes: Use the DataHub CLI to deploy the updated recipes.

Step 1: Set Up Your Environment

Ensure you have the DataHub Python client installed. You can install it using pip:

pip install acryl-datahub

Step 2: Load and Update Recipes

Create a Python script to load, update, and save the recipes. Here is an example script:

import yaml
import os

# Define the directory containing your recipe files
recipe_directory = '/path/to/your/recipes'

# Define the parameter you want to update and its new value
parameter_to_update = 'source.config.profiling.enabled'
new_value = False

def update_recipe(recipe_path, parameter, value):
    with open(recipe_path, 'r') as file:
        recipe = yaml.safe_load(file)

    # Update the parameter in the recipe
    keys = parameter.split('.')
    d = recipe
    for key in keys[:-1]:
        d = d.setdefault(key, {})
    d[keys[-1]] = value

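    # Save the updated recipe; sort_keys=False keeps the existing key order.
    # Note: PyYAML does not preserve comments or formatting from the original file.
    with open(recipe_path, 'w') as file:
        yaml.safe_dump(recipe, file, sort_keys=False)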

def update_all_recipes(directory, parameter, value):
    for filename in os.listdir(directory):
        if filename.endswith(('.yaml', '.yml')):
            recipe_path = os.path.join(directory, filename)
            update_recipe(recipe_path, parameter, value)
            print(f'Updated {parameter} in {filename}')

# Update all recipes in the directory
update_all_recipes(recipe_directory, parameter_to_update, new_value)
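Before overwriting files, it can help to preview what the update would change. The sketch below applies the same dotted-path logic to an in-memory dictionary and reports the old and new values without writing anything. The helper names set_nested and preview_change, and the sample recipe content, are hypothetical and not part of DataHub.

```python
import copy

def set_nested(config, dotted_key, value):
    """Set config[a][b][c] = value for dotted_key 'a.b.c'; return the previous value (or None)."""
    keys = dotted_key.split('.')
    node = config
    for key in keys[:-1]:
        # Create missing intermediate mappings, as the script above does
        node = node.setdefault(key, {})
    previous = node.get(keys[-1])
    node[keys[-1]] = value
    return previous

def preview_change(recipe, dotted_key, value):
    """Return (old_value, updated_copy) without modifying the original recipe dict."""
    updated = copy.deepcopy(recipe)
    old_value = set_nested(updated, dotted_key, value)
    return old_value, updated

# Hypothetical recipe content, shaped like the result of yaml.safe_load
sample_recipe = {
    'source': {
        'type': 'postgres',
        'config': {'profiling': {'enabled': True}},
    }
}

old, updated = preview_change(sample_recipe, 'source.config.profiling.enabled', False)
print(f'profiling.enabled: {old} -> {updated["source"]["config"]["profiling"]["enabled"]}')
```

Because preview_change works on a deep copy, the loaded recipe stays untouched until you decide to write the updated version back to disk.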

Step 3: Deploy Updated Recipes

Use the DataHub CLI to deploy the updated recipes. Each datahub ingest -c command runs the ingestion for one recipe file:

datahub ingest -c /path/to/your/recipes/recipe1.yaml
datahub ingest -c /path/to/your/recipes/recipe2.yaml
# Repeat for all updated recipes

You can also automate this step by adding the deployment commands to your script:

import os
import subprocess

def deploy_recipe(recipe_path):
    # check=True raises CalledProcessError if the ingestion run fails
    subprocess.run(['datahub', 'ingest', '-c', recipe_path], check=True)

def deploy_all_recipes(directory):
    for filename in os.listdir(directory):
        if filename.endswith(('.yaml', '.yml')):
            recipe_path = os.path.join(directory, filename)
            deploy_recipe(recipe_path)
            print(f'Deployed {filename}')

# Deploy all updated recipes
deploy_all_recipes(recipe_directory)
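If one recipe fails, it is often useful to keep going and report all failures at the end. The following is a minimal sketch, assuming the datahub CLI is on your PATH. The function run_recipes and its build_command parameter are hypothetical names introduced here so that the command can be swapped out for a stand-in when testing.

```python
import subprocess
import sys

def run_recipes(recipe_paths, build_command=None):
    """Run one command per recipe and return the paths whose command exited non-zero."""
    if build_command is None:
        # Same command as in the thread: datahub ingest -c <recipe>
        build_command = lambda path: ['datahub', 'ingest', '-c', path]
    failed = []
    for path in recipe_paths:
        result = subprocess.run(build_command(path))
        if result.returncode != 0:
            failed.append(path)
    return failed

# Stand-in command so the sketch runs without DataHub installed:
# exits with status 1 for any path containing "bad", otherwise 0.
def fake_command(path):
    return [sys.executable, '-c', f'import sys; sys.exit({int("bad" in path)})']

failures = run_recipes(['recipe1.yaml', 'bad_recipe.yaml'], build_command=fake_command)
print('Failed recipes:', failures)
```

In real use you would call run_recipes with your list of recipe paths and no build_command, then inspect the returned list to decide which recipes to rerun.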

Summary

  1. Set Up Your Environment: Install the DataHub Python client.

  2. Load and Update Recipes: Write a script to update the desired parameter in all recipe files.

  3. Deploy Updated Recipes: Use the DataHub CLI to deploy the updated recipes.

Would you like more detailed steps or a code example on how to implement this?

<@U06TM7M40RK> Maybe I can write some Python code instead of a transformer?
