<@U06TM7M40RK> How can I make a bulk change, for example to one parameter across 5 recipes?
To make bulk changes to a parameter across multiple ingestion recipes in DataHub, you can programmatically update the recipes, either with the DataHub Python client or by editing the recipe YAML files directly with a short script.
Here’s a step-by-step guide on how to achieve this:
- Set Up Your Environment: Ensure you have a Python environment set up with the necessary dependencies for DataHub.
- Load and Update Recipes: Write a script to load the YAML files of the recipes, update the desired parameter, and save the changes.
- Deploy Updated Recipes: Use the DataHub CLI to deploy the updated recipes.
Step 1: Set Up Your Environment
Ensure you have the DataHub Python client installed. You can install it using pip:
pip install acryl-datahub
Step 2: Load and Update Recipes
Create a Python script to load, update, and save the recipes. Here is an example script:
import os
import yaml  # requires PyYAML (pip install pyyaml)

# Directory containing your recipe files
recipe_directory = '/path/to/your/recipes'

# Dotted path of the parameter to update, and its new value
parameter_to_update = 'source.config.profiling.enabled'
new_value = False

def update_recipe(recipe_path, parameter, value):
    with open(recipe_path, 'r') as file:
        recipe = yaml.safe_load(file)
    # Walk the dotted path, creating intermediate dicts as needed
    keys = parameter.split('.')
    d = recipe
    for key in keys[:-1]:
        d = d.setdefault(key, {})
    d[keys[-1]] = value
    # Save the updated recipe
    # Note: yaml.safe_dump drops comments and may reorder keys
    with open(recipe_path, 'w') as file:
        yaml.safe_dump(recipe, file)

def update_all_recipes(directory, parameter, value):
    for filename in os.listdir(directory):
        if filename.endswith('.yaml'):
            recipe_path = os.path.join(directory, filename)
            update_recipe(recipe_path, parameter, value)
            print(f'Updated {parameter} in {recipe_path}')

# Update all recipes in the directory
update_all_recipes(recipe_directory, parameter_to_update, new_value)
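To make the dotted-path logic concrete, here is a minimal, self-contained sketch of the same nested-key update applied to an in-memory recipe dict (the recipe contents below are illustrative, not a real config):

```python
def set_by_path(config: dict, dotted_path: str, value) -> dict:
    """Set a nested key identified by a dotted path,
    creating intermediate dicts as needed."""
    keys = dotted_path.split('.')
    d = config
    for key in keys[:-1]:
        d = d.setdefault(key, {})
    d[keys[-1]] = value
    return config

# Illustrative recipe fragment as a plain dict
recipe = {
    'source': {
        'type': 'postgres',
        'config': {'profiling': {'enabled': True}},
    }
}

set_by_path(recipe, 'source.config.profiling.enabled', False)
print(recipe['source']['config']['profiling']['enabled'])  # False
```

Because `setdefault` creates any missing intermediate dicts, the same helper also works when the parameter does not yet exist in a recipe.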
Step 3: Deploy Updated Recipes
Use the DataHub CLI to deploy the updated recipes. Here is an example command to deploy a recipe:
datahub ingest -c /path/to/your/recipes/recipe1.yaml
datahub ingest -c /path/to/your/recipes/recipe2.yaml
# Repeat for all updated recipes
You can also automate this step by adding the deployment commands to your script:
import subprocess

def deploy_recipe(recipe_path):
    subprocess.run(['datahub', 'ingest', '-c', recipe_path])

def deploy_all_recipes(directory):
    for filename in os.listdir(directory):
        if filename.endswith('.yaml'):
            recipe_path = os.path.join(directory, filename)
            deploy_recipe(recipe_path)
            print(f'Deployed {recipe_path}')

# Deploy all updated recipes
deploy_all_recipes(recipe_directory)
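If you want the deployment loop to report failures instead of silently continuing, a variant like the following can help. The `command` prefix defaults to the same `datahub ingest -c` invocation shown above, but this is a sketch using only the standard library, not an official DataHub API:

```python
import subprocess
from pathlib import Path

def deploy_recipes(directory, command=('datahub', 'ingest', '-c')):
    """Run the ingest command for every .yaml recipe in `directory`
    and return a dict of recipe path -> process return code, so
    failed deployments can be spotted and retried."""
    results = {}
    for recipe_path in sorted(Path(directory).glob('*.yaml')):
        proc = subprocess.run([*command, str(recipe_path)])
        results[str(recipe_path)] = proc.returncode
    failed = [path for path, code in results.items() if code != 0]
    if failed:
        print(f'{len(failed)} recipe(s) failed to deploy: {failed}')
    return results
```

Returning the per-recipe exit codes makes it easy to re-run only the recipes that failed, rather than redeploying everything.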
Summary
- Set Up Your Environment: Install the DataHub Python client.
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Are you using UI or CLI for ingestion?
- Which DataHub version are you using? (e.g. 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions, and don’t forget to give its responses a 👍 or 👎 so other folks in the Community know if you’re still looking for help on the topic!
Sure, I’d be happy to help! To assist you more effectively, could you please provide the following information:
- Are you using the UI or CLI for ingestion?
- Which DataHub version are you using? (e.g., 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g., BigQuery, PostgreSQL, Snowflake, etc.)
This information will help me provide you with the most relevant guidance and resources.
Sources:
- https://datahubproject.io/docs/0.13.1/quick-ingestion-guides/bigquery/configuration
- https://datahubspace.slack.com/archives/CUMUWQU66/p1721286616.140319
- Load and Update Recipes: Write a script to update the desired parameter in all recipe files.
- Deploy Updated Recipes: Use the DataHub CLI to deploy the updated recipes.
Would you like more detailed steps or a code example on how to implement this?
Sources:
- DataHub documentation on transformers: https://datahubproject.io/docs/metadata-ingestion/docs/transformer/intro
- DataHub documentation on writing custom transformers
- DataHub CLI documentation (datahub/docs/cli.md on GitHub)
<@U06TM7M40RK> Maybe I can write some Python code instead of a transformer?
Sure thing! I’m looking through the available data to answer your question. Hold on a second…