Automatically Deploying and Updating Recipes with DataHub CLI Tools

Original Slack Thread

Hey everyone!
I have a question about updating deployed recipes using the CLI tools. I want to be able to deploy a new recipe to DataHub using datahub ingest deploy and then update it with subsequent deployments. Unfortunately, I haven’t found any way to automatically get the URN of a deployed job other than copying it from the terminal output after the first deploy.

Is there a way to do this that I am missing? Or does anyone else have a different method they use to automatically deploy and update recipes?

Hmmm not really sure on this. <@U04N9PYJBEW> might be able to speak to this!

<@U01GZEETMEZ> know any easy way to do this? At the least, you can either parse the JSON terminal output, or you can write a script that makes the GraphQL call to create the ingestion source yourself and parses the URN out of the response in Python.
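A minimal sketch of the second suggestion: calling the GraphQL endpoint directly and reading the URN out of the response. The GMS URL, the token placeholder, and the exact mutation/input shape here are assumptions based on DataHub's GraphQL API, which has changed across versions, so check your server's GraphiQL explorer before relying on the field names.

```python
# Sketch: create an ingestion source via the DataHub GraphQL API and
# capture the URN from the response, instead of copying it from the
# terminal output of `datahub ingest deploy`.
# Assumptions: GMS at localhost:8080, createIngestionSource returns the
# URN as a string (true in recent schemas, but verify for your version).
import json
import urllib.request

GMS_URL = "http://localhost:8080/api/graphql"  # assumed GMS address
TOKEN = "<personal-access-token>"  # placeholder; only needed if auth is on

CREATE_MUTATION = """
mutation createIngestionSource($input: UpdateIngestionSourceInput!) {
  createIngestionSource(input: $input)
}
"""

def graphql(query: str, variables: dict) -> dict:
    """POST a GraphQL request to GMS and return the decoded JSON body."""
    payload = json.dumps({"query": query, "variables": variables}).encode()
    req = urllib.request.Request(
        GMS_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {TOKEN}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def extract_created_urn(response: dict) -> str:
    """Pull the new source URN out of a createIngestionSource response."""
    return response["data"]["createIngestionSource"]
```

Once you have the URN, later deploys can pass it back (e.g. via the deploy command's URN option) to update the same source instead of creating a new one.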

We can definitely make some tweaks here. Our current code looks like this: https://github.com/datahub-project/datahub/blob/791e2e7bf588d96bad94ccfdcf1beddde02dadc3/metadata-ingestion/src/datahub/cli/ingest_cli.py#L348

Would it make sense to have an upsert command or something instead?

The workaround I ended up using was to make a search call through the REST API to determine whether the IngestionSource already exists and, if so, parse the URN out of the response.
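The "check if it exists, then reuse its URN" workaround could be sketched like this, using the listIngestionSources GraphQL query rather than the raw REST search the poster used. The query name, input shape, and result fields are assumptions about DataHub's GraphQL schema; verify them against your deployment before using this.

```python
# Sketch: look up an existing ingestion source by name and return its URN
# (or None if it doesn't exist yet), so a deploy script can decide between
# create and update. Endpoint and schema details are assumptions.
import json
import urllib.request

GMS_URL = "http://localhost:8080/api/graphql"  # assumed GMS address

LIST_QUERY = """
query listIngestionSources($input: ListIngestionSourcesInput!) {
  listIngestionSources(input: $input) {
    ingestionSources { urn name }
  }
}
"""

def find_source_urn(response: dict, name: str):
    """Return the URN of the source with an exact name match, or None.
    Operates on the decoded JSON of a listIngestionSources response, so the
    search can be fuzzy while the final match stays exact."""
    sources = response["data"]["listIngestionSources"]["ingestionSources"]
    for src in sources:
        if src["name"] == name:
            return src["urn"]
    return None

def lookup(name: str):
    """Query GMS for sources matching `name`, then pick the exact match."""
    variables = {"input": {"start": 0, "count": 10, "query": name}}
    payload = json.dumps({"query": LIST_QUERY, "variables": variables}).encode()
    req = urllib.request.Request(
        GMS_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return find_source_urn(json.load(resp), name)
```

Note the caveat raised below: there is no stable recipe-to-URN mapping on the server side, so matching by name is a convention the deploy script has to enforce itself.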

It would be very nice if that functionality could be incorporated into the CLI tools as well.

Got it - the tricky piece is that there’s no stable recipe -> URN mapping that we can rely on. How were you determining whether the ingestion source exists?