"Triggering Metadata Ingest and Uploading Recipes on the Fly - AWS Lambda, Airflow, and More"

user-4 · March 4, 2024, 3:47pm

Newbie Questions:

Is there a way to trigger metadata ingestion from an AWS Lambda function (or any other orchestrator, e.g. Airflow)?
Can recipes be uploaded on the fly?

user-2 · March 4, 2024, 3:47pm

There are a couple of options for triggering or scheduling ingestion.

For example, we have a doc on using Airflow here: https://datahubproject.io/docs/metadata-ingestion/schedule_docs/airflow/

user-2 · March 4, 2024, 3:47pm

Recipes can be uploaded on the fly using the datahub ingest deploy command, but I would caution that it’s a somewhat advanced feature and it’s pretty rare to actually need it. It’d be helpful to understand what you’re trying to accomplish with it.

user-4 · March 4, 2024, 3:47pm

We will most likely be ingesting metadata from different sources, including various S3 buckets. It sounds like we’d need to pre-create recipes for those buckets, which works as long as we know the bucket paths in advance. But if we don’t know the S3 path in advance, as in the case where we create an S3 bucket on the fly, can we reuse the same recipe with a parameter that stands in for the actual bucket name (or upload a unique recipe for that bucket)?

user-2 · March 4, 2024, 3:47pm

I think the S3 source will iterate over the buckets it can access, so this may not be necessary.

It depends on how frequently the set of paths is changing. Your options include dynamically modifying + uploading recipes, running CLI ingestion with an env var in the recipe to control the bucket, or running ingestion programmatically using Pipeline.create(config).run(...)
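As a rough illustration of the env var option mentioned above, here is a minimal sketch that keeps one recipe and swaps the bucket through an environment variable when invoking the CLI. The variable name S3_BUCKET, the data/ prefix, the region, and the server URL are all placeholders chosen for this example, not values from the thread; the sketch relies on the recipe's ${VAR} expansion and on the datahub CLI being installed wherever this runs.

```python
import os
import subprocess
import tempfile

# One recipe for every bucket; ${S3_BUCKET} is expanded from the environment
# when the CLI loads the recipe. Prefix, region, and server are placeholders.
RECIPE_TEMPLATE = """\
source:
  type: s3
  config:
    path_specs:
      - include: "s3://${S3_BUCKET}/data/*.*"
    aws_config:
      aws_region: us-east-1
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
"""


def build_invocation(bucket, recipe_path):
    """Return the CLI command and the environment that selects the bucket."""
    cmd = ["datahub", "ingest", "-c", recipe_path]
    env = dict(os.environ, S3_BUCKET=bucket)
    return cmd, env


def run_cli_ingestion(bucket):
    """Write the shared recipe to a temp file and run the CLI against it."""
    with tempfile.NamedTemporaryFile("w", suffix=".yml", delete=False) as f:
        f.write(RECIPE_TEMPLATE)
    cmd, env = build_invocation(bucket, f.name)
    # check=True raises CalledProcessError if the CLI exits non-zero.
    subprocess.run(cmd, env=env, check=True)
```

Calling run_cli_ingestion("my-new-bucket") would then ingest that bucket without touching the recipe itself.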

user-4 · March 4, 2024, 3:47pm

"running ingestion programmatically using Pipeline.create(config).run(...)" looks interesting… can you share some docs, blogs, or READMEs on this?

user-2 · March 4, 2024, 3:47pm

Yup, we have some sample code here: https://datahubproject.io/docs/metadata-ingestion/#programmatic-pipeline
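For the Lambda case from the original question, the programmatic route might look roughly like the sketch below. It is not taken from the linked docs: the event keys ("bucket", "prefix"), the region, and the server URL are hypothetical, and the Lambda package would need the acryl-datahub library with the S3 plugin bundled in. The datahub import is deferred so the recipe-building part can be exercised without the library installed.

```python
def build_s3_recipe(bucket, prefix="", server="http://localhost:8080"):
    """Build a recipe dict for one bucket; same shape as a YAML recipe."""
    return {
        "source": {
            "type": "s3",
            "config": {
                # Placeholder pattern: every file directly under the prefix.
                "path_specs": [{"include": f"s3://{bucket}/{prefix}*.*"}],
                "aws_config": {"aws_region": "us-east-1"},
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_ingestion(recipe):
    """Run a recipe in-process instead of going through the CLI."""
    # Imported lazily so this module loads even where datahub is absent.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    # Raise if the run reported failures, so the caller sees the error.
    pipeline.raise_from_status()


def handler(event, context):
    """Hypothetical Lambda entry point; expects {"bucket": ..., "prefix": ...}."""
    recipe = build_s3_recipe(event["bucket"], event.get("prefix", ""))
    run_ingestion(recipe)
    return {"status": "ok", "bucket": event["bucket"]}
```

Building the recipe as a plain dict per invocation means a bucket created on the fly needs no pre-existing recipe file.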

user-1 · March 4, 2024, 3:47pm

Is it possible to create the pipeline via the Java API?

datahub_team · March 4, 2024, 3:47pm

Nope, our ingestion sources are written in Python. That said, for most use cases, using UI ingestion is the easiest and least error-prone option.

user-1 · March 4, 2024, 3:47pm

Thanks for the reply. I was aiming to create ingestion sources dynamically. I guess I could do it via the GraphQL API.
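A sketch of what that GraphQL route could look like is below. Nobody in the thread confirmed this approach; the createIngestionSource mutation and its input fields (name, type, schedule, config.recipe as a JSON string, config.executorId) are written from memory of the DataHub GraphQL schema and should be checked against your DataHub version. The endpoint URL, token, and default cron schedule are placeholders.

```python
import json
import urllib.request

# Assumed mutation shape; verify against your DataHub version's schema.
CREATE_SOURCE_MUTATION = """
mutation createIngestionSource($input: UpdateIngestionSourceInput!) {
  createIngestionSource(input: $input)
}
"""


def build_payload(name, source_type, recipe, cron="0 0 * * *", timezone="UTC"):
    """Build the GraphQL request body for registering an ingestion source."""
    return {
        "query": CREATE_SOURCE_MUTATION,
        "variables": {
            "input": {
                "name": name,
                "type": source_type,
                "schedule": {"interval": cron, "timezone": timezone},
                "config": {
                    # The recipe travels as a JSON-encoded string.
                    "recipe": json.dumps(recipe),
                    "executorId": "default",
                },
            }
        },
    }


def create_ingestion_source(graphql_url, token, payload):
    """POST the payload; graphql_url and token are placeholders you supply."""
    request = urllib.request.Request(
        graphql_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Because this is a plain HTTP call, the same request could be issued from Java or any other client, which sidesteps the Python-only ingestion framework for the registration step.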
