Ingesting Lineage from Files on GCS using Python Emitter in DataHub

user-1 · March 4, 2024, 3:57pm

Hi folks, I have a quick question about creating and ingesting lineage from files on GCS. Suppose I have a set of notebooks or local scripts that produce datasets and save them to GCS. These scripts just run raw Python (and maybe some Pandas), and don’t run on any of DataHub’s supported integrations.

Is this a use case for the Python emitter (https://datahubproject.io/docs/metadata-ingestion/as-a-library/)? Assuming I can come up with some logic to extract dataset metadata from my local Python scripts, is the emitter the correct tool to write metadata/lineage to the DataHub metadata store?

user-2 · March 4, 2024, 3:57pm

Yes, with the Python API you can basically do whatever you want, including emitting lineage for datasets that no built-in integration covers.
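
For anyone landing here later, a minimal sketch of what that could look like, assuming the `acryl-datahub` package is installed and a DataHub GMS is reachable at `http://localhost:8080`. The helper names `gcs_uri_to_dataset_name` and `emit_gcs_lineage` are made up for illustration, and using `gcs` as the platform with the bucket path (minus `gs://`) as the dataset name is an assumption about how you want the URNs to look; adjust it to match whatever is already in your instance.

```python
from typing import List


def gcs_uri_to_dataset_name(uri: str) -> str:
    """Hypothetical helper: 'gs://bucket/path/file' -> 'bucket/path/file'."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a GCS URI: {uri!r}")
    return uri[len(prefix):].strip("/")


def emit_gcs_lineage(
    upstream_uris: List[str],
    downstream_uri: str,
    gms_server: str = "http://localhost:8080",  # assumed local GMS endpoint
    env: str = "PROD",
) -> str:
    """Emit 'upstream files -> downstream file' lineage and return the downstream URN."""
    # Imported lazily so the URI helper above works without the SDK installed.
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    # Assumption: datasets are registered under the "gcs" platform.
    upstream_urns = [
        builder.make_dataset_urn("gcs", gcs_uri_to_dataset_name(u), env)
        for u in upstream_uris
    ]
    downstream_urn = builder.make_dataset_urn(
        "gcs", gcs_uri_to_dataset_name(downstream_uri), env
    )

    # Build one lineage event that attaches the upstreams to the downstream dataset.
    lineage_mce = builder.make_lineage_mce(upstream_urns, downstream_urn)

    # Send it to the DataHub metadata service over REST.
    emitter = DatahubRestEmitter(gms_server=gms_server)
    emitter.emit_mce(lineage_mce)
    return downstream_urn
```

At the end of a notebook or script you would call something like `emit_gcs_lineage(["gs://my-bucket/raw/input.csv"], "gs://my-bucket/clean/output.parquet")`, passing whichever input and output paths the script actually read and wrote. Depending on your SDK version, the docs may steer you toward emitting aspects with `MetadataChangeProposalWrapper` instead, so treat this as a starting point rather than the one right way.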
