Building Lineage Container to Dataset in S3 Case

user-3 · March 4, 2024, 3:38pm

Hi guys! I couldn’t find any information anywhere on how to build container-to-dataset lineage, or vice versa. Is it even possible? What should I do in the S3 case, where the top-level dataset is represented as a container?

user-1 · March 4, 2024, 3:38pm

A dataset has a container aspect. You can set it to point at the container (https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:datahub,Dataset,PROD)/Schema?is_lineage_mode=false&schemaFilter=)
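
To make the suggestion above concrete, here is a minimal sketch of a `container` aspect attached to a dataset, written as a plain metadata change proposal payload. It uses only the standard library; the helper names, the bucket path, and the exact proposal layout are assumptions for illustration, so check them against the DataHub version you run before emitting anything.

```python
import json


def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Standard DataHub dataset URN layout: (platform URN, dataset name, environment).
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def container_aspect_proposal(dataset_urn: str, container_urn: str) -> dict:
    # Hypothetical helper: builds a generic proposal that sets the dataset's
    # "container" aspect. The field layout here is an assumption.
    return {
        "entityType": "dataset",
        "entityUrn": dataset_urn,
        "changeType": "UPSERT",
        "aspectName": "container",
        "aspect": {
            "contentType": "application/json",
            # The container aspect carries a single field: the parent container URN.
            "value": json.dumps({"container": container_urn}),
        },
    }


# Hypothetical S3 object path, used only for illustration.
dataset_urn = make_dataset_urn("s3", "my-bucket/events/2024/03/04/part-0000.parquet")
proposal = container_aspect_proposal(
    dataset_urn, "urn:li:container:8aedea34fd2377eae316eca5464c2034"
)
print(json.dumps(proposal, indent=2))
```

Note that this only records membership (the dataset belongs to the container); it does not create a lineage edge by itself.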

user-3 · March 4, 2024, 3:38pm

Thx, Siddique!

It is clear that each dataset can be part of (IsPartOf) a container. My question is different. Suppose I have a Spark/Flink service that consumes data from an S3 folder (a container) that contains partition subfolders (years, months, days, hours, which are also containers), and the dataset files sit only below those. In this case, how do I correctly build container-to-dataset lineage? And how do I calculate the container URN? Right now it looks like urn:li:container:8aedea34fd2377eae316eca5464c2034
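
On the URN question: the hex string in a container URN looks like an MD5 digest, and as far as I know DataHub derives it by hashing a JSON dump of the container's key fields. Below is a minimal standard-library sketch of that idea. The function name, the key field names (`platform`, `instance`, `env`, `folder_abs_path`), and the sample path are all assumptions; the result only matches a real URN if the fields and values are exactly the ones the S3 source used at ingestion time, so looking the URN up in DataHub is the safer route.

```python
import hashlib
import json


def container_urn_from_key(key: dict) -> str:
    # Hypothetical helper: drop unset fields, serialize deterministically
    # (sorted keys, compact separators), then MD5 the result.
    cleaned = {k: v for k, v in key.items() if v is not None}
    json_key = json.dumps(cleaned, separators=(",", ":"), sort_keys=True)
    guid = hashlib.md5(json_key.encode("utf-8")).hexdigest()
    return f"urn:li:container:{guid}"


# Assumed key fields for one S3 folder; real field names may differ.
folder_key = {
    "platform": "s3",
    "instance": None,
    "env": "PROD",
    "folder_abs_path": "my-bucket/events/2024",
}

print(container_urn_from_key(folder_key))
```

Because the digest depends on every field, each partition folder (year, month, day, hour) gets its own URN, and a change in environment or platform instance changes all of them.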

user-2 · March 4, 2024, 3:38pm

Hi, just wondering if you found a solution for container-to-dataset lineage? I have the same use case and didn’t find anything in the docs.

user-3 · March 4, 2024, 3:38pm

Unfortunately, not yet.
