.title {"How to Ingest Only the Most Recent Partition of Sharded Tables in Datahub UI"}

user-1 · May 20, 2024, 12:04am

Hello everyone. I’m ingesting datasets and tables from BigQuery into DataHub v0.13 using the DataHub UI. Some of my datasets contain sharded tables, meaning that there is a separate date-suffixed table (shard) for every single day. Is there an option, when configuring the recipe from the UI, to indicate that I want to ingest only the most recent shard of every sharded table in the dataset?

datahub_team · May 20, 2024, 12:04am

Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!

Are you using UI or CLI for ingestion?
Which DataHub version are you using? (e.g. 0.12.0)
What data source(s) are you integrating with DataHub? (e.g. BigQuery)

datahub_team · May 20, 2024, 12:04am

What do the names of the tables look like? We have some support for handling sharded tables built in. There’s an advanced config, sharded_table_pattern, to tweak the behavior, but it requires some knowledge of how our ingestion works.

datahub_team · May 20, 2024, 12:04am

We support basic date-suffixed table names
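
To make "date-suffixed" concrete, here is a minimal Python sketch of how shard names could be grouped by base name and reduced to the latest shard. The regex and the latest_shards helper are hypothetical and written only for illustration. They are not DataHub's built-in pattern or API, and the real default for sharded_table_pattern may differ.

```python
import re
from collections import defaultdict

# Hypothetical pattern for illustration only; NOT DataHub's built-in default.
# It treats a trailing 8-digit YYYYMMDD block (optionally preceded by "_")
# as the shard date and everything before it as the base table name.
SHARD_SUFFIX = re.compile(r"^(?P<base>.+?)_?(?P<date>\d{8})$")


def latest_shards(table_names):
    """Map each base table name to its most recent date-suffixed shard."""
    shards = defaultdict(list)
    for name in table_names:
        match = SHARD_SUFFIX.match(name)
        if match:
            # YYYYMMDD strings sort chronologically, so plain string
            # comparison is enough to find the newest shard.
            shards[match.group("base")].append((match.group("date"), name))
    return {base: max(entries)[1] for base, entries in shards.items()}


tables = ["events_20240518", "events_20240519", "events_20240520", "users"]
print(latest_shards(tables))
```

Tables without a date suffix, such as users in this example, are not treated as shards and are left out of the result.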
