.title {"How to Ingest Only the Most Recent Partition of Sharded Tables in Datahub UI"}

Original Slack Thread

Hello everyone. I’m ingesting datasets and tables from BigQuery into DataHub v0.13 using the DataHub UI. Some of my datasets contain sharded tables, meaning that I have a partition for every single day of that table. Is there any option, when configuring the recipe from the UI, to indicate that I want to take only the most recent partition of every sharded table belonging to the dataset?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

What do the names of the tables look like? We have some built-in support for handling sharded tables. There’s an advanced config option, sharded_table_pattern, to tweak the behavior, but it requires some knowledge of how our ingestion works.

We support basic date-suffixed table names.
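
To make the idea of date-suffixed shards concrete, here is a minimal Python sketch of how table names with a date suffix could be grouped under a base name, keeping only the most recent shard. It is an illustration only, not DataHub's actual implementation: the regex, the `_YYYYMMDD` naming convention, and the `latest_shards` helper are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical pattern for illustration only: a base name, an underscore,
# then an 8-digit YYYYMMDD suffix. DataHub's real default may differ.
SHARD_PATTERN = re.compile(r"^(?P<base>.+)_(?P<date>\d{8})$")


def latest_shards(table_names):
    """Map each base name to its most recent date-suffixed shard.

    Tables that do not match the pattern are passed through unchanged.
    """
    shards = defaultdict(list)
    result = {}
    for name in table_names:
        match = SHARD_PATTERN.match(name)
        if match:
            shards[match.group("base")].append(name)
        else:
            result[name] = name
    for base, names in shards.items():
        # Shards of one base share a prefix and a fixed-width YYYYMMDD suffix,
        # so plain string comparison orders them chronologically.
        result[base] = max(names)
    return result


tables = ["events_20240101", "events_20240103", "events_20240102", "users"]
selected = latest_shards(tables)
```

In this sketch, the three `events_*` shards collapse into a single `events` entry pointing at the newest shard, while `users` is left as is. If your table names follow a different convention, the pattern would need to be adapted accordingly.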