Understanding and Deleting Old BigQuery Data in DataHub

Original Slack Thread

Hi everyone!
I ran some BigQuery ingestions in our project, but the dashboard shows 1.3K BigQuery datasets even though I only created 4 BigQuery ingestions. It seems some ingestions were made and later deleted, but their data is still there.
How can I remove that old BigQuery data if the corresponding ingestions have been deleted?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

UI for ingestion
0.12.1
BigQuery

Can you tell me if this is your production environment or your local environment?

This statement is what you wrote:
“…the dashboard its showing 1.3K big query datasets but I made only 4 big query ingestion…”

Each ingestion run can bring in a lot of metadata from the source system, which creates many datasets.

Can you take a look at the configured ingestion recipe?
For example, if I am ingesting from a MySQL database with 20 tables, a single ingestion run will create 20 datasets.

Can you see if there is information here that will help you?
https://datahubproject.io/docs/ui-ingestion/#debugging-a-failed-ingestion-run

This is a production environment, and all 4 ingestions together show only 20 asset resources.

It seems some ingestions were run previously and then deleted, but the ingested data assets are still there.

Is there any way to delete those?

Can you follow these instructions:
https://datahubproject.io/docs/next/api/tutorials/datasets#delete-dataset
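As an alternative to the API route linked above, the DataHub CLI's `datahub delete` command can remove stale metadata in bulk. A minimal sketch, assuming the CLI is installed and pointed at your instance; the URN below is an illustrative placeholder, not one of the asker's real datasets:

```shell
# Sketch only: the commands are printed rather than executed so they
# can be reviewed first. The URN is a made-up placeholder.

# Soft-delete all BigQuery metadata (hides the assets in the UI):
echo "datahub delete --platform bigquery"

# Hard-delete one stale dataset by URN (irreversible):
URN="urn:li:dataset:(urn:li:dataPlatform:bigquery,my-project.my_dataset.my_table,PROD)"
echo "datahub delete --urn '$URN' --hard"
```

A soft delete is usually the safer first step, since hard deletes cannot be undone.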

I use the curl approach, for which you need an access token; you can obtain one by following these instructions:
https://datahubproject.io/docs/next/authentication/personal-access-tokens/
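Putting the two links together, the curl call looks roughly like this. A hedged sketch: the GMS host, token, and URN are placeholders you must replace, and the exact endpoint path should be checked against the delete-dataset tutorial linked above:

```shell
# Sketch of the curl approach. The command is echoed, not executed,
# so you can review it before running it against production.
DATAHUB_GMS="http://localhost:8080"   # placeholder: your GMS host
TOKEN="<personal-access-token>"       # placeholder: token from the instructions above
URN="urn:li:dataset:(urn:li:dataPlatform:bigquery,my-project.my_dataset.my_table,PROD)"

echo curl -X POST "$DATAHUB_GMS/entities?action=delete" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  --data "{\"urn\": \"$URN\"}"
```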