Updating Metadata "env" Values for Datahub Ingested AWS Athena Tables

Original Slack Thread

Hi everyone. I am using Datahub version 0.11.0. I got the initial datahub ingest to work. I am specifically ingesting AWS Athena tables.

However, I noticed that subsequent ingests of the metadata for the same tables do not update the “env”. By default all the “env” values are “PROD”. I tried setting an “env” value of “TEST” based on certain Athena schema names, and I saw that the table was ingested. But the URN still has “PROD” in it.

My goal is to be able to separate out different environments such as “PROD”, “TEST”, etc.

Re-ingesting with “env”: “TEST” won’t overwrite metadata that was already ingested with PROD. Ingesting with env ‘TEST’ should have created a new set of entities with ‘TEST’ in their URNs. You may have to clean up the ‘PROD’ entities by deleting them:
https://datahubproject.io/docs/how/delete-metadata/
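
To illustrate why a changed “env” produces new entities instead of updating the old ones: the environment is the last component of a dataset URN, so the same table under PROD and under TEST gets two different identifiers. The snippet below is only a sketch of that format; the `dataset_urn` helper and the `sales_db.orders` table name are made up for illustration (the DataHub Python package ships its own `make_dataset_urn` builder in `datahub.emitter.mce_builder`, which is not used here so the example stays dependency-free).

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub-style dataset URN; env is the final component.

    Hypothetical helper for illustration only, not part of the DataHub API.
    """
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Same (made-up) Athena table, two environments.
prod_urn = dataset_urn("athena", "sales_db.orders")
test_urn = dataset_urn("athena", "sales_db.orders", env="TEST")

print(prod_urn)
print(test_urn)

# The identifiers differ, so ingesting with TEST creates a second entity
# and leaves the PROD entity in place until it is deleted.
print(prod_urn == test_urn)
```

Because the two URNs are distinct keys, the PROD entities stay in the catalog until they are removed explicitly.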

Thanks. We’ve also noticed this to be true. I will try this and see if TEST shows up in the search in the Datahub UI.

<@U0445MUD81W> if I want to re-ingest everything for a platform (a database), can I delete the entire platform in datahub and re-ingest metadata, and the metadata will show up?

Yes, hard-deleting the platform and then re-ingesting should work:
datahub delete --platform athena --hard

Or just try hard-deleting a few of the PROD URNs and re-ingesting with "env": "TEST", then see whether the deleted entries show up:
datahub delete --urn "<urn>" --hard

Will TEST show up in a filter or search?

I see it. The TEST URNs show up when I paste the URN in the browser, but not when I do a search for the table name.