Hi All,
can users be searched based on their email IDs? Right now I am not able to search users on datahub:
Hey there, it looks like we do index users based on their emails in search: https://github.com/datahub-project/datahub/blob/3acd25ba1d2881597e5a0574331b6b81f7375d94/metadata-models/src/main/pegasus/com/linkedin/identity/CorpUserInfo.pdl#L37-L44
I took a look at the <DataHub site> users and groups feature, and after typing and deleting a character, I can see the users that are in the system. I'm wondering whether your user ingestion worked correctly?
Do you see them on your users page? (the equivalent of this page: https://demo.datahubproject.io/settings/identities/users for your deployment)
I do see it, thanks <@U04UKA5L5LK> !
I do have one more question: does deleting an ingestion source remove all of its data from DataHub? For example, if I synced a Snowflake DB that I no longer want, does deleting the ingestion get rid of the data?
If not, what's the best way to delete it? cc: <@U04UKA5L5LK>
Hey, we should have a script to do bulk deletes like this! Tagging <@U04N9PYJBEW> who would be the most familiar.
Deleting an ingestion source will not remove any of the associated data. See https://datahubproject.io/docs/next/how/delete-metadata/#delete-cli-usage for information on the delete CLI, which can perform this bulk deletion. Very soon we'll be able to support deleting all urns within a container, like a Snowflake database, assuming you've ingested those urns recently (within roughly the past month). If you really need to delete data from just one specific ingestion run and can't use the filters described in that doc, then let me know; you'll need to run a more complex script.
Thanks for the response! I need to delete data from a specific ingestion.
Did you use stateful ingestion (via `stateful_ingestion: enabled` in your recipe) for that ingestion source? If not, we may need to find a workaround.
`stateful_ingestion: enabled` is there in my recipe. Sorry for the delayed response; I have been unwell.
You can run something like this:
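```
import time

import progressbar
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

pipeline_name = "<pipeline_name>"
platform = "<platform>"  # assumed placeholder: the source platform, e.g. "snowflake"

graph = DataHubGraph(DatahubClientConfig(server=..., token=...))
# Fetch the most recent stateful-ingestion checkpoint for this pipeline.
checkpoint = graph.get_latest_pipeline_checkpoint(pipeline_name, platform)
if checkpoint:
    urns = checkpoint.state.urns
    timestamp = int(time.time() * 1000)
    run_id = f"soft-delete-by-pipeline-{timestamp}"
    # Soft-delete every URN recorded in the checkpoint.
    for urn in progressbar.progressbar(urns):
        graph.soft_delete_urn(urn, run_id=run_id)
```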
Where `pipeline_name` is the name of the ingestion source you are deleting. This may be specified in the recipe, but if not, then you can find it in your logs after the line:
> ```Committing ingestion checkpoint for pipeline```
It should look something like `urn:li:dataHubIngestionSource:<uuid>`
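If you'd like to preview what would be removed before deleting anything, here is a minimal sketch of the same loop with a dry-run switch. The helper name `soft_delete_urns` and its `dry_run` flag are hypothetical (they are not part of DataHub); it only assumes the `graph` object exposes `soft_delete_urn(urn, run_id=...)` as in the snippet above.

```python
import time
from typing import Iterable, List, Protocol


class SoftDeleter(Protocol):
    """Anything exposing soft_delete_urn, e.g. the graph client used above."""

    def soft_delete_urn(self, urn: str, run_id: str) -> None: ...


def soft_delete_urns(
    graph: SoftDeleter, urns: Iterable[str], dry_run: bool = True
) -> List[str]:
    """Hypothetical helper: soft-delete each URN, or only list them if dry_run."""
    # One run_id for the whole batch, so the deletes can be traced together.
    run_id = f"soft-delete-by-pipeline-{int(time.time() * 1000)}"
    processed: List[str] = []
    for urn in urns:
        if not dry_run:
            graph.soft_delete_urn(urn, run_id=run_id)
        processed.append(urn)
    return processed
```

You could call `soft_delete_urns(graph, checkpoint.state.urns)` first to check the list of urns, then rerun with `dry_run=False` once it looks right.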
Thank you! We will try this