Troubleshooting deleting folders and datasets in an ingested S3 Bucket

Original Slack Thread

Hi Team, I am working on removing an ingested S3 bucket, but I am having some trouble deleting all of its folders and datasets. I am able to delete the bucket itself with the DataHubGraphClient class and the hard_delete method, using the URN of the bucket. However, all of the other data persists, so the folders and datasets are not deleted. To overcome this I have tried to find a way to retrieve all of the child containers and files, but I can’t seem to find a good way to do so. I have tried the get_related_entities method, but I can’t find the relationship to use to go from the S3 bucket to a folder. Going from a folder (container) to a dataset works fine, because there the IsPartOf relationship seems to work.
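For reference, a minimal sketch of that initial deletion step, assuming the DataHub Python SDK’s DataHubGraph client (referred to above as DataHubGraphClient) and its delete_entity call; the server URL and bucket URN below are placeholders:

```
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

# Placeholder server URL; point this at your DataHub GMS endpoint.
dh_graph_client = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

# Hypothetical container URN for the ingested bucket.
bucket_urn = "urn:li:container:<bucket-container-id>"

# Hard-delete the bucket container itself (assumed SDK call).
dh_graph_client.delete_entity(urn=bucket_urn, hard=True)
```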

If there is not a good way to do this from the Python client, I have found a way to use GraphQL, so I could go in that direction. But I have the feeling there should be a better way to do this. Can someone help me or point me in the right direction? Thanks :smiley:
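For the GraphQL direction mentioned above, a hedged sketch could use the client’s execute_graphql method together with the relationships query exposed by DataHub’s GraphQL API; the function name and page size are illustrative, and it is assumed that execute_graphql returns the data portion of the response:

```
# Illustrative query: fetch entities that point at `urn` via IsPartOf.
CHILDREN_QUERY = """
query childEntities($urn: String!) {
  entity(urn: $urn) {
    relationships(
      input: { types: ["IsPartOf"], direction: INCOMING, start: 0, count: 1000 }
    ) {
      relationships {
        entity { urn }
      }
    }
  }
}
"""

def get_children_via_graphql(dh_graph_client, urn):
    """Return the URNs of entities that are IsPartOf the given container.

    Note: only the first page of results is fetched here; no pagination.
    """
    result = dh_graph_client.execute_graphql(CHILDREN_QUERY, variables={"urn": urn})
    rels = result["entity"]["relationships"]["relationships"]
    return [rel["entity"]["urn"] for rel in rels]
```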

<@U01GCJKA8P9> could you help me on this?

I found out that the “IsPartOf” relation does get created; however, it takes some time. I am not sure exactly how long, but eventually it is added.

As a solution, for now I am retrieving the entities within a bucket using the following code:

    """Retrieve all child items for an entity"""
    [*rel_entities] = dh_graph_client.get_related_entities(
        urn, ["IsPartOf"], "INCOMING"
    )
    urns = [rel_entity.urn for rel_entity in rel_entities]
    return urns```
Then I use a recursive function to retrieve all of the entities within each of the child entities (a sketch of such a traversal is shown below). Of course this is not the best solution, but it works.
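A minimal recursive sketch of that traversal (not the original poster’s code), building on the helper above and assuming the container hierarchy has no cycles:

```
def collect_descendant_urns(dh_graph_client, urn):
    """Depth-first collection of every URN that is (transitively) part of `urn`."""
    descendants = []
    for child_urn in get_child_urns(dh_graph_client, urn):
        descendants.append(child_urn)
        descendants.extend(collect_descendant_urns(dh_graph_client, child_urn))
    return descendants
```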

With those URNs I am then able to delete each entity using the hard delete method.
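Putting it together, a hedged sketch of that deletion step, again assuming delete_entity(urn, hard=True) from the DataHub Python SDK performs the hard delete; children are removed before the bucket itself:

```
# Collect every descendant of the bucket container and hard-delete each one.
for child_urn in collect_descendant_urns(dh_graph_client, bucket_urn):
    dh_graph_client.delete_entity(urn=child_urn, hard=True)

# Finally, hard-delete the bucket container itself.
dh_graph_client.delete_entity(urn=bucket_urn, hard=True)
```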

So this works, but is there a better way to do it?