How to search DatahubGraph for dataset urns filtered by properties in Python

Original Slack Thread

In Python, is there a good way to search a DatahubGraph for dataset urns filtered by properties, or would this require a more explicit query? .get_urns_by_filter doesn't seem to include properties as an option, so the only way I can think of to do this through the Python API is to call .list_all_entity_urns and then .get_dataset_properties on each urn to check whether the property matches my target, which seems inefficient.

You can pass extraFilters into the get_urns_by_filter method. I believe custom properties are indexed in Elasticsearch, although I don’t remember the exact syntax required.

I’d also be open to a PR that adds filters for custom properties as a first-class filter arg on that method.

Thanks Harshal, taking a look now. I’ll send up a PR once I get it running. The docs are a little sparse for filters and custom filters; do you have any examples of them in action I could take a look at?

There aren’t a ton of good docs. There’s a bit here: https://datahubproject.io/docs/next/how/search/#graphql, but it is sparse.

The filters are passed directly through to Elasticsearch, so you can also reference the Elasticsearch docs on its query DSL. Also, I sometimes inspect the network traffic in my browser to see example GraphQL queries.

Thanks Harshal, looking through this this morning.

Posting for posterity in case anyone stumbles across this thread in the future.

attachment

This is how to structure your search by property through the Python API.
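A minimal sketch of such a property search, built on the extraFilters argument of get_urns_by_filter mentioned earlier in the thread. The filter shape is an assumption not confirmed in this thread: it supposes custom properties are searchable under a `customProperties` field, with values written as `key=value` and an `EQUAL` condition. The helper names `custom_property_filter` and `find_datasets_by_property` are hypothetical, as is the example property `team=analytics`.

```python
from typing import Any, Dict, List


def custom_property_filter(key: str, value: str) -> Dict[str, Any]:
    """Build one filter rule matching a custom property.

    Assumption: custom properties are indexed under the field
    "customProperties" with values stored as "key=value" strings.
    """
    return {
        "field": "customProperties",
        "condition": "EQUAL",
        "values": [f"{key}={value}"],
    }


def find_datasets_by_property(graph: Any, key: str, value: str) -> List[str]:
    """Return dataset urns whose custom property `key` equals `value`.

    `graph` is expected to be a DataHubGraph (or anything exposing
    get_urns_by_filter with the same keyword arguments).
    """
    urns = graph.get_urns_by_filter(
        entity_types=["dataset"],
        extraFilters=[custom_property_filter(key, value)],
    )
    # get_urns_by_filter yields urns lazily; materialize them into a list.
    return list(urns)


# Example usage (requires the acryl-datahub package and a reachable server;
# the server URL below is a placeholder):
#
#   from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
#   graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
#   for urn in find_datasets_by_property(graph, "team", "analytics"):
#       print(urn)
```

If the filter returns nothing, inspecting the GraphQL search request your browser sends from the DataHub UI (as suggested above) is a quick way to check the field name and value format your deployment actually expects.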

Harshal’s link is useful, as is his tip about inspecting your network traffic. The “About DataHub Search” page in the DataHub docs has a good selection of queries to inspect.