In Python, is there a good way to search a DataHubGraph for dataset URNs filtered by properties, or would this require a more explicit query? `.get_urns_by_filter` doesn't seem to include properties as an option, so the only way I can think to do this through the Python API is to call `.list_all_entity_urns` and then `.get_dataset_properties` on each URN to check whether the property matches my target, which seems inefficient.
You can pass `extraFilters` into the `get_urns_by_filter` method. I believe custom properties are indexed in Elasticsearch, although I don't remember the exact syntax required.
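A minimal sketch of what that could look like. It assumes two things this thread does not confirm: that custom properties are indexed under a `customProperties` field as `key=value` strings, and that `extraFilters` accepts a list of `field`/`condition`/`values` dicts. Check both against your DataHub version. The helper names and the `team`/`analytics` property are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def custom_property_filter(key: str, value: str) -> List[Dict[str, Any]]:
    """Build an extraFilters list matching entities whose custom property `key` equals `value`.

    Assumption: DataHub indexes custom properties in Elasticsearch under the
    `customProperties` field as "key=value" strings.
    """
    return [
        {
            "field": "customProperties",
            "condition": "EQUAL",
            "values": [f"{key}={value}"],
        }
    ]


def dataset_urns_with_property(graph: Any, key: str, value: str) -> Iterable[str]:
    """Let server-side search do the filtering instead of fetching every dataset's properties."""
    # `graph` is expected to be a DataHubGraph; typed as Any so this module
    # can be imported without the datahub package installed.
    return graph.get_urns_by_filter(
        entity_types=["dataset"],
        extraFilters=custom_property_filter(key, value),
    )
```

With a connected graph, for example `DataHubGraph(DatahubClientConfig(server=...))` from `datahub.ingestion.graph.client`, you would iterate over `dataset_urns_with_property(graph, "team", "analytics")`. This replaces one properties lookup per URN with a single filtered search.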
I’d also be open to a PR that adds filters for custom properties as a first-class filter arg on that method
Thanks Harshal, taking a look now. I'll send up a PR once I get it running. The docs are a little sparse on filters and custom filters. Do you have any examples of them in action I could take a look at?
There aren't a ton of good docs. There's a bit here: https://datahubproject.io/docs/next/how/search/#graphql, but it's sparse.
The filters are passed directly through to Elasticsearch, so you can also refer to the Elasticsearch query DSL docs. I also sometimes inspect the network traffic in my browser to see example GraphQL queries.
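For the GraphQL route, here is a sketch that sends a `searchAcrossEntities` query through `DataHubGraph.execute_graphql` and pages through the results. It makes the same `customProperties` / `key=value` assumption as above. The input shape (`types`, `orFilters` as a list of `and` groups) is also an assumption, so compare it against a query captured from your browser's network traffic. The helper names are hypothetical.

```python
from typing import Any, Dict, List

# Assumed query shape; compare against a query captured from the DataHub UI.
SEARCH_BY_PROPERTY = """
query searchByProperty($input: SearchAcrossEntitiesInput!) {
  searchAcrossEntities(input: $input) {
    total
    searchResults {
      entity {
        urn
      }
    }
  }
}
"""


def build_search_input(key: str, value: str, start: int = 0, count: int = 100) -> Dict[str, Any]:
    """Variables for one page of datasets whose custom property `key` equals `value`."""
    return {
        "input": {
            "types": ["DATASET"],
            "query": "*",
            "start": start,
            "count": count,
            # orFilters is a list of AND-groups; here, one group with one condition.
            "orFilters": [
                {"and": [{"field": "customProperties", "values": [f"{key}={value}"]}]}
            ],
        }
    }


def search_dataset_urns(graph: Any, key: str, value: str, page_size: int = 100) -> List[str]:
    """Request pages until `start` passes the reported total, collecting every matching URN."""
    urns: List[str] = []
    start = 0
    while True:
        # execute_graphql returns the response's "data" object.
        data = graph.execute_graphql(
            SEARCH_BY_PROPERTY, variables=build_search_input(key, value, start, page_size)
        )
        result = data["searchAcrossEntities"]
        urns.extend(r["entity"]["urn"] for r in result["searchResults"])
        start += page_size
        if start >= result["total"]:
            return urns
```

Paging on `total` keeps each request bounded. If you only need the first page, you can call `execute_graphql` once with `build_search_input(...)`.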
Thanks Harshal, looking through this this morning.
Posting for posterity in case anyone stumbles across this thread in the future.
This is how to structure your search by property through the Python API.
Harshal's link is useful, as is his tip about inspecting your network traffic. A section of the "About DataHub Search" page has a good selection of queries to inspect.