Clarifying the `platform` argument in the `get_urns_by_filter` function for DataHub

Original Slack Thread

Just asking the AI a question about code and some unclear documentation :slightly_smiling_face: please ignore haha

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions and don’t forget to provide its responses a :thumbsup: or :thumbsdown: so other folks in the Community know if you’re still looking for help on the topic!

I’m here to help with any questions you have about DataHub! Could you please provide more details or rephrase your question to ensure it’s related to DataHub? Specifically, let me know if you have any questions about data discovery, data observability, federated governance, or any other aspect of DataHub.

<@U06TM7M40RK> What does the `platform` input argument of get_urns_by_filter include? I’m experiencing a weird case where, even though the URN looks like
urn:li:dataset:(urn:li:dataPlatform:dbt,my_catalog.main_schema.some_table,PROD), I cannot find it when I pass dbt as the platform to the filter function call.

The platform argument in the get_urns_by_filter function is used to filter entities based on their platform. This argument should match the platform part of the URN. For example, if your URN is urn:li:dataset:(urn:li:dataPlatform:dbt,my_catalog.main_schema.some_table,PROD), the platform should be dbt.

However, make sure that the platform string is in lowercase when you pass it to the function. According to a related Slack thread (linked below), the platform should be specified in lowercase, like dbt.
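To see which value the platform argument is expected to match, the platform name can be pulled straight out of the dataset URN. The snippet below is a minimal sketch in plain Python; `platform_from_dataset_urn` is a hypothetical helper written for illustration, not part of the DataHub SDK, and it assumes the URN follows the dataset layout shown in the question.

```python
import re

# Dataset URNs have the form:
#   urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
_DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([^,)]+)\)$"
)


def platform_from_dataset_urn(urn: str) -> str:
    """Return the bare platform name embedded in a dataset URN (hypothetical helper)."""
    match = _DATASET_URN.match(urn)
    if match is None:
        raise ValueError(f"not a dataset URN: {urn!r}")
    return match.group(1)


urn = "urn:li:dataset:(urn:li:dataPlatform:dbt,my_catalog.main_schema.some_table,PROD)"
platform = platform_from_dataset_urn(urn)
print(platform)  # prints "dbt"
```

The extracted string is the value to compare against whatever is being passed as platform in the filter call.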

Here’s an example of how you can use the get_urns_by_filter function with the platform argument:

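from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig

# Connect to the DataHub GMS endpoint (adjust the URL for your deployment).
datahub_graph = DataHubGraph(DataHubGraphConfig(server="http://localhost:8080"))

# Filter by platform; the value matches the platform part of the URN.
dataset_urns = datahub_graph.get_urns_by_filter(
    platform="dbt",
    # Optional extra filter; remove it if you only want to filter by platform.
    extraFilters=[{'field': 'domains', 'values': ['urn:li:domain:your_domain_urn']}],
)

for urn in dataset_urns:
    print(urn)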

If you are still unable to find the URN, double-check the following:

  1. Ensure that the platform string is in lowercase.
  2. Verify that the entity exists and is correctly ingested into DataHub.
  3. Check if there are any other filters that might be affecting the search results.
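The checklist above can be sketched as a small diagnostic. This is an illustration under stated assumptions, not an official recipe: `diagnose_platform_filter` is a hypothetical helper, and it assumes the graph object exposes `exists(urn)` and `get_urns_by_filter(entity_types=..., platform=...)` as `DataHubGraph` does. `FakeGraph` is an in-memory stand-in included only so the sketch runs without a DataHub server.

```python
from typing import Any, Dict, Iterable, List, Optional


def diagnose_platform_filter(graph: Any, urn: str, platform: str) -> Dict[str, bool]:
    """Report whether `urn` exists and whether the platform filter returns it."""
    exists = bool(graph.exists(urn))
    # Use only the entity type and platform, so no other filter can hide the result.
    matches = set(graph.get_urns_by_filter(entity_types=["dataset"], platform=platform))
    return {"exists": exists, "found_by_platform": urn in matches}


class FakeGraph:
    """In-memory stand-in (not part of DataHub) used to exercise the helper."""

    def __init__(self, urns: List[str]) -> None:
        self._urns = urns

    def exists(self, entity_urn: str) -> bool:
        return entity_urn in self._urns

    def get_urns_by_filter(
        self,
        *,
        entity_types: Optional[List[str]] = None,
        platform: Optional[str] = None,
    ) -> Iterable[str]:
        # Simplified matching: compare the platform segment of the URN literally.
        prefix = f"urn:li:dataset:(urn:li:dataPlatform:{platform},"
        return (u for u in self._urns if u.startswith(prefix))


urn = "urn:li:dataset:(urn:li:dataPlatform:dbt,my_catalog.main_schema.some_table,PROD)"
report = diagnose_platform_filter(FakeGraph([urn]), urn, platform="dbt")
print(report)
```

Against a real deployment, the same helper could be called with a `DataHubGraph` instance in place of `FakeGraph`. If `exists` is true but `found_by_platform` is false, the entity is ingested and the mismatch lies in the filter values or the search results rather than in ingestion.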

Would you like more details on any specific part of this process?


  • Slack thread
  • Client | DataHub Python SDK Documentation