How to Programmatically Pull Columns with a Specific Tag in Datahub

Original Slack Thread

Hi All,

We use DataHub (v0.10.4) as our cataloging tool, and it pulls metadata for tables and views from our Snowflake instance. Over the past few days, our team has attached tags to the columns of these datasets in DataHub. I would now like to programmatically pull all columns, across all datasets, that have a particular tag attached.
Is there an easy way to do this?
I found ways to get a list of datasets, but not the columns on which the tag is directly attached. For example, I found this GraphQL query (again, I didn't find anything in OpenAPI or Python for this, which I would prefer to use):

```
{
  searchAcrossEntities(
    input: {types: [], query: "*", start: 0, count: 10, filters: [{field: "fieldTags", value: "urn:li:tag:test_pii"}]}
  ) {
    start
    count
    total
    searchResults {
      entity {
        type
        ... on Dataset {
          urn
          type
          platform {
            name
          }
          name
        }
      }
    }
  }
}
```
This gives me a list of datasets, but I haven't been able to find a way to get to the _columns_. I'd appreciate any help on this!
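Since Python is the preferred tool here, a minimal sketch of sending the same query from Python with only the standard library might look like this. Several details are assumptions, not confirmed in the thread: the GraphQL endpoint URL (often `http://localhost:8080/api/graphql` on GMS, but check your deployment), bearer-token auth with a personal access token, and the hypothetical helper names `make_http_fetcher` and `iter_search_results`. It also narrows `types: []` to `types: [DATASET]` and passes the tag, `start`, and `count` as GraphQL variables so the loop can page past the first 10 results.

```python
import json
import urllib.request
from typing import Callable, Iterator

# The query from the thread, rewritten with GraphQL variables so the tag URN
# and the paging window are passed in instead of being hard-coded.
SEARCH_QUERY = """
query datasetsByFieldTag($tag: String!, $start: Int!, $count: Int!) {
  searchAcrossEntities(
    input: {types: [DATASET], query: "*", start: $start, count: $count,
            filters: [{field: "fieldTags", value: $tag}]}
  ) {
    start
    count
    total
    searchResults {
      entity {
        type
        ... on Dataset { urn name platform { name } }
      }
    }
  }
}
"""


def make_http_fetcher(endpoint: str, token: str) -> Callable[[dict], dict]:
    """Return a function that POSTs a GraphQL payload and returns the decoded JSON.

    Assumption: `endpoint` is your DataHub GraphQL URL and `token` is a
    personal access token accepted as a Bearer token.
    """
    def fetch(payload: dict) -> dict:
        request = urllib.request.Request(
            endpoint,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json",
                     "Authorization": f"Bearer {token}"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch


def iter_search_results(fetch: Callable[[dict], dict], tag_urn: str,
                        page_size: int = 100) -> Iterator[dict]:
    """Yield every search result, paging until `total` results have been seen."""
    start = 0
    while True:
        reply = fetch({"query": SEARCH_QUERY,
                       "variables": {"tag": tag_urn, "start": start,
                                     "count": page_size}})
        if reply.get("errors"):
            raise RuntimeError(reply["errors"])
        page = reply["data"]["searchAcrossEntities"]
        results = page["searchResults"]
        yield from results
        start += len(results)
        # Stop on an empty page as well, so a stale `total` cannot loop forever.
        if not results or start >= page["total"]:
            break
```

Usage would be along the lines of `for r in iter_search_results(make_http_fetcher(url, token), "urn:li:tag:test_pii"): print(r["entity"]["urn"])`. Keeping the transport behind a `fetch` callable means the paging logic can be reused with another client; the `acryl-datahub` package's `DataHubGraph` also offers an `execute_graphql` helper that may be worth checking in your installed version.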

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Which DataHub version are you using? (e.g. 0.12.0)
  2. Please post any relevant error logs on the thread!

Hey Dhaval! Right now we can't return schema fields as their own entities; instead, they exist as a "sub-entity" underneath a dataset. Therefore, I believe the query you have right now is your best bet for getting these columns. Each searchResult also has a matchedFields object that gives more information about each dataset you get back and tells you which field matched your query.
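Following that suggestion, each search result can also request `matchedFields { name value }` inside `searchResults { ... }`. Below is a minimal sketch that groups those matches by dataset URN once the JSON response has been decoded. `group_matched_fields` is a hypothetical helper, and exactly what `name` and `value` contain for a `fieldTags` filter is an assumption; confirm against responses from your own instance that they identify the column you need.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_matched_fields(search_results: Iterable[dict]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each dataset URN to the (name, value) pairs listed in its matchedFields.

    Assumes each result was fetched with `entity { ... on Dataset { urn } }`
    and `matchedFields { name value }` in the selection set. Datasets whose
    matchedFields list is empty or missing are left out.
    """
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for result in search_results:
        urn = result["entity"]["urn"]
        for match in result.get("matchedFields") or []:
            grouped[urn].append((match["name"], match["value"]))
    return dict(grouped)
```

The input is simply the `searchResults` list from one or more pages of the response, so the helper does not care how the query was sent.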