Managing Queries and Data Usage Stats in Snowflake Dataset on v0.11.0

Original Slack Thread

Hi Everyone,
We’re trying to disable the Queries tab on our dataset page while keeping usage stats intact. This is specific to the Snowflake platform. We attempted include_top_n_queries = false and top_n_queries = 0 in our recipes, but it didn’t work. We’re on version v0.11.0.
Is there a way to hide existing queries? Also, we are open if someone has a way to delete all the existing saved queries and then reenable ingestions with include_top_n_queries = false
Appreciate your help!

Setting include_top_n_queries should prevent the production of new queries, but won’t delete anything already there

Unfortunately both queries and stats are stored in the same underlying “aspect” datasetUsageStatistics, so it’s not easy to delete one without deleting the other (although it can be done using the API to read existing data, clear the queries, and then write it back)

To delete both usage stats and queries, you can use the datahub delete cli command https://datahubproject.io/docs/next/how/delete-metadata/#delete-cli-usage

Hey <@U01GZEETMEZ>,
Thanks for the response!
So, when I get the DatasetUsageStatistics aspect using get_latest_timeseries_value method from the Python SDK I get the following response for the dataset, which has the top Queries displayed in the queries tab
Usage of the method print(client.get_latest_timeseries_value(entity_urn="XX",aspect_type = DatasetUsageStatisticsClass, filter_criteria_map={}))

    {'timestampMillis': XXXX,
      'eventGranularity': TimeWindowSizeClass(
          {'unit': 'DAY',
            'multiple': 1}
            ),
              'partitionSpec':
                PartitionSpecClass(
                    {'type': 'FULL_TABLE', 'partition': 'FULL_TABLE_SNAPSHOT', 'timePartition': None}
                    ),
                      'messageId': None,
                        'uniqueUserCount': 1,
                          'totalSqlQueries': 21,
                            'topSqlQueries': None,
                              'userCounts': [], 
                              'fieldCounts': [&lt;ALL COLUMNS USAGE INFO&gt;]```
Now, I don't know why this class is not storing the actual queries, which I could remove and re-emit this aspect.

Also, I see that front end is using this query to get the actual queries to display.
```""" query getRecentQueries($urn: String!) {
    dataset(urn: $urn) {
      usageStats(range: MONTH) {
        buckets {
          bucket
        metrics {
            topSqlQueries
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
}
"""```
Any ideas on how I should manage to delete the the actual queries?

The get_latest method will only return the most recent datasetUsageStats, but it’s a “timeseries aspect” and so can have multiple values. In the python sdk, you can use the get_usage_aspects_from_urn method with time bounds to fetch all the data

Did you provide the the timestamps in milliseconds?