Enabling Stats Tab and Usage Statistics in DataHub GUI

Original Slack Thread

<@U06TM7M40RK> In the DataHub UI, for datasets, there is a “Stats” tab. However, it is greyed out. How do I get information to show up in this tab?

To enable the “Stats” tab in the DataHub UI for datasets, you need to ensure that profiling is enabled during the ingestion process. Profiling collects table-level and column-level statistics, which are necessary for populating the “Stats” tab. Here are the steps you should follow:

  1. Enable Profiling in Ingestion: Make sure that profiling is enabled in your ingestion recipe by setting profiling.enabled to true in the source configuration. Profiling collects statistics such as row counts, column counts, null counts, distinct counts, and other per-column measures.

  2. Check Source Support: Ensure that the data source you are using supports profiling. According to the <https://datahubproject.io/docs/0.13.1/metadata-ingestion/docs/dev_guides/sql_profiles|SQL Profiling documentation>, SQL profiling is supported for all SQL sources, but you should verify this on the individual source page.

  3. Ingestion Recipe Configuration: Your ingestion recipe should include the necessary configuration to enable profiling. For example, if you are using a SQL-based source, your configuration might look like this:

    source:
      type: your-source-type
      config:
        profiling:
          enabled: true
    
  4. Reindexing: If you have already ingested data but the “Stats” tab is still greyed out, you might need to reindex your data. This can be necessary if the profiling data was not correctly indexed during the initial ingestion.

  5. Check for Errors: Ensure there are no errors in your ingestion logs that might indicate issues with the profiling process.

  6. Version Compatibility: Make sure that the version of DataHub you are using supports the features you are trying to enable. Some features might not be available in older versions.
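
As a rough sketch of how steps 1 and 3 can be tuned, the recipe below extends the profiling block with a few optional settings. The option names (profile_table_level_only, include_field_null_count, include_field_distinct_count) are assumed from the SQL profiling configuration and should be checked against the page for your source and DataHub version; your-source-type is the same placeholder used above.

```yaml
source:
  type: your-source-type          # placeholder, replace with your actual source
  config:
    profiling:
      enabled: true
      # Assumption: when true, only table-level stats (row and column counts)
      # are collected and per-column queries are skipped.
      profile_table_level_only: false
      # Assumption: per-column toggles; verify the names for your source.
      include_field_null_count: true
      include_field_distinct_count: true
```

Table-level-only profiling is usually the cheaper option on large tables, at the cost of leaving the column statistics in the “Stats” tab empty.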

If you have followed these steps and the “Stats” tab is still not showing data, you might want to check the <https://datahubspace.slack.com|DataHub Slack community> for additional support or troubleshooting tips.

Would you like more detailed steps or specific examples on any of these points?

<@U06TM7M40RK> How do I get table usage statistics in DataHub?

To get table usage statistics in DataHub, you need to ensure that your ingestion process is configured to capture and ingest usage statistics. Here are the steps to achieve this:

  1. Enable Usage Statistics in Ingestion: Ensure that your ingestion recipe is configured to include usage statistics. For sources such as BigQuery, this means setting include_usage_statistics to true in the source configuration; the parameter name can differ between sources, so check the page for your source. This allows DataHub to capture and display usage statistics such as the number of queries and top users.

  2. Configure the Ingestion Recipe: Depending on the data source you are using, your ingestion recipe should include the necessary settings to capture usage statistics. For example, if you are using BigQuery, your configuration might look like this:

    source:
      type: bigquery
      config:
        include_usage_statistics: true
        usage:
          include_operational_stats: true
          include_read_operational_stats: true
    
  3. Check Source Support: Ensure that the data source you are using supports usage statistics. According to the <https://datahubproject.io/docs/0.13.1/features/dataset-usage-and-query-history|Dataset Usage & Query History documentation>, not all sources may support usage statistics, so it’s important to verify this on the individual source page.

  4. Ingestion Logs: After configuring your ingestion recipe, run the ingestion process and check the logs to ensure that usage statistics are being captured and ingested successfully. Look for any errors or warnings that might indicate issues with the ingestion process.

  5. UI Verification: Once the ingestion process is complete, you should be able to see usage statistics in the DataHub UI under the “Queries” and “Stats” tabs for the datasets.

  6. GraphQL API: You can also use the DataHub GraphQL API to query usage statistics programmatically. For example, the getLastMonthUsageAggregations query is suggested for retrieving usage statistics for a specific dataset; confirm the exact query name against the GraphQL schema of your DataHub version.

If you have followed these steps and are still not seeing usage statistics, you might want to check the <https://datahubspace.slack.com|DataHub Slack community> for additional support or troubleshooting tips.

Would you like more detailed steps or specific examples on any of these points?

<@U064583E403> Which data source would you like to see profiling for?

Athena

<@U064583E403>, please add the following under config: in your recipe:

    profiling:
      enabled: true
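
For context, a minimal sketch of where that block sits in a complete Athena recipe might look like the following. The region, workgroup, S3 location, and server address are hypothetical placeholders, and the field names (aws_region, work_group, query_result_location) are assumed from the Athena source configuration, so verify them against the Athena source page for your DataHub version.

```yaml
source:
  type: athena
  config:
    aws_region: us-east-1                                  # hypothetical region
    work_group: primary                                    # hypothetical workgroup
    query_result_location: s3://my-bucket/athena-results/  # hypothetical bucket
    profiling:
      enabled: true                                        # populates the "Stats" tab
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080                          # hypothetical DataHub GMS address
```

After rerunning ingestion with this recipe, the “Stats” tab should populate for the profiled tables, provided the run completes without profiling errors.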