Troubleshooting Missing BigQuery Table Query Stats

Original Slack Thread

Hello, I can no longer see BigQuery table query stats. I’m not sure when this became a problem, but I know it used to work and I never changed the ingestion recipe.

source:
    type: bigquery
    config:
        include_table_lineage: true
        include_tables: true
        include_views: true
        convert_urns_to_lowercase: true
        profiling:
            enabled: true
            profile_table_level_only: true
        stateful_ingestion:
            enabled: true

Does anything look off here? According to the documentation, `include_usage_statistics` still defaults to true, so I'm not sure what's going on.

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?


:information_source: We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions and don’t forget to provide its responses a :thumbsup: or :thumbsdown: so other folks in the Community know if you’re still looking for help on the topic!

To track down the missing BigQuery table query stats, let’s first confirm that the relevant settings in your ingestion recipe are correct. Here are a few points to consider:

  1. Include Usage Statistics: Although `include_usage_statistics` defaults to true, setting it explicitly in your recipe removes any doubt about whether usage statistics are enabled.

  2. Usage Configuration: Check the `usage` block, which holds parameters such as `include_top_n_queries`, `include_operational_stats`, and `include_read_operational_stats`.

  3. Profiling Configuration: Your profiling settings look fine. Profiling produces table profiles rather than usage statistics, so it is unlikely to be the cause, but confirm that `profile_table_level_only` matches your needs and check whether `enable_stateful_profiling` affects what you see.

  4. Stateful Ingestion: Confirm that stateful ingestion is configured correctly and is not causing usage extraction to be skipped.
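
As a rough illustration of point 4, the sketch below models one way stateful ingestion can hide usage. A run may skip usage extraction for a time window that a previous successful run already recorded. This is a hypothetical, simplified model, not DataHub's implementation. The function name, the containment rule, and the dates are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

Window = Tuple[datetime, datetime]


def should_skip_usage(requested: Window, last_processed: Optional[Window]) -> bool:
    """HYPOTHETICAL rule: skip if the requested window lies entirely inside
    a window that a previous successful run already processed."""
    if last_processed is None:
        return False  # no stored state, so usage extraction would run
    req_start, req_end = requested
    done_start, done_end = last_processed
    return done_start <= req_start and req_end <= done_end


day = timedelta(days=1)
now = datetime(2024, 6, 10, tzinfo=timezone.utc)  # illustrative date
previous_run = (now - day, now)

# Re-running over the same window would be skipped (no new usage stats).
print(should_skip_usage((now - day, now), previous_run))  # True
# A window extending past the stored state would be processed.
print(should_skip_usage((now - day, now + day), previous_run))  # False
```

If this kind of skipping is the cause, you could confirm it with a one-off run that has stateful ingestion disabled, or that covers a time window not processed before.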

Here is an updated version of your ingestion recipe with explicit settings for usage statistics:

source:
    type: bigquery
    config:
        include_table_lineage: true
        include_tables: true
        include_views: true
        convert_urns_to_lowercase: true
        include_usage_statistics: true  # Explicitly set this to true
        usage:
            include_top_n_queries: true
            include_operational_stats: true
            include_read_operational_stats: true
        profiling:
            enabled: true
            profile_table_level_only: true
        stateful_ingestion:
            enabled: true
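
Setting a value explicitly to its default should not change behavior. To see what the recipe above actually overrides, the sketch below compares the recipe's `usage` block with a table of defaults. The defaults are assumptions based on DataHub's usage config as I understand it, so check them against the documentation for your version. The recipe is written as a Python dict so the example needs no YAML parser.

```python
from typing import Any, Dict

# ASSUMED defaults; verify against the DataHub docs for your version.
ASSUMED_USAGE_DEFAULTS: Dict[str, Any] = {
    "include_top_n_queries": True,
    "include_operational_stats": True,
    "include_read_operational_stats": False,
}


def usage_overrides(recipe: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the usage settings whose value differs from the assumed default."""
    usage = recipe.get("source", {}).get("config", {}).get("usage", {})
    return {
        key: value
        for key, value in usage.items()
        if ASSUMED_USAGE_DEFAULTS.get(key) != value
    }


# The updated recipe from this thread, as a dict.
recipe = {
    "source": {
        "type": "bigquery",
        "config": {
            "include_usage_statistics": True,
            "usage": {
                "include_top_n_queries": True,
                "include_operational_stats": True,
                "include_read_operational_stats": True,
            },
        },
    }
}

print(usage_overrides(recipe))  # {'include_read_operational_stats': True}
```

Under these assumed defaults, the only effective change is `include_read_operational_stats`. That setting concerns operational stats, not whether usage extraction runs at all.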

Additional Questions:

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using?
  3. Have you checked the ingestion logs for any errors or warnings related to usage statistics?

Would you like more detailed steps or additional troubleshooting tips?

Sources:

  • <BigQuery | DataHub Ingestion Configuration>

    'num_usage_total_log_entries': {},
    'num_usage_parsed_log_entries': {},
    'usage_error_count': {},
    'num_usage_resources_dropped': 0,
    'num_usage_operations_dropped': 0,
    'operation_dropped': [],
    'usage_failed_extraction': [],

I ran with the above config and still do not see usage stats.

DataHub 0.13.3 and CLI 0.13.2.4

<@U06TM7M40RK>

Given that your ingestion configuration appears correct and you are still not seeing usage stats, there are a few additional steps you can take to troubleshoot the issue:

Steps to Troubleshoot:

  1. Check Permissions: Ensure that the service account used for ingestion can read BigQuery usage logs. Reading audit entries from Cloud Logging typically requires permissions such as `logging.logEntries.list` and `logging.privateLogEntries.list`. Without them, usage statistics cannot be captured.

  2. Examine Logs: Review the ingestion logs for errors or warnings related to usage statistics. In the report you shared, both `num_usage_total_log_entries` and `num_usage_parsed_log_entries` are empty. That suggests no usage log entries were read at all. Possible reasons are that the logs are inaccessible, that they are empty for the ingestion time window, or that usage extraction was skipped.

  3. Enable Detailed Debug Logs: Increase the logging level to debug to capture more detailed information about the ingestion process. This can help identify where the issue might be occurring.

  4. Check BigQuery Audit Logs: Verify that BigQuery audit logs are being generated and contain the expected usage information. If the audit logs are not being generated, DataHub cannot extract usage statistics.

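  5. Review Stateful Ingestion: Ensure that stateful ingestion is not interfering. With stateful ingestion enabled, the BigQuery source may skip usage extraction for a time window it has already processed. In recent versions this behavior is controlled by `enable_stateful_usage_ingestion`, so check that setting for your version.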
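
You can read the report counters quoted earlier in this thread mechanically. The sketch below takes a dict shaped like that excerpt and lists what the values suggest. It inspects only the keys shown in the thread, and the hint wording is my own. Note that an empty `num_usage_total_log_entries` means no entries were counted for any project, not merely that parsing failed.

```python
from typing import Any, Dict, List


def diagnose_usage_report(report: Dict[str, Any]) -> List[str]:
    """Turn usage counters from an ingestion report into readable hints.

    Only keys quoted in this thread are inspected; missing keys are
    treated as absent rather than as errors.
    """
    hints: List[str] = []
    total = report.get("num_usage_total_log_entries")
    parsed = report.get("num_usage_parsed_log_entries")
    if total == {}:
        hints.append(
            "No usage log entries were read for any project: check permissions, "
            "audit log availability, the time window, and whether usage was skipped."
        )
    elif total and not parsed:
        hints.append("Log entries were read but none were parsed into usage events.")
    if report.get("usage_error_count"):
        hints.append(f"Usage errors reported: {report['usage_error_count']}")
    if report.get("usage_failed_extraction"):
        hints.append(f"Failed extractions: {report['usage_failed_extraction']}")
    return hints


# Values copied from the report excerpt in this thread.
report = {
    "num_usage_total_log_entries": {},
    "num_usage_parsed_log_entries": {},
    "usage_error_count": {},
    "num_usage_resources_dropped": 0,
    "num_usage_operations_dropped": 0,
    "operation_dropped": [],
    "usage_failed_extraction": [],
}

for hint in diagnose_usage_report(report):
    print(hint)
```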

Updated Ingestion Recipe (debug logging is enabled from the CLI, not in the recipe. The BigQuery source does not appear to have a `debug` option, and an unrecognized key is likely to fail config validation):

source:
    type: bigquery
    config:
        include_table_lineage: true
        include_tables: true
        include_views: true
        convert_urns_to_lowercase: true
        include_usage_statistics: true
        usage:
            include_top_n_queries: true
            include_operational_stats: true
            include_read_operational_stats: true
        profiling:
            enabled: true
            profile_table_level_only: true
        stateful_ingestion:
            enabled: true

Example Debug Logging Command:
Run the ingestion command with debug logging enabled to capture more detailed logs. `--debug` is an option of the top-level `datahub` command, so it goes before `ingest`:

datahub --debug ingest -c your_recipe.yml
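
Debug output can be long. Suppose you save it to a text file, for example with shell redirection such as `> ingest.log 2>&1`. A small helper like the one below can then pull out warning and error lines that mention usage. The log-level format and the keyword list are assumptions, so adjust them to what your logs actually contain.

```python
import re
from typing import Iterable, List

# ASSUMPTION: log lines include a level name such as WARNING or ERROR.
LEVEL_PATTERN = re.compile(r"\b(WARNING|WARN|ERROR|CRITICAL)\b")
# ASSUMPTION: these keywords catch most usage-related problems; extend as needed.
KEYWORDS = ("usage", "audit", "permission")


def usage_problems(lines: Iterable[str]) -> List[str]:
    """Return warning/error lines that mention a usage-related keyword."""
    matches: List[str] = []
    for line in lines:
        if LEVEL_PATTERN.search(line) and any(k in line.lower() for k in KEYWORDS):
            matches.append(line.rstrip("\n"))
    return matches


# Hypothetical sample lines, not real DataHub output.
sample = [
    "INFO Starting ingestion",
    "WARNING Failed to read usage logs for project my-project",
    "DEBUG usage window computed",
    "ERROR Could not connect to sink",
]
print(usage_problems(sample))
```

To scan a saved log, pass the open file object, for example `with open("ingest.log") as fh: print("\n".join(usage_problems(fh)))`. The file name is a placeholder.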

Additional Questions:

  1. Have you verified the BigQuery audit logs to ensure they contain the expected usage information?
  2. Are there any specific errors or warnings in the detailed debug logs?
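
For the first question, one way to check whether BigQuery audit entries exist is to query Cloud Logging directly, for example with `gcloud logging read` or the Logs Explorer. The helper below only builds a filter string. It assumes your setup reads usage from Cloud Logging data-access audit entries. The filter fields reflect the Cloud Audit Logs format for BigQuery as I understand it and should be verified. The project ID is a placeholder.

```python
from datetime import datetime, timezone


def bigquery_audit_filter(project_id: str, since: datetime) -> str:
    """Build a Cloud Logging filter for BigQuery data-access audit entries.

    ASSUMPTION: usage is read from these data_access audit logs in your setup.
    `since` should be timezone-aware.
    """
    log_name = f"projects/{project_id}/logs/cloudaudit.googleapis.com%2Fdata_access"
    since_utc = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return " AND ".join(
        [
            'protoPayload.serviceName="bigquery.googleapis.com"',
            f'logName="{log_name}"',
            f'timestamp>="{since_utc}"',
        ]
    )


# "my-project" is a placeholder project ID.
print(bigquery_audit_filter("my-project", datetime(2024, 6, 9, tzinfo=timezone.utc)))
```

You can pass the resulting string as the filter argument to `gcloud logging read`, together with `--project` and `--limit`, or paste it into the Logs Explorer. If it returns nothing for a recent window while queries are running, usage extraction has nothing to read.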

Would you like more detailed steps or additional troubleshooting tips?

Sources:

  • <BigQuery | DataHub Ingestion Configuration>
  • <Slack Discussion on Profiling>