Resolving Snowflake Max LOB Size Error and Lineage Extraction Optimization

Original Slack Thread

Hi, I am facing this error:

    <class 'snowflake.connector.errors.ProgrammingError'>: 100082 (22000): Max LOB size (16777216) exceeded, actual size of parsed column is 18431556

I have tried the suggested configuration flag to fix this, i.e., as part of the config:

    use_legacy_lineage_method: true   # Optional: use legacy method for lineage

but the error doesn't seem to resolve. Need help with this.

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: Please continue to tag RunLLM (e.g., <@U06TM7M40RK>) to have it continue generating responses!

The error you’re encountering, “Max LOB size exceeded”, typically occurs when the size of the data being processed exceeds the maximum allowed size for a Large Object (LOB) in Snowflake, which is 16 MB (the 16777216 in your error message). This is a common issue when dealing with large datasets or complex queries during metadata ingestion.

Based on the information from the linked Slack conversation, here are some steps and considerations to address this issue:

  1. Configuration Adjustments:

    • You have already set ignore_start_time_lineage: false and use_legacy_lineage_method: true. The first limits lineage extraction to the configured time window instead of the full query history; the second switches to the older lineage extraction path.
  2. Data Volume:

    • Consider reducing the amount of data processed in a single run by limiting ingestion to specific databases, schemas, or tables, or by breaking the ingestion into smaller chunks (a recipe sketch follows this list).
  3. Check for Temporary Tables:

    • If you are using any ETL tools that create temporary tables with varying names, this could lead to large result sets. Ensure that such tables are not included in the lineage extraction process.
  4. Review Query Complexity:

    • Simplify the queries used for lineage extraction if possible. Complex queries with large result sets can contribute to exceeding the LOB size limit.
  5. Monitor Logs:

    • Keep an eye on the logs for any additional errors or warnings that might provide more context on what is causing the large data size.
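
For step 2, the scoping is done in the recipe itself via allow/deny regex patterns. Below is a minimal sketch, assuming the standard Snowflake source options; the credentials and the database/schema names are placeholders, so substitute your own:

    source:
      type: snowflake
      config:
        account_id: <your_account_id>
        username: <your_username>
        password: <your_password>
        warehouse: <your_warehouse>
        role: <your_role>
        include_table_lineage: true
        ignore_start_time_lineage: false
        # Restrict each run to a subset of the account so that less
        # query history has to be parsed per run.
        database_pattern:
          allow:
            - '^ANALYTICS_DB$'   # hypothetical database name
        schema_pattern:
          allow:
            - '^REPORTING$'      # hypothetical schema name
          deny:
            - '^STAGING_.*'      # hypothetical staging schemas to skip

Running one recipe per database, rather than a single recipe for the whole account, is another way to split the work into smaller chunks.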

If these steps do not resolve the issue, it might be helpful to provide more specific details about your setup or any additional error messages you are seeing. Would you like more detailed steps or further assistance on any specific part of the process?

Sources:

  • Slack conversation

<@U06TM7M40RK> Step 1: ignore_start_time_lineage: false and use_legacy_lineage_method: true are set. Step 2: data volume is limited by reducing the schemas and databases. How do I filter out temporary tables and simplify the queries used for lineage extraction?

To filter out temporary tables and simplify queries used for lineage extraction in DataHub, you can follow these steps:

  1. Filter Temporary Tables:

    Use the temporary_tables_pattern option of the Snowflake source to mark tables matching a regex (for example, transient tables created by ETL tools) as temporary, so they are excluded from the emitted lineage.

    Example configuration:

    source:
      type: snowflake
      config:
        account_id: <your_account_id>
        username: <your_username>
        password: <your_password>
        warehouse: <your_warehouse>
        role: <your_role>
        include_table_lineage: true
        include_view_lineage: true
        ignore_start_time_lineage: false
        use_legacy_lineage_method: true
        temporary_tables_pattern:  # regex patterns for tables to treat as temporary
          - '.*_temp.*'  # Example pattern to exclude tables with '_temp' in their name
    
  2. Simplify Queries:

    • Review the queries used for lineage extraction and identify any that are unnecessarily complex or that return large result sets.
    • Consider breaking down complex queries into smaller, more manageable parts, or using more targeted queries that focus on specific tables or schemas.
    • If possible, avoid using wildcard selections (e.g., SELECT *) and instead specify only the necessary columns.
  3. Use Efficient Lineage Methods:

    • Ensure that you are using the most efficient lineage extraction method available. The use_legacy_lineage_method flag selects the older extraction path; narrowing the lineage time window can shrink result sizes further (see the sketch after this list).
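
Because ignore_start_time_lineage is false, the connector only scans query history from the configured start time onward, so narrowing that window directly reduces how much lineage data is parsed. A minimal sketch, assuming your connector version exposes the standard start_time/end_time window; the timestamps are placeholders, and include_column_lineage may not exist on older versions:

    source:
      type: snowflake
      config:
        # ...connection settings as in the example above...
        ignore_start_time_lineage: false
        use_legacy_lineage_method: true
        # Narrow the window of query history scanned for lineage;
        # a smaller window generally means smaller parsed results.
        start_time: '2024-01-01T00:00:00Z'  # placeholder timestamp
        end_time: '2024-01-08T00:00:00Z'    # placeholder timestamp
        # If your connector version supports it, disabling column-level
        # lineage also shrinks the parsed payload considerably.
        include_column_lineage: false

If the error persists even with a narrow window, running the recipe per schema over several smaller windows is a further way to chunk the work.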

By implementing these strategies, you should be able to reduce the data volume and complexity of queries, which can help mitigate the “Max LOB size exceeded” error.

Would you like more detailed guidance on any specific part of this process?

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps! ><

The issue is resolved.