Hi, I am facing the <class 'snowflake.connector.errors.ProgrammingError'>: 100082 (22000): Max LOB size (16777216) exceeded, actual size of parsed column is 18431556
While I have tried the suggested configuration flags to fix this, i.e., as part of the config
use_legacy_lineage_method: true # Optional: Use legacy method for
the error doesn't seem to resolve. Need help with this.
The error you’re encountering, Max LOB size exceeded, typically occurs when the size of the data being processed exceeds the maximum allowed size for a Large Object (LOB) in Snowflake. This is a common issue when dealing with large datasets or complex queries during metadata ingestion.
Based on the information from the linked Slack conversation, here are some steps and considerations to address this issue:
1. Configuration Adjustments: You have already set ignore_start_time_lineage: false and use_legacy_lineage_method: true. These settings are intended to limit the amount of data processed and to use a more efficient method for lineage extraction.
2. Data Volume: Reduce the amount of data processed in a single run by limiting the scope of the ingestion to specific tables or schemas, or by breaking the ingestion process into smaller chunks (see the sketch after this list).
3. Check for Temporary Tables: If you are using ETL tools that create temporary tables with varying names, this can lead to large result sets. Ensure that such tables are not included in the lineage extraction process.
4. Review Query Complexity: Simplify the queries used for lineage extraction where possible; complex queries with large result sets can contribute to exceeding the LOB size limit.
5. Monitor Logs: Keep an eye on the logs for any additional errors or warnings that provide more context on what is causing the large data size.
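As a sketch of scoping a run (the database, schema, and table names below are placeholders, not taken from your environment), the Snowflake source accepts allow/deny regex patterns that restrict ingestion to specific databases, schemas, or tables:

source:
  type: snowflake
  config:
    account_id: <your_account_id>
    username: <your_username>
    password: <your_password>
    warehouse: <your_warehouse>
    role: <your_role>
    # Scope the run to one database and schema at a time (placeholder names)
    database_pattern:
      allow:
        - '^ANALYTICS_DB$'
    schema_pattern:
      allow:
        - '^REPORTING$'
    # Optionally deny tables known to produce oversized lineage results
    table_pattern:
      deny:
        - '.*\.STG_.*'

Running one scoped recipe per database or schema is a simple way to break the ingestion into smaller chunks.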
If these steps do not resolve the issue, it might be helpful to provide more specific details about your setup or any additional error messages you are seeing. Would you like more detailed steps or further assistance on any specific part of the process?
Sources:
Slack conversation
@runllm Step 1: ignore_start_time_lineage: false and use_legacy_lineage_method: true are set. Step 2 from the suggestion: data volume is limited by reducing the schemas and DBs. How do I filter for temporary tables and simplify the queries used for lineage extraction?
To filter out temporary tables and simplify queries used for lineage extraction in DataHub, you can follow these steps:
1. Filter Temporary Tables: Use regex patterns to exclude temporary tables from lineage extraction by adding patterns to your ingestion configuration that match known temporary-table naming conventions. In the Snowflake source these patterns are matched against the fully qualified table name (database.schema.table). For example:
source:
  type: snowflake
  config:
    account_id: <your_account_id>
    username: <your_username>
    password: <your_password>
    warehouse: <your_warehouse>
    role: <your_role>
    include_table_lineage: true
    include_view_lineage: true
    ignore_start_time_lineage: false
    use_legacy_lineage_method: true
    temporary_tables_pattern:
      - '.*_temp.*' # Example pattern to exclude tables with '_temp' in their name
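If your ETL tool follows a recognizable naming convention, you can extend this list; for example (a hypothetical pattern assuming dbt-style temporary tables, so adjust it to your tooling and your warehouse's casing):

temporary_tables_pattern: # under source.config, extending the example above
  - '.*_temp.*'
  - '.*__dbt_tmp$' # assumption: dbt appends '__dbt_tmp' to temporary relation names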
2. Simplify Queries:
- Review the queries used for lineage extraction and identify any that are unnecessarily complex or return large result sets.
- Break complex queries into smaller, more manageable parts, or use more targeted queries that focus on specific tables or schemas.
- Where possible, avoid wildcard selections (e.g., SELECT *) and specify only the necessary columns.
3. Use Efficient Lineage Methods: Ensure that you are using the most efficient lineage extraction methods available; the use_legacy_lineage_method flag is one such option that can help optimize the process, as sketched below.
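As a minimal sketch (assuming your connector version supports these flags, since availability varies across DataHub releases), disabling the more expensive lineage features can further shrink the parsed result set:

source:
  type: snowflake
  config:
    # ... connection details as above ...
    ignore_start_time_lineage: false # keep lineage bounded to the configured time window
    use_legacy_lineage_method: true # legacy extraction path, as already set
    include_view_lineage: false # skipping view lineage reduces the amount of parsed SQL
    include_column_lineage: false # if supported by your connector version

Each disabled flag removes one class of lineage queries from the extraction; re-enable them incrementally once the error is gone.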
By implementing these strategies, you should be able to reduce the data volume and complexity of queries, which can help mitigate the “Max LOB size exceeded” error.
Would you like more detailed guidance on any specific part of this process?