I'm trying to ingest Redshift table/view metadata and limit lineage extraction to views only by setting include_view_lineage = true and include_table_lineage = false. However, the logs show that table lineage is still being extracted, so setting include_table_lineage to false does not seem to fully skip lineage extraction for tables. My source config:
use_lineage_v2: true
include_tables: true
include_views: true
include_view_lineage: true
skip_external_tables: false
include_table_lineage: false
include_table_rename_lineage: false
table_lineage_mode: "sql_based" # I would like to set this to null, but get an error
include_table_location_lineage: false
include_usage_statistics: false
include_top_n_queries: false
top_n_queries: 1
lineage_v2_generate_queries: false
include_unload_lineage: false
extract_column_level_lineage: false
include_operational_stats: false
include_copy_lineage: false
include_view_column_lineage: false
incremental_lineage: false
resolve_temp_table_in_lineage: false
profiling:
  enabled: false
stateful_ingestion:
  enabled: true
  remove_stale_metadata: true
pipeline_name: "redshift_lineage_only_ingest"
Has anyone encountered this issue or found a way to configure ingestion to exclude table lineage completely? Any guidance would be greatly appreciated!
It sounds like setting include_table_lineage to false is not fully preventing table lineage extraction. According to the DataHub Redshift source documentation, include_table_lineage controls whether table lineage is ingested, so other settings or defaults may be what is still triggering table lineage extraction.
One area to investigate is the table_lineage_mode parameter. You mentioned wanting to set it to null but getting an error. This parameter selects how table lineage is collected, and a specific mode such as sql_based might still trigger some lineage extraction steps. The documentation does not list null as a valid value for this parameter, which is probably why you see the error.
To address this, confirm that all related lineage parameters are set to false, as you have done, and check whether any other setting could be influencing the behavior. It is also worth searching the DataHub community and the GitHub repository for known issues or recent fixes related to this behavior.
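A minimal sketch of that check, assuming the source config from the recipe has been loaded into a plain Python dict. The helper name audit_lineage_flags is hypothetical and not part of DataHub; the flag names are only the ones that appear in the recipe above. It reports which table-lineage flags are explicitly enabled and which are missing, since a missing flag falls back to the source's default.

```python
# Hypothetical helper, not part of DataHub: it only inspects a config dict.
# Flag names are taken from the recipe posted above.
TABLE_LINEAGE_FLAGS = [
    "include_table_lineage",
    "include_table_rename_lineage",
    "include_table_location_lineage",
    "include_unload_lineage",
    "include_copy_lineage",
]


def audit_lineage_flags(config):
    """Return (enabled, unset) lists of table-lineage flag names."""
    # Flags explicitly set to true in the config.
    enabled = [k for k in TABLE_LINEAGE_FLAGS if config.get(k) is True]
    # Flags absent from the config, where the source default would apply.
    unset = [k for k in TABLE_LINEAGE_FLAGS if k not in config]
    return enabled, unset


# The table-lineage part of the config from the question.
config = {
    "include_view_lineage": True,
    "include_table_lineage": False,
    "include_table_rename_lineage": False,
    "include_table_location_lineage": False,
    "include_unload_lineage": False,
    "include_copy_lineage": False,
}

enabled, unset = audit_lineage_flags(config)
print("explicitly enabled:", enabled)
print("unset (source default applies):", unset)
```

Empty lists here only show that the recipe itself is consistent. They do not prove that the source honors every flag, so if the logs still show table lineage being extracted, the cause is more likely in the source's behavior than in the config.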
Would you like more detailed guidance on any specific part of the configuration or further exploration of the source code?