Unsuccessful Lineage Parsing of Redshift CREATE TABLE Statements in Datahub 0.12.1.5 with Mixed Approach and `redshift` Connector

Original Slack Thread

Hi all! My team is currently considering adopting Datahub and we’ve been experimenting with it. We noticed that we weren’t getting lineage using the Redshift connector unless a table was being generated with INSERT INTO statements even when using the mixed query ingestion approach. Unfortunately, a lot of our legacy code uses a DROP/CREATE approach and many tables are created with CREATE TABLE tablename (SELECT …) statements. Is our understanding correct that these statements are not parsed by the Redshift connector?

Which version of DataHub are you using, and which Redshift connector: `redshift` or `redshift-legacy`?
The `redshift` connector extracts lineage from CREATE statements when `include_table_lineage: true` is set (it defaults to true).
Which `table_lineage_mode` are you using for the lineage collector? `mixed` is the best one to use. Possible values: [`stl_scan_based`, `sql_based`, `mixed`]

Make sure the metadata ingestion user has proper access to the system tables.
Set `capture_lineage_query_parser_failures: true` to capture lineage query parser errors in the ingestion logs.
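
For reference, here is roughly how those settings could sit together in an ingestion recipe. This is only a sketch: the three lineage options are the ones named in this thread, while the host, database, credential environment variables, and sink server are placeholder assumptions to replace with your own values.

```yaml
# Sketch of a Redshift ingestion recipe; all connection values are placeholders.
source:
  type: redshift                      # the non-legacy connector discussed above
  config:
    host_port: "example-cluster.redshift.amazonaws.com:5439"  # placeholder
    database: dev                     # placeholder
    username: "${REDSHIFT_USER}"      # assumed environment variable
    password: "${REDSHIFT_PASSWORD}"  # assumed environment variable
    include_table_lineage: true       # already the default; shown for clarity
    table_lineage_mode: mixed         # alternatives: stl_scan_based, sql_based
    capture_lineage_query_parser_failures: true  # surface parser errors in the logs
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"   # placeholder
```

With the parser-failure flag enabled, the ingestion logs should show any CREATE TABLE statements the SQL parser could not handle, which helps narrow down why lineage is missing.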

We’re using 0.12.1.5 with the mixed approach and redshift. We were able to see lineage when we used INSERT INTO statements. But none of the CREATE TABLE statements seemed to be parsed.

Can you post your ingestion logs and a sample of a CREATE TABLE query that you expect to see lineage from? Also note that if you’re using temporary tables as part of this process, there are some additional flags you need to add at the moment.