Hi all! My team is currently considering adopting Datahub and we’ve been experimenting with it. We noticed that we weren’t getting lineage using the Redshift connector unless a table was being generated with INSERT INTO statements even when using the mixed query ingestion approach. Unfortunately, a lot of our legacy code uses a DROP/CREATE approach and many tables are created with CREATE TABLE tablename (SELECT …) statements. Is our understanding correct that these statements are not parsed by the Redshift connector?
Which version of DataHub are you using? And which Redshift connector are you using: redshift or redshift-legacy?
The redshift connector extracts lineage from CREATE statements when include_table_lineage: true is set (it is true by default).
Which option are you using for the lineage collector, table_lineage_mode? The best one to use is mixed. Possible values: [stl_scan_based, sql_based, mixed].
Make sure the metadata ingestion user has proper access to the system tables.
Set capture_lineage_query_parser_failures: true to capture lineage query parser errors in the ingestion logs.
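For reference, the lineage settings mentioned in this thread could be combined into a recipe roughly like the sketch below. It builds the recipe as a plain Python dict so it is easy to inspect. The host, database, credentials, and sink server are placeholders (assumptions, not values from this thread), and whether capture_lineage_query_parser_failures is accepted by the redshift connector in a given release is also an assumption; the option name is taken from the message above.

```python
import json


def build_redshift_recipe(host_port, database, username, password):
    """Return an ingestion recipe dict with the lineage options from this thread."""
    return {
        "source": {
            "type": "redshift",  # the newer connector, not redshift-legacy
            "config": {
                # Connection details: placeholders supplied by the caller.
                "host_port": host_port,
                "database": database,
                "username": username,
                "password": password,
                # Lineage options named in the thread.
                "include_table_lineage": True,   # reported to default to true
                "table_lineage_mode": "mixed",   # or stl_scan_based / sql_based
                # Assumption: supported by the connector version in use.
                "capture_lineage_query_parser_failures": True,
            },
        },
        "sink": {
            "type": "datahub-rest",
            # Placeholder server address.
            "config": {"server": "http://localhost:8080"},
        },
    }


recipe = build_redshift_recipe(
    host_port="example-cluster.example.com:5439",  # hypothetical host
    database="dev",                                # hypothetical database
    username="datahub_ingest",                     # hypothetical user
    password="change-me",                          # hypothetical password
)
print(json.dumps(recipe, indent=2))
```

The same structure can be written as a YAML recipe for the datahub CLI, or the dict should be usable with Pipeline.create from datahub.ingestion.run.pipeline if you run ingestion programmatically.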
We’re using 0.12.1.5 with the mixed approach and the redshift connector. We were able to see lineage when we used INSERT INTO statements, but none of the CREATE TABLE statements seemed to be parsed.
Can you post your ingestion logs and a sample of a CREATE TABLE query that you expect to see lineage from? Also note that if you’re using temporary tables as part of this process, there are some additional flags you need to add at the moment.