Snowflake Lineage and DataHub Ingestion Challenges

Original Slack Thread

Hello - I’m experimenting with Snowflake lineage and having a couple of issues. If I create a table using a CTAS statement, everything seems fine. However, if I recreate the table with different select criteria (e.g. different upstream tables), DataHub ingestion does not remove the old upstream tables from the lineage view. Is this expected?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

  1. CLI
  2. v 0.12.0.1
  3. Snowflake

It might be because we see both the original create table statement and the new one in the query logs, and merge the two

We do merging because often people have a create table followed by an insert into

You can modify your time windows in the config to see if that helps
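A minimal sketch of narrowing the lineage time window, assuming the Snowflake source config accepts `start_time` and `end_time` as ISO 8601 UTC timestamps (check the key names against the docs for your DataHub version). The `lineage_window` helper and the six-hour lookback are hypothetical, chosen only for illustration. The idea is to start the window after the old CTAS ran, so only the new statement appears in the query logs that get merged.

```python
from datetime import datetime, timedelta, timezone

def lineage_window(now, lookback_hours):
    """Build start/end timestamps covering the last `lookback_hours` hours."""
    start = now - timedelta(hours=lookback_hours)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {"start_time": start.strftime(fmt), "end_time": now.strftime(fmt)}

# Fixed "now" so the example is reproducible; use datetime.now(timezone.utc) in practice.
now = datetime(2023, 11, 15, 12, 0, tzinfo=timezone.utc)

# Fragment of the `source.config` section of a recipe (assumed key names).
source_config = {"include_table_lineage": True, **lineage_window(now, 6)}
print(source_config)
```

The resulting dictionary mirrors what would go under `source.config` in the YAML recipe.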

ok, seems like if I run with stateful ingestion enabled, and run the CTAS in between runs, it works ok
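A rough sketch of what enabling stateful ingestion looks like in a recipe, written here as a Python dictionary rather than YAML. It assumes the `stateful_ingestion.enabled` key under the source config and a top-level `pipeline_name`, which stateful ingestion needs in order to track state between runs as far as I recall. The helper name and the pipeline name are hypothetical.

```python
import copy
import json

def with_stateful_ingestion(recipe, pipeline_name):
    """Return a copy of `recipe` with stateful ingestion switched on."""
    updated = copy.deepcopy(recipe)
    # State is tracked per pipeline, so the name should stay stable across runs.
    updated["pipeline_name"] = pipeline_name
    updated["source"]["config"]["stateful_ingestion"] = {"enabled": True}
    return updated

# Connection details (account, credentials, sink) are omitted from this sketch.
base_recipe = {
    "source": {"type": "snowflake", "config": {"include_table_lineage": True}}
}

stateful_recipe = with_stateful_ingestion(base_recipe, "snowflake_lineage_test")
print(json.dumps(stateful_recipe, indent=2))
```

Running the same recipe before and after the CTAS, with the same `pipeline_name`, matches the sequence described above.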

my other question - if I have a table whose upstream table comes from a different database/schema… is it correct to say my recipe MUST ingest both databases/schemas to pick up that lineage?

That’s generally what we recommend
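As an illustration of that recommendation, here is a small sketch of an allow list covering both the downstream and the upstream database. It assumes the Snowflake source's `database_pattern.allow` option takes a list of regular expressions; the database names `ANALYTICS_DB` and `RAW_DB` and the helper function are made up for the example.

```python
import re

def database_allow_pattern(databases):
    """Build an allow list that matches each database name exactly."""
    return {"allow": [f"^{re.escape(name)}$" for name in databases]}

# Hypothetical names: the table lives in ANALYTICS_DB, its upstream in RAW_DB.
source_config = {
    "include_table_lineage": True,
    "database_pattern": database_allow_pattern(["ANALYTICS_DB", "RAW_DB"]),
}
print(source_config["database_pattern"])
```

Anchoring each pattern with `^` and `$` keeps the allow list from accidentally matching databases that merely share a prefix.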