Troubleshooting Snowflake Ingestion Error with Profiling Enabled

Original Slack Thread

I encounter errors when ingesting Snowflake (v0.11):

```
snowflake.connector.errors.ProgrammingError: 090106 (22000): Cannot perform CREATE TEMPTABLE. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.
```

I know that it is connected to profiling, but how do I solve it other than by disabling it?

The role has all the permissions documented here:
Are any missing? In a previous Slack thread (2 years old) I found a mention of using the account admin role for profiling, but I hope that’s not the answer here.

Hi <@U044D8JDX46> :wave:

Can you confirm whether setting the below in your profiling config solves the issue for you?

```yml
enabled: True
use_sampling: False
```

This disables the use of sampling in profiling. I'm guessing the issue you are facing might be a bug in the sample-based profiling workflow.
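For reference, these options live under the source's `profiling` block in the ingestion recipe. A minimal sketch of where they sit (the `account_id`, `warehouse`, and `role` values are placeholders, not taken from this thread):

```yml
source:
  type: snowflake
  config:
    account_id: my_account      # placeholder
    warehouse: my_warehouse     # placeholder
    role: datahub_role          # placeholder
    profiling:
      enabled: true
      use_sampling: false
```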

Hi, I just re-ran the job with that setting, but unfortunately it produces the same error.

Hey <@U044D8JDX46> can you share the entire debug log, or at least the stack trace - it’ll help to understand the source of the error.

sure, here’s a snippet:

```
2023-12-11T08:17:55.003501242Z [2023-12-11 08:17:55,003] INFO     {datahub.ingestion.source.ge_data_profiler:909} - Will profile 857 table(s) with 40 worker(s) - this may take a while
2023-12-11T08:17:55.014098699Z [2023-12-11 08:17:55,012] INFO     {great_expectations.util:1938} - Could not find local context root directory
2023-12-11T08:17:55.016267858Z [2023-12-11 08:17:55,016] INFO     {great_expectations.data_context.data_context.abstract_data_context:5420} - EphemeralDataContext has not implemented `_load_zep_config()` returning empty `GxConfig`
2023-12-11T08:17:55.017484066Z [2023-12-11 08:17:55,017] INFO     {great_expectations.experimental.datasources.config:53} - Loading 'datasources' ->
2023-12-11T08:17:55.017715820Z {}
2023-12-11T08:17:55.019475270Z [2023-12-11 08:17:55,019] INFO     {great_expectations.experimental.datasources.config:79} - Loaded 'datasources' ->
2023-12-11T08:17:55.019506141Z {}
2023-12-11T08:17:55.032821269Z [2023-12-11 08:17:55,032] INFO     {great_expectations.data_context.data_context.abstract_data_context:4495} - Usage statistics is disabled; skipping initialization.
2023-12-11T08:17:55.046327051Z [2023-12-11 08:17:55,045] INFO     {datahub.ingestion.source.ge_data_profiler:1063} - Profiling prod01_db_cds_edw.core_consumption.XXXXXXXX
2023-12-11T08:17:55.052498669Z [2023-12-11 08:17:55,052] WARNING  {py.warnings:109} - /usr/local/lib/python3.10/site-packages/great_expectations/data_context/data_context/ DeprecationWarning: create_expectation_suite is deprecated as of v0.15.48 and will be removed in v0.18. Please use add_expectation_suite or add_or_update_expectation_suite instead.
2023-12-11T08:17:55.052548411Z   warnings.warn(
2023-12-11T08:17:55.058564545Z [2023-12-11 08:17:55,058] INFO     {great_expectations.datasource.sqlalchemy_datasource:430} - Generating query from table batch_kwargs based on limit and offset
2023-12-11T08:17:55.062685117Z [2023-12-11 08:17:55,062] INFO     {great_expectations.dataset.sqlalchemy_dataset:1682} - Creating temporary table ge_temp_21f52991
2023-12-11T08:17:56.389700236Z [2023-12-11 08:17:56,202] ERROR    {datahub.ingestion.source.ge_data_profiler:1094} - Encountered exception while profiling prod01_db_cds_edw.core_consumption.XXXXXXXX
2023-12-11T08:17:56.389747887Z Traceback (most recent call last):
2023-12-11T08:17:56.389754898Z   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/", line 1900, in _execute_context
2023-12-11T08:17:56.389759838Z     self.dialect.do_execute(
2023-12-11T08:17:56.389764597Z   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/", line 736, in do_execute
2023-12-11T08:17:56.389769088Z     cursor.execute(statement, parameters)
2023-12-11T08:17:56.389773428Z   File "/usr/local/lib/python3.10/site-packages/snowflake/connector/", line 827, in execute
2023-12-11T08:17:56.389777997Z     Error.errorhandler_wrapper(self.connection, self, error_class, errvalue)
2023-12-11T08:17:56.389782388Z   File "/usr/local/lib/python3.10/site-packages/snowflake/connector/", line 275, in errorhandler_wrapper
2023-12-11T08:17:56.389787088Z     handed_over = Error.hand_to_other_handler(
2023-12-11T08:17:56.389791529Z   File "/usr/local/lib/python3.10/site-packages/snowflake/connector/", line 330, in hand_to_other_handler
2023-12-11T08:17:56.389807109Z     cursor.errorhandler(connection, cursor, error_class, error_value)
2023-12-11T08:17:56.389811809Z   File "/usr/local/lib/python3.10/site-packages/snowflake/connector/", line 209, in default_errorhandler
2023-12-11T08:17:56.389816218Z     raise error_class(
2023-12-11T08:17:56.389820729Z snowflake.connector.errors.ProgrammingError: 090106 (22000): Cannot perform CREATE TEMPTABLE. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.
2023-12-11T08:17:56.389829120Z The above exception was the direct cause of the following exception:
2023-12-11T08:17:56.389839309Z Traceback (most recent call last):
2023-12-11T08:17:56.389844120Z   File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/ingestion/source/", line 1065, in _generate_single_profile
2023-12-11T08:17:56.389848239Z     batch = self._get_ge_dataset(
2023-12-11T08:17:56.389852369Z   File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/ingestion/source/", line 1128, in _get_ge_dataset
2023-12-11T08:17:56.389856580Z     batch = ge_context.data_context.get_batch(
2023-12-11T08:17:56.389860920Z   File "/usr/local/lib/python3.10/site-packages/great_expectations/data_context/data_context/", line 1076, in get_batch
2023-12-11T08:17:56.389865299Z     return self._get_batch_v2(
2023-12-11T08:17:56.389869570Z   File "/usr/local/lib/python3.10/site-packages/great_expectations/data_context/data_context/", line 1204, in _get_batch_v2
2023-12-11T08:17:56.389873960Z     return validator.get_dataset()
2023-12-11T08:17:56.389878370Z   File "/usr/local/lib/python3.10/site-packages/great_expectations/validator/", line 2122, in get_dataset
2023-12-11T08:17:56.389882740Z     return self.expectation_engine(
2023-12-11T08:17:56.389898361Z   File "/usr/local/lib/python3.10/site-packages/great_expectations/dataset/", line 887, in __init__
2023-12-11T08:17:56.389902890Z     self.create_temporary_table(
2023-12-11T08:17:56.389907171Z   File "/usr/local/lib/python3.10/site-packages/great_expectations/dataset/", line 1739, in create_temporary_table
2023-12-11T08:17:56.389916671Z     self.engine.execute(stmt)
2023-12-11T08:17:56.389920961Z   File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/utilities/", line 273, in _sa_execute_fake
2023-12-11T08:17:56.389927081Z     return _sa_execute_underlying_method(conn, query, *args, **kwargs)
2023-12-11T08:17:56.389931402Z   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/", line 1365, in execute
2023-12-11T08:17:56.389935651Z     return self._exec_driver_sql(
2023-12-11T08:17:56.389939851Z   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/", line 1669, in _exec_driver_sql
2023-12-11T08:17:56.389944192Z     ret = self._execute_context(
2023-12-11T08:17:56.389948412Z   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/", line 1943, in _execute_context
2023-12-11T08:17:56.389957852Z     self._handle_dbapi_exception(
2023-12-11T08:17:56.389963672Z   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/", line 2124, in _handle_dbapi_exception
2023-12-11T08:17:56.389967913Z     util.raise_(
2023-12-11T08:17:56.389972442Z   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/", line 208, in raise_
2023-12-11T08:17:56.389976662Z     raise exception
2023-12-11T08:17:56.389980863Z   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/", line 1900, in _execute_context
2023-12-11T08:17:56.389985033Z     self.dialect.do_execute(
2023-12-11T08:17:56.389989172Z   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/", line 736, in do_execute
2023-12-11T08:17:56.389993403Z     cursor.execute(statement, parameters)
2023-12-11T08:17:56.389997563Z   File "/usr/local/lib/python3.10/site-packages/snowflake/connector/", line 827, in execute
2023-12-11T08:17:56.390001773Z     Error.errorhandler_wrapper(self.connection, self, error_class, errvalue)
2023-12-11T08:17:56.390012723Z   File "/usr/local/lib/python3.10/site-packages/snowflake/connector/", line 275, in errorhandler_wrapper
2023-12-11T08:17:56.390016993Z     handed_over = Error.hand_to_other_handler(
2023-12-11T08:17:56.390021144Z   File "/usr/local/lib/python3.10/site-packages/snowflake/connector/", line 330, in hand_to_other_handler
2023-12-11T08:17:56.390030374Z     cursor.errorhandler(connection, cursor, error_class, error_value)
2023-12-11T08:17:56.390034644Z   File "/usr/local/lib/python3.10/site-packages/snowflake/connector/", line 209, in default_errorhandler
2023-12-11T08:17:56.390038844Z     raise error_class(
2023-12-11T08:17:56.390043464Z sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 090106 (22000): Cannot perform CREATE TEMPTABLE. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.
2023-12-11T08:17:56.390047774Z [SQL: CREATE OR REPLACE TEMPORARY TABLE ge_temp_21f52991 AS SELECT * 
2023-12-11T08:17:56.390052224Z FROM "CORE_CONSUMPTION"."XXXXXXXX"
2023-12-11T08:17:56.390056505Z  LIMIT 100]
2023-12-11T08:17:56.390060764Z (Background on this error at: <>)
```
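For what it's worth, the error message itself points at the failure mode: GX issues an unqualified `CREATE TEMPORARY TABLE`, which only works when the session has a current schema. A hypothetical worksheet repro, using the object names from the log above:

```sql
-- Fails in a session with no current schema:
CREATE OR REPLACE TEMPORARY TABLE ge_temp_demo AS
SELECT * FROM "CORE_CONSUMPTION"."XXXXXXXX" LIMIT 100;
-- 090106 (22000): Cannot perform CREATE TEMPTABLE. This session does not
-- have a current schema. Call 'USE SCHEMA', or use a qualified name.

-- Works once a schema is selected for the session:
USE SCHEMA prod01_db_cds_edw.core_consumption;
CREATE OR REPLACE TEMPORARY TABLE ge_temp_demo AS
SELECT * FROM "CORE_CONSUMPTION"."XXXXXXXX" LIMIT 100;
```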

Can you share your recipe? Have you set “limit” in the profiling config? I think there might be an issue on the GX side in this specific scenario - we use Great Expectations (GX) under the hood for profiling. Can you please create a GitHub issue for this, and we will look into it as soon as possible?

Meanwhile, does the profiling work if you remove the “limit” config ?

Could you also quickly confirm whether the role used for ingestion can run the queries “USE DATABASE <db>” and “USE SCHEMA <schema>” for the database and schema of this table, "CORE_CONSUMPTION"."XXXXXXXX".
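To double-check, the usual grants that make those `USE` statements possible look like this (a sketch; the role name `datahub_role` is a placeholder):

```sql
GRANT USAGE ON DATABASE prod01_db_cds_edw TO ROLE datahub_role;
GRANT USAGE ON SCHEMA prod01_db_cds_edw.core_consumption TO ROLE datahub_role;
```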

I’ll remove the limit and let you know once it’s done.

Database and schema usage rights are granted.

The profiling part of the recipe looks like this, but I added those restrictions recently.

```yml
enabled: true
turn_off_expensive_profiling_metrics: true
limit: 100
max_number_of_fields_to_profile: 50
sample_size: 1000
use_sampling: false
# profile_day_of_week: 5  # only on Saturdays
```

removed limit - no errors anymore.

But just disabling sampling is not really what I want - I want those sample values to be in our data catalog. So the question remains: what additional privileges are required for my Snowflake role to make that work?

Hey <@U044D8JDX46> `use_sampling: False` does not hide sample values. Sampling acts much like `limit`, but chooses a random subset of rows from the table. You should already be able to see sample values with your current ingestion.

created an issue as requested:

Hey <@U044D8JDX46> I may have found the root cause for this. Can you confirm whether the database in question does not have a “PUBLIC” schema?

As mentioned here - no schema is selected if the PUBLIC schema is absent. I’ll work on a fix for this as soon as you confirm whether the same is the case for you.
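One quick way to check this, assuming access to the database:

```sql
SHOW SCHEMAS IN DATABASE prod01_db_cds_edw;
-- If PUBLIC does not appear in the output and the recipe does not pin a
-- schema, the session starts without a current schema.
```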

Correct, it does not have a “public” schema. Nice find!