Troubleshooting REDSHIFT Connector Errors in Datahub Version 11.0

Original Slack Thread

Hi Team -
We are starting to see some weird errors in the Redshift connector. We are on datahub version 11.0 and CLI 11.0.5. Below is the error:
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - cursor.execute(query)
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - Traceback (most recent call last):
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/script.py", line 136, in <module>
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - res = ingest_from_cirrus_redshift(*arg_dict["args"], **arg_dict["kwargs"])
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/script.py", line 82, in ingest_from_cirrus_redshift
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - pipeline.run()
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 377, in run
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - for wu in itertools.islice(
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/datahub/ingestion/api/source_helpers.py", line 118, in auto_stale_entity_removal
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - for wu in stream:
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/datahub/ingestion/api/source_helpers.py", line 142, in auto_workunit_reporter
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - for wu in stream:
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/datahub/ingestion/api/source_helpers.py", line 224, in auto_browse_path_v2
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - for urn, batch in _batch_workunits_by_urn(stream):
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/datahub/ingestion/api/source_helpers.py", line 362, in _batch_workunits_by_urn
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - for wu in stream:
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/datahub/ingestion/api/source_helpers.py", line 155, in auto_materialize_referenced_tags
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - for wu in stream:
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/datahub/ingestion/api/source_helpers.py", line 70, in auto_status_aspect
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - for wu in stream:
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/datahub/ingestion/source/redshift/redshift.py", line 389, in get_workunits_internal
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - self.cache_tables_and_views(connection, database)
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/datahub/ingestion/source/redshift/redshift.py", line 768, in cache_tables_and_views
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - tables, views = RedshiftDataDictionary.get_tables_and_views(conn=connection)
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/datahub/ingestion/source/redshift/redshift_schema.py", line 170, in get_tables_and_views
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - cur = RedshiftDataDictionary.get_query_result(conn, RedshiftQuery.list_tables)
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/datahub/ingestion/source/redshift/redshift_schema.py", line 90, in get_query_result
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - cursor.execute(query)
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/redshift_connector/cursor.py", line 248, in execute
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - raise e
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/redshift_connector/cursor.py", line 241, in execute
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - self._c.execute(self, operation, args)
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/redshift_connector/core.py", line 1933, in execute
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - self.handle_messages(cursor)
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - File "/tmp/venv4gcza_l8/lib/python3.9/site-packages/redshift_connector/core.py", line 2140, in handle_messages
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - raise self.error
[2023-10-23, 05:28:25 UTC] {process_utils.py:187} INFO - redshift_connector.error.ProgrammingError: {'S': 'ERROR', 'C': 'XX000', 'M': 'invalid INTERVAL typmod: 0x10010003', 'F': '../src/pg/src/backend/utils/adt/format_type.c', 'L': '324', 'R': 'format_type_internal'}
[2023-10-23, 05:28:29 UTC] {taskinstance.py:1772} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/airflow/operators/python.py", line 356, in execute
return super().execute(context=serializable_context)
File "/home/airflow/.local/lib/python3.9/site-packages/airflow/operators/python.py", line 175, in execute
return_value = self.execute_callable()
File "/home/airflow/.local/lib/python3.9/site-packages/airflow/operators/python.py", line 553, in execute_callable
return self._execute_python_callable_in_subprocess(python_path, tmp_path)
File "/home/airflow/.local/lib/python3.9/site-packages/airflow/operators/python.py", line 412, in _execute_python_callable_in_subprocess
execute_in_subprocess(
File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/process_utils.py", line 168, in execute_in_subprocess
execute_in_subprocess_with_kwargs(cmd, cwd=cwd)
File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/process_utils.py", line 191, in execute_in_subprocess_with_kwargs
raise subprocess.CalledProcessError(exit_code, cmd)
Can someone please help? <@U0348BYAS56> <@UV14447EU> We have not changed anything in the recipe, and it used to work fine.

Any lead will be helpful here.

<@U03T7BBGNTY> It looks like some dependency got updated. Some info on this issue is available here: https://www.postgresql.org/message-id/200401281328.21594.darcy%40wavefire.com

<@U01GZEETMEZ> might help you further

Looks like it might be a bug in Postgres/Redshift, so I’m not sure we have a good workaround for now. You can try downgrading your version of AWS’s redshift-connector package.

<@U03T7BBGNTY> please mention which workarounds we have tried here.

Hi <@U01GZEETMEZ>, we even tried downgrading redshift-connector to v2.0.913 and sqlalchemy-redshift to v0.8.14, along with datahub v10.5.5 in our Airflow DAG, but that didn’t help.
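
One thing that may be worth ruling out: the traceback shows the ingestion running from a temporary virtualenv (/tmp/venv4gcza_l8), so it is possible the pinned versions were not the ones actually installed when the task ran. Below is a minimal sketch, not part of the original thread, that reports what the running interpreter sees. The helper name `installed_versions` is hypothetical, and `acryl-datahub` is assumed to be the distribution name used for DataHub in the requirements list; adjust the names to match your setup.

```python
from importlib import metadata

# Distribution names taken from the thread; "acryl-datahub" is an assumption
# about how DataHub is listed in the DAG's requirements.
PACKAGES = ("acryl-datahub", "redshift-connector", "sqlalchemy-redshift")


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not visible to this interpreter.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print one line per package so the versions show up in the task log.
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Calling `installed_versions()` at the top of the ingestion callable, before `pipeline.run()`, would put the versions in the same task log as the traceback, which makes it easier to tell whether the downgrade took effect in the environment that raised the error.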