Troubleshooting slow Vertica UI ingestion on Kubernetes with DataHub 0.12.0

Original Slack Thread

Hi, I have a problem with the Vertica UI ingestion running on a Kubernetes deployment with DataHub version 0.12.0. It is quite slow and now appears to be stuck on a query. The last log message is:

```
SELECT column_name, data_type, column_default, is_nullable
FROM v_catalog.columns
WHERE lower(table_name) = 'removed'
AND lower(table_schema) = 'schema'
UNION ALL
SELECT column_name, data_type, '' as column_default, true as is_nullable
FROM v_catalog.view_columns
WHERE lower(table_name) = 'removed'
AND lower(table_schema) = 'schema'
UNION ALL
SELECT projection_column_name,data_type,'' as column_default, true as is_nullable
FROM PROJECTION_COLUMNS
WHERE lower(projection_name) = 'removed'
AND lower(table_schema) = 'schema'

2024-01-12 00:20:05,494 INFO sqlalchemy.engine.Engine [dialect vertica+vertica_python does not support caching 0.00027s] {}
[2024-01-12 00:20:05,494] INFO     {sqlalchemy.engine.Engine:1868} - [dialect vertica+vertica_python does not support caching 0.00027s] {}
```
I had this problem with Vertica before, where the ingestion failed at a random table, so I added max_threads=1 because that was apparently supposed to help, but I still get the same error. According to the UI, the ingestion is still running. The last time I cancelled the ingestion I had to delete the MySQL PVC and create a new one (after which I added max_threads=1). I can't see any obvious errors in the GMS or actions logs either.
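One way to narrow this down might be to run the same column lookup outside of DataHub and time it, to see whether the stall is in Vertica itself or in the ingestion pipeline. The following is only a minimal sketch: the query text is copied from the log above with the literal names swapped for `%s` placeholders, and the helper names `column_query_params` and `time_column_query` are hypothetical, not part of DataHub or vertica_python. It assumes a DB-API style cursor that accepts positional `%s` parameters.

```python
import time

# Same three-way UNION ALL seen in the ingestion log, with the literal
# table/schema names replaced by positional %s placeholders (assumption:
# the driver in use accepts this DB-API parameter style).
COLUMN_QUERY = """
SELECT column_name, data_type, column_default, is_nullable
FROM v_catalog.columns
WHERE lower(table_name) = %s
AND lower(table_schema) = %s
UNION ALL
SELECT column_name, data_type, '' as column_default, true as is_nullable
FROM v_catalog.view_columns
WHERE lower(table_name) = %s
AND lower(table_schema) = %s
UNION ALL
SELECT projection_column_name, data_type, '' as column_default, true as is_nullable
FROM PROJECTION_COLUMNS
WHERE lower(projection_name) = %s
AND lower(table_schema) = %s
"""


def column_query_params(schema, table):
    """Return the six positional parameters in the order the query expects."""
    pair = (table.lower(), schema.lower())
    return pair * 3


def time_column_query(cursor, schema, table):
    """Run the lookup on a DB-API cursor and return (row_count, seconds)."""
    start = time.perf_counter()
    cursor.execute(COLUMN_QUERY, column_query_params(schema, table))
    rows = cursor.fetchall()
    return len(rows), time.perf_counter() - start
```

With vertica_python, the cursor would typically come from `vertica_python.connect(**conn_info).cursor()`, where `conn_info` holds your own host, port, user, password and database. If the lookup returns quickly for the table the ingestion appears stuck on, the delay is more likely on the DataHub side than in the catalog query itself.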

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)