Persistent Sleeping Connections Issue in Datahub UI with MySQL Backend Database

Original Slack Thread

Hi,

We are running Datahub v10.5 but this problem has been bothering us since we first started to use the product.
We have a backend MySQL DB for metadata, and we see that it intermittently gets locked out due to the below error -

Caused by: java.sql.SQLException: null, message from server: "Host 'XX.XX.XX.XX' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'"
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:836)
at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:456)
at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:246)
at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:197)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:677)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:189)
at io.ebean.datasource.pool.ConnectionPool.createUnpooledConnection(ConnectionPool.java:529)
at io.ebean.datasource.pool.ConnectionPool.createUnpooledConnection(ConnectionPool.java:524)
at io.ebean.datasource.pool.ConnectionPool.createConnectionForQueue(ConnectionPool.java:766)
at io.ebean.datasource.pool.PooledConnectionQueue._getPooledConnection(PooledConnectionQueue.java:314)
at io.ebean.datasource.pool.PooledConnectionQueue.getPooledConnection(PooledConnectionQueue.java:270)
at io.ebean.datasource.pool.ConnectionPool.getPooledConnection(ConnectionPool.java:817)
at io.ebean.datasource.pool.ConnectionPool.getConnection(ConnectionPool.java:805)
at io.ebeaninternal.server.transaction.TransactionFactoryBasic.createQueryTransaction(TransactionFactoryBasic.java:28)

We have set EBEAN_MAX_CONNECTIONS to 200, and below are our relevant MySQL settings:

max_user_connections: 200
max_connections_per_hour: 200 (400 in lower environments)

We observed that just 1 click on the Datahub UI creates multiple sleeping DB connections for the application user. We are not sure how the metadata DB connections are handled in the code, and we fear that open connections are not being closed in the backend, resulting in such sleeping connections. This has become a very frequent problem, and as we onboard more end users we see more frequent occurrences of this issue.
Right now the workaround is to manually log in to the DB and kill the sleeping connections. We want to find the root cause as to why there are always so many open sleeping connections after any ingestion or any user activity on the UI.
Is anyone else facing the same issue with their backend DB?
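
For reference, the manual cleanup described above can be scripted from any MySQL client roughly like this (a sketch only; the appladmin user name and the 300-second idle threshold are assumptions, adjust them to your setup):

-- List sleeping connections for the application user that have been idle too long
SELECT id, user, host, db, command, time
FROM information_schema.processlist
WHERE user = 'appladmin'
  AND command = 'Sleep'
  AND time > 300;

-- Generate the corresponding KILL statements, then run them manually
SELECT CONCAT('KILL ', id, ';')
FROM information_schema.processlist
WHERE user = 'appladmin'
  AND command = 'Sleep'
  AND time > 300;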

Hi - we recommend posting long logs in the thread, for the sake of keeping the channel clean :sparkles: <@UV5UEC3LN> might be able to speak to this!

This is the entire stack trace; the same message keeps repeating. We have observed that even if 1 UI user does a page refresh, it creates at least 5 such sleeping connections for the MySQL user.

Don’t think we’ve ever seen this before, and MySQL gets used extensively in the community, so I’m guessing this has something to do with your particular network configuration and how you connect to your DB. Please check your MySQL error log as suggested here: https://stackoverflow.com/questions/12639328/host-x-blocked-because-of-many-connection-errors
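
The "Host ... is blocked" error specifically means the server's host cache has counted more than max_connect_errors failed connection attempts from that client. A quick way to inspect and clear it from a MySQL client (standard MySQL commands, nothing DataHub-specific):

-- How many failed connection attempts the server tolerates per host
SHOW GLOBAL VARIABLES LIKE 'max_connect_errors';

-- Which hosts have accumulated connection errors (MySQL 5.6+)
SELECT ip, host, sum_connect_errors, count_authentication_errors
FROM performance_schema.host_cache;

-- Clear the host cache so the blocked host can connect again
-- (equivalent to the 'mysqladmin flush-hosts' quoted in the error message)
FLUSH HOSTS;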

Hi <@UV5UEC3LN> this morning we saw this in the logs. We had set EBEAN_MAX_CONNECTIONS to 200 and the MySQL connections-per-hour limit to 400. It seems there were 200 open queries running from GMS (?), which is strange since we hardly had 2 users logged in, 1 of whom was working on policies. Can you look at these queries and help us understand what user actions they are running for, and why 200 such open connections are pending? Sending here the datasource config from the components chart.yaml:

global:
  graph_service_impl: elasticsearch

  sql:
    datasource:
      host: "cp200mysql01.ddc.XXX.net:3306"
      hostForMysqlClient: "cp200mysql01.ddc.XXX.net"
      url: "jdbc:mysql://cp200mysql01.ddc.XXX.net:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8&enabledTLSProtocols=TLSv1.2&useJDBCCompliantTimezoneShift=true&useLegacyDatetimeCode=false&serverTimezone=UTC"
      username: "appladmin"
      password:
        secretRef: mysql-secrets
        secretKey: mysql-root-password

Below are the MySQL DB settings:

| max_connect_errors   | 100 |
| max_connections      | 400 |
| max_user_connections | 400 |
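
One way to see what those 200 GMS connections are actually doing at the moment they pile up is to group the server's process list by user, command and state (standard MySQL; 'appladmin' is the application user from the config above):

-- Count current connections per command and state, with the longest-running one in each group
SELECT user, command, state, COUNT(*) AS connections, MAX(time) AS max_seconds
FROM information_schema.processlist
WHERE user = 'appladmin'
GROUP BY user, command, state
ORDER BY connections DESC;

-- Show the SQL text of any connection that is actively running a query
SELECT id, time, state, info
FROM information_schema.processlist
WHERE user = 'appladmin'
  AND command <> 'Sleep';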

400 connections per hour as a limit seems extremely low; there are periodic background threads that will query the DB to do things (like fetch policies, etc.). Is there a reason you need this limit? The default is no limit per hour.

Ok, understood. We don't have limits in the lower environments, but it seems that per company policy we cannot remove the limit for Production.

Connections are fairly short-lived; this is very typical application design to my knowledge. I’m not sure why you would have a policy that allows 200 simultaneous connections but only 400 per hour. That wouldn’t stop an application from constantly streaming data to the database over 200 connections for an hour, but it would stop an application that grabs a few connections, releases them when no longer needed, and then reconnects as needed.
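
If it helps to see that effect in numbers, the per-user limits and the overall connection churn can be checked like this (standard MySQL; 'appladmin' is the user from the config above). A pooled application keeps Threads_connected low while the cumulative Connections counter keeps climbing, which is exactly the pattern a per-hour limit penalises:

-- Per-user resource limits: max_connections here is connections *per hour*,
-- max_user_connections is simultaneous connections
SELECT user, host, max_connections, max_user_connections
FROM mysql.user
WHERE user = 'appladmin';

-- Cumulative connection attempts since server start vs. connections open right now
SHOW GLOBAL STATUS LIKE 'Connections';
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Max_used_connections';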