Hi,
I’d like to push this issue a bit as this could lead to critical situations. Or does anyone has a quick fix in mind?
https://github.com/datahub-project/datahub/issues/10354
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Which DataHub version are you using? (e.g. 0.12.0)
- Please post any relevant error logs on the thread!
After doing upgrade yesterday I am seeing this issue too. In our case SendMAEStep fails too.
[main] ERROR c.l.d.u.restoreindices.SendMAEStep:78 - Error iterating futures```
<@U05HUB7E31U> <@U06M21PS798>
Do you see any EbeanAspectDao
related messages such as:
WARN c.l.m.entity.ebean.EbeanAspectDao:747 - Retryable PersistenceException: java.sql.SQLException: Unsuccessfully waited [1000] millis for a connection to be returned. No connections are free. You need to Increase the max connections of [50] or look for a connection pool leak using datasource.xxx.capturestacktrace=true
<@U05SKM6KGGK> yes we do - not entirely same but similar
2024-04-29 04:09:57,453 [qtp133250414-78321] WARN c.l.m.entity.ebean.EbeanAspectDao:720 - Retryable PersistenceException: Error when batch flush on sql: update metadata_aspect_v2 set metadata=?, createdOn=?, createdBy=?, createdFor=?, systemmetadata=? where urn=? and aspect=? and version=?
org.postgresql.util.PSQLException: This connection has been closed. at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:883) at org.postgresql.jdbc.PgConnection.clearWarnings(PgConnection.java:776) at io.ebean.datasource.pool.PooledConnection.close(PooledConnection.java:436) at io.ebean.util.JdbcClose.close(JdbcClose.java:50) at io.ebeaninternal.server.transaction.TransactionFactoryBasic.createReadOnlyTransaction(TransactionFactoryBasic.java:31) at io.ebeaninternal.server.transaction.TransactionManager.createReadOnlyTransaction(TransactionManager.java:337) at io.ebeaninternal.server.core.DefaultServer.createReadOnlyTransaction(DefaultServer.java:2177) at io.ebeaninternal.server.core.OrmQueryRequest.initTransIfRequired(OrmQueryRequest.java:221) at io.ebeaninternal.server.core.DefaultServer.findList(DefaultServer.java:1462) at io.ebeaninternal.server.core.DefaultServer.findList(DefaultServer.java:1442) at io.ebeaninternal.server.querydefn.DefaultOrmQuery.findList(DefaultOrmQuery.java:1483) at com.linkedin.metadata.entity.ebean.EbeanAspectDao.batchGetUnion(EbeanAspectDao.java:443) at com.linkedin.metadata.entity.ebean.EbeanAspectDao.batchGet(EbeanAspectDao.java:358) at com.linkedin.metadata.entity.ebean.EbeanAspectDao.batchGet(EbeanAspectDao.java:334) at com.linkedin.metadata.entity.EntityServiceImpl.exists(EntityServiceImpl.java:1812) at com.linkedin.metadata.entity.EntityService.exists(EntityService.java:68) at com.linkedin.metadata.entity.EntityService.exists(EntityService.java:78) at com.linkedin.metadata.aspect.utils.DefaultAspectsUtil.getAdditionalChanges(DefaultAspectsUtil.java:60) at com.linkedin.metadata.entity.EntityServiceImpl.ingestAspects(EntityServiceImpl.java:631) at com.linkedin.metadata.entity.EntityServiceImpl.ingestProposalSync(EntityServiceImpl.java:1103) at com.linkedin.metadata.entity.EntityServiceImpl.ingestProposal(EntityServiceImpl.java:947) at com.linkedin.metadata.resources.entity.AspectResource.lambda$ingestProposal$2(AspectResource.java:250) at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:30) at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:51) at com.linkedin.metadata.resources.entity.AspectResource.ingestProposal(AspectResource.java:242) at jdk.internal.reflect.GeneratedMethodAccessor174.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:177) at com.linkedin.restli.internal.server.RestLiMethodInvoker.invoke(RestLiMethodInvoker.java:333) at com.linkedin.restli.internal.server.filter.FilterChainDispatcherImpl.onRequestSuccess(FilterChainDispatcherImpl.java:47) at com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.onRequest(RestLiFilterChainIterator.java:86) at com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.lambda$onRequest$0(RestLiFilterChainIterator.java:73) at java.base/java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:757) at java.base/java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:735) at java.base/java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2182) at com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.onRequest(RestLiFilterChainIterator.java:72) at com.linkedin.restli.internal.server.filter.RestLiFilterChain.onRequest(RestLiFilterChain.java:55) at com.linkedin.restli.server.BaseRestLiServer.handleResourceRequest(BaseRestLiServer.java:270) at com.linkedin.restli.server.RestRestLiServer.handleResourceRequestWithRestLiResponse(RestRestLiServer.java:294) at com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:262) at com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:232) at com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:215) at com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:171) at com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:132) at com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70) at com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70) at com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76) at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146) at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132) at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62) at com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55) at com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58) at com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76) at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146) at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132) at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62) at com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55) at com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50) at com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76) at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146) at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132) at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62) at com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:106) at com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75) at com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61) at com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101) at com.linkedin.r2.transport.http.server.AbstractJakartaR2Servlet.service(AbstractJakartaR2Servlet.java:96) at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:587) at com.linkedin.restli.server.RestliHandlerServlet.service(RestliHandlerServlet.java:20) at com.linkedin.restli.server.RestliHandlerServlet.handleRequest(RestliHandlerServlet.java:26) at org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:67) at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:587) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:764) at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1665) at com.datahub.auth.authentication.filter.AuthenticationFilter.doFilter(AuthenticationFilter.java:106) at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202) at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1635) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:527) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:598) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1580) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1381) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1553) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1303) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:149) at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:51) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122) at org.eclipse.jetty.server.Server.handle(Server.java:563) at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598) at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:287) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100) at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53) at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:421) at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:390) at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:277) at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:199) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:411) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149) at java.base/java.lang.Thread.run(Thread.java:840)
We have this
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at com.linkedin.datahub.upgrade.restoreindices.SendMAEStep.iterateFutures(SendMAEStep.java:75)
at com.linkedin.datahub.upgrade.restoreindices.SendMAEStep.lambda$executable$0(SendMAEStep.java:198)
at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeStepInternal(DefaultUpgradeManager.java:125)
at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:76)
Caused by: javax.persistence.PersistenceException: java.sql.SQLException: Unsuccessfully waited [1000] millis for a connection to be returned. No connections are free. You need to Increase the max connections of [50] or look for a connection pool leak using datasource.xxx.capturestacktrace=true
at io.ebeaninternal.server.transaction.TransactionFactoryBasic.createTransaction(TransactionFactoryBasic.java:49)
at io.ebeaninternal.server.transaction.TransactionManager.createTransaction(TransactionManager.java:330)
at io.ebeaninternal.server.transaction.TransactionManager.createTransaction(TransactionManager.java:322)
at io.ebeaninternal.server.transaction.TransactionManager.beginScopedTransaction(TransactionManager.java:623)
at io.ebeaninternal.server.core.DefaultServer.beginTransaction(DefaultServer.java:791)
at com.linkedin.metadata.entity.ebean.EbeanAspectDao.runInTransactionWithRetryUnlocked(EbeanAspectDao.java:724)
... 21 common frames omitted
Caused by: java.sql.SQLException: Unsuccessfully waited [1000] millis for a connection to be returned. No connections are free. You need to Increase the max connections of [50] or look for a connection pool leak using datasource.xxx.capturestacktrace=true
at io.ebean.datasource.pool.PooledConnectionQueue._getPooledConnectionWaitLoop(PooledConnectionQueue.java:272)
at io.ebean.datasource.pool.PooledConnectionQueue._getPooledConnection(PooledConnectionQueue.java:249)
at io.ebean.datasource.pool.PooledConnectionQueue.getPooledConnection(PooledConnectionQueue.java:196)
at io.ebean.datasource.pool.ConnectionPool.getPooledConnection(ConnectionPool.java:715)
at io.ebean.datasource.pool.ConnectionPool.getConnection(ConnectionPool.java:705)
at io.ebeaninternal.server.transaction.TransactionFactoryBasic.createTransaction(TransactionFactoryBasic.java:42)
... 26 common frames omitted```
Yeah so the NullPointerException
is just a side-effect of the pool exhaustion as you can see from this ^ last exception
ok found connection exhaustion![attachment]({‘ID’: ‘F0719D9UKHT’, ‘EDITABLE’: False, ‘IS_EXTERNAL’: False, ‘USER_ID’: ‘U06FXLKSNHJ’, ‘CREATED’: ‘2024-04-30 15:42:44+00:00’, ‘PERMALINK’: ‘Slack’, ‘EXTERNAL_TYPE’: ‘’, ‘TIMESTAMPS’: ‘2024-04-30 15:42:44+00:00’, ‘MODE’: ‘hosted’, ‘DISPLAY_AS_BOT’: False, ‘PRETTY_TYPE’: ‘PNG’, ‘NAME’: ‘image.png’, ‘IS_PUBLIC’: True, ‘PREVIEW_HIGHLIGHT’: None, ‘MIMETYPE’: ‘image/png’, ‘PERMALINK_PUBLIC’: ‘https://slack-files.com/TUMKD5EGJ-F0719D9UKHT-e4e3b5c507’, ‘FILETYPE’: ‘png’, ‘EDIT_LINK’: None, ‘URL_PRIVATE’: ‘Slack’, ‘HAS_RICH_PREVIEW’: False, ‘TITLE’: ‘image.png’, ‘IS_STARRED’: False, ‘PREVIEW_IS_TRUNCATED’: None, ‘URL_PRIVATE_DOWNLOAD’: ‘Slack’, ‘PREVIEW’: None, ‘PUBLIC_URL_SHARED’: False, ‘MESSAGE_TS’: ‘1714491769.494729’, ‘PARENT_MESSAGE_TS’: ‘1714461624.761899’, ‘MESSAGE_CHANNEL_ID’: ‘C029A3M079U’, ‘_FIVETRAN_DELETED’: False, ‘LINES_MORE’: None, ‘LINES’: None, ‘SIZE’: 40012, ‘_FIVETRAN_SYNCED’: ‘2024-05-05 12:59:46.009000+00:00’})
Suggestion here is to increase batchDelayMs
to 1 second or higher. Ref: https://github.com/acryldata/datahub-helm/blob/89c92c8ac73b4dc371d647216d60dff28cc7c9ae/charts/datahub/values.yaml#L280
ok great. let me try that.
(Increasing the size of the connection pool could help too, but not sure if we have helm chart value for this)
ok. I can try with delay first. but good to know this connection issue. In future I can watch out too.
<@U05SKM6KGGK> adjusted batchDelayMs to 1 second and then 2.5 seconds too but still same issue both times. Looks like after 49000 records this issue happens continuously.![attachment]({‘ID’: ‘F0716816NAJ’, ‘EDITABLE’: False, ‘IS_EXTERNAL’: False, ‘USER_ID’: ‘U06FXLKSNHJ’, ‘CREATED’: ‘2024-04-30 16:51:21+00:00’, ‘PERMALINK’: ‘Slack’, ‘EXTERNAL_TYPE’: ‘’, ‘TIMESTAMPS’: ‘2024-04-30 16:51:21+00:00’, ‘MODE’: ‘hosted’, ‘DISPLAY_AS_BOT’: False, ‘PRETTY_TYPE’: ‘PNG’, ‘NAME’: ‘image.png’, ‘IS_PUBLIC’: True, ‘PREVIEW_HIGHLIGHT’: None, ‘MIMETYPE’: ‘image/png’, ‘PERMALINK_PUBLIC’: ‘https://slack-files.com/TUMKD5EGJ-F0716816NAJ-712eee6f8c’, ‘FILETYPE’: ‘png’, ‘EDIT_LINK’: None, ‘URL_PRIVATE’: ‘Slack’, ‘HAS_RICH_PREVIEW’: False, ‘TITLE’: ‘image.png’, ‘IS_STARRED’: False, ‘PREVIEW_IS_TRUNCATED’: None, ‘URL_PRIVATE_DOWNLOAD’: ‘Slack’, ‘PREVIEW’: None, ‘PUBLIC_URL_SHARED’: False, ‘MESSAGE_TS’: ‘1714495898.647019’, ‘PARENT_MESSAGE_TS’: ‘1714461624.761899’, ‘MESSAGE_CHANNEL_ID’: ‘C029A3M079U’, ‘_FIVETRAN_DELETED’: False, ‘LINES_MORE’: None, ‘LINES’: None, ‘SIZE’: 226300, ‘_FIVETRAN_SYNCED’: ‘2024-05-05 12:59:46.167000+00:00’})
try going to 10 seconds
<@U05SKM6KGGK> Still same. Is there a way to change connection pool manually anywhere ?
I manually increase this in restore indices job. EBEAN_MAX_CONNECTIONS to 250 but it reaches way beyond it.![attachment]({‘ID’: ‘F071GGP2G5A’, ‘EDITABLE’: False, ‘IS_EXTERNAL’: False, ‘USER_ID’: ‘U06FXLKSNHJ’, ‘CREATED’: ‘2024-04-30 19:53:48+00:00’, ‘PERMALINK’: ‘Slack’, ‘EXTERNAL_TYPE’: ‘’, ‘TIMESTAMPS’: ‘2024-04-30 19:53:48+00:00’, ‘MODE’: ‘hosted’, ‘DISPLAY_AS_BOT’: False, ‘PRETTY_TYPE’: ‘PNG’, ‘NAME’: ‘image.png’, ‘IS_PUBLIC’: True, ‘PREVIEW_HIGHLIGHT’: None, ‘MIMETYPE’: ‘image/png’, ‘PERMALINK_PUBLIC’: ‘https://slack-files.com/TUMKD5EGJ-F071GGP2G5A-4b2be372b3’, ‘FILETYPE’: ‘png’, ‘EDIT_LINK’: None, ‘URL_PRIVATE’: ‘Slack’, ‘HAS_RICH_PREVIEW’: False, ‘TITLE’: ‘image.png’, ‘IS_STARRED’: False, ‘PREVIEW_IS_TRUNCATED’: None, ‘URL_PRIVATE_DOWNLOAD’: ‘Slack’, ‘PREVIEW’: None, ‘PUBLIC_URL_SHARED’: False, ‘MESSAGE_TS’: ‘1714506896.937689’, ‘PARENT_MESSAGE_TS’: ‘1714461624.761899’, ‘MESSAGE_CHANNEL_ID’: ‘C029A3M079U’, ‘_FIVETRAN_DELETED’: False, ‘LINES_MORE’: None, ‘LINES’: None, ‘SIZE’: 38481, ‘_FIVETRAN_SYNCED’: ‘2024-05-05 12:59:46.262000+00:00’})
idle connections never gets closed.![attachment]({‘ID’: ‘F071JGL2XEF’, ‘EDITABLE’: False, ‘IS_EXTERNAL’: False, ‘USER_ID’: ‘U06FXLKSNHJ’, ‘CREATED’: ‘2024-05-01 16:46:29+00:00’, ‘PERMALINK’: ‘Slack’, ‘EXTERNAL_TYPE’: ‘’, ‘TIMESTAMPS’: ‘2024-05-01 16:46:29+00:00’, ‘MODE’: ‘hosted’, ‘DISPLAY_AS_BOT’: False, ‘PRETTY_TYPE’: ‘PNG’, ‘NAME’: ‘image.png’, ‘IS_PUBLIC’: True, ‘PREVIEW_HIGHLIGHT’: None, ‘MIMETYPE’: ‘image/png’, ‘PERMALINK_PUBLIC’: ‘https://slack-files.com/TUMKD5EGJ-F071JGL2XEF-b06a959258’, ‘FILETYPE’: ‘png’, ‘EDIT_LINK’: None, ‘URL_PRIVATE’: ‘Slack’, ‘HAS_RICH_PREVIEW’: False, ‘TITLE’: ‘image.png’, ‘IS_STARRED’: False, ‘PREVIEW_IS_TRUNCATED’: None, ‘URL_PRIVATE_DOWNLOAD’: ‘Slack’, ‘PREVIEW’: None, ‘PUBLIC_URL_SHARED’: False, ‘MESSAGE_TS’: ‘1714581996.092929’, ‘PARENT_MESSAGE_TS’: ‘1714461624.761899’, ‘MESSAGE_CHANNEL_ID’: ‘C029A3M079U’, ‘_FIVETRAN_DELETED’: False, ‘LINES_MORE’: None, ‘LINES’: None, ‘SIZE’: 99478, ‘_FIVETRAN_SYNCED’: ‘2024-05-05 12:59:46.288000+00:00’})
we are running with
batchDelayMs: 250```
and can also confirm the connection pool issue
increasing batchDelayMs
does not really help. Now we have a lot more active DB connections but if fails the same way like before