Troubleshooting RecordTooLargeException after Upgrading DataHub to v0.12.0

Original Slack Thread

Hello All,

After upgrading DataHub from version 0.10.2 to 0.12.0, I am getting a RecordTooLargeException when running RestoreIndices from datahub-upgrade.
Is there anything I am missing, or has the message size increased in v0.12.0?

My Kafka setting has max.message.bytes set to 1048588. Is there any recommended maximum message size limit?

Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The message is 1279629 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.
java.util.concurrent.ExecutionException: javax.persistence.PersistenceException: Error loading on com.linkedin.metadata.entity.ebean.EbeanAspectV2.createdOn
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at com.linkedin.datahub.upgrade.restoreindices.SendMAEStep.iterateFutures(SendMAEStep.java:73)
at com.linkedin.datahub.upgrade.restoreindices.SendMAEStep.lambda$executable$0(SendMAEStep.java:141)
at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeStepInternal(DefaultUpgradeManager.java:110)
at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:68)
at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:42)
at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.execute(DefaultUpgradeManager.java:33)
at com.linkedin.datahub.upgrade.UpgradeCli.run(UpgradeCli.java:80)
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:768)
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:752)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:314)
at org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:164)
at com.linkedin.datahub.upgrade.UpgradeCliApplication.main(UpgradeCliApplication.java:23)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:65)
Caused by: javax.persistence.PersistenceException: Error loading on com.linkedin.metadata.entity.ebean.EbeanAspectV2.createdOn
at io.ebeaninternal.server.querydefn.DefaultOrmQuery.handleLoadError(DefaultOrmQuery.java:2008)
at io.ebeaninternal.server.query.CQuery.handleLoadError(CQuery.java:754)
at io.ebeaninternal.server.query.SqlBeanLoad.load(SqlBeanLoad.java:66)
at io.ebeaninternal.server.deploy.BeanProperty.load(BeanProperty.java:514)
at io.ebeaninternal.server.query.SqlTreeLoadBean$Load.loadProperties(SqlTreeLoadBean.java:213)
at io.ebeaninternal.server.query.SqlTreeLoadBean$Load.initialise(SqlTreeLoadBean.java:327)
at io.ebeaninternal.server.query.SqlTreeLoadBean$Load.perform(SqlTreeLoadBean.java:335)
at io.ebeaninternal.server.query.SqlTreeLoadBean.load(SqlTreeLoadBean.java:358)
at io.ebeaninternal.server.query.SqlTreeLoadRoot.load(SqlTreeLoadRoot.java:26)
at io.ebeaninternal.server.query.CQuery.readNextBean(CQuery.java:409)
at io.ebeaninternal.server.query.CQuery.hasNext(CQuery.java:489)
at io.ebeaninternal.server.query.CQuery.readCollection(CQuery.java:518)
at io.ebeaninternal.server.query.CQueryEngine.findMany(CQueryEngine.java:356)
at io.ebeaninternal.server.query.DefaultOrmQueryEngine.findMany(DefaultOrmQueryEngine.java:131)
at io.ebeaninternal.server.core.OrmQueryRequest.findList(OrmQueryRequest.java:404)
at io.ebeaninternal.server.core.DefaultServer.findList(DefaultServer.java:1463)
at io.ebeaninternal.server.core.DefaultServer.findList(DefaultServer.java:1442)
at io.ebeaninternal.server.query.LimitOffsetPagedList.getList(LimitOffsetPagedList.java:66)
at com.linkedin.metadata.entity.EntityServiceImpl.restoreIndices(EntityServiceImpl.java:937)
at com.linkedin.datahub.upgrade.restoreindices.SendMAEStep$KafkaJob.call(SendMAEStep.java:49)
at com.linkedin.datahub.upgrade.restoreindices.SendMAEStep$KafkaJob.call(SendMAEStep.java:40)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: javax.persistence.PersistenceException: Error readSet on com.linkedin.metadata.entity.ebean.EbeanAspectV2.createdOn
at io.ebeaninternal.server.deploy.BeanProperty.readSet(BeanProperty.java:539)
at io.ebeaninternal.server.deploy.BeanProperty.readSet(BeanProperty.java:548)
at io.ebeaninternal.server.query.SqlBeanLoad.load(SqlBeanLoad.java:63)
… 22 more
Caused by: java.lang.NullPointerException: Cannot invoke "com.mysql.cj.protocol.ColumnDefinition.getFields()" because "this.columnDefinition" is null
at com.mysql.cj.jdbc.result.ResultSetImpl.checkColumnBounds(ResultSetImpl.java:508)
at com.mysql.cj.jdbc.result.ResultSetImpl.getTimestamp(ResultSetImpl.java:977)
at io.ebeaninternal.server.type.RsetDataReader.getTimestamp(RsetDataReader.java:171)
at io.ebeaninternal.server.type.ScalarTypeTimestamp.read(ScalarTypeTimestamp.java:70)
at io.ebeaninternal.server.type.ScalarTypeTimestamp.read(ScalarTypeTimestamp.java:18)
at io.ebeaninternal.server.deploy.BeanProperty.readSet(BeanProperty.java:533)
… 24 more

<@U05SKM6KGGK> might be able to help on this one!

By default, Kafka limits messages to 1 MB each in a topic.
Maybe this blog can help you resolve the issue:
https://www.conduktor.io/kafka/how-to-send-large-messages-in-apache-kafka/
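
For context, going above the 1 MB default usually involves more than one setting: the producer's max.request.size (the limit named in the error above), the topic/broker max.message.bytes, and on the read side the consumer's max.partition.fetch.bytes / fetch.max.bytes. Below is a minimal sketch of the producer side using the plain Kafka Java client; the broker address and the 2 MB value are illustrative, not DataHub's actual configuration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class LargeMessageProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // adjust for your cluster
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Raise the client-side cap (default 1048576 bytes). 2097152 = 2 MB is an illustrative value.
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 2097152);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // ... send records here; the topic's max.message.bytes must also allow the larger size.
        }
    }
}
```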

If your deployment uses the DataHub Helm chart,
you can set the KAFKA_PRODUCER_MAX_REQUEST_SIZE value higher (like 2 MB or 5 MB).
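
Raising only the producer limit may not be enough if the topic itself still rejects large records. The sketch below bumps a topic's max.message.bytes with the Kafka AdminClient; the topic name and the 2 MB value are placeholders, so check which metadata change log topic your deployment actually produces to. The same change can also be made with the kafka-configs.sh CLI.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RaiseTopicMessageSize {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // adjust for your cluster

        try (AdminClient admin = AdminClient.create(props)) {
            // Placeholder topic name -- use the metadata change log topic your deployment writes to.
            ConfigResource topic =
                new ConfigResource(ConfigResource.Type.TOPIC, "MetadataChangeLog_Versioned_v1");

            // 2097152 bytes (2 MB) is an illustrative value matching the producer-side sketch above.
            AlterConfigOp raiseLimit = new AlterConfigOp(
                new ConfigEntry("max.message.bytes", "2097152"), AlterConfigOp.OpType.SET);

            Map<ConfigResource, Collection<AlterConfigOp>> changes = Map.of(topic, List.of(raiseLimit));
            admin.incrementalAlterConfigs(changes).all().get();
        }
    }
}
```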

My team ran into the same issue and was unable to get past the problem despite trying a ton of different env var settings. I'm currently waiting for 0.12.1 to come out before getting back into it, in hopes that this change (https://github.com/datahub-project/datahub/pull/9038) resolves our problem.

We are also waiting for v0.12.1. Just wondering when it will be live?

It went live late last week!

I started our upgrade this morning and got past the Kafka errors we were encountering earlier. Looks like #9038 resolved the problem.

Sorry, I may be wrong here, but I don't see it in the release history.

It’s in a pre-release state right now

But the Docker images are available for download under tag v0.12.1.

Ah ok, got it. I'm waiting for the final release :slightly_smiling_face:

Fair enough. We have a dev DataHub environment, so I wasn't too worried about grabbing a pre-release.