Hi all, still stuck upgrading to 0.12.0. It seems to be a chicken-and-egg problem: the datahub upgrade job fails because it cannot connect to GMS, and GMS won’t work until it has been upgraded. Thoughts on what else I can try?
more logs with context
```
2023-11-20 22:56:31,121 [main] INFO c.l.d.u.impl.DefaultUpgradeReport:16 - Cleanup has not been requested.
2023-11-20 22:56:31,121 [main] INFO c.l.d.u.impl.DefaultUpgradeReport:16 - Skipping Step 1/6: RemoveAspectV2TableStep...
2023-11-20 22:56:31,121 [main] INFO c.l.d.u.impl.DefaultUpgradeReport:16 - Executing Step 2/6: GMSQualificationStep...
ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.8
ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.8
ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.8
ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.8
java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
at java.base/java.net.Socket.connect(Socket.java:609)
at java.base/java.net.Socket.connect(Socket.java:558)
at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:182)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:509)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:604)
at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:277)
at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:376)
at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:397)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1253)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1015)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1592)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
at com.linkedin.datahub.upgrade.common.steps.GMSQualificationStep.lambda$executable$0(GMSQualificationStep.java:80)
at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeStepInternal(DefaultUpgradeManager.java:110)
at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:68)
at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:42)
at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.execute(DefaultUpgradeManager.java:33)
at com.linkedin.datahub.upgrade.UpgradeCli.run(UpgradeCli.java:80)
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:768)
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:752)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:314)
at org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:164)
at com.linkedin.datahub.upgrade.UpgradeCliApplication.main(UpgradeCliApplication.java:23)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:65)
[ ..... SNIP ...... ] 2 retries truncated
2023-11-20 22:56:34,308 [main] INFO c.l.d.u.impl.DefaultUpgradeReport:16 - ERROR: Cannot connect to GMSat host datahub-datahub-gms port 8080. Make sure GMS is on the latest version and is running at that host before starting the migration.
2023-11-20 22:56:34,308 [main] INFO c.l.d.u.impl.DefaultUpgradeReport:16 - Failed Step 2/6: GMSQualificationStep. Failed after 2 retries.
2023-11-20 22:56:34,308 [main] INFO c.l.d.u.impl.DefaultUpgradeReport:16 - Exiting upgrade NoCodeDataMigration with failure.
2023-11-20 22:56:34,309 [main] INFO c.l.d.u.impl.DefaultUpgradeReport:16 - Upgrade NoCodeDataMigration completed with result FAILED. Exiting...```
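The `Connection refused` in the trace above means nothing accepted a TCP connection at the GMS host and port. A minimal sketch for checking that from another pod or machine, assuming Python is available there; the function name `can_connect` is hypothetical, and the host and port in the usage note are taken from the error message in the log:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("datahub-datahub-gms", 8080)` run inside the cluster would show whether anything is listening at the address the upgrade job tried. It only tests reachability; it says nothing about whether GMS itself is healthy.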
<@U03MF8MU5P0> could you look into this? Thanks!
The log you’ve shared is not from the required system-update upgrade; it is from a post-GMS-start upgrade step called `NoCodeDataMigration`. There should be a different argument for the job (`SystemUpdate`); see the helm chart here: https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/templates/datahub-upgrade/datahub-system-update-job.yml#L62
Similarly, the various docker compose files use this argument as well, for example: https://github.com/datahub-project/datahub/blob/master/docker/quickstart/docker-compose-without-neo4j-m1.quickstart.yml#L120
Thanks <@U03MF8MU5P0> for clearing up my misunderstanding, but I still don’t understand why the `systemUpdate` job is apparently not running then, or, if it is, where I can find its logs in ArgoCD. I can see that `datahub.systemUpdate.enabled` is true by default and our config has no override for it. I also note that the comment in the code suggests this merely configures the behaviour of the datahub-upgrade job.
Based on the screenshot, there should be a job called `datahub-datahub-system-update`. If the `system-update` job has run, then this `upgrade-job` failure is due to an error with the other pod, `datahub-datahub-gms`. Can you share the logs from a `datahub-datahub-gms` pod?
<@U03MF8MU5P0> I’m not sure how to get the full log to you but it seems to start going off the rails at this point in the log:
And then just a bit further on
Those are from the upgrade? They look like gms logs. Please share the full log from the start to the first error. Thanks!
Yes, those were the GMS logs. OK, I’ve sent the full log downloads from both GMS and the upgrade job in a private message.
Please send the `system-update` logs, not the `upgrade` logs. GMS indicates it is waiting for the `system-update` job. The logs you’ve shared do not include this job. GMS is at this state: `Executing bootstrap step 1/1 with name WaitForSystemUpdateStep..` Thanks!
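To find the bootstrap step GMS is sitting on without reading a long log by eye, a short script can pull out the last such line. This is a sketch only: the pattern is modelled on the single GMS log line quoted in this thread, and the function name `last_bootstrap_step` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Pattern modelled on the one GMS log line quoted in this thread.
BOOTSTRAP_STEP = re.compile(
    r"Executing bootstrap step (\d+)/(\d+) with name (\w+)"
)


def last_bootstrap_step(log_text: str) -> Optional[Tuple[int, int, str]]:
    """Return (index, total, name) of the last bootstrap step seen, or None."""
    found = None
    for match in BOOTSTRAP_STEP.finditer(log_text):
        # Keep overwriting so the final value is the most recent step logged.
        found = (int(match.group(1)), int(match.group(2)), match.group(3))
    return found
```

If the last step reported is `WaitForSystemUpdateStep` and nothing follows it, that matches the state described above, where GMS is waiting on the `system-update` job.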
Hi <@U03MF8MU5P0>, as I have said above, there is no `system-update` job. Can you think of any reason why? https://datahubspace.slack.com/archives/C029A3M079U/p1701222814802479?thread_ts=1700708811.149289&cid=C029A3M079U
Is it possible you’re missing the `global` part? That is, `global.datahub.systemUpdate.enabled`, per the helm chart template here: https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/templates/datahub-upgrade/datahub-system-update-job.yml#L1C6-L1C6
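The difference between the two key paths mentioned in this thread can be checked mechanically against a parsed values file. The sketch below uses a plain dictionary standing in for the parsed values; the `lookup` helper and the sample data are hypothetical, and it only shows that the two paths are distinct, not which one the chart actually reads.

```python
from collections.abc import Mapping
from typing import Any


def lookup(values: Mapping, dotted_path: str, default: Any = None) -> Any:
    """Walk a nested mapping following a dotted key path such as 'a.b.c'."""
    node: Any = values
    for key in dotted_path.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical values, shaped like a parsed Helm values file.
values = {"global": {"datahub": {"systemUpdate": {"enabled": True}}}}

print(lookup(values, "global.datahub.systemUpdate.enabled"))  # True
print(lookup(values, "datahub.systemUpdate.enabled"))  # None
```

A flag set under `datahub.systemUpdate` is invisible to anything that reads `global.datahub.systemUpdate`, and vice versa, so it is worth checking both the chart defaults and any overrides for which path is in use.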
Missing it in what sense, <@U03MF8MU5P0>?
As far as I can tell, our values file is not overwriting anything related to the update job from the chart template.