Troubleshooting Errors in a Custom DataHub Installation

Original Slack Thread

Hello Dear Community,

I’m doing a custom DataHub installation and my kafka-setup, elastic-setup, and postgres-setup containers executed successfully; however, datahub-upgrade and datahub-gms give me some errors:

datahub-upgrade:
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
ERROR SpringApplication Application run failed
java.lang.IllegalStateException: Failed to execute CommandLineRunner
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:771)
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:752)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:314)
at org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:164)
at com.linkedin.datahub.upgrade.UpgradeCliApplication.main(UpgradeCliApplication.java:23)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:65)
Caused by: java.lang.NullPointerException
at com.linkedin.datahub.upgrade.UpgradeCli.run(UpgradeCli.java:80)
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:768)
... 12 more

Meanwhile, datahub-gms gives me this error:
[pool-10-thread-1] ERROR c.l.m.s.e.query.ESSearchDAO:91 - Search query failed
[pool-10-thread-1] ERROR c.d.authorization.DataHubAuthorizer:230 - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
[pool-14-thread-1] ERROR c.l.m.boot.OnBootApplicationListener:76 - Failed to bootstrap DataHub, OpenAPI servlet was not ready after 30 seconds
[pool-10-thread-1] ERROR c.l.m.s.e.query.ESSearchDAO:91 - Search query failed
[pool-10-thread-1] ERROR c.d.authorization.DataHubAuthorizer:230 - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
[pool-14-thread-1] ERROR c.l.m.boot.OnBootApplicationListener:76 - Failed to bootstrap DataHub, OpenAPI servlet was not ready after 30 seconds
[pool-10-thread-1] ERROR c.l.m.s.e.query.ESSearchDAO:91 - Search query failed
[pool-10-thread-1] ERROR c.d.authorization.DataHubAuthorizer:230 - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
[pool-14-thread-1] ERROR c.l.m.boot.OnBootApplicationListener:76 - Failed to bootstrap DataHub, OpenAPI servlet was not ready after 30 seconds
[pool-10-thread-1] ERROR c.l.m.s.e.query.ESSearchDAO:91 - Search query failed
[pool-10-thread-1] ERROR c.d.authorization.DataHubAuthorizer:230 - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
[pool-14-thread-1] ERROR c.l.m.boot.OnBootApplicationListener:76 - Failed to bootstrap DataHub, OpenAPI servlet was not ready after 30 seconds

Any idea what the problem could be? I have searched a couple of other threads, but usually people were using quickstart and Docker Compose. I am using podman and deploying the containers one by one manually. I’m deploying version 0.10.5 from Acryl.

<@U03MF8MU5P0> <@U01GCJKA8P9> <@U02J5L9JCTE> <@U04LF1XJ4LQ> <@U05AW4DVBAA> do you have any clue? I think you were working on similar issues

Hello <@U05QWP29CUV>, could you elaborate a little more? Are you deploying based on a docker compose file? If yes, which one? Are you ensuring that all images are using the same TAG? Are you using the KAFKA schema registry or the INTERNAL schema registry? Have you checked whether the environment variables of the containers you are uploading match the environment variables of the docker compose files used by the community?

Hey <@U05AW4DVBAA>, many thanks for your post. I cannot use docker-compose in my environment, so I took the quickstart YAML file and am trying to reproduce it step by step.

Firstly I have deployed:
• postgres-setup
• kafka-setup
• elasticsearch-setup
• neo4j
All executed successfully.
However, when running datahub-upgrade I encounter the following error:
[pool-11-thread-1] ERROR c.l.m.s.e.query.ESSearchDAO:91 - Search query failed
java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:887)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:283)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:270)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1632)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1602)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1572)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1088)
at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:85)
at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.search(ESSearchDAO.java:189)
at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:120)
at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:111)
at com.linkedin.metadata.client.JavaEntityClient.search(JavaEntityClient.java:332)
at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:51)
at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:43)
at com.datahub.authorization.DataHubAuthorizer$PolicyRefreshRunnable.run(DataHubAuthorizer.java:223)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
at org.apache.http.util.Asserts.check(Asserts.java:46)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:279)
... 19 common frames omitted

I use the KAFKA schema registry.
All images are under tag 0.10.5.
My environment variables are coherent and correct but not identical to those in docker-compose (for example, instead of the ‘datahub’ database user I have created a name that is in line with our internal naming convention).

Did you deploy all the prerequisite containers?
• Elasticsearch
• MySQL/PostgreSQL …
• Kafka
• Zookeeper
• cp-schema-registry

The error is definitely from connecting to elasticsearch. Please check the health of that container and the configuration used by the upgrade job to connect to it.

Hey <@U05AW4DVBAA>, <@U03MF8MU5P0> many thanks for your messages.
Correct I have installed Elasticsearch, Postgres, Kafka, Zookeeper and schema registry.

Regarding the Elasticsearch connectivity, that is what it looks like, but I am really puzzled, because when I curl from the container it can connect to the Elastic API, and my elastic-setup container ran successfully as well.
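A quick way to double-check this kind of connectivity (a hedged sketch: container and host names are placeholders, and it assumes curl is available in the image) is to run the request from inside the failing container, using exactly the host and port the job is configured with via ELASTICSEARCH_HOST / ELASTICSEARCH_PORT:

podman exec -it datahub-gms sh -c 'curl -v "http://my_elastic_host:9200/_cluster/health?pretty"'

If the cluster has security enabled, the same credentials/CA settings that the upgrade and GMS containers use (user, password, SSL) have to be added to the curl call as well.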

Can you share the entire upgrade job log? Probably the cause is earlier.

Thanks <@U03MF8MU5P0>, please find the full log attached:

Hey <@U05AW4DVBAA>, <@U01GZEETMEZ> maybe you have any idea as well?

Just a reminder that my installation is custom, because my Kafka and Elastic are already running and serving other apps. I want to fit DataHub into the current architecture. (attachment)

It is definitely not connecting to your Kafka infrastructure. Are you actually using a broker called my_kafka_host_server:9094? The schema registry and broker are typically different services; in the log, both are configured with that same value.

Hello <@U03MF8MU5P0> Yes, correct: my Kafka broker runs under:
my_kafka_host_server:9094
And my schema registry is under:
my_kafka_host_server:9081

Depending on your Kafka instance you might have something misconfigured, perhaps related to the authentication or encryption options. If, for example, DataHub is not configured for the right security protocol (see the configuration docs: https://datahubproject.io/docs/how/kafka-config/), the Kafka clients can receive disconnect messages similar to those in your log (see e.g. the “Kafka-connect, Bootstrap broker disconnected” question on Stack Overflow).
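On the DataHub side, the broker security settings are typically supplied alongside the bootstrap and schema-registry addresses as environment variables on the datahub-gms and datahub-upgrade containers. A hedged sketch (placeholder values for a SASL_SSL-secured cluster; adjust to whatever your brokers actually use, and add the matching JAAS/truststore properties if they are required):

--env "KAFKA_BOOTSTRAP_SERVER=my_kafka_host_server:9094" \
--env "KAFKA_SCHEMAREGISTRY_URL=http://my_kafka_host_server:9081" \
--env "SPRING_KAFKA_PROPERTIES_SECURITY_PROTOCOL=SASL_SSL" \
--env "SPRING_KAFKA_PROPERTIES_SASL_MECHANISM=PLAIN" \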

Hello <@U03MF8MU5P0>, <@U05AW4DVBAA>, many thanks for your help. I have created a new Kafka cluster without SSL on the same server, but I’m still facing some errors (log attached). Is this still a Kafka connection problem?

(kafka, schemaregistry, postgres, elastic are up & running)

Here is my config for deploying Kafka and the schema registry (as before, I was using already existing instances on other servers):
ansible my_datahub_host --become-user datahub -b -m shell -a '{% raw %} podman run --name schemaregistry -d \
--env "SCHEMA_REGISTRY_HOST_NAME=schemaregistry" \
--env "SCHEMA_REGISTRY_KAFKASTORE_SECURITY_PROTOCOL=PLAINTEXT" \
--env "SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS=PLAINTEXT://my_datahub_host:9092" \
--env "SCHEMA_REGISTRY_HOST_NAME=my_datahub_host" \
--env "container=oci" \
--env "COMPONENT=schema-registry" \
--env "SCHEMA_REGISTRY_INTER_INSTANCE_PROTOCOL=http" \
--env "HOSTNAME=my_datahub_host" \
--env "SCHEMA_REGISTRY_LOG4J_ROOT_LOGLEVEL=DEBUG" \
--env "SCHEMA_REGISTRY_LISTENERS=http://my_datahub_host:8081" \
-p 8081:8081 \
docker.io/confluentinc/cp-schema-registry:7.2.2
{% endraw %}'

ansible my_datahub_host --become-user datahub -b -m shell -a '{% raw %} podman run --name broker -d \
--env "KAFKA_BROKER_ID=1" \
--env "KAFKA_ZOOKEEPER_CONNECT=my_datahub_host:2181" \
--env "KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT" \
--env "KAFKA_LISTENERS=PLAINTEXT://:29092,PLAINTEXT_HOST://:9092" \
--env "KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://my_datahub_host:29092,PLAINTEXT_HOST://my_datahub_host:9092" \
--env "KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1" \
--env "KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0" \
--env "KAFKA_HEAP_OPTS=-Xms256m -Xmx256m" \
--env "KAFKA_CONFLUENT_SUPPORT_METRICS_ENABLE=false" \
--env "HOSTNAME=my_datahub_host" \
-p 9092:9092 \
-p 29092:29092 \
docker.io/confluentinc/cp-kafka:7.2.2
{% endraw %}'

(attachment)

The errors now indicate incorrect or missing environment variables for GMS, i.e. DATAHUB_GMS_HOST and/or DATAHUB_GMS_PORT. Is the configured hostname datahub-datahub-gms resolvable?

The port is 8080
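For reference, a hedged way to check that part (placeholder host and port; the hostname has to be resolvable from inside the containers that need GMS, not only from the host itself):

curl -v http://my_datahub_host:8080/config

This should return the GMS configuration JSON once the service is reachable on the values set in DATAHUB_GMS_HOST / DATAHUB_GMS_PORT.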

<@U03MF8MU5P0> I have changed the DNS name to IP:8080 and hopefully I have moved on; however, I still have issues, attached (configs from the upgrade and gms containers + my container configuration).

Elastic, Kafka, and Postgres are up and running, and the setup scripts executed successfully, so I really struggle to find out what the problem is… (attachments)

The datahub-upgrade job needs to be run with the environment variables and also the arguments like in the helm template; see the helm chart here: https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/templates/datahub-upgrade/datahub-system-update-job.yml#L60. This same container performs multiple functions, and the CLI arguments select which mode.
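In practice, for a manual podman deployment this roughly translates to running the upgrade image with the -u SystemUpdate argument, as in the hedged sketch below. Host names, the image tag, and the connection values are placeholders, and the same schema-registry, database (EBEAN_DATASOURCE_*), and Kafka security variables used by GMS still have to be supplied:

podman run --name datahub-upgrade --rm \
--env "DATAHUB_GMS_HOST=my_datahub_host" \
--env "DATAHUB_GMS_PORT=8080" \
--env "KAFKA_BOOTSTRAP_SERVER=my_datahub_host:9092" \
--env "KAFKA_SCHEMAREGISTRY_URL=http://my_datahub_host:8081" \
--env "ELASTICSEARCH_HOST=my_elastic_host" \
--env "ELASTICSEARCH_PORT=9200" \
docker.io/acryldata/datahub-upgrade:v0.10.5 \
-u SystemUpdate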

that was it

thanks!