Troubleshooting Errors in a Custom DataHub Installation

Original Slack Thread

Hello Dear Community,

I’m doing a custom DataHub installation and my kafka-setup, elastic-setup, and postgres-setup containers executed successfully; however, datahub-upgrade and datahub-gms give me some errors:

datahub-upgrade:
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
ERROR SpringApplication Application run failed
java.lang.IllegalStateException: Failed to execute CommandLineRunner
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:771)
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:752)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:314)
at org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:164)
at com.linkedin.datahub.upgrade.UpgradeCliApplication.main(UpgradeCliApplication.java:23)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:65)
Caused by: java.lang.NullPointerException
at com.linkedin.datahub.upgrade.UpgradeCli.run(UpgradeCli.java:80)
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:768)
... 12 more

Meanwhile, datahub-gms gives me this error:
[pool-10-thread-1] ERROR c.l.m.s.e.query.ESSearchDAO:91 - Search query failed
[pool-10-thread-1] ERROR c.d.authorization.DataHubAuthorizer:230 - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
[pool-14-thread-1] ERROR c.l.m.boot.OnBootApplicationListener:76 - Failed to bootstrap DataHub, OpenAPI servlet was not ready after 30 seconds
[pool-10-thread-1] ERROR c.l.m.s.e.query.ESSearchDAO:91 - Search query failed
[pool-10-thread-1] ERROR c.d.authorization.DataHubAuthorizer:230 - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
[pool-14-thread-1] ERROR c.l.m.boot.OnBootApplicationListener:76 - Failed to bootstrap DataHub, OpenAPI servlet was not ready after 30 seconds
[pool-10-thread-1] ERROR c.l.m.s.e.query.ESSearchDAO:91 - Search query failed
[pool-10-thread-1] ERROR c.d.authorization.DataHubAuthorizer:230 - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
[pool-14-thread-1] ERROR c.l.m.boot.OnBootApplicationListener:76 - Failed to bootstrap DataHub, OpenAPI servlet was not ready after 30 seconds
[pool-10-thread-1] ERROR c.l.m.s.e.query.ESSearchDAO:91 - Search query failed
[pool-10-thread-1] ERROR c.d.authorization.DataHubAuthorizer:230 - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
[pool-14-thread-1] ERROR c.l.m.boot.OnBootApplicationListener:76 - Failed to bootstrap DataHub, OpenAPI servlet was not ready after 30 seconds

Any idea what the problem could be? I have searched a couple of other threads, but usually people were using quickstart and Docker Compose. I am using podman and deploying the containers one by one manually. I’m deploying version 0.10.5 from Acryl.

<@U03MF8MU5P0> <@U01GCJKA8P9> <@U02J5L9JCTE> <@U04LF1XJ4LQ> <@U05AW4DVBAA> do you have any clue? I think you were working on similar issues

Hello <@U05QWP29CUV>, could you elaborate a little more? Are you deploying based on a docker compose file? If yes, which one? Are you ensuring that all images are using the same TAG? Are you using the KAFKA schema registry or the INTERNAL schema registry? Have you checked whether the environment variables of the containers you are uploading match the environment variables of the docker compose files used by the community?

Hey <@U05AW4DVBAA>, many thanks for your post. I cannot use docker-compose in my environment, so I took the quickstart YAML file and am trying to reproduce it step by step.

Firstly I have deployed:
• postgres-setup
• kafka-setup
• elasticsearch-setup
• neo4j
All executed successfully.
However, when running datahub-upgrade I encounter the following error:
[pool-11-thread-1] ERROR c.l.m.s.e.query.ESSearchDAO:91 - Search query failed
java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:887)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:283)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:270)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1632)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1602)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1572)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1088)
at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:85)
at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.search(ESSearchDAO.java:189)
at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:120)
at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:111)
at com.linkedin.metadata.client.JavaEntityClient.search(JavaEntityClient.java:332)
at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:51)
at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:43)
at com.datahub.authorization.DataHubAuthorizer$PolicyRefreshRunnable.run(DataHubAuthorizer.java:223)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
at org.apache.http.util.Asserts.check(Asserts.java:46)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:279)
... 19 common frames omitted

I use the KAFKA schema registry.
All images are under tag 0.10.5.
My environment variables are coherent and correct but not identical to those in docker-compose (for example, instead of the ‘datahub’ database user I have created a name that is in line with our internal naming convention).

Did you deploy all the prerequisite containers?
• Elasticsearch
• MySQL/PostgreSQL …
• Kafka
• Zookeeper
• cp-schema-registry

The error is definitely from connecting to elasticsearch. Please check the health of that container and the configuration used by the upgrade job to connect to it.

Hey <@U05AW4DVBAA>, <@U03MF8MU5P0> many thanks for your messages.
Correct I have installed Elasticsearch, Postgres, Kafka, Zookeeper and schema registry.

Regarding the Elasticsearch connectivity, that is what it looks like, but I am really puzzled, because when I curl from the container it can connect to the Elastic API, and my elastic-setup container ran successfully as well.
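A quick way to double-check this kind of connectivity (a hedged sketch: container and host names are placeholders, and it assumes curl is available in the image) is to run the request from inside the failing container, using exactly the host and port the job is configured with via ELASTICSEARCH_HOST / ELASTICSEARCH_PORT:

podman exec -it datahub-gms sh -c 'curl -v "http://my_elastic_host:9200/_cluster/health?pretty"'

If the cluster has security enabled, the same credentials/CA settings that the upgrade and GMS containers use (user, password, SSL) have to be added to the curl call as well.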

Can you share the entire upgrade job log? Probably the cause is earlier.

Thanks <@U03MF8MU5P0>, please find the full log attached:

Hey <@U05AW4DVBAA>, <@U01GZEETMEZ> maybe you have any idea as well?

Just a reminder that my installation is custom, because my Kafka and Elastic are already running and serving other apps. I want to fit DataHub into the current architecture. (attachment)

It is definitely not connecting to your Kafka infrastructure. Are you actually using a broker called my_kafka_host_server:9094? The schema registry and broker are typically different services; in the log, both are configured with that same value.

Hello <@U03MF8MU5P0> Yes, correct: my Kafka broker runs under:
my_kafka_host_server:9094
And my schema registry is under:
my_kafka_host_server:9081

Depending on your Kafka instance you might have something misconfigured, perhaps related to the authentication or encryption options. If, for example, DataHub is not configured for the right security protocol (see the configuration docs: https://datahubproject.io/docs/how/kafka-config/), the Kafka clients can receive disconnect messages similar to those in your log (see e.g. the “Kafka-connect, Bootstrap broker disconnected” question on Stack Overflow).
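On the DataHub side, the broker security settings are typically supplied alongside the bootstrap and schema-registry addresses as environment variables on the datahub-gms and datahub-upgrade containers. A hedged sketch (placeholder values for a SASL_SSL-secured cluster; adjust to whatever your brokers actually use, and add the matching JAAS/truststore properties if they are required):

--env "KAFKA_BOOTSTRAP_SERVER=my_kafka_host_server:9094" \
--env "KAFKA_SCHEMAREGISTRY_URL=http://my_kafka_host_server:9081" \
--env "SPRING_KAFKA_PROPERTIES_SECURITY_PROTOCOL=SASL_SSL" \
--env "SPRING_KAFKA_PROPERTIES_SASL_MECHANISM=PLAIN" \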

Hello <@U03MF8MU5P0>, <@U05AW4DVBAA>, many thanks for your help. I have created a new Kafka cluster without SSL on the same server, but I’m still facing some errors (log attached). Is this still a Kafka connection problem?

(kafka, schemaregistry, postgres, elastic are up & running)

Here is my config for deploying Kafka and the schema registry (as before, I was using already existing instances on other servers):
ansible my_datahub_host --become-user datahub -b -m shell -a '{% raw %} podman run --name schemaregistry -d \
--env "SCHEMA_REGISTRY_HOST_NAME=schemaregistry" \
--env "SCHEMA_REGISTRY_KAFKASTORE_SECURITY_PROTOCOL=PLAINTEXT" \
--env "SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS=PLAINTEXT://my_datahub_host:9092" \
--env "SCHEMA_REGISTRY_HOST_NAME=my_datahub_host" \
--env "container=oci" \
--env "COMPONENT=schema-registry" \
--env "SCHEMA_REGISTRY_INTER_INSTANCE_PROTOCOL=http" \
--env "HOSTNAME=my_datahub_host" \
--env "SCHEMA_REGISTRY_LOG4J_ROOT_LOGLEVEL=DEBUG" \
--env "SCHEMA_REGISTRY_LISTENERS=http://my_datahub_host:8081" \
-p 8081:8081 \
docker.io/confluentinc/cp-schema-registry:7.2.2
{% endraw %}'

ansible my_datahub_host --become-user datahub -b -m shell -a '{% raw %} podman run --name broker -d \
--env "KAFKA_BROKER_ID=1" \
--env "KAFKA_ZOOKEEPER_CONNECT=my_datahub_host:2181" \
--env "KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT" \
--env "KAFKA_LISTENERS=PLAINTEXT://:29092,PLAINTEXT_HOST://:9092" \
--env "KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://my_datahub_host:29092,PLAINTEXT_HOST://my_datahub_host:9092" \
--env "KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1" \
--env "KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0" \
--env "KAFKA_HEAP_OPTS=-Xms256m -Xmx256m" \
--env "KAFKA_CONFLUENT_SUPPORT_METRICS_ENABLE=false" \
--env "HOSTNAME=my_datahub_host" \
-p 9092:9092 \
-p 29092:29092 \
docker.io/confluentinc/cp-kafka:7.2.2
{% endraw %}'

(attachment)

The errors now indicate incorrect or missing environment variables for GMS, i.e. DATAHUB_GMS_HOST and/or DATAHUB_GMS_PORT. Is the configured hostname datahub-datahub-gms resolvable?

The port is 8080
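For reference, a hedged way to check that part (placeholder host and port; the hostname has to be resolvable from inside the containers that need GMS, not only from the host itself):

curl -v http://my_datahub_host:8080/config

This should return the GMS configuration JSON once the service is reachable on the values set in DATAHUB_GMS_HOST / DATAHUB_GMS_PORT.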

<@U03MF8MU5P0> I have changed the DNS name to IP:8080 and hopefully I have moved on; however, I still have issues, attached (configs from the upgrade and gms containers + my container configuration).

Elastic, Kafka, and Postgres are up and running, and the setup scripts executed successfully, so I really struggle to find out what the problem is… (attachments)

The datahub-upgrade job needs to be run with the environment variables and also the arguments like in the helm template; see the helm chart here: https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/templates/datahub-upgrade/datahub-system-update-job.yml#L60. This same container performs multiple functions, and the CLI arguments select which mode.
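In practice, for a manual podman deployment this roughly translates to running the upgrade image with the -u SystemUpdate argument, as in the hedged sketch below. Host names, the image tag, and the connection values are placeholders, and the same schema-registry, database (EBEAN_DATASOURCE_*), and Kafka security variables used by GMS still have to be supplied:

podman run --name datahub-upgrade --rm \
--env "DATAHUB_GMS_HOST=my_datahub_host" \
--env "DATAHUB_GMS_PORT=8080" \
--env "KAFKA_BOOTSTRAP_SERVER=my_datahub_host:9092" \
--env "KAFKA_SCHEMAREGISTRY_URL=http://my_datahub_host:8081" \
--env "ELASTICSEARCH_HOST=my_elastic_host" \
--env "ELASTICSEARCH_PORT=9200" \
docker.io/acryldata/datahub-upgrade:v0.10.5 \
-u SystemUpdate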

that was it

thanks!