Description
Java API client version: 9.0.4
Java version: 21
Elasticsearch version: 9.0.4
Problem description
Hello,
We are trying to diagnose an issue that seems to have started only with the new transport. It looks similar to #1003, but occurs with the httpclient5 transport.
We have an application that makes various periodic queries, some more involved than others, interspersed with phases of intense indexing work. This app has been using Elasticsearch since the 2.x era and has used every client along the way: TransportClient, the HLRC, the "new" Java client with elasticsearch-rest-client, and now the client with the HttpClient5 transport. The structure and query patterns of the app have not changed in any notable way over this period.
This past weekend, the app was deployed for the first time with the HttpClient5 transport to two different machines: one with 10 nodes in a low-CPU, constrained client environment without TLS or users, and another in a CPU-constrained environment with a single node (on the same machine) with TLS and users. The nodes all run the dockerized version of ES 9.0.4.
Both environments went through an index migration from our side, using the bulk ingester class, and then resumed the normal activities of occasional queries and occasional indexing requests.
The constrained environment has twice this weekend gotten into a state where queries cannot be run, and the log shows "I/O reactor has been shut down" every time the client tries to operate; nothing brings it back except restarting / reinitializing the client code.
Some of our code is more involved in its CompletableFuture patterns, but even simple code like this can trigger it once the client is in this state:
```java
final String finalIndexPattern = "myIndexv*";
final int INDEX_OPERATION_DEFAULT_TIMEOUT_SEC = 45;
List<IndicesRecord> r = client.cat()
    .indices(t -> t.index(finalIndexPattern))
    .get(INDEX_OPERATION_DEFAULT_TIMEOUT_SEC, TimeUnit.SECONDS)
    .indices();
```
An example of the trace:
```
org.apache.hc.core5.reactor.IOReactorShutdownException: I/O reactor has been shut down
...
Caused by: java.util.concurrent.ExecutionException: org.apache.hc.core5.reactor.IOReactorShutdownException: I/O reactor has been shut down
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
	at <snip - private code>
	... 106 more
Caused by: org.apache.hc.core5.reactor.IOReactorShutdownException: I/O reactor has been shut down
	at org.apache.hc.core5.reactor.AbstractIOReactorBase.connect(AbstractIOReactorBase.java:50)
	at org.apache.hc.client5.http.impl.nio.MultihomeIOSessionRequester$2.executeNext(MultihomeIOSessionRequester.java:139)
	at org.apache.hc.client5.http.impl.nio.MultihomeIOSessionRequester$2.run(MultihomeIOSessionRequester.java:188)
	at org.apache.hc.client5.http.impl.nio.MultihomeIOSessionRequester.connect(MultihomeIOSessionRequester.java:192)
	at org.apache.hc.client5.http.impl.nio.DefaultAsyncClientConnectionOperator.connect(DefaultAsyncClientConnectionOperator.java:115)
	at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager.connect(PoolingAsyncClientConnectionManager.java:453)
	at org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime.connectEndpoint(InternalHttpAsyncExecRuntime.java:226)
	at org.apache.hc.client5.http.impl.async.AsyncConnectExec.doProceedToNextHop(AsyncConnectExec.java:220)
	at org.apache.hc.client5.http.impl.async.AsyncConnectExec.proceedToNextHop(AsyncConnectExec.java:195)
	at org.apache.hc.client5.http.impl.async.AsyncConnectExec.access$000(AsyncConnectExec.java:90)
	at org.apache.hc.client5.http.impl.async.AsyncConnectExec$1.completed(AsyncConnectExec.java:162)
	at org.apache.hc.client5.http.impl.async.AsyncConnectExec$1.completed(AsyncConnectExec.java:151)
	at org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime$1.completed(InternalHttpAsyncExecRuntime.java:128)
	at org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime$1.completed(InternalHttpAsyncExecRuntime.java:120)
	at org.apache.hc.core5.concurrent.BasicFuture.completed(BasicFuture.java:148)
	at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$3$1.leaseCompleted(PoolingAsyncClientConnectionManager.java:336)
	at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$3$1.completed(PoolingAsyncClientConnectionManager.java:321)
	at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$3$1.completed(PoolingAsyncClientConnectionManager.java:282)
	at org.apache.hc.core5.concurrent.BasicFuture.completed(BasicFuture.java:148)
	at org.apache.hc.core5.pool.StrictConnPool.fireCallbacks(StrictConnPool.java:401)
	at org.apache.hc.core5.pool.StrictConnPool.lease(StrictConnPool.java:219)
	at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$3.<init>(PoolingAsyncClientConnectionManager.java:279)
	at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager.lease(PoolingAsyncClientConnectionManager.java:274)
	at org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime.acquireEndpoint(InternalHttpAsyncExecRuntime.java:115)
	at org.apache.hc.client5.http.impl.async.AsyncConnectExec.execute(AsyncConnectExec.java:150)
	at org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
	at org.apache.hc.client5.http.impl.async.AsyncProtocolExec.internalExecute(AsyncProtocolExec.java:207)
	at org.apache.hc.client5.http.impl.async.AsyncProtocolExec.execute(AsyncProtocolExec.java:172)
	at org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
	at org.apache.hc.client5.http.impl.async.AsyncHttpRequestRetryExec.internalExecute(AsyncHttpRequestRetryExec.java:113)
	at org.apache.hc.client5.http.impl.async.AsyncHttpRequestRetryExec.execute(AsyncHttpRequestRetryExec.java:211)
	at org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
	at org.apache.hc.client5.http.impl.async.AsyncRedirectExec.internalExecute(AsyncRedirectExec.java:111)
	at org.apache.hc.client5.http.impl.async.AsyncRedirectExec.execute(AsyncRedirectExec.java:278)
	at org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
	at org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient.executeImmediate(InternalAbstractHttpAsyncClient.java:386)
	at org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient.lambda$doExecute$0(InternalAbstractHttpAsyncClient.java:242)
	at org.apache.hc.core5.http.nio.support.BasicRequestProducer.sendRequest(BasicRequestProducer.java:93)
	at org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient.doExecute(InternalAbstractHttpAsyncClient.java:209)
	at org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:96)
	at org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:106)
	at co.elastic.clients.transport.rest5_client.low_level.Rest5Client.lambda$performRequestAsync$1(Rest5Client.java:407)
	at co.elastic.clients.transport.rest5_client.low_level.Cancellable$RequestCancellable.runIfNotCancelled(Cancellable.java:98)
	at co.elastic.clients.transport.rest5_client.low_level.Rest5Client.performRequestAsync(Rest5Client.java:404)
	at co.elastic.clients.transport.rest5_client.low_level.Rest5Client.performRequestAsync(Rest5Client.java:391)
	at co.elastic.clients.transport.rest5_client.Rest5ClientHttpClient.performRequestAsync(Rest5ClientHttpClient.java:114)
	at co.elastic.clients.transport.ElasticsearchTransportBase.performRequestAsync(ElasticsearchTransportBase.java:190)
	at co.elastic.clients.elasticsearch.cat.ElasticsearchCatAsyncClient.indices(ElasticsearchCatAsyncClient.java:538)
	at co.elastic.clients.elasticsearch.cat.ElasticsearchCatAsyncClient.indices(ElasticsearchCatAsyncClient.java:576)
	... 107 more
```
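A side note on reading these traces: the ExecutionException is just CompletableFuture.get() wrapping the transport failure, so the real error is always the cause. A minimal JDK-only sketch of that wrapping (ReactorDownException is a hypothetical stand-in for httpclient5's IOReactorShutdownException, not a real class):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FutureFailureDemo {

    // Hypothetical stand-in for org.apache.hc.core5.reactor.IOReactorShutdownException
    static class ReactorDownException extends RuntimeException {
        ReactorDownException() { super("I/O reactor has been shut down"); }
    }

    static String causeMessage() throws InterruptedException, TimeoutException {
        // Stand-in for the future returned by client.cat().indices(...)
        CompletableFuture<String> response = new CompletableFuture<>();
        response.completeExceptionally(new ReactorDownException());
        try {
            // Same call shape as .get(INDEX_OPERATION_DEFAULT_TIMEOUT_SEC, TimeUnit.SECONDS)
            response.get(45, TimeUnit.SECONDS);
            return "no failure";
        } catch (ExecutionException e) {
            // get() wraps the original failure; the real error is the cause
            return e.getCause().getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(causeMessage());
    }
}
```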
The original exception, before all requests started showing the reactor error, is that the connection is closed:
```
Caused by: java.util.concurrent.ExecutionException: org.apache.hc.core5.http.RequestNotExecutedException: Connection is closed
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
	at <snip - private code>
	... 106 more
Caused by: org.apache.hc.core5.http.RequestNotExecutedException: Connection is closed
	at org.apache.hc.core5.http.nio.command.CommandSupport.cancelCommands(CommandSupport.java:68)
	at org.apache.hc.core5.http.impl.nio.AbstractHttp1StreamDuplexer.onDisconnect(AbstractHttp1StreamDuplexer.java:415)
	at org.apache.hc.core5.http.impl.nio.AbstractHttp1IOEventHandler.disconnected(AbstractHttp1IOEventHandler.java:95)
	at org.apache.hc.core5.http.impl.nio.ClientHttp1IOEventHandler.disconnected(ClientHttp1IOEventHandler.java:41)
	at org.apache.hc.core5.reactor.ssl.SSLIOSession$1.disconnected(SSLIOSession.java:253)
	at org.apache.hc.core5.reactor.InternalDataChannel.disconnected(InternalDataChannel.java:205)
	at org.apache.hc.core5.reactor.SingleCoreIOReactor.processClosedSessions(SingleCoreIOReactor.java:229)
	at org.apache.hc.core5.reactor.SingleCoreIOReactor.doExecute(SingleCoreIOReactor.java:131)
	at org.apache.hc.core5.reactor.AbstractSingleCoreIOReactor.execute(AbstractSingleCoreIOReactor.java:92)
	at org.apache.hc.core5.reactor.IOReactorWorker.run(IOReactorWorker.java:44)
	... 1 more
```
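The fail-fast behavior after that first disconnect is consistent with the reactor's event-loop thread having terminated for good. As an analogy only (plain JDK executors, not the actual httpclient5 internals): once a single-threaded event loop is shut down, every later submission is rejected immediately, and nothing ever restarts the thread:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class ShutdownAnalogy {

    static String submitAfterShutdown() {
        ExecutorService ioThread = Executors.newSingleThreadExecutor();
        ioThread.shutdownNow(); // the event-loop thread dies, as the IOReactorWorker did above
        try {
            ioThread.submit(() -> "request");
            return "accepted";
        } catch (RejectedExecutionException e) {
            // every subsequent request fails fast; nothing restarts the thread
            return "rejected";
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAfterShutdown());
    }
}
```

This matches what we observe: after the reactor shuts down, every request fails immediately until the client object itself is rebuilt.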
I'm not surprised by the occasional connection close in the heavily loaded environment, but as I've said, nothing has really changed except the update to 9.x and the switch of HTTP client. We were on elasticsearch-java 8.18.0 with elasticsearch-rest-client 9.0.3 immediately before this deployment.
What other information would be helpful? I am running more tests with the CPU constraint relaxed a bit, to see whether that is what brings out the situation, as a data point to help distinguish the other factors that differ between the systems (single node, same host, TLS, user auth). My next plan after that is to switch back to elasticsearch-rest-client 9.0.4 and revert the client initialization changes, to see whether it comes up again in this environment under load.
Thanks for your time!