Permanently unusable client using httpclient5-based transport (I/O reactor has been shut down) #1046

@marcreichman-pfi

Description

Java API client version

9.0.4

Java version

21

Elasticsearch Version

9.0.4

Problem description

Hello,

We are trying to diagnose an issue that seems to have come up only with the new transport. It looks similar to #1003, but occurs with the httpclient5 transport.

We have an application which makes various periodic queries, some more involved than others, and phases of intense indexing work. This app has been using Elasticsearch since 2.x era, and has used every client along the way: TransportClient, HLRC, "new" Java client with elasticsearch-rest-client, and now the client with HTTPClient5 transport. The structure and query patterns of the app have not changed in any notable way over this period.

This past weekend, the app was deployed for the first time with the HTTPClient5 transport to two different machines: one with 10 nodes in a low-CPU-constrained client environment without TLS or users, and another CPU-constrained environment with a single node (on the same machine) with TLS and users. The nodes all run the dockerized version of ES 9.0.4.

Both environments went through an index migration from our side, using the bulk ingester class, and then resumed the normal activities of occasional queries and occasional indexing requests.

The constrained environment has now twice this weekend gotten into a state where queries cannot be run: every time the client tries to operate, the log shows "I/O reactor has been shut down", and nothing brings it back except restarting/reinitializing the client code.

Some of our code is funkier in its CompletableFuture patterns, but even simple code like this can bring it out once the client is in this state:

final String finalIndexPattern = "myIndexv*";
final int INDEX_OPERATION_DEFAULT_TIMEOUT_SEC = 45;
List<IndicesRecord> r = client.cat()
        .indices(t -> t.index(finalIndexPattern))
        .get(INDEX_OPERATION_DEFAULT_TIMEOUT_SEC, TimeUnit.SECONDS)
        .indices();

An example of the trace:

org.apache.hc.core5.reactor.IOReactorShutdownException: I/O reactor has been shut down
...
Caused by: java.util.concurrent.ExecutionException: org.apache.hc.core5.reactor.IOReactorShutdownException: I/O reactor has been shut down
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
        at <snip - private code>
        ... 106 more
Caused by: org.apache.hc.core5.reactor.IOReactorShutdownException: I/O reactor has been shut down
        at org.apache.hc.core5.reactor.AbstractIOReactorBase.connect(AbstractIOReactorBase.java:50)
        at org.apache.hc.client5.http.impl.nio.MultihomeIOSessionRequester$2.executeNext(MultihomeIOSessionRequester.java:139)
        at org.apache.hc.client5.http.impl.nio.MultihomeIOSessionRequester$2.run(MultihomeIOSessionRequester.java:188)
        at org.apache.hc.client5.http.impl.nio.MultihomeIOSessionRequester.connect(MultihomeIOSessionRequester.java:192)
        at org.apache.hc.client5.http.impl.nio.DefaultAsyncClientConnectionOperator.connect(DefaultAsyncClientConnectionOperator.java:115)
        at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager.connect(PoolingAsyncClientConnectionManager.java:453)
        at org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime.connectEndpoint(InternalHttpAsyncExecRuntime.java:226)
        at org.apache.hc.client5.http.impl.async.AsyncConnectExec.doProceedToNextHop(AsyncConnectExec.java:220)
        at org.apache.hc.client5.http.impl.async.AsyncConnectExec.proceedToNextHop(AsyncConnectExec.java:195)
        at org.apache.hc.client5.http.impl.async.AsyncConnectExec.access$000(AsyncConnectExec.java:90)
        at org.apache.hc.client5.http.impl.async.AsyncConnectExec$1.completed(AsyncConnectExec.java:162)
        at org.apache.hc.client5.http.impl.async.AsyncConnectExec$1.completed(AsyncConnectExec.java:151)
        at org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime$1.completed(InternalHttpAsyncExecRuntime.java:128)
        at org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime$1.completed(InternalHttpAsyncExecRuntime.java:120)
        at org.apache.hc.core5.concurrent.BasicFuture.completed(BasicFuture.java:148)
        at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$3$1.leaseCompleted(PoolingAsyncClientConnectionManager.java:336)
        at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$3$1.completed(PoolingAsyncClientConnectionManager.java:321)
        at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$3$1.completed(PoolingAsyncClientConnectionManager.java:282)
        at org.apache.hc.core5.concurrent.BasicFuture.completed(BasicFuture.java:148)
        at org.apache.hc.core5.pool.StrictConnPool.fireCallbacks(StrictConnPool.java:401)
        at org.apache.hc.core5.pool.StrictConnPool.lease(StrictConnPool.java:219)
        at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$3.<init>(PoolingAsyncClientConnectionManager.java:279)
        at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager.lease(PoolingAsyncClientConnectionManager.java:274)
        at org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime.acquireEndpoint(InternalHttpAsyncExecRuntime.java:115)
        at org.apache.hc.client5.http.impl.async.AsyncConnectExec.execute(AsyncConnectExec.java:150)
        at org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
        at org.apache.hc.client5.http.impl.async.AsyncProtocolExec.internalExecute(AsyncProtocolExec.java:207)
        at org.apache.hc.client5.http.impl.async.AsyncProtocolExec.execute(AsyncProtocolExec.java:172)
        at org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
        at org.apache.hc.client5.http.impl.async.AsyncHttpRequestRetryExec.internalExecute(AsyncHttpRequestRetryExec.java:113)
        at org.apache.hc.client5.http.impl.async.AsyncHttpRequestRetryExec.execute(AsyncHttpRequestRetryExec.java:211)
        at org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
        at org.apache.hc.client5.http.impl.async.AsyncRedirectExec.internalExecute(AsyncRedirectExec.java:111)
        at org.apache.hc.client5.http.impl.async.AsyncRedirectExec.execute(AsyncRedirectExec.java:278)
        at org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
        at org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient.executeImmediate(InternalAbstractHttpAsyncClient.java:386)
        at org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient.lambda$doExecute$0(InternalAbstractHttpAsyncClient.java:242)
        at org.apache.hc.core5.http.nio.support.BasicRequestProducer.sendRequest(BasicRequestProducer.java:93)
        at org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient.doExecute(InternalAbstractHttpAsyncClient.java:209)
        at org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:96)
        at org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:106)
        at co.elastic.clients.transport.rest5_client.low_level.Rest5Client.lambda$performRequestAsync$1(Rest5Client.java:407)
        at co.elastic.clients.transport.rest5_client.low_level.Cancellable$RequestCancellable.runIfNotCancelled(Cancellable.java:98)
        at co.elastic.clients.transport.rest5_client.low_level.Rest5Client.performRequestAsync(Rest5Client.java:404)
        at co.elastic.clients.transport.rest5_client.low_level.Rest5Client.performRequestAsync(Rest5Client.java:391)
        at co.elastic.clients.transport.rest5_client.Rest5ClientHttpClient.performRequestAsync(Rest5ClientHttpClient.java:114)
        at co.elastic.clients.transport.ElasticsearchTransportBase.performRequestAsync(ElasticsearchTransportBase.java:190)
        at co.elastic.clients.elasticsearch.cat.ElasticsearchCatAsyncClient.indices(ElasticsearchCatAsyncClient.java:538)
        at co.elastic.clients.elasticsearch.cat.ElasticsearchCatAsyncClient.indices(ElasticsearchCatAsyncClient.java:576)
        ... 107 more

The original exception, seen before every subsequent request starts showing the reactor error, is that the connection is closed:

Caused by: java.util.concurrent.ExecutionException: org.apache.hc.core5.http.RequestNotExecutedException: Connection is closed
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
        at <snip - private code>
        ... 106 more
Caused by: org.apache.hc.core5.http.RequestNotExecutedException: Connection is closed
        at org.apache.hc.core5.http.nio.command.CommandSupport.cancelCommands(CommandSupport.java:68)
        at org.apache.hc.core5.http.impl.nio.AbstractHttp1StreamDuplexer.onDisconnect(AbstractHttp1StreamDuplexer.java:415)
        at org.apache.hc.core5.http.impl.nio.AbstractHttp1IOEventHandler.disconnected(AbstractHttp1IOEventHandler.java:95)
        at org.apache.hc.core5.http.impl.nio.ClientHttp1IOEventHandler.disconnected(ClientHttp1IOEventHandler.java:41)
        at org.apache.hc.core5.reactor.ssl.SSLIOSession$1.disconnected(SSLIOSession.java:253)
        at org.apache.hc.core5.reactor.InternalDataChannel.disconnected(InternalDataChannel.java:205)
        at org.apache.hc.core5.reactor.SingleCoreIOReactor.processClosedSessions(SingleCoreIOReactor.java:229)
        at org.apache.hc.core5.reactor.SingleCoreIOReactor.doExecute(SingleCoreIOReactor.java:131)
        at org.apache.hc.core5.reactor.AbstractSingleCoreIOReactor.execute(AbstractSingleCoreIOReactor.java:92)
        at org.apache.hc.core5.reactor.IOReactorWorker.run(IOReactorWorker.java:44)
        ... 1 more

I'm not surprised by the occasional connection close in the heavy environment, but as I've said, nothing has really changed except the update to 9.x and the httpclient5 transport. Immediately before this deployment we were on elasticsearch-java 8.18.0 with elasticsearch-rest-client 9.0.3.

What other information would be helpful? I am running more tests with the CPU constraint relaxed a bit, to see whether that brings out the situation, as a data point to help distinguish the other factors that differ between the systems (single node, same host, TLS, user auth). My next step after that is to switch back to elasticsearch-rest-client 9.0.4 and revert the client initialization changes, to see whether the issue reappears in this environment under load.
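In the meantime, since nothing recovers the reactor once it is down, we are considering detecting this terminal state and rebuilding the client when it occurs. A minimal sketch of such a detector, matching on the exception's class name to avoid a compile-time dependency on httpclient5 internals (the ReactorProbe class and its stand-in exception below are hypothetical, not our actual code):

```java
import java.util.concurrent.ExecutionException;

public class ReactorProbe {

    // Walks the cause chain of a failed request and reports whether it contains
    // httpclient5's IOReactorShutdownException. Matching on the simple class name
    // keeps this sketch free of a direct dependency on org.apache.hc.core5 classes.
    public static boolean isReactorShutdown(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur.getClass().getSimpleName().equals("IOReactorShutdownException")) {
                return true;
            }
        }
        return false;
    }

    // Stand-in for org.apache.hc.core5.reactor.IOReactorShutdownException, used
    // only so this sketch runs without the httpclient5 jar on the classpath.
    static class IOReactorShutdownException extends RuntimeException {
        IOReactorShutdownException(String msg) { super(msg); }
    }

    public static void main(String[] args) {
        // Mirror the shape of the reported trace: an ExecutionException from
        // CompletableFuture.get() wrapping the reactor error.
        Throwable failure = new ExecutionException(
                new IOReactorShutdownException("I/O reactor has been shut down"));
        System.out.println(isReactorShutdown(failure));                       // true
        System.out.println(isReactorShutdown(new RuntimeException("other"))); // false
    }
}
```

When this returns true, the caller would tear down and rebuild the ElasticsearchClient and its transport, since in our experience nothing else brings the reactor back.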

Thanks for your time!
