
I get the error unavailable (14): Transport became inactive with a large number (>20) of unary requests #2202

Open
sergeiromanchuk opened this issue Feb 27, 2025 · 19 comments
Labels
status/triage Collecting information required to triage the issue. version/v1 Relates to v1

Comments

@sergeiromanchuk

sergeiromanchuk commented Feb 27, 2025

Describe the bug

I have a table view where each cell sends two gRPC requests using async let. If I scroll through the table slowly, I almost never encounter any problems, but if I start scrolling quickly, every request I send fails with the error unavailable (14): Transport became inactive.

This usually happens when the number of streams in use exceeds 20.

channel log
[{Default}] connectionUtilizationChanged(id:streamsUsed:streamCapacity:)::ChannelProvider.swift::191:19: [API v2] → ChannelProvider::connectionUtilizationChanged(id:streamsUsed:streamCapacity:). Connection ID: ObjectIdentifier(0x000000011ee58900). Stream capacity: 2147483647. Streams used: 21

channel error
[{Default}] connectionClosed(id:error:)::ChannelProvider.swift::205:19: [API v2] → ChannelProvider::connectionClosed(id:error:). Connection ID: ObjectIdentifier(0x000000011ee58900). Error: Optional(NIOCore.ChannelError.ioOnClosedChannel)

request error
[{Default}] xxx()::XXXService.swift::67:23: [API] → XXXService::xxx(). XXX service completed with result: success(unavailable (14): Transport became inactive)

trace
2025-02-27T09:45:36+0100 trace [CallOptions] : call_state=closed grpc.conn.addr_local=10.179.0.18 grpc.conn.addr_remote=10.229.0.160 grpc_request_id=5BFD6881-842F-42AB-95F1-AC0E322817FE [GRPC] failing buffered writes

SPM:
Protobuf 1.25.2
gRPC 1.65.1
gRPC-Swift 1.21.0

Platform: iOS

Expected behaviour

I expect the channel to handle this number of requests without issue, since it is only 20-30 requests.

Additional information

Options for each unary call:

static var unaryCallOptions: CallOptions {
    let timeLimit: TimeLimit = .timeout(.seconds(5))
    #if DEBUG
    var logger = Logger(label: "[\(Self.self)]")
    logger.logLevel = .trace
    return CallOptions(timeLimit: timeLimit, logger: logger)
    #else
    return CallOptions(timeLimit: timeLimit)
    #endif
}

Keepalive

let keepalive = ClientConnectionKeepalive(
    interval: .minutes(5),
    timeout: .seconds(30),
    permitWithoutCalls: true
)

GRPCChannelPool configuration:

let channel = try GRPCChannelPool.with(
    target: .hostAndPort(
        configuration.host,
        configuration.port
    ),
    transportSecurity: .tls(
        configuration.tlsConfiguration
    ),
    eventLoopGroup: configuration.eventLoopGroup,
    { [weak self] channelConfiguration in
        channelConfiguration.keepalive = configuration.keepalive
        channelConfiguration.delegate = self
        channelConfiguration.connectionBackoff = ConnectionBackoff(
            initialBackoff: 1,
            maximumBackoff: 15,
            multiplier: 1.5,
            minimumConnectionTimeout: 15
        )
        #if DEBUG
        var logger = Logger(label: "[Channel gRPC]")
        logger.logLevel = .trace
        channelConfiguration.backgroundActivityLogger = logger
        #endif
    }
)

Request example:

public func getData(
    for id: UUID,
    options: CallOptions
) async throws -> Data {
    let client = try dependency.getClient()
    async let task1 = client.getX1(
        X1Request(uids: [id]),
        callOptions: options
    )
    async let task2 = client.getX2(
        X2Request(uids: [id]),
        callOptions: options
    )
    let (response1, response2) = try await (task1, task2) // ! error !
    return ...
}

Generated code:

public func getX(
  _ request: Frontend_PpX_V1_XRequest,
  callOptions: CallOptions? = nil
) async throws -> Frontend_PpX_V1_XResponse {
  return try await self.performAsyncUnaryCall(
    path: Frontend_PpX_V1_XServiceClientMetadata.Methods.getX.path,
    request: request,
    callOptions: callOptions ?? self.defaultCallOptions,
    interceptors: self.interceptors?.makeXInterceptors() ?? []
  )
}

public func performAsyncUnaryCall<Request: Message & Sendable, Response: Message & Sendable>(
  path: String,
  request: Request,
  callOptions: CallOptions? = nil,
  interceptors: [ClientInterceptor<Request, Response>] = [],
  responseType: Response.Type = Response.self
) async throws -> Response {
  let call = self.channel.makeAsyncUnaryCall(
    path: path,
    request: request,
    callOptions: callOptions ?? self.defaultCallOptions,
    interceptors: interceptors
  )

  return try await withTaskCancellationHandler {
    try await call.response
  } onCancel: {
    call.cancel()
  }
}
@glbrntt glbrntt added status/triage Collecting information required to triage the issue. version/v1 Relates to v1 labels Feb 27, 2025
@glbrntt
Collaborator

glbrntt commented Feb 27, 2025

unavailable (14): Transport became inactive means that the connection was unexpectedly dropped (which is backed up by NIOCore.ChannelError.ioOnClosedChannel).

If the client closed the connection because of a local error condition we should see that. So the working hypothesis is that the server dropped the connection. It's unclear why that's the case.

Could you provide the complete trace logs for when this happens? It might shed some light on this.

@sergeiromanchuk
Author

sergeiromanchuk commented Feb 27, 2025

@glbrntt Are these logs of any value?

2025-02-27T09:45:36+0100 trace [CallOptions] : call_state=closed grpc.conn.addr_local=10.179.0.18 grpc.conn.addr_remote=10.229.0.160 grpc_request_id=5BFD6881-842F-42AB-95F1-AC0E322817FE [GRPC] failing buffered writes

@glbrntt
Collaborator

glbrntt commented Feb 27, 2025

No, that will be a result of the connection being dropped, not the reason it drops.

@sergeiromanchuk
Author

sergeiromanchuk commented Feb 27, 2025

GRPCTraceLogs.txt

cc: @glbrntt

@sergeiromanchuk
Author

In my code, I do the following

task?.cancel()
task = Task {
    let data = try await service.getData(...)
    // handle data
}

I noticed that the error appears when I cancel the task that executes the request. By removing the cancellation of the task, I found that the number of executed requests goes up to 1000+ with no problem. Why is that?

cc: @glbrntt

@sergeiromanchuk
Author

I also found that if a request hits its timeout, the channel goes down as well.

LogsWithGRPCPool.txt.zip

@sergeiromanchuk
Author

NoSuchStream logs

While debugging, I also found that gRPC calls channelRead when it receives a response and tries to look up the corresponding stream, which was previously closed and removed. It then raises a NoSuchStream error.



2025-02-27T14:45:58+0100 error [Channel gRPC] : error=NoSuchStream(streamID: HTTP2StreamID(233), location: "NIOHTTP2/ConnectionStreamsState.swift:305") grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 [GRPC] grpc client error
2025-02-27T14:45:58+0100 error [Channel gRPC] : error=ioOnClosedChannel grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 [GRPC] grpc client error
2025-02-27T14:45:58+0100 debug [Channel gRPC] : connectivity_state=ready grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 [GRPC] deactivating connection
2025-02-27T14:45:58+0100 debug [Channel gRPC] : delay=0 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 [GRPC] scheduling connection attempt
2025-02-27T14:45:58+0100 debug [Channel gRPC] : grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 new_state=transientFailure old_state=ready [GRPC] connectivity state change
2025-02-27T14:45:58+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=BEE94883-7EFB-4287-8F56-60F6BB3EFF76 [GRPC] grpc client error
2025-02-27T14:45:58+0100 trace [CallOptions] : call_state=closed grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=BEE94883-7EFB-4287-8F56-60F6BB3EFF76 [GRPC] failing buffered writes
2025-02-27T14:45:58+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=F0BE824A-04A4-4074-9A5B-43700DF739EB [GRPC] grpc client error
2025-02-27T14:45:58+0100 trace [CallOptions] : call_state=closed grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=F0BE824A-04A4-4074-9A5B-43700DF739EB [GRPC] failing buffered writes
2025-02-27T14:45:58+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=95EFBB4C-09FF-44BF-8DB5-D8E11ADD7262 [GRPC] grpc client error
2025-02-27T14:45:58+0100 trace [CallOptions] : call_state=closed grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=95EFBB4C-09FF-44BF-8DB5-D8E11ADD7262 [GRPC] failing buffered writes
2025-02-27T14:45:58+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=95F10C40-D09F-493F-87A0-2C322C25653E [GRPC] grpc client error
2025-02-27T14:45:58+0100 trace [CallOptions] : call_state=closed grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=95F10C40-D09F-493F-87A0-2C322C25653E [GRPC] failing buffered writes
2025-02-27T14:45:58+0100 debug [Channel gRPC] : connectivity_state=transientFailure grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/3 [GRPC] vending multiplexer future

@sergeiromanchuk
Author

https://stackoverflow.com/questions/69422236/grpc-swift-idle-stream-cause-next-rpc-call-to-fail-with-unavailable-14-due-to

@sergeiromanchuk
Author

It looks like the main problem is that a particular stream cannot be found in the list of streams, after which the channel is closed. If I load data into a cell and cancel the tasks that are no longer needed, I always catch the error on stream 29.

(lldb) p underlyingError
(NIOHTTP2.NIOHTTP2Errors.NoSuchStream) {
  streamID = (networkStreamID = 221)
  location = "NIOHTTP2/ConnectionStreamsState.swift:305"
}

Why is that happening?

cc: @glbrntt

@sergeiromanchuk
Author

Found related issue: #1421

@sergeiromanchuk
Author

sergeiromanchuk commented Feb 27, 2025

I tried creating one channel per request, and it did help. However, it's unworkable because the loading time becomes incredibly long. I really need a solution that lets the channel survive a large number of requests. I believe this behavior is indeed a bug.

I also haven't found any way to reduce the channel's connection time to the server, so the question is still relevant.

I also can't use the new version of the library (2.0.0) because it only supports iOS 18, while our project supports iOS 15.

cc: @glbrntt

@sergeiromanchuk
Author

Basically, I understand it like this, and I'm almost sure it's true. We cancel a stream, and it gets added to recentlyResetStreams. This list has a limit of 32 streams; when it fills up, the oldest entry is evicted and the new one is added.

If we scroll quickly, this reset list fills up very fast, and streams get evicted from it just as quickly. Then packets arrive for old streams/requests and we try to process them. We check whether the corresponding stream exists in activeStreams: it doesn't. Then we check recentlyResetStreams, but the stream isn't there either, because the list filled up with new entries and our stream was evicted before the packet arrived.

As a result, we return a noSuchStream error and the channel closes. That's it.

Oh, and this limit of 32 is hardcoded, so we can't change it. The TODO for making it configurable has been there since 2019, so I guess it's about time. 😅
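
Sketched as a simplified model of that eviction behaviour (a hypothetical illustration, not the real NIOHTTP2 code, which lives in ConnectionStreamsState.swift):

// Simplified, hypothetical model of the bounded reset-stream history described
// above; NIOHTTP2's real implementation differs in detail.
struct RecentlyResetStreams {
    // NIOHTTP2 hard-codes a comparable limit (32) for its reset-stream history.
    private let maxSize = 32
    private var streamIDs: [Int] = []

    // Remember a stream we just reset (e.g. because its RPC was cancelled).
    mutating func recordReset(of streamID: Int) {
        if streamIDs.count == maxSize {
            streamIDs.removeFirst() // the oldest entry is evicted once the list is full
        }
        streamIDs.append(streamID)
    }

    func contains(_ streamID: Int) -> Bool {
        streamIDs.contains(streamID)
    }
}

// When a frame arrives for a stream that is neither active nor in this list,
// the connection treats it as NoSuchStream and tears the channel down, which
// is what surfaces to callers as "Transport became inactive".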



@sergeiromanchuk
Author

What do you think?

cc: @jackystd @glbrntt

@glbrntt
Collaborator

glbrntt commented Feb 28, 2025

Hey @sergeiromanchuk -- thanks for providing all of this info and good job on diagnosing the issue!

I think allowing the max reset streams limit to be configurable and then exposing that config in gRPC is reasonable. I'll open a separate issue in https://github.com/apple/swift-nio-http2 to track doing that. We can track the gRPC changes in this issue.

In the meantime you can work around this by not cancelling the streams.
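
For example, something along these lines (a rough sketch, not grpc-swift API; it just keeps your own Task cancellation from reaching the RPC):

// Wrap any async RPC so that cancelling the caller does not cancel the stream.
func shieldedFromCancellation<T: Sendable>(
    _ rpc: @escaping @Sendable () async throws -> T
) async throws -> T {
    // An unstructured Task does not inherit the caller's cancellation, so the
    // generated client's call.cancel() is never invoked and no RST_STREAM is sent.
    let task = Task { try await rpc() }
    // If the caller is cancelled while awaiting, the result is simply discarded
    // by the caller, but the RPC itself runs to completion.
    return try await task.value
}

// Usage, with the getData(for:options:) from your request example:
// let data = try await shieldedFromCancellation {
//     try await service.getData(for: id, options: options)
// }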

@sergeiromanchuk
Author

Thanks for the reply @glbrntt. Unfortunately, it's not enough to simply stop cancelling streams; I'll also have to limit the number of calls, because the channel also drops with an error if a request completes with a timeout. I think that is a separate issue.
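
Roughly, the limiting would look like this (a sketch with a hypothetical fetch closure standing in for getData(for:options:), not my exact production code):

import Foundation

// Cap the number of in-flight RPCs; results come back in completion order.
func loadAll(
    ids: [UUID],
    maxConcurrent: Int = 4,
    fetch: @escaping @Sendable (UUID) async throws -> Data
) async throws -> [Data] {
    try await withThrowingTaskGroup(of: Data.self) { group in
        var results: [Data] = []
        var iterator = ids.makeIterator()

        // Seed the group with at most maxConcurrent requests.
        for _ in 0..<maxConcurrent {
            guard let id = iterator.next() else { break }
            group.addTask { try await fetch(id) }
        }
        // Start the next request only when one finishes, so no more than
        // maxConcurrent streams are ever open at once.
        while let data = try await group.next() {
            results.append(data)
            if let id = iterator.next() {
                group.addTask { try await fetch(id) }
            }
        }
        return results
    }
}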

@glbrntt
Collaborator

glbrntt commented Feb 28, 2025

Here's the HTTP/2 issue: apple/swift-nio-http2#493

@glbrntt
Collaborator

glbrntt commented Feb 28, 2025

I'll also have to limit the number of calls, because the channel also drops with an error if a request completes with a timeout.

Can you elaborate on this one? An RPC timing out won't close the connection, but an RPC may time out because of a connection timeout.
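
If it helps to tell the two apart on the client, something like this should work (a rough sketch; failed async calls typically surface a GRPCStatus, and getData here stands in for the hypothetical call from your example):

import Foundation
import GRPC

// Distinguish an RPC deadline from a dropped transport on the client side.
func fetchOrNil(_ getData: () async throws -> Data) async throws -> Data? {
    do {
        return try await getData()
    } catch let status as GRPCStatus where status.code == .deadlineExceeded {
        // The RPC hit its own time limit; the connection itself stays up.
        return nil
    } catch let status as GRPCStatus where status.code == .unavailable {
        // The transport went away ("Transport became inactive"); the pool
        // will schedule a reconnect.
        return nil
    }
}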

@sergeiromanchuk
Author

When I sent a lot of tasks (~1000) without cancelling them, the connection would process them all for about 3 minutes and they would fail with a timeout error. After all the streams were processed, the connection would also close with an ioOnClosedChannel error.

@sergeiromanchuk
Author

sergeiromanchuk commented Feb 28, 2025

@glbrntt I don't cancel unary calls, and I limit parallel execution to 4 tasks. However, the channel still closes with an error. It seems the problem goes much deeper. Please examine the logs around the ioOnClosedChannel entries. (It is not a noSuchStream issue in this case.)

Logs1.txt
Logs2.txt
