[Bug] Can't establish connection to Redis during AWS MemoryDB failover

Fred version - 10.1.0
Redis version - Redis 7.1 (AWS MemoryDB)
Platform - linux
Deployment type - cluster

**Describe the bug**

When a failover occurs in AWS MemoryDB, fred cannot reconnect to the cluster for ~10 minutes, even though the failover itself completes within seconds.

All requests return "Timeout Error: Request timed out."

**To Reproduce**

I've prepared a minimal reproducer at https://github.com/dmitryvk/fred-repro-failover

Steps to reproduce the behavior:

1. Clone the repo https://github.com/dmitryvk/fred-repro-failover
2. Run `docker compose -f deps/docker-compose.yml up -d && ./print-addrs.sh && cargo run`
3. After 10 seconds, it prints:
```
thread 'main' panicked at src/main.rs:32:37:
called `Result::unwrap()` on an `Err` value: Error { details: "Request timed out.", kind: Timeout }
```


**Logs**
(If possible set `RUST_LOG=fred=trace` and run with `--features debug-ids`)

```
connecting
2025-07-29T12:52:39.326481Z DEBUG fred::router::commands: fred-ZrJNA4t5uH: Initializing router with policy: None    
2025-07-29T12:52:39.326542Z DEBUG fred::router::centralized: fred-ZrJNA4t5uH: Initializing centralized connection.    
2025-07-29T12:52:39.326585Z TRACE fred::protocol::connection: fred-ZrJNA4t5uH: Checking connection type. Native-tls: false, Rustls: false    
2025-07-29T12:52:39.326808Z DEBUG hickory_proto::xfer::dns_handle: querying: redis.redis. A
2025-07-29T12:52:39.326873Z DEBUG hickory_resolver::name_server::name_server_pool: sending request: [Query { name: Name("redis.redis."), query_type: A, query_class: IN }]
2025-07-29T12:52:39.326932Z DEBUG hickory_resolver::name_server::name_server: reconnecting: NameServerConfig { socket_addr: 127.0.0.1:1053, protocol: Udp, tls_dns_name: None, http_endpoint: None, trust_negative_responses: true, bind_addr: None }
2025-07-29T12:52:39.326996Z DEBUG hickory_proto::xfer: enqueueing message:QUERY:[Query { name: Name("redis.redis."), query_type: A, query_class: IN }]
2025-07-29T12:52:39.327084Z DEBUG hickory_proto::udp::udp_client_stream: final message: ; header 23987:QUERY:RD:NoError:QUERY:0/0/0
; query
;; redis.redis. IN A

2025-07-29T12:52:39.327200Z TRACE hickory_proto::udp::udp_stream: binding UDP socket port=1028
2025-07-29T12:52:39.327267Z DEBUG hickory_proto::udp::udp_stream: created socket successfully
2025-07-29T12:52:39.327340Z TRACE hickory_proto::udp::udp_client_stream: creating UDP receive buffer with size 512
2025-07-29T12:52:39.327825Z TRACE hickory_proto::rr::record_data: reading A
2025-07-29T12:52:39.327859Z TRACE hickory_proto::rr::record_data: reading A
2025-07-29T12:52:39.327885Z DEBUG hickory_proto::udp::udp_client_stream: received message id: 23987
2025-07-29T12:52:39.327927Z DEBUG hickory_proto::error: response: ; header 23987:RESPONSE:RD,AA,RA:NoError:QUERY:2/0/0
; query
;; redis.redis. IN A
; answers 2
redis.redis. 0 IN A 172.29.0.2
redis.redis. 0 IN A 172.29.0.3
; nameservers 0
; additionals 0

2025-07-29T12:52:39.327974Z DEBUG hickory_proto::error: response: ; header 23987:RESPONSE:RD,AA,RA:NoError:QUERY:2/0/0
; query
;; redis.redis. IN A
; answers 2
redis.redis. 0 IN A 172.29.0.2
redis.redis. 0 IN A 172.29.0.3
; nameservers 0
; additionals 0

resolved [172.29.0.2:6379, 172.29.0.3:6379]
2025-07-29T12:52:39.328099Z DEBUG fred::protocol::connection: fred-ZrJNA4t5uH: Creating TCP connection to redis.redis at 172.29.0.2:6379    
2025-07-29T12:52:39.328387Z TRACE fred::protocol::codec: fred-ZrJNA4t5uH: Encoded 14 bytes to redis.redis:6379. Buffer len: 14 (RESP2)    
2025-07-29T12:52:49.329987Z DEBUG fred::modules::inner: fred-ZrJNA4t5uH: No `on_error` listener. The error was: Error { details: "Request timed out.", kind: Timeout }    
2025-07-29T12:52:49.330043Z TRACE fred::runtime::_tokio: fred-ZrJNA4t5uH: Ending connection task with Err(Error { details: "Request timed out.", kind: Timeout })    

thread 'main' panicked at src/main.rs:32:37:
called `Result::unwrap()` on an `Err` value: Error { details: "Request timed out.", kind: Timeout }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2025-07-29T12:52:49.330495Z DEBUG hickory_proto::xfer::dns_exchange: io_stream is done, shutting down
```

**My analysis**

When a failover happens in AWS MemoryDB, the DNS name for the MemoryDB endpoint resolves to two A records: one for the failed cluster and one for the functioning (promoted) cluster. Both endpoints accept connections, but the endpoint for the failed cluster sends nothing back after accepting the connection.
I have simulated that situation using docker and dnsmasq: there is one DNS entry for the functioning cluster and one for the failed cluster.
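
For anyone who wants to approximate the stalled endpoint without docker, here is a minimal std-only sketch of a listener that accepts connections and then stays silent. The name `spawn_silent_endpoint` is made up for illustration, and this is not how the linked reproducer is built; it only mimics the "accepts but never replies" behaviour described above.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Hypothetical stand-in for the failed endpoint: accepts TCP connections
/// on an ephemeral local port and never writes a byte back.
pub fn spawn_silent_endpoint() -> io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        // Hold on to accepted sockets so the peer sees an open, silent
        // connection rather than an immediate close.
        let mut held: Vec<TcpStream> = Vec::new();
        for conn in listener.incoming() {
            if let Ok(stream) = conn {
                held.push(stream);
            }
        }
    });
    Ok(addr)
}
```

A client that connects to the returned address and waits for a reply should only ever hit its own read timeout, which is the symptom seen in the logs.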

I can see that fred establishes a TCP connection to the first endpoint and discards the other entries. The endpoint that it connected to is non-functioning, and fred keeps using that endpoint and gets only timeouts.
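
To illustrate the behaviour I would expect instead, here is a rough std-only sketch that tries every resolved address in order and keeps the first one that actually answers a `PING`. The function names and the per-address timeout are my own assumptions; this is not fred's API or its internal connection logic, just the general idea.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical helper (not part of fred): returns the first address that
/// both accepts the connection and replies to a RESP `PING`.
/// `per_addr_timeout` must be non-zero.
pub fn connect_first_responsive(
    addrs: &[SocketAddr],
    per_addr_timeout: Duration,
) -> io::Result<(SocketAddr, TcpStream)> {
    let mut last_err = io::Error::new(io::ErrorKind::InvalidInput, "no addresses to try");
    for addr in addrs {
        match probe(addr, per_addr_timeout) {
            Ok(stream) => return Ok((*addr, stream)),
            // Remember the failure and fall through to the next A record.
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}

/// Connects to one address and waits for any reply to `PING`.
fn probe(addr: &SocketAddr, timeout: Duration) -> io::Result<TcpStream> {
    let mut stream = TcpStream::connect_timeout(addr, timeout)?;
    stream.set_read_timeout(Some(timeout))?;
    stream.set_write_timeout(Some(timeout))?;
    stream.write_all(b"*1\r\n$4\r\nPING\r\n")?;

    // Any reply bytes (for example `+PONG` or an auth error) show that the
    // server is responding; a stalled endpoint runs into the read timeout.
    let mut buf = [0u8; 64];
    let n = stream.read(&mut buf)?;
    if n == 0 {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "peer closed the connection without replying",
        ));
    }
    Ok(stream)
}
```

With the two addresses from the log above, a probe like this would give up on the silent endpoint after the per-address timeout and move on to the second entry instead of returning timeouts indefinitely. A real client would also need to handle TLS, authentication and partial replies, which this sketch ignores.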

[Bug] Can't establish connection to Redis during AWS MemoryDB failover #358

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[Bug] Can't establish connection to Redis during AWS MemoryDB failover #358

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions