[websocket]: Handling timeouts/errors in from tcp_transport #882
WebSocket Client Timeout Handling Fix
Overview
This document describes a critical fix for the ESP WebSocket client that resolves timeout handling issues that could cause the client to enter a corrupted state and emit fragmented/random data.
Issue Description
Problem Summary
The WebSocket client was not properly handling transport timeout errors, causing it to:
Original Issue Report
Symptoms
Root Cause Analysis
The Core Problem
The WebSocket client's timeout detection logic was fundamentally flawed:
Why This Failed
Ambiguous Return Value: esp_transport_read() returns 0 in two distinct cases, one of which is a timeout (network_timeout_ms elapsed with no data), so a zero return is ambiguous on its own.
Insufficient Detection: The condition rlen == 0 && client->last_opcode == WS_TRANSPORT_OPCODES_NONE was unreliable.
State Corruption: When a timeout was misidentified as valid data, the client could dispatch WEBSOCKET_EVENT_DATA with len=0 and a potentially invalid opcode.
Transport Layer Analysis
TCP Transport Component Architecture
The ESP-IDF transport layer provides sophisticated error reporting through multiple layers:
Transport Layer Error Codes
The transport layer distinguishes between different types of "zero" returns:
ESP_ERR_ESP_TLS_CONNECTION_TIMEOUT | ERR_TCP_TRANSPORT_CONNECTION_TIMEOUT | 0
ESP_ERR_ESP_TLS_TCP_CLOSED_FIN | ERR_TCP_TRANSPORT_CONNECTION_CLOSED_BY_FIN | 0
Transport Layer Implementation
The transport layer (transport_ssl.c) already handles these cases properly:
The Fix
Solution Approach
Instead of trying to guess timeout conditions from WebSocket frame state, the fix leverages the transport layer's existing error reporting infrastructure.
Key Changes
1. Enhanced Timeout Detection
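The idea behind this change can be sketched on the host, without ESP-IDF. The sketch below assumes the transport records a "last error" that the client can consult after a zero-length read. All names prefixed with sketch_ are hypothetical stand-ins, not the ESP-IDF types or the code from this pull request.

```c
#include <assert.h>

/* Hypothetical stand-in for the error the transport layer recorded.
 * The real ESP-IDF codes and their values are not reproduced here. */
typedef enum {
    SKETCH_ERR_NONE = 0,
    SKETCH_ERR_TIMEOUT,     /* role of ESP_ERR_ESP_TLS_CONNECTION_TIMEOUT */
    SKETCH_ERR_CLOSED_FIN   /* role of ESP_ERR_ESP_TLS_TCP_CLOSED_FIN */
} sketch_last_error_t;

typedef enum {
    SKETCH_READ_DATA,       /* rlen > 0: payload bytes are available */
    SKETCH_READ_TIMEOUT,    /* nothing arrived in time: emit no data event */
    SKETCH_READ_CLOSED,     /* peer closed the connection */
    SKETCH_READ_EMPTY       /* zero-length read with no recorded error */
} sketch_read_result_t;

/* Classify a non-negative read length by asking the transport what
 * happened, instead of inferring it from WebSocket frame state. */
sketch_read_result_t sketch_classify_read(int rlen, sketch_last_error_t last_error)
{
    assert(rlen >= 0); /* negative returns belong to a separate error path */
    if (rlen > 0) {
        return SKETCH_READ_DATA;
    }
    switch (last_error) {
    case SKETCH_ERR_TIMEOUT:
        return SKETCH_READ_TIMEOUT;
    case SKETCH_ERR_CLOSED_FIN:
        return SKETCH_READ_CLOSED;
    case SKETCH_ERR_NONE:
    default:
        return SKETCH_READ_EMPTY;
    }
}
```

With this shape, a zero return accompanied by a recorded timeout never reaches the data-event path, which is the state corruption described earlier.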
2. Improved Error Handling for Negative Returns
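As a rough illustration of this change, the sketch below treats any negative read length as a transport failure: it formats a diagnostic message and tells the caller to abort the connection. The function name, the action enum, and the message format are assumptions made for the example, not the code from this pull request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical outcome of inspecting a read result. */
typedef enum {
    SKETCH_KEEP_CONNECTION,
    SKETCH_ABORT_CONNECTION
} sketch_action_t;

/* If rlen is negative, write a diagnostic into msg and request an abort.
 * Otherwise clear msg and leave the connection alone. */
sketch_action_t sketch_check_negative_read(int rlen, int sock_errno,
                                           char *msg, size_t msg_len)
{
    assert(msg != NULL && msg_len > 0);
    if (rlen < 0) {
        snprintf(msg, msg_len, "transport read failed: rlen=%d errno=%d",
                 rlen, sock_errno);
        return SKETCH_ABORT_CONNECTION;
    }
    msg[0] = '\0';
    return SKETCH_KEEP_CONNECTION;
}
```

Keeping the negative-return check separate from the zero-return classification means an actual transport failure is never confused with a timeout.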
3. Invalid Opcode Detection
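One way to picture this check is a guard that accepts only the opcodes defined by RFC 6455 (continuation, text, binary, close, ping, pong) and rejects everything else, including any out-of-range sentinel value. The function names below are hypothetical; the actual check in the client may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true only for opcodes defined by RFC 6455.
 * 0x3-0x7 and 0xB-0xF are reserved; anything above 0xF cannot appear
 * in the 4-bit opcode field of a frame header. */
bool sketch_ws_opcode_is_defined(unsigned opcode)
{
    switch (opcode) {
    case 0x0: /* continuation */
    case 0x1: /* text */
    case 0x2: /* binary */
    case 0x8: /* close */
    case 0x9: /* ping */
    case 0xA: /* pong */
        return true;
    default:
        return false;
    }
}

/* Small usage example: a data event would only be dispatched
 * when the opcode passes this guard. */
void sketch_opcode_selfcheck(void)
{
    assert(sketch_ws_opcode_is_defined(0x1));
    assert(!sketch_ws_opcode_is_defined(0x3));
}
```

Rejecting undefined opcodes before dispatch keeps a misread frame header from surfacing as application data.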
Benefits of the Fix
1. Leverages Existing Infrastructure
2. Comprehensive Error Handling
3. Prevents State Corruption
4. Better Debugging
Testing the Fix
Test Scenarios
Timeout Test
Empty Message Test
Connection Closure Test
Corruption Test
Expected Behavior After Fix
Implementation Details
Files Modified
components/esp_websocket_client/esp_websocket_client.c: esp_websocket_client_recv() function
Dependencies
esp_transport.h - Transport layer interface
esp_tls.h - TLS error handling
esp_log.h - Logging functionality
Configuration
No configuration changes required. The fix works with existing WebSocket client configuration.
Backward Compatibility
The fix is fully backward compatible:
Performance Impact
The fix has minimal performance impact:
Conclusion
This fix resolves a critical issue in the WebSocket client by properly leveraging the transport layer's error reporting infrastructure. The solution is:
The fix prevents the WebSocket client from entering corrupted states and ensures reliable operation in all network conditions.