Description
The SFTP binding (added in v1.15) has critical connection lifecycle management issues when used with Axway MFT (SecureTransport). Any operation that encounters an error permanently corrupts the SFTP connection, causing ALL subsequent operations to fail until Dapr restart.
This makes the binding unsuitable for production use with enterprise SFTP servers.
Expected Behavior
- Multiple file operations (upload, download, list) should work sequentially within the same Dapr session
- Connection errors should be handled gracefully with automatic connection cleanup and recreation
- Failed operations should not prevent subsequent operations from succeeding
- The SFTP binding should work consistently with both OpenSSH and enterprise SFTP servers like Axway MFT
Actual Behavior
Critical Pattern: Connection Poisoning
Any operation that encounters an error permanently corrupts the SFTP connection:
- First operation that errors (upload, download) → Connection becomes poisoned
- All subsequent operations fail (including read-only operations like list)
- Only fix: Restart Dapr to get a fresh connection
The pattern is NOT "the second operation fails"; it is "the first error poisons the connection".
Detailed Test Scenarios
Scenario 1: Upload Triggers Connection Poisoning
Upload 1: Success
Upload 2: FAILS - mkdir upload/: not a directory
[Connection now poisoned]
List: FAILS - permission denied
Get: FAILS - permission denied
Upload 3: FAILS - permission denied
Result: All operations fail until Dapr restart
Scenario 2: List Works Indefinitely (Proof Connection Can Be Stable)
List 1: Success
List 2: Success
List 3: Success
...
List 100: Success
Result: Can repeat indefinitely without failures
Scenario 3: Get (Download) Triggers Connection Poisoning
Setup: 5 files exist in upload directory
Get File 1: Success
Get File 2: FAILS - permission denied or path error
[Connection now poisoned]
Get File 3: FAILS - permission denied
Get File 4: FAILS - permission denied
Get File 5: FAILS - permission denied
List: FAILS - permission denied
Upload: FAILS - permission denied
Result: All operations fail until Dapr restart
Steps to Reproduce
Environment Setup
SFTP Binding Configuration:
```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: mft-sftp-dev
spec:
  type: bindings.sftp
  version: v1
  metadata:
    - name: rootPath
      value: "upload"
    - name: address
      value: "10.67.0.238:2222"
    - name: username
      value: "opfedwtsftp"
    - name: privateKey
      value: |
        -----BEGIN OPENSSH PRIVATE KEY-----
        [key content]
        -----END OPENSSH PRIVATE KEY-----
    - name: insecureIgnoreHostKey
      value: "true"
scopes:
  - mft-integration-service
```

SFTP Server:
- Type: Axway MFT (SecureTransport)
- Directory `upload/` exists with permissions `755` (drwxr-xr-x)
Test Case 1: Upload Poisoning
First upload - SUCCESS:
```bash
curl -X POST http://localhost:3500/v1.0/bindings/mft-sftp-dev \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "create",
    "data": "test content",
    "metadata": {
      "fileName": "test1.yaml"
    }
  }'
```
Response: `{"fileName":"test1.yaml"}`
Status: Success
Second upload - FAILURE (Triggers Poisoning):
```bash
curl -X POST http://localhost:3500/v1.0/bindings/mft-sftp-dev \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "create",
    "data": "test content",
    "metadata": {
      "fileName": "test2.yaml"
    }
  }'
```
Response:
```json
{
  "errorCode": "ERR_INVOKE_OUTPUT_BINDING",
  "message": "error invoking output binding mft-sftp-dev: sftp binding error: error create dir upload/: mkdir upload/: not a directory"
}
```
Status: Failed - Connection now poisoned
Dapr Log:
```text
time="2025-10-28T15:09:41.549751+08:00" level=debug msg="api error: code = OK desc = error invoking output binding mft-sftp-dev: sftp binding error: error create dir upload/: mkdir upload/: not a directory"
```
Subsequent list - FAILS (Poisoned Connection):
```bash
curl -d '{"operation": "list"}' \
  http://localhost:3500/v1.0/bindings/mft-sftp-dev
```
Response:
```json
{
  "errorCode": "ERR_INVOKE_OUTPUT_BINDING",
  "message": "error invoking output binding mft-sftp-dev: sftp binding error: error read dir upload: permission denied"
}
```
Status: Failed - Even the read-only operation fails because of the poisoned connection
Test Case 2: List Operations Work Indefinitely
Without any errors, list operations succeed repeatedly:
```bash
# Test 50 consecutive list operations
for i in {1..50}; do
  curl -d '{"operation": "list"}' http://localhost:3500/v1.0/bindings/mft-sftp-dev
done
```
Result: All 50 succeed
Conclusion: This proves that the connection can remain stable when no errors occur.
Test Case 3: Get (Download) Poisoning
After Dapr restart:
First download - SUCCESS:
```bash
curl -d '{"operation": "get", "metadata": {"fileName": "test1.yaml"}}' \
  http://localhost:3500/v1.0/bindings/mft-sftp-dev
```
Response: `"test content"`
Status: Success
Second download - FAILURE (Triggers Poisoning):
```bash
curl -d '{"operation": "get", "metadata": {"fileName": "test1.yaml"}}' \
  http://localhost:3500/v1.0/bindings/mft-sftp-dev
```
Response:
```json
{
  "errorCode": "ERR_INVOKE_OUTPUT_BINDING",
  "message": "error invoking output binding mft-sftp-dev: sftp binding error: error open file /upload/test1.yaml: permission denied"
}
```
Status: Failed - Connection now poisoned
Note: The error shows the absolute path /upload/test1.yaml, which suggests a path construction issue.
Subsequent operations all fail:
```bash
# Try list
curl -d '{"operation": "list"}' http://localhost:3500/v1.0/bindings/mft-sftp-dev
# Result: FAILS - permission denied
# Try upload
curl -X POST ... # FAILS - permission denied
# Try another download
curl -d '{"operation": "get", ...}' # FAILS - permission denied
```
All operations fail until Dapr restart.
Proof It's NOT an Axway Issue
Manual SFTP Session - All Operations Work
```bash
sftp -P 2222 opfedwtsftp@10.67.0.238
```
Multiple uploads in the same session:
```text
sftp> cd upload
sftp> put test1.yaml
test1.yaml 100% 11 0.1KB/s 00:00
sftp> put test2.yaml
test2.yaml 100% 11 0.1KB/s 00:00
sftp> put test3.yaml
test3.yaml 100% 11 0.1KB/s 00:00
```
Result: All succeed
Multiple downloads in same session:
```text
sftp> get test1.yaml
test1.yaml 100% 11 0.3KB/s 00:00
sftp> get test1.yaml
test1.yaml 100% 11 0.3KB/s 00:00
sftp> get test1.yaml
test1.yaml 100% 11 0.3KB/s 00:00
```
Result: All succeed
Multiple list operations in same session:
```text
sftp> ls -l
-rw-r--r-- 1 1283530774 1283530774 11 Oct 27 12:52 test1.yaml
-rw-r--r-- 1 1283530774 1283530774 11 Oct 28 15:30 test2.yaml
sftp> ls -l
-rw-r--r-- 1 1283530774 1283530774 11 Oct 27 12:52 test1.yaml
-rw-r--r-- 1 1283530774 1283530774 11 Oct 28 15:30 test2.yaml
```
Result: All succeed
Conclusion: Axway MFT handles multiple operations correctly in a single SFTP session. The issue is in Dapr's SFTP binding connection lifecycle management.
Comparison: OpenSSH vs Axway MFT
Local Test with OpenSSH (atmoz/sftp)
Binding config:
metadata:
- name: rootPath
value: "upload"
- name: address
value: "localhost:2222"
- name: username
value: "testuser"
- name: password
value: "password"Test results:
```bash
# First upload
curl -X POST http://localhost:3500/v1.0/bindings/local-sftp \
  -d '{"operation": "create", "data": "test", "metadata": {"fileName": "test1.yaml"}}'
# Result: Success
# Second upload
curl -X POST http://localhost:3500/v1.0/bindings/local-sftp \
  -d '{"operation": "create", "data": "test", "metadata": {"fileName": "test2.yaml"}}'
# Result: Success
# Third upload
curl -X POST http://localhost:3500/v1.0/bindings/local-sftp \
  -d '{"operation": "create", "data": "test", "metadata": {"fileName": "test3.yaml"}}'
# Result: Success
```
Multiple operations work with OpenSSH.
Analysis: Why OpenSSH Works But Axway Fails
| Aspect | OpenSSH | Axway MFT | Result |
|---|---|---|---|
| mkdir on existing dir | Returns "Failure", continues | Returns "not a directory", fails | Dapr treats Axway error as fatal |
| Session management | Lenient, stateless | Stricter, stateful validation | Axway exposes connection bugs |
| Error codes | Standard SFTP errors | Enterprise-specific errors | Dapr doesn't handle gracefully |
| Path validation | Minimal | Strict | Absolute paths like /upload/file fail |
| Error handling | Doesn't break connection | Breaks Dapr's connection state | Connection poisoning occurs |
Conclusion: OpenSSH's lenient error handling masks Dapr's connection management bugs. Axway's strict validation exposes them.
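A client can avoid depending on the server's exact error text for mkdir on an existing directory ("Failure" from OpenSSH versus "not a directory" reported for Axway). One way to do this is to check for the directory before and after calling mkdir. The Go sketch below assumes a hypothetical `dirChecker` interface. With `github.com/pkg/sftp`, it could presumably be backed by `Client.Stat` (checking `IsDir()`) and `Client.Mkdir`. `strictServer` is an in-memory stand-in that only mimics the behavior reported in this issue. It is not a model of the real Axway server.

```go
package main

import (
	"errors"
	"fmt"
)

// dirChecker is a hypothetical minimal view of the remote filesystem.
type dirChecker interface {
	IsDir(p string) (bool, error)
	Mkdir(p string) error
}

var errNotExist = errors.New("no such file")

// strictServer mimics the reported Axway behavior in memory: mkdir on an
// existing directory fails with "not a directory" instead of a generic failure.
type strictServer struct{ dirs map[string]bool }

func (s *strictServer) IsDir(p string) (bool, error) {
	if s.dirs[p] {
		return true, nil
	}
	return false, errNotExist
}

func (s *strictServer) Mkdir(p string) error {
	if s.dirs[p] {
		return fmt.Errorf("mkdir %s: not a directory", p)
	}
	s.dirs[p] = true
	return nil
}

// ensureDir treats an existing directory as success, whatever error text the
// server would return for mkdir, by checking before and after the mkdir call.
func ensureDir(c dirChecker, dir string) error {
	if ok, err := c.IsDir(dir); err == nil && ok {
		return nil // already present: no mkdir is sent at all
	}
	mkErr := c.Mkdir(dir)
	if mkErr == nil {
		return nil
	}
	// mkdir failed; accept it if the directory exists after all
	if ok, err := c.IsDir(dir); err == nil && ok {
		return nil
	}
	return mkErr
}
```

Because the existing-directory case never reaches mkdir, the server-specific error string stops mattering for the common path.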
Root Cause Analysis
The Real Issue: Broken Connection Lifecycle Management
The SFTP binding in bindings/sftp/sftp.go has fundamentally broken error handling that poisons connections:
- First operation that encounters an error (mkdir, permission, path issue)
- Error handling does NOT properly clean up the SFTP connection state
- Connection becomes poisoned/corrupted but remains in the connection pool
- All subsequent operations reuse the poisoned connection
- No connection validation, health checks, or recreation occurs
- Only fix: Restart Dapr to force new connection
Why List Works But Create/Get Don't
| Operation | Behavior | Triggers Errors? | Connection Impact |
|---|---|---|---|
| list | Read-only, no mkdir, no path changes | No | Connection stays clean |
| create | Calls mkdir, writes files | Yes | Triggers mkdir error → Poisons connection |
| get | Reads specific file, path validation | Yes | Triggers permission/path error → Poisons connection |
Key Insight: List operations succeed indefinitely because they don't trigger the operations that cause errors with Axway MFT's strict validation.
Multiple Bugs Identified in bindings/sftp/sftp.go
1. Connection Poisoning (Critical)
Location: Invoke() method error handling
Problem:
```go
// Current (broken)
func (s *SFTP) Invoke(ctx context.Context, req *bindings.InvokeRequest) (*bindings.InvokeResponse, error) {
	// Performs operation
	// If error occurs, returns error but keeps broken connection
	// No cleanup, no marking connection as unhealthy
}
```
Fix Needed:
- Mark connection as unhealthy on error
- Close poisoned connection
- Validate connection health before reuse
- Automatic reconnection on next operation
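The four fixes above can be combined into a single guard around every operation: mark the session unhealthy on error, close it, and reconnect lazily before the next call. This is a minimal sketch under stated assumptions, not the binding's code. `session` stands in for the binding's SSH + SFTP client pair, and `dial` is a hypothetical constructor for a fresh pair. `ping` is an optional lightweight health check; with `github.com/pkg/sftp`, something like `Client.Getwd` might serve. `fakeSession` exists only to simulate the reported "stays broken after one error" behavior.

```go
package main

import (
	"errors"
	"sync"
)

// session stands in for the binding's SSH + SFTP client pair (hypothetical).
type session interface {
	Close() error
}

// guardedConn tracks connection health and recreates the session lazily.
type guardedConn struct {
	mu      sync.Mutex
	sess    session
	healthy bool
	dial    func() (session, error) // hypothetical: opens a fresh SSH + SFTP session
	ping    func(session) error     // optional lightweight health check
}

// do runs op on a known-good session. After a failed health check or a failed
// previous operation, the old session is closed and replaced first; a failing
// op marks the session unhealthy so the next call reconnects.
func (g *guardedConn) do(op func(session) error) error {
	g.mu.Lock()
	defer g.mu.Unlock()

	if g.healthy && g.ping != nil && g.ping(g.sess) != nil {
		g.healthy = false
	}
	if !g.healthy {
		if g.sess != nil {
			_ = g.sess.Close() // discard the possibly poisoned session
			g.sess = nil
		}
		s, err := g.dial()
		if err != nil {
			return err
		}
		g.sess, g.healthy = s, true
	}
	if err := op(g.sess); err != nil {
		g.healthy = false
		return err
	}
	return nil
}

// errPoisoned and fakeSession only simulate the reported failure mode:
// once a session has failed, every later call on it fails too.
var errPoisoned = errors.New("permission denied")

type fakeSession struct {
	poisoned bool
	closed   bool
}

func (f *fakeSession) Close() error {
	f.closed = true
	return nil
}

// call simulates one SFTP operation; fail=true makes it error and poison the session.
func (f *fakeSession) call(fail bool) error {
	if f.poisoned || fail {
		f.poisoned = true
		return errPoisoned
	}
	return nil
}
```

Holding the mutex across the operation serializes calls on one connection. That is the simplest safe choice; parallel transfers would need a pool of guarded sessions instead.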
2. No Connection Validation
Location: Missing from Invoke() method
Problem: No health check before operations
Fix Needed: Add validateConnection() before each operation
3. No Connection Recreation
Location: Missing reconnection logic
Problem: No automatic recovery from errors
Fix Needed: Add reconnect() method
4. Unnecessary mkdir on Every Upload
Location: create() method
Problem:
```go
// Always calls mkdir, even on rootPath
dir := filepath.Dir(fullPath)
s.client.MkdirAll(dir) // BUG: Calls mkdir on configured rootPath
```
Fix Needed: Only mkdir for subdirectories, not rootPath
5. No Directory Existence Caching
Location: SFTP struct
Problem: No dirCache field
Fix Needed: Add caching to prevent repeated mkdir
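Bugs 4 and 5 can be addressed together: skip mkdir when the file goes directly into `rootPath`, and remember directories that have already been created. In this sketch, `mkdirAll` is injected; in the binding it would presumably be the `pkg/sftp` `Client.MkdirAll` method value. The names `dirCache`, `ensureFor`, and `reset` are hypothetical. `reset` is meant to be called after reconnecting, matching the "Clear directory cache" step of the proposed `reconnect()`.

```go
package main

import (
	"path"
	"sync"
)

// dirCache remembers remote directories already created or verified, so
// repeated uploads do not re-send mkdir. All names here are hypothetical.
type dirCache struct {
	mu       sync.Mutex
	rootPath string
	known    map[string]bool
	mkdirAll func(dir string) error // e.g. the pkg/sftp Client.MkdirAll method value
}

func newDirCache(rootPath string, mkdirAll func(string) error) *dirCache {
	return &dirCache{rootPath: rootPath, known: map[string]bool{}, mkdirAll: mkdirAll}
}

// ensureFor makes sure the parent directory of fileName (relative to rootPath)
// exists. Files placed directly in rootPath never trigger mkdir, because the
// configured root is assumed to exist already.
func (c *dirCache) ensureFor(fileName string) error {
	relDir := path.Dir(fileName) // SFTP paths are slash-separated
	if relDir == "." || relDir == "/" {
		return nil
	}
	target := path.Join(c.rootPath, relDir)

	c.mu.Lock()
	defer c.mu.Unlock()
	if c.known[target] {
		return nil
	}
	if err := c.mkdirAll(target); err != nil {
		return err // not cached, so the next upload retries the mkdir
	}
	c.known[target] = true
	return nil
}

// reset forgets all cached directories, e.g. after reconnecting.
func (c *dirCache) reset() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.known = map[string]bool{}
}
```

With `rootPath` set to `upload`, uploading `test1.yaml` sends no mkdir at all. Uploading `reports/a.yaml` and then `reports/b.yaml` sends a single mkdir for `upload/reports`.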
6. Path Construction Issues
Location: get() method
Problem: Constructs absolute paths like /upload/file.yaml
Fix Needed: Use consistent relative paths
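One way to keep paths consistent is to build every remote path (for create, get, list, and delete alike) through one helper that joins `rootPath` and `fileName`. The sketch below uses Go's `path` package rather than `path/filepath`, because SFTP paths are always slash-separated regardless of the OS Dapr runs on. `remotePath` and `remoteDir` are hypothetical names. SFTP servers resolve relative paths against the session's starting directory (usually the user's home), so `upload/test1.yaml` and `/upload/test1.yaml` can refer to different locations.

```go
package main

import "path"

// remotePath joins rootPath and fileName the same way for every operation.
// path.Join cleans the result, so a leading "/" on fileName or a trailing "/"
// on rootPath does not change whether the path is relative or absolute.
func remotePath(rootPath, fileName string) string {
	return path.Join(rootPath, fileName)
}

// remoteDir returns the directory that must exist before writing fileName.
// For a file placed directly in rootPath this is rootPath itself, with no
// trailing slash, so create has no reason to mkdir it.
func remoteDir(rootPath, fileName string) string {
	return path.Dir(remotePath(rootPath, fileName))
}
```

Only a `rootPath` that itself starts with `/` produces an absolute path such as `/upload/test1.yaml`, which keeps that choice explicit in configuration.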
Required Code Changes to bindings/sftp/sftp.go
See the detailed line-by-line code changes for the sftp.go file.
Summary of Required Changes:
1. Update SFTP Struct
```go
type SFTP struct {
	client    *sftp.Client
	sshClient *ssh.Client
	metadata  metadata
	logger    logger.Logger
	isHealthy bool            // NEW: Track connection health
	dirCache  map[string]bool // NEW: Cache verified directories
	mu        sync.RWMutex    // NEW: Protect concurrent access
}
```
2. Add Connection Validation
```go
func (s *SFTP) validateConnection(ctx context.Context) error {
	// Check if connection is healthy
	// Perform lightweight health check
	// Reconnect if needed
}
```
3. Add Reconnection Logic
```go
func (s *SFTP) reconnect(ctx context.Context) error {
	// Close poisoned connections
	// Create fresh SSH session
	// Create fresh SFTP client
	// Clear directory cache
}
```
4. Update Invoke Method
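```go
func (s *SFTP) Invoke(ctx context.Context, req *bindings.InvokeRequest) (*bindings.InvokeResponse, error) {
	// BEFORE operation: validate (and, if needed, recreate) the connection
	if err := s.validateConnection(ctx); err != nil {
		return nil, err
	}

	// Perform operation; dispatch is a hypothetical name for the existing switch on req.Operation
	resp, err := s.dispatch(ctx, req)

	// AFTER error: mark unhealthy (under the mutex) so the next call reconnects
	if err != nil {
		s.mu.Lock()
		s.isHealthy = false
		s.mu.Unlock()
	}
	return resp, err
}
```
5. Fix Directory Creation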
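```go
func (s *SFTP) create(...) {
	// Only mkdir for subdirectories, never for rootPath itself.
	// Uses path (not path/filepath): SFTP paths are always slash-separated.
	relDir := path.Dir(fileName)
	if relDir != "." && relDir != "/" {
		targetDir := path.Join(s.metadata.RootPath, relDir) // metadata field name assumed
		s.ensureDirectory(targetDir)                        // With caching
	}
}
```
6. Add Keepalive Support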
```go
func (s *SFTP) createSSHClient(...) {
	// Enable SSH keepalive
	go sendKeepAlive(client)
}
```
Required New Metadata Configuration
```yaml
metadata:
  # Connection management
  - name: keepAliveInterval
    value: "300" # Send keepalive every 5 minutes (seconds)
  - name: keepAliveTimeout
    value: "10" # Timeout for keepalive response (seconds)
  - name: connectionTimeout
    value: "30" # Connection establishment timeout (seconds)
```
Impact
Severity: CRITICAL
Justification:
- SFTP binding is completely unusable with Axway MFT
- ANY error permanently breaks the binding until Dapr restart
- No configuration-based workaround available
- Affects production file transfer scenarios
- Connection poisoning means unreliable operation even when most operations would succeed
Affected Users
- Anyone using Dapr SFTP binding with Axway MFT (widely-used enterprise MFT)
- Potentially affects other enterprise SFTP servers with strict session management
- Blocks the adoption of Dapr in enterprise file transfer scenarios
- Any scenario where occasional errors occur (the norm in production)
Tested Workarounds
Failed Attempts
- Changed directory permissions (750 to 755): No effect
- Used empty rootPath: Different error, still fails
- Used absolute rootPath paths: Same failure
- Tried various path configurations: All fail
- Implemented retry logic: Still fails (connection is poisoned)
Only Working Solution
Restart Dapr between operations - Not viable for production
Environment Details
- Dapr Version: latest (October 2025)
- SFTP Binding: v1 (introduced in Dapr v1.15, October 2024)
- Source File: bindings/sftp/sftp.go in release-1.16 branch
- Operating System: macOS (client), Linux (Axway server)
- SFTP Servers Tested:
- OpenSSH (atmoz/sftp Docker image) - Works (lenient error handling masks bugs)
- Axway MFT (SecureTransport) - Broken (strict validation exposes bugs)
Additional Context
- The SFTP binding was just added in Dapr v1.15 (October 2024)
- This is a brand new component with limited production testing
- Connection lifecycle management was not properly implemented in sftp.go
- May not have been tested with enterprise SFTP servers or error scenarios
- This could be the first detailed report of the connection poisoning issue
Summary
| Operation | 1st Call | 2nd Call (if 1st errors) | Manual SFTP | OpenSSH |
|---|---|---|---|---|
| create | Success | Errors + poisons connection | Works | Works |
| get | Success | Errors + poisons connection | Works | Works |
| list | Success | Success (if no prior errors) | Works | Works |
| Any op after error | N/A | Fails (poisoned connection) | Works | Works |
The issue is NOT "second operation fails" or "Axway timeout."
The issue is: Error handling in bindings/sftp/sftp.go leaves connections in a poisoned state, and there's no recovery mechanism.
Proof:
- List (read-only) works indefinitely → Connection can be stable
- First error (any type) breaks the connection → Error handling in sftp.go is broken
- All operations after error fail → Connection is not recreated
- Only Dapr restart fixes it → No automatic recovery in code
The SFTP binding requires fundamental fixes to connection lifecycle management in bindings/sftp/sftp.go to be usable in production scenarios where errors can occasionally occur.