SFTP binding broken with Axway MFT - connection poisoning on first error #4078

@kayaj1009

Description

The SFTP binding (added in v1.15) has critical connection lifecycle management issues when used with Axway MFT (SecureTransport). Any operation that encounters an error permanently corrupts the SFTP connection, causing ALL subsequent operations to fail until Dapr restart.

This makes the binding unsuitable for production use with enterprise SFTP servers.


Expected Behavior

  • Multiple file operations (upload, download, list) should work sequentially within the same Dapr session
  • Connection errors should be handled gracefully with automatic connection cleanup and recreation
  • Failed operations should not prevent subsequent operations from succeeding
  • The SFTP binding should work consistently with both OpenSSH and enterprise SFTP servers like Axway MFT

Actual Behavior

Critical Pattern: Connection Poisoning

Any operation that encounters an error permanently corrupts the SFTP connection:

  1. First operation that errors (upload, download) → Connection becomes poisoned
  2. All subsequent operations fail (including read-only operations like list)
  3. Only fix: Restart Dapr to get a fresh connection

This is not a case of "the second operation fails". The pattern is "the first error poisons the connection".


Detailed Test Scenarios

Scenario 1: Upload Triggers Connection Poisoning

Upload 1:     Success
Upload 2:     FAILS - mkdir upload/: not a directory
[Connection now poisoned]
List:         FAILS - permission denied
Get:          FAILS - permission denied
Upload 3:     FAILS - permission denied
Result: All operations fail until Dapr restart

Scenario 2: List Works Indefinitely (Proof Connection Can Be Stable)

List 1:       Success
List 2:       Success
List 3:       Success
...
List 100:     Success
Result: Can repeat indefinitely without failures

Scenario 3: Get (Download) Triggers Connection Poisoning

Setup: 5 files exist in upload directory
Get File 1:   Success
Get File 2:   FAILS - permission denied or path error
[Connection now poisoned]
Get File 3:   FAILS - permission denied
Get File 4:   FAILS - permission denied
Get File 5:   FAILS - permission denied
List:         FAILS - permission denied
Upload:       FAILS - permission denied
Result: All operations fail until Dapr restart

Steps to Reproduce

Environment Setup

SFTP Binding Configuration:

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: mft-sftp-dev
spec:
  type: bindings.sftp
  version: v1
  metadata:
    - name: rootPath
      value: "upload"
    - name: address
      value: "10.67.0.238:2222"
    - name: username
      value: "opfedwtsftp"
    - name: privateKey
      value: |
        -----BEGIN OPENSSH PRIVATE KEY-----
        [key content]
        -----END OPENSSH PRIVATE KEY-----
    - name: insecureIgnoreHostKey
      value: "true"
  scopes:
    - mft-integration-service

SFTP Server:

  • Type: Axway MFT (SecureTransport)
  • Directory upload/ exists with permissions 755 (drwxr-xr-x)

Test Case 1: Upload Poisoning

First upload - SUCCESS:

curl -X POST http://localhost:3500/v1.0/bindings/mft-sftp-dev \
  -H "Content-Type: application/json" \
  -d '{
        "operation": "create",
        "data": "test content",
        "metadata": {
          "fileName": "test1.yaml"
        }
      }'

Response: {"fileName":"test1.yaml"}

Status: Success


Second upload - FAILURE (Triggers Poisoning):

curl -X POST http://localhost:3500/v1.0/bindings/mft-sftp-dev \
  -H "Content-Type: application/json" \
  -d '{
        "operation": "create",
        "data": "test content",
        "metadata": {
          "fileName": "test2.yaml"
        }
      }'

Response:

{
  "errorCode": "ERR_INVOKE_OUTPUT_BINDING",
  "message": "error invoking output binding mft-sftp-dev: sftp binding error: error create dir upload/: mkdir upload/: not a directory"
}

Status: Failed - Connection now poisoned

Dapr Log:

time="2025-10-28T15:09:41.549751+08:00" level=debug msg="api error: code = OK desc = error invoking output binding mft-sftp-dev: sftp binding error: error create dir upload/: mkdir upload/: not a directory"

Subsequent list - FAILS (Poisoned Connection):

curl -d '{"operation": "list"}' \
  http://localhost:3500/v1.0/bindings/mft-sftp-dev

Response:

{
  "errorCode": "ERR_INVOKE_OUTPUT_BINDING",
  "message": "error invoking output binding mft-sftp-dev: sftp binding error: error read dir upload: permission denied"
}

Status: Failed - Even read-only operation fails due to poisoned connection


Test Case 2: List Operations Work Indefinitely

Without any errors, list operations succeed repeatedly:

# Test 50 consecutive list operations
for i in {1..50}; do
  curl -d '{"operation": "list"}' http://localhost:3500/v1.0/bindings/mft-sftp-dev
done

Result: All 50 succeed

Conclusion: Proves connection can remain stable when no errors occur.


Test Case 3: Get (Download) Poisoning

After Dapr restart:

First download - SUCCESS:

curl -d '{"operation": "get", "metadata": {"fileName": "test1.yaml"}}' \
  http://localhost:3500/v1.0/bindings/mft-sftp-dev

Response: "test content"

Status: Success


Second download - FAILURE (Triggers Poisoning):

curl -d '{"operation": "get", "metadata": {"fileName": "test1.yaml"}}' \
  http://localhost:3500/v1.0/bindings/mft-sftp-dev

Response:

{
  "errorCode": "ERR_INVOKE_OUTPUT_BINDING",
  "message": "error invoking output binding mft-sftp-dev: sftp binding error: error open file /upload/test1.yaml: permission denied"
}

Status: Failed - Connection now poisoned

Note: The error shows the absolute path /upload/test1.yaml even though rootPath is configured as the relative path "upload", which suggests a path construction issue.


Subsequent operations all fail:

# Try list
curl -d '{"operation": "list"}' http://localhost:3500/v1.0/bindings/mft-sftp-dev
# Result: FAILS - permission denied

# Try upload
curl -X POST ... # FAILS - permission denied

# Try another download
curl -d '{"operation": "get", ...}' # FAILS - permission denied

All operations fail until Dapr restart.


Proof It's NOT an Axway Issue

Manual SFTP Session - All Operations Work

sftp -P 2222 opfedwtsftp@10.67.0.238

Multiple uploads in same session:

sftp> cd upload
sftp> put test1.yaml
test1.yaml                         100%   11     0.1KB/s   00:00
sftp> put test2.yaml
test2.yaml                         100%   11     0.1KB/s   00:00
sftp> put test3.yaml
test3.yaml                         100%   11     0.1KB/s   00:00

Result: All succeed


Multiple downloads in same session:

sftp> get test1.yaml
test1.yaml                         100%   11     0.3KB/s   00:00
sftp> get test1.yaml
test1.yaml                         100%   11     0.3KB/s   00:00
sftp> get test1.yaml
test1.yaml                         100%   11     0.3KB/s   00:00

Result: All succeed


Multiple list operations in same session:

sftp> ls -l
-rw-r--r-- 1 1283530774 1283530774  11 Oct 27 12:52 test1.yaml
-rw-r--r-- 1 1283530774 1283530774  11 Oct 28 15:30 test2.yaml
sftp> ls -l
-rw-r--r-- 1 1283530774 1283530774  11 Oct 27 12:52 test1.yaml
-rw-r--r-- 1 1283530774 1283530774  11 Oct 28 15:30 test2.yaml

Result: All succeed

Conclusion: Axway MFT handles multiple operations correctly in a single SFTP session. The issue is in Dapr's SFTP binding connection lifecycle management.


Comparison: OpenSSH vs Axway MFT

Local Test with OpenSSH (atmoz/sftp)

Binding config:

metadata:
  - name: rootPath
    value: "upload"
  - name: address
    value: "localhost:2222"
  - name: username
    value: "testuser"
  - name: password
    value: "password"

Test results:

# First upload
curl -X POST http://localhost:3500/v1.0/bindings/local-sftp \
  -d '{"operation": "create", "data": "test", "metadata": {"fileName": "test1.yaml"}}'
# Result: Success

# Second upload
curl -X POST http://localhost:3500/v1.0/bindings/local-sftp \
  -d '{"operation": "create", "data": "test", "metadata": {"fileName": "test2.yaml"}}'
# Result: Success

# Third upload
curl -X POST http://localhost:3500/v1.0/bindings/local-sftp \
  -d '{"operation": "create", "data": "test", "metadata": {"fileName": "test3.yaml"}}'
# Result: Success

Multiple operations work with OpenSSH.


Analysis: Why OpenSSH Works But Axway Fails

| Aspect | OpenSSH | Axway MFT | Result |
|---|---|---|---|
| mkdir on existing dir | Returns "Failure", continues | Returns "not a directory", fails | Dapr treats Axway error as fatal |
| Session management | Lenient, stateless | Stricter, stateful validation | Axway exposes connection bugs |
| Error codes | Standard SFTP errors | Enterprise-specific errors | Dapr doesn't handle gracefully |
| Path validation | Minimal | Strict | Absolute paths like /upload/file fail |
| Error handling | Doesn't break connection | Breaks Dapr's connection state | Connection poisoning occurs |

Conclusion: OpenSSH's lenient error handling masks Dapr's connection management bugs. Axway's strict validation exposes them.


Root Cause Analysis

The Real Issue: Broken Connection Lifecycle Management

The SFTP binding in bindings/sftp/sftp.go has fundamentally broken error handling that poisons connections:

  1. First operation that encounters an error (mkdir, permission, path issue)
  2. Error handling does NOT properly clean up the SFTP connection state
  3. Connection becomes poisoned/corrupted but remains in the connection pool
  4. All subsequent operations reuse the poisoned connection
  5. No connection validation, health checks, or recreation occurs
  6. Only fix: Restart Dapr to force new connection

Why List Works But Create/Get Don't

| Operation | Behavior | Triggers Errors? | Connection Impact |
|---|---|---|---|
| list | Read-only, no mkdir, no path changes | No | Connection stays clean |
| create | Calls mkdir, writes files | Yes | Triggers mkdir error → Poisons connection |
| get | Reads specific file, path validation | Yes | Triggers permission/path error → Poisons connection |

Key Insight: List operations succeed indefinitely because they don't trigger the operations that cause errors with Axway MFT's strict validation.


Multiple Bugs Identified in bindings/sftp/sftp.go

1. Connection Poisoning (Critical)

Location: Invoke() method error handling

Problem:

// Current (broken)
func (s *SFTP) Invoke(ctx context.Context, req *bindings.InvokeRequest) (*bindings.InvokeResponse, error) {
    // Performs operation
    // If error occurs, returns error but keeps broken connection
    // No cleanup, no marking connection as unhealthy
}

Fix Needed:

  • Mark connection as unhealthy on error
  • Close poisoned connection
  • Validate connection health before reuse
  • Automatic reconnection on next operation
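
The four points above can be sketched without any SFTP dependency. This is a minimal illustration, not the binding's actual code: `conn`, `managed`, and `fakeConn` are hypothetical names, and `fakeConn` only imitates the poisoning behaviour reported in this issue.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for the SFTP client held by the binding.
type conn interface {
	Do(op string) error
	Close() error
}

// managed wraps a conn and replaces it after any failed operation.
type managed struct {
	mu      sync.Mutex
	dial    func() (conn, error) // creates a fresh connection
	c       conn
	healthy bool
}

// Invoke runs op, reconnecting first if the previous call failed.
func (m *managed) Invoke(op string) error {
	m.mu.Lock()
	defer m.mu.Unlock()

	if !m.healthy {
		if m.c != nil {
			_ = m.c.Close() // drop the suspect connection
			m.c = nil
		}
		c, err := m.dial()
		if err != nil {
			return fmt.Errorf("reconnect: %w", err)
		}
		m.c, m.healthy = c, true
	}

	if err := m.c.Do(op); err != nil {
		m.healthy = false // the next call starts from a clean connection
		return err
	}
	return nil
}

// fakeConn fails every operation after its first failure, imitating the
// poisoning behaviour described in this issue.
type fakeConn struct{ poisoned bool }

func (f *fakeConn) Do(op string) error {
	if f.poisoned {
		return errors.New("permission denied")
	}
	if op == "bad" {
		f.poisoned = true
		return errors.New("mkdir upload/: not a directory")
	}
	return nil
}

func (f *fakeConn) Close() error { return nil }

func main() {
	dials := 0
	m := &managed{dial: func() (conn, error) { dials++; return &fakeConn{}, nil }}
	for _, op := range []string{"list", "bad", "list"} {
		fmt.Printf("%s: %v\n", op, m.Invoke(op))
	}
	fmt.Println("dials:", dials)
}
```

In this sketch the failing call still returns its error to the caller, but the following `list` runs on a freshly dialed connection instead of the poisoned one. Reconnecting after every error is the simplest policy; a real fix might reconnect only for errors that indicate a broken session.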

2. No Connection Validation

Location: Missing from Invoke() method

Problem: No health check before operations
Fix Needed: Add validateConnection() before each operation

3. No Connection Recreation

Location: Missing reconnection logic

Problem: No automatic recovery from errors
Fix Needed: Add reconnect() method

4. Unnecessary mkdir on Every Upload

Location: create() method

Problem:

// Always calls mkdir, even on rootPath
dir := filepath.Dir(fullPath)
s.client.MkdirAll(dir)  // BUG: Calls mkdir on configured rootPath

Fix Needed: Only mkdir for subdirectories, not rootPath
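
One way to express "only mkdir for subdirectories" is a small helper that decides whether a directory needs creating at all. This is a sketch under assumptions: `dirToCreate` is a hypothetical name, and it uses the `path` package rather than `filepath` because SFTP paths use forward slashes regardless of the client OS.

```go
package main

import (
	"fmt"
	"path"
)

// dirToCreate returns the directory, relative to rootPath, that must exist
// before fileName can be written. It returns "" when the file sits directly
// in rootPath, in which case no mkdir is needed.
func dirToCreate(fileName string) string {
	dir := path.Dir(path.Clean(fileName))
	if dir == "." || dir == "/" {
		return ""
	}
	return dir
}

func main() {
	for _, name := range []string{"test1.yaml", "reports/2025/test2.yaml", "/test3.yaml"} {
		fmt.Printf("%q -> mkdir %q\n", name, dirToCreate(name))
	}
}
```

With the file names from the reproduction steps (test1.yaml, test2.yaml), the helper returns an empty string, so the upload path would never call mkdir on the configured rootPath.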

5. No Directory Existence Caching

Location: SFTP struct

Problem: No dirCache field
Fix Needed: Add caching to prevent repeated mkdir

6. Path Construction Issues

Location: get() method

Problem: Constructs absolute paths like /upload/file.yaml
Fix Needed: Use consistent relative paths
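
A possible shape for consistent path construction is sketched below. `remotePath` is a hypothetical helper, not a function in sftp.go, and whether a relative path is what Axway MFT expects is this report's hypothesis rather than a confirmed fact. The result is relative whenever the configured rootPath is relative.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// remotePath joins rootPath and fileName without changing whether the
// result is relative or absolute: that is decided by rootPath alone.
func remotePath(rootPath, fileName string) (string, error) {
	// Rooting the name at "/" before cleaning clamps any ".." segments,
	// so the result cannot escape rootPath.
	rel := strings.TrimPrefix(path.Clean("/"+fileName), "/")
	if rel == "" {
		return "", errors.New("fileName is empty")
	}
	return path.Join(rootPath, rel), nil
}

func main() {
	for _, name := range []string{"test1.yaml", "/test1.yaml", "../secret", ""} {
		p, err := remotePath("upload", name)
		fmt.Printf("%q -> %q (err: %v)\n", name, p, err)
	}
}
```

With rootPath set to "upload" as in the component above, both "test1.yaml" and "/test1.yaml" resolve to "upload/test1.yaml", so create, get, and list would all address the same relative location.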


Required Code Changes to bindings/sftp/sftp.go

See detailed code changes with line-by-line fixes for sftp.go file.

Summary of Required Changes:

1. Update SFTP Struct

type SFTP struct {
    client      *sftp.Client
    sshClient   *ssh.Client
    metadata    metadata
    logger      logger.Logger
    isHealthy   bool              // NEW: Track connection health
    dirCache    map[string]bool   // NEW: Cache verified directories
    mu          sync.RWMutex      // NEW: Protect concurrent access
}

2. Add Connection Validation

func (s *SFTP) validateConnection(ctx context.Context) error {
    // Check if connection is healthy
    // Perform lightweight health check
    // Reconnect if needed
}

3. Add Reconnection Logic

func (s *SFTP) reconnect(ctx context.Context) error {
    // Close poisoned connections
    // Create fresh SSH session
    // Create fresh SFTP client
    // Clear directory cache
}

4. Update Invoke Method

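func (s *SFTP) Invoke(ctx context.Context, req *bindings.InvokeRequest) (*bindings.InvokeResponse, error) {
    // BEFORE operation: Validate connection (reconnects if needed)
    if err := s.validateConnection(ctx); err != nil {
        return nil, err
    }

    // Perform operation; invokeOperation is a placeholder name for the
    // existing per-operation dispatch
    resp, err := s.invokeOperation(ctx, req)

    // AFTER error: Mark unhealthy so the next call reconnects
    if err != nil {
        s.mu.Lock()
        s.isHealthy = false
        s.mu.Unlock()
        return nil, err
    }
    return resp, nil
}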

5. Fix Directory Creation

func (s *SFTP) create(...) {
    // Only mkdir for subdirectories of rootPath, never for rootPath itself
    relDir := filepath.Dir(fileName)
    if relDir != "." && relDir != "/" {
        s.ensureDirectory(relDir)  // relDir is relative to rootPath; with caching
    }
}
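
The caching mentioned in the comment above could look roughly like this. It is a sketch only: `dirCache`, `dirMaker`, and `countingClient` are hypothetical names. The `MkdirAll(path string) error` signature matches the method already called in the snippet under bug 4.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// dirMaker is the one client method the cache needs.
type dirMaker interface {
	MkdirAll(path string) error
}

// dirCache remembers directories that were created or verified on the
// current connection.
type dirCache struct {
	mu   sync.Mutex
	seen map[string]bool
}

// ensure calls MkdirAll at most once per directory per connection.
func (d *dirCache) ensure(c dirMaker, dir string) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.seen[dir] { // reading a nil map is safe and yields false
		return nil
	}
	if err := c.MkdirAll(dir); err != nil {
		return err // not cached, so the next call retries
	}
	if d.seen == nil {
		d.seen = make(map[string]bool)
	}
	d.seen[dir] = true
	return nil
}

// reset forgets everything; intended to be called after a reconnect.
func (d *dirCache) reset() {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.seen = nil
}

// countingClient is a test double that counts MkdirAll calls.
type countingClient struct {
	calls int
	fail  bool
}

func (c *countingClient) MkdirAll(string) error {
	c.calls++
	if c.fail {
		return errors.New("mkdir failed")
	}
	return nil
}

func main() {
	var cache dirCache
	client := &countingClient{}
	for i := 0; i < 3; i++ {
		_ = cache.ensure(client, "reports/2025")
	}
	fmt.Println("mkdir calls:", client.calls)
}
```

Failed attempts are deliberately not cached, and `reset` is there so a reconnect does not inherit assumptions from the previous session.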

6. Add Keepalive Support

func (s *SFTP) createSSHClient(...) {
    // Enable SSH keepalive
    go sendKeepAlive(client)
}
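
A keepalive loop along these lines is one option. The sketch assumes the SSH client exposes a `SendRequest` method with the signature used by `golang.org/x/crypto/ssh`, and it uses the conventional `keepalive@openssh.com` global request name; whether Axway MFT answers that request has not been verified here. `requester`, `keepAlive`, `flakyConn`, and `demo` are hypothetical names, and the reply timeout proposed under the new metadata is not implemented.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// requester is the single method the keepalive loop needs.
type requester interface {
	SendRequest(name string, wantReply bool, payload []byte) (bool, []byte, error)
}

// keepAlive probes the connection every interval until ctx is cancelled or
// a probe fails; on failure it calls onFail once and returns.
func keepAlive(ctx context.Context, c requester, interval time.Duration, onFail func(error)) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			if _, _, err := c.SendRequest("keepalive@openssh.com", true, nil); err != nil {
				onFail(err)
				return
			}
		}
	}
}

// flakyConn answers okFor probes and then reports a dropped connection.
type flakyConn struct{ okFor, calls int }

func (f *flakyConn) SendRequest(string, bool, []byte) (bool, []byte, error) {
	f.calls++
	if f.calls > f.okFor {
		return false, nil, errors.New("connection lost")
	}
	return true, nil, nil
}

// demo runs the loop against flakyConn and reports how many probes were
// sent and which error stopped the loop.
func demo() (int, error) {
	c := &flakyConn{okFor: 2}
	var failure error
	keepAlive(context.Background(), c, time.Millisecond, func(err error) { failure = err })
	return c.calls, failure
}

func main() {
	calls, err := demo()
	fmt.Println(calls, err)
}
```

In the binding, `onFail` would be the natural place to mark the connection unhealthy so that the next Invoke reconnects.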

Required New Metadata Configuration

metadata:
  # Connection management
  - name: keepAliveInterval
    value: "300"  # Send keepalive every 5 minutes (seconds)
  - name: keepAliveTimeout
    value: "10"   # Timeout for keepalive response (seconds)
  - name: connectionTimeout
    value: "30"   # Connection establishment timeout (seconds)
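
Since the proposed values are strings holding whole seconds, the binding would need to convert them into durations. The following sketch shows one way to do that; `secondsOr` is a hypothetical helper, the fallback of 30 seconds in `main` is only an illustration, and none of these metadata keys exist in the binding today.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// secondsOr reads a metadata value holding whole seconds and returns it as
// a duration, falling back to defSeconds when the key is absent or empty.
func secondsOr(props map[string]string, key string, defSeconds int) (time.Duration, error) {
	raw, ok := props[key]
	if !ok || raw == "" {
		return time.Duration(defSeconds) * time.Second, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("%s: want a positive number of seconds, got %q", key, raw)
	}
	return time.Duration(n) * time.Second, nil
}

func main() {
	props := map[string]string{"keepAliveInterval": "300", "keepAliveTimeout": "10"}
	for _, key := range []string{"keepAliveInterval", "keepAliveTimeout", "connectionTimeout"} {
		d, err := secondsOr(props, key, 30)
		fmt.Println(key, d, err)
	}
}
```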

Impact

Severity: CRITICAL

Justification:

  • SFTP binding is completely unusable with Axway MFT
  • ANY error permanently breaks the binding until Dapr restart
  • No configuration-based workaround available
  • Affects production file transfer scenarios
  • Connection poisoning means unreliable operation even when most operations would succeed

Affected Users

  • Anyone using Dapr SFTP binding with Axway MFT (widely-used enterprise MFT)
  • Potentially affects other enterprise SFTP servers with strict session management
  • Blocks the adoption of Dapr in enterprise file transfer scenarios
  • Any scenario where occasional errors occur (the norm in production)

Tested Workarounds

Failed Attempts

  • Changed directory permissions (750 to 755): No effect
  • Used empty rootPath: Different error, still fails
  • Used absolute rootPath paths: Same failure
  • Tried various path configurations: All fail
  • Implemented retry logic: Still fails (connection is poisoned)

Only Working Solution

Restart Dapr between operations - Not viable for production


Environment Details

  • Dapr Version: latest (October 2025)
  • SFTP Binding: v1 (introduced in Dapr v1.15, October 2024)
  • Source File: bindings/sftp/sftp.go in release-1.16 branch
  • Operating System: macOS (client), Linux (Axway server)
  • SFTP Servers Tested:
    • OpenSSH (atmoz/sftp Docker image) - Works (lenient error handling masks bugs)
    • Axway MFT (SecureTransport) - Broken (strict validation exposes bugs)

Additional Context

  • The SFTP binding was just added in Dapr v1.15 (October 2024)
  • This is a brand new component with limited production testing
  • Connection lifecycle management was not properly implemented in sftp.go
  • May not have been tested with enterprise SFTP servers or error scenarios
  • This could be the first detailed report of the connection poisoning issue

Summary

| Operation | Axway via Dapr: 1st Call | Axway via Dapr: 2nd Call | Manual SFTP (Axway) | OpenSSH via Dapr |
|---|---|---|---|---|
| create | Success | Errors + poisons connection | Works | Works |
| get | Success | Errors + poisons connection | Works | Works |
| list | Success | Success (if no prior errors) | Works | Works |
| Any op after error | N/A | Fails (poisoned connection) | Works | Works |

The issue is NOT "second operation fails" or "Axway timeout."

The issue is: Error handling in bindings/sftp/sftp.go leaves connections in a poisoned state, and there's no recovery mechanism.

Proof:

  • List (read-only) works indefinitely → Connection can be stable
  • First error (any type) breaks the connection → Error handling in sftp.go is broken
  • All operations after error fail → Connection is not recreated
  • Only Dapr restart fixes it → No automatic recovery in code

The SFTP binding requires fundamental fixes to connection lifecycle management in bindings/sftp/sftp.go to be usable in production scenarios where errors can occasionally occur.
