S3 deduplication proxy server with Filetracker protocol compatibility.
s3dedup is an S3 proxy layer that adds content-based deduplication capabilities while maintaining backwards compatibility with the Filetracker protocol (v2). Files with identical content are stored only once in S3, reducing storage costs and improving efficiency.
- Content Deduplication: Files are stored by SHA256 hash, so identical content is stored only once
- Filetracker Compatible: Drop-in replacement for legacy Filetracker servers
- Pluggable Storage: Support for SQLite and PostgreSQL metadata storage
- Migration Support: Offline and live migration from old Filetracker instances
- Auto Cleanup: Background cleaner removes unreferenced S3 objects
- Multi-bucket: Run multiple independent buckets on different ports
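To make the deduplication idea concrete, here is a minimal in-memory sketch of content addressing with reference counts. It is an illustration only, not s3dedup's implementation: the `DedupStore` class and its dictionaries are hypothetical stand-ins for the S3 bucket and the metadata store.

```python
import hashlib


class DedupStore:
    """Toy model of content-addressed storage (hypothetical, for illustration)."""

    def __init__(self):
        self.blobs = {}     # sha256 hex digest -> content (stands in for S3 objects)
        self.paths = {}     # logical path -> digest (stands in for the metadata store)
        self.refcount = {}  # digest -> number of paths that reference it

    def put(self, path: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        old = self.paths.get(path)
        if old == digest:
            return digest  # this path already points at identical content
        if old is not None:
            self.refcount[old] -= 1  # the path is being overwritten
        if digest not in self.blobs:
            self.blobs[digest] = content  # first copy of this content: store it once
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        self.paths[path] = digest
        return digest

    def get(self, path: str) -> bytes:
        return self.blobs[self.paths[path]]

    def delete(self, path: str) -> None:
        digest = self.paths.pop(path)
        # Only the reference is dropped here; removing zero-reference
        # blobs is left to a separate cleanup pass.
        self.refcount[digest] -= 1
```

Storing the same bytes under two paths leaves a single blob with a reference count of 2, which is the property that saves storage.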
Pull the image from GitHub Container Registry:
docker pull ghcr.io/sio2project/s3dedup:latest
Run with environment variables:
docker run -d \
--name s3dedup \
-p 8080:8080 \
-v s3dedup-data:/app/data \
-e S3_ENDPOINT=http://minio:9000 \
-e S3_ACCESS_KEY=minioadmin \
-e S3_SECRET_KEY=minioadmin \
ghcr.io/sio2project/s3dedup:latest
Or use an environment file:
# Copy and customize .env.example
cp .env.example .env
# Run with env file
docker run -d \
--name s3dedup \
-p 8080:8080 \
-v s3dedup-data:/app/data \
--env-file .env \
ghcr.io/sio2project/s3dedup:latest
Variable | Default | Description |
---|---|---|
LOG_LEVEL | info | Logging level (trace, debug, info, warn, error) |
LOG_JSON | false | Enable JSON logging |
BUCKET_NAME | default | Bucket name identifier |
LISTEN_ADDRESS | 0.0.0.0 | Server bind address |
LISTEN_PORT | 8080 | Server port |
KVSTORAGE_TYPE | sqlite | KV storage backend (sqlite, postgres) |
SQLITE_PATH | /app/data/kv.db | SQLite database path |
SQLITE_MAX_CONNECTIONS | 10 | SQLite connection pool size |
S3_ENDPOINT | required | S3/MinIO endpoint URL |
S3_ACCESS_KEY | required | S3 access key |
S3_SECRET_KEY | required | S3 secret key |
S3_FORCE_PATH_STYLE | true | Use path-style S3 URLs |
CLEANER_ENABLED | true | Enable background cleaner |
CLEANER_INTERVAL | 3600 | Cleaner run interval (seconds) |
CLEANER_BATCH_SIZE | 1000 | Cleaner batch size |
CLEANER_MAX_DELETES | 10000 | Max deletions per cleaner run |
FILETRACKER_URL | - | Old Filetracker URL for live migration |
For PostgreSQL, use:
KVSTORAGE_TYPE=postgres
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=password
POSTGRES_DB=s3dedup
POSTGRES_MAX_CONNECTIONS=10
Alternatively, use a JSON config file:
docker run -d \
-p 8080:8080 \
-v "$(pwd)/config.json:/app/config.json" \
-v s3dedup-data:/app/data \
ghcr.io/sio2project/s3dedup:latest \
server --config /app/config.json
Environment variables override config file values.
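The precedence rule can be pictured with a small sketch. It assumes, purely for illustration, that config file keys use the same names as the environment variables and that built-in defaults sit below both; the actual JSON schema is not described here, and `resolve_settings` is a hypothetical helper.

```python
import os

# A few defaults taken from the table above.
DEFAULTS = {"LISTEN_PORT": "8080", "KVSTORAGE_TYPE": "sqlite", "LOG_LEVEL": "info"}


def resolve_settings(file_values: dict, env=None) -> dict:
    """Merge settings in the assumed order: defaults < config file < environment."""
    env = os.environ if env is None else env
    settings = dict(DEFAULTS)
    settings.update(file_values)  # the config file overrides the defaults
    for key in settings:
        if key in env:
            settings[key] = env[key]  # the environment overrides the config file
    return settings
```

With `{"LISTEN_PORT": "9000"}` in the file and `LISTEN_PORT=9090` in the environment, the resolved port is `9090`; without the environment variable it is `9000`.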
📖 Complete Migration Guide: See docs/migration.md for full instructions on migrating from Filetracker v2.1+.
Note: Migration from Filetracker v1.x will be supported in a future release.
Migrate all files from the old Filetracker while the proxy is offline:
docker run --rm \
--env-file .env \
-v s3dedup-data:/app/data \
ghcr.io/sio2project/s3dedup:latest \
migrate --env \
--filetracker-url http://old-filetracker:8000 \
--max-concurrency 10
Run the proxy while migrating in the background:
# Set FILETRACKER_URL in your .env file
echo "FILETRACKER_URL=http://old-filetracker:8000" >> .env
# Start in live migration mode
docker run -d \
--name s3dedup \
-p 8080:8080 \
-v s3dedup-data:/app/data \
--env-file .env \
ghcr.io/sio2project/s3dedup:latest \
live-migrate --env --max-concurrency 10
During live migration:
- GET: Falls back to the old Filetracker if the file is not found, and migrates it on the fly
- PUT: Writes to both s3dedup and the old Filetracker
- DELETE: Deletes from both systems
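The three rules above can be sketched as a small routing class. This is a simplified model under stated assumptions, not the proxy's code: `LiveMigrationProxy` is a hypothetical name, and the two dictionaries stand in for s3dedup and the old Filetracker.

```python
class LiveMigrationProxy:
    """Toy model of request routing during live migration (hypothetical)."""

    def __init__(self, new_store: dict, old_store: dict):
        self.new = new_store  # stands in for s3dedup
        self.old = old_store  # stands in for the old Filetracker

    def get(self, path: str) -> bytes:
        if path in self.new:
            return self.new[path]
        if path in self.old:
            content = self.old[path]
            self.new[path] = content  # migrate on the fly, then serve
            return content
        raise KeyError(path)  # missing from both systems

    def put(self, path: str, content: bytes) -> None:
        # Write to both systems so either one can serve the file.
        self.new[path] = content
        self.old[path] = content

    def delete(self, path: str) -> None:
        # Delete from both systems.
        self.new.pop(path, None)
        self.old.pop(path, None)
```

After the first `get` of a file that exists only in the old system, later reads are served from the new store without touching the old one.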
For detailed migration strategies, performance tuning, troubleshooting, and rollback procedures, see the Migration Guide.
Compatible with Filetracker protocol v2:
- GET /ft/version - Get protocol version
- GET /ft/list/{path} - List files
- GET /ft/files/{path} - Download file
- HEAD /ft/files/{path} - Get file metadata
- PUT /ft/files/{path} - Upload file
- DELETE /ft/files/{path} - Delete file
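As a rough client-side illustration, the sketch below builds requests for the file endpoints with Python's standard library. It covers only the methods and paths listed above; `build_file_request` is a hypothetical helper, the base URL is an assumption, and a real server may expect query parameters or headers that are not described here.

```python
import urllib.parse
import urllib.request


def build_file_request(base_url: str, method: str, path: str, data=None):
    """Build (but do not send) a request for /ft/files/{path}."""
    # quote() keeps "/" by default, so nested paths stay intact.
    url = f"{base_url.rstrip('/')}/ft/files/{urllib.parse.quote(path)}"
    return urllib.request.Request(url, data=data, method=method)


# Assumed local instance on the default port from the Docker examples.
upload = build_file_request("http://localhost:8080", "PUT", "dir/report.txt", b"hello")
download = build_file_request("http://localhost:8080", "GET", "dir/report.txt")
```

Passing either object to `urllib.request.urlopen` would send it to a running instance.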
# Build binary
cargo build --release
# Build Docker image
docker build -t s3dedup:1.0.0-dev .
# Run tests
cargo test
# Run with Docker Compose (includes MinIO)
docker-compose up
# Run locally
cargo run -- server --config config.json
- API Layer: Axum-based HTTP server with Filetracker routes
- Deduplication: SHA256-based content addressing
- Storage Backend: S3-compatible object storage (MinIO, AWS S3, etc.)
- Metadata Store: SQLite or PostgreSQL for file metadata and reference counts
- Lock Manager: In-memory file-level locks for concurrent operations
- Cleaner: Background worker that removes unreferenced S3 objects
For detailed architecture documentation, see docs/deduplication.md.
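As a loose sketch of how a cleaner pass might interact with reference counts, the function below deletes zero-reference objects in batches up to a per-run cap. The interpretation of the batch size and the deletion cap is an assumption based on the names CLEANER_BATCH_SIZE and CLEANER_MAX_DELETES, and `run_cleaner` with its dictionaries is hypothetical.

```python
def run_cleaner(refcounts: dict, blobs: dict, batch_size=1000, max_deletes=10000) -> int:
    """Delete unreferenced blobs in batches; return how many were removed."""
    deleted = 0
    while deleted < max_deletes:
        limit = min(batch_size, max_deletes - deleted)
        # Pick up to `limit` blobs that no path references any more.
        batch = [d for d in blobs if refcounts.get(d, 0) == 0][:limit]
        if not batch:
            break  # nothing left to clean
        for digest in batch:
            del blobs[digest]
            refcounts.pop(digest, None)
        deleted += len(batch)
    return deleted
```

Capping deletions per run bounds how much work a single pass can do, and anything left over is picked up on the next interval.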
- Migration Guide - Migrating from Filetracker v2.1+ (offline and live migration strategies)
- Deduplication Architecture - How content-based deduplication works, data flows, and performance characteristics
See LICENSE file for details.