Welcome to TensorTrade! This curriculum will teach you to build RL trading agents.
We trained RL agents to trade BTC/USD and discovered:
| Experiment | Test P&L | vs Buy-and-Hold |
|---|---|---|
| Agent (0% commission) | +$239 | +$594 |
| Agent (0.1% commission) | -$650 | -$295 |
The agent CAN predict direction. The challenge is overtrading.
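The overtrading problem is easy to see with back-of-the-envelope arithmetic: a flat per-trade commission compounds against the account every time the agent flips position. The sketch below is illustrative only (the numbers are not from the experiments above) and assumes each trade pays commission on the full balance.

```python
# Back-of-the-envelope fee drag from frequent trading.
def fee_drag(initial_balance: float, commission: float, n_trades: int) -> float:
    """Balance remaining after n_trades, each paying `commission` on the full balance."""
    balance = initial_balance
    for _ in range(n_trades):
        balance *= (1 - commission)
    return balance

# 200 trades at 0.1% commission erode roughly 18% of capital
# before the agent's directional edge earns anything.
remaining = fee_drag(10_000, 0.001, 200)
print(round(remaining, 2))
```

Even a modestly profitable signal is wiped out once trade frequency outruns per-trade edge, which is exactly the gap between the two rows in the table above.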
Get something working fast:
- Three Pillars - Understand the domains
- Your First Run - Run `train_simple.py`
- First Training - Train a real agent
Comprehensive understanding:
Module 1: Foundations
├── 01-three-pillars.md # RL + Trading + Data
├── 02-architecture.md # How components work
└── 03-your-first-run.md # Run and understand output
Module 2: Domain Knowledge (choose your track)
├── Track A: Trading for RL People
│ ├── 01-trading-basics.md
│ └── 02-oms-deep-dive.md
├── Track B: RL for Traders
│ ├── 01-rl-fundamentals.md
│ └── 02-common-failures.md ← CRITICAL
└── Track C: Full Introduction
└── README.md
Module 3: Core Components
├── 01-action-schemes.md # BSH explained
├── 02-reward-schemes.md # Why PBR works
└── 03-observers-feeds.md # Feature engineering
Module 4: Training
├── 01-first-training.md # Train with Ray RLlib
├── 02-ray-rllib.md # Configuration deep dive
└── 03-optuna.md # Hyperparameter optimization
Module 5: Advanced
├── 01-overfitting.md # Detection and prevention
├── 02-commission.md # THE breakthrough finding
└── 03-walk-forward.md # Proper validation
- Already know RL? Start with Trading Basics (Track A)
- Already know trading? Start with RL Fundamentals (Track B)
- New to both? Start with the Full Introduction (Track C)
- Comfortable with both? Go directly to First Training
Before you invest serious time, read these:
- Common Failures - What destroys RL trading agents
- Commission Analysis - Our breakthrough discovery
- Overfitting - The default failure mode
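Walk-forward validation (covered in Module 5) is the standard guard against the overfitting failure mode: train on one window, test strictly on the data that follows it, then roll both windows forward. A minimal index-splitting helper (a hypothetical sketch, not a TensorTrade API) might look like:

```python
# Sketch: rolling walk-forward splits over a time series of length n.
def walk_forward_splits(n: int, train_size: int, test_size: int):
    """Yield (train, test) index ranges that roll forward in time,
    so evaluation always uses data strictly after the training window."""
    start = 0
    while start + train_size + test_size <= n:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # advance by one test window per fold

for train, test in walk_forward_splits(n=1000, train_size=600, test_size=100):
    print(train.start, train.stop, test.start, test.stop)
```

Reporting performance aggregated across the test windows, rather than on a single lucky split, makes in-sample overfitting much harder to miss.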
| Script | Purpose |
|---|---|
| `train_simple.py` | Demo with wallet balances |
| `train_ray_long.py` | Distributed training |
| `train_optuna.py` | Hyperparameter optimization |
| `train_best.py` | Best configuration |
| Component | Default | Purpose |
|---|---|---|
| ActionScheme | BSH | Convert actions to trades |
| RewardScheme | PBR | Learning signal |
| Observer | TensorTrade | Create observations |
Best hyperparameters found by the Optuna search:

```json
{
  "lr": 3.29e-05,
  "gamma": 0.992,
  "entropy": 0.015,
  "clip": 0.123,
  "hidden": [128, 128]
}
```

┌─────────────────────────────────────────────────────────────────┐
│ TradingEnv │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Episode Loop │ │
│ │ │ │
│ │ ┌─────────┐ ┌──────────┐ ┌────────────────┐ │ │
│ │ │Observer │───>│ Agent │───>│ ActionScheme │ │ │
│ │ │(features) │(RL model)│ │ (BSH/Orders) │ │ │
│ │ └─────────┘ └──────────┘ └───────┬────────┘ │ │
│ │ ^ │ │ │
│ │ │ ┌──────────┐ v │ │
│ │ │ │ Reward │<───── Portfolio │ │
│ │ │ │ Scheme │ (Wallets) │ │
│ │ │ │ (PBR) │ │ │
│ │ │ └────┬─────┘ │ │
│ │ └──────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
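The diagram above is a standard gym-style episode loop: the Observer produces features, the Agent picks an action, the ActionScheme turns it into a position, and the RewardScheme scores the result. The mock below mirrors that flow in plain Python; every class here is a hypothetical stand-in (a BSH-like binary position and a PBR-like position-times-price-change reward), not TensorTrade's actual implementation.

```python
import random

prices = [100, 101, 99, 102, 103]  # toy price series

class Observer:                       # builds feature vectors from market data
    def observe(self, t): return [prices[t]]

class Agent:                          # stand-in for the RL policy
    def act(self, obs): return random.choice([0, 1])  # 0 = all cash, 1 = all asset

class ActionScheme:                   # BSH-style: the action *is* the target position
    def apply(self, action): return action

class RewardScheme:                   # PBR-style: position times price change
    def reward(self, position, t):
        return position * (prices[t] - prices[t - 1])

observer, agent = Observer(), Agent()
action_scheme, reward_scheme = ActionScheme(), RewardScheme()

position, total = 0, 0.0
for t in range(1, len(prices)):       # the episode loop from the diagram
    obs = observer.observe(t - 1)
    position = action_scheme.apply(agent.act(obs))
    total += reward_scheme.reward(position, t)
print(total)
```

The real environment adds order execution, wallets, and commissions between the ActionScheme and the RewardScheme, but the control flow is the same.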
After completing this curriculum, you should be able to:
| Level | Time | Capability |
|---|---|---|
| 1 | 5 min | Run code and see a trading agent |
| 2 | 30 min | Understand the core architecture |
| 3 | 2 hours | Modify components and run experiments |
| 4 | 1 day | Train a real agent and understand results |
| 5 | 1 week | Build custom components, avoid pitfalls |
- EXPERIMENTS.md - Full research log
- API Documentation - Reference docs
- Discord - Community support
TensorTrade needs help with:
- Reduce overtrading - The agent trades too frequently
- Position sizing - Replace binary BSH with continuous actions
- Commission-aware rewards - Include fees in learning signal
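One way to approach the commission-aware-rewards item above is to charge the fee inside the reward itself whenever the position changes, so the learning signal directly penalizes churn. The function below is a hypothetical sketch of such a reward, not an existing TensorTrade scheme; the flat-fee-per-flip model is an assumption.

```python
# Hypothetical commission-aware variant of a position-based reward:
# subtract a fee whenever the position changes, so the agent is
# penalized for overtrading during learning, not just at evaluation.
def reward(prev_position: int, position: int, prev_price: float,
           price: float, commission: float = 0.001) -> float:
    pnl = position * (price - prev_price) / prev_price      # position-based return
    fee = commission if position != prev_position else 0.0  # pay fee on each flip
    return pnl - fee

print(reward(0, 1, 100.0, 101.0))  # ≈ 0.009: a 1% move minus the 0.1% entry fee
```

With this shaping, a position flip is only worth taking when the expected move exceeds the commission, which is exactly the trade-off the agent currently ignores.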
See CONTRIBUTING.md for guidelines.
Happy trading!