Welcome to TensorTrade! This curriculum will teach you to build RL trading agents.
We trained RL agents to trade BTC/USD and discovered:
| Experiment | Test P&L | vs Buy-and-Hold |
|---|---|---|
| Agent (0% commission) | +$239 | +$594 |
| Agent (0.1% commission) | -$650 | -$295 |
The agent CAN predict direction. The challenge is overtrading.
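The overtrading problem is easy to see with back-of-the-envelope arithmetic: a flat per-trade commission compounds against the account every time the agent flips position. The sketch below is illustrative only (the numbers are not from the experiments above) and assumes each trade pays commission on the full balance.

```python
# Back-of-the-envelope fee drag from frequent trading.
def fee_drag(initial_balance: float, commission: float, n_trades: int) -> float:
    """Balance remaining after n_trades, each paying `commission` on the full balance."""
    balance = initial_balance
    for _ in range(n_trades):
        balance *= (1 - commission)
    return balance

# 200 trades at 0.1% commission erode roughly 18% of capital
# before the agent's directional edge earns anything.
remaining = fee_drag(10_000, 0.001, 200)
print(round(remaining, 2))
```

Even a modestly profitable signal is wiped out once trade frequency outruns per-trade edge, which is exactly the gap between the two rows in the table above.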
Get something working fast:
- Three Pillars - Understand the domains
- Your First Run - Run `train_simple.py`
- First Training - Train a real agent
Comprehensive understanding:
Module 1: Foundations
├── 01-three-pillars.md # RL + Trading + Data
├── 02-architecture.md # How components work
└── 03-your-first-run.md # Run and understand output
Module 2: Domain Knowledge (choose your track)
├── Track A: Trading for RL People
│ ├── 01-trading-basics.md
│ └── 02-oms-deep-dive.md
├── Track B: RL for Traders
│ ├── 01-rl-fundamentals.md
│ └── 02-common-failures.md ← CRITICAL
└── Track C: Full Introduction
└── README.md
Module 3: Core Components
├── 01-action-schemes.md # BSH explained
├── 02-reward-schemes.md # Why PBR works
└── 03-observers-feeds.md # Feature engineering
Module 4: Training
├── 01-first-training.md # Train with Ray RLlib
├── 02-ray-rllib.md # Configuration deep dive
└── 03-optuna.md # Hyperparameter optimization
Module 5: Advanced
├── 01-overfitting.md # Detection and prevention
├── 02-commission.md # THE breakthrough finding
└── 03-walk-forward.md # Proper validation
- Already know RL? Start with Trading Basics (Track A)
- Already know trading? Start with RL Fundamentals (Track B)
- New to both? Start with the Full Introduction (Track C)
- Comfortable with both? Go directly to First Training
Before you invest serious time, read these:
- Common Failures - What destroys RL trading agents
- Commission Analysis - Our breakthrough discovery
- Overfitting - The default failure mode
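Walk-forward validation (covered in Module 5) is the standard guard against the overfitting failure mode: train on one window, test strictly on the data that follows it, then roll both windows forward. A minimal index-splitting helper (a hypothetical sketch, not a TensorTrade API) might look like:

```python
# Sketch: rolling walk-forward splits over a time series of length n.
def walk_forward_splits(n: int, train_size: int, test_size: int):
    """Yield (train, test) index ranges that roll forward in time,
    so evaluation always uses data strictly after the training window."""
    start = 0
    while start + train_size + test_size <= n:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # advance by one test window per fold

for train, test in walk_forward_splits(n=1000, train_size=600, test_size=100):
    print(train.start, train.stop, test.start, test.stop)
```

Reporting performance aggregated across the test windows, rather than on a single lucky split, makes in-sample overfitting much harder to miss.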
| Script | Purpose |
|---|---|
| `train_simple.py` | Demo with wallet balances |
| `train_ray_long.py` | Distributed training |
| `train_optuna.py` | Hyperparameter optimization |
| `train_best.py` | Best configuration |
| Component | Default | Purpose |
|---|---|---|
| ActionScheme | BSH | Convert actions to trades |
| RewardScheme | PBR | Learning signal |
| Observer | TensorTrade | Create observations |
Best hyperparameters found by the Optuna search:

```json
{
  "lr": 3.29e-05,
  "gamma": 0.992,
  "entropy": 0.015,
  "clip": 0.123,
  "hidden": [128, 128]
}
```

┌─────────────────────────────────────────────────────────────────┐
│ TradingEnv │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Episode Loop │ │
│ │ │ │
│ │ ┌─────────┐ ┌──────────┐ ┌────────────────┐ │ │
│ │ │Observer │───>│ Agent │───>│ ActionScheme │ │ │
│ │ │(features) │(RL model)│ │ (BSH/Orders) │ │ │
│ │ └─────────┘ └──────────┘ └───────┬────────┘ │ │
│ │ ^ │ │ │
│ │ │ ┌──────────┐ v │ │
│ │ │ │ Reward │<───── Portfolio │ │
│ │ │ │ Scheme │ (Wallets) │ │
│ │ │ │ (PBR) │ │ │
│ │ │ └────┬─────┘ │ │
│ │ └──────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
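The diagram above is a standard gym-style episode loop: the Observer produces features, the Agent picks an action, the ActionScheme turns it into a position, and the RewardScheme scores the result. The mock below mirrors that flow in plain Python; every class here is a hypothetical stand-in (a BSH-like binary position and a PBR-like position-times-price-change reward), not TensorTrade's actual implementation.

```python
import random

prices = [100, 101, 99, 102, 103]  # toy price series

class Observer:                       # builds feature vectors from market data
    def observe(self, t): return [prices[t]]

class Agent:                          # stand-in for the RL policy
    def act(self, obs): return random.choice([0, 1])  # 0 = all cash, 1 = all asset

class ActionScheme:                   # BSH-style: the action *is* the target position
    def apply(self, action): return action

class RewardScheme:                   # PBR-style: position times price change
    def reward(self, position, t):
        return position * (prices[t] - prices[t - 1])

observer, agent = Observer(), Agent()
action_scheme, reward_scheme = ActionScheme(), RewardScheme()

position, total = 0, 0.0
for t in range(1, len(prices)):       # the episode loop from the diagram
    obs = observer.observe(t - 1)
    position = action_scheme.apply(agent.act(obs))
    total += reward_scheme.reward(position, t)
print(total)
```

The real environment adds order execution, wallets, and commissions between the ActionScheme and the RewardScheme, but the control flow is the same.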
After completing this curriculum, you should be able to:
| Level | Time | Capability |
|---|---|---|
| 1 | 5 min | Run code and see a trading agent |
| 2 | 30 min | Understand the core architecture |
| 3 | 2 hours | Modify components and run experiments |
| 4 | 1 day | Train a real agent and understand results |
| 5 | 1 week | Build custom components, avoid pitfalls |
- EXPERIMENTS.md - Full research log
- API Documentation - Reference docs
- Discord - Community support
TensorTrade needs help with:
- Reduce overtrading - The agent trades too frequently
- Position sizing - Replace binary BSH with continuous actions
- Commission-aware rewards - Include fees in learning signal
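One way to approach the commission-aware-rewards item above is to charge the fee inside the reward itself whenever the position changes, so the learning signal directly penalizes churn. The function below is a hypothetical sketch of such a reward, not an existing TensorTrade scheme; the flat-fee-per-flip model is an assumption.

```python
# Hypothetical commission-aware variant of a position-based reward:
# subtract a fee whenever the position changes, so the agent is
# penalized for overtrading during learning, not just at evaluation.
def reward(prev_position: int, position: int, prev_price: float,
           price: float, commission: float = 0.001) -> float:
    pnl = position * (price - prev_price) / prev_price      # position-based return
    fee = commission if position != prev_position else 0.0  # pay fee on each flip
    return pnl - fee

print(reward(0, 1, 100.0, 101.0))  # ≈ 0.009: a 1% move minus the 0.1% entry fee
```

With this shaping, a position flip is only worth taking when the expected move exceeds the commission, which is exactly the trade-off the agent currently ignores.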
See CONTRIBUTING.md for guidelines.
Happy trading!