Intelligent pathfinding system using RRT* algorithms enhanced with Q-learning.
An intelligent pathfinding system that combines the RRT* (Rapidly-exploring Random Tree Star) algorithm with Q-learning reinforcement learning to navigate maze environments. The agent learns optimal sampling strategies through experience, evolving from random exploration to intelligent, goal-directed movement.
- Hybrid Algorithm: Combines traditional RRT* pathfinding with Q-learning reinforcement learning
- Adaptive Learning: Agent learns from experience across multiple episodes with epsilon-decay exploration
- Multiple Sampling Strategies:
- Random sampling for exploration
- Goal-biased sampling for directed progress
- Exploration-focused sampling for unvisited areas
- Path-optimised sampling near promising routes
- Real-time Visualisation: Live turtle graphics animation showing tree growth and pathfinding process
- Model Persistence: Save and load trained Q-tables for continuous learning
- Path Optimisation: RRT* rewiring for shorter, more efficient routes
- Performance Tracking: Training statistics including success rates and convergence metrics
When you run the program, you'll be presented with three options:
Choose an option:
1. Train new RRT* model (100 episodes)
2. Load existing RRT* model and demonstrate
3. Train new RRT* model and demonstrate
- Trains the Q-learning agent for 100 episodes
- Displays training progress with success rates and rewards
- Saves the trained model as `rrt_star_q_model.pkl`
- Loads a pre-trained model (if available)
- Visualises the pathfinding process with turtle graphics
- Shows the learned optimal path in orange
- Tree Growth: Builds a tree of connected nodes from start toward goal
- Rewiring: Continuously optimises paths for shorter routes
- Collision Detection: Ensures paths avoid obstacles
- State Representation: Distance to goal, direction, and nearest node proximity
- Action Space: Four sampling strategies (random, goal-biased, exploration, path-optimised)
- Reward System:
- Progress toward goal: +10 × distance_improvement
- Exploration bonus: +1 for distant nodes
- Time penalty: -1 per iteration
- Success reward: +10 for reaching goal
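The reward terms above could be combined in a single helper along these lines (a sketch; `compute_reward` and its argument names are hypothetical, not the project's actual API):

```python
def compute_reward(distance_improvement: float,
                   is_distant_node: bool,
                   reached_goal: bool) -> float:
    """Combine the reward terms listed above (hypothetical helper)."""
    reward = -1.0                           # time penalty per iteration
    reward += 10.0 * distance_improvement  # progress toward the goal
    if is_distant_node:
        reward += 1.0                       # exploration bonus
    if reached_goal:
        reward += 10.0                      # success reward
    return reward
```

For example, a step that improves the goal distance by 0.5 and adds a distant node earns -1 + 5 + 1 = 5.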
- Exploration Phase: High epsilon (ε=0.3) for random action selection
- Exploitation Phase: Gradual epsilon decay (0.995) toward greedy policy
- Experience Retention: Q-table persistence across training sessions
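A minimal sketch of how this epsilon-greedy schedule might look, assuming four discrete sampling actions and a dict-based Q-table (the names here are illustrative):

```python
import random

ACTIONS = ["random", "goal_biased", "exploration", "path_optimised"]

def select_action(q_table, state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    q_values = q_table.get(state, {a: 0.0 for a in ACTIONS})
    return max(q_values, key=q_values.get)

# Decay the exploration rate after each training episode:
epsilon = 0.3
for episode in range(100):
    epsilon *= 0.995  # after 100 episodes: 0.3 * 0.995**100, roughly 0.18
```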
- Implements Q-learning with epsilon-greedy policy
- State feature extraction and action selection
- Q-value updates using temporal difference learning
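The temporal-difference update is the standard Q-learning rule. A sketch using the defaults listed under the configuration section (α = 0.1, γ = 0.95), with actions indexed 0–3; the helper name and Q-table layout are assumptions:

```python
def update_q(q_table, state, action, reward, next_state,
             learning_rate=0.1, discount_factor=0.95, n_actions=4):
    """TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    current = q_table.get((state, action), 0.0)
    next_best = max(q_table.get((next_state, a), 0.0) for a in range(n_actions))
    td_target = reward + discount_factor * next_best
    q_table[(state, action)] = current + learning_rate * (td_target - current)
```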
- 20×20 grid environment with walls and free spaces
- Turtle graphics visualisation
- Collision detection and coordinate transformations
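Index arithmetic on a flat tile array might look like the following sketch (hypothetical helpers; the 0 = wall / 1 = free encoding matches the TILES convention used by the project):

```python
GRID_SIZE = 20  # the environment is a 20x20 grid stored as a flat list

def is_free(tiles, x, y):
    """True if (x, y) is inside the grid and not a wall (1 = free, 0 = wall)."""
    if not (0 <= x < GRID_SIZE and 0 <= y < GRID_SIZE):
        return False
    return tiles[y * GRID_SIZE + x] == 1

def edge_is_free(tiles, x0, y0, x1, y1, samples=10):
    """Collision-check an RRT* edge by sampling points along the segment."""
    for i in range(samples + 1):
        t = i / samples
        x = round(x0 + t * (x1 - x0))
        y = round(y0 + t * (y1 - y0))
        if not is_free(tiles, x, y):
            return False
    return True
```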
- Enhanced RRT* with Q-learning integration
- Adaptive sampling based on learned policies
- Tree rewiring for path optimisation
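In outline, rewiring re-parents nearby nodes through a newly added node whenever that shortens their path from the start. A simplified sketch (the `Node` structure and `rewire` helper are illustrative, not the project's exact code):

```python
import math
from dataclasses import dataclass

@dataclass
class Node:
    x: float
    y: float
    cost: float = 0.0             # path length from the start node
    parent: "Node | None" = None

def rewire(new_node, neighbours, search_radius=3.0):
    """Re-parent nearby nodes through new_node when that lowers their cost."""
    for node in neighbours:
        d = math.dist((node.x, node.y), (new_node.x, new_node.y))
        if d <= search_radius and new_node.cost + d < node.cost:
            node.parent = new_node
            node.cost = new_node.cost + d
```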
- Training orchestration and statistics tracking
- Model persistence (save/load functionality)
- Demonstration mode with visualisation
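Saving and loading the Q-table can be as simple as pickling it to `rrt_star_q_model.pkl`, the file name mentioned above; a sketch with hypothetical helper names:

```python
import pickle

MODEL_PATH = "rrt_star_q_model.pkl"

def save_q_table(q_table, path=MODEL_PATH):
    with open(path, "wb") as f:
        pickle.dump(q_table, f)

def load_q_table(path=MODEL_PATH):
    """Return the saved Q-table, or an empty one if no model exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {}
```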
- Green: Start position
- Red: Goal position
- Black: Walls/obstacles
- Blue: RRT* tree edges
- Yellow: Current path to goal
- Orange: Final learned optimal path
Modify the TILES array to create custom maze layouts:
```python
TILES = [
    0, 0, 0, 0, 0,  # 0 = wall
    0, 1, 1, 1, 0,  # 1 = free space
    0, 1, 0, 1, 0,
    0, 1, 1, 1, 0,
    0, 0, 0, 0, 0,
]
```

Adjust learning parameters in `QLearningAgent`:
- `learning_rate`: Q-value update rate (default: 0.1)
- `discount_factor`: Future reward importance (default: 0.95)
- `epsilon`: Exploration rate (default: 0.3)
- `epsilon_decay`: Exploration decay (default: 0.995)
Modify RRT* behaviour in `SmartRRTStar`:
- `search_radius`: Rewiring neighbourhood size (default: 3)
- `max_iterations`: Maximum tree-growth steps (default: 1000)
- `step_size`: Node expansion distance (default: 1)
- Large mazes may require an increased `max_iterations`
- Turtle graphics can be slow for real-time visualisation
- Model convergence depends on maze complexity
- RRT* Algorithm - Karaman & Frazzoli
- Q-Learning - Watkins & Dayan
- Reinforcement Learning: An Introduction - Sutton & Barto
This project is licensed under the MIT License.
- Multi-goal pathfinding
- Dynamic obstacle environments
- Deep Q-Networks (DQN) implementation
- 3D maze environments
- Comparative algorithm benchmarks
- Web-based visualisation interface