The Q-Learning GridWorld Simulator by Fahmi Zainal is an interactive web application that demonstrates the fundamentals of reinforcement learning through a visual and intuitive interface. This project implements a Q-learning agent that learns to navigate through a grid environment with obstacles to reach a goal state. Users can modify learning parameters, observe the training process in real-time, and see how different settings affect the agent's learning capabilities. It's a perfect educational tool for understanding the core concepts of reinforcement learning.
- 🎯 Objectives
- 🔧 Technologies Used
- 📝 Directory Structure
- ⚙️ Environment Setup
- 🧠 Q-Learning Algorithm
- 🔍 Features
- 🖥️ Interface Components
- 💡 Parameter Optimization
- 📊 Visualization Components
- 🔄 Project Workflow
- 🚀 Running the Application
- 🌐 Deployment Options
- 🔮 Future Enhancements
- 🎉 Conclusion
- 📚 References
- 📜 License
## 🎯 Objectives

- 🎓 Educational Tool: Provide an accessible way to understand reinforcement learning concepts
- 🧪 Experimentation Platform: Allow users to observe how different parameters affect learning
- 👁️ Visualization: Create intuitive visualizations of the Q-learning process
- 🔬 Interactive Learning: Enable users to interact with and modify the learning environment
- 📱 Accessibility: Make reinforcement learning concepts accessible through a web interface
## 🔧 Technologies Used

This project leverages several key technologies:
- Python: Core programming language for the implementation
- NumPy: For efficient numerical operations and array manipulation
- Matplotlib: For creating visualization components and plots
- Gradio: For building the interactive web interface
- HuggingFace Spaces: For hosting the deployed application
## 📝 Directory Structure

```
.
├── LICENSE                          # Fahmi Zainal Custom License information
├── README.md                        # Project documentation
├── rl_gradio.py                     # Main application file with Gradio interface
├── requirements.txt                 # Project dependencies
├── .github                          # GitHub configuration
│   └── workflows
│       └── huggingface-space-sync.yml   # Automatic deployment workflow
├── examples                         # Example screenshots and animations
│   ├── training_visualization.gif   # Training process animation
│   ├── interface_components.png     # UI component overview
│   └── parameter_effects.png        # Visual comparison of parameters
└── space.yml                        # HuggingFace Spaces configuration
```
## ⚙️ Environment Setup

1. Access the project via HuggingFace:

   ```bash
   # Note: This is a proprietary project by Fahmi Zainal
   # Please contact the owner for access to the repository
   # Visit: https://huggingface.co/spaces/fahmizainal17/Q-Learning_GridWorld_Simulator
   ```

2. Set up a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Run the application:

   ```bash
   python rl_gradio.py
   ```
The application is configured for easy deployment to HuggingFace Spaces:
1. Requirements file (`requirements.txt`) includes:

   ```
   gradio>=4.0.0
   matplotlib>=3.5.0
   numpy>=1.20.0
   ```

2. Space configuration (`space.yml`):

   ```yaml
   title: Q-Learning GridWorld Simulator
   emoji: 🤖
   colorFrom: blue
   colorTo: green
   sdk: gradio
   sdk_version: 4.0.0
   app_file: rl_gradio.py
   pinned: false
   author: fahmizainal17
   ```
## 🧠 Q-Learning Algorithm

The core of this project is the Q-Learning algorithm, a model-free reinforcement learning technique that learns the value of actions in states through trial and error.
- Q-Table: A matrix that stores expected rewards for each state-action pair
- Exploration vs. Exploitation: Balancing random actions vs. using current knowledge
- Reward Function: Positive reward at goal, negative at obstacles
- Update Rule: Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',a') - Q(s,a)]
| Parameter | Description | Typical Range |
|---|---|---|
| Learning Rate (α) | How quickly new information overrides old information | 0.01 - 0.5 |
| Discount Factor (γ) | How much future rewards are valued | 0.8 - 0.99 |
| Exploration Rate (ε) | Probability of taking a random action | 0.1 - 1.0 |
| Exploration Decay | Rate at which exploration decreases | 0.9 - 0.999 |
```
Initialize Q-table with zeros
For each episode:
    Reset environment to starting state
    While not terminal state:
        With probability ε, select a random action
        Otherwise, select the action with the highest Q-value
        Take action, observe reward and next state
        Update Q-table using: Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',a') - Q(s,a)]
        Move to next state
    Reduce exploration rate by decay factor
```
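As a concrete illustration of the loop above, here is a minimal, self-contained tabular Q-learning sketch on a toy grid. It is not the code from `rl_gradio.py`; the grid size, obstacle position, and reward values are assumptions chosen for the example.

```python
import numpy as np

# Illustrative 4x4 grid: start at (0, 0), goal at (3, 3), one obstacle at (1, 1).
# These values are assumptions for the sketch, not the simulator's actual defaults.
SIZE = 4
GOAL, OBSTACLE = (3, 3), (1, 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

alpha, gamma, epsilon, decay = 0.1, 0.9, 1.0, 0.995
Q = np.zeros((SIZE, SIZE, len(ACTIONS)))        # Q-table: one row of action-values per cell

def step(state, action):
    """Apply an action, clip at the walls, and return (next_state, reward, done)."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1)
    if (nr, nc) == OBSTACLE:
        return (r, c), -1.0, False              # bump into the obstacle: penalty, stay put
    if (nr, nc) == GOAL:
        return (nr, nc), 10.0, True             # reach the goal: positive reward, episode ends
    return (nr, nc), -0.1, False                # small step cost encourages short paths

rng = np.random.default_rng(0)
for episode in range(500):
    state, done = (0, 0), False
    while not done:
        # ε-greedy: explore with probability ε, otherwise exploit current Q-values
        if rng.random() < epsilon:
            action = int(rng.integers(len(ACTIONS)))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)]
        target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state
    epsilon *= decay                            # shift gradually from exploration to exploitation

print("Greedy action per cell:\n", np.argmax(Q, axis=-1))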
## 🔍 Features

- Adjustable grid size (3×3 up to 8×8)
- Customizable obstacle placement
- Visual representation of the grid world
- Real-time modification of learning parameters
- Immediate feedback on parameter changes
- Preset configurations for quick experimentation
- Real-time updates of the Q-table during training
- Visual representation of the agent's policy
- Heatmap of state visitation frequency
- Reward history tracking
- Exploration rate visualization
- Episode completion statistics
- Test mode to evaluate learned policies
- Path visualization and analysis
- Performance comparison tools
- Interactive explanations of reinforcement learning concepts
- Step-by-step visualization of the learning process
- Comparative analysis of different parameter settings
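To make the environment-related features above more concrete, here is a rough sketch of what a configurable GridWorld environment with adjustable size and obstacle placement could look like. The class and method names are illustrative, not the project's actual API.

```python
import numpy as np

class GridWorld:
    """Minimal sketch of a configurable grid environment (illustrative names only)."""

    def __init__(self, size=5, obstacles=((1, 1), (2, 3)), start=(0, 0), goal=None):
        self.size = size                        # adjustable grid size, e.g. 3 to 8
        self.obstacles = set(obstacles)         # customizable obstacle placement
        self.start = start
        self.goal = goal if goal is not None else (size - 1, size - 1)
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, move):
        """`move` is a (dr, dc) offset; returns (next_state, reward, done)."""
        r, c = self.state
        nr = min(max(r + move[0], 0), self.size - 1)
        nc = min(max(c + move[1], 0), self.size - 1)
        if (nr, nc) in self.obstacles:
            return self.state, -1.0, False      # blocked: stay in place with a penalty
        self.state = (nr, nc)
        done = self.state == self.goal
        return self.state, (10.0 if done else -0.1), done

    def render(self):
        """Text rendering: A = agent, G = goal, X = obstacle, . = empty."""
        grid = np.full((self.size, self.size), ".", dtype=object)
        for (r, c) in self.obstacles:
            grid[r, c] = "X"
        grid[self.goal] = "G"
        grid[self.state] = "A"
        print("\n".join(" ".join(row) for row in grid))

env = GridWorld(size=5)
env.reset()
env.render()
```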
## 🖥️ Interface Components

The Gradio interface is divided into three main tabs:

- Environment Setup tab:
  - Grid size selection controls
  - Environment visualization
  - Environment information display
- Training tab:
  - Learning parameter sliders:
    - Learning Rate (α)
    - Discount Factor (γ)
    - Exploration Rate (ε)
    - Exploration Decay
  - Episode count selection
  - Training button
  - Training visualizations:
    - Environment state
    - Visit heatmap
    - Q-value visualization
  - Training metrics:
    - Reward chart
    - Exploration rate chart
    - Training log display
- Test Agent tab:
  - Test execution button
  - Path visualization
  - Performance metrics display
  - Path analysis tools
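For readers curious how such a tabbed layout is typically assembled in Gradio, the sketch below wires sliders, a button, and an output into three tabs. It is a simplified stand-in, not the actual `rl_gradio.py`; the `train_agent` function and the component names are placeholders.

```python
import gradio as gr

def train_agent(alpha, gamma, epsilon, decay, episodes):
    # Placeholder for the real training loop; would return updated plots and logs.
    return f"Trained for {int(episodes)} episodes (α={alpha}, γ={gamma}, ε={epsilon}, decay={decay})"

with gr.Blocks(title="Q-Learning GridWorld Simulator") as demo:
    with gr.Tab("Environment Setup"):
        grid_size = gr.Slider(3, 8, value=5, step=1, label="Grid size")
    with gr.Tab("Training"):
        alpha = gr.Slider(0.01, 0.5, value=0.1, label="Learning Rate (α)")
        gamma = gr.Slider(0.8, 0.99, value=0.9, label="Discount Factor (γ)")
        epsilon = gr.Slider(0.1, 1.0, value=1.0, label="Exploration Rate (ε)")
        decay = gr.Slider(0.9, 0.999, value=0.995, label="Exploration Decay")
        episodes = gr.Slider(100, 5000, value=500, step=100, label="Episodes")
        train_btn = gr.Button("Train Agent")
        log = gr.Textbox(label="Training log")
        train_btn.click(train_agent, [alpha, gamma, epsilon, decay, episodes], log)
    with gr.Tab("Test Agent"):
        test_btn = gr.Button("Test Trained Agent")

demo.launch()
```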
## 💡 Parameter Optimization

| Parameter | Low Value Effect | High Value Effect |
|---|---|---|
| Learning Rate (α) | Slow, stable learning | Fast, potentially unstable learning |
| Discount Factor (γ) | Focus on immediate rewards | Value future rewards more |
| Exploration Rate (ε) | Limited exploration | Extensive exploration |
| Exploration Decay | Quick transition to exploitation | Extended exploration phase |
1. Balanced Learning (Default):
   - Learning Rate: 0.1
   - Discount Factor: 0.9
   - Exploration Rate: 1.0
   - Exploration Decay: 0.995

2. Fast Learning:
   - Learning Rate: 0.3
   - Discount Factor: 0.8
   - Exploration Rate: 1.0
   - Exploration Decay: 0.95

3. Thorough Exploration:
   - Learning Rate: 0.05
   - Discount Factor: 0.95
   - Exploration Rate: 1.0
   - Exploration Decay: 0.998
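A quick bit of arithmetic shows how differently the three preset decay values behave: ε is multiplied by the decay factor once per episode, so the presets reach very different exploration levels after the same number of episodes. The check below is pure arithmetic, independent of the simulator itself.

```python
# Compare how fast ε shrinks under each preset's decay factor.
episodes = 500
for name, decay in [("balanced", 0.995), ("fast_learning", 0.95), ("thorough_exploration", 0.998)]:
    eps = 1.0
    trace = []
    for _ in range(episodes):
        trace.append(eps)
        eps *= decay
    print(f"{name:22s} ε after 100 episodes: {trace[100]:.3f}, after 500: {eps:.4f}")
```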
## 📊 Visualization Components

- Environment view:
  - Shows the current state of the environment
  - Highlights agent position, obstacles, and goal
  - Displays learned policy with directional arrows
- Visit heatmap:
  - Color-coded visualization of state visit frequency
  - Helps identify exploration patterns
  - Reveals the agent's learned paths
- Q-value visualization:
  - Displays learned Q-values as arrows with varying sizes
  - Shows the relative value of different actions in each state
  - Provides insight into the agent's decision-making process
- Training metrics:
  - Reward per episode trend line
  - Exploration rate decay visualization
  - Convergence analysis tools
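As a rough sketch of how such plots can be produced with Matplotlib (using random placeholder data rather than output from the simulator), a visit-frequency heatmap and a policy-arrow view might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data: visit counts and greedy actions for a 5x5 grid.
rng = np.random.default_rng(1)
visits = rng.integers(0, 50, size=(5, 5))
greedy = rng.integers(0, 4, size=(5, 5))            # 0=up, 1=down, 2=left, 3=right
arrow = {0: (0, -0.3), 1: (0, 0.3), 2: (-0.3, 0), 3: (0.3, 0)}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Visit-frequency heatmap
im = ax1.imshow(visits, cmap="viridis")
ax1.set_title("State visit frequency")
fig.colorbar(im, ax=ax1)

# Policy arrows: one arrow per cell pointing in the greedy action's direction
ax2.set_title("Learned policy (greedy actions)")
ax2.set_xlim(-0.5, 4.5)
ax2.set_ylim(4.5, -0.5)                             # match imshow's row/column orientation
for r in range(5):
    for c in range(5):
        dx, dy = arrow[int(greedy[r, c])]
        ax2.annotate("", xy=(c + dx, r + dy), xytext=(c, r),
                     arrowprops=dict(arrowstyle="->"))

plt.tight_layout()
plt.show()
```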
## 🔄 Project Workflow

1. Environment Design:
   - Implement GridWorld class with customizable parameters
   - Define state transitions and reward structure
   - Create visualization utilities

2. Agent Implementation:
   - Develop the Q-learning algorithm
   - Implement exploration strategies
   - Build tracking mechanisms for training metrics

3. UI Development:
   - Design the Gradio interface layout
   - Implement interactive components
   - Create dynamic visualizations

4. Integration and Testing:
   - Connect the backend reinforcement learning components with the UI
   - Test with various parameter configurations
   - Optimize performance and usability

5. Deployment:
   - Package the application for deployment
   - Configure HuggingFace Spaces integration
   - Set up GitHub Actions for automated updates
## 🚀 Running the Application

```bash
# Install requirements
pip install -r requirements.txt

# Run the application
python rl_gradio.py
```

This will start the Gradio server locally, typically accessible at http://127.0.0.1:7860.
1. Environment Setup:
   - Set your desired grid size
   - Click "Setup Environment" to initialize

2. Training:
   - Adjust learning parameters as needed
   - Set the number of episodes
   - Click "Train Agent" to begin training
   - Observe the visualizations as training progresses

3. Testing:
   - After training, switch to the "Test Agent" tab
   - Click "Test Trained Agent" to see how it performs
   - Analyze the path taken and performance metrics
## 🌐 Deployment Options

The application is configured for easy deployment to HuggingFace Spaces:

1. Create a HuggingFace account at https://huggingface.co/join

2. Install the HuggingFace CLI:

   ```bash
   pip install huggingface_hub
   ```

3. Contact the project owner for deployment instructions:

   ```bash
   # This project is owned by Fahmi Zainal
   # Please contact the owner for proper deployment instructions
   # The project is already deployed at:
   # https://huggingface.co/spaces/fahmizainal17/Q-Learning_GridWorld_Simulator
   ```

4. For authorized collaborators only:
   - Request proper access credentials from Fahmi Zainal
   - Follow the proprietary deployment guidelines provided by the owner
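For general context only, pushing files to an existing HuggingFace Space from Python usually goes through `huggingface_hub`. The snippet below is a generic example, not this project's deployment procedure; writing to the Space requires authorization from the owner as noted above.

```python
# Generic example of uploading a folder to a HuggingFace Space.
# Requires an authorized token (e.g. via `huggingface-cli login` or the HF_TOKEN env var).
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path=".",                                           # local project directory
    repo_id="fahmizainal17/Q-Learning_GridWorld_Simulator",    # existing Space
    repo_type="space",
)
```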
The application can also be deployed to:
- Streamlit Cloud: With minor modifications to use Streamlit instead of Gradio
- Heroku: Using a Procfile to specify the web process
- Docker: By containerizing the application for consistent deployment
## 🔮 Future Enhancements

1. Additional Algorithms:
   - SARSA implementation (see the sketch after this list for how its update differs from Q-learning)
   - Deep Q-Network (DQN) integration
   - Policy Gradient methods

2. Enhanced Environments:
   - Continuous state spaces
   - Stochastic environments
   - Multi-agent scenarios

3. Advanced Visualizations:
   - 3D environment representation
   - Animation of learning progress over time
   - Interactive policy exploration

4. Educational Enhancements:
   - Step-by-step algorithm explanations
   - Interactive tutorials
   - Challenge scenarios with specific learning objectives

5. Performance Optimizations:
   - Faster training algorithms
   - Parallel processing options
   - Pre-computed examples for instant demonstration
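For the SARSA item above, the only change relative to the Q-learning loop sketched earlier is the bootstrap target: SARSA is on-policy and uses the next action the policy actually takes rather than the greedy maximum. A minimal, self-contained comparison (the transition values and the `choose_action` helper are illustrative):

```python
import numpy as np

# Contrast the two update rules on a single dummy transition.
alpha, gamma = 0.1, 0.9
Q = np.zeros((5, 5, 4))                                  # 5x5 grid, 4 actions
state, action, reward, next_state, done = (0, 0), 3, -0.1, (0, 1), False
rng = np.random.default_rng(0)

def choose_action(s, epsilon=0.1):
    """ε-greedy selection over the current Q-values (illustrative helper)."""
    return int(rng.integers(4)) if rng.random() < epsilon else int(np.argmax(Q[s]))

# Q-learning (off-policy): bootstrap from the *best* next action
q_target = reward + gamma * np.max(Q[next_state]) * (not done)
Q[state][action] += alpha * (q_target - Q[state][action])

# SARSA (on-policy): bootstrap from the next action the policy *actually* takes
next_action = choose_action(next_state)
sarsa_target = reward + gamma * Q[next_state][next_action] * (not done)
Q[state][action] += alpha * (sarsa_target - Q[state][action])
```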
## 🎉 Conclusion

The Q-Learning GridWorld Simulator developed by Fahmi Zainal provides an accessible and interactive platform for exploring reinforcement learning concepts. By visualizing the Q-learning process and allowing real-time parameter adjustments, it bridges the gap between theoretical understanding and practical implementation of reinforcement learning algorithms.
The project demonstrates how agents can learn optimal policies through trial and error, showcasing the power of Q-learning in a simple yet instructive environment. As an educational tool, it offers intuitive insights into the mechanics of reinforcement learning, making complex concepts more approachable for students, researchers, and AI enthusiasts.
This project represents Fahmi Zainal's work in the field of reinforcement learning visualization and is protected under a custom license that prohibits unauthorized use or distribution.
## 📚 References

- Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.
- OpenAI. (2018). Spinning Up in Deep RL.
- Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
- NumPy Documentation
- Matplotlib Documentation
- Gradio Documentation
## 📜 License

Fahmi Zainal Custom License
Copyright (c) 2025 Fahmi Zainal
Unauthorized copying, distribution, or modification of this project is prohibited. This project and its source code are the intellectual property of Fahmi Zainal. This is not free to copy or distribute. For inquiries about usage, licensing, or collaboration, contact the project owner.
All rights reserved.