A custom Gymnasium environment simulating movement on a frozen lake to collect treasure and return to the starting point. This project implements and tests a Q-learning agent, enhanced with a Curriculum Learning strategy, to effectively navigate and solve the environment. It features a Pygame-based graphical interface for real-time visualization of the agent's actions and learning process.
- Custom Gymnasium Environment:
ArcticDashEnvis a custom-built environment extending Gymnasium's API, complete with a defined observation space, action space, dynamics, and reward system. - Q-learning Agent: A robust Q-learning algorithm is implemented for the agent to learn optimal policies.
- Curriculum Learning: The agent is trained across multiple stages with varying maximum jump limits (
max_jumps), allowing for observation of adaptation to resource constraints and potential knowledge transfer between stages. - Pygame Graphical Interface:
- Real-time visualization of the game board, agent's position, and dynamic ice states.
- Interactive mode with path tracking and an information panel (jumps left, treasure status, FPS, camera modes).
- User control over zoom, camera follow, FPS, and full-screen mode.
- Factored State Representation: The agent's state is composed of its spatial position, remaining jumps, and treasure possession status, mapped to a unique index for the Q-table.
- Dynamic Hyperparameter Scheduling: Epsilon-greedy exploration and learning rate are dynamically adjusted throughout training phases to balance exploration and exploitation effectively.
- Logging and Plotting: Detailed logging of training progress and automatic generation of performance plots (rewards, epsilon, learning rate over episodes).
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Python 3.8+
-
Clone the repository:
git clone https://github.com/miskrz0421/arctic-dash-project.git cd ArcticDashEnv -
Install the required Python packages:
It's recommended to use a virtual environment.
python -m venv venv # On Windows: .\venv\Scripts\activate # On macOS/Linux: source venv/bin/activate
Then install the packages:
pip install gymnasium pygame numpy matplotlib pillow
The project is designed to be run within a Jupyter Notebook environment (e.g., Jupyter Lab, Jupyter Notebook, or Google Colab).
This mode allows you to load a pre-trained agent and visualize its performance.
-
Open the Notebook: Launch Jupyter Lab/Notebook and open the
ArcticDash_sprawozdanie.ipynbnotebook. -
Execute Setup Cells: Run all cells from Section 0.
Wstęp i Konfiguracja Środowiska Colab/Jupyterup to the configuration block in Section 5.1.Ustawienia Eksperymentu. -
Configure for Demonstration: In Cell 5 (Configuration), ensure the following parameters are set:
PERFORM_TRAINING_OVERALL = False RENDER_TRAINING_STAGES = False RENDER_EVALUATION_AFTER_EACH_STAGE = True NUM_EVALUATION_EPISODES_PER_STAGE = 1 # 1 is sufficient for interactive mode
- Adjust Jump Limits (Optional): You can change
CURRICULUM_START_JUMPSandCURRICULUM_END_JUMPS(e.g., from 6 down to 1) to observe agents trained with different jump limits.- Important: For evaluation, the corresponding pre-trained Q-table (
q_table_mjX.pkl) for the chosenMAX_JUMPSvalue must exist in theq_learning_results_curriculum_e5/map_e5_mjX/directory.
- Important: For evaluation, the corresponding pre-trained Q-table (
- Adjust Jump Limits (Optional): You can change
-
Run the Demonstration: Execute Cell 6 (
Uruchomienie Procesu Curriculum Learning / Demonstracji). A Pygame window will open, displaying the agent's performance. -
Interactive Mode Controls (in Pygame window):
Rkey: Reset the current episode (only in interactive mode).Ckey: Toggle camera follow mode.+/-keys: Adjust zoom level.PgUp/PgDnkeys: Adjust frames per second (FPS).F11key: Toggle full-screen mode.- Close window: Exit the game.
This mode allows you to train a new Q-learning agent from scratch or continue training.
-
Open the Notebook: Launch Jupyter Lab/Notebook and open the
ArcticDash_sprawozdanie.ipynbnotebook. -
Execute Setup Cells: Run all cells from Section 0.
Wstęp i Konfiguracja Środowiska Colab/Jupyterup to the configuration block in Section 5.1.Ustawienia Eksperymentu. -
Configure for Training: In Cell 5 (Configuration), set:
PERFORM_TRAINING_OVERALL = True RENDER_TRAINING_STAGES = False # Set to True to visualize training (may slow down significantly) RENDER_EVALUATION_AFTER_EACH_STAGE = True # To see a demo after training
- Training Episodes: Adjust
EPISODES_PER_STAGE(e.g.,50_000_000or100_000_000) based on desired training time and performance. Be aware that training can be very time-consuming, potentially taking many hours or even days depending on the number of episodes and hardware. - Preload Q-table: To start training from a specific point or transfer knowledge, set
PRELOAD_QTABLE_PATHandPRELOAD_QTABLE_JUMPSin Cell 5. To start from scratch, setPRELOAD_QTABLE_PATH = None.
- Training Episodes: Adjust
-
Start Training: Execute Cell 6 (
Uruchomienie Procesu Curriculum Learning / Demonstracji). Training progress will be logged to the console and to.logfiles in theq_learning_results_curriculum_e5directory. Trained Q-tables (.pklfiles) and plots will also be saved there.
- Grid-based Frozen Lake: The agent navigates a grid where different cell types have distinct properties.
- Goal: The primary objective is to collect a treasure ('G') and safely return to the starting position ('S').
- Hazards: The lake contains 'Frozen' (F) ice that turns 'Weak' (W) upon first step, then 'Very Weak' (V), and finally breaks into a 'Hole' (H). Stepping into a 'Hole' results in a high penalty and episode termination.
- Actions: The agent has 8 discrete actions: Move (Left, Down, Right, Up) and Jump (Left, Down, Right, Up). Jumps move the agent two squares and consume a limited resource.
- Observation Space: The state is fully observable and represented by a combination of the agent's spatial index, the number of jumps remaining, and a boolean indicating whether the treasure has been collected.
- Reward System: Rewards are structured to encourage goal achievement (high positive for treasure/goal) and discourage undesirable actions (negative for movement costs, high penalty for falling into holes or invalid actions).
The project employs a Curriculum Learning approach, primarily focusing on how the agent adapts its strategy when faced with varying resource constraints (specifically, the max_jumps parameter).
- Progressive Training: The agent is trained in a series of stages, starting with a higher
max_jumpsvalue (easier) and gradually decreasing it (making the problem more challenging). - Knowledge Transfer: Q-tables learned in an "easier" stage (more jumps) can be used to initialize the Q-table for a "harder" stage (fewer jumps). This allows the agent to adapt existing knowledge rather than learning from scratch, potentially accelerating convergence and fostering more sophisticated strategies.
- Strategic Adaptation: The curriculum helps observe how the agent's optimal policy changes. For instance, with many jumps, the agent might take riskier, shorter paths, while with fewer jumps, it becomes more conservative, saving jumps for critical obstacles or the final return path.
- Kacper Machnik
- Michał Krzempek
This project is licensed under the MIT License - see the LICENSE file for details.