Project conducted as part of the second-year Monte Carlo course at ENSAE. Most of the explanations below are extracted from existing articles (see the bibliography); the code and the results, however, are ours.
The cross-entropy method is an efficient and general optimization algorithm. Although it is fast, its applicability to reinforcement learning seems limited because it often converges to suboptimal policies. A standard technique for preventing this early convergence is to introduce noise. We apply the noisy cross-entropy method to the game of Tetris to demonstrate its efficiency.
- `Tetris.py` implements the Tetris game and the state-value functions.
- `Tetris_tuned.py` adds two more features to the state-value function (depth of the holes and number of lines with holes), as proposed by [2].
- `CE_method.py` implements the classical cross-entropy method optimizer.
- `CE_method_with_noise.py` implements the noisy cross-entropy method (constant and decreasing noise).
- `Simulated_annealing.py` implements the simulated annealing optimizer.
Following the approach of Thiery and Scherrer [2], we learn state-value functions that are linear combinations of 21 basis functions.
| Feature | Id | Description | Comments |
|---|---|---|---|
| Column height | | Height of each column | There are 10 of them, one per column |
| Column difference | | Absolute difference between the heights of adjacent columns | There are 9 of them, one per pair of adjacent columns |
| Maximum height | | Maximum pile height | Prevents having a big pile |
| Holes | | Number of empty cells covered by at least one full cell | Prevents making holes |
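To make the feature definitions concrete, here is a minimal sketch of how the 21 values could be computed from a board. It is not the code from `Tetris.py`: the function name `board_features`, the board encoding (a 0/1 array whose row 0 is the top of the well) and the ordering of the features are assumptions made for this illustration.

```python
import numpy as np

def board_features(board):
    """Return 21 feature values for a board with 10 columns.

    Assumptions (hypothetical encoding): `board` has shape (rows, 10),
    row 0 is the top of the well, and a non-zero entry is a full cell.
    Order: 10 column heights, 9 column differences, maximum height, holes.
    """
    board = np.asarray(board, dtype=bool)
    n_rows = board.shape[0]
    # Height of a column: distance from its topmost full cell to the floor.
    has_block = board.any(axis=0)
    top = np.argmax(board, axis=0)  # first full row per column (0 if empty)
    heights = np.where(has_block, n_rows - top, 0)
    # Absolute height differences between adjacent columns.
    diffs = np.abs(np.diff(heights))
    # Holes: empty cells with at least one full cell above them.
    covered = np.cumsum(board, axis=0) > 0
    holes = int(np.sum(covered & ~board))
    return np.concatenate([heights, diffs, [heights.max(), holes]]).astype(float)

# Small example: a 4-row well with one block on the floor of column 0
# and one floating block in column 1 (which covers two empty cells).
board = np.zeros((4, 10), dtype=int)
board[3, 0] = 1
board[1, 1] = 1
phi = board_features(board)
```

With a board of 10 columns, the returned vector has 10 + 9 + 1 + 1 = 21 entries, which matches the number of basis functions used here.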
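The value function to optimise:

$$V_w(s) = \sum_{i=1}^{21} w_i \, \phi_i(s)$$

where $w = (w_1, \dots, w_{21})$ is the weight vector and $\phi_i(s)$ is the value of the $i$-th basis function in state $s$.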
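Since the state-value function is a linear combination of the basis functions, evaluating it reduces to a dot product, and a controller can pick the placement whose resulting board has the highest value. The sketch below illustrates this under simplifying assumptions: `state_value` and `best_move` are hypothetical names, and the toy example uses 3 made-up features instead of 21.

```python
import numpy as np

def state_value(weights, features):
    """Linear state value: dot product of the weights and the features."""
    return float(np.dot(weights, features))

def best_move(weights, candidate_features):
    """Index of the candidate board (one per legal placement) with the
    highest state value."""
    values = [state_value(weights, phi) for phi in candidate_features]
    return int(np.argmax(values))

# Toy example with 3 features instead of 21 (all numbers are made up).
w = np.array([-1.0, -0.5, -4.0])
candidates = [
    np.array([3.0, 2.0, 1.0]),  # feature vector after placement 0
    np.array([4.0, 1.0, 0.0]),  # feature vector after placement 1
]
chosen = best_move(w, candidates)
```

In this toy case the second candidate wins because the large negative weight on the last feature penalises the first one. The optimizer's job is to find the weight vector that makes such greedy choices clear as many lines as possible.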
One of our Tetris simulations with an optimised weight vector (for the GIF generator, see `Tetris.py`):
An interesting illustration to understand this principle:
- The red crosses: the 10 best vectors, which we select and use to estimate the next round's $\mathcal{N}\left(\mu, \sigma^2\right)$
- The black dots: the 100 vectors of the next round, which we generate and test
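The select-and-resample loop described by the two bullets above can be sketched as follows. This is an illustrative version, not the code from `CE_method_with_noise.py`: the function name, the initial mean and variance, the toy objective (a stand-in for the score of a Tetris game) and the noise schedule constants are all assumptions. Each round draws 100 vectors, keeps the 10 best, refits the Gaussian on them, and adds a noise term to the variance to delay early convergence.

```python
import numpy as np

def noisy_cross_entropy(score, dim, n_iter=50, n_samples=100, n_elite=10,
                        noise=lambda t: 0.0, seed=0):
    """Maximise `score` over R^dim with a diagonal Gaussian.

    `noise(t)` is added to the variance at iteration t; the default
    (always 0) gives the classical cross-entropy method.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)             # assumed initial mean
    sigma2 = np.full(dim, 100.0)   # assumed initial variance
    for t in range(n_iter):
        # Generate and test the vectors of this round.
        samples = rng.normal(mu, np.sqrt(sigma2), size=(n_samples, dim))
        scores = np.array([score(w) for w in samples])
        # Keep the best vectors and refit the Gaussian on them.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mu = elite.mean(axis=0)
        sigma2 = elite.var(axis=0) + noise(t)
    return mu

# Toy objective standing in for the game score (hypothetical).
target = np.array([1.0, -2.0, 3.0])

def toy_score(w):
    return -np.sum((w - target) ** 2)

# Illustrative schedules for constant and decreasing noise.
constant_noise = lambda t: 4.0
decreasing_noise = lambda t: max(5.0 - t / 10.0, 0.0)

mu = noisy_cross_entropy(toy_score, dim=3, n_iter=80, noise=decreasing_noise)
```

For Tetris, `score` would play one or more games with the weight vector `w` and return the number of cleared lines, which makes each evaluation far more expensive and noisier than this toy objective.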
- Try a two-piece Tetris controller
- Use new features in the controller
- Optimize the hyperparameters
- Simulated annealing optimizer
- [1] Learning Tetris Using the Noisy Cross-Entropy Method, I. Szita, A. Lorincz
- [2] Improvements on Learning Tetris with Cross Entropy, Christophe Thiery, Bruno Scherrer, INRIA
- [3] A Tutorial on the Cross-Entropy Method, Pieter-Tjerk de Boer, Dirk P. Kroese, Shie Mannor, Reuven Y. Rubinstein
- Grégoire Brugère - [email protected]
- Léo Stepiens - [email protected]
- Corentin Pla - [email protected]