Learning Tetris using the noisy Cross Entropy method

Project conducted as part of the second year Monte Carlo course at ENSAE. Most of the explanations belows are extracts from preexisting articles (see bibliography), however, the code and the results are ours.

Main idea

The cross-entropy method is an efficient and general optimization algorithm. However, its applicability in reinforcement learning seems to be limited although it is fast, because it often converges to suboptimal policies. A standard technique for preventing early convergence is to introduce noise. We apply the noisy cross-entropy method to the game of Tetris to demonstrate its efficiency.

Code

Tetris.py implements the Tetris game and the state-value functions.
Tetris_tuned.py proposes two more features to integrate in the state-value function (depht of the holes & numbers of lines with holes) as it is proposed by [2].
CE_method.py implements the classical cross-entropy method optimizer.
CE_method_with_noise.py implements the Noisy cross-entropy method (constant & decreasing noise).
Simulated_annealing.py implements the simulated annealing optimizer.

Tetris controller

Following the approach of Thiery and Scherrer [2], we shall learn state-value functions that are linear combination of 21 basis functions.

Feature	Id	Description	Comments
Column height	$h_p$	Height of the $p$ th column of the board	There are $P$ such features where $P$ is the board width
Column difference	$\Delta h_p$	Absolute difference $\mid h_p − h_{p+1} \mid$ between adjacent columns	There are $P − 1$ such features where $P$ is the board width
Maximum height	$L$	Maximum pile height	Prevents from having a big pile
Holes	$H$	Number of empty cells covered by at least one full cell	Prevents from making holes

The value function to optimise:

$$ V_w(s):=\sum_{i=1}^{10} w_i h_i(s) + \sum_{i=1}^{9} w_{10+i}\Delta h_i(s) + w_{20}L + w_{21}H $$

where $s$ denotes a Tetris state and $w=(w_1,w_2,...,w_{21})$ the weight under which optimize.

One of our Tetris simulation for an optimised weight vector (for gif generator see Tetris.py) :

Optimizer

Cross entropy

Presentation

An interesting illustration to understand this principle for $w=(w_1,w_2)$ is proposed in this repo :

The red crosses : the 10 best vectors we select and use to estimate next round $\mathcal{N}\left(\mu, \sigma^2\right)$
The black dots : the next round 100 vectors we generate and test

Results

Cross entropy with constant noise

Presentation

Results

Cross entropy with decreasing noise

Presentation

Results

Further improvements

Try a two pieces Tetris controller
Use new features in the controller
Optimize the hyperparameters
Simulated annealing optimizer

Bibliography

[1] Learning Tetris Using the Noisy Cross-Entropy Method, I. Szita , A. Lorincz
[2] Improvements on Learning Tetris with Cross Entropy, Christophe Thiery, Bruno Scherrer, INRIA
[3] A Tutorial on the Cross-Entropy Method, Pieter-Tjerk de Boer, Dirk P. Kroese, Shie Mannor, Reuven Y. Rubinstein

Contacts

Grégoire Brugère - [email protected]
Léo Stepiens - [email protected]
Corentin Pla - [email protected]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Learning Tetris using the noisy Cross Entropy method

Main idea

Code

Tetris controller

Optimizer

Cross entropy

Presentation

Results

Cross entropy with constant noise

Presentation

Results

Cross entropy with decreasing noise

Presentation

Results

Further improvements

Bibliography

Contacts

About

Releases

Packages

Contributors 4

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 91 Commits
figures		figures
CE_method.py		CE_method.py
CE_method_with_noise.py		CE_method_with_noise.py
README.md		README.md
Simulated_annealing.py		Simulated_annealing.py
Tetris.py		Tetris.py
Tetris_tuned.py		Tetris_tuned.py

corentinpla/Learning-Tetris-Using-the-Noisy-Cross-Entropy-Method

Folders and files

Latest commit

History

Repository files navigation

Learning Tetris using the noisy Cross Entropy method

Main idea

Code

Tetris controller

Optimizer

Cross entropy

Presentation

Results

Cross entropy with constant noise

Presentation

Results

Cross entropy with decreasing noise

Presentation

Results

Further improvements

Bibliography

Contacts

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages