yambot 🎲

Training a reinforcement learning agent to play Yamb, a Croatian dice game, because I could use all the help I can get.

Note: The demo GIF may take a few seconds to load. Please be patient.

After a couple of hours of training, the model achieves a score of about 850, roughly that of a beginner. With GPU training, it could probably compete with a decent player.

Features ✨

🎲 Masked PPO Training - Utilizes state-of-the-art Proximal Policy Optimization with invalid action masking.
🏋️‍♂️ Custom Gymnasium Environment - Integrates Gymnasium, an API standard for reinforcement learning, to simulate and train the AI model effectively.
☁️ Azure Integration - Harnesses the power of Microsoft Azure for enhanced training performance.
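
To illustrate what invalid action masking means, here is a minimal sketch in plain Python. It is not the code used in this repository; `masked_softmax` is a hypothetical helper that shows the general idea of giving illegal moves zero probability before an action is sampled.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits, where mask[i] is False for invalid actions."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions get -inf, so exp() gives them exactly zero probability.
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# The second action is illegal in this made-up state, so it can never be chosen.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
```

The remaining probabilities are renormalised over the legal actions only, so the policy never wastes samples on squares that cannot be filled.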

Rules of Yamb 📜

2 to 5 players, but can also be played solo.

Objective: Maximize the Total Score (defined under Scoring below) by strategically rolling dice, announcing, and filling out the table.

Gameplay:

Each player takes turns rolling five dice.
You may roll up to three times per turn, choosing which dice to keep and which to re-roll.
After your final roll, you must write your result in one of the available squares in the score table.
The game consists of 48 rounds, and the player with the highest score at the end wins.
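
The rolling part of a turn can be sketched as follows. This is an illustration only, not the environment's actual code: `roll_turn`, `choose_keep`, and `keep_sixes` are hypothetical names, and the early stop when every die is kept is an assumption.

```python
import random


def roll_turn(choose_keep, rng, num_dice=5, max_rolls=3):
    """Play the rolling part of one turn and return the final dice.

    choose_keep(dice, rolls_done) returns a list of bools, True = keep that die.
    """
    dice = [rng.randint(1, 6) for _ in range(num_dice)]  # first roll
    for rolls_done in range(1, max_rolls):
        keep = choose_keep(dice, rolls_done)
        if all(keep):
            break  # nothing to re-roll, so stop early
        # Re-roll only the dice the player did not keep.
        dice = [d if k else rng.randint(1, 6) for d, k in zip(dice, keep)]
    return dice


def keep_sixes(dice, rolls_done):
    """Toy strategy: keep every six, re-roll everything else."""
    return [d == 6 for d in dice]


final = roll_turn(keep_sixes, random.Random(0))
```

Passing in a seeded `random.Random` keeps the sketch reproducible, which is handy when testing a strategy.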

Table:

Columns
- 0. DOLJE ⬇️: You must fill this column from top to bottom in order.
- 1. GORE ⬆️: You must fill this column from bottom to top in order.
- 2. SLOBODNO ⬆️⬇️: You can fill this column in any order.
- 3. NAJAVA 🔊: After your first roll, you can choose to announce the square in this column you intend to fill out (provided it is free). After announcing, you must fill out that square after your two further rolls.
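
The column rules above determine which squares are legal at any point, which is the information an action mask needs. A minimal sketch, assuming each column is tracked as a set of filled row indices (`fillable_rows` and its arguments are hypothetical and not taken from the yamb package):

```python
DOLJE, GORE, SLOBODNO, NAJAVA = range(4)
NUM_ROWS = 14  # rows 0 (ONES) to 13 (YAMB)


def fillable_rows(column, filled, announced_row=None):
    """Return the row indices that may be filled in `column`.

    filled: set of row indices already filled in that column.
    announced_row: the row announced after the first roll (NAJAVA only).
    """
    free = [r for r in range(NUM_ROWS) if r not in filled]
    if not free:
        return []
    if column == DOLJE:
        return [min(free)]  # top to bottom: next free row from the top
    if column == GORE:
        return [max(free)]  # bottom to top: next free row from the bottom
    if column == SLOBODNO:
        return free  # any order
    if column == NAJAVA:
        # Only the announced square may be filled, and only if it is free.
        return [announced_row] if announced_row in free else []
    raise ValueError(f"unknown column: {column}")
```

For example, `fillable_rows(GORE, {13})` returns `[12]`, since YAMB is already filled and POKER is next from the bottom.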
Rows
- 0. ONES: 1 x Number of 1s rolled. (e.g. [1, 1, 1, 1, 3] → 4 points).
- 1. TWOS: 2 x Number of 2s rolled. (e.g. [1, 1, 1, 1, 3] → 0 points).
- 2. THREES: 3 x Number of 3s rolled. (e.g. [1, 1, 1, 1, 3] → 3 points).
- 3. FOURS: 4 x Number of 4s rolled. (e.g. [4, 4, 1, 1, 3] → 8 points).
- 4. FIVES: 5 x Number of 5s rolled. (e.g. [5, 5, 5, 1, 3] → 15 points).
- 5. SIXES: 6 x Number of 6s rolled. (e.g. [6, 1, 1, 1, 3] → 6 points).
- 6. MAX: Total of all the dice rolled; aim for a high total. (e.g. [5, 5, 5, 5, 5] → 25 points).
- 7. MIN: Total of all the dice rolled; aim for a low total. (e.g. [1, 1, 1, 1, 1] → 5 points).
- 8. DVAPARA: 2 pairs. (e.g. [6, 1, 1, 1, 1] → 0 points).
- 9. TRIS: 3 of a kind. (e.g. [6, 6, 6, 1, 1] → 18 points).
- 10. SKALA: Straight. 45 points for a small straight and 50 for a large one. (e.g. [1, 2, 3, 4, 5] → 45 points).
- 11. FULL: Full house.
- 12. POKER: 4 of a kind.
- 13. YAMB: YAMB! 5 of a kind.
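
Some of the row rules above can be sketched in a few lines of Python. The function names are hypothetical, and treating [2, 3, 4, 5, 6] as the large straight is an assumption; the rows without a worked example (FULL, POKER, YAMB) are left out rather than guessed.

```python
def score_number_row(dice, face):
    """ONES to SIXES: the face value times how many dice show it."""
    return face * dice.count(face)


def score_sum(dice):
    """MAX and MIN: the total of all the dice rolled."""
    return sum(dice)


def score_skala(dice):
    """SKALA: 45 for a small straight, 50 for a large one, otherwise 0."""
    ordered = sorted(dice)
    if ordered == [1, 2, 3, 4, 5]:
        return 45
    if ordered == [2, 3, 4, 5, 6]:  # assumed definition of a large straight
        return 50
    return 0
```

For instance, `score_number_row([4, 4, 1, 1, 3], 4)` gives 8 and `score_skala([1, 2, 3, 4, 5])` gives 45, matching the examples in the list.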

Scoring

A: If ONES + TWOS + ... + SIXES >= 60 then ONES + TWOS + ... + SIXES + 30 else ONES + TWOS + ... + SIXES.
B: (MAX - MIN) x ONES.
C: DVAPARA + TRIS + SKALA + FULL + POKER + YAMB.
Total Score: A + B + C
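
As a worked example of the formulas above, here is a small sketch that computes A + B + C from a dictionary of row values. `total_score` is a hypothetical helper; whether the formulas are applied per column or to the whole table is not stated here, so the sketch handles a single set of row values.

```python
UPPER = ["ONES", "TWOS", "THREES", "FOURS", "FIVES", "SIXES"]
LOWER = ["DVAPARA", "TRIS", "SKALA", "FULL", "POKER", "YAMB"]


def total_score(rows):
    """Return A + B + C for a dict mapping row names to filled-in values."""
    upper = sum(rows[name] for name in UPPER)
    a = upper + 30 if upper >= 60 else upper  # bonus of 30 at 60 or more
    b = (rows["MAX"] - rows["MIN"]) * rows["ONES"]
    c = sum(rows[name] for name in LOWER)
    return a + b + c


example = {
    "ONES": 3, "TWOS": 6, "THREES": 9, "FOURS": 12, "FIVES": 15, "SIXES": 18,
    "MAX": 25, "MIN": 8,
    "DVAPARA": 0, "TRIS": 18, "SKALA": 45, "FULL": 0, "POKER": 0, "YAMB": 0,
}
score = total_score(example)
```

In this made-up example the upper rows sum to 63, so A = 93; B = (25 - 8) x 3 = 51; C = 63; the total is 207. Note how ONES multiplies the MAX - MIN difference, which makes a well-filled ONES row worth more than its face value suggests.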

Installation 🔧

If you want to train, test or evaluate locally, follow these steps:

Clone the repository:

git clone https://github.com/aav31/yambot.git
cd yambot

Create the conda environment:
```
conda env create -f environment.yml
```
Activate the environment:
```
conda activate yambot
```

This will install all the necessary libraries and tools required to run the project.

Folder Structure 📂

configs: Contains the configuration files that define each model.
media: Contains images and videos demonstrating model performance.
models: Folder containing trained reinforcement learning models.
scripts: Contains the scripts for training, testing, and evaluating yambot.
tests: Contains tests for the custom Yamb environment.
yamb: A package containing the custom environment needed to simulate a game of Yamb.

A logs folder containing the TensorBoard log files is created locally but is not tracked by git.

Usage 🚀

Training

To train from scratch (this deletes the TensorBoard logs and resets the number of episodes trained in the config file to zero):

python -m scripts.train --episodes 1000 --config "configs/model_default.json" --reset

To continue training a model:

python -m scripts.train --episodes 1000 --config "configs/model_default.json"

On an Azure machine or cluster, use:

python -m scripts.train --episodes 1000 --config "configs/model_default.json" --reset True --azure True

To look at the results for each model:

tensorboard --logdir=logs

To create a new model:

- Create a new config file, configs/model_new.json.
- Set episodes_trained=0.
- Set model_name=model_new.

Run:

python -m scripts.train --episodes 1000 --config "configs/model_new.json"
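
The config steps can also be scripted. This is a minimal sketch, assuming the config is a flat JSON object whose other fields can be copied unchanged from an existing config; `make_new_config` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path


def make_new_config(src, dst, model_name):
    """Copy the config at `src` to `dst` with a fresh name and episode count."""
    config = json.loads(Path(src).read_text())
    config["episodes_trained"] = 0  # the new model starts untrained
    config["model_name"] = model_name
    Path(dst).write_text(json.dumps(config, indent=2))
    return config
```

From the repository root, calling `make_new_config("configs/model_default.json", "configs/model_new.json", "model_new")` would produce a config ready for the training command above.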

Evaluation

If you want to test the model by watching it play a game of Yamb:

python -m scripts.test --model_name model_default

If you want to evaluate the model over multiple games of Yamb:

python -m scripts.evaluate --model_name model_default --episodes 100

Playing yamb yourself

This lets you play Yamb yourself. It mainly serves as an end-to-end check that the environment behaves as expected:

python -m scripts.yamb_yourself

License 📄

This project is licensed under the MIT License. See the LICENSE file for more details.

Tests ✅

To run the unit tests, from the root directory run:

python -m unittest discover -v
