This repository contains:
- A Schnapsen game engine (`api/`).
- Baseline bots (`rand`, `rdeep`).
- Four ML bot variants (`ml_none`, `ml_deck`, `ml_trump`, `ml_both`).
- Scripts to train the ML models and to reproduce the ablation experiments used in the report.
- A script to perform statistical tests on the obtained results.

The main experiment compares the win/loss performance of the ML variants against `rand` and `rdeep`, using fixed seeds and seat swapping for fairness.
- `api/` - `State` and supporting game logic (phases, legal moves, transitions):
  - `engine.play(...)` runs a full game by repeatedly calling `get_move()` on the bots and applying `State.next(...)`.
  - `util.load_player(name)` dynamically loads bots from `bots/<name>/<name>.py` (expects a `Bot` class).
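The dynamic-loading convention above (`bots/<name>/<name>.py` exposing a `Bot` class) can be sketched with `importlib`. This is an illustrative stand-in, not the repo's actual `util.load_player` implementation; the throwaway bot below is written to a temporary directory purely for the demo.

```python
import importlib.util
import pathlib
import tempfile

def load_player(name: str, bots_root: str) -> object:
    """Load bots/<name>/<name>.py from bots_root and instantiate its Bot class."""
    path = pathlib.Path(bots_root) / name / f"{name}.py"
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.Bot()  # convention: each bot module exposes a Bot class

# Demo with a throwaway bot written to a temporary directory.
with tempfile.TemporaryDirectory() as root:
    bot_dir = pathlib.Path(root) / "rand"
    bot_dir.mkdir()
    (bot_dir / "rand.py").write_text(
        "class Bot:\n"
        "    def get_move(self, state):\n"
        "        return state['legal_moves'][0]\n"
    )
    bot = load_player("rand", root)
    print(bot.get_move({"legal_moves": ["play_ace"]}))  # → play_ace
```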
- `bots/rand/` - Random baseline: chooses a random legal move.
- `bots/rdeep/` - Stronger baseline: uses deeper evaluation (slower than `rand`).
- `bots/ml/` - Shared ML infrastructure:
  - `featuresets.py` converts a `State` into a feature vector.
  - `common.py` implements `MLBot`, which:
    - enumerates the legal moves
    - (phase 1) samples a plausible full-information state via `make_assumption()`
    - simulates `next_state = state.next(move)`
    - scores with `model.predict_proba(...)`
    - picks the maximum (Player 1) or minimum (Player 2)
  - `SMT.py` is a quick smoke-test runner for the ML variants (for debugging).
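The `MLBot` decision loop described above can be sketched as follows. `StubState` and `StubModel` are toy stand-ins for the repo's `State` and the trained scikit-learn model; only the enumerate → simulate → score → max/min structure mirrors what `common.py` is described as doing.

```python
class StubState:
    """Minimal stand-in for api.State, just enough to show the decision loop."""
    def __init__(self, value, player=1):
        self.value = value
        self.player = player
    def moves(self):
        return [-1, 0, +1]                 # pretend legal moves
    def next(self, move):
        return StubState(self.value + move, self.player)
    def features(self):
        return [self.value]

class StubModel:
    """Stand-in classifier: P(player 1 wins) grows with the state value."""
    def predict_proba(self, X):
        return [[1 - x[0] / 10.0, x[0] / 10.0] for x in X]

def get_move(state, model):
    # 1. enumerate legal moves; 2. simulate each; 3. score with the model;
    # 4. Player 1 maximises P(win), Player 2 minimises it.
    scored = []
    for move in state.moves():
        next_state = state.next(move)                       # simulate
        p_win = model.predict_proba([next_state.features()])[0][1]
        scored.append((p_win, move))
    best = max(scored) if state.player == 1 else min(scored)
    return best[1]

print(get_move(StubState(5, player=1), StubModel()))  # Player 1 picks +1
print(get_move(StubState(5, player=2), StubModel()))  # Player 2 picks -1
```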
- `bots/ml_none/`, `bots/ml_deck/`, `bots/ml_trump/`, `bots/ml_both/` - Each contains a minimal wrapper `Bot` class configuring:
  - whether to use deck-knowledge features
  - whether to use trump one-hot features
  - Each directory must contain a `model.pkl`, used at runtime.
- `run_ablation.py` - Runs the experiment used in the report:
  - all four ML variants vs. both opponents
  - deterministic seeds
  - seat swapping enabled for fairness
  - writes tables and figures to `results/`
- `train-ml-bot.py` - Generates a dataset by playing games and recording feature vectors per decision state, then trains an `MLPClassifier` and saves a `model.pkl`.
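The training step can be sketched as below, assuming the pipeline is roughly `MLPClassifier` + `joblib` as described; the hyperparameters and the synthetic dataset are illustrative, not the script's actual settings.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the game-derived dataset: one row per decision state.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))              # 200 decision states, 8 features
y = (X[:, 0] > 0).astype(int)              # toy label: "Player 1 won the game"

model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
model.fit(X, y)

# Persist and reload the model the same way the bots consume it.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump(model, path)
reloaded = joblib.load(path)
proba = reloaded.predict_proba(X[:1])      # shape (1, 2): [P(loss), P(win)]
print(proba.shape)
```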
- `stat_tests.py` - Runs statistical tests on `results/ablation_results.csv`:
  - Direct comparison: exact binomial test of the win rate vs. p0.
  - Indirect comparison: two-proportion z-test vs. a baseline.
  - Exports `results/stat_tests.csv`.
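The two tests can be sketched with `scipy.stats`; the win counts and the `p0 = 0.5` null below are illustrative placeholders, not the report's actual numbers.

```python
from math import sqrt

from scipy.stats import binomtest, norm

# Direct comparison: exact binomial test of an observed win count vs p0 = 0.5.
wins, games = 130, 200
direct = binomtest(wins, games, p=0.5, alternative="greater")
print(f"win rate {wins / games:.2f}, binomial p-value {direct.pvalue:.4f}")

# Indirect comparison: two-proportion z-test (variant vs baseline win rates).
def two_proportion_z(w1, n1, w2, n2):
    p1, p2 = w1 / n1, w2 / n2
    p = (w1 + w2) / (n1 + n2)                       # pooled proportion
    z = (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return z, 2 * norm.sf(abs(z))                   # two-sided p-value

z, pval = two_proportion_z(130, 200, 110, 200)
print(f"z = {z:.3f}, p = {pval:.4f}")
```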
From the project root:
```shell
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install pandas matplotlib scikit-learn joblib scipy
```

Each ML variant loads its model from its own folder:

- `bots/ml_none/model.pkl`
- `bots/ml_deck/model.pkl`
- `bots/ml_trump/model.pkl`
- `bots/ml_both/model.pkl`

If any of these files is missing, `run_ablation.py` will fail with a `FileNotFoundError`.
Note: you may encounter an `InconsistentVersionWarning` from scikit-learn if a `model.pkl` was saved with a different scikit-learn version than the one in your current environment. The experiment can still run, but for strict reproducibility, either:

- install the same scikit-learn version used to train the model, or
- retrain the models in your current environment.
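To compare versions, you can print the installed scikit-learn version, and optionally escalate the warning to an error so a mismatch fails loudly. Note that `InconsistentVersionWarning` only exists in scikit-learn >= 1.2, hence the guarded import.

```python
import warnings

import sklearn

# The version of scikit-learn in the active environment; compare it against
# the version that produced the model.pkl you intend to load.
print("installed scikit-learn:", sklearn.__version__)

# Optionally turn the version-mismatch warning into a hard error, so that
# loading a model.pkl from a different version fails instead of warning.
try:
    from sklearn.exceptions import InconsistentVersionWarning
    warnings.simplefilter("error", InconsistentVersionWarning)
except ImportError:
    pass  # older scikit-learn: the warning class is not available
```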
Before running long experiments, confirm that the ML bots load correctly and can play:
```shell
PYTHONPATH=$(pwd) python -m bots.ml.SMT
```

The expected output is four lines (one per ML bot), each containing `winner=...` and `score=...`.
Run the experiment script from the project root:
```shell
PYTHONPATH=$(pwd) python run_ablation.py
```

The script:

- Evaluates each ML bot variant against `rand` and `rdeep`.
- Uses a deterministic seed range to generate initial states.
- Uses `swap_seats=True`, so each seed is played twice (bot A moving first, then second) to reduce first-player bias.
- Aggregates: games, wins, losses, draws, win rate, and points for/against.
- Produces plots and exports tables.
- Writes its outputs to `results/`:
  - `ablation_results.csv`
  - `ablation_results.xlsx`
  - `wins_losses_vs_rand.png`
  - `wins_losses_vs_rdeep.png`
  - `wins_losses_combined.png`
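The seat-swapping scheme can be sketched as below. `play_game` is a toy stand-in for `engine.play(...)` with a deliberate 60% first-seat bias; because each seed is replayed with the seats exchanged, that bias cancels in the aggregate.

```python
import random

def play_game(bot_first, bot_second, seed):
    # Toy stand-in for engine.play(...): the first seat wins 60% of the
    # time regardless of which bot occupies it (a pure seat bias).
    rng = random.Random(seed)
    return "first" if rng.random() < 0.6 else "second"

def evaluate(bot, opponent, seeds, swap_seats=True):
    wins = games = 0
    for seed in seeds:
        # Game 1: our bot in the first seat.
        wins += play_game(bot, opponent, seed) == "first"
        games += 1
        if swap_seats:
            # Game 2: same seed, seats exchanged.
            wins += play_game(opponent, bot, seed) == "second"
            games += 1
    return wins, games

wins, games = evaluate("ml_both", "rand", range(10))
print(f"{wins}/{games} games won")  # → 10/20
```

With this toy `play_game`, the same seed reproduces the same seat outcome in both games, so the bot wins exactly one game per seed pair and the aggregate win rate is exactly 50%: the first-player advantage has been cancelled, which is the point of `swap_seats=True`.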
Note that `rdeep` is substantially slower than `rand`, and large seed ranges with seat swapping can take a long time (the total number of games doubles). In our own run, possibly due to the machine used, the script took between 4 and 5 hours to finish.

After running `run_ablation.py`, the results are available in `results/ablation_results.csv`.
You can reproduce the statistical tests used in the report with:

```shell
python stat_tests.py --csv results/ablation_results.csv
```

Use the training script below only if you need to regenerate the `model.pkl` files (or align scikit-learn versions).
```shell
PYTHONPATH=$(pwd) python train-ml-bot.py -d dataset.pkl -m model.pkl -o
```

- `-d` sets the dataset file path
- `-m` sets the output model name under `./bots/ml/`
- `-o` overwrites the dataset if it already exists

Inside `train-ml-bot.py`, set:

```python
USE_DECK = ...
USE_TRUMP = ...
```

- `ml_none` → `USE_DECK=False`, `USE_TRUMP=False`
- `ml_deck` → `USE_DECK=True`, `USE_TRUMP=False`
- `ml_trump` → `USE_DECK=False`, `USE_TRUMP=True`
- `ml_both` → `USE_DECK=True`, `USE_TRUMP=True`
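The effect of the two flags on the feature vector can be sketched as follows. The actual encoding lives in `bots/ml/featuresets.py`; the base features, per-card deck codes, and suit order below are assumptions made purely for illustration.

```python
SUITS = ["C", "D", "H", "S"]  # assumed suit order for the one-hot encoding

def features(base, deck_knowledge, trump_suit, use_deck, use_trump):
    """Build a feature vector: base features plus optional ablation blocks."""
    vec = list(base)                                  # always-on base features
    if use_deck:
        vec += deck_knowledge                         # e.g. per-card location codes
    if use_trump:
        vec += [1 if s == trump_suit else 0 for s in SUITS]  # one-hot trump suit
    return vec

base = [0.5, 0.2]            # toy base features
deck = [1, 0, 2, 0]          # toy per-card deck knowledge
print(len(features(base, deck, "H", use_deck=False, use_trump=False)))  # → 2
print(len(features(base, deck, "H", use_deck=True, use_trump=True)))    # → 10
```

Each ablation variant thus sees a differently sized input vector, which is why every variant needs its own trained `model.pkl`.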
`train-ml-bot.py` saves to `./bots/ml/<model_name>`. To use the model in the experiment, copy it to the corresponding bot directory as `model.pkl`. Example:

```shell
cp bots/ml/model.pkl bots/ml_trump/model.pkl
```

Repeat for each variant as needed.
If you see a `FileNotFoundError`, you are missing a model file. Ensure all four exist:

```shell
ls -la bots/ml_none/model.pkl bots/ml_deck/model.pkl bots/ml_trump/model.pkl bots/ml_both/model.pkl
```

If imports fail, install the dependencies into the active environment:

```shell
pip install pandas matplotlib scikit-learn joblib
```

For a fast validation, temporarily reduce the seed range in `run_ablation.py` (e.g., 10–100 seeds) and run:

```shell
PYTHONPATH=$(pwd) python run_ablation.py
```

Then restore the full seed range used in the report to reproduce the final results.
We hope everything runs smoothly!

Special thanks to Vrije Universiteit Amsterdam and to all contributors of the original Schnapsen repository.