We investigate multi-agent systems in competitive team-vs-team scenarios in the Minecraft environment, and explore reinforcement learning techniques that sharpen the tactical play of Large Language Model agents.

Welcome to PillagerBench, where the blocky world of Minecraft isn't just fun and games; it's a warzone for the machines! Our benchmark suite pushes the boundaries of what virtual agents can achieve by adding competition and team play, creating complex and dynamic state spaces.

Customize your benchmark with our extensible PillagerAgent API, which lets you add custom scenarios and new multi-agent systems.
## Prerequisites

- API keys: obtain API keys from one or more of the following services:
  - OpenAI (for access to models like GPT-4o)
  - DeepSeek (for access to DeepSeek models)
  - OpenRouter (for access to a wide range of models)
- Ollama (optional): you can also use models from Ollama running locally.
## Installation with Docker

- Install Docker for your operating system.
- Clone the repository:
  ```shell
  git clone https://github.com/aialt/PillagerBench.git
  cd PillagerBench
  ```
- Set up your API keys:
  - Create a file named `api_keys.py` and add your API keys like this:
    ```python
    openai_api_key = "..."
    deepseek_api_key = "..."
    openrouter_api_key = "..."
    ```
  - Place this file in the root of the project directory.
- Build the Docker image (image names must be lowercase):
  ```shell
  docker build -t pillagerbench .
  ```
- Install NPM packages for Mineflayer:
  ```shell
  ./js_setup_docker.ps1
  ```
- Launch the Docker container:
  ```shell
  docker compose up -d
  docker attach pillagerbench
  ```
- Run a benchmark from a config:
  ```shell
  python main.py -cn benchmark
  ```

## Manual Installation

- Install dependencies:
  - Python 3.10
  - Node.js 20 (with NPM)
  - Java 17
- Clone the repository:
  ```shell
  git clone https://github.com/aialt/PillagerBench.git
  ```
- Set up your API keys:
  - Create a file named `api_keys.py` and add your API keys like this:
    ```python
    openai_api_key = "..."
    deepseek_api_key = "..."
    openrouter_api_key = "..."
    ```
  - Place this file in the root of the project directory.
- Install NPM packages for Mineflayer:
  ```shell
  ./js_setup.sh
  ```
- Create a virtual environment:
  ```shell
  python -m venv venv
  source venv/bin/activate  # On Windows, try venv\Scripts\activate
  ```
- Install the dependencies:
  ```shell
  pip install -r requirements.txt
  ```
- Run a benchmark from a config:
  ```shell
  python main.py -cn benchmark
  ```
## Usage

- Set up Hydra test configs in the `configs` folder.
- Run your test config:
  ```shell
  python main.py -cn config_name
  ```
- Observe your test by joining the internal Minecraft server (requires Minecraft 1.19.4). The default address is `localhost:49172`, but the port is configurable.
- Visualize results with `collate_results.py` (requires editing the file to set options):
  ```shell
  python collate_results.py
  ```
- Add additional test scenarios by adding classes to the `scenarios` folder that inherit from the `Scenario` base class. You can also add additional world saves to the `bench/mc_server` folder.
- Add additional multi-agent systems by adding classes to the `agents` folder that inherit from the `Agent` base class.
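As a rough illustration of what subclassing might look like, here is a minimal sketch of a custom scenario. The method names and constructor arguments below are assumptions for illustration, not the repo's actual `Scenario` interface, so a stand-in base class is defined inline to keep the snippet self-contained; check the `scenarios` folder for the real signatures.

```python
# Hypothetical sketch of a custom scenario. The real `Scenario` base class
# lives in the `scenarios` folder; the methods used here (`setup`, `score`)
# are illustrative assumptions, so a stand-in base class is defined inline.
from abc import ABC, abstractmethod


class Scenario(ABC):  # stand-in for the repo's actual Scenario base class
    @abstractmethod
    def setup(self) -> None:
        """Prepare the world state before an episode starts."""

    @abstractmethod
    def score(self, team: str) -> float:
        """Return the current score of a team."""


class CropFarmingScenario(Scenario):
    """Two teams compete to harvest the most wheat from a shared field."""

    def __init__(self, episode_length_s: int = 300):
        self.episode_length_s = episode_length_s
        self.harvested = {"red": 0, "blue": 0}

    def setup(self) -> None:
        # A real scenario would load a world save and place crops here.
        self.harvested = {"red": 0, "blue": 0}

    def score(self, team: str) -> float:
        return float(self.harvested[team])


scenario = CropFarmingScenario()
scenario.setup()
scenario.harvested["red"] += 3
print(scenario.score("red"))  # -> 3.0
```

A custom `Agent` subclass would follow the same pattern in the `agents` folder, overriding whatever decision-making hooks the base class exposes.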
This project is the Master's thesis of Olivier Schipper; check out his other amazing projects!
## Citation

If you find our work helpful, please leave us a star and cite our paper:

```bibtex
@inproceedings{schipper2025pillagerbench,
  author    = {Schipper, Olivier and Zhang, Yudi and Du, Yali and Pechenizkiy, Mykola and Fang, Meng},
  booktitle = {2025 IEEE Conference on Games (CoG)},
  title     = {PillagerBench: Benchmarking LLM-Based Agents in Competitive Minecraft Team Environments},
  year      = {2025},
  pages     = {1-15},
  doi       = {10.1109/CoG64752.2025.11114387},
  url       = {https://arxiv.org/abs/2509.06235}
}
```
## License

This project is under the MIT License.