We investigate multi-agent systems in competitive team-vs-team scenarios in the Minecraft environment, and explore reinforcement learning techniques that sharpen the tactical play of Large Language Model agents.

Welcome to PillagerBench, where the blocky world of Minecraft isn't just fun and games; it's a warzone for the machines! Our benchmark suite pushes the boundaries of what virtual agents can achieve by adding competition and team play, creating complex and dynamic state spaces.

Customize your benchmark with our extensible PillagerAgent API, which lets you add custom scenarios and new multi-agent systems.
## Prerequisites

- API keys: obtain API keys from one or more of the following services:
  - OpenAI (for access to models like GPT-4o)
  - DeepSeek (for access to DeepSeek models)
  - OpenRouter (for access to a wide range of models)
- Ollama (optional): you can also use models from Ollama running locally.
## Installation with Docker

- Install Docker for your operating system.
- Clone the repository:
  ```shell
  git clone https://github.com/aialt/PillagerBench.git
  cd PillagerBench
  ```
- Set up your API keys:
  - Create a file named `api_keys.py` and add your API keys like this:
    ```python
    openai_api_key = "..."
    deepseek_api_key = "..."
    openrouter_api_key = "..."
    ```
  - Place this file in the root of the project directory.
- Build the Docker image (image names must be lowercase):
  ```shell
  docker build -t pillagerbench .
  ```
- Install NPM packages for Mineflayer:
  ```shell
  ./js_setup_docker.ps1
  ```
- Launch the Docker container:
  ```shell
  docker compose up -d
  docker attach pillagerbench
  ```
- Run a benchmark from a config:
  ```shell
  python main.py -cn benchmark
  ```

## Manual Installation

- Install dependencies:
  - Python 3.10
  - Node.js 20 (with NPM)
  - Java 17
- Clone the repository:
  ```shell
  git clone https://github.com/aialt/PillagerBench.git
  ```
- Set up your API keys:
  - Create a file named `api_keys.py` and add your API keys like this:
    ```python
    openai_api_key = "..."
    deepseek_api_key = "..."
    openrouter_api_key = "..."
    ```
  - Place this file in the root of the project directory.
- Install NPM packages for Mineflayer:
  ```shell
  ./js_setup.sh
  ```
- Create a virtual environment:
  ```shell
  python -m venv venv
  source venv/bin/activate  # On Windows, try venv\Scripts\activate
  ```
- Install the dependencies:
  ```shell
  pip install -r requirements.txt
  ```
- Run a benchmark from a config:
  ```shell
  python main.py -cn benchmark
  ```
## Usage

- Set up Hydra test configs in the `configs` folder.
- Run your test config:
  ```shell
  python main.py -cn config_name
  ```
- Observe your test by joining the internal Minecraft server (requires Minecraft 1.19.4). The default address is `localhost:49172`, but the port is configurable.
- Visualize results with `collate_results.py` (requires editing the file to set options):
  ```shell
  python collate_results.py
  ```
- Add additional test scenarios by adding classes to the `scenarios` folder that inherit from the `Scenario` base class. You can also add additional world saves to the `bench/mc_server` folder.
- Add additional multi-agent systems by adding classes to the `agents` folder that inherit from the `Agent` base class.
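As a rough illustration of what subclassing might look like, here is a minimal sketch of a custom scenario. The method names and constructor arguments below are assumptions for illustration, not the repo's actual `Scenario` interface, so a stand-in base class is defined inline to keep the snippet self-contained; check the `scenarios` folder for the real signatures.

```python
# Hypothetical sketch of a custom scenario. The real `Scenario` base class
# lives in the `scenarios` folder; the methods used here (`setup`, `score`)
# are illustrative assumptions, so a stand-in base class is defined inline.
from abc import ABC, abstractmethod


class Scenario(ABC):  # stand-in for the repo's actual Scenario base class
    @abstractmethod
    def setup(self) -> None:
        """Prepare the world state before an episode starts."""

    @abstractmethod
    def score(self, team: str) -> float:
        """Return the current score of a team."""


class CropFarmingScenario(Scenario):
    """Two teams compete to harvest the most wheat from a shared field."""

    def __init__(self, episode_length_s: int = 300):
        self.episode_length_s = episode_length_s
        self.harvested = {"red": 0, "blue": 0}

    def setup(self) -> None:
        # A real scenario would load a world save and place crops here.
        self.harvested = {"red": 0, "blue": 0}

    def score(self, team: str) -> float:
        return float(self.harvested[team])


scenario = CropFarmingScenario()
scenario.setup()
scenario.harvested["red"] += 3
print(scenario.score("red"))  # -> 3.0
```

A custom `Agent` subclass would follow the same pattern in the `agents` folder, overriding whatever decision-making hooks the base class exposes.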
This project is the Master's thesis of Olivier Schipper; check out his other amazing projects!
## Citation

If you find our work helpful, please leave us a star and cite our paper:

```bibtex
@inproceedings{schipper2025pillagerbench,
  author    = {Schipper, Olivier and Zhang, Yudi and Du, Yali and Pechenizkiy, Mykola and Fang, Meng},
  booktitle = {2025 IEEE Conference on Games (CoG)},
  title     = {PillagerBench: Benchmarking LLM-Based Agents in Competitive Minecraft Team Environments},
  year      = {2025},
  pages     = {1-15},
  doi       = {10.1109/CoG64752.2025.11114387},
  url       = {https://arxiv.org/abs/2509.06235}
}
```
## License

This project is under the MIT License.