This repository contains a Gymnasium-based environment for the Hearthstone game. The environment is based on the fireplace package, which is a Python implementation of the Hearthstone game. The environment is designed to be used with reinforcement learning algorithms to train agents to play Hearthstone.
This repository requires Python 3.12.2. The repository is tested on Windows 11.
More documentation can be found via the website: HearthGym.
Before you can run the Hearthstone environment, you need to install the fireplace package, a Python implementation of the Hearthstone game. The package is not available on PyPI, so you need to install it from source. It is included in this repository as a submodule. To install the package, run the following commands:

```shell
cd fireplace
pip install .
```

Check whether the card data is included; it is normally stored in fireplace's Git LFS storage. If the data is not included, you can download it from the fireplace repository. The data is stored in the fireplace/data/cards folder.
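As a quick sanity check before running games, one could verify that the card data folder exists and is non-empty. This helper is purely illustrative and not part of the repository:

```python
from pathlib import Path

def card_data_present(repo_root: str) -> bool:
    """Return True if the fireplace card data folder exists and contains files.

    Checks the fireplace/data/cards path mentioned in the install notes.
    Hypothetical helper for illustration only.
    """
    cards = Path(repo_root) / "fireplace" / "data" / "cards"
    return cards.is_dir() and any(cards.iterdir())
```

If this returns `False`, fetch the card data from the fireplace repository (e.g. via `git lfs pull`) before running any games.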
Then install the requirements for this repository. CAUTION: Some of the requirements are commented out in the requirements.txt file and require specific manual steps to install.

Afterwards, all remaining requirements can be installed with the following command:

```shell
cd ..
pip install -r requirements.txt
```

You can run a game in the Hearthstone environment with the run_game.py script, stored in the src/ folder. An example command would be:
```shell
python run_game.py \
    --agent1=0 \
    --agent2=0 \
    --class1=2 \
    --class2=2 \
    --games=1 \
    --log_file="logs/random_vs_random.log" \
    --intern_logging=False \
    --render=False \
    --seed=0
```

The run_game.py script has the following arguments:
- `agent1` and `agent2`: The agents that will play the game. The agents are defined in the `agents` folder. (Default: `0` (RandomAgent))
- `class1` and `class2`: The IDs of the classes of the agents. The classes are defined in the `fireplace` package, and the options are listed below. (Default: `2` (Hunter))
- `games`: The number of games to play. (Default: `1`)
- `log_file`: The file where the logs will be saved. (Default: `logs/game.log`)
- `intern_logging`: Whether to display fireplace's internal logs in the console. (Default: `False`)
- `render`: Whether to render observations during the game. (Default: `False`)
- `seed`: The seed for the random number generator. (Default: `42`)
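For illustration, a minimal argparse setup matching the flags and defaults listed above might look like this. This is a sketch, not run_game.py's actual parser, which may differ (for example in how booleans are handled):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented run_game.py flags."""
    def str2bool(s: str) -> bool:
        # Accept "True"/"False" strings as passed on the example command line
        return s.lower() == "true"

    p = argparse.ArgumentParser(description="Run HearthGym games")
    p.add_argument("--agent1", type=int, default=0)
    p.add_argument("--agent2", type=int, default=0)
    p.add_argument("--class1", type=int, default=2)
    p.add_argument("--class2", type=int, default=2)
    p.add_argument("--games", type=int, default=1)
    p.add_argument("--log_file", type=str, default="logs/game.log")
    p.add_argument("--intern_logging", type=str2bool, default=False)
    p.add_argument("--render", type=str2bool, default=False)
    p.add_argument("--seed", type=int, default=42)
    return p
```

For example, `build_parser().parse_args(["--games=5", "--render=True"])` yields `games=5`, `render=True`, and the documented defaults for everything else.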
For experiments with HearthGym, an experiment workflow has been incorporated. The run_experiments.py script is used to run experiments with different agents. Documentation on how to run experiments can be found in the src/experiments/README.md file.
The src/analysis_dashboard folder contains a dashboard for analyzing the performance of Hearthstone agents. The dashboard is based on the Streamlit framework and is designed to be used with log files generated by the run_game.py script. The dashboard allows you to visualize the performance of different agents and compare their results. Documentation on how to run the dashboard can be found in the src/analysis_dashboard/README.md file.
Currently, the following agents are implemented:
- (0) RandomAgent: An agent that plays random actions.
- (1) HumanAgent: An agent that takes actions based on your input.
- (2) DynamicLookaheadAgent: An agent that uses a dynamic lookahead algorithm to choose the best action. (Based on the Nils Bohnhof & Jon-Mailes Graeffe implementation)
- (3) GretiveCompAgent: An agent that uses another scoring agent to choose the best action. (Based on the Antonio M. Mora García implementation)
- (4) WeightedScoreAgent: An agent that uses a weighted scoring algorithm to choose the best action. (Based on the Sebastian Miller implementation)
- (5) NaiveScoreLookaheadAgent: An agent that uses a naive scoring algorithm to choose the best action. (Based on the Sebastian Miller implementation)
- (6) PPOAgent: An agent that uses the PPO algorithm to choose the best action. (Trained using the `src/models/PPO` framework; read the `src/models/PPO/README.md` file for more information)
- (7) GreedyAgent: An agent that uses a greedy algorithm to choose the best action. (Based on the Hearthstone AI Competition implementation.) Contains 3 heuristic scoring modes:
  - aggro: Aggressive mode, where the agent tries to deal as much damage as possible.
  - control: Control mode, where the agent tries to control the board and minimize damage taken.
  - ramp: Ramp mode, where the agent tries to ramp up its mana and play big minions as soon as possible.
- (8) BaseDynamicLookaheadAgent: An agent that uses a dynamic lookahead algorithm to choose the best action. (Based on the Hearthstone AI Competition implementation.) This agent is not used in the `run_game.py` script, but it is available for testing purposes.
- (9) WorldModelAgent: An agent that uses a trained world model to choose the best action. (Trained using the `src/models/WorldModel` framework; read the `src/models/WorldModel/README.md` file for more information)
- (10) EncodedPPOAgent: An agent that uses a trained auto-encoder combined with the PPO algorithm to choose the best action. (Trained using the `src/models/EncodedPPO` framework; read the `src/models/EncodedPPO/README.md` file for more information)
The following classes are present in the Hearthstone environment. Some of the classes are disabled because they are not viable for 1v1 games.
- DeathKnight (Disabled)
- Druid
- Hunter
- Mage
- Paladin
- Priest
- Rogue
- Shaman
- Warlock
- Warrior
- Dream (Disabled; not a real player class)
- Neutral (Disabled; not a real player class)
- Whizbang (Disabled; not a real player class)
- DemonHunter (Disabled)
Decks are stored and created in the src/data/ folder. A README file is present in the src/data/ folder that describes how to add new decks to the environment. HearthGym comes with eighteen pre-built decks, two for each enabled class. The metadata for the decks is stored in the src/data/decks_metadata.csv file and the actual cards in the decks are stored in the src/data/final_decks.csv file. The decks are based on the fireplace package and are designed to be used within HearthGym.
If an invalid class is provided, the game will choose a random class for the agent. (This is printed in the logs.)
The Hearthstone environment is implemented in the src/env/hearthstone/HearthGym.py file. It follows the Gymnasium (formerly OpenAI Gym) interface and uses various components from the fireplace package to simulate the Hearthstone game.
The environment is a turn-based game where two agents play against each other. The environment is fully observable, and the agents can take actions based on the current state of the game.
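Because the environment follows the Gymnasium API, interaction uses the standard reset/step loop. Below is a minimal sketch of that loop; `DummyEnv` is a stand-in for illustration only, since HearthGym's constructor arguments and observation contents are not covered by this README:

```python
class DummyEnv:
    """Toy stand-in implementing the Gymnasium-style reset/step contract."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return {"turn": self.t}, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = self.t >= 3  # toy episode ends after 3 steps
        # (observation, reward, terminated, truncated, info)
        return {"turn": self.t}, 0.0, terminated, False, {}

def play(env, policy):
    """Run one episode with the standard Gymnasium interaction pattern."""
    obs, info = env.reset()
    terminated = truncated = False
    steps = 0
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        steps += 1
    return steps
```

The same `play` loop works with any environment exposing the five-tuple `step` signature, which is how the agents listed above are driven against HearthGym.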
The state space is a large observation space that includes the following components:
- Player's mana: The amount of mana the player has available.
- Player's max mana: The maximum amount of mana the player can have.
- Player's hero health: The health of the player's hero.
- Player's hero attack: The attack of the player's hero.
- Player's hero armor: The armor of the player's hero.
- Player's choice: 0 if the player has no choice, 1 if the player has a choice.
- Opponent's hero health: The health of the opponent's hero.
- Opponent's mana: The amount of mana the opponent has available.
- Opponent's max mana: The maximum amount of mana the opponent can have.
- Opponent's hand size: The number of cards in the opponent's hand.
- Turn number: The current turn number.
- Hero power availability: 0 if the hero power is not available, 1 if it is available.
- Hero weapon attack: The attack of the player's hero weapon.
- Hero weapon durability: The durability of the player's hero weapon.
- Player board minion stats: The stats of the minions on the player's board, including attack/health, cost, and other attributes (e.g., charge, taunt, divine shield).
- Opponent board minion stats: The stats of the minions on the opponent's board, including attack/health, cost, and other attributes (e.g., charge, taunt, divine shield).
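As a rough illustration, the scalar components above plus both boards can be flattened into a single feature vector. The field names, board size, and per-minion feature count below are assumptions for this sketch, not HearthGym's actual keys:

```python
import numpy as np

MAX_BOARD = 7      # Hearthstone's board limit; assumed for this sketch
MINION_FEATS = 6   # e.g. attack, health, cost, charge, taunt, divine shield (assumed)

def flatten_observation(obs: dict) -> np.ndarray:
    """Flatten the scalar components plus both boards into one vector.

    Illustrative only: key names do not necessarily match HearthGym's.
    """
    scalars = np.array([
        obs["player_mana"], obs["player_max_mana"],
        obs["player_hero_health"], obs["player_hero_attack"],
        obs["player_hero_armor"], obs["player_choice"],
        obs["opp_hero_health"], obs["opp_mana"], obs["opp_max_mana"],
        obs["opp_hand_size"], obs["turn"], obs["hero_power_available"],
        obs["weapon_attack"], obs["weapon_durability"],
    ], dtype=np.float32)
    boards = np.concatenate([
        np.asarray(obs["player_board"], dtype=np.float32).reshape(-1),
        np.asarray(obs["opp_board"], dtype=np.float32).reshape(-1),
    ])
    return np.concatenate([scalars, boards])
```

With these assumed sizes, the result has 14 scalar entries plus 2 × 7 × 6 board entries.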
The optional embedded mode adds three arrays to the state space:
- Player's hand: The cards in the player's hand. This vector has 384 dimensions and is the average of the encodings of the card descriptions retrieved with the MiniLM-L6-v2 model using the sentence-transformers library. The encoding is done using the src/data/create_data.py file.
The optional deck inclusion mode adds one array to the state space:
- Player's deck: The cards in the player's deck. This vector has 384 dimensions and is the average of the encodings of the card descriptions retrieved with the MiniLM-L6-v2 model using the sentence-transformers library. The encoding is done using the src/data/create_data.py file. This mode adds information about which cards the agent has in its deck.
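The hand and deck vectors described above are simple averages of per-card 384-dimensional embeddings. A minimal sketch of the averaging step, assuming the per-card vectors were already produced by the MiniLM-L6-v2 model via sentence-transformers (the model itself is not invoked here):

```python
import numpy as np

EMB_DIM = 384  # output dimension of the MiniLM-L6-v2 sentence embeddings

def average_card_embedding(card_vectors) -> np.ndarray:
    """Average per-card description embeddings into one 384-d vector.

    card_vectors: iterable of (384,)-shaped arrays, e.g. the per-card
    outputs of sentence-transformers' MiniLM-L6-v2 model.
    """
    mat = np.stack([np.asarray(v, dtype=np.float32) for v in card_vectors])
    if mat.shape[1] != EMB_DIM:
        raise ValueError(f"expected {EMB_DIM}-dimensional card vectors")
    return mat.mean(axis=0)
```

Averaging keeps the representation a fixed 384 dimensions regardless of how many cards are in the hand or deck, at the cost of losing per-card ordering information.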
The action space consists of six discrete values that define the actions that the agent can take:
- Action type: The type of action that the agent wants to take (if no choice is required from the player). The options are:
  - End turn
  - Play card
  - Use hero power
  - Attack with minion
  - Attack with hero
- Card index: Index of the card in the player's hand. If the action type is not `Play card`, this value is ignored.
- Attacker index: Index of the attacking minion on the player's field. If the action type is not `Attack with minion`, this value is ignored.
- Target index: Index of the target minion or hero. If the action type is not `Attack with minion` or `Attack with hero`, this value is ignored.
- Discover index: Index of the chosen card from a Discover effect. If there is no choice required from the player, this value is ignored.
- Choose index: Index of the chosen card from a choice effect (e.g., Discover, Adapt). If there is no choice required from the player, this value is ignored. This differs from the Discover index in that it is only used when the choice is presented after playing a card.
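A hedged sketch of how such a six-component action vector might be interpreted. The component order follows the list above; the type ordering and ignore rules are taken from this README, while the exact bounds and semantics in HearthGym may differ:

```python
# Action-type order as listed above (assumed to match the environment's encoding)
ACTION_TYPES = ["end_turn", "play_card", "use_hero_power",
                "attack_with_minion", "attack_with_hero"]

def decode_action(action):
    """Turn a six-component action vector into a readable dict.

    Ignored components are dropped, per the rules above. Discover/Choose
    indices only matter when a choice is pending, so they are omitted here.
    """
    a_type, card_i, attacker_i, target_i, discover_i, choose_i = action
    kind = ACTION_TYPES[a_type]
    decoded = {"type": kind}
    if kind == "play_card":
        decoded["card_index"] = card_i
    if kind == "attack_with_minion":
        decoded["attacker_index"] = attacker_i
    if kind in ("attack_with_minion", "attack_with_hero"):
        decoded["target_index"] = target_i
    return decoded
```

For example, `decode_action([1, 3, 0, 0, 0, 0])` reads as "play the card at hand index 3", with the attacker and target components ignored as the rules above prescribe.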