270 commits
ba5819d
handle movement for stacked pieces
Suhas-13 Jan 21, 2024
0e45ad3
add pieces.py, game interface
Suhas-13 Feb 7, 2024
7759c05
accounted for beatle
Suhas-13 Feb 15, 2024
4376bea
continue implement game logic for opening
Suhas-13 Feb 15, 2024
93699db
extract to board.py and finish startup logic
Suhas-13 Feb 19, 2024
746a275
process actions separately
Suhas-13 Feb 20, 2024
01bbc0d
add game turn mechanism
Suhas-13 Feb 20, 2024
7e16300
add win condition and basic scoring function
Suhas-13 Feb 20, 2024
399cc0b
fix win condition
Suhas-13 Feb 20, 2024
b94c9c3
create testable version
Suhas-13 Feb 21, 2024
bb74084
fix empty action bug
Suhas-13 Feb 23, 2024
1e7bbf1
fix a bunch more bugs
Suhas-13 Feb 24, 2024
802b3ed
game logic working
Suhas-13 Feb 25, 2024
e0cef76
fixed incorrect move validation
Suhas-13 Feb 25, 2024
f19d0e9
don't allow movement till queen bee placed
Suhas-13 Feb 25, 2024
a33b6c1
fix queen bee except rule
Suhas-13 Feb 25, 2024
0dd1920
commit hive rules
Suhas-13 Feb 26, 2024
9701626
add image view mode
Suhas-13 Feb 27, 2024
85d3016
update human agent
Suhas-13 Feb 27, 2024
3503b58
added image capabilities to base gpt agent
acostarelli Feb 27, 2024
fbfaacd
needs to be f-string
acostarelli Feb 27, 2024
25ba2d0
add image and interactive mode
Suhas-13 Feb 27, 2024
c419c83
use image directly rather than soliciting text description first
acostarelli Feb 27, 2024
1bdb0a1
Merge branch 'hive' of https://github.com/Joshuaclymer/GameBench into…
acostarelli Feb 27, 2024
582f173
create image directory
Suhas-13 Feb 27, 2024
9844c68
Merge branch 'hive' of https://github.com/Joshuaclymer/GameBench into…
Suhas-13 Feb 27, 2024
16c94d5
modify font size
Suhas-13 Feb 27, 2024
8aa80a1
fix queen win condition
Suhas-13 Feb 28, 2024
212a437
Merge pull request #18 from Joshuaclymer/main
y-arjun-y Feb 28, 2024
1660b04
implemented base of messaging feature
y-arjun-y Feb 28, 2024
3c8a03a
check to ensure json has 'action' id
acostarelli Feb 28, 2024
0a6062a
Merge branch 'hive' of https://github.com/Joshuaclymer/GameBench into…
acostarelli Feb 28, 2024
ca7f705
added rules + hand transcribed rules + finished rules class
CarlsonCarlson Feb 29, 2024
de07d93
fixed errors typos rules class
CarlsonCarlson Feb 29, 2024
d0b5d55
init classes
CarlsonCarlson Feb 29, 2024
25f5a7f
class definitions
CarlsonCarlson Feb 29, 2024
cdd2585
finished extended agent interaction
y-arjun-y Mar 1, 2024
59a0dba
unsure if i should try getting it to take images directly, or use its…
acostarelli Mar 1, 2024
daa6cc3
gpt actually explains rules now; added cot and bap agents for gpt3.5
acostarelli Mar 1, 2024
badb03b
Merge branch 'hive' into llm-agents
acostarelli Mar 1, 2024
9eb7ddc
gpt3.5 agents can handle images now
acostarelli Mar 1, 2024
fde53d9
made market fluctuations more agent-driven
y-arjun-y Mar 2, 2024
b9e63d6
game setup coded, need to test
CarlsonCarlson Mar 5, 2024
10c5a66
removed redundant code
y-arjun-y Mar 6, 2024
1663047
fixed errors
y-arjun-y Mar 6, 2024
7c3d772
game setup complete + working on get_observation + testing + refactoring
CarlsonCarlson Mar 6, 2024
57e1af2
first commit for hive
Suhas-13 Jan 21, 2024
e36358f
handle movement for stacked pieces
Suhas-13 Jan 21, 2024
0ba141b
add pieces.py, game interface
Suhas-13 Feb 7, 2024
bc54d33
accounted for beatle
Suhas-13 Feb 15, 2024
5c8b875
continue implement game logic for opening
Suhas-13 Feb 15, 2024
21e0bcd
extract to board.py and finish startup logic
Suhas-13 Feb 19, 2024
5b2de33
process actions separately
Suhas-13 Feb 20, 2024
c3100a8
add game turn mechanism
Suhas-13 Feb 20, 2024
f97f872
add win condition and basic scoring function
Suhas-13 Feb 20, 2024
58ee3ef
fix win condition
Suhas-13 Feb 20, 2024
3d01592
create testable version
Suhas-13 Feb 21, 2024
6e5667c
fix empty action bug
Suhas-13 Feb 23, 2024
3e35c23
fix a bunch more bugs
Suhas-13 Feb 24, 2024
0f9f29b
game logic working
Suhas-13 Feb 25, 2024
b57ebdc
fixed incorrect move validation
Suhas-13 Feb 25, 2024
fa346a6
don't allow movement till queen bee placed
Suhas-13 Feb 25, 2024
3537e01
fix queen bee except rule
Suhas-13 Feb 25, 2024
64f8e61
commit hive rules
Suhas-13 Feb 26, 2024
ac6ad10
add image view mode
Suhas-13 Feb 27, 2024
1940f33
update human agent
Suhas-13 Feb 27, 2024
ba1d8b3
added image capabilities to base gpt agent
acostarelli Feb 27, 2024
9287112
needs to be f-string
acostarelli Feb 27, 2024
638a6d7
use image directly rather than soliciting text description first
acostarelli Feb 27, 2024
f3ecd8e
add image and interactive mode
Suhas-13 Feb 27, 2024
ad8a5be
check to ensure json has 'action' id
acostarelli Feb 28, 2024
54bba2b
create image directory
Suhas-13 Feb 27, 2024
ddf9e0c
modify font size
Suhas-13 Feb 27, 2024
aab17f0
fix queen win condition
Suhas-13 Feb 28, 2024
0a47c8b
wrap up hive
Suhas-13 Feb 28, 2024
28fc81a
account for coordinate system switch
Suhas-13 Feb 28, 2024
b55aea8
removed debug prints
Suhas-13 Feb 28, 2024
47d396a
fix print statements
Suhas-13 Feb 28, 2024
2446073
slight text modifications and low cost mode
Suhas-13 Feb 28, 2024
a8aadd4
add max turn count and intermediate scoring
Suhas-13 Mar 4, 2024
322a9c5
cleaned up debug
Suhas-13 Mar 4, 2024
bbaf854
add show_state and agetn kwargs
Suhas-13 Mar 6, 2024
793b3b7
add agent kwargs
Suhas-13 Mar 7, 2024
346c4af
reduce image size
Suhas-13 Mar 7, 2024
32ce522
reduce font size
Suhas-13 Mar 7, 2024
b2e8a49
reduce font size and board size
Suhas-13 Mar 7, 2024
bdda2bb
modify prompt
Suhas-13 Mar 7, 2024
109646c
observation text is good
CarlsonCarlson Mar 8, 2024
6f984d2
cleaning up code before working on tactical effect management
CarlsonCarlson Mar 8, 2024
c10457d
finished checklist
y-arjun-y Mar 8, 2024
945af27
Added script for running all agents on all games
shahofblah Mar 8, 2024
2b0afc4
update draw scoring
Suhas-13 Mar 8, 2024
7d164f5
available actions done + scaffolding tactical abilities and how to ha…
CarlsonCarlson Mar 9, 2024
977ab04
Add Python script to test Santorini game
RomanHauksson Mar 9, 2024
7d8bd15
Refactor direction names and handle invalid actions from agents
RomanHauksson Mar 9, 2024
ea7e49f
Add agent kwargs
RomanHauksson Mar 9, 2024
c2c2931
Merge branch 'main' into Santorini
RomanHauksson Mar 9, 2024
6dbf878
Merge pull request #10 from Joshuaclymer/Santorini
acostarelli Mar 9, 2024
9baaa47
more faithful implementation of pit
y-arjun-y Mar 9, 2024
81bbe38
back to high detail mode since cost is within range
acostarelli Mar 9, 2024
45d852d
finished more faithful implementation of pit
y-arjun-y Mar 9, 2024
fe1e310
actually just going to use auto...
acostarelli Mar 9, 2024
9bc80d0
Merge branch 'main' into hive
acostarelli Mar 9, 2024
8ff3338
Merge pull request #24 from Joshuaclymer/main
y-arjun-y Mar 9, 2024
f3dbaa6
Merge pull request #23 from Joshuaclymer/arctic-scavengers
acostarelli Mar 9, 2024
396a0f7
edited commodities and max_possible
y-arjun-y Mar 10, 2024
1f902bd
Merge branch 'pit' of https://github.com/Joshuaclymer/GameBench into pit
y-arjun-y Mar 10, 2024
f321115
made it four agents
y-arjun-y Mar 10, 2024
93dc7ad
allow 1 to 4 quantity of cards to be traded
y-arjun-y Mar 10, 2024
e22e457
finished checklist
y-arjun-y Mar 10, 2024
20d3c2e
fixed import
y-arjun-y Mar 10, 2024
9831a58
Merge pull request #25 from Joshuaclymer/main
y-arjun-y Mar 10, 2024
9cd6a8d
fixed observation, Aerodrome and Airdrop implemented, scaffolded extr…
CarlsonCarlson Mar 10, 2024
c3ca1b6
fixed recursion glitch
y-arjun-y Mar 11, 2024
c8d55c1
Merge branch 'pit' of https://github.com/Joshuaclymer/GameBench into pit
y-arjun-y Mar 11, 2024
2dae62e
nine per commodity and only one trade proposal at a time
y-arjun-y Mar 12, 2024
9fcee56
added small test script
y-arjun-y Mar 12, 2024
6853093
fixed update function
y-arjun-y Mar 12, 2024
593340d
fixed dataclass
y-arjun-y Mar 12, 2024
077e2f9
broadcasted trades to all players
y-arjun-y Mar 12, 2024
9e75e6d
finished checklist
y-arjun-y Mar 12, 2024
20b04a9
Merge branch 'llm-agents' into collect-elos
acostarelli Mar 13, 2024
505cc3e
some bugfixes to gpt
acostarelli Mar 13, 2024
48c11c8
some match data
acostarelli Mar 13, 2024
081fa73
adding typehints and show_state parameter to abstract method
CarlsonCarlson Mar 13, 2024
e4a6459
check_destroy_triggers + find, flip, and play from action functions +…
CarlsonCarlson Mar 13, 2024
f2b4c76
removed requested_commodity
y-arjun-y Mar 14, 2024
f48ea1f
tested aerodrome, air drop, manuever (complete), ambush (complete) + …
CarlsonCarlson Mar 15, 2024
159d77c
redeploy + reinforce + disrupt tactical abilities done + withdraw + p…
CarlsonCarlson Mar 17, 2024
aaef4a2
More data
acostarelli Mar 18, 2024
c7b9923
More data
acostarelli Mar 18, 2024
306254b
Moving to top level since I can't get it to run in here
acostarelli Mar 18, 2024
938b2f5
almost 100 games so far
acostarelli Mar 19, 2024
dd9a4a9
add validation for num guesses
Suhas-13 Mar 19, 2024
b697b9b
Some more matches. Need to change code so that if a game fails, the w…
acostarelli Mar 20, 2024
17b1835
Really hacky way to keep track of tokens
acostarelli Mar 20, 2024
4c7b323
Merge branch 'codenames' into collect-elos
acostarelli Mar 20, 2024
648e078
tripping json output bc otherwise gpt3 always returns random
acostarelli Mar 20, 2024
f152d89
can't figure out how to merge so am copypasting...
acostarelli Mar 20, 2024
8b0d30e
updating gpt agent from future branch to run testing
CarlsonCarlson Mar 20, 2024
70713bf
finished v1
CarlsonCarlson Mar 20, 2024
5d0732c
script for running experiments... mostly a living document
acostarelli Mar 20, 2024
4133de3
Merge branch 'main' into collect-elos
acostarelli Mar 20, 2024
2ba4393
Merge pull request #27 from Joshuaclymer/collect-elos
acostarelli Mar 20, 2024
1b8b24e
adding normalize score in play()
CarlsonCarlson Mar 20, 2024
5b9c98e
Merge branch 'main' into air-land-sea
acostarelli Mar 20, 2024
091274a
Merge pull request #26 from Joshuaclymer/air-land-sea
acostarelli Mar 20, 2024
96c1d73
Merge pull request #28 from Joshuaclymer/main
y-arjun-y Mar 21, 2024
a57c26a
added show_state and trade_id to go ahead with one trade only and fix…
y-arjun-y Mar 21, 2024
087c06f
Merge branch 'main' into hive
acostarelli Mar 21, 2024
3a95982
Merge pull request #17 from Joshuaclymer/hive
acostarelli Mar 21, 2024
70ef2a6
renaming agents for clarity... there were agents with incorrect label…
acostarelli Mar 21, 2024
7ec0a93
forgot to change modes
acostarelli Mar 21, 2024
4cc1444
Merge branch 'main' of https://github.com/Joshuaclymer/GameBench
acostarelli Mar 21, 2024
770c4ab
change agent teams to 1 and 2 so it's compatible with santoriniai
acostarelli Mar 21, 2024
db8edbd
fix default action
acostarelli Mar 22, 2024
ebf5977
finished gpt3 vs random on most games
acostarelli Mar 22, 2024
8f417fa
finished remaining concerns
y-arjun-y Mar 22, 2024
c7c9f42
Merge pull request #29 from Joshuaclymer/main
y-arjun-y Mar 22, 2024
a1600ef
collected gpt4 vs random
acostarelli Mar 24, 2024
f4290fa
gpt3cot matches
acostarelli Mar 24, 2024
0e95569
I noticed occasionally agents ask for rule clarification in a slightl…
acostarelli Mar 25, 2024
db916d3
add validation on num_guesses
Suhas-13 Mar 25, 2024
1b23346
more matches with gpt4cot
acostarelli Mar 25, 2024
b655e97
fixed explain again
acostarelli Mar 25, 2024
2457f56
so i didn't realize that agents were allowed to ask for action explan…
acostarelli Mar 26, 2024
9c172d7
data up until the previous explain fix
acostarelli Mar 26, 2024
ba2d63e
Finished gpt4cot
acostarelli Mar 26, 2024
0c03da6
visuals for data
acostarelli Mar 27, 2024
22f649c
edited old data to fix model names
acostarelli Mar 27, 2024
a059b80
using choix to compute bt-model scores
acostarelli Mar 27, 2024
770d222
calculates bootstrapped confidence intervals now
acostarelli Mar 27, 2024
f614b91
Merge remote-tracking branch 'origin/fix-codenames'
acostarelli Mar 27, 2024
6c8a153
couldn't figure out how to merge changes so i copied and pasted ://
acostarelli Mar 27, 2024
66f7755
merge arctic scavengers; unsure how i feel about random agent changes…
acostarelli Mar 30, 2024
8feafb3
merged fix-codenames
acostarelli Apr 3, 2024
3b726b3
added bull and bear trading capabilities
y-arjun-y Apr 5, 2024
f9a5a63
Merge branch 'pit' of https://github.com/Joshuaclymer/GameBench into pit
y-arjun-y Apr 5, 2024
200f53e
other games need a string here to work; hopefully these changes dont …
acostarelli Apr 9, 2024
dde5152
merge arctic scavenges
acostarelli Apr 9, 2024
59731fb
more matches
acostarelli Apr 9, 2024
fb161c8
some rap data
acostarelli Apr 24, 2024
15b7783
More rap data
acostarelli Apr 27, 2024
3dc5b60
more arctic scavengers data
acostarelli Apr 27, 2024
cd23757
add atari boxing data
acostarelli Apr 29, 2024
59dd736
commit before data collection
y-arjun-y Apr 29, 2024
7f0c98f
Merge pull request #15 from Joshuaclymer/pit
acostarelli Apr 29, 2024
62e7ccf
improve probability and rating calculation
acostarelli Apr 29, 2024
320cdc2
Merge branch 'main' of https://github.com/Joshuaclymer/GameBench
acostarelli Apr 29, 2024
14b9968
gpt4 v random pit data
acostarelli May 2, 2024
e6ef7d0
more pit data
acostarelli May 10, 2024
c7b6da2
visuals per game now
acostarelli May 10, 2024
63369da
generates appendix page with the game rules
acostarelli May 21, 2024
61654d4
generate all visuals in a row; need to determine good image size next
acostarelli May 21, 2024
a677b4f
generates the aggregated rating plot weighted based on number of matc…
acostarelli May 21, 2024
a344b28
added probabilities plot and n games collected plot
acostarelli May 22, 2024
ed0886b
will generate all plots per game in one figure now for the supplement…
acostarelli May 22, 2024
6d11f14
updated aggregated visuals
acostarelli May 29, 2024
3930af9
final aggregation style
acostarelli Jun 3, 2024
8465ecc
deleting chain_reaciton
acostarelli Jun 3, 2024
bc46af7
add license
acostarelli Jun 3, 2024
6c41aad
vestigial elo rating
acostarelli Jun 3, 2024
aca2bd8
never really used this script
acostarelli Jun 3, 2024
6c7b16b
remove unused b&p data
acostarelli Jun 3, 2024
e18072b
added rules to beginning to generate appendix
acostarelli Jun 3, 2024
92a1cea
most updated arctic scavengers
acostarelli Jun 3, 2024
baa22d5
deleted binary files
acostarelli Jun 3, 2024
0c37c91
remove atari boxing
acostarelli Jun 3, 2024
abe28a7
generated figures will go in here
acostarelli Jun 3, 2024
85489e6
updated single-game figures
acostarelli Jun 3, 2024
cc008f4
Fixed spacing
acostarelli Jun 3, 2024
aa1a922
add website
acostarelli Jun 3, 2024
13dd255
updated figures
acostarelli Jun 3, 2024
29c7965
updated figures
acostarelli Jun 3, 2024
b6ca7f0
better formatting for aggregated visuals
acostarelli Jun 4, 2024
15c1419
more details in readme
acostarelli Jun 4, 2024
03080f4
seaborn for aggregated visuals
acostarelli Jun 4, 2024
69c68f1
collecting more rap data; added back boxing data
acostarelli Jun 4, 2024
a6fe550
collected more rap data
acostarelli Jun 4, 2024
8336022
collected new rap data
acostarelli Jun 4, 2024
b9bc83d
removing OLD rap data
acostarelli Jun 4, 2024
6df9d42
spider plot
acostarelli Jun 4, 2024
fd089c6
fixed self_eval prompt
acostarelli Jun 4, 2024
f3add74
removed boxing data again; will not come back; goodbye boxing data
acostarelli Jun 4, 2024
9e34886
using more bootstarpping and a boxplot now
acostarelli Jun 4, 2024
b911452
added total table generation
acostarelli Jun 4, 2024
d827186
properly include original license and disclaimer of modifications
acostarelli Jun 4, 2024
0163c38
updated figures
acostarelli Jun 4, 2024
07fc5d3
consolidating rating code from plot generating code; attributing choi…
acostarelli Jun 5, 2024
7a95230
changed rap to gpt4-rap
acostarelli Jun 5, 2024
f290376
updated figures
acostarelli Jun 5, 2024
f30cdc9
datatables
acostarelli Jun 5, 2024
0d686ab
added human data
acostarelli Jun 6, 2024
6f3265f
one to rule them all
acostarelli Jun 6, 2024
f5ffba2
changing naming convention
acostarelli Jun 6, 2024
50469e6
final figures
acostarelli Jun 6, 2024
d013c37
Update README.md
grace-sodunke Jun 12, 2024
4e463ab
Update README.md
grace-sodunke Jun 12, 2024
27b9639
Update README.md
grace-sodunke Jun 12, 2024
186ba0a
Update requirements.txt
grace-sodunke Jun 12, 2024
c331d6b
Update make_visuals.py
grace-sodunke Jun 12, 2024
2d39354
Update README.md
grace-sodunke Jun 12, 2024
6b44cbf
update READMe
acostarelli Jun 12, 2024
df4b81c
unused visuals files
acostarelli Jun 12, 2024
a62300d
updated agent names
acostarelli Jun 12, 2024
19edbe1
Update README.md
grace-sodunke Jun 12, 2024
cf37c10
Update README.md
grace-sodunke Jun 12, 2024
e04efe2
Update requirements.txt
grace-sodunke Jun 12, 2024
c6f4fb2
removed website link temporarily
acostarelli Jun 23, 2024
2968355
added website back ::)
acostarelli Jun 27, 2024
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,5 +1,7 @@
credentials.json

**/__pycache__/

# Created by https://www.toptal.com/developers/gitignore/api/python
# Edit at https://www.toptal.com/developers/gitignore?templates=python

21 changes: 21 additions & 0 deletions LICENSE.txt
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Anthony Costarelli

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
38 changes: 34 additions & 4 deletions README.md
@@ -1,12 +1,42 @@
# Setup
In the repo root:
# [GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents](https://gamebench-website.vercel.app/)

This repository contains both the code for the benchmark and the data we collected so far.

The code is available under the MIT license, and the data are available under the CC-BY license.

The match data is located in [`matches.json`](https://github.com/Joshuaclymer/GameBench/tree/main/matches.json).

### Setup
In the repository root:

```
conda create -n gameenv python=3.10
conda activate gameenv
pip install -e .
```
Ask Josh for the credentials file.
You must provide your own OpenAI API key in a file `credentials.json` at the top-level directory. It should have the format:
```json
{
"openai_api_key": "your_openai_api_key_here"
}
```

### Replicating figures

The Python script [`generate_all_results.py`](https://github.com/Joshuaclymer/GameBench/tree/main/generate_all_results.py) generates all the figures from the paper into [`figures/`](https://github.com/Joshuaclymer/GameBench/tree/main/figures/). Use the command:

```sh
python3 generate_all_results.py
```

### Collecting data

The scripts provided in [`scripts/`](https://github.com/Joshuaclymer/GameBench/tree/main/scripts/) run some individual games with preconfigured settings. You can run/modify these scripts or create another. To run a script, execute:
```sh
sh ./scripts/<script_name>.sh
```

Alternatively, you can run `api.play_game.play_game` directly from a Python script created in the top-level directory.

### `llm-reasoners` dependency

@@ -19,4 +49,4 @@ Ask Josh for the credentials file.
journal={arXiv preprint arXiv:2305.14992},
year={2023}
}
```
```
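The setup notes in this README change ask for a `credentials.json` file at the top level with an `openai_api_key` field. A minimal sketch of reading and validating such a file follows, assuming only the format shown in the README. The helper name `load_openai_key` is hypothetical and is not part of the repository, which reads the file through `util.load_json` instead.

```python
import json
from pathlib import Path


def load_openai_key(path="credentials.json"):
    """Return the OpenAI API key stored in a credentials file.

    Hypothetical helper for illustration; not part of GameBench.
    """
    credentials_path = Path(path)
    if not credentials_path.is_file():
        raise FileNotFoundError(f"expected a credentials file at {credentials_path}")
    with credentials_path.open() as f:
        credentials = json.load(f)
    # The README specifies a single "openai_api_key" field.
    key = credentials.get("openai_api_key")
    if not key:
        raise KeyError("credentials file has no 'openai_api_key' entry")
    return key
```

Failing early with a clear message is a design choice of this sketch; it avoids a less obvious error later, at the first API call.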
161 changes: 135 additions & 26 deletions agents/gpt.py
@@ -1,10 +1,15 @@
from collections import defaultdict
from dataclasses import dataclass, field
from api.classes import Agent, AvailableActions, Action, Observation, Rules
import random
import openai
import api.util as util
import ast
import json
from PIL import Image
import base64
from io import BytesIO
import re


action_format_instructions_no_openended = """\
@@ -27,6 +32,15 @@
api_key=util.load_json("credentials.json")["openai_api_key"]
)

tokens = defaultdict(int)
def completions(*args, **kwargs):
ret = openai_client.chat.completions.create(*args, **kwargs)

model = kwargs["model"]
tokens[f"{model}_input"] += ret.usage.prompt_tokens
tokens[f"{model}_output"] += ret.usage.completion_tokens
print("*******************", tokens)
return ret
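
The `completions` wrapper above tallies prompt and completion tokens per model in a module-level `defaultdict`. The same idea can be sketched as a reusable decorator that runs without network access. Here `track_usage` and `fake_create` are hypothetical names, and the stand-in response only assumes the `usage.prompt_tokens` and `usage.completion_tokens` fields used in the diff.

```python
from collections import defaultdict
from types import SimpleNamespace

# Running totals, keyed as "<model>_input" and "<model>_output".
tokens = defaultdict(int)


def track_usage(create_fn):
    """Wrap a completion function so each call adds its token usage to `tokens`."""
    def wrapper(*args, **kwargs):
        ret = create_fn(*args, **kwargs)
        model = kwargs["model"]  # the model must be passed as a keyword argument
        tokens[f"{model}_input"] += ret.usage.prompt_tokens
        tokens[f"{model}_output"] += ret.usage.completion_tokens
        return ret
    return wrapper


def fake_create(*, model, messages):
    """Stand-in for the real API call (assumption: same `usage` field names)."""
    return SimpleNamespace(usage=SimpleNamespace(prompt_tokens=12, completion_tokens=5))


completions = track_usage(fake_create)
completions(model="demo-model", messages=[])
completions(model="demo-model", messages=[])
```

As in the original wrapper, a call that passes the model positionally would raise `KeyError`, so callers need to use `model=` consistently.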

@dataclass
class OpenAITextAgent(Agent):
@@ -48,15 +62,66 @@ def take_action(
available_actions: AvailableActions,
show_state: bool,
):
messages = [{"role": "system", "content": self.system_message}]
valid_actions = []
prompt = f"You are playing a game called {rules.title}. The rules are as follows:\n{rules.summary}\n"
if rules.additional_details != None:
prompt += "The following are headings with additional information about the rules that you can expand by taking the action Explain(<heading key>).\n"
details_dict = {f"H{i+1}": topic for i, topic in enumerate(rules.additional_details)}
details_dict = {
f"H{i+1}": topic for i, topic in enumerate(rules.additional_details)
}
prompt += json.dumps(details_dict, indent=4)
valid_actions.extend(f"Explain({h})" for h in list(details_dict.keys()))
#valid_actions.extend(f"Explain({h})" for h in list(details_dict.keys()))

prompt += f"\n# Observation\nThe following describes the current state of the game:\n{observation.text}\n"
if observation.image is not None:
if self.openai_model == "gpt-4-1106-preview":
self.print("Image observation received.")
buffered = BytesIO()
observation.image.save(buffered, format="JPEG")
# b64encode returns bytes; decode so the data URL below is a plain string
base64_image = base64.b64encode(buffered.getvalue()).decode("utf-8")
messages.append(
{
"role": "user",
"content": [
{"type": "text", "text": prompt},
{
"type": "image",
"image_url": {
"url": f"data:image/jpeg;base64,{base64_image}"
},
},
],
}
)
prompt = ""
else:
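self.print("Image observation received. Using GPT4 to generate text description.")
buffered = BytesIO()
# `image` is undefined in this scope; the PIL image lives on the observation
observation.image.save(buffered, format="JPEG")
base64_image = base64.b64encode(buffered.getvalue()).decode("utf-8")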

imagedesc = completions(
model="gpt-4-vision-preview",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": prompt
},
{
"type": "image",
"image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
},
],
}
],
).choices[0].message.content
prompt += imagedesc
observation.image = None

assert available_actions.predefined != {} or available_actions.openended != {}
prompt += f"\n# Actions\n"
prompt += f"{available_actions.instructions}\n"
@@ -83,16 +148,17 @@ def take_action(
):
prompt += "Return the action Explain(<action>) to receive additional info about what any of the above actions do.\n"

messages = [{"role": "system", "content": self.system_message}]

# Chain of Thought
if self.mode == 1:
prompt += "First, let's reason out loud about which action you should take to maximize your probability of winning."
messages.append({"role": "user", "content": prompt})

response = (
openai_client.chat.completions.create(
model=self.openai_model, messages=messages
completions(
model=self.openai_model
if observation.image is None
else "gpt-4-vision-preview",
messages=messages,
)
.choices[0]
.message.content
@@ -109,18 +175,19 @@ def take_action(

messages.append({"role": "user", "content": prompt})
response = (
openai_client.chat.completions.create(
model=self.openai_model, messages=messages
completions(
model=self.openai_model
if observation.image is None
else "gpt-4-vision-preview",
messages=messages,
)
.choices[0]
.message.content
)
messages.append({"role": "assistant", "content": response})
prompt = ""

self.print(
f"GPT listed the following actions as possibilities: {response}"
)
self.print(f"GPT listed the following actions as possibilities: {response}")

prompt += "\nTo summarize, if you choose a predefined action, you must return json with an 'action' key which contains one of the following valid actions:\n"
prompt += str(list(available_actions.predefined))
@@ -131,8 +198,10 @@ def take_action(
result = None
for _ in range(self.max_retries):
response = (
openai_client.chat.completions.create(
model=self.openai_model,
completions(
model=self.openai_model
if observation.image is None
else "gpt-4-vision-preview",
response_format={"type": "json_object"},
messages=messages,
)
@@ -143,17 +212,44 @@ def take_action(
self.print("GPT responded with", response)

try:
action = ast.literal_eval(response)
action = ast.literal_eval(response.strip())
action["action"]
except:
self.print("GPT returned invalid JSON")
continue

if action["action"] in available_actions.openended and "openended_response" not in action:
self.print("GPT chose openended action but didn't include response", action)
if (
action["action"] in available_actions.openended
and "openended_response" not in action
):
self.print(
"GPT chose openended action but didn't include response", action
)
error_message = "You chose an openended action, and so your json must have an 'openended_response' key."
messages.append({"role": "user", "content": error_message})
continue

try:
explain = re.findall(r"Explain\((H\d+)\)", action["action"])
if len(explain):
self.print("GPT is asking for rules explanation.")
rule = details_dict[explain[0]]
desc = rules.additional_details[rule]
messages.append({"role": "user", "content": desc})
continue

explain = re.findall(r"Explain\((.+)\)", action["action"])
if len(explain):
self.print("GPT is asking for action explanation.")
desc = available_actions.predefined.get(explain[0], "") + available_actions.openended.get(explain[0], "")
messages.append({"role": "user", "content": desc})
continue
except:
self.print("GPT tried asking for an explanation but failed.")
error_message = "This is an invalid Explain action."
messages.append({"role": "user", "content": error_message})
continue

if action["action"] in valid_actions:
self.print("GPT chose valid action", action)
result = action
@@ -167,35 +263,48 @@ def take_action(
messages.append({"role": "user", "content": error_message})
if result == None:
self.print(
f"WARNING: GPT returned an a random action after {self.max_retries} tries"
f"WARNING: GPT returned too many invalid actions after {self.max_retries} tries"
)
return Action(action_id=None)

return Action(
action_id=result["action"],
openended_response=result.get("openended_response"),
)


@dataclass
class ChatGPTText(OpenAITextAgent):
class GPT3(OpenAITextAgent):
openai_model: str = "gpt-3.5-turbo-1106"
agent_type_id: str = "gpt-3"
mode: int = 0

@dataclass
class GPT3CoT(OpenAITextAgent):
openai_model: str = "gpt-3.5-turbo-1106"
agent_type_id: str = "gpt-3.5"
agent_type_id: str = "gpt-3-cot"
mode: int = 1

@dataclass
class GPT3BaP(OpenAITextAgent):
openai_model: str = "gpt-3.5-turbo-1106"
agent_type_id: str = "gpt-3-bap"
mode: int = 2

@dataclass
class GPT4Text(OpenAITextAgent):
class GPT4(OpenAITextAgent):
openai_model: str = "gpt-4-1106-preview"
agent_type_id: str = "gpt-4"

mode: int = 0

@dataclass
class ChainOfThought(OpenAITextAgent):
class GPT4CoT(OpenAITextAgent):
openai_model: str = "gpt-4-1106-preview"
agent_type_id: str = "cot"
agent_type_id: str = "gpt-4-cot"
mode: int = 1

@dataclass
class BabbleAndPrune(OpenAITextAgent):
class GPT4BaP(OpenAITextAgent):
openai_model: str = "gpt-4-1106-preview"
agent_type_id: str = "b&p"
mode: int = 2
agent_type_id: str = "gpt-4-bap"
mode: int = 2
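
The image branch of `take_action` in this file builds a `data:image/jpeg;base64,...` URL from an in-memory JPEG buffer. A minimal sketch of that encoding step using only the standard library follows. `to_data_url` is a hypothetical helper name, not something defined in the repository. The point it illustrates is that `base64.b64encode` returns `bytes`, so the result has to be decoded before it is interpolated into a string.

```python
import base64


def to_data_url(image_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a data URL string.

    Hypothetical helper for illustration; not part of GameBench.
    """
    # b64encode returns bytes; without decode() an f-string would embed "b'...'".
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

With Pillow, the input bytes would come from `buffered.getvalue()` after saving the image into a `BytesIO` buffer, as the diff does.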
4 changes: 2 additions & 2 deletions agents/random_agent.py
@@ -7,5 +7,5 @@ class RandomAgent(Agent):
agent_type_id : str = "random"

def take_action(self, rules : Rules, observation: Observation, available_actions: AvailableActions, show_state : bool):
actions = list(available_actions.predefined.keys())
return Action(action_id=random.choice(actions))
actions = list(available_actions.predefined.keys()) + list(available_actions.openended.keys())
return Action(action_id=random.choice(actions), openended_response="")
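
The change to `RandomAgent` widens the candidate pool from predefined actions only to predefined plus open-ended actions. A small standalone sketch of that selection follows, assuming both collections are dictionaries keyed by action id as in the diff. `pick_random_action` and its `rng` parameter are hypothetical, added here so the choice can be seeded.

```python
import random


def pick_random_action(predefined, openended, rng=random):
    """Choose uniformly among predefined and open-ended action ids."""
    actions = list(predefined.keys()) + list(openended.keys())
    if not actions:
        # random.choice would raise IndexError on an empty list; be explicit instead.
        raise ValueError("no actions available")
    return rng.choice(actions)
```

Passing a seeded `random.Random` instance as `rng` makes a match reproducible, which can help when debugging a game implementation against the random baseline.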
4 changes: 2 additions & 2 deletions agents/rap/agent.py
@@ -14,9 +14,9 @@ class ReasoningViaPlanning(Agent, WorldModel, SearchConfig):
"""Inherits Agent from api.classes, and WorldModel and SearchConfig
from the llm-agents library."""

agent_type_id: str = "rap"
agent_type_id: str = "gpt4-rap"
transparent_reasoning: bool = False
agent_type: int = 0 # 0 = random replies, 1 = human interaction, 2 = openai
agent_type: int = 2 # 0 = random replies, 1 = human interaction, 2 = openai

context_builder: Callable[[str, str], ContextType] = None
completions: CompletionsFunction = None
4 changes: 2 additions & 2 deletions agents/rap/chat.py
@@ -71,7 +71,7 @@ def probabilities(

top_logprobs = (
openai_client.chat.completions.create(
model="gpt-3.5-turbo-1106",
model=model,
messages=context,
logprobs=True,
top_logprobs=n,
@@ -156,7 +156,7 @@ def image_description(image: Image, rules: Rules) -> str:
"content": [
{
"type": "text",
"text": "You are playing a game called {rules.title}. The rules are as follows: {rules.summary}.\nThis image is your observation of the game. Describe what's going on in the image.",
"text": f"You are playing a game called {rules.title}. The rules are as follows: {rules.summary}.\nThis image is your observation of the game. Describe what's going on in the image.",
},
{
"type": "image",