Commit 4f24fce

committed
Pickles, argument parsing and README.
1 parent 4613d70 commit 4f24fce

18 files changed, +1328 −111 lines changed

README.md

+50-18
@@ -1,23 +1,22 @@
 # Reinforcement Learning
-Plays the Q*bert game by reinforcement learning. This is assignment 3 of the ECSE-526 class, as described [here](http://www.cim.mcgill.ca/~jer/courses/ai/assignments/as3.html).
+Plays the Qbert game by reinforcement learning. This is assignment 3 of the ECSE-526 class, as described [here](http://www.cim.mcgill.ca/~jer/courses/ai/assignments/as3.html).
 
 ## Installation
 
 ### Library Dependencies
 
-## Usage
+The main dependency of the program is `numpy`, which can be installed via `pip`. For plotting, `matplotlib` is used.
 
-### Example Commands
+## Usage
 
-Here are some examples of running the program:
+To run the program, use the command-line interface in `main.py`, as follows:
 
 ```
-
+python main.py
 ```
 
-### Help
 
-To run the program, use the command-line interface in `main.py`. To see the list of available commands, run the following:
+To see the list of available commands, run the following:
 
 ```
 python main.py --help
@@ -26,25 +25,58 @@ python main.py --help
 This will print the following:
 
 ```
-
+usage: main.py [-h] [-l {info,debug,critical,warn,error}] [-e NUM_EPISODES]
+               [-o LOAD_LEARNING_FILENAME] [-f SAVE_LEARNING_FILENAME]
+               [-p PLOT_FILENAME] [-c CSV_FILENAME] [-d DISPLAY_SCREEN]
+               [-s {simple,verbose}]
+               [-a {block,enemy,friendly,subsumption,combined_verbose}]
+               [-x {random,optimistic,combined}]
+               [-m {manhattan,hamming,same_result}] [-r RANDOM_SEED]
+               [-i SHOW_IMAGE]
+
+Reinforcement Learning with Qbert.
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -l {info,debug,critical,warn,error}, --logging_level {info,debug,critical,warn,error}
+                        The logging level.
+  -e NUM_EPISODES, --num_episodes NUM_EPISODES
+                        The number of training episodes.
+  -o LOAD_LEARNING_FILENAME, --load_learning_filename LOAD_LEARNING_FILENAME
+                        The pickle file to load learning data from. To run the
+                        agent with pre-trained Q data, set this parameter to
+                        'data'
+  -f SAVE_LEARNING_FILENAME, --save_learning_filename SAVE_LEARNING_FILENAME
+                        The pickle file to save learning data to.
+  -p PLOT_FILENAME, --plot_filename PLOT_FILENAME
+                        The filename to save a score plot to.
+  -c CSV_FILENAME, --csv_filename CSV_FILENAME
+                        The filename to save a score CSV file to.
+  -d DISPLAY_SCREEN, --display_screen DISPLAY_SCREEN
+                        Whether to display the ALE screen.
+  -s {simple,verbose}, --state_representation {simple,verbose}
+                        The state representation to use.
+  -a {block,enemy,friendly,subsumption,combined_verbose}, --agent_type {block,enemy,friendly,subsumption,combined_verbose}
+                        The agent type to use.
+  -x {random,optimistic,combined}, --exploration {random,optimistic,combined}
+                        The exploration mode to use.
+  -m {manhattan,hamming,same_result}, --distance_metric {manhattan,hamming,same_result}
+                        The distance metric to use.
+  -r RANDOM_SEED, --random_seed RANDOM_SEED
+                        The random seed to use.
+  -i SHOW_IMAGE, --show_image SHOW_IMAGE
+                        Whether to show a screenshot at the end of every
+                        episode.
 ```
 
 ### Default Values
 
-Here are the default values for all the optional arguments:
-
-Argument | Default Value
---- | ---
-`--a` | `a`
-`--b` | `b`
-`--c` | `c/`
-`--d` | d
-`--e` | `e`
-`--f` | f
+The default values of all the parameters can be found in the `main.py` file.
 
 
 ## Code Organization
 
+The bulk of the code can be found in the `actions.py`, `agent.py`, `learner.py` and `world.py` files.
 
 ## Report
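As the README's "Default Values" section says, the defaults live in `main.py`. A minimal sketch of how those defaults can be inspected programmatically, using only a subset of the real flags (the full parser defines more arguments), is `argparse`'s `get_default`:

```python
from argparse import ArgumentParser

# Rebuild a subset of the parser from main.py; only two of the real
# arguments are reproduced here for illustration.
parser = ArgumentParser(description='Reinforcement Learning with Qbert.')
parser.add_argument('-e', '--num_episodes', default=100, type=int)
parser.add_argument('-x', '--exploration', default='combined',
                    choices=['random', 'optimistic', 'combined'])

# get_default() looks a default up by destination name without parsing argv.
print(parser.get_default('num_episodes'))  # 100
print(parser.get_default('exploration'))   # combined
```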

agent.py

-1
@@ -46,7 +46,6 @@ def __init__(self, agent_type='subsumption', random_seed=123, frame_skip=4, repe
             self.agent = QbertCombinedVerboseAgent(random_seed, frame_skip, repeat_action_probability, sound,
                                                    display_screen, alpha, gamma, epsilon, unexplored_threshold,
                                                    unexplored_reward, exploration, distance_metric)
-
         self.world = self.agent.world
 
     def action(self):

main.py

+44-91
@@ -21,6 +21,9 @@ def play_learning_agent(num_episodes=2, show_image=False, load_learning_filename
                         save_learning_filename=None, plot_filename=None, csv_filename=None, display_screen=False,
                         state_representation='simple', agent_type='subsumption', exploration=None,
                         distance_metric=None, random_seed=123):
+    """
+    Let the learning agent play with the specified parameters.
+    """
     logging.info('Plot filename: {}'.format(plot_filename))
     logging.info('Agent type: {}'.format(agent_type))
     logging.info('Distance metric: {}'.format(distance_metric))
@@ -68,105 +71,55 @@ def setup_logging(level):
                         datefmt='%d-%m-%Y:%H:%M:%S',
                         level=LOGGING_LEVELS[level])
 
-
 def parse_command_line_arguments():
     """
     Parse the command-line arguments provided by the user.
     """
-    parser = ArgumentParser(description='Reinforcement Learning with Q*bert.')
+    parser = ArgumentParser(description='Reinforcement Learning with Qbert.')
     parser.add_argument('-l', '--logging_level', default='info', choices=LOGGING_LEVELS.keys(),
                         help='The logging level.')
-
-    subparsers = parser.add_subparsers()
-
-    args = parser.parse_args()
+    parser.add_argument('-e', '--num_episodes', default=100, type=int, help='The number of training episodes.')
+    parser.add_argument('-o', '--load_learning_filename', default=None,
+                        help="The pickle file to load learning data from. To run the agent with pre-trained Q data, set"
+                             " this parameter to 'data'")
+    parser.add_argument('-f', '--save_learning_filename', default=None,
+                        help='The pickle file to save learning data to.')
+    parser.add_argument('-p', '--plot_filename', default=None,
+                        help='The filename to save a score plot to.')
+    parser.add_argument('-c', '--csv_filename', default=None,
+                        help='The filename to save a score CSV file to.')
+    parser.add_argument('-d', '--display_screen', default=False, type=bool,
+                        help='Whether to display the ALE screen.')
+    parser.add_argument('-s', '--state_representation', default='simple', choices=['simple', 'verbose'],
+                        help='The state representation to use.')
+    parser.add_argument('-a', '--agent_type', default='subsumption',
+                        choices=['block', 'enemy', 'friendly', 'subsumption', 'combined_verbose'],
+                        help='The agent type to use.')
+    parser.add_argument('-x', '--exploration', default='combined', choices=['random', 'optimistic', 'combined'],
+                        help='The exploration mode to use.')
+    parser.add_argument('-m', '--distance_metric', default=None, choices=['manhattan', 'hamming', 'same_result'],
+                        help='The distance metric to use.')
+    parser.add_argument('-r', '--random_seed', default=None, type=int,
+                        help='The random seed to use.')
+    parser.add_argument('-i', '--show_image', default=False, type=bool,
+                        help='Whether to show a screenshot at the end of every episode.')
+
+    args = parser.parse_args()
     setup_logging(args.logging_level)
-    args.func(args)
-
-
-def save_generalization_results():
-    distance_metric = 'no_generalization'
-    play_learning_agent(num_episodes=100, plot_filename=distance_metric, csv_filename=distance_metric,
-                        display_screen=False, agent_type='combined_verbose', exploration=None, distance_metric=None)
-
-    distance_metric = 'manhattan'
-    play_learning_agent(num_episodes=100, plot_filename=distance_metric, csv_filename=distance_metric,
-                        display_screen=False, agent_type='combined_verbose', exploration=None,
-                        distance_metric=distance_metric)
-
-    distance_metric = 'hamming'
-    play_learning_agent(num_episodes=100, plot_filename=distance_metric, csv_filename=distance_metric,
-                        display_screen=False, agent_type='combined_verbose', exploration=None,
-                        distance_metric=distance_metric)
-
-    distance_metric = 'same_result'
-    play_learning_agent(num_episodes=100, plot_filename=distance_metric, csv_filename=distance_metric,
-                        display_screen=False, agent_type='combined_verbose', exploration=None,
-                        distance_metric=distance_metric)
-
-    filename = 'subsumption_generalization'
-    play_learning_agent(num_episodes=100, plot_filename=filename, csv_filename=filename,
-                        display_screen=False, agent_type='subsumption', exploration=None,
-                        distance_metric=None, save_learning_filename='subsumption_dangerous_no_exploration')
-
-
-def save_exploration_results():
-    filename = 'subsumption_random'
-    play_learning_agent(num_episodes=100, plot_filename=filename, csv_filename=filename,
-                        display_screen=False, agent_type='subsumption', exploration='random',
-                        distance_metric=None, save_learning_filename='subsumption_dangerous_random')
-
-    filename = 'subsumption_optimistic'
-    play_learning_agent(num_episodes=100, plot_filename=filename, csv_filename=filename,
-                        display_screen=False, agent_type='subsumption', exploration='optimistic',
-                        distance_metric=None, save_learning_filename='subsumption_dangerous_optimistic')
-
-    filename = 'subsumption_combined'
-    play_learning_agent(num_episodes=100, plot_filename=filename, csv_filename=filename,
-                        display_screen=False, agent_type='subsumption', exploration='combined',
-                        distance_metric=None, save_learning_filename='subsumption_dangerous_combined')
-
-
-def save_performance_results():
-    filename = 'seed123'
-    play_learning_agent(num_episodes=100, plot_filename=filename, csv_filename=filename,
-                        display_screen=False, agent_type='subsumption', exploration='combined',
-                        distance_metric=None, save_learning_filename='subsumption_dangerous_combined_123',
-                        random_seed=123)
-
-    filename = 'seed459'
-    play_learning_agent(num_episodes=100, plot_filename=filename, csv_filename=filename,
-                        display_screen=False, agent_type='subsumption', exploration='combined',
-                        distance_metric=None, save_learning_filename='subsumption_dangerous_combined_459',
-                        random_seed=459)
-
-    filename = 'seed598'
-    play_learning_agent(num_episodes=100, plot_filename=filename, csv_filename=filename,
-                        display_screen=False, agent_type='subsumption', exploration='combined',
-                        distance_metric=None, save_learning_filename='subsumption_dangerous_combined_598',
-                        random_seed=459)
-
-
-def continued_learning():
-    filename = 'seed459_600'
-    play_learning_agent(num_episodes=100, plot_filename=filename, csv_filename=filename,
-                        display_screen=False, agent_type='subsumption', exploration='combined',
-                        distance_metric=None, save_learning_filename='subsumption_dangerous_combined_459_600',
-                        random_seed=459, load_learning_filename='subsumption_dangerous_combined_459_500')
-
-
-def sample_play():
-    play_learning_agent(num_episodes=100,
-                        display_screen=True, agent_type='subsumption', exploration='combined',
-                        distance_metric=None,
-                        random_seed=459, load_learning_filename='subsumption_dangerous_combined_459_400')
+    play_learning_agent(num_episodes=args.num_episodes,
+                        load_learning_filename=args.load_learning_filename,
+                        save_learning_filename=args.save_learning_filename,
+                        plot_filename=args.plot_filename,
+                        csv_filename=args.csv_filename,
+                        display_screen=args.display_screen,
+                        state_representation=args.state_representation,
+                        agent_type=args.agent_type,
+                        exploration=args.exploration,
+                        distance_metric=args.distance_metric,
+                        random_seed=args.random_seed,
+                        show_image=args.show_image)
 
 
 if __name__ == '__main__':
     setup_logging('info')
-    # play_learning_agent()
-    # save_generalization_results()
-    # save_exploration_results()
-    # save_performance_results()
-    continued_learning()
-    # sample_play()
+    parse_command_line_arguments()
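One caveat about the new argument parsing in main.py (an editorial note, not part of the commit): `--display_screen` and `--show_image` are declared with `type=bool`, and argparse applies `bool()` to the raw string, so any non-empty value, including the string 'False', parses as True. A short sketch of the behavior, alongside the conventional `action='store_true'` flag style:

```python
from argparse import ArgumentParser

# type=bool applies bool() to the raw argument string, so any non-empty
# string (even 'False') becomes True.
p1 = ArgumentParser()
p1.add_argument('-d', '--display_screen', default=False, type=bool)
assert p1.parse_args(['-d', 'False']).display_screen is True  # surprising

# The conventional flag style: present means True, absent means False.
p2 = ArgumentParser()
p2.add_argument('-d', '--display_screen', action='store_true')
assert p2.parse_args([]).display_screen is False
assert p2.parse_args(['-d']).display_screen is True
```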

pickle/data_block_N.pkl (60.1 KB): Binary file not shown.

pickle/data_block_Q.pkl (77 KB): Binary file not shown.

pickle/data_enemy_N.pkl (4.38 KB): Binary file not shown.

pickle/data_enemy_Q.pkl (5.79 KB): Binary file not shown.

pickle/data_friendly_N.pkl (1.28 KB): Binary file not shown.

pickle/data_friendly_Q.pkl (1.75 KB): Binary file not shown.

Six further binary files not shown.
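The commit message mentions pickles, and the `.pkl` files above pair an N (visit-count) table with a Q (value) table per agent. Their exact contents are not visible in this diff; a hypothetical round-trip, assuming plain dicts keyed by (state, action), would look like:

```python
import os
import pickle
import tempfile

# Hypothetical Q and N tables; the real structure of the .pkl files in this
# commit is not shown, so dicts keyed by (state, action) are assumed.
Q = {('state_a', 'up'): 1.5, ('state_a', 'down'): -0.25}
N = {('state_a', 'up'): 12, ('state_a', 'down'): 3}

path = os.path.join(tempfile.gettempdir(), 'data_Q.pkl')
with open(path, 'wb') as f:
    pickle.dump(Q, f)          # save learning data
with open(path, 'rb') as f:
    restored = pickle.load(f)  # load learning data

assert restored == Q
```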

plotter.py

-1
@@ -21,7 +21,6 @@ def plot_scores(scores, filename):
     x_smooth = np.linspace(1, len(scores), 200)
     y_smooth = spline(x_points, y_points, x_smooth)
 
-    # plt.plot(x_points, y_points, 'o', label='Data')
     plt.plot(x_smooth, y_smooth, 'C0', label='Score')
     plt.xlabel('Number of episodes')
     plt.ylabel('Score')
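A note on plotter.py: `scipy.interpolate.spline`, used above for smoothing, was deprecated and later removed from SciPy (in 1.3). On a recent SciPy, `make_interp_spline` gives an equivalent smoothed curve; a sketch with hypothetical scores:

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# Hypothetical per-episode scores; plotter.py receives these as `scores`.
scores = [10, 40, 25, 60, 80, 55, 90]
x_points = np.arange(1, len(scores) + 1)
y_points = np.array(scores, dtype=float)

# Same smoothing as spline(x_points, y_points, x_smooth), but via the
# modern API: build a cubic B-spline, then evaluate it on the dense grid.
x_smooth = np.linspace(1, len(scores), 200)
y_smooth = make_interp_spline(x_points, y_points, k=3)(x_smooth)

assert y_smooth.shape == (200,)
```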
