`tinygym` reimplements `flashrl`, using `tinygrad` instead of `torch`.
🛠️ `pip install tinygym` or clone the repo & `pip install -r requirements.txt`
- If cloned (or if envs changed), compile: `python setup.py build_ext --inplace`
The README of `flashrl` is mostly valid for `tinygym`, with the biggest difference being: `tinygym` is not fast (yet) -> it learns Pong in ~5 minutes instead of 5 seconds (on an RTX 3090).
Just like in `flashrl`, `python train.py` should look like this (with the progress bar moving ~60x slower):
Check out the `onefile` branch if you want to make it fast (= try to make `TinyJit` work)!
The most important difference (it enabled RL after 2 hours of debugging):
- Use `.abs().clip(min_=1e-8)` in `ppo` to avoid close-to-zero values in `(value - ret)`

Without this, the optimizer step can result in NaNs and "RL doesn't work" 😜
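For illustration, a minimal sketch of how that clip could slot into a squared-error value loss; `value_loss` and its shapes are hypothetical, not `tinygym`'s actual `ppo` code:

```python
from tinygrad import Tensor

def value_loss(value: Tensor, ret: Tensor) -> Tensor:
    # hypothetical sketch: clamp |value - ret| away from zero before squaring,
    # so the optimizer step never has to chew on an exactly-zero difference
    diff = (value - ret).abs().clip(min_=1e-8)
    return (diff * diff).mean()
```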
To potentially enable `tinygrad.TinyJit` (does not work yet, hence the slowness), `Learner` does not `.setup_data` and `rollout` is a function (instead of a `Learner` method) that fills a list with Tensors and `.stack`s them at the end (see the sketch below).
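As a rough illustration of that list-then-stack pattern (the `model`/`env` interfaces here are made up, not `tinygym`'s actual API):

```python
from tinygrad import Tensor

def rollout(model, env, steps: int):
    # hypothetical sketch: append per-step Tensors to plain Python lists,
    # then build the batch with a single Tensor.stack at the very end
    obs, obs_list, act_list = env.reset(), [], []
    for _ in range(steps):
        act = model(obs)     # policy forward pass
        obs_list.append(obs)
        act_list.append(act)
        obs = env.step(act)  # advance the environment
    return Tensor.stack(*obs_list), Tensor.stack(*act_list)
```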
Since it somehow performs better, `.uniform` (`tinygrad` default) is used instead of `.kaiming_uniform` (`torch` default) weight initialization for `nn.Linear`.
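If you want to compare the two yourself, a hypothetical one-line swap (not something `tinygym` does; `Tensor.kaiming_uniform` is tinygrad's built-in creation method):

```python
import math
from tinygrad import Tensor, nn

layer = nn.Linear(64, 32)  # weight gets tinygrad's default init
# torch-style init for comparison (a sketch, using torch's a=sqrt(5) convention):
layer.weight = Tensor.kaiming_uniform(32, 64, a=math.sqrt(5))
```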
Custom `tinygrad` rewrites of `torch.nn.init.orthogonal_` & `torch.nn.utils.clip_grad_norm_` are used (a sketch of the latter is below).
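To give an idea of what such a rewrite can look like, here is a minimal, hypothetical `clip_grad_norm_` in tinygrad (a sketch, not `tinygym`'s actual implementation):

```python
from tinygrad import Tensor

def clip_grad_norm_(params, max_norm: float = 0.5) -> Tensor:
    # hypothetical sketch of torch.nn.utils.clip_grad_norm_ in tinygrad:
    # compute the global L2 norm over all grads, then scale them down when it
    # exceeds max_norm (the scale is clamped to 1 so small grads are untouched)
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = Tensor.cat(*[g.reshape(-1) for g in grads]).square().sum().sqrt()
    scale = (max_norm / (total_norm + 1e-6)).clip(max_=1.0)
    for p in params:
        if p.grad is not None:
            p.grad = p.grad * scale
    return total_norm
```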
You'll find a `.detach()` here and a `.contiguous()` there, but other than that `tinygym` = `flashrl` 🤝
I want to thank
- George Hotz and the tinygrad team for commoditizing the petaflop! Star tinygrad ⭐
- Andrej Karpathy for commoditizing RL knowledge! Star pg-pong ⭐
and last but not least...