Open
736 commits
e6b2e6e
Super net config sampling
Sep 27, 2021
5f0b21e
Add a unit test for OneHotActions
Sep 29, 2021
57f27db
Adds unit test for columnvector function
Sep 29, 2021
c703915
Update docstring for transforms.py
igfox Sep 30, 2021
b5afcc0
Allow obj_func be optional (#548)
czxttkl Sep 30, 2021
c41b961
Fix rasp tests (#550)
czxttkl Sep 30, 2021
0517902
Add test_gym_replay_buffer (#549)
czxttkl Sep 30, 2021
2e71682
Remove `ABC` from `LightningModule` (#9517)
Sep 30, 2021
603387e
Fix gym_cpu_unittest (#551)
czxttkl Sep 30, 2021
48a5a28
Deprecate TrainerProperties Mixin and move property definitions direc…
Oct 1, 2021
9b7281d
Fix last two circle ci tests (#552)
czxttkl Oct 1, 2021
f8bb0bf
Change clampping of probability feature preprocessing. (#553)
Oct 1, 2021
d219a0c
Change fb core types from namedtuple to dataclass (#554)
czxttkl Oct 7, 2021
46de5c3
add basic MAB classes to reagent
alexnikulkov Oct 7, 2021
bb357dc
Move ReAgent MAB from numpy to PyTorch
alexnikulkov Oct 7, 2021
34fe167
suppress errors in `reagent`
Oct 8, 2021
4808562
copy possible_action_maks from the env at each step instead of re-usi…
alexnikulkov Oct 9, 2021
b70c43e
Improve REINFORCE trainer (#558)
czxttkl Oct 11, 2021
dba2fd9
Convert possible_actions_mask to a Tensor (#556)
alexnikulkov Oct 12, 2021
4f8fe65
Fix ReAgentLightningModule (#559)
czxttkl Oct 12, 2021
2b65e91
suppress errors in `reagent`
Oct 13, 2021
1e2b265
Adding Bayesian Optimization Optimizer (#560)
PavlosApo Oct 13, 2021
4ce275b
Adding Bayesian Optimization Optimizer with ensemble of feedforward n…
PavlosApo Oct 13, 2021
57b58a8
add assertion for non-empty possible action mask (#557)
alexnikulkov Oct 14, 2021
103893c
suppress errors in `reagent`
Oct 18, 2021
471defa
Add Thompson Sampling to ReAgent MAB and refactor the UCB classes and…
alexnikulkov Oct 19, 2021
b60b23d
Add basic MAB simulation tools to ReAgent (#566)
alexnikulkov Oct 19, 2021
263a7ff
Add variance estimates to UCB
alexnikulkov Oct 19, 2021
9531e9c
Add MAB unittests to CircleCI test config (#567)
alexnikulkov Oct 19, 2021
25a2692
Add support for `len(datamodule)` (#9895)
tangbinh Oct 21, 2021
453d362
Add typing for `LightningOptimizer` (#9990)
tangbinh Oct 21, 2021
6cf1949
Fix `optimizers` overloads typing annotation (#10069)
tangbinh Oct 26, 2021
63bbb92
fix CircleCI test config for MAB (#568)
alexnikulkov Oct 27, 2021
5b09e5a
expose output layer activation in FC network and DQN (#572)
alexnikulkov Oct 28, 2021
cab64f8
use known batch size when sampling (#569)
Oct 30, 2021
e2c2674
Add support for BCE loss for reward decomposition. (#573)
Nov 4, 2021
02cfe37
add LinUCB trainer to reagent (#574)
alexnikulkov Nov 5, 2021
b1a3c17
Remove deprecated dataloader arguments in Trainer methods (#10325) (#…
edward-io Nov 10, 2021
756e441
Fix report coverage command
czxttkl Nov 11, 2021
ba25ae3
Update ReAgent docs (#577)
czxttkl Nov 11, 2021
e3ac3d2
Add info to arguments of post_episode_callback in Agent (#576)
alexnikulkov Nov 11, 2021
2e9e639
Remove deprecated accelerator pass through functions in Accelerator (…
aazzolini Nov 17, 2021
12ec6fe
suppress errors in `fbcode/reagent` - batch 1
Nov 18, 2021
ed0d44b
update Java version (#580)
alexnikulkov Nov 18, 2021
c932ee1
add optimize=False to reagent optimizer configs (#581)
alexnikulkov Nov 18, 2021
0f16378
remove deprecated train_loop (#10482)
Nov 18, 2021
b870a24
Warn instead of throwing an exception if an operator doesn't support …
alexnikulkov Nov 19, 2021
56f5de7
add datamodule and check if test_step is implemented in trainer.test(…
alexnikulkov Nov 19, 2021
f4fdfc1
reagent MAB: randomize argmax, add lower bound on variance estimate, …
alexnikulkov Nov 20, 2021
62661e3
add batch update mode to MAB simulation (#579)
alexnikulkov Nov 20, 2021
b548476
Fix documents
czxttkl Nov 24, 2021
0bdcebc
Add copyright in files (#585)
czxttkl Nov 24, 2021
83b7fda
Fix another circleci test (#586)
czxttkl Nov 24, 2021
4c470f4
Fix more integration tests (#587)
czxttkl Nov 24, 2021
4ab19c5
Add an internal product model manager for signal loss
czxttkl Nov 24, 2021
4316cc7
Fix loop examples after Accelerator API removals (#10514)
jjenniferdai Dec 1, 2021
e130880
remove deprecated `reload_dataloaders_every_epoch` from `Trainer` (#1…
jjenniferdai Dec 1, 2021
c33fa6d
add env flag to skip frozen registry check (#589)
alexnikulkov Dec 2, 2021
2efda82
Refactor progress bar initialization to avoid extra attribute set on …
jjenniferdai Dec 2, 2021
8303179
Add OSS BanditRewardNetPredictorWrapper and enable exporting reward m…
gji1 Dec 5, 2021
5f91696
Update README.md
czxttkl Dec 5, 2021
218b61e
Add RL Cookbook
czxttkl Dec 6, 2021
9b6152f
Update update_requirements
four4fish Dec 8, 2021
d53def2
Print adjusted Direct method score (#591)
Dec 8, 2021
1ca6382
Add some comments
Dec 8, 2021
84b88f9
Miscellaneous fixes and improvements
czxttkl Dec 17, 2021
a46b53a
suppress errors in `reagent`
Dec 18, 2021
89ae941
Remove redundant special case for disabling the progress bar on TPU (…
Dec 18, 2021
e214d1c
suppress errors in `reagent`
Dec 18, 2021
075af5f
Enable logging hparams only if there are any (#11105)
Dec 20, 2021
e1c24b9
Update CircleCI config to use a newer Xcode version (#592)
gji1 Dec 22, 2021
a46fd19
Add copyright header (#594)
czxttkl Dec 26, 2021
480e1d8
Fix the batch size issue cuased by recent OSS PyTorch Lightning chang…
gji1 Dec 28, 2021
96abcbd
suppress errors in `fbcode/reagent` - batch 1
Dec 28, 2021
52f6666
fix flaky MAB test (#595)
alexnikulkov Dec 29, 2021
9a24f82
behavioral cloning (#598)
Jan 21, 2022
3872994
Add missing __init__ file (#600)
gji1 Jan 26, 2022
d82e850
better documentation for reagent lite
czxttkl Jan 27, 2022
91b2d85
Add doc string tests for bayesian optimizer (#601)
czxttkl Jan 27, 2022
09e36a9
Update requirements after sync
Jan 28, 2022
b40d474
Add torchrec dependencies to reagent (#597)
czxttkl Feb 8, 2022
c35a42c
Feature config change
czxttkl Feb 10, 2022
3706c13
Data reading and transformation
czxttkl Feb 10, 2022
9dc37d9
Add net builder of sparse arch-based reward decomposition models
czxttkl Feb 10, 2022
fa0f2d7
Model and trainer
czxttkl Feb 10, 2022
532184b
Necessary changes in model managers
czxttkl Feb 10, 2022
d5d031f
All other necessary changes to accommodate previous changes
czxttkl Feb 10, 2022
90882b8
Tests for <Add sparse features to reward decomposition> (#604)
czxttkl Feb 10, 2022
b1a306a
All small fixes to make all tests pass (#605)
czxttkl Feb 10, 2022
2237d0a
Add foreach flag to reagent optimizer configs (#606)
mikaylagawarecki Feb 14, 2022
e9d68f9
Add annotations to `reagent`
Feb 15, 2022
dd4438d
Add annotations to `reagent`
Feb 18, 2022
7a27ed1
add log performance of each episode (#607)
Feb 23, 2022
40875dd
Improve debuggability of free/reagent (#608)
czxttkl Feb 24, 2022
3eaf1cf
Refactor sparse arch and interaction arch (#609)
YLGH Mar 1, 2022
79a1891
Unittest of ModelManager (BehavioralCloning)
Mar 3, 2022
7fec50a
docs: add GH button in support of Ukraine (#613)
dmitryvinn Mar 4, 2022
7d5f5f7
Add support for pluggable Accelerators (#12030)
tangbinh Mar 4, 2022
e3ca3bd
suppress errors in `reagent`
Mar 5, 2022
8c4d127
remove RASP Mac test (#614)
alexnikulkov Mar 7, 2022
a21a1a9
Add callout items to the Docs landing page (#12196) (#189)
tangbinh Mar 9, 2022
c2cca1d
Improve the ReinforceTrainer (#617)
gji1 Mar 10, 2022
b9fbe61
add submit_config for free/reagent (#618)
czxttkl Mar 11, 2022
180cf14
Minor update to bayes optimizer (#615)
czxttkl Mar 12, 2022
b52129b
Update ubuntu (#619)
czxttkl Mar 22, 2022
aaf0c50
Import torchrec (#620)
czxttkl Mar 23, 2022
9048f36
Enable per-batch logging for reinforce trainer (#621)
gji1 Mar 25, 2022
617bf15
suppress errors in `reagent`
Mar 31, 2022
046d50e
suppress errors in `reagent`
Apr 1, 2022
9818378
FbContBanditBatchPreprocessor: add context-arm features; rename state…
alexnikulkov Apr 5, 2022
03b5e28
Detach variables not in the policy net for REINFORCE trainer (#625)
gji1 Apr 6, 2022
6099e83
small change to FixedLengthSequences (#626)
czxttkl Apr 6, 2022
6468d5b
remove num_arms from LinUCBTrainer (#627)
alexnikulkov Apr 7, 2022
d3fe756
Exploration - Prep Work (#628)
Apr 11, 2022
abc08f7
suppress errors in `fbcode/reagent` - batch 1
Apr 13, 2022
52f344a
Register LinearRegressionUCB attribute tensors as buffers (#629)
alexnikulkov Apr 14, 2022
6b8cfb9
Removing device following registration of LinUCB params (#630)
Apr 16, 2022
62779e4
suppress errors in `reagent`
Apr 20, 2022
decb7e4
Fix EpsilonGreedyActionSampler Runtime error and add a test to ensure…
Apr 21, 2022
f60fdd5
suppress errors in `fbcode/reagent` - batch 1
Apr 27, 2022
cc5091e
Add helper functions for KeyedJaggedTensor (#633)
czxttkl Apr 28, 2022
d48968a
quick fix (#634)
czxttkl Apr 29, 2022
607df97
Bayes by Backprop (#637)
ronaldyufb May 6, 2022
eb9b2b7
formatting changes from black 22.3.0
amyreese May 12, 2022
f7ff588
update torch / torchrec to use stable version (#640)
czxttkl May 12, 2022
8fb0706
logger.experiment is only available on rank 0 (#642)
czxttkl May 13, 2022
0b72674
apply import merging for fbcode (8 of 11)
amyreese May 15, 2022
083f457
check input for parametric dqn (#635)
czxttkl May 17, 2022
deb9c67
make reward net optional (#641)
czxttkl May 17, 2022
18af427
Improve SARSA in FREE (#643)
alexnikulkov May 24, 2022
f07130b
Bulk Eval workflow with hive writer (#644)
May 25, 2022
7444a32
expose final activation argument for critic Q-network (#645)
alexnikulkov Jun 1, 2022
81f2774
apply new formatting config
amyreese Jun 10, 2022
a407035
suppress errors in `reagent`
Jun 15, 2022
8bdf290
Add capturable attribute to reagent optimizers (#646)
alexnikulkov Jun 17, 2022
8a05b06
suppress errors in `reagent`
Jun 21, 2022
f9e7484
concat KJTs (#647)
Jun 21, 2022
b62c9de
Neural LinUCB
Jun 21, 2022
a98d8fb
suppress errors in `reagent`
Jun 22, 2022
ae38db0
suppress errors in `reagent`
Jun 23, 2022
dccc3e0
fix feature importance run (#649)
czxttkl Jun 27, 2022
5855949
Make EB/EBC scriptable (#648)
Jun 27, 2022
cb58370
Add check if the logger is set in dqn_trainer (#650)
dkorenkevych Jul 11, 2022
534e87b
Add unit tests for DQNTrainer (#651)
dkorenkevych Jul 11, 2022
14bc01a
Some cpu-only tests do not need cuda version ubuntu (#658)
czxttkl Jul 25, 2022
1002d12
upgrading pyenv (#659)
czxttkl Jul 26, 2022
bd00420
import torchrec properly
czxttkl Jul 27, 2022
cc43b0f
Improve model tests (#657)
czxttkl Jul 29, 2022
722a7a9
Add unit tests for DQNTrainerBaseLightning class (#653)
dkorenkevych Aug 3, 2022
4fcfb3e
Add docstrings to DQNTrainer and DQNTrainerBaseLightning classes (#654)
dkorenkevych Aug 3, 2022
de7d782
Sparse DQN Implementation (#663)
Aug 4, 2022
5c51e4c
upgrade pyre version in `fbcode/reagent` - batch 1
Aug 5, 2022
2b3b3af
fix sparse dqn (#665)
czxttkl Aug 10, 2022
8f67377
test TensorDataClass can be moved to cuda properly (#666)
czxttkl Aug 16, 2022
3203ee8
Reagent DeepRepresentLinucb [1/x] (#664)
Aug 16, 2022
834fd19
Reagent DeepRepresentLinucb [2/x] [quick fix nit] (#668)
Aug 17, 2022
939cc07
Update optimizer signatures in reagent to match new differentiable ar…
alexnikulkov Aug 18, 2022
9085aae
Reagent DeepRepresentLinucb [3/x] add params (#669)
Aug 19, 2022
ebbfb25
Add upper limit on grpcio-tools version (#671)
alexnikulkov Aug 19, 2022
5445d09
Fix TestDeepRepresentLinUCB (#672)
alexnikulkov Aug 19, 2022
1fe8ba8
Disjoint LinUCB model
Aug 22, 2022
db04fa0
improve LinUCB on-demand coefficient calculation (#661)
alexnikulkov Aug 23, 2022
80cd260
simplify the arguments of LinearRegressionUCB (#662)
alexnikulkov Aug 23, 2022
ac165af
Enable FX tracing on the dense-only RL model (#674)
Aug 23, 2022
0b55029
Fix disjoint LinUCB unit tests
Aug 23, 2022
2abbf49
Add support for distributed training to LinUCB (#677)
alexnikulkov Aug 24, 2022
a496713
Add dtype conversion to reagent CB batch preprocessor (#675)
alexnikulkov Aug 24, 2022
1e22ee2
Update reagent fully_connected_network.py to comply with fx trace (#678)
Aug 24, 2022
e8fe386
Remove use_interaction_features from LinUCB (#676)
alexnikulkov Aug 24, 2022
bdd4b21
Sync device in shift_kjt_by_one
Aug 25, 2022
86a5a94
Applying discount factor to Disjoint LinUCB
Sep 7, 2022
5f81d6e
Fix LinUCB tests (#681)
alexnikulkov Sep 7, 2022
10dde59
Remove if logic in forward pass to unblock model publish
Sep 21, 2022
4d8a649
Only try to getattr if hasattr (#683)
seemethere Sep 22, 2022
5d90840
change LinearRegressionUCB forward pass logic to make it more tracing…
Sep 24, 2022
014067d
Fix Multi-processor in LinUCB training (#684)
Sep 29, 2022
31b9511
fix recurrent training for LinUCB by splictting A,b between all_data …
Oct 3, 2022
15825f6
add discounting to LinUCB (#685)
Oct 3, 2022
6ee6716
upgrade pyre version in `fbcode/reagent` - batch 1
Oct 13, 2022
f673d41
Clean up (#656)
Oct 19, 2022
30715aa
comments updated for MLPEnsembler optimizer (#687)
Oct 20, 2022
7a79fdc
suppress errors in `reagent`
Oct 28, 2022
64d3353
fix reagent model script errors
emlin Nov 4, 2022
af1cecc
fix FeatureData fx wrap issue
emlin Nov 4, 2022
4ea529e
make epsilon greedy sampler support GPU mode (#690)
Nov 7, 2022
5aa4f57
Delete stub file to enable mypy check (#88701)
yhcharles Nov 9, 2022
ff1ff09
Fix bug in sampler log_prob dim (#692)
Nov 10, 2022
9de19b6
Add support for 3d tensors to batch_quadratic_form() (#693)
Nov 12, 2022
699ac5a
Fix reagent LRScheduler tests (#697)
janeyx99 Nov 15, 2022
25bafe6
Add CB Offline Evaluation to ReAgent (#695)
Dec 10, 2022
7cb5500
Refactor CB trainers in reagent to integrate Offline Eval (#694)
Dec 10, 2022
015785a
Add total weight tracking to LinUCB (#700)
Dec 10, 2022
cf357ac
Add new is_causal flag introduced by nn.Transformer API
Dec 16, 2022
5d95e0d
add support for variable number of arms to FbContBanditBatchPreproces…
Dec 19, 2022
d70e7b8
Mask out non-present arm scores for Offline Eval
Dec 19, 2022
ceac47a
add upper bound on numpy version (#704)
Dec 21, 2022
2e72fdc
Pass SummaryWriter to Offline Eval class to log metrics (#705)
Dec 30, 2022
1c769eb
inverse of matrix in LinUCB
Jan 4, 2023
627a72c
Fix matrix inverse for joint LinUCB
Jan 4, 2023
517a67f
Track average A and b instead of cumulative in LinUCB (#707)
Jan 6, 2023
89519d7
add support for distributed Offline Eval (#708)
Jan 13, 2023
d6102c2
reduce spam messages during reagent import (#709)
Jan 19, 2023
fa3fc6f
Add loggings of num of observations in training
Jan 20, 2023
b1486b7
move LinUCB discounting to a dedicated method (#710)
Jan 31, 2023
20263e1
update lighting version specification in OSS reagent (#711)
Feb 1, 2023
0eb75a3
upgrade pyre version in `fbcode/reagent` - batch 1
Feb 2, 2023
705f96b
upgrade pyre version in `fbcode/reagent` - batch 1
Feb 6, 2023
825bd70
add torchrec metrics to ap_container LinUCB (#713)
Feb 7, 2023
e4658f9
upgrade pyre version in `fbcode/reagent` - batch 1
Feb 17, 2023
27a0d7b
remove duplicated functions
Feb 28, 2023
5d21a68
Back out "remove duplicated functions"
Mar 8, 2023
1e53810
local sparse for feed
xuruiyang Mar 15, 2023
a21bd7a
upgrade pyre version in `fbcode/reagent` - batch 1
Mar 18, 2023
0923753
refactor LinUCBTrainer to remove the wrapper
Mar 22, 2023
7f3f471
Changes to DeepRepresentLinearRegressionUCB and DeepRepresentLinUCBTr…
Mar 22, 2023
911ecd8
Synthetic data/env for contextual bandit [1/X]
Mar 24, 2023
675ef5e
Synthetic data/env for contextual bandit [2/X]
Mar 24, 2023
c183f52
quick fix : rm unused reset method.
Mar 29, 2023
e5194e9
Add CB UCB scorer base class
Mar 30, 2023
1e9bbf3
Add fully-connected NN CB model (no uncertainty, just point estimates)
Mar 30, 2023
f785961
Add supervised CB trainer
Mar 30, 2023
8c9452c
BE: NNUCB[1/2] Docstring / comments
Mar 31, 2023
0b931b0
Adding id_score_list_features config to empty id_score_list_features_raw
Mar 31, 2023
582e003
Adding sparse features into score offline for inference
Apr 3, 2023
bf35935
BE: NNUCB[2/2] README
Apr 3, 2023
53d51a5
BE: Linucb [1/x] README
Apr 3, 2023
9abcd76
Switch the CB model to eval model for Offline Eval and inference wrapper
Apr 4, 2023
04fb976
Add skip connections to reagent FullyConnectedNetwork and CB NN models
Apr 4, 2023
e3b4ec9
Move chosen arm feature extraction, input check and recmetric logging…
Apr 4, 2023
9e1a350
Synthetic data/env for contextual bandit [3/X]
Apr 5, 2023
d94632c
suppress errors in `reagent`
Apr 5, 2023
b9033ca
Output UCB,mean,sigma from all ContextualUCB models
Apr 19, 2023
7dd6ae3
Back out "Output UCB,mean,sigma from all ContextualUCB models"
Apr 20, 2023
38405e0
Hot fix + "Output UCB,mean,sigma from all ContextualUCB models"
Apr 21, 2023
0a59035
Reducing calculation complexity on LinRegressionUCB
Apr 21, 2023
bb2140d
Add test for RecMetric module in CB trainer
Apr 28, 2023
6b39f63
add support for weighted loss to NN and Neural LinUCB
May 2, 2023
68e022d
Add more Offline Eval logging
May 2, 2023
3e80830
Add weight decay regularization to NN CB trainers
May 2, 2023
326f033
Add support for label transformations to Contextual Bandits in FREE
May 2, 2023
39fdace
Append column of ones to MLP output in Neural LinUCB to account for b…
May 4, 2023
5dcaca2
quick fix NN not update
May 19, 2023
7cbb7c9
Suppress type errors on reagent
stroxler May 19, 2023
3851038
Check valid window size + add docs (#717)
lequytra May 23, 2023
1ad5734
upgrade pyre version in `fbcode/reagent` - batch 1
May 24, 2023
ce35a8f
make normalization check robust to small numbers
seanchen1-meta May 30, 2023
6ae6732
Is NNLinUCB using ucb_alpha ?
May 30, 2023
78cf56a
suppress errors in `reagent`
Jun 1, 2023
2003023
replace np.object with object
igorsugak Jun 12, 2023
82db4bb
Update usage.rst
adhiiisetiawan Jun 17, 2023
309 changes: 244 additions & 65 deletions .circleci/config.yml

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions .codecov.yml
@@ -0,0 +1,4 @@
ignore:
# These are more experimental stuffs
- "reagent/ope/**/*"
- "reagent/training/gradient_free/**/*"
61 changes: 50 additions & 11 deletions README.md
@@ -1,38 +1,77 @@
![Banner](logo/reagent_banner.png)
### Applied Reinforcement Learning @ Facebook
[![Support Ukraine](https://img.shields.io/badge/Support-Ukraine-FFD500?style=flat&labelColor=005BBB)](https://opensource.fb.com/support-ukraine)
[![License](https://img.shields.io/badge/license-BSD%203--Clause-brightgreen)](LICENSE)
[![CircleCI](https://circleci.com/gh/facebookresearch/ReAgent/tree/master.svg?style=shield)](https://circleci.com/gh/facebookresearch/ReAgent/tree/master)
[![codecov](https://codecov.io/gh/facebookresearch/ReAgent/branch/master/graph/badge.svg)](https://codecov.io/gh/facebookresearch/ReAgent)
[![CircleCI](https://circleci.com/gh/facebookresearch/ReAgent/tree/main.svg?style=shield)](https://circleci.com/gh/facebookresearch/ReAgent/tree/main)
[![codecov](https://codecov.io/gh/facebookresearch/ReAgent/branch/main/graph/badge.svg)](https://codecov.io/gh/facebookresearch/ReAgent)
---

#### Overview
ReAgent is an open source end-to-end platform for applied reinforcement learning (RL) developed and used at Facebook. ReAgent is built in Python and uses PyTorch for modeling and training and TorchScript for model serving. The platform contains workflows to train popular deep RL algorithms and includes data preprocessing, feature transformation, distributed training, counterfactual policy evaluation, and optimized serving. For more detailed information about ReAgent see the white paper [here](https://research.fb.com/publications/horizon-facebooks-open-source-applied-reinforcement-learning-platform/).
### Overview
ReAgent is an open source end-to-end platform for applied reinforcement learning (RL) developed and used at Facebook. ReAgent is built in Python and uses PyTorch for modeling and training and TorchScript for model serving. The platform contains workflows to train popular deep RL algorithms and includes data preprocessing, feature transformation, distributed training, counterfactual policy evaluation, and optimized serving. For more detailed information about ReAgent see the release post [here](https://research.fb.com/publications/horizon-facebooks-open-source-applied-reinforcement-learning-platform/) and white paper [here](https://arxiv.org/abs/1811.00260).

The platform was once named "Horizon" but we have adopted the name "ReAgent" recently to emphasize its broader scope in decision making and reasoning.

#### Algorithms Supported
### Algorithms Supported

Classic Off-Policy algorithms:
- Discrete-Action [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)
- Parametric-Action DQN
- [Double DQN](https://arxiv.org/abs/1509.06461), [Dueling DQN](https://arxiv.org/abs/1511.06581), [Dueling Double DQN](https://arxiv.org/abs/1710.02298)
- Distributional RL: [C51](https://arxiv.org/abs/1707.06887) and [QR-DQN](https://arxiv.org/abs/1710.10044)
- [Twin Delayed DDPG](https://arxiv.org/abs/1802.09477) (TD3)
- [Soft Actor-Critic](https://arxiv.org/abs/1801.01290) (SAC)
- [Critic Regularized Regression](https://arxiv.org/abs/2006.15134) (CRR)
- [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347) (PPO)

RL for recommender systems:
- [Seq2Slate](https://arxiv.org/abs/1810.02019)
- [SlateQ](https://arxiv.org/abs/1905.12767)

Counterfactual Evaluation:
- [Doubly Robust](https://arxiv.org/abs/1612.01205) (for bandits)
- [Doubly Robust](https://arxiv.org/abs/1511.03722) (for sequential decisions)
- [MAGIC](https://arxiv.org/abs/1604.00923)

Multi-Arm and Contextual Bandits:
- [UCB1](https://www.cs.bham.ac.uk/internal/courses/robotics/lectures/ucb1.pdf)
- [MetricUCB](https://arxiv.org/abs/0809.4882)
- [Thompson Sampling](https://web.stanford.edu/~bvr/pubs/TS_Tutorial.pdf)
- [LinUCB](https://arxiv.org/abs/1003.0146)


Others:
- [Cross-Entropy Method](http://web.mit.edu/6.454/www/www_fall_2003/gew/CEtutorial.pdf)
- [Synthetic Return for Credit Assignment](https://arxiv.org/abs/2102.12425)

#### Installation

### Installation
ReAgent can be installed via Docker or manually. Detailed instructions on how to install ReAgent can be found [here](docs/installation.rst).

#### Usage
Detailed instructions on how to use ReAgent Models can be found [here](docs/usage.rst).
### Tutorial
ReAgent is designed for large-scale, distributed recommendation/optimization tasks where we don’t have access to a simulator.
In this environment, it is typically better to train offline on batches of data, and release new policies slowly over time.
Because the policy updates slowly and in batches, we use off-policy algorithms. To test a new policy without deploying it,
we rely on counterfactual policy evaluation (CPE), a set of techniques for estimating the performance of a policy from data logged under another policy.
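
To make the CPE idea concrete, here is a minimal sketch of inverse propensity scoring (IPS), one of the simplest off-policy estimators. It is an illustration only, not ReAgent's API: the function name `ips_estimate` and its arguments are hypothetical, and the estimators listed above (Doubly Robust, MAGIC) are more elaborate. The sketch assumes each logged sample records the reward, the probability the logging policy assigned to the logged action, and the probability the new policy would assign to that same action.

```python
from typing import Sequence


def ips_estimate(
    rewards: Sequence[float],
    logging_probs: Sequence[float],
    target_probs: Sequence[float],
) -> float:
    """Estimate the average reward of a target policy from logged data.

    Hypothetical helper for illustration; not part of ReAgent.
    Each logged reward is reweighted by target_prob / logging_prob,
    the importance weight of the logged action.
    """
    if not (len(rewards) == len(logging_probs) == len(target_probs)):
        raise ValueError("all inputs must have the same length")
    if len(rewards) == 0:
        raise ValueError("at least one logged sample is required")

    total = 0.0
    for reward, p_log, p_tgt in zip(rewards, logging_probs, target_probs):
        if p_log <= 0.0:
            # The logging policy must have been able to take the logged action.
            raise ValueError("logging probabilities must be positive")
        total += reward * (p_tgt / p_log)
    return total / len(rewards)
```

When the target policy matches the logging policy, every importance weight is 1 and the estimate reduces to the plain average of the logged rewards. Large weights (small logging probabilities) make the estimate noisy, which is one motivation for the doubly robust estimators linked above.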

We also have a set of tools to facilitate applying RL in real-world applications:
- Domain Analysis Tool, which analyzes state/action feature importance and identifies whether the problem is suitable for applying batch RL
- Behavior Cloning, which clones from the logging policy to bootstrap the learning policy safely

Detailed instructions on how to use ReAgent can be found [here](docs/usage.rst).

The ReAgent Serving Platform (RASP) tutorial is available [here](docs/rasp_tutorial.rst).

#### License
### License
ReAgent is released under a BSD 3-Clause license. Find out more about it [here](LICENSE).

#### Citing
[Terms of Use](https://opensource.facebook.com/legal/terms) | [Privacy Policy](https://opensource.facebook.com/legal/privacy) | Copyright © 2022 Meta Platforms, Inc


### Citing
```
@article{gauci2018horizon,
title={Horizon: Facebook's Open Source Applied Reinforcement Learning Platform},
author={Gauci, Jason and Conti, Edoardo and Liang, Yitao and Virochsiri, Kittipat and Chen, Zhengxing and He, Yuchen and Kaden, Zachary and Narayanan, Vivek and Ye, Xiaohui},
journal={arXiv preprint arXiv:1811.00260},
year={2018}
}
```
78 changes: 0 additions & 78 deletions docs/api/ml.rl.evaluation.rst

This file was deleted.

150 changes: 0 additions & 150 deletions docs/api/ml.rl.models.rst

This file was deleted.

30 changes: 0 additions & 30 deletions docs/api/ml.rl.prediction.rst

This file was deleted.
