
Conversation

adhiiisetiawan

Update the documentation in the quick start example to improve clarity. The following changes have been made:

  • Modified the command in the documentation to use the run_test_replay_buffer function instead of run_test in the reagent.gym.tests.test_gym module.
  • Updated the example code snippet to reflect the change:

Before

# set the config
export CONFIG=reagent/gym/tests/configs/cartpole/discrete_dqn_cartpole_online.yaml
# train and evaluate model on gym environment
./reagent/workflow/cli.py run reagent.gym.tests.test_gym.run_test $CONFIG

After

# set the config
export CONFIG=reagent/gym/tests/configs/cartpole/discrete_dqn_cartpole_online.yaml
# train and evaluate model on gym environment
./reagent/workflow/cli.py run reagent.gym.tests.test_gym.run_test_replay_buffer $CONFIG

I also made the same change in the "On-Policy Training" section.

These changes ensure that the example aligns with the current implementation and usage of the reagent.gym.tests.test_gym module.

Please review these modifications and let me know if any further adjustments are needed.

Wei Wen and others added 30 commits September 27, 2021 09:19
Summary:
1. super net sampling (with Reagent APIs)
2. Other utils to support 1
2.1. update `SuperNNConfig` attribute by a path str so that samples from Reagent ng.p.Dict can be easily mapped to masks within `SuperNNConfig`: `replace_named_tuple_by_path`
3. test samples such that counts of masks are close to configured probabilities

Reviewed By: dehuacheng

Differential Revision: D31126805

fbshipit-source-id: 95e48728773c2afd7e6856f8a7a831b00214bbda
Summary: Add a unit test for OneHotActions.

Reviewed By: igfox

Differential Revision: D31248082

fbshipit-source-id: 74d55ab5d3a23c75f5d0020b53616c87023afcf0
Summary: Adds unit test to the test_processing.py for columnvector function from transform.py

Reviewed By: igfox

Differential Revision: D31247953

fbshipit-source-id: 8e6eee0fecf3dfb0bff8fb3d168e15f002c0acf3
Summary: I found some of the documentation confusing, this is an attempt to clarify the functionality of the code.

Reviewed By: czxttkl

Differential Revision: D31071280

fbshipit-source-id: 62e7e299d40e7a431ed29dea0c6582646a855fd9
Summary:
Pull Request resolved: facebookresearch#548

as titled

Reviewed By: gji1

Differential Revision: D31217654

fbshipit-source-id: 514ab8ae7561b8a5a7ff5094642314f83c6b5be1
Summary:
Pull Request resolved: facebookresearch#550

update miniconda and update T101565175

Reviewed By: gji1

Differential Revision: D31290939

fbshipit-source-id: cbecdb63048fb3fb79a7b7eb87406408309026c1
Summary:
Pull Request resolved: facebookresearch#549

Tests for replay buffer's behavior

Reviewed By: alexnikulkov

Differential Revision: D30978005

fbshipit-source-id: aa034db5699071654d607fe7795bc8be232157c2
Summary:
### New commit log messages
  3aba9d16a Remove `ABC` from `LightningModule` (#9517)

Reviewed By: ananthsub

Differential Revision: D31296721

fbshipit-source-id: a9992486c61a6f86fb251f2733bbc9311d93f293
Summary:
Pull Request resolved: facebookresearch#551

as titled

Reviewed By: igfox

Differential Revision: D31296738

fbshipit-source-id: 3672485ccd230f9b1a029f90759bdf598f5990e4
…tly into `trainer.py` (#9495)

Summary:
### New commit log messages
  290398f81 Deprecate TrainerProperties Mixin and move property definitions directly into `trainer.py` (#9495)

Reviewed By: ananthsub

Differential Revision: D31317981

fbshipit-source-id: 9a6270f326cebb59ef5fb53b8db9d0797f62be77
Summary:
Pull Request resolved: facebookresearch#552

By relaxing the threshold...

Also set seeds

Reviewed By: bankawas

Differential Revision: D31334025

fbshipit-source-id: d5d666b2b5f5e5e4f06dea2a1353e85456f39a60
…rch#553)

Summary:
Pull Request resolved: facebookresearch#553

Using [0.01, 0.99] may cause some performance loss when boosting with entropy metrics.

Reviewed By: czxttkl

Differential Revision: D31346456

fbshipit-source-id: dae1ef0f6e36e67a182ced5793555e0d78dbf51e
Summary:
Pull Request resolved: facebookresearch#554

as titled.
This is one step towards a config/script-based rl orchestrator which can start necessary workflows automatically.

Reviewed By: j-jiafei

Differential Revision: D31334081

fbshipit-source-id: 0355b46396d922cf82f041734ffb8d20ceeab8e5
Summary:
Adding basic UCB MAB classes to ReAgent.
3 variants of UCB are added (including the one currently used for Ads Creative Exploration - MetricUCB)
Supported functionality:
1. Batch training (feed in counts of samples and total reward from each arm). We'll use this mode for Ads Creative Exploration.
2. Online training (query the bandit for next action one step at a time).
3. Dumping the state of the bandit and loading it from a JSON string

Reviewed By: czxttkl

Differential Revision: D31355506

fbshipit-source-id: 978ec16cba289dc08af599a2c05bb49fcae2843a
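The batch-vs-online distinction described above can be sketched with a minimal UCB1 bandit. This is plain Python for illustration only, not the ReAgent API; class and method names are made up:

```python
import math

class UCB1:
    """Minimal UCB1 bandit: score = empirical mean + sqrt(2*ln(t) / n)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms      # number of pulls per arm
        self.rewards = [0.0] * n_arms   # total reward per arm

    def add_batch(self, arm, n_obs, total_reward):
        # Batch training: feed in counts of samples and total reward per arm.
        self.counts[arm] += n_obs
        self.rewards[arm] += total_reward

    def select_arm(self):
        # Online training: query the bandit for the next action.
        t = sum(self.counts)
        scores = []
        for n, r in zip(self.counts, self.rewards):
            if n == 0:
                return self.counts.index(0)  # pull each arm once first
            scores.append(r / n + math.sqrt(2.0 * math.log(t) / n))
        return scores.index(max(scores))

bandit = UCB1(n_arms=2)
bandit.add_batch(arm=0, n_obs=10, total_reward=6.0)
bandit.add_batch(arm=1, n_obs=10, total_reward=4.0)
print(bandit.select_arm())  # arm 0 wins: higher mean, same exploration bonus
```

With equal pull counts the exploration bonuses cancel, so the arm with the higher empirical mean is chosen.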
Summary: Replace numpy with PyTorch. This is a step towards using the standard ReAgent interface for MABs

Reviewed By: czxttkl

Differential Revision: D31423841

fbshipit-source-id: 04ccf92fba7b0f44ab6c19bdef3d098bf62394cf
Differential Revision: D31496257

fbshipit-source-id: 0f6b56075e4d24bdfd9d54bcecee90c5d86efbaf
…ng the same variable (facebookresearch#555)

Summary:
Pull Request resolved: facebookresearch#555

The current implementation was buggy if the env reused the same variable for possible_actions_mask and modified it in place. I fix the bug by copying the possible_actions_mask values instead of assigning the variable directly.

Reviewed By: czxttkl

Differential Revision: D31487641

fbshipit-source-id: ebc70164e42dc097291a7aeecba60d2ef30117b3
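The in-place aliasing bug described above can be reproduced in miniature. A plain Python list stands in for the env's mask array here; the `Env` class is illustrative, not ReAgent code:

```python
import copy

class Env:
    def __init__(self):
        self.mask = [1, 1, 1]

    def step(self):
        self.mask[0] = 0  # env mutates the same mask object in place

env = Env()

# Buggy: storing a reference means a later in-place mutation
# silently changes the stored value too.
stored_ref = env.mask
# Fixed: copy the values instead of assigning the variable directly.
stored_copy = copy.copy(env.mask)

env.step()
print(stored_ref)   # [0, 1, 1] -- corrupted by the env's mutation
print(stored_copy)  # [1, 1, 1] -- preserved
```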
Summary:
Pull Request resolved: facebookresearch#558

add some input check and simplify code

Reviewed By: gji1

Differential Revision: D31529090

fbshipit-source-id: 0c38d9b927d0149256fa78d373687bc9048a0c85
Summary:
Pull Request resolved: facebookresearch#556

Convert possible_actions_mask to a Tensor

Reviewed By: czxttkl

Differential Revision: D31497491

fbshipit-source-id: c0b8eb479b6be517a9c74c1d61ad68e4120d388a
Summary:
Pull Request resolved: facebookresearch#559

cleanly_stop is a manually set variable which needs to be placed on the correct device. Otherwise we will see errors like in f301990179.

Also, ddp is not needed in single cpu/gpu training.

Reviewed By: alexnikulkov

Differential Revision: D31530342

fbshipit-source-id: 98879fc130616aaccc454f939cd7cf2a704eb0eb
Differential Revision: D31605682

fbshipit-source-id: 6c2d89926ecab45cdbbcdd48058ef3697f94f92b
Summary:
Pull Request resolved: facebookresearch#560

Bayesian Optimization Optimizer mutation-based optimization and acquisition function.

Reviewed By: czxttkl

Differential Revision: D31424105

fbshipit-source-id: 97872516e1c633071f983ebe6b254cbabee7b037
…etworks, independent Thompson sampling, and mutation. (facebookresearch#561)

Summary:
Pull Request resolved: facebookresearch#561

Bayesian Optimization Optimizer with ensemble of feedforward networks, ITS, and mutation based optimization.

Reviewed By: czxttkl

Differential Revision: D31424065

fbshipit-source-id: 8ffc1e7fd5de303cd572ea5bcd880429af67d173
Summary:
Pull Request resolved: facebookresearch#557

See title

Reviewed By: czxttkl

Differential Revision: D31524614

fbshipit-source-id: e7aa7996de570f4ff990b402fbd23688a4ed12f4
Differential Revision: D31739112

fbshipit-source-id: d7ab577f32eadf56fa8ad1846a0e916ab9fcb778
… methods to unify (facebookresearch#565)

Summary:
Pull Request resolved: facebookresearch#565

1. Add 2 Thompson sampling MAB algorithms: 1 for Bernoulli rewards, 1 for Normal rewards
2. Refactor UCB code so that Thompson sampling could reuse as much as possible

Reviewed By: czxttkl

Differential Revision: D31642370

fbshipit-source-id: c4447a22ad11e1bb9696cf269ea9f45523d22f28
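The Bernoulli variant above can be sketched with Beta posteriors. This is a self-contained illustration of the algorithm, not the ReAgent class; names are made up:

```python
import random

class BernoulliTS:
    """Thompson sampling for Bernoulli rewards with Beta(1, 1) priors."""

    def __init__(self, n_arms):
        self.successes = [1] * n_arms  # Beta alpha parameters
        self.failures = [1] * n_arms   # Beta beta parameters

    def select_arm(self):
        # Draw one sample from each arm's posterior; play the argmax.
        draws = [random.betavariate(s, f)
                 for s, f in zip(self.successes, self.failures)]
        return draws.index(max(draws))

    def update(self, arm, reward):
        # A Bernoulli reward in {0, 1} updates the Beta posterior counts.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

random.seed(0)
ts = BernoulliTS(2)
for _ in range(500):
    arm = ts.select_arm()
    reward = 1 if random.random() < (0.9 if arm == 0 else 0.1) else 0
    ts.update(arm, reward)
# Arm 0 (true rate 0.9) accumulates most of the pulls.
```

The Normal-reward variant follows the same select/update shape with a Normal posterior instead of a Beta, which is what the refactor lets the two classes share.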
Summary:
Pull Request resolved: facebookresearch#566

Adding some tools to evaluate the performance of MAB algorithms in a simple simulated environment
Notebook shows how to use this: https://fburl.com/anp/f7y0gzl8

Reviewed By: czxttkl

Differential Revision: D31672454

fbshipit-source-id: 32e3d4a8daa8f15a4c777c37f70c7962f949c299
Summary:
1. Add option to estimate reward variance and scale the confidence interval width by SQRT(VAR).
2. Add an option to multiply confidence interval width by a constant scalar to make exploration more/less aggressive
3. Remove UCBTuned algorithm because it is essentially UCB1 + variance estimation

Reviewed By: czxttkl

Differential Revision: D31741828

fbshipit-source-id: 684788746e2e626228cb522c49b2bafa9179d6fe
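Options 1 and 2 above compose multiplicatively into the confidence-interval half-width. The exact radius formula below is illustrative (standard UCB1-style bonus), not the precise expression in the diff:

```python
import math

def ucb_radius(t, n, var=1.0, alpha=1.0):
    """Confidence-interval half-width for an arm pulled n times out of t total.

    var:   estimated reward variance for this arm (option 1: scale by sqrt(var))
    alpha: scalar multiplier making exploration more/less aggressive (option 2)
    """
    return alpha * math.sqrt(var) * math.sqrt(2.0 * math.log(t) / n)

# Quadrupling the variance doubles the interval width.
print(ucb_radius(t=100, n=10, var=4.0), ucb_radius(t=100, n=10, var=1.0))
```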
Summary: Pull Request resolved: facebookresearch#567

Reviewed By: czxttkl

Differential Revision: D31743265

fbshipit-source-id: 3508027a8ab23c8569d4cf416560f1b9a6891752
Summary:
### New commit log messages
  6429de894 Add support for `len(datamodule)` (#9895)

Removed the following internal patch which may be conflicting with this change:
```
 --- a/fbcode/github/third-party/PyTorchLightning/pytorch-lightning/pytorch_lightning/trainer/connectors/data_connector.py
+++ b/fbcode/github/third-party/PyTorchLightning/pytorch-lightning/pytorch_lightning/trainer/connectors/data_connector.py
@@ -215,6 +215,7 @@
     def attach_datamodule(
         self, model: "pl.LightningModule", datamodule: Optional["pl.LightningDataModule"] = None
     ) -> None:
+        datamodule = datamodule or getattr(model, 'datamodule', None)
         # If we have a datamodule, attach necessary hooks + dataloaders
         if datamodule is None:
             return
```

Reviewed By: yifuwang

Differential Revision: D31693305

fbshipit-source-id: 48e58aa6a6f9cdf7029b93663004f9243de5d3d8
Hongbo Guo and others added 26 commits April 3, 2023 12:13
Summary: documentation README for Deep Learning based LinUCB model

Reviewed By: rodrigodesalvobraz

Differential Revision: D44508375

fbshipit-source-id: 4408b2ea85b1bea728815af20a526c674f0a062b
Summary:
[need below pictures in Summary so as to use their links in README.md file]
{F927315010}

{F927382569}

{F927429214}

Reviewed By: rodrigodesalvobraz

Differential Revision: D44561198

fbshipit-source-id: 77e74778b5b725297039cd4f1396fe610257efd0
Summary:
Our NN models have batch norm and dropout, which behave differently during training and eval.
In this diff I:
1. Switch the evaluated model to eval mode when it's attached to the offline evaluator
2. Switch the model to eval mode when an inference wrapper is created. This might not be 100% necessary, but I'm not sufficiently familiar with the publishing process to be 100% sure it's been switched to eval mode before a wrapper is used.

Reviewed By: BerenLuthien

Differential Revision: D44552990

fbshipit-source-id: c8b4d9690d959da7187418c3c81096256e22f101
Summary:
1. Add a `ResidualWrapper` module, which can be used to turn a regular layer into a residual/skip layer
2. Modify `FullyConnectedNetwork` `__init__` method to support the residual wrapper
3. Add support for new `use_skip_connections` argument to all CB NN models
4. Add unit tests for `ResidualWrapper`

Reviewed By: BerenLuthien

Differential Revision: D44552989

fbshipit-source-id: d092f943737d8ead4bd484da455dfea5023c4e7c
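The idea in item 1 is that the wrapper turns a layer `f` into `x + f(x)`. The sketch below shows just that transformation in plain Python; the actual ReAgent `ResidualWrapper` is a `torch.nn.Module` and requires the layer's input and output dimensions to match:

```python
class ResidualWrapper:
    """Wraps a layer f so the output is x + f(x) (a skip connection)."""

    def __init__(self, layer):
        self.layer = layer

    def __call__(self, x):
        # Residual/skip connection: add the input to the layer's output.
        return x + self.layer(x)

def double(x):
    return 2 * x

skip = ResidualWrapper(double)
print(skip(3))  # 3 + 2*3 = 9
```

In the real module, `use_skip_connections` would wrap each hidden layer of `FullyConnectedNetwork` this way.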
… to the base CB trainer class

Summary:
Requiring each CB trainer class to do input check, chosen arm feature extraction and recmetric logging is cumbersome and error-prone (there was already a bug due to SupervisedTrainer not logging recmetrics - https://fb.workplace.com/groups/4402889573094258/posts/5999347940115072/?comment_id=6004958952887304).
Instead, these 3 actions will be performed centrally in the base class and each trainer can focus on just computing the loss or executing the parameter update.

Reviewed By: BerenLuthien

Differential Revision: D44593201

fbshipit-source-id: 349f6702c9c756d80d23bd9ede6c4fb2e5940e94
Summary:
Outputting the accumulated rewards and regrets.  This helps evaluation.

Demo
N3067100

Reviewed By: alexnikulkov

Differential Revision: D44352607

fbshipit-source-id: 06e6f756a35229f294a98650c2d67d3a78e3c513
Differential Revision: D44719105

fbshipit-source-id: 9e73e110d4c3ed858ac9e8944c73404bb6fa6122
Summary:
T148338245 (the next Diff will address T148655761).

Output `{"pred_reward": pred_reward, "pred_sigma": pred_sigma, "ucb": ucb}` from Contextual Bandit models

Reviewed By: alexnikulkov

Differential Revision: D44775830

fbshipit-source-id: 2ed22bda5d8ae0491602ee8ffe4ac126f7f4774c
Summary:
Original commit changeset: 2ed22bda5d8a

Original Phabricator Diff: D44775830

Reverting to fix broken release tests. Example: https://www.internalfb.com/mast/job/aienv-20be949aec-f429693811

Reviewed By: zxpmirror1994

Differential Revision: D45132749

fbshipit-source-id: 35bb3496ac720d2569040fc020bcba3fb71af8cd
Summary:
redo D44775830 ,  plus D45133345

(previous D44775830  passed all Unit Tests but failed on starlight run. D45133345 should fix it)

Reviewed By: alexnikulkov

Differential Revision: D45159072

fbshipit-source-id: f621f71672a4fc64ec457deb4f73a4a1e4897a45
Summary:
 ---
# What
This Diff switches between `inv` and `pinv` to save calculation cost on the matrix inverse.

------

### Details :
- Matrix inverse cost in NNLinUCB
  - We need a matrix inverse operation in both LinUCB and NNLinUCB.
  - In NNLinUCB, we have to compute this inverse on every back-propagation step, so it is costly. Saving calculation on the inverse operation matters more for NNLinUCB than for LinUCB.

  - hermitian=True reduces the complexity of `pinv`
   https://pytorch.org/docs/stable/generated/torch.linalg.pinv.html
"If hermitian= True, A is assumed to be Hermitian if complex or symmetric if real, but this is not checked internally. Instead, just the lower triangular part of the matrix is used in the computations."
   - Our matrix is Hermitian.

- `inv` is more efficient than `pinv` but may be numerically unstable
   - `torch.linalg.inv` is computationally more efficient than `torch.linalg.pinv`, but `pinv` is more stable.
  - If the matrix is rank deficient or its condition number is too large, its inverse is unstable to compute.
  - The regularization diagonal (identity) matrix makes the matrix `A` always full rank. However, we also need to consider how large each arm's contribution is compared to the identity.
  - If some arm has a lot of historical data (equivalently, its feature values are huge), the corresponding eigenvalue of `A` may be so large that the condition number of `A` is very bad. That makes `inv` unstable.

- Given the above observations/analysis, we adopt: try `inv`, fall back to `pinv` on exception.

**NOTE** : let D45159072 land first, then update this Diff accordingly before landing it.

Reviewed By: alexnikulkov

Differential Revision: D44564771

fbshipit-source-id: 5d2c120d0c5faef6390af9c96cdb7453f22c3524
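The "try `inv`, exception `pinv`" policy can be sketched as below. The diff uses `torch.linalg`; NumPy is used here only to keep the sketch self-contained, and both libraries expose the same `hermitian` flag on `pinv`:

```python
import numpy as np

def robust_inverse(a):
    """Try the cheap exact inverse; fall back to the stable pseudo-inverse."""
    try:
        return np.linalg.inv(a)
    except np.linalg.LinAlgError:
        # Rank-deficient or badly conditioned: pinv is slower but stable.
        # hermitian=True is safe here because A is symmetric by construction.
        return np.linalg.pinv(a, hermitian=True)

well_conditioned = 3.0 * np.eye(2)
singular = np.ones((2, 2))  # rank 1, so inv raises LinAlgError

print(robust_inverse(well_conditioned))  # exact inverse: 1/3 on the diagonal
print(robust_inverse(singular))          # pseudo-inverse via the fallback
```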
Summary: Add a test for recmetric logging in CB trainer (test for LinUCB only, but should work the same for other trainers)

Reviewed By: BerenLuthien

Differential Revision: D45404107

fbshipit-source-id: cb58809ffbe3948518bdc0ecf25348257f6e704d
Summary: Adding support for weighted supervised losses. This is especially important for Offline Eval because it uses weights to filter training data

Reviewed By: BerenLuthien

Differential Revision: D44597203

fbshipit-source-id: 8af32344123cd6b68e3df085952e35d1244fea8b
Summary:
Log a few more metrics to improve Offline Eval understanding:
1. Average accepted/rejected rewards
2. Fraction of accepted to rejected rewards
3. Average reward across all (accepted + rejected) data
4. Average slate sizes of accepted/rejected observations

Reviewed By: BerenLuthien

Differential Revision: D45300609

fbshipit-source-id: d6e776d1d05e789942272fef42993471719e19be
Summary: See title

Reviewed By: BerenLuthien

Differential Revision: D45323720

fbshipit-source-id: 0e01614cdf91c2360e8b2aa63c8352d3d418d279
Summary:
Adding a separate concept of `label`, which is used as the prediction target for model training. In the basic case, `label` is equal to `reward` and if `CBInput` is created without specifying `label`, we automatically set `label=reward` in `__post_init__`. But we can also define `label` differently, e.g. as a transformation of `reward`, to give the model a more stable learning target.
I have observed improvements in performance from using `log` or `sqrt` transforms in AP Container Selection.
The `reward` field is now used only for Offline Evaluation, while `label` is used for model training and supervised learning accuracy metrics.
In a FREE workflow the transform is specified in `config.features.label_transform`, which can be one of `["identity", "log", "sqrt"]`

Reviewed By: BerenLuthien, PoojaAg18

Differential Revision: D45300610

fbshipit-source-id: 7005b10e652549948e9104c9d90ef76475276f67
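The `label` derivation above amounts to a small transform table keyed by the config value. The dispatch below is illustrative; the actual config key handling, and whether "log" means `log` or `log1p`, are assumptions (a `log1p`-style transform is used here so a zero reward stays finite):

```python
import math

# Illustrative mapping of the three supported transform names.
LABEL_TRANSFORMS = {
    "identity": lambda r: r,
    "log": lambda r: math.log(1.0 + r),  # assumed log1p-style transform
    "sqrt": lambda r: math.sqrt(r),
}

def make_label(reward, transform="identity"):
    """Derive the training label from the reward, as in __post_init__."""
    return LABEL_TRANSFORMS[transform](reward)

print(make_label(4.0, "sqrt"))  # 2.0
```

The `reward` value itself stays untouched for Offline Evaluation; only the `label` the model trains on is transformed.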
…ias/intercept

Summary: Currently we pass the output of MLP directly to the input of LinUCB. This is equivalent to using a linear layer without a bias term. This diff appends a column of ones as an extra feature to the output of MLP in order to allow LinUCB to have a bias/intercept.

Reviewed By: PoojaAg18

Differential Revision: D45539068

fbshipit-source-id: 48504c276590c04e274b380f89f806b2307809cf
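The ones-column trick above is the standard way to give a linear model an intercept without adding a separate bias parameter. A minimal NumPy sketch (function name is made up):

```python
import numpy as np

def append_ones_column(x):
    """Append a constant-1 feature so the downstream linear model
    (here, LinUCB on top of the MLP output) learns a bias/intercept."""
    ones = np.ones((x.shape[0], 1))
    return np.hstack([x, ones])

mlp_out = np.array([[0.5, -1.0],
                    [2.0, 0.3]])
print(append_ones_column(mlp_out))  # each row gains a trailing 1.0
```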
Summary: allow gradients to backpropagate through the NN

Reviewed By: alexnikulkov

Differential Revision: D45997912

fbshipit-source-id: 68ac01811f3b0eb9610ee5bf8652ae5163316a27
Summary:
Pyre upgrade and continuous jobs never get black
formatting anymore, and I don't have time to fix
it this week.

It's quicker to create diffs locally than to
commandeer and fix the bot-generated diffs.

```
LOCAL_CONFIG=reagent
IDENTIFIER="$(sed 's/\//-/g' <<< $LOCAL_CONFIG)"

ERRORS_FILE=/tmp/$IDENTIFIER
PYRE_UPGRADE=~/fbsource/"$(buck2 build //tools/pyre/facebook/tools:upgrade --show-output | awk '{ print $2}')"

# get errors
pyre -l "$LOCAL_CONFIG" --output json check > $ERRORS_FILE

# fix errors
cat $ERRORS_FILE | $PYRE_UPGRADE fixme-single $LOCAL_CONFIG --lint --error-source stdin --no-commit
```

Reviewed By: grievejia

Differential Revision: D46025666

fbshipit-source-id: e1d3c8a7dca99707b48a5c93e7e30a3b7dfc89eb
Summary:
X-link: meta-pytorch/torchrec#1171

Pull Request resolved: facebookresearch#717

ATT. If window_size is smaller than the overall/global batch size, window metrics will be NaN since we'll pop the entire batch out of the window state buffer.

Reviewed By: joshuadeng

Differential Revision: D45590488

fbshipit-source-id: 6d84e24cf1c77760e3ff2ef8fb9a86b5ab775f68
Differential Revision: D46117822

fbshipit-source-id: 811c2fc92ca39622f74d05ea1298863328ac6eda
Summary:
Addresses this error:
https://pxl.cl/2JNBR

This shouldn't happen because the if statement above checks the range. Perhaps the feature range is smaller than the boxcox resolution, so the if statement now also checks that the range is larger than the boxcox resolution.

Differential Revision: D46269758

fbshipit-source-id: 2e6272a7da6e63b5c93cd8aeab52a8fb2e8166b2
Summary: Allow inference to get specific `ucb_alpha`

Reviewed By: alexnikulkov

Differential Revision: D46284800

fbshipit-source-id: 00daa34e0679e2d8c6d268b6efd2dd349d047c8a
Differential Revision: D46355537

fbshipit-source-id: 696677a13e131de466df1e919b0b976a988eb67e
Summary:
numpy 1.20.0 removed `np.object`. It was an alias for the builtin `object`, so it's safe to replace it directly with `object`.

Change is generated mechanically using the following oneliner:
```
fbgr -sl 'np\.object\b' | xargs perl -pi -e 's,\bnp\.object\b,object,g'
```

Differential Revision: D46585978

fbshipit-source-id: 21f2a5f0d1379ebd3fc5f89c9362699cbce0ef50
@facebook-github-bot

Hi @adhiiisetiawan!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
