Update Documentation for Quick Start Example: usage.rst #718
Summary:
1. Super net sampling (with ReAgent APIs)
2. Other utils to support 1
   2.1. Update a `SuperNNConfig` attribute by a path str so that samples from the Reagent `ng.p.Dict` can be easily mapped to masks within `SuperNNConfig`: `replace_named_tuple_by_path`
3. Test samples such that counts of masks are close to configured probabilities

Reviewed By: dehuacheng Differential Revision: D31126805 fbshipit-source-id: 95e48728773c2afd7e6856f8a7a831b00214bbda
Summary: Add a unit test for OneHotActions. Reviewed By: igfox Differential Revision: D31248082 fbshipit-source-id: 74d55ab5d3a23c75f5d0020b53616c87023afcf0
Summary: Adds unit test to the test_processing.py for columnvector function from transform.py Reviewed By: igfox Differential Revision: D31247953 fbshipit-source-id: 8e6eee0fecf3dfb0bff8fb3d168e15f002c0acf3
Summary: I found some of the documentation confusing, this is an attempt to clarify the functionality of the code. Reviewed By: czxttkl Differential Revision: D31071280 fbshipit-source-id: 62e7e299d40e7a431ed29dea0c6582646a855fd9
Summary: Pull Request resolved: facebookresearch#548 as titled Reviewed By: gji1 Differential Revision: D31217654 fbshipit-source-id: 514ab8ae7561b8a5a7ff5094642314f83c6b5be1
Summary: Pull Request resolved: facebookresearch#550 update miniconda and update T101565175 Reviewed By: gji1 Differential Revision: D31290939 fbshipit-source-id: cbecdb63048fb3fb79a7b7eb87406408309026c1
Summary: Pull Request resolved: facebookresearch#549 Tests for replay buffer's behavior Reviewed By: alexnikulkov Differential Revision: D30978005 fbshipit-source-id: aa034db5699071654d607fe7795bc8be232157c2
Summary: ### New commit log messages 3aba9d16a Remove `ABC` from `LightningModule` (#9517) Reviewed By: ananthsub Differential Revision: D31296721 fbshipit-source-id: a9992486c61a6f86fb251f2733bbc9311d93f293
Summary: Pull Request resolved: facebookresearch#551 as titled Reviewed By: igfox Differential Revision: D31296738 fbshipit-source-id: 3672485ccd230f9b1a029f90759bdf598f5990e4
…tly into `trainer.py` (#9495) Summary: ### New commit log messages 290398f81 Deprecate TrainerProperties Mixin and move property definitions directly into `trainer.py` (#9495) Reviewed By: ananthsub Differential Revision: D31317981 fbshipit-source-id: 9a6270f326cebb59ef5fb53b8db9d0797f62be77
Summary: Pull Request resolved: facebookresearch#552 By relaxing the threshold... Also set seeds Reviewed By: bankawas Differential Revision: D31334025 fbshipit-source-id: d5d666b2b5f5e5e4f06dea2a1353e85456f39a60
…rch#553) Summary: Pull Request resolved: facebookresearch#553 Use [0.01, 0.99] may cause some performance loss in boosting with entropy metrics. Reviewed By: czxttkl Differential Revision: D31346456 fbshipit-source-id: dae1ef0f6e36e67a182ced5793555e0d78dbf51e
Summary: Pull Request resolved: facebookresearch#554 as titled. This is one step towards a config/script-based rl orchestrator which can start necessary workflows automatically. Reviewed By: j-jiafei Differential Revision: D31334081 fbshipit-source-id: 0355b46396d922cf82f041734ffb8d20ceeab8e5
Summary: Adding basic UCB MAB classes to ReAgent. 3 variants of UCB are added (including the one currently used for Ads Creative Exploration - MetricUCB) Supported functionality: 1. Batch training (feed in counts of samples and total reward from each arm). We'll use this mode for Ads Creative Exploration. 2. Online training (query the bandit for next action one step at a time). 3. Dumping the state of the bandit and loading it from a JSON string Reviewed By: czxttkl Differential Revision: D31355506 fbshipit-source-id: 978ec16cba289dc08af599a2c05bb49fcae2843a
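The batch and online training modes described above can be illustrated with a minimal UCB1 bandit. This is a hedged sketch with hypothetical names, not ReAgent's actual `MetricUCB` API: `update` accepts a count so it covers both the batch mode (feed in counts and total reward per arm) and the online mode (one step at a time).

```python
import math

class UCB1:
    """Minimal UCB1 sketch (hypothetical class, not ReAgent's implementation)."""

    def __init__(self, n_arms: int):
        self.counts = [0] * n_arms    # number of pulls per arm
        self.sums = [0.0] * n_arms    # total reward per arm
        self.total = 0                # total pulls across all arms

    def update(self, arm: int, reward: float, n: int = 1) -> None:
        # n=1 is the online mode; n>1 feeds a batch of pulls for one arm.
        self.counts[arm] += n
        self.sums[arm] += reward
        self.total += n

    def select(self) -> int:
        # Play every arm once first, then maximize mean + exploration bonus.
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm
        scores = [
            self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(self.total) / self.counts[a])
            for a in range(len(self.counts))
        ]
        return max(range(len(scores)), key=scores.__getitem__)
```

Dumping/loading state as JSON, as the diff describes, would amount to serializing `counts`, `sums`, and `total`.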
Summary: Replace numpy with PyTorch. This is a step towards using the standard ReAgent interface for MABs Reviewed By: czxttkl Differential Revision: D31423841 fbshipit-source-id: 04ccf92fba7b0f44ab6c19bdef3d098bf62394cf
Differential Revision: D31496257 fbshipit-source-id: 0f6b56075e4d24bdfd9d54bcecee90c5d86efbaf
…ng the same variable (facebookresearch#555) Summary: Pull Request resolved: facebookresearch#555 The current implementation was buggy if the env was reusing the same variable for possible_actions_mask and modifying it in place. I fix the bug by copying the possible_action_mask values instead of assigning the variable directly. Reviewed By: czxttkl Differential Revision: D31487641 fbshipit-source-id: ebc70164e42dc097291a7aeecba60d2ef30117b3
Summary: Pull Request resolved: facebookresearch#558 add some input check and simplify code Reviewed By: gji1 Differential Revision: D31529090 fbshipit-source-id: 0c38d9b927d0149256fa78d373687bc9048a0c85
Summary: Pull Request resolved: facebookresearch#556 Convert possible_actions_mask to a Tensor Reviewed By: czxttkl Differential Revision: D31497491 fbshipit-source-id: c0b8eb479b6be517a9c74c1d61ad68e4120d388a
Summary: Pull Request resolved: facebookresearch#559 cleanly_stop is a manually set variable which needs to be placed on the correct device. Otherwise we will see errors like in f301990179. Also, ddp is not needed in single cpu/gpu training. Reviewed By: alexnikulkov Differential Revision: D31530342 fbshipit-source-id: 98879fc130616aaccc454f939cd7cf2a704eb0eb
Differential Revision: D31605682 fbshipit-source-id: 6c2d89926ecab45cdbbcdd48058ef3697f94f92b
Summary: Pull Request resolved: facebookresearch#560 Bayesian Optimization Optimizer mutation-based optimization and acquisition function. Reviewed By: czxttkl Differential Revision: D31424105 fbshipit-source-id: 97872516e1c633071f983ebe6b254cbabee7b037
…etworks, independent Thompson sampling, and mutation. (facebookresearch#561) Summary: Pull Request resolved: facebookresearch#561 Bayesian Optimization Optimizer with ensemble of feedforward networks, ITS, and mutation based optimization. Reviewed By: czxttkl Differential Revision: D31424065 fbshipit-source-id: 8ffc1e7fd5de303cd572ea5bcd880429af67d173
Summary: Pull Request resolved: facebookresearch#557 See title Reviewed By: czxttkl Differential Revision: D31524614 fbshipit-source-id: e7aa7996de570f4ff990b402fbd23688a4ed12f4
Differential Revision: D31739112 fbshipit-source-id: d7ab577f32eadf56fa8ad1846a0e916ab9fcb778
… methods to unify (facebookresearch#565) Summary: Pull Request resolved: facebookresearch#565 1. Add 2 Thompson sampling MAB algorithms: 1 for Bernoulli rewards, 1 for Normal rewards 2. Refactor UCB code so that Thompson sampling could reuse as much as possible Reviewed By: czxttkl Differential Revision: D31642370 fbshipit-source-id: c4447a22ad11e1bb9696cf269ea9f45523d22f28
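The Bernoulli-reward variant mentioned above can be sketched as Beta-Bernoulli Thompson sampling; the class below is a hypothetical stand-in for ReAgent's MAB classes, using only the standard library.

```python
import random

class BernoulliTS:
    """Beta-Bernoulli Thompson sampling sketch (hypothetical, not ReAgent's class)."""

    def __init__(self, n_arms: int):
        # Beta(1, 1) prior on each arm's success probability.
        self.alpha = [1.0] * n_arms  # 1 + successes
        self.beta = [1.0] * n_arms   # 1 + failures

    def update(self, arm: int, reward: int) -> None:
        # reward is 0 or 1 for Bernoulli rewards.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward

    def select(self) -> int:
        # Sample one draw from each arm's posterior and play the argmax.
        samples = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)
```

The Normal-reward variant would differ only in the posterior family sampled in `select`, which is presumably what the shared refactored UCB code base makes easy.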
Summary: Pull Request resolved: facebookresearch#566 Adding some tools to evaluate the performance of MAB algorithms in a simple simulated environment Notebook shows how to use this: https://fburl.com/anp/f7y0gzl8 Reviewed By: czxttkl Differential Revision: D31672454 fbshipit-source-id: 32e3d4a8daa8f15a4c777c37f70c7962f949c299
Summary: 1. Add option to estimate reward variance and scale the confidence interval width by SQRT(VAR). 2. Add an option to multiply confidence interval width by a constant scalar to make exploration more/less aggressive 3. Remove UCBTuned algorithm because it is essentially UCB1 + variance estimation Reviewed By: czxttkl Differential Revision: D31741828 fbshipit-source-id: 684788746e2e626228cb522c49b2bafa9179d6fe
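Options 1 and 2 above combine into a score of the following shape; the exact formula is an assumption, shown only to make the variance scaling and the width multiplier concrete.

```python
import math

def ucb_score(mean: float, var: float, count: int, total: int,
              ci_scale: float = 1.0) -> float:
    """Hypothetical UCB score: confidence width scaled by sqrt(var) and by a
    tunable constant ci_scale (larger = more aggressive exploration)."""
    bonus = ci_scale * math.sqrt(var * math.log(total) / count)
    return mean + bonus
```

With variance estimation folded in like this, a separate UCB-Tuned algorithm is indeed redundant, matching point 3.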
Summary: Pull Request resolved: facebookresearch#567 Reviewed By: czxttkl Differential Revision: D31743265 fbshipit-source-id: 3508027a8ab23c8569d4cf416560f1b9a6891752
Summary: ### New commit log messages 6429de894 Add support for `len(datamodule)` (#9895) Removed the following internal patch which may be conflicting with this change: ``` --- a/fbcode/github/third-party/PyTorchLightning/pytorch-lightning/pytorch_lightning/trainer/connectors/data_connector.py +++ b/fbcode/github/third-party/PyTorchLightning/pytorch-lightning/pytorch_lightning/trainer/connectors/data_connector.py @@ -215,6 +215,7 @@ def attach_datamodule( self, model: "pl.LightningModule", datamodule: Optional["pl.LightningDataModule"] = None ) -> None: + datamodule = datamodule or getattr(model, 'datamodule', None) # If we have a datamodule, attach necessary hooks + dataloaders if datamodule is None: return ``` Reviewed By: yifuwang Differential Revision: D31693305 fbshipit-source-id: 48e58aa6a6f9cdf7029b93663004f9243de5d3d8
Summary: documentation README for Deep Learning based LinUCB model Reviewed By: rodrigodesalvobraz Differential Revision: D44508375 fbshipit-source-id: 4408b2ea85b1bea728815af20a526c674f0a062b
Summary: [need below pictures in Summary so as to use their links in README.md file] {F927315010} {F927382569} {F927429214} Reviewed By: rodrigodesalvobraz Differential Revision: D44561198 fbshipit-source-id: 77e74778b5b725297039cd4f1396fe610257efd0
Summary: Our NN models have batch norm and dropout, which behave differently during training and eval. In this diff I: 1. Switch the evaluated model to eval mode when it's attached to the offline evaluator 2. Switch the model to eval mode when an inference wrapper is created. This might not be 100% necessary, but I'm not sufficiently familiar with the publishing process to know for certain that it's been switched to eval mode before a wrapper is used. Reviewed By: BerenLuthien Differential Revision: D44552990 fbshipit-source-id: c8b4d9690d959da7187418c3c81096256e22f101
Summary: 1. Add a `ResidualWrapper` module, which can be used to turn a regular layer into a residual/skip layer 2. Modify `FullyConnectedNetwork` `__init__` method to support the residual wrapper 3. Add support for new `use_skip_connections` argument to all CB NN models 4. Add unit tests for `ResidualWrapper` Reviewed By: BerenLuthien Differential Revision: D44552989 fbshipit-source-id: d092f943737d8ead4bd484da455dfea5023c4e7c
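The core of a residual/skip wrapper like the one described in item 1 can be sketched in plain Python (the diff's version is presumably a `torch.nn.Module`; this stand-in is only for illustration):

```python
class ResidualWrapper:
    """Sketch: turn a regular layer into a residual layer, output = x + layer(x)."""

    def __init__(self, layer):
        self.layer = layer

    def __call__(self, x):
        # The skip connection adds the input back onto the layer's output.
        return x + self.layer(x)
```

Wrapping the identity function, for example, doubles its input, since `x + x = 2x`.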
… to the base CB trainer class Summary: Requiring each CB trainer class to do input check, chosen arm feature extraction and recmetric logging is cumbersome and error-prone (there was already a bug due to SupervisedTrainer not logging recmetrics - https://fb.workplace.com/groups/4402889573094258/posts/5999347940115072/?comment_id=6004958952887304). Instead, these 3 actions will be performed centrally in the base class and each trainer can focus on just computing the loss or executing the parameter update. Reviewed By: BerenLuthien Differential Revision: D44593201 fbshipit-source-id: 349f6702c9c756d80d23bd9ede6c4fb2e5940e94
Summary: Outputting the accumulated rewards and regrets. This helps evaluation. Demo N3067100 Reviewed By: alexnikulkov Differential Revision: D44352607 fbshipit-source-id: 06e6f756a35229f294a98650c2d67d3a78e3c513
Differential Revision: D44719105 fbshipit-source-id: 9e73e110d4c3ed858ac9e8944c73404bb6fa6122
Summary: T148338245 (the next Diff will address T148655761). Output `{"pred_reward": pred_reward, "pred_sigma": pred_sigma, "ucb": ucb}` from Contextual Bandit models Reviewed By: alexnikulkov Differential Revision: D44775830 fbshipit-source-id: 2ed22bda5d8ae0491602ee8ffe4ac126f7f4774c
Summary: Original commit changeset: 2ed22bda5d8a Original Phabricator Diff: D44775830 Reverting to fix broken release tests. Example: https://www.internalfb.com/mast/job/aienv-20be949aec-f429693811 Reviewed By: zxpmirror1994 Differential Revision: D45132749 fbshipit-source-id: 35bb3496ac720d2569040fc020bcba3fb71af8cd
Summary: redo D44775830 , plus D45133345 (previous D44775830 passed all Unit Tests but failed on starlight run. D45133345 should fix it) Reviewed By: alexnikulkov Differential Revision: D45159072 fbshipit-source-id: f621f71672a4fc64ec457deb4f73a4a1e4897a45
Summary:

# What
This Diff switches between `inv` and `pinv` to save computation cost on matrix inversion.

### Details
- Matrix inversion is a significant cost in NNLinUCB.
  - Both LinUCB and NNLinUCB need a matrix inverse, but NNLinUCB must compute it on every back-propagation step, so it is costly. Saving computation on the inverse therefore matters more for NNLinUCB than for LinUCB.
- `hermitian=True` reduces the cost of `pinv` (https://pytorch.org/docs/stable/generated/torch.linalg.pinv.html): "If hermitian=True, A is assumed to be Hermitian if complex or symmetric if real, but this is not checked internally. Instead, just the lower triangular part of the matrix is used in the computations." Our matrix is Hermitian.
- `inv` is more efficient than `pinv` but can be numerically unstable.
  - `torch.linalg.inv` is faster than `torch.linalg.pinv`, but `pinv` is more stable: if the matrix is rank deficient or its condition number is too large, its inverse cannot be computed stably.
  - The regularization diagonal matrix (identity) keeps matrix `A` full rank, but we also need to consider how large the arm terms are relative to the identity. If some arm has a lot of historical data (equivalently, large arm features), the corresponding eigenvalue may be so large that the condition number of `A` is very bad, making `inv` unstable.
- Given the above analysis, we adopt: try `inv`, fall back to `pinv` on exception.

**NOTE**: let D45159072 land first, then change this Diff accordingly before landing.

Reviewed By: alexnikulkov Differential Revision: D44564771 fbshipit-source-id: 5d2c120d0c5faef6390af9c96cdb7453f22c3524
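The "try `inv`, fall back to `pinv`" pattern adopted above can be sketched as follows. NumPy stands in for `torch.linalg` here, and the function name is hypothetical; NumPy's `inv` only raises on exactly singular input, so this is a simplified illustration of the fallback, not the production logic.

```python
import numpy as np

def robust_inverse(a: np.ndarray) -> np.ndarray:
    """Prefer the cheaper exact inverse; fall back to the stable
    pseudo-inverse when the matrix cannot be inverted directly."""
    try:
        return np.linalg.inv(a)
    except np.linalg.LinAlgError:
        # hermitian=True: the matrix is symmetric, so only its lower
        # triangle is used, reducing the cost of the pseudo-inverse.
        return np.linalg.pinv(a, hermitian=True)
```

For well-conditioned matrices this takes the fast path; for a rank-deficient matrix the exception handler returns the pseudo-inverse instead of crashing.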
Summary: Add a test for recmetric logging in CB trainer (test for LinUCB only, but should work the same for other trainers) Reviewed By: BerenLuthien Differential Revision: D45404107 fbshipit-source-id: cb58809ffbe3948518bdc0ecf25348257f6e704d
Summary: Adding support for weighted supervised losses. This is especially important for Offline Eval because it uses weights to filter training data Reviewed By: BerenLuthien Differential Revision: D44597203 fbshipit-source-id: 8af32344123cd6b68e3df085952e35d1244fea8b
Summary: Log a few more metrics to improve Offline Eval understanding: 1. Average accepted/rejected rewards 2. Fraction of accepted to rejected rewards 3. Average reward across all (accepted + rejected) data 4. Average slate sizes of accepted/rejected observations Reviewed By: BerenLuthien Differential Revision: D45300609 fbshipit-source-id: d6e776d1d05e789942272fef42993471719e19be
Summary: See title Reviewed By: BerenLuthien Differential Revision: D45323720 fbshipit-source-id: 0e01614cdf91c2360e8b2aa63c8352d3d418d279
Summary: Adding a separate concept of `label`, which is used as the prediction target for model training. In the basic case, `label` is equal to `reward` and if `CBInput` is created without specifying `label`, we automatically set `label=reward` in `__post_init__`. But we can also define `label` differently, e.g. as a transformation of `reward`, to give the model a more stable learning target. I have observed improvements in performance from using `log` or `sqrt` transforms in AP Container Selection. The `reward` field is now used only for Offline Evaluation, while `label` is used for model training and supervised learning accuracy metrics. In a FREE workflow the transform is specified in `config.features.label_transform`, which can be one of `["identity", "log", "sqrt"]` Reviewed By: BerenLuthien, PoojaAg18 Differential Revision: D45300610 fbshipit-source-id: 7005b10e652549948e9104c9d90ef76475276f67
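The `label_transform` config described above maps onto something like the sketch below. The config key names come from the summary; the exact log form (plain `log` vs `log1p`) and the function names are assumptions for illustration.

```python
import math

# Assumed mapping from config strings to label transforms.
LABEL_TRANSFORMS = {
    "identity": lambda r: r,            # default: label == reward
    "log": lambda r: math.log1p(r),     # assumption: log1p keeps reward 0 valid
    "sqrt": math.sqrt,                  # compress large rewards
}

def make_label(reward: float, transform: str = "identity") -> float:
    """Derive the training label from the raw reward, mirroring the
    __post_init__ default of label = reward when no transform is given."""
    return LABEL_TRANSFORMS[transform](reward)
```

The untransformed `reward` would still be kept alongside the label for Offline Evaluation, per the summary.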
…ias/intercept Summary: Currently we pass the output of MLP directly to the input of LinUCB. This is equivalent to using a linear layer without a bias term. This diff appends a column of ones as an extra feature to the output of MLP in order to allow LinUCB to have a bias/intercept. Reviewed By: PoojaAg18 Differential Revision: D45539068 fbshipit-source-id: 48504c276590c04e274b380f89f806b2307809cf
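Appending a ones column, as described above, is a standard trick for giving a linear model an intercept; a list-based sketch (the real code presumably operates on torch tensors):

```python
def append_bias_column(features):
    """Sketch: append a constant 1.0 to each MLP-output row so the linear
    LinUCB layer on top can learn a bias/intercept term."""
    return [list(row) + [1.0] for row in features]
```

The weight LinUCB learns for the constant feature then acts as the bias term of the otherwise bias-free linear map.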
Summary: allow gradients to backpropagate through the NN Reviewed By: alexnikulkov Differential Revision: D45997912 fbshipit-source-id: 68ac01811f3b0eb9610ee5bf8652ae5163316a27
Summary: Pyre upgrade and continuous jobs never get black formatting anymore, and I don't have time to fix it this week. It's quicker to create diffs locally than to commandeer and fix the bot-generated diffs. ``` LOCAL_CONFIG=reagent IDENTIFIER="$(sed 's/\//-/g' <<< $LOCAL_CONFIG)" ERRORS_FILE=/tmp/$IDENTIFIER PYRE_UPGRADE=~/fbsource/"$(buck2 build //tools/pyre/facebook/tools:upgrade --show-output | awk '{ print $2}')" # get errors pyre -l "$LOCAL_CONFIG" --output json check > $ERRORS_FILE # fix errors cat $ERRORS_FILE | $PYRE_UPGRADE fixme-single $LOCAL_CONFIG --lint --error-source stdin --no-commit ``` Reviewed By: grievejia Differential Revision: D46025666 fbshipit-source-id: e1d3c8a7dca99707b48a5c93e7e30a3b7dfc89eb
Summary: X-link: meta-pytorch/torchrec#1171 Pull Request resolved: facebookresearch#717 ATT. If window_size is smaller than the overall/global batch size, window metrics will be NaN since we'll pop the entire batch out of the window state buffer. Reviewed By: joshuadeng Differential Revision: D45590488 fbshipit-source-id: 6d84e24cf1c77760e3ff2ef8fb9a86b5ab775f68
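The window-size edge case above can be illustrated with a small buffer sketch (a hypothetical class, not torchrec's implementation): trimming by the overflow keeps the newest `window_size` items, whereas popping the entire batch would empty the buffer whenever `window_size` is smaller than the batch, producing the NaN the fix addresses.

```python
class WindowBuffer:
    """Keeps the most recent `window_size` values for a window metric."""

    def __init__(self, window_size: int):
        self.window_size = window_size
        self.items = []

    def add_batch(self, batch):
        self.items.extend(batch)
        overflow = len(self.items) - self.window_size
        if overflow > 0:
            # Trim only the overflow. Popping len(batch) items instead would
            # leave the buffer empty when window_size < len(batch).
            self.items = self.items[overflow:]

    def mean(self) -> float:
        # An empty window is exactly the NaN case described above.
        return sum(self.items) / len(self.items) if self.items else float("nan")
```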
Differential Revision: D46117822 fbshipit-source-id: 811c2fc92ca39622f74d05ea1298863328ac6eda
Summary: Addresses this error: https://pxl.cl/2JNBR This shouldn't happen, because the if statement above checks the range. Perhaps the feature range is smaller than the Box-Cox resolution, so the if statement now also checks that the range is greater than the Box-Cox resolution. Differential Revision: D46269758 fbshipit-source-id: 2e6272a7da6e63b5c93cd8aeab52a8fb2e8166b2
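The extra guard described above amounts to a range check like the following; the constant name and value are hypothetical, shown only to make the condition concrete.

```python
BOXCOX_RESOLUTION = 1e-4  # assumed name and value for illustration

def range_supports_boxcox(min_value: float, max_value: float) -> bool:
    """Only consider a Box-Cox transform when the feature's observed value
    range exceeds the transform's resolution."""
    return (max_value - min_value) > BOXCOX_RESOLUTION
```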
Summary: Allow inference to get specific `ucb_alpha` Reviewed By: alexnikulkov Differential Revision: D46284800 fbshipit-source-id: 00daa34e0679e2d8c6d268b6efd2dd349d047c8a
Differential Revision: D46355537 fbshipit-source-id: 696677a13e131de466df1e919b0b976a988eb67e
Summary: numpy 1.20.0 removed `np.object`. It was an alias to builtin `object`. It's safe to replace directly to `object`. Change is generated mechanically using the following oneliner: ``` fbgr -sl 'np\.object\b' | xargs perl -pi -e 's,\bnp\.object\b,object,g' ``` Differential Revision: D46585978 fbshipit-source-id: 21f2a5f0d1379ebd3fc5f89c9362699cbce0ef50
Hi @adhiiisetiawan! Thank you for your pull request and welcome to our community.

Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Update the documentation in the quick start example to improve clarity. The following changes have been made:

- Use the `run_test_replay_buffer` function instead of `run_test` in the `reagent.gym.tests.test_gym` module.

Before

After

I also changed the "On-Policy Training" section.

These changes ensure that the example aligns with the current implementation and usage of the `reagent.gym.tests.test_gym` module. Please review these modifications and let me know if any further adjustments are needed.