Allow for publishing of reward network in discrete CRR #588
base: main
Differential Revision: D27079703 fbshipit-source-id: a590aa1d22ba70e47eef3eb4c1d61bcc48040b01
Summary: Pull Request resolved: facebookresearch#420 Add some comments, remove useless fields, rename fields Oncall Short Name: oncall_reinforcement_learning Reviewed By: gji1 Differential Revision: D26947158 fbshipit-source-id: 8bd832e323efa26ffbbecabf48172726539d8213
Summary: Pull Request resolved: facebookresearch#424 Now that MDNRNNTrainer has been migrated to PyTorch Lightning, we should migrate CEM Trainer to PyTorch Lightning as well. This is an ad hoc fix. Oncall Short Name: oncall_reinforcement_learning Reviewed By: kaiwenw Differential Revision: D27145258 fbshipit-source-id: c54b97e09d3560e0f3f358eff62e851d60e95edb
Summary: introduced in D26635649 (facebookresearch@0136ba5) https://fb.workplace.com/groups/appliedrl/permalink/2919793174970984/ Reviewed By: czxttkl Differential Revision: D27180718 fbshipit-source-id: 2e6ba10961416aaf70ce5156ff800880a3562c1d
…unctions and classes (facebookresearch#423) Summary: Pull Request resolved: facebookresearch#423 Move functions `create_df_from_replay_buffer`, `set_seed`, `feature_transform`, and `validate_mdp_ids_seq_nums` from fblearner.flow.projects.rl to reagent, as well as class `ProblemDomain` from reagent.core.fb.parameters to reagent.core.parameters so that oss may call them in unit tests. Reviewed By: czxttkl Differential Revision: D27130180 fbshipit-source-id: a06b7e8d5d683bb82a214bdab67b7e7e0ea71f2e
Summary: Pull Request resolved: facebookresearch#419 Add a unit test for Seq2Reward model-based algorithm, to replicate the current integration test in https://fburl.com/diffusion/tctz61f8. This would enable a faster testbed for future explorations (see stacked diff as an example). Reviewed By: czxttkl Differential Revision: D27041945 fbshipit-source-id: ca4b54125debc88a53208ff5489f481faf582e22
…earch#422) Summary: Pull Request resolved: facebookresearch#422 This diff verifies that setting `filter_short_sequence=True` reduces the eval MSE loss of seq2reward to small values around zero on StringGame data. Reviewed By: czxttkl Differential Revision: D27052147 fbshipit-source-id: e8428039ea72f66e9394d8efd90c1fccd6aeef2a
Summary: Pull Request resolved: facebookresearch#426 - add FinalLayer, enabling specification of sigmoid - CPE uses the same dataset as training (TODO: figure out why setting table_sample < 100 causes training to not work) - add config for reward model training, for feature importance - enable use of user features - minor refactoring, more user friendly for Reels - add option to override equiv_len during training Reviewed By: czxttkl Differential Revision: D27034687 fbshipit-source-id: 06bc519352334ea990ebcabba6cafd3569255def
Summary: Pull Request resolved: facebookresearch#428 title Reviewed By: czxttkl Differential Revision: D27204048 fbshipit-source-id: f7f7a628247ab48822912d28b30643c5c7de8eac
…moryNetworkInput` (facebookresearch#430) Summary: Pull Request resolved: facebookresearch#430 - fix import errors (remove duplicates + resolve path for train_and_evaluate_generic) - add `from_dict` classmethod to `MemoryNetworkInput` Reviewed By: kaiwenw Differential Revision: D27134600 fbshipit-source-id: 41770d5c3d624f651a41513bc84ad844aafb10ec
Summary: Pull Request resolved: facebookresearch#429 Pull Request resolved: facebookresearch#421 Each model manager now has an OSS implementation in `reagent.model_manager` and an internal implementation in `reagent.model_manager.fb`. The internal version mostly inherits from its OSS counterpart, overriding just a few methods for internal usage, so code duplication is minimal. Reviewed By: MisterTea Differential Revision: D27073406 fbshipit-source-id: e6192960b8e132f5680adc2222993d9ff18216ef
Reviewed By: zertosh Differential Revision: D27288821 fbshipit-source-id: 7053bbb5f324530378d49e9edf6a45ea702914b3
Summary: Pull Request resolved: facebookresearch#431 We found that models exported by jit.script caused QE canary timeout errors. One hypothesis is that jit.trace performs better than jit.script, so we should stick to jit.trace whenever possible. Reviewed By: kaiwenw Differential Revision: D27083963 fbshipit-source-id: 32cc81079b67a10f72385a6ac816231ef93e8a91
Summary: Pull Request resolved: facebookresearch#433 One should adjust minibatch_size in reader_options. Differential Revision: D27383416 fbshipit-source-id: c12458ecc0a9de162a6ce0098e905d044a302533
…rch#434) Summary: Pull Request resolved: facebookresearch#434 Reviewed By: kaiwenw Differential Revision: D27388819 fbshipit-source-id: 94669ef04f4532c9435a78d90e3e0ff3a763ffd1
Summary: title Reviewed By: alexnikulkov Differential Revision: D27340272 fbshipit-source-id: d506c7b7ebd04d5a70d529b0c4f9761a276f9d2a
…#435) Summary: Pull Request resolved: facebookresearch#435 Reviewed By: DavidV17 Differential Revision: D27436575 fbshipit-source-id: cbcc0439fca2e0258a1aac5ceff3ae1bb29258c2
Summary: Add integration tests for model-based sequence model cfeval spark transform Reviewed By: kaiwenw Differential Revision: D27381397 fbshipit-source-id: 64e2473d7805435047f5ac4b830e7c55e9584ae3
Summary: Pull Request resolved: facebookresearch#436 Added some comments Reviewed By: alexnikulkov Differential Revision: D27485489 fbshipit-source-id: 69c48bff53d383b41c092fb219be47e4fa35cce1
Summary: Pull Request resolved: facebookresearch#440 Log values directly to Tensorboard Reviewed By: kaiwenw Differential Revision: D27586324 fbshipit-source-id: a06cbedff28d072fec3bc76626f3945bc556d559
Summary: Pull Request resolved: facebookresearch#439 Reviewed By: kaiwenw Differential Revision: D27584143 fbshipit-source-id: 991663d72a5c4e36a109c6f0e49be6a793aa2811
Differential Revision: D27610490 fbshipit-source-id: 1c6c5301720861039ab8537e8bfae4637a3ef756
Summary: Pull Request resolved: facebookresearch#443 Reviewed By: bankawas Differential Revision: D27613861 fbshipit-source-id: 554719add9f34f2206b076e65e941cd3aebf48ad
Summary: Pull Request resolved: facebookresearch#438 Also adds TensorBoard plots into the reporter, and removes an unused unit test. Reviewed By: czxttkl Differential Revision: D27497184 fbshipit-source-id: 304ef603ec3457e7862492a2f82a482263846b30
Differential Revision: D27626042 fbshipit-source-id: 5c31221672790abe5ceadc06cbb0327d86ff46cf
Summary: Pull Request resolved: facebookresearch#445 Reviewed By: kaiwenw Differential Revision: D27303639 fbshipit-source-id: 1c8f105a90aa929c8fecae12aa3191a0a8ed0008
Differential Revision: D27643630 fbshipit-source-id: 38246baa4212271a68c3ae3044e4c87e37de5b4d
Summary: Pull Request resolved: facebookresearch#446 Switch eval_td_loss to Tensorboard Reviewed By: bankawas Differential Revision: D27643487 fbshipit-source-id: 25c0af8f0d943abaa68b024fd2f61caf65445cd9
Summary: Pull Request resolved: facebookresearch#444 Reviewed By: kaiwenw Differential Revision: D27614614 fbshipit-source-id: ce5de96de5714eab80c1e3c6c78100663426ff66
Summary: Adding binary-cross-entropy-with-logits loss for myopic values between 0 and 1. Reviewed By: czxttkl Differential Revision: D27712539 fbshipit-source-id: f9e5fa67cee9955d191712a4c472968086e94c91
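The commit above does not show its loss code. As a rough illustration of why BCE-with-logits suits myopic values in [0, 1]: the loss accepts soft targets anywhere in that interval, not just 0 or 1. The plain-Python sketch below uses the standard numerically stable form; the function names are hypothetical and this is not ReAgent's implementation (in PyTorch the same quantity is computed by `torch.nn.functional.binary_cross_entropy_with_logits`).

```python
import math
from typing import Sequence


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy on a raw logit with a soft target in [0, 1].

    Mathematically equal to -(t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))),
    rewritten as max(x, 0) - x*t + log(1 + exp(-|x|)) so that large |x|
    never overflows exp().
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must lie in [0, 1]")
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mean_bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Average the per-example loss over a batch (hypothetical helper)."""
    losses = [bce_with_logits(x, t) for x, t in zip(logits, targets)]
    if not losses:
        raise ValueError("empty batch")
    return sum(losses) / len(losses)
```

For a fixed target `t`, the loss is minimized where `sigmoid(logit) == t`, which is what lets a model regress onto fractional values rather than hard labels.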
Summary: Pull Request resolved: facebookresearch#574 Adding a LinUCB trainer and a LinearRegressionUCB model type Reviewed By: czxttkl Differential Revision: D31817255 fbshipit-source-id: 17b65da2dd6cf17d21fe90e1591a0a0cfd3c880f
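The commit names a LinUCB trainer and a LinearRegressionUCB model type but does not show them. For orientation only, the sketch below implements the textbook disjoint LinUCB idea: one ridge-regression model per arm, scored by its predicted reward plus `alpha` times a confidence width. The class and parameter names (`LinUCBArm`, `alpha`, `l2_reg`) are hypothetical, and the inverse of the design matrix is maintained with the Sherman-Morrison formula instead of being re-inverted on every update.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def _matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


class LinUCBArm:
    """Ridge regression for a single arm plus an upper-confidence bonus."""

    def __init__(self, dim: int, alpha: float = 1.0, l2_reg: float = 1.0):
        self.alpha = alpha
        # A starts as l2_reg * I; we store its inverse directly.
        self.a_inv: Matrix = [
            [(1.0 / l2_reg) if i == j else 0.0 for j in range(dim)] for i in range(dim)
        ]
        self.b: Vector = [0.0] * dim

    def theta(self) -> Vector:
        # Ridge-regression coefficients: A^-1 b
        return _matvec(self.a_inv, self.b)

    def ucb(self, x: Sequence[float]) -> float:
        mean = _dot(self.theta(), x)
        width = math.sqrt(_dot(x, _matvec(self.a_inv, x)))
        return mean + self.alpha * width

    def update(self, x: Sequence[float], reward: float) -> None:
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x)
        # (valid because A^-1 is symmetric).
        a_inv_x = _matvec(self.a_inv, x)
        denom = 1.0 + _dot(x, a_inv_x)
        n = len(x)
        self.a_inv = [
            [self.a_inv[i][j] - a_inv_x[i] * a_inv_x[j] / denom for j in range(n)]
            for i in range(n)
        ]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose_arm(arms: Sequence[LinUCBArm], x: Sequence[float]) -> int:
    """Pick the arm with the highest upper confidence bound for context x."""
    return max(range(len(arms)), key=lambda i: arms[i].ucb(x))
```

Larger `alpha` widens the bonus and therefore favors exploration of arms whose estimates are still uncertain for the given context.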
…acebookresearch#575) Summary: Pull Request resolved: facebookresearch#575 ### New commit log messages 412f0a4d2 Remove deprecated dataloader arguments in Trainer methods (#10325) Reviewed By: tangbinh Differential Revision: D32261342 fbshipit-source-id: 0dc24bb64eeb186f722ba147aa569d2b8af63f84
Summary: For some unknown reason, the coverage tool looks for the source code of '/home/circleci/project/config-3.8.py', a file that does not exist on the CircleCI test machine. We have to use `coverage report -i` to ignore the error. Reference: https://coverage.readthedocs.io/en/6.1.1/cmd.html#cmd-report Reviewed By: alexnikulkov Differential Revision: D32325423 fbshipit-source-id: 24e6b355aff287d22cea9008d58f801b300b9f4d
Summary: Pull Request resolved: facebookresearch#577 Update module lists following https://fb.quip.com/lEbxAN6UzLrS#UUGACAIIXSi Reviewed By: alexnikulkov Differential Revision: D32345725 fbshipit-source-id: fef624a759026ea7727159e22433129466bab399
…arch#576) Summary: Pull Request resolved: facebookresearch#576 Adding an additional argument (info) to post episode callback in Agent to match the post episode callback in replay buffer This is needed for Klotski Reviewed By: czxttkl Differential Revision: D32335744 fbshipit-source-id: 8b46b50057656a9cc5d4c6c40edfda3c90beacb4
…#10403) Summary: ### New commit log messages f9b9cdb0d Remove deprecated accelerator pass through functions in Accelerator (#10403) Reviewed By: edward-io Differential Revision: D32261339 fbshipit-source-id: c6696154be5e349cd1de1796ba396325ae06b831
Differential Revision: D32513876 fbshipit-source-id: a83d0291f8332c09aa4dbade434d61eb08e93794
Summary: Pull Request resolved: facebookresearch#580 The Java version that we were using in OSS (8.0.272.hs-adpt) seems to have been removed from sdkman See https://app.circleci.com/pipelines/github/facebookresearch/ReAgent/2142/workflows/fc99db2e-7b69-4331-abb8-ea798aa13ec4/jobs/18221 The closest available version is 8.0.292.hs-adpt Reviewed By: czxttkl Differential Revision: D32509203 fbshipit-source-id: df6349619d9d0d46034833ffe667f90656d0e3ca
Summary: Pull Request resolved: facebookresearch#581 A new attribute has been added to SGD and will be added to other optimizers in the future. We need to make a corresponding change to `OptimizerConfig` pytorch/pytorch#68052 Reviewed By: czxttkl Differential Revision: D32513683 fbshipit-source-id: 61f4042c10f9843f73d886b9d8c1d90baa52c5c1
Summary: ### New commit log messages fa0ed17f8 remove deprecated train_loop (#10482) Reviewed By: kandluis Differential Revision: D32454980 fbshipit-source-id: a35237dde06cc9ddac5373b75992ce88a6771c76
…deterministic mode (facebookresearch#582) Summary: Pull Request resolved: facebookresearch#582 Deterministic mode was causing error because some functions don't support deterministic mode (see https://app.circleci.com/pipelines/github/facebookresearch/ReAgent/2142/workflows/94f7ae0b-d229-4fc0-911d-08f37307b6e7/jobs/18243/parallel-runs/0/steps/0-104) `RuntimeError: scatter_add_cuda_kernel does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation, or you can use the 'warn_only=True' option, if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation. ` Instead of using `deterministic=True` in Lightning trainer, I used `torch.use_deterministic_algorithms(True, warn_only=True)`, which prints a warning instead of an error if some operator doesn't support deterministic mode. Reviewed By: czxttkl Differential Revision: D32515266 fbshipit-source-id: 6e803cd2030011ffde3e9310fb8c86f4f792f245
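A minimal sketch of the workaround described above, assuming a PyTorch version that supports the `warn_only` argument (1.11 or later). The helper name `enable_best_effort_determinism` is hypothetical; the two `torch` calls inside it are real PyTorch APIs.

```python
import torch


def enable_best_effort_determinism(seed: int = 0) -> None:
    """Seed PyTorch and request deterministic kernels without hard failures."""
    torch.manual_seed(seed)
    # With warn_only=True, an op that has no deterministic implementation
    # (e.g. scatter_add on CUDA) emits a warning instead of raising
    # RuntimeError, so training can continue.
    torch.use_deterministic_algorithms(True, warn_only=True)
```

Calling this once before building the trainer replaces `deterministic=True` in the Lightning trainer, trading a guaranteed error for a warning on unsupported operators.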
…) in OSS train_eval_lightning (facebookresearch#584) Summary: Pull Request resolved: facebookresearch#584 This is how it's done in internal implementation, so I'll mirror it in OSS. Without this an error was thrown in tests: `pytorch_lightning.utilities.exceptions.MisconfigurationException: No `test_step()` method defined to run `Trainer.test`.` https://app.circleci.com/pipelines/github/facebookresearch/ReAgent/2149/workflows/217d0bfa-25c5-41b0-9947-300793ce0fc8/jobs/18384/parallel-runs/0/steps/0-107 Reviewed By: czxttkl Differential Revision: D32516970 fbshipit-source-id: fc2ef5d4bd710e85b7a3a9c71d5d5c367b2c42de
…add TunedUCB (facebookresearch#578) Summary: Pull Request resolved: facebookresearch#578 1. Make argmax return a random index from the argmax set instead of the first index in the argmax set. 2. Add UCB Tuned 3. Add lower bound on estimated reward variance 4. Add minimum number of observations per arm (arm scores are inf until they reach the minimum) Reviewed By: evrardgarcelon Differential Revision: D32410581 fbshipit-source-id: 2ebe39bb5d35aa3e585078bc4a2a41cbbcdea210
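The four changes listed above can be sketched together in plain Python. This follows the standard UCB1-Tuned score, `mean + sqrt(ln(t) / n * min(1/4, V))` with `V = variance + sqrt(2 ln(t) / n)`; the class and parameter names (`UCBTuned`, `min_obs`, `min_variance`) are hypothetical and not necessarily how ReAgent structures it.

```python
import math
import random
from typing import List, Optional


def random_argmax(scores: List[float], rng: random.Random) -> int:
    """Return a uniformly random index among all maximal scores (change 1)."""
    best = max(scores)
    ties = [i for i, s in enumerate(scores) if s == best]
    return rng.choice(ties)


class UCBTuned:
    def __init__(self, n_arms: int, min_obs: int = 1, min_variance: float = 0.0,
                 seed: Optional[int] = None):
        if min_obs < 1:
            raise ValueError("min_obs must be >= 1")
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.sq_sums = [0.0] * n_arms
        self.min_obs = min_obs            # change 4: force early exploration
        self.min_variance = min_variance  # change 3: variance lower bound
        self.rng = random.Random(seed)

    def scores(self) -> List[float]:
        total = sum(self.counts)
        out = []
        for n, s, sq in zip(self.counts, self.sums, self.sq_sums):
            if n < self.min_obs:
                out.append(math.inf)  # arm has not reached the minimum yet
                continue
            mean = s / n
            # Empirical variance, floored (also guards against tiny negative rounding).
            var = max(sq / n - mean * mean, self.min_variance)
            log_t = math.log(total)
            v = var + math.sqrt(2.0 * log_t / n)
            # change 2: UCB1-Tuned exploration bonus
            out.append(mean + math.sqrt(log_t / n * min(0.25, v)))
        return out

    def select(self) -> int:
        return random_argmax(self.scores(), self.rng)

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.sums[arm] += reward
        self.sq_sums[arm] += reward * reward
```

Random tie-breaking matters most at the start, when several arms share an infinite score: always picking the first index would bias which arm gets explored first.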
Summary: Pull Request resolved: facebookresearch#579 Add batch training mode to the simulation. In batch mode the model is updated every N steps. Reviewed By: czxttkl Differential Revision: D32411860 fbshipit-source-id: f700713d443ddc1c91ffa84513a3c76771bea72a
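A minimal sketch of what "the model is updated every N steps" means for a simulation loop. All names here (`run_batched_simulation`, `update_every`, `apply_batch`) are hypothetical, and flushing a final partial batch is an assumption, not something the commit states.

```python
from typing import Callable, List, Tuple

Observation = Tuple[int, float]  # (chosen arm, observed reward)


def run_batched_simulation(
    n_steps: int,
    update_every: int,
    choose_arm: Callable[[], int],
    get_reward: Callable[[int], float],
    apply_batch: Callable[[List[Observation]], None],
) -> int:
    """Simulate a bandit whose model is refreshed only every `update_every` steps.

    Between refreshes, `choose_arm` keeps using the stale model, mimicking a
    policy that is retrained in batches. Returns the number of model updates.
    """
    if update_every < 1:
        raise ValueError("update_every must be >= 1")
    pending: List[Observation] = []
    n_updates = 0
    for _ in range(n_steps):
        arm = choose_arm()
        pending.append((arm, get_reward(arm)))
        if len(pending) == update_every:
            apply_batch(pending)
            pending = []  # start a fresh list so the applied batch is not mutated
            n_updates += 1
    if pending:  # assumption: flush the last, partial batch
        apply_batch(pending)
        n_updates += 1
    return n_updates
```

With `update_every=1` this reduces to ordinary online updating, so the same loop can compare online and batched behavior.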
Summary: Follow the instructions in T66611582. Now the only remaining problem is that headers must include copyright. Reviewed By: alexnikulkov Differential Revision: D32583915 fbshipit-source-id: 13d390d756825c5e91e7801bf0dc4efec9b8b1f7
Summary: Pull Request resolved: facebookresearch#585 as titled Reviewed By: alexnikulkov Differential Revision: D32584005 fbshipit-source-id: dcb999c2743e5ad788f5642f811dccb160d457ba
Summary: Pull Request resolved: facebookresearch#586 as titled Reviewed By: alexnikulkov Differential Revision: D32584831 fbshipit-source-id: 7bff346118ea56992ca2c4570432aff078110d1e
Summary: Pull Request resolved: facebookresearch#587 The goal of this diff is to fix all integration tests except the sparse_dqn ones, which need more investigation. Reviewed By: alexnikulkov Differential Revision: D32589825 fbshipit-source-id: 0394dfd0c2a59a77a1957e5daa172ddb2c142657
Summary: Since the code will become more and more specific to the ads signal loss use case, it is better to create a dedicated version which does not sync to OSS. Reviewed By: j-jiafei Differential Revision: D32591299 fbshipit-source-id: 02600fd68062a24ff22933e91faae3804a9da2fa
This pull request was exported from Phabricator. Differential Revision: D32711991
…rch#588) Summary: Pull Request resolved: facebookresearch#588 Allow for publishing of reward network in discrete_crr.py Differential Revision: D32711991 fbshipit-source-id: 5dcd8bc4ee9e5fb9922a6b47563b0c12a0908fa9
Force-pushed from 3ded40c to 528c8f1.
…rch#588) Summary: Pull Request resolved: facebookresearch#588 Allow for publishing of reward network in discrete_crr.py Differential Revision: D32711991 fbshipit-source-id: d13fcf724cd5de0c04609378a86b779c07db9efb
Force-pushed from 528c8f1 to 5f17b97.
Hi @DavidV17! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but the CLA is no longer valid and will need to be resubmitted.
Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Summary: Allow for publishing of reward network in discrete_crr.py
Differential Revision: D32711991