DavidV17
Contributor

Summary: Allow for publishing of reward network in discrete_crr.py

Differential Revision: D32711991

generatedunixname89002005307016 and others added 30 commits March 16, 2021 09:31
Differential Revision: D27079703

fbshipit-source-id: a590aa1d22ba70e47eef3eb4c1d61bcc48040b01
Summary:
Pull Request resolved: facebookresearch#420

Add some comments, remove useless fields, rename fields

Oncall Short Name: oncall_reinforcement_learning

Reviewed By: gji1

Differential Revision: D26947158

fbshipit-source-id: 8bd832e323efa26ffbbecabf48172726539d8213
Summary:
Pull Request resolved: facebookresearch#424

Now that MDNRNNTrainer has been migrated to PyTorch Lightning, we should migrate CEM Trainer to PyTorch Lightning as well. This is an ad hoc fix.

Oncall Short Name: oncall_reinforcement_learning

Reviewed By: kaiwenw

Differential Revision: D27145258

fbshipit-source-id: c54b97e09d3560e0f3f358eff62e851d60e95edb
Summary:
introduced in D26635649 (facebookresearch@0136ba5)
https://fb.workplace.com/groups/appliedrl/permalink/2919793174970984/

Reviewed By: czxttkl

Differential Revision: D27180718

fbshipit-source-id: 2e6ba10961416aaf70ce5156ff800880a3562c1d
…unctions and classes (facebookresearch#423)

Summary:
Pull Request resolved: facebookresearch#423

Move functions `create_df_from_replay_buffer`, `set_seed`, `feature_transform`, and `validate_mdp_ids_seq_nums` from fblearner.flow.projects.rl to reagent, as well as class `ProblemDomain` from reagent.core.fb.parameters to reagent.core.parameters so that oss may call them in unit tests.

Reviewed By: czxttkl

Differential Revision: D27130180

fbshipit-source-id: a06b7e8d5d683bb82a214bdab67b7e7e0ea71f2e
Summary:
Pull Request resolved: facebookresearch#419

Add a unit test for Seq2Reward model-based algorithm, to replicate the current integration test in https://fburl.com/diffusion/tctz61f8. This would enable a faster testbed for future explorations (see stacked diff as an example).

Reviewed By: czxttkl

Differential Revision: D27041945

fbshipit-source-id: ca4b54125debc88a53208ff5489f481faf582e22
…earch#422)

Summary:
Pull Request resolved: facebookresearch#422

This diff verifies that setting `filter_short_sequence=True` reduces the eval MSE loss of seq2reward to small values around zero on StringGame data.

Reviewed By: czxttkl

Differential Revision: D27052147

fbshipit-source-id: e8428039ea72f66e9394d8efd90c1fccd6aeef2a
Summary:
Pull Request resolved: facebookresearch#426

- add FinalLayer, enabling specification of sigmoid
- CPE uses the same dataset as training (TODO: figure out why setting table_sample < 100 causes training to not work)
- add config for reward model training, for feature importance
- enable use of user features
- minor refactoring, more user friendly for Reels
- add option to override equiv_len during training

Reviewed By: czxttkl

Differential Revision: D27034687

fbshipit-source-id: 06bc519352334ea990ebcabba6cafd3569255def
Summary:
Pull Request resolved: facebookresearch#428

title

Reviewed By: czxttkl

Differential Revision: D27204048

fbshipit-source-id: f7f7a628247ab48822912d28b30643c5c7de8eac
…moryNetworkInput` (facebookresearch#430)

Summary:
Pull Request resolved: facebookresearch#430

- fix import errors (remove duplicates + resolve path for train_and_evaluate_generic)

- add `from_dict` classmethod to `MemoryNetworkInput`
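
A `from_dict` classmethod of this kind could look roughly like the sketch below. It uses a stand-in dataclass: the class name `MemoryNetworkInputSketch` and its three fields are hypothetical and are not the real `MemoryNetworkInput` definition.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List


@dataclass
class MemoryNetworkInputSketch:
    # Hypothetical fields, for illustration only.
    state: List[float]
    action: List[float]
    reward: float

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "MemoryNetworkInputSketch":
        # Keep only keys that match declared fields, so extra keys in the
        # input dict are ignored instead of raising TypeError.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in d.items() if k in names})


batch = MemoryNetworkInputSketch.from_dict(
    {"state": [0.0, 1.0], "action": [1.0], "reward": 0.5, "unused": None}
)
```

Dropping unknown keys is one possible design choice; a stricter variant would raise on them.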

Reviewed By: kaiwenw

Differential Revision: D27134600

fbshipit-source-id: 41770d5c3d624f651a41513bc84ad844aafb10ec
Summary:
Pull Request resolved: facebookresearch#429

Pull Request resolved: facebookresearch#421

Each model manager now has an OSS implementation in `reagent.model_manager` and an internal implementation in `reagent.model_manager.fb`. The internal version mostly inherits from its OSS counterpart, overriding only a few methods for internal usage, so code duplication is minimal.

Reviewed By: MisterTea

Differential Revision: D27073406

fbshipit-source-id: e6192960b8e132f5680adc2222993d9ff18216ef
Reviewed By: zertosh

Differential Revision: D27288821

fbshipit-source-id: 7053bbb5f324530378d49e9edf6a45ea702914b3
Summary:
Pull Request resolved: facebookresearch#431

We found that models exported by jit.script caused QE canary timeout errors. One hypothesis is that jit.trace has better performance than jit.script, so we should stick to jit.trace whenever possible.

Reviewed By: kaiwenw

Differential Revision: D27083963

fbshipit-source-id: 32cc81079b67a10f72385a6ac816231ef93e8a91
Summary:
Pull Request resolved: facebookresearch#433

One should adjust minibatch_size in reader_options.

Differential Revision: D27383416

fbshipit-source-id: c12458ecc0a9de162a6ce0098e905d044a302533
…rch#434)

Summary: Pull Request resolved: facebookresearch#434

Reviewed By: kaiwenw

Differential Revision: D27388819

fbshipit-source-id: 94669ef04f4532c9435a78d90e3e0ff3a763ffd1
Summary: title

Reviewed By: alexnikulkov

Differential Revision: D27340272

fbshipit-source-id: d506c7b7ebd04d5a70d529b0c4f9761a276f9d2a
…#435)

Summary: Pull Request resolved: facebookresearch#435

Reviewed By: DavidV17

Differential Revision: D27436575

fbshipit-source-id: cbcc0439fca2e0258a1aac5ceff3ae1bb29258c2
Summary: Add integration tests for model-based sequence model cfeval spark transform

Reviewed By: kaiwenw

Differential Revision: D27381397

fbshipit-source-id: 64e2473d7805435047f5ac4b830e7c55e9584ae3
Summary:
Pull Request resolved: facebookresearch#436

Added some comments

Reviewed By: alexnikulkov

Differential Revision: D27485489

fbshipit-source-id: 69c48bff53d383b41c092fb219be47e4fa35cce1
Summary:
Pull Request resolved: facebookresearch#440

Log values directly to Tensorboard

Reviewed By: kaiwenw

Differential Revision: D27586324

fbshipit-source-id: a06cbedff28d072fec3bc76626f3945bc556d559
Summary: Pull Request resolved: facebookresearch#439

Reviewed By: kaiwenw

Differential Revision: D27584143

fbshipit-source-id: 991663d72a5c4e36a109c6f0e49be6a793aa2811
Differential Revision: D27610490

fbshipit-source-id: 1c6c5301720861039ab8537e8bfae4637a3ef756
Summary: Pull Request resolved: facebookresearch#443

Reviewed By: bankawas

Differential Revision: D27613861

fbshipit-source-id: 554719add9f34f2206b076e65e941cd3aebf48ad
Summary:
Pull Request resolved: facebookresearch#438

Also adds TensorBoard plots into the reporter, and removes an unused unit test.

Reviewed By: czxttkl

Differential Revision: D27497184

fbshipit-source-id: 304ef603ec3457e7862492a2f82a482263846b30
Differential Revision: D27626042

fbshipit-source-id: 5c31221672790abe5ceadc06cbb0327d86ff46cf
Summary: Pull Request resolved: facebookresearch#445

Reviewed By: kaiwenw

Differential Revision: D27303639

fbshipit-source-id: 1c8f105a90aa929c8fecae12aa3191a0a8ed0008
Differential Revision: D27643630

fbshipit-source-id: 38246baa4212271a68c3ae3044e4c87e37de5b4d
Summary:
Pull Request resolved: facebookresearch#446

Switch eval_td_loss to Tensorboard

Reviewed By: bankawas

Differential Revision: D27643487

fbshipit-source-id: 25c0af8f0d943abaa68b024fd2f61caf65445cd9
Summary: Pull Request resolved: facebookresearch#444

Reviewed By: kaiwenw

Differential Revision: D27614614

fbshipit-source-id: ce5de96de5714eab80c1e3c6c78100663426ff66
Summary: Adding binary-cross-entropy-with-logits loss for myopic values between 0 and 1.
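
For reference, the binary-cross-entropy-with-logits loss can be written in a numerically stable form. The sketch below is a plain-Python, single-sample illustration of the formula, not the code added in this commit; the function name `bce_with_logits` is an assumption.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy on a raw logit.

    Equivalent to -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))], but
    avoids overflow in exp() for large |x|.
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# A target of 0.7 is a valid "soft" label in [0, 1].
loss = bce_with_logits(logit=0.0, target=0.7)
```

Because the target only needs to lie in [0, 1], this loss fits values that are probabilities or normalized rewards rather than hard 0/1 labels.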

Reviewed By: czxttkl

Differential Revision: D27712539

fbshipit-source-id: f9e5fa67cee9955d191712a4c472968086e94c91
alexnikulkov and others added 19 commits November 4, 2021 20:14
Summary:
Pull Request resolved: facebookresearch#574

Adding a LinUCB trainer and a LinearRegressionUCB model type

Reviewed By: czxttkl

Differential Revision: D31817255

fbshipit-source-id: 17b65da2dd6cf17d21fe90e1591a0a0cfd3c880f
…acebookresearch#575)

Summary:
Pull Request resolved: facebookresearch#575

### New commit log messages
  412f0a4d2 Remove deprecated dataloader arguments in Trainer methods (#10325)

Reviewed By: tangbinh

Differential Revision: D32261342

fbshipit-source-id: 0dc24bb64eeb186f722ba147aa569d2b8af63f84
Summary:
For some unknown reason, the coverage tool looks for the source code of '/home/circleci/project/config-3.8.py', a file that does not exist on the CircleCI test machine. We have to use `coverage report -i` to ignore the error.

Reference: https://coverage.readthedocs.io/en/6.1.1/cmd.html#cmd-report

Reviewed By: alexnikulkov

Differential Revision: D32325423

fbshipit-source-id: 24e6b355aff287d22cea9008d58f801b300b9f4d
Summary:
Pull Request resolved: facebookresearch#577

Update module lists following https://fb.quip.com/lEbxAN6UzLrS#UUGACAIIXSi

Reviewed By: alexnikulkov

Differential Revision: D32345725

fbshipit-source-id: fef624a759026ea7727159e22433129466bab399
…arch#576)

Summary:
Pull Request resolved: facebookresearch#576

Adding an additional argument (info) to post episode callback in Agent to match the post episode callback in replay buffer
This is needed for Klotski
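
A callback that also receives `info` might be wired up as in the sketch below. The class `AgentSketch`, the callback signature, and the `"solved"` key are all hypothetical and only illustrate the idea of forwarding an extra argument; they are not ReAgent's actual `Agent` API.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical callback type: receives the finished trajectory and an info dict.
PostEpisodeCallback = Callable[[list, Dict[str, Any]], None]


class AgentSketch:
    def __init__(self, post_episode_callback: Optional[PostEpisodeCallback] = None):
        self.post_episode_callback = post_episode_callback

    def post_episode(self, trajectory: list, info: Dict[str, Any]) -> None:
        # Forward the environment's info dict along with the trajectory.
        if self.post_episode_callback is not None:
            self.post_episode_callback(trajectory, info)


seen: List[Tuple[int, Dict[str, Any]]] = []
agent = AgentSketch(lambda traj, info: seen.append((len(traj), info)))
agent.post_episode([("s0", "a0", 1.0)], {"solved": True})
```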

Reviewed By: czxttkl

Differential Revision: D32335744

fbshipit-source-id: 8b46b50057656a9cc5d4c6c40edfda3c90beacb4
…#10403)

Summary:
### New commit log messages
  f9b9cdb0d Remove deprecated accelerator pass through functions in Accelerator (#10403)

Reviewed By: edward-io

Differential Revision: D32261339

fbshipit-source-id: c6696154be5e349cd1de1796ba396325ae06b831
Differential Revision: D32513876

fbshipit-source-id: a83d0291f8332c09aa4dbade434d61eb08e93794
Summary:
Pull Request resolved: facebookresearch#580

The Java version that we were using in OSS (8.0.272.hs-adpt) seems to have been removed from sdkman
See https://app.circleci.com/pipelines/github/facebookresearch/ReAgent/2142/workflows/fc99db2e-7b69-4331-abb8-ea798aa13ec4/jobs/18221
The closest available version is 8.0.292.hs-adpt

Reviewed By: czxttkl

Differential Revision: D32509203

fbshipit-source-id: df6349619d9d0d46034833ffe667f90656d0e3ca
Summary:
Pull Request resolved: facebookresearch#581

A new attribute has been added to SGD and will be added to other optimizers in the future. We need to make a corresponding change to `OptimizerConfig`
pytorch/pytorch#68052

Reviewed By: czxttkl

Differential Revision: D32513683

fbshipit-source-id: 61f4042c10f9843f73d886b9d8c1d90baa52c5c1
Summary:
### New commit log messages
  fa0ed17f8 remove deprecated train_loop (#10482)

Reviewed By: kandluis

Differential Revision: D32454980

fbshipit-source-id: a35237dde06cc9ddac5373b75992ce88a6771c76
…deterministic mode (facebookresearch#582)

Summary:
Pull Request resolved: facebookresearch#582
Deterministic mode was causing an error because some functions don't support deterministic mode (see https://app.circleci.com/pipelines/github/facebookresearch/ReAgent/2142/workflows/94f7ae0b-d229-4fc0-911d-08f37307b6e7/jobs/18243/parallel-runs/0/steps/0-104)
`RuntimeError: scatter_add_cuda_kernel does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation, or you can use the 'warn_only=True' option, if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation.`

Instead of using `deterministic=True` in Lightning trainer, I used `torch.use_deterministic_algorithms(True, warn_only=True)`, which prints a warning instead of an error if some operator doesn't support deterministic mode.

Reviewed By: czxttkl

Differential Revision: D32515266

fbshipit-source-id: 6e803cd2030011ffde3e9310fb8c86f4f792f245
…) in OSS train_eval_lightning (facebookresearch#584)

Summary:
Pull Request resolved: facebookresearch#584

This is how it's done in the internal implementation, so I'll mirror it in OSS. Without this, an error was thrown in tests:
`pytorch_lightning.utilities.exceptions.MisconfigurationException: No `test_step()` method defined to run `Trainer.test`.`
https://app.circleci.com/pipelines/github/facebookresearch/ReAgent/2149/workflows/217d0bfa-25c5-41b0-9947-300793ce0fc8/jobs/18384/parallel-runs/0/steps/0-107

Reviewed By: czxttkl

Differential Revision: D32516970

fbshipit-source-id: fc2ef5d4bd710e85b7a3a9c71d5d5c367b2c42de
…add TunedUCB (facebookresearch#578)

Summary:
Pull Request resolved: facebookresearch#578

1. Make argmax return a random index from the argmax set instead of the first index in the argmax set.
2. Add UCB Tuned
3. Add a lower bound on the estimated reward variance
4. Add a minimum number of observations per arm (arm scores are inf until they reach the minimum)
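
Items 1 and 4 can be sketched as follows. This is a minimal illustration using the standard UCB1 score, not the UCB Tuned variant or the variance lower bound from items 2 and 3; the function names `random_argmax` and `ucb1_scores` and the `min_obs` parameter are assumptions, not names from the diff.

```python
import math
import random
from typing import List, Optional


def random_argmax(scores: List[float], rng: Optional[random.Random] = None) -> int:
    """Return an index of the maximum, chosen uniformly among ties."""
    chooser = rng if rng is not None else random
    best = max(scores)
    ties = [i for i, s in enumerate(scores) if s == best]
    return chooser.choice(ties)


def ucb1_scores(means: List[float], counts: List[int], min_obs: int = 1) -> List[float]:
    """UCB1 scores; arms with fewer than `min_obs` pulls score +inf.

    Assumes min_obs >= 1 so that no scored arm has a zero count.
    """
    total = sum(counts)
    scores = []
    for mean, n in zip(means, counts):
        if n < min_obs:
            scores.append(math.inf)  # force exploration of under-sampled arms
        else:
            scores.append(mean + math.sqrt(2.0 * math.log(total) / n))
    return scores


scores = ucb1_scores(means=[0.5, 0.9, 0.1], counts=[3, 0, 0], min_obs=2)
chosen = random_argmax(scores, random.Random(0))
```

With two arms tied at inf, always returning the first index would never explore the last arm before the middle one; the random tie-break removes that ordering bias.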

Reviewed By: evrardgarcelon

Differential Revision: D32410581

fbshipit-source-id: 2ebe39bb5d35aa3e585078bc4a2a41cbbcdea210
Summary:
Pull Request resolved: facebookresearch#579

Add batch training mode to the simulation. In batch mode the model is updated every N steps.
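
The idea of updating the model only every N steps can be sketched as below. The "model" here is just a running mean of rewards, and the class name `BatchUpdater` and its attributes are hypothetical; the actual simulation code may be organized differently.

```python
from typing import List


class BatchUpdater:
    """Buffers observations and updates a running mean every `batch_size` steps."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.buffer: List[float] = []
        self.n_seen = 0     # observations already folded into the model
        self.mean = 0.0     # the "model": a mean reward estimate
        self.n_updates = 0

    def observe(self, reward: float) -> None:
        self.buffer.append(reward)
        if len(self.buffer) >= self.batch_size:
            self._update()

    def _update(self) -> None:
        # Fold the whole buffered batch into the estimate at once.
        total = self.mean * self.n_seen + sum(self.buffer)
        self.n_seen += len(self.buffer)
        self.mean = total / self.n_seen
        self.buffer.clear()
        self.n_updates += 1


model = BatchUpdater(batch_size=3)
for r in [1.0, 0.0, 1.0, 1.0]:
    model.observe(r)
```

After four observations with `batch_size=3`, one update has run and the fourth reward is still waiting in the buffer, so decisions made in between rely on a stale model.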

Reviewed By: czxttkl

Differential Revision: D32411860

fbshipit-source-id: f700713d443ddc1c91ffa84513a3c76771bea72a
Summary: Follow the instructions in T66611582. Now the only remaining problem is that headers must include copyright.

Reviewed By: alexnikulkov

Differential Revision: D32583915

fbshipit-source-id: 13d390d756825c5e91e7801bf0dc4efec9b8b1f7
Summary:
Pull Request resolved: facebookresearch#585

as titled

Reviewed By: alexnikulkov

Differential Revision: D32584005

fbshipit-source-id: dcb999c2743e5ad788f5642f811dccb160d457ba
Summary:
Pull Request resolved: facebookresearch#586

as titled

Reviewed By: alexnikulkov

Differential Revision: D32584831

fbshipit-source-id: 7bff346118ea56992ca2c4570432aff078110d1e
Summary:
Pull Request resolved: facebookresearch#587

The goal of this diff is to fix all integration tests except the sparse_dqn ones, which need more investigation.

Reviewed By: alexnikulkov

Differential Revision: D32589825

fbshipit-source-id: 0394dfd0c2a59a77a1957e5daa172ddb2c142657
Summary: Since the code will become more and more specific to the ads signal loss use case, it is better to create a dedicated version which does not sync to OSS.

Reviewed By: j-jiafei

Differential Revision: D32591299

fbshipit-source-id: 02600fd68062a24ff22933e91faae3804a9da2fa
@facebook-github-bot
This pull request was exported from Phabricator. Differential Revision: D32711991

DavidV17 pushed a commit to DavidV17/ReAgent that referenced this pull request Dec 1, 2021
…rch#588)

Summary:
Pull Request resolved: facebookresearch#588

Allow for publishing of reward network in discrete_crr.py

Differential Revision: D32711991

fbshipit-source-id: 5dcd8bc4ee9e5fb9922a6b47563b0c12a0908fa9

…rch#588)

Summary:
Pull Request resolved: facebookresearch#588

Allow for publishing of reward network in discrete_crr.py

Differential Revision: D32711991

fbshipit-source-id: d13fcf724cd5de0c04609378a86b779c07db9efb

@facebook-github-bot
Hi @DavidV17!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
