
Conversation

@czxttkl (Contributor) commented Jul 27, 2022

Summary: use this

Differential Revision: D38185498

Kellie Lu and others added 30 commits June 22, 2021 00:11
…ookresearch#496)

Summary:
Pull Request resolved: facebookresearch#496

Offline Batch RL runs were failing with an import error caused by a missing `__init__.py` file

Reviewed By: czxttkl

Differential Revision: D29284160

fbshipit-source-id: 4e69941028f5d00bc0ef7dc30049929a9d44c306
Summary:
Fixes: pytorch/pytorch#24892

In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. propose a new optimization algorithm similar in spirit to Adam.

The paper discusses how, without a warmup heuristic, adaptive learning-rate algorithms can exhibit undesirably large variance in the early stages of training, which can slow overall convergence.

The authors propose rectifying the variance of the adaptive learning rate when it is expected to be high.

Differing from the paper, we selected a variance tractability cut-off of 5 instead of 4. This adjustment is common practice and can be found in the authors' code repository as well as the TensorFlow Swift optimizer library:

https://github.com/LiyuanLucasLiu/RAdam/blob/2f03dd197022da442c6a15c47321f4335d113a3f/radam/radam.py#L156

https://github.com/tensorflow/swift-apis/blob/f51ee4618d652a2419e998bf9418ad80bda67454/Sources/TensorFlow/Optimizers/MomentumBased.swift#L638
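
For illustration, the rectification step with this cut-off looks roughly like the following (a minimal sketch with hypothetical names, not the code in this PR):

```python
import math

def radam_rectification(step: int, beta2: float, cutoff: float = 5.0):
    """Return the rectification factor r_t from the RAdam paper, or None
    when the variance is intractable (rho_t <= cutoff) and a plain
    momentum update should be used instead."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t <= cutoff:  # cut-off of 5 instead of the paper's 4
        return None
    return math.sqrt(
        (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )
```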

Pull Request resolved: pytorch/pytorch#58968

Reviewed By: gchanan

Differential Revision: D29241736

Pulled By: iramazanli

fbshipit-source-id: 288b9b1f3125fdc6c7a7bb23fde1ea5c201c0448
Differential Revision:
D29241736 (facebookresearch@0ff3634)

Original commit changeset: 288b9b1f3125

fbshipit-source-id: 56c4ec98647c6f1822b130726741a1c9ca193670
Summary:
Pull Request resolved: facebookresearch#487

We shouldn't need to yield the placeholder loss.

Differential Revision: D29111772

fbshipit-source-id: 0971221583bd9a5de770860ff15cc80eb8d749c3
Summary:
According to the original [SlateQ paper](https://arxiv.org/abs/1905.12767) (p28, 2nd paragraph, last sentence), the discount factor `gamma` will be scaled by the time difference in this way:

`gamma^((t2-t1)/time_scale)`.

Here, `t1` and `t2` are the timestamps of the current and next state-action pairs within a training sample, and `time_scale` is a hyperparameter that can scale the time difference up or down.

This diff implements this mechanism by adding a `discount_time_scale` parameter to `SlateQTrainer`. Its value is the `time_scale` in the formula above.

If this parameter is not set (i.e., `None`), we keep the discount factor as it is.
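
A small sketch of the scaled discount, assuming a hypothetical helper (the actual wiring inside `SlateQTrainer` differs):

```python
from typing import Optional

def scaled_discount(gamma: float, t1: float, t2: float,
                    time_scale: Optional[float]) -> float:
    # When discount_time_scale is not set, keep the discount factor as is.
    if time_scale is None:
        return gamma
    return gamma ** ((t2 - t1) / time_scale)
```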

Reviewed By: kittipatv

Differential Revision: D29297804

fbshipit-source-id: 5bd9101a2fe3b1b3d9817a3233357cab197e8ce8
Summary:
Fixes: pytorch/pytorch#5804

In the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ, Timothy Dozat proposes a new optimization algorithm that is, in essence, a combination of the NAG and Adam algorithms.

It is known that the idea of momentum can be improved with Nesterov acceleration, and Dozat investigates applying this idea to the momentum component of the Adam algorithm, providing experimental evidence of its effectiveness.

In this PR we implement the NAdam algorithm proposed in the paper. In preliminary work (http://cs229.stanford.edu/proj2015/054_report.pdf) the author shows that the decay base constant should be 0.96; we follow the same choice in this implementation, as Keras does. Implementation and coding practice also follow Keras in some other places:

https://github.com/tensorflow/tensorflow/blob/f9d386849581d15d72f6f1f96f12aac230a8edbe/tensorflow/python/keras/optimizer_v2/nadam.py
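
For illustration, the momentum-decay schedule with decay base 0.96 looks roughly like this (hypothetical helper; `psi` stands for the momentum-decay hyperparameter, 0.004 in Keras):

```python
def nadam_mu(beta1: float, step: int, psi: float = 0.004) -> float:
    # Decayed momentum coefficient mu_t with decay base 0.96, as in Keras.
    return beta1 * (1.0 - 0.5 * 0.96 ** (step * psi))
```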

Pull Request resolved: pytorch/pytorch#59009

Reviewed By: gchanan, vincentqb

Differential Revision: D29220375

Pulled By: iramazanli

fbshipit-source-id: 4b4bb4b15f7e16f7527f368bbf4207ed345751aa
Summary:
Fixes: pytorch/pytorch#24892

In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. propose a new optimization algorithm similar in spirit to Adam.

The paper discusses how, without a warmup heuristic, adaptive learning-rate algorithms can exhibit undesirably large variance in the early stages of training, which can slow overall convergence.

The authors propose rectifying the variance of the adaptive learning rate when it is expected to be high.

Differing from the paper, we selected a variance tractability cut-off of 5 instead of 4. This adjustment is common practice and can be found in the authors' code repository as well as the TensorFlow Swift optimizer library:

https://github.com/LiyuanLucasLiu/RAdam/blob/2f03dd197022da442c6a15c47321f4335d113a3f/radam/radam.py#L156

https://github.com/tensorflow/swift-apis/blob/f51ee4618d652a2419e998bf9418ad80bda67454/Sources/TensorFlow/Optimizers/MomentumBased.swift#L638

Pull Request resolved: pytorch/pytorch#58968

Reviewed By: vincentqb

Differential Revision: D29310601

Pulled By: iramazanli

fbshipit-source-id: b7bd487f72f1074f266687fd9c0c6be264a748a9
Summary: Pull Request resolved: facebookresearch#491

Reviewed By: czxttkl, bankawas

Differential Revision: D29251412

fbshipit-source-id: 0a6cbcf59956ecc113e9425079f91a6b3098c2de
Summary: Pull Request resolved: facebookresearch#492

Reviewed By: bankawas

Differential Revision: D29252722

fbshipit-source-id: d855c6688199d2c3a09fab200e9b8d66c52d7273
Summary: We've implemented data modules; this method is redundant

Reviewed By: bankawas

Differential Revision: D29252903

fbshipit-source-id: 044cde768b481d4a12d4a17cca42180b4bd989cb
Summary: redundant

Reviewed By: bankawas

Differential Revision: D29252914

fbshipit-source-id: 536982d3b7886bda68fc14c5c933343167213224
Summary: redundant

Reviewed By: bankawas

Differential Revision: D29253003

fbshipit-source-id: cd05c62a0840b4f2d10c8bf4d9fe9ea057b6a13f
Summary: redundant

Reviewed By: bankawas

Differential Revision: D29253030

fbshipit-source-id: 969d03b6428aead6c6982a26b2e2c4a9a940273f
Summary: This is the start of making model managers stateless, to reduce complexity

Reviewed By: czxttkl

Differential Revision: D29253248

fbshipit-source-id: 681d141cb46784e40c8802f2325c1636044c61de
…less

Summary: Removing state from model managers

Reviewed By: czxttkl

Differential Revision: D29253249

fbshipit-source-id: 93ecb090cd2e2b66f86480679ae6145519227360
Summary: Prereq for making model managers stateless

Reviewed By: czxttkl

Differential Revision: D29253385

fbshipit-source-id: 9db747f46a84f26bce079efe8c4394efd3c8adc7
…h#493)

Summary:
Pull Request resolved: facebookresearch#493

Finally removed normalization data from model manager state

Reviewed By: czxttkl

Differential Revision: D29253429

fbshipit-source-id: 619b93b473e49b07fe74d0b525d6fc5f30f52550
Summary: Give it to `build_trainer()` directly so that we can remove state in model managers

Reviewed By: czxttkl

Differential Revision: D29258017

fbshipit-source-id: 39f4a7e8ad9a92499ffeb3c04e2e1c61c10769c0
Summary:
Pull Request resolved: facebookresearch#494

- Remove `initialize_trainer()`
- Implement `train()` on ModelManager base class; remove all the duplicates
- Make `build_serving_module[s]()` take the trainer module so it can extract whatever nets it needs from the trainer module
- `ModelManager.train()` now returns `Tuple[RLTrainingOutput, pl.Trainer]` so that `_lightning_trainer` member can be deleted

Reviewed By: czxttkl

Differential Revision: D29258016

fbshipit-source-id: 71545dc77c386b532bb48fe4c8ee94c79c20f5c6
Summary: Implement a multi-stage trainer module so that multi-stage training looks the same as any other training. Internally, the multi-stage trainer forwards calls to its internal trainers; a rough sketch follows below.
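
A minimal, hypothetical sketch of that dispatch idea (names and stage logic are illustrative; the real trainer in this diff is more involved):

```python
import pytorch_lightning as pl
import torch.nn as nn

class MultiStageTrainerSketch(pl.LightningModule):
    def __init__(self, trainers, stage_epochs):
        super().__init__()
        self.trainers = nn.ModuleList(trainers)  # one trainer per stage
        self.stage_epochs = stage_epochs         # epochs allotted to each stage

    def _current_stage(self) -> int:
        total = 0
        for i, n in enumerate(self.stage_epochs):
            total += n
            if self.current_epoch < total:
                return i
        return len(self.trainers) - 1

    def training_step(self, batch, batch_idx):
        # Forward the call to the internal trainer owning the current stage.
        return self.trainers[self._current_stage()].training_step(batch, batch_idx)
```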

Reviewed By: czxttkl

Differential Revision: D29273266

fbshipit-source-id: b51e91e5670362fc8ed85d9eeb05bd685fc7cbfd
Differential Revision: D29398026

fbshipit-source-id: 76923009da0f6fbc82a9fa8ae96c9417422c2577
Summary: Pull Request resolved: facebookresearch#497

Reviewed By: czxttkl

Differential Revision: D29405221

fbshipit-source-id: 3e3524d92fb8d243b7fe62a04830b8f2b80df6ce
Differential Revision: D29458224

fbshipit-source-id: dcef29cd83ee7aecc94100ed579d023072ab581e
Summary:
Pull Request resolved: facebookresearch#498

Add some assertions to make sure end users can use algorithms correctly.

Reviewed By: bankawas

Differential Revision: D29481662

fbshipit-source-id: 0332d990df7d3eca61e1f7bd205136d32f04a7b2
Summary:
Pull Request resolved: facebookresearch#499

Remove Seq2SlateDifferentiableRewardTrainer because it's not tested and wouldn't be used.

Reviewed By: kittipatv

Differential Revision: D29522083

fbshipit-source-id: 9cd7e0d6d1d10c17cc174a54d77a4b37b0f279b7
Summary:
Pull Request resolved: facebookresearch#500

Migrate the regular seq2slate to PyTorch Lightning, which includes one model manager, `Seq2SlateTransformer`, and three trainers: `Seq2SlateTrainer`, `Seq2SlateSimulationTrainer`, and `Seq2SlateTeacherForcingTrainer`. Manual optimization (https://pytorch-lightning.readthedocs.io/en/latest/common/optimizers.html#manual-optimization) is used to handle the more involved optimizer usage during training; the pattern is sketched below.
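
A minimal sketch of the Lightning manual-optimization pattern (illustrative only; the loss below is a placeholder, not the Seq2Slate loss):

```python
import pytorch_lightning as pl
import torch

class ManualOptSketch(pl.LightningModule):
    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.automatic_optimization = False  # switch Lightning to manual mode
        self.model = model

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        loss = self.model(batch).mean()  # placeholder loss for illustration
        opt.zero_grad()
        self.manual_backward(loss)  # Lightning-aware backward pass
        opt.step()

    def configure_optimizers(self):
        return torch.optim.Adam(self.model.parameters())
```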

Model manager `Seq2SlatePairwiseAttn` and trainer `Seq2SlatePairwiseAttnTrainer` are not migrated in this diff. But to make them compatible with the changes, the setting of `minibatch_size` is also moved from `trainer_params` to `reader_options`.

Reviewed By: czxttkl

Differential Revision: D29436608

fbshipit-source-id: 612a1de4923eb7d138fcb6cb4715be6e4d05b424
Summary: AutoDataModule yields a dictionary of tensors. Therefore, we need to manually type the input.

Reviewed By: czxttkl

Differential Revision: D29479986

fbshipit-source-id: ab135bb869d8f0eb1fba1813aebf5af6d5ca3401
Differential Revision: D29573192

fbshipit-source-id: 65dc670d1777dd1d6b86c9228a198cd16f504c6e
…h#489)

Summary: Pull Request resolved: facebookresearch#489

Reviewed By: czxttkl

Differential Revision: D29144000

fbshipit-source-id: b72401ee3bb69f4973c32914a440e571d56241f6
…ebookresearch#502)

Summary:
Pull Request resolved: facebookresearch#502

Use transformers to learn the return decomposition model:
1) customized attention layers that feed the positional encoding into keys and queries but not values (see the sketch below);
2) residual connections that learn meaningful embeddings.
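
A single-head sketch of point 1, assuming additive positional encodings (shapes and names are hypothetical):

```python
import math
import torch

def attention_pos_kq_only(x, pos, w_q, w_k, w_v):
    # x, pos: [seq_len, d]; w_q, w_k, w_v: [d, d].
    # Positional information influences *where* to attend (K & Q),
    # but not *what* gets aggregated (V).
    q = (x + pos) @ w_q
    k = (x + pos) @ w_k
    v = x @ w_v  # no positional encoding on the values
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v
```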

Reviewed By: czxttkl

Differential Revision: D29346526

fbshipit-source-id: c6e642548d4d2b0bcc7f089c08d9144c6f96f8e0
Gaurav Bang and others added 16 commits May 24, 2022 17:16
Summary:
Pull Request resolved: facebookresearch#644

Add extra optional columns, mdp_id and arms, to the ReAgent codebase. These are used in the eval workflow for LinUCB.

Reviewed By: alexnikulkov

Differential Revision: D36493574

fbshipit-source-id: 509a5b17617381b244202d7a857a7d7b1eb8bcc9
…ch#645)

Summary:
Pull Request resolved: facebookresearch#645

Instead of always using a linear output activation, I want to specify which activation to use.
I want to try this with positive activation functions (e.g., ReLU), because I know that my Q-values have to be positive (all rewards are non-negative). A sketch of the idea follows below.
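
A tiny sketch of a configurable output activation (a hypothetical helper, not the actual ReAgent network builder):

```python
import torch.nn as nn

def make_q_head(in_dim: int, out_dim: int, activation: str = "linear") -> nn.Sequential:
    layers = [nn.Linear(in_dim, out_dim)]
    if activation == "relu":  # e.g. when Q-values are known to be non-negative
        layers.append(nn.ReLU())
    return nn.Sequential(*layers)
```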

Reviewed By: czxttkl

Differential Revision: D36744360

fbshipit-source-id: 81296d2dfebb0ec77917d6024c0216d0e3fed4d1
Summary:
pyfmt now specifies a target Python version of 3.8 when formatting
with black. With this new config, black adds trailing commas to all
multiline function calls. This applies the new formatting as part
of rolling out the linttool-integration for pyfmt.

paintitblack

Reviewed By: zertosh, lisroach

Differential Revision: D37084377

fbshipit-source-id: 781a1b883a381a172e54d6e447137657977876b4
Differential Revision: D37172467

fbshipit-source-id: c5b8f6fceb327eb61a013836f57d315fd6b17211
Summary:
Pull Request resolved: facebookresearch#646

Fixing this problem: https://fb.workplace.com/groups/horizon.users/posts/1105605083328644/

Differential Revision: D37244190

fbshipit-source-id: 498d6f0790d3954f83758c4d46a6203c672fff67
Differential Revision: D37305159

fbshipit-source-id: 3532b6de87431137832c546d6544ba8d419cd726
Summary:
Pull Request resolved: facebookresearch#647

X-link: meta-pytorch/torchrec#447

Add a static function to concat a list of KJTs

Reviewed By: dstaay-fb

Differential Revision: D36944002

fbshipit-source-id: 1b6865f60dcea91ee250b69360e4606184ffad53
Summary: Made LinUCB neural-based to enable distributed training.

Differential Revision: D37009962

fbshipit-source-id: 4bb3e68ea60a264d26e13e4ce19832bca67a2c7e
Differential Revision: D37353746

fbshipit-source-id: b0dc7a8b59f4c6a1e39daa67fbae1e519488b6ef
Differential Revision: D37371791

fbshipit-source-id: 8247af28ab27242782ab8d317e7d9763121bcc74
Summary:
Pull Request resolved: facebookresearch#649

Fix the world model reporter so that we can read losses per epoch.
Make the logic performed at the end of a train/test/validation epoch more explicit in reagent_lightning_module.

Differential Revision: D37305377

fbshipit-source-id: 2204cfe94269cfba839b72c77bfea341ab63637d
Summary:
We want to make EB/EBC scriptable by default, without needing to run torch.fx first. Not being able to script EB/EBC modules by default (especially when needed for inference) is very non-intuitive and inconvenient. At the same time, we don't plan to make ShardedEB/EBC scriptable.

1. Do not use properties; use methods instead. This is consistent with KJT.
2. _embedding_bag_configs has a complex type, List[EmbeddingBagConfig], which does not script. Use List[str] to store features instead.

Pull Request resolved: facebookresearch#648

X-link: meta-pytorch/torchrec#467

Reviewed By: colin2328

Differential Revision: D37389962

fbshipit-source-id: 5ce079a946b9458ee63658cae2fd731cfc1c7958
Summary:
Pull Request resolved: facebookresearch#650

In the base LightningModule class we define an optional `logger` property which may be
initialized to None, while in the `DQNTrainer._log_dqn` method we access the `logger`
object without first checking whether it was initialized. The issue surfaced when trying to run
unit tests analogous to those in `test_qrdqn`. This commit adds a check that the
`logger` is initialized before attempting to use it; a sketch of the guard follows below.
Interestingly, the analogous QRDQNTrainer class implementation does not use the
`logger` property for logging; perhaps it is redundant here as well?
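
A minimal sketch of the guard, with a hypothetical metric name (not the actual `_log_dqn` body):

```python
import pytorch_lightning as pl

class DQNTrainerSketch(pl.LightningModule):
    def _log_dqn(self, td_loss: float) -> None:
        # `logger` may be None when no logger was attached to the Trainer.
        if self.logger is None:
            return
        self.logger.log_metrics({"td_loss": td_loss}, step=self.global_step)
```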

Reviewed By: czxttkl

Differential Revision: D37529027

fbshipit-source-id: 5fe81cf715ee9f759b937290f1184d1c67e5325f
Summary:
Pull Request resolved: facebookresearch#651

Adds test_dqn.py with a set of unit tests for the DQNTrainer class,
mirroring those in test_qrdqn.py.

Reviewed By: czxttkl

Differential Revision: D37536537

fbshipit-source-id: 60cef76adb62c54e66b3fda39596c1cf0ad20555
…#658)

Summary:
Pull Request resolved: facebookresearch#658

1. Update machine images for some CPU-only tests.
2. Switch back to the torchrec nightly build (instead of stable), because the stable torchrec CPU build caused errors in CircleCI tests. See for example: https://app.circleci.com/pipelines/github/facebookresearch/ReAgent/2437/workflows/1c7658b7-1197-425d-9fe3-166876d492ac/jobs/23829

Reviewed By: speedystream

Differential Revision: D38101590

fbshipit-source-id: d72f2cf0d204598ef648e0969522e9801029eca4
Summary:
Pull Request resolved: facebookresearch#659

Two things help avoid OpenSSL errors when installing Python 3.8:
1. upgrade pyenv;
2. add retry logic for installing Python 3.8.

Reviewed By: speedystream

Differential Revision: D38123668

fbshipit-source-id: 7e527a0caf2d302a81306b7f2005e92ce19a6f5e
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D38185498

@codecov-commenter commented Jul 27, 2022

Codecov Report

Merging #660 (d1b8a4a) into main (1002d12) will increase coverage by 17.90%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             main     #660       +/-   ##
===========================================
+ Coverage   69.27%   87.17%   +17.90%     
===========================================
  Files         354      355        +1     
  Lines       22586    22642       +56     
  Branches       44       44               
===========================================
+ Hits        15647    19739     +4092     
+ Misses       6913     2877     -4036     
  Partials       26       26               
Impacted Files                             Coverage Δ
reagent/mab/ucb.py                         86.84% <0.00%> (-2.64%) ⬇️
reagent/test/test_data/ex_mdps.py          96.42% <0.00%> (ø)
reagent/core/types.py                      86.75% <0.00%> (+0.19%) ⬆️
reagent/prediction/predictor_wrapper.py    75.54% <0.00%> (+0.62%) ⬆️
reagent/training/dqn_trainer_base.py       95.52% <0.00%> (+0.74%) ⬆️
reagent/models/synthetic_reward.py         97.20% <0.00%> (+0.93%) ⬆️
reagent/core/tracker.py                    93.97% <0.00%> (+3.61%) ⬆️
reagent/core/aggregators.py                94.54% <0.00%> (+3.63%) ⬆️
reagent/preprocessing/postprocessor.py     96.15% <0.00%> (+3.84%) ⬆️
reagent/core/utils.py                      85.10% <0.00%> (+4.25%) ⬆️
... and 80 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 1002d12...d1b8a4a.

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D38185498

czxttkl added a commit to czxttkl/ReAgent that referenced this pull request Jul 27, 2022
Summary:
Pull Request resolved: facebookresearch#660

use this

Differential Revision: D38185498

fbshipit-source-id: 72785c9d62ddcc14d2a8529268261137e74b30a2
@czxttkl force-pushed the export-D38185498 branch from d1b8a4a to 8be31b4 on July 27, 2022 05:53
czxttkl added a commit to czxttkl/ReAgent that referenced this pull request Jul 27, 2022
Summary:
Pull Request resolved: facebookresearch#660

use this

Differential Revision: D38185498

fbshipit-source-id: 409aa190e0a0af00d6e99fce2852b8087623216b
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D38185498

@czxttkl force-pushed the export-D38185498 branch from 8be31b4 to 87960bf on July 27, 2022 07:26
Summary:
Pull Request resolved: facebookresearch#660

use this

Differential Revision: D38185498

fbshipit-source-id: 16dd4698709687a441253630f2bb5db885568c81
@czxttkl force-pushed the export-D38185498 branch from 87960bf to db82285 on July 27, 2022 07:30
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D38185498
