Authoring Aware ROO on LTV ReAgent model #689
Open
luofangyuyx wants to merge 668 commits into facebookresearch:main from luofangyuyx:export-D40456561
Differential Revision: D29738340 fbshipit-source-id: 97c83cea89c46c469cdc967cce2ac7ce281c55fc
Summary: Pull Request resolved: facebookresearch#508 Reviewed By: czxttkl Differential Revision: D29805519 fbshipit-source-id: dbcde11f8292eb167a0b7a66384e0d1d723b38e4
Summary: Pull Request resolved: facebookresearch#506 Use `ExtraData.from_dict` Reviewed By: czxttkl Differential Revision: D29768249 fbshipit-source-id: de0056420ab71a79c4f9821cf451328949256037
Summary: Implement data module for Seq2Slate Reviewed By: czxttkl Differential Revision: D29717416 fbshipit-source-id: 424e3c025d73f691c8b0880f853f8d4dca0db584
Summary: Pull Request resolved: facebookresearch#507 Previously the SlateQ trainer only supported SARSA on-policy training. This diff implements an off-policy training approach based on Q-learning. Changes: 1. Introduced a new `slate_opt_parameters` to specify which slate optimization method to use: top_k, greedy, or exact, based on the SlateQ paper. Currently only the top_k approach is implemented; 2. When choosing the next action, instead of directly using `training_batch.next_action`, we first calculate the Q-value for each next candidate, rank the candidates by doc value * Q-value, and choose the indices of the top-k items as the next action. Reviewed By: kittipatv Differential Revision: D29660887 fbshipit-source-id: 9b15de4cba41ad5e34f1ca4553f90c53399052c4
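For illustration, a minimal sketch of the top_k next-action selection described in this commit (tensor names and shapes are assumptions, not the actual SlateQ trainer code):

```python
import torch

def select_topk_next_action(next_q_values: torch.Tensor,
                            next_doc_values: torch.Tensor,
                            k: int) -> torch.Tensor:
    """Pick the next slate for Q-learning-style SlateQ training.

    next_q_values:   (batch, num_candidates) Q-value per next candidate item
    next_doc_values: (batch, num_candidates) document value per candidate
    k:               slate size
    """
    scores = next_doc_values * next_q_values   # rank candidates by doc value * Q-value
    topk = torch.topk(scores, k=k, dim=1)
    return topk.indices                        # (batch, k) indices used as the next action
```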
Summary: Pull Request resolved: facebookresearch#511 n/a Reviewed By: igfox Differential Revision: D29820857 fbshipit-source-id: 7389785f20e1a503c5eea3221c5ad68ca1f79b31
Summary: Pull Request resolved: facebookresearch#510 Currently QR-DQN is not tested offline. This diff adds an integration test to open_ai_gym_offline and cogwheel. It also corrects an issue with QR-DQN's CPE (the optimizers were in the wrong order) and modifies our model registration to work outside of fblearner flow environments. Reviewed By: kittipatv Differential Revision: D29800557 fbshipit-source-id: ae324c0323a9e644524a228ab296c412923c5336
Summary: Pull Request resolved: facebookresearch#512 - Removing the opt from `manual_backward` call - Pin Lightning version to same version as in fbcode Reviewed By: igfox Differential Revision: D29828482 fbshipit-source-id: 26a52d71362a9a6fd1ea995d854f4a0268d5cce6
Summary: Currently, if a logged action probability is 0, NaNs can propagate to the actor loss (even with entropy set to 0) and disrupt training (f287261291). This diff removes the entropy calculation when entropy_coeff <= 0 and raises an error if entropy calculation is on while a logged action has probability 0. Reviewed By: czxttkl Differential Revision: D29861744 fbshipit-source-id: 2fae30e7108145139851d0767d7bbe18f6dd388a
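A minimal sketch of the guard this commit describes, using hypothetical names (not the actual trainer code):

```python
import torch

def actor_loss_with_entropy_guard(logged_action_probs: torch.Tensor,
                                  entropy: torch.Tensor,
                                  entropy_coeff: float,
                                  base_loss: torch.Tensor) -> torch.Tensor:
    # Skip the entropy term entirely when the coefficient is non-positive, so
    # log(0) from zero-probability logged actions can never reach the loss.
    if entropy_coeff <= 0:
        return base_loss
    if (logged_action_probs == 0).any():
        raise ValueError(
            "A logged action has probability 0 while entropy regularization is enabled"
        )
    return base_loss - entropy_coeff * entropy
```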
Summary: Black/whitelist are in the process of being removed from all FB code (https://fb.workplace.com/groups/e/permalink/3320810064641820/). This diff replaces all instances of black/whitelist with block/allowlist in the reagent codebase. Reviewed By: kittipatv Differential Revision: D29881070 fbshipit-source-id: 3d2e63eff5f4371f994ba4ae37586e3ef33c2fb7
…fbe63) to github/third-party/PyTorchLightning/pytorch-lightning
Summary:
# Manual Changes
- Migrate callsites of `_extract_batch_size` to `extract_batch_size` (as per https://fburl.com/code/4q7n8fs9).
- Remove unnecessary unit tests in `test_hive_writing_callback.py`
### New commit log messages
000fbe63 Expose `extract_batch_size` method and add corresponding tests. (#8357)
Reviewed By: yifuwang Differential Revision: D29834484 fbshipit-source-id: 219a3d40401d9b2c35d3a74b75f2394c4f57d61b
Differential Revision: D29929083 fbshipit-source-id: 66bae2de6f4c7ac658de98475b00f81215ef6b0e
Summary: Pull Request resolved: facebookresearch#513 The previous approach used the fixed slate_size, which includes padded items, so it doesn't give us the actual average over valid Q-value estimations. This diff fixes the issue by computing the actual slate_size as the sum of the item mask (1 if an item is valid) over each slate. Reviewed By: czxttkl Differential Revision: D29848923 fbshipit-source-id: 2a3fea30cdaa46b85b72fe5b5d054d7b78755a5b
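A rough sketch of the masked averaging described above (variable names and shapes are assumptions):

```python
import torch

def masked_slate_average(q_values: torch.Tensor, item_mask: torch.Tensor) -> torch.Tensor:
    """Average Q-values over valid slate items only.

    q_values:  (batch, max_slate_size) per-item Q-value estimates
    item_mask: (batch, max_slate_size), 1 where the slot holds a valid item, 0 for padding
    """
    actual_slate_size = item_mask.sum(dim=1, keepdim=True)          # valid items per slate
    return (q_values * item_mask).sum(dim=1, keepdim=True) / actual_slate_size
```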
Summary: Pull Request resolved: facebookresearch#514 As titled. RBF kernel is used in Eqn. 10 in https://jgillenw.com/cikm2018.pdf. Reviewed By: Strideradu Differential Revision: D29894690 fbshipit-source-id: 46681ca4e0b5091434834d7f86d9d87c7228da64
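For reference, a generic RBF kernel has the form below; the exact parameterization used in Eqn. 10 of the paper and in this diff may differ:

```python
import torch

def rbf_kernel(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    # Standard RBF kernel: exp(-||x - y||^2 / (2 * sigma^2)), computed over the last dim.
    sq_dist = ((x - y) ** 2).sum(dim=-1)
    return torch.exp(-sq_dist / (2.0 * sigma ** 2))
```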
Summary: Our current PathManager is based on fvcore, which is in the process of being deprecated and replaced by the open-source solution iopath. This diff is the result of running the provided codemod script on reagent, rl, and rl_exp, followed by a round of autodeps and an fbgs search for 'fvcore'. https://fb.workplace.com/groups/939200583171018/permalink/1022439911513751/ Reviewed By: czxttkl Differential Revision: D29974786 fbshipit-source-id: 397fd69ef94d43a7ca07c963c2a46bbbdcf78599
Summary: Pull Request resolved: facebookresearch#515 title Reviewed By: czxttkl Differential Revision: D29922558 fbshipit-source-id: b5ad7863d5c5b15363a5e9daf237800b79c260f2
Summary: Pull Request resolved: facebookresearch#516 See title. Also: - modifies the CRR constructor to use the standard target_model format - fixes a bug with delayed_policy_update - adds some todos Reviewed By: czxttkl Differential Revision: D29970711 fbshipit-source-id: 8d3add660b865d96b365cbda9baf0aa7ea13e879
Summary: Pull Request resolved: facebookresearch#519 Reviewed By: igfox Differential Revision: D30047652 fbshipit-source-id: b2a4b2542455e43798a8c0b4606be88bcb00f823
…earch#517) Summary: Pull Request resolved: facebookresearch#517 If we don't enable `single_selection`, we should find a way to calculate the next slate value from all Q-values of all items on the next slate. The default way to calculate that is by summing up all the next Q-values and average by the slate_size of the next slate. There is another way to average by the current slate_size as well. This wasn't an issue before since the slate_size is fixed, but after we landed D29848923 (facebookresearch@5351f63), the slate size can be different. We're not sure theoretically if either averaging method is better, so we propose this diff to allow configuring that. The new option is called `next_slate_value_norm_method` and it can take: - `"norm_by_current_slate_size"`: sum the next slate Q-values and average by the **current** slate size; - `"norm_by_next_slate_size"`: sum the next slate Q-values and average by the **next** slate size; cc achechetka solofsson Reviewed By: czxttkl Differential Revision: D29986728 fbshipit-source-id: f178b5da1462e4f9cc6995367ed229ab958c477a
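A minimal sketch of how the two normalization options could be applied (hypothetical helper; names and shapes are assumptions):

```python
import torch

def next_slate_value(next_q_values: torch.Tensor,
                     next_item_mask: torch.Tensor,
                     current_item_mask: torch.Tensor,
                     norm_method: str) -> torch.Tensor:
    """Sum next-slate Q-values and normalize by the configured slate size.

    All tensors are (batch, max_slate_size); masks are 1 for valid items, 0 for padding.
    """
    summed = (next_q_values * next_item_mask).sum(dim=1)
    if norm_method == "norm_by_next_slate_size":
        denom = next_item_mask.sum(dim=1)
    elif norm_method == "norm_by_current_slate_size":
        denom = current_item_mask.sum(dim=1)
    else:
        raise ValueError(f"Unknown next_slate_value_norm_method: {norm_method}")
    return summed / denom
```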
Summary: Currently our recurring training does not cache model outputs instance-to-instance. This means we can't do things like ensure action frequencies don't change too much across deployments. This diff adds the ability to save the past training outputs into a result_history (which can be initialized from external runs) in a fixed-size deque. These result_histories are passed in to each model validator, though currently they are unused. Reviewed By: kittipatv Differential Revision: D30024188 fbshipit-source-id: d5c14dce310ec5e34f7539f7c1a3942eabd79553
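A rough sketch of the fixed-size result history described above (class and method names are hypothetical):

```python
from collections import deque

class RecurringTrainingResults:
    """Keep the most recent training outputs so validators can compare across runs."""

    def __init__(self, max_history: int = 10, initial_results=None):
        # History can be seeded from external runs; the deque keeps it fixed-size.
        self.result_history = deque(initial_results or [], maxlen=max_history)

    def add(self, training_output) -> None:
        self.result_history.append(training_output)

    def validate(self, validator, new_output):
        # Validators receive the history alongside the new output; it is currently
        # unused, but enables checks such as action-frequency drift across deployments.
        return validator.validate(new_output, history=list(self.result_history))
```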
Differential Revision: D30120143 fbshipit-source-id: 6a970483211b768fc4439d620ed912825fc1f84b
Summary: Pull Request resolved: facebookresearch#521 It's no longer used Reviewed By: kittipatv Differential Revision: D30133029 fbshipit-source-id: d76dfdcd7ad8874f69d24870f1421d5cb4182827
Summary: The feature importance is based on data perturbation Reviewed By: kaiwenw Differential Revision: D30136375 fbshipit-source-id: 94ae08f530925023bb54cee1be8d1f246ae01fca
…straint) regularization (facebookresearch#522) Summary: Pull Request resolved: facebookresearch#522 (1) Add coefficient "beta" (regularization factor of policy constraint regularization) on CRR's objective. (2) Change default value for "DEFAULT_MAX_UNIQUE_ENUM" from 100 to 10 Reviewed By: czxttkl Differential Revision: D30183147 fbshipit-source-id: 1c18610678482397bdd669fd064a27d34967881f
…nce (facebookresearch#523) Summary: Pull Request resolved: facebookresearch#523 as titled Reviewed By: kaiwenw Differential Revision: D30182710 fbshipit-source-id: 558b6d7093ea23b6ffb23387b7f48e873013d373
Summary: Pull Request resolved: facebookresearch#525 ARS utility functions for reagent Reviewed By: czxttkl Differential Revision: D30005486 fbshipit-source-id: d7802537bc10cb518cd2d09333681f563f06534f
Summary: fix random seed issue to make sure test is stable Reviewed By: igfox Differential Revision: D30314903 fbshipit-source-id: 74fa948cb3a54398d925e779b74d7912a62e64c6
Summary: Pull Request resolved: facebookresearch#520 Adds dedicated unit test for PPO Trainer, additionally: - Fixes a bug with fully connected value net - Fixes some bugs in PPO training around using value net - Adds possible_action_mask to DuelingQNetwork Reviewed By: czxttkl Differential Revision: D30114686 fbshipit-source-id: 3735af1ea65429867d63f7da1462194242ad8254
Differential Revision: D30114686 (facebookresearch@8d00eb1) Original commit changeset: 3735af1ea654 fbshipit-source-id: 905ff36cdf587565487b8ad2e623c3cfbd77effc
…evision. Summary: ^ Reviewed By: yifuwang Differential Revision: D30346110 fbshipit-source-id: 154e69f233132635e947ddbd252ffbd957ead6f1
Summary: Pull Request resolved: facebookresearch#671 CircleCI tests started failing. Example: https://app.circleci.com/pipelines/github/facebookresearch/ReAgent/2490/workflows/efbceb68-9a01-4889-8d82-3d4167af4643/jobs/24780 The root cause is a significant jump in protobuf version from 3.19.4 to 4.21.5, caused by an updated dependency on protobuf in grpcio-tools. I'm adding an upper limit on the grpcio-tools version to prevent this error. Reviewed By: BerenLuthien, czxttkl Differential Revision: D38870120 fbshipit-source-id: 10d2e619aa10c2b03272b8305681792feefd06f4
Summary: Pull Request resolved: facebookresearch#672 The test was failing in CircleCI: https://app.circleci.com/pipelines/github/facebookresearch/ReAgent/2491/workflows/e153e4b9-795d-4c30-a3ff-6e4411b7dbf2/jobs/24796 I removed unnecessary parts which were breaking the test Reviewed By: czxttkl Differential Revision: D38874439 fbshipit-source-id: 74fe760014b5152d8a59f676beb4c8fa96e15798
Summary: Code design doc: first part of https://docs.google.com/document/d/1eiyJpLSiSDufPc4h2nIRrHh4yllBtZ9DEsFaHMtl6GM/edit# Main differences from the existing LinUCB model in the codebase: - Input is not a single batch of data, but a List[batch]; each element of the input is the batch for the corresponding arm. - Similarly, the covariance matrices A and inv_A have shape (num_arms, d, d), and the model coefficients have shape (num_arms, d). Reviewed By: alexnikulkov Differential Revision: D38374089 fbshipit-source-id: c60b990d8cb7d75ab334ceb24beebf4f5b78fc8a
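A minimal sketch of the per-arm data layout described in this commit (shapes follow the summary; the names and the identity initialization are assumptions):

```python
import torch

num_arms, d = 3, 8

# Per-arm sufficient statistics for a disjoint LinUCB model.
A = torch.eye(d).repeat(num_arms, 1, 1)   # (num_arms, d, d) covariance matrices
b = torch.zeros(num_arms, d)              # (num_arms, d) reward-weighted feature sums

def update(batches) -> None:
    # batches: List of (x, y) pairs, one per arm; x is (n_i, d), y is (n_i,)
    for arm, (x, y) in enumerate(batches):
        A[arm] += x.t() @ x
        b[arm] += x.t() @ y

def coefs() -> torch.Tensor:
    # Per-arm model coefficients, shape (num_arms, d)
    return torch.linalg.solve(A, b.unsqueeze(-1)).squeeze(-1)
```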
Summary: Pull Request resolved: facebookresearch#661 Make `coefs` a property, instead of an attribute. This property automatically checks if the currently saved coefficient value is valid (computed using the current values of `A` and `b`) and uses it if so. If the current saved coefficient value is invalid, we compute the new value as `coefs = A**(-1) * b` and save it Reviewed By: czxttkl Differential Revision: D38283370 fbshipit-source-id: aa78828e6a8cf8a7971ccee734a2cbe71856ccd1
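A minimal sketch of a lazily recomputed `coefs` property like the one described above (not the actual ReAgent implementation):

```python
import torch

class LinUCBStats:
    def __init__(self, d: int):
        self.A = torch.eye(d)
        self.b = torch.zeros(d)
        self._coefs = torch.zeros(d)
        self._coefs_valid = False  # invalidated whenever A or b change

    def add_observation(self, x: torch.Tensor, y: float) -> None:
        self.A += torch.outer(x, x)
        self.b += y * x
        self._coefs_valid = False

    @property
    def coefs(self) -> torch.Tensor:
        # Recompute A^(-1) * b only when the cached value is stale, then cache it.
        if not self._coefs_valid:
            self._coefs = torch.linalg.solve(self.A, self.b)
            self._coefs_valid = True
        return self._coefs
```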
Summary: Pull Request resolved: facebookresearch#662 `predict_ucb` wasn't doing anything that can't be done by `ucb_alpha`, so I removed the `predict_ucb` and simplified the logic Reviewed By: czxttkl Differential Revision: D38305542 fbshipit-source-id: 72a4c4e240492c7abf61ece93d1a3725779b50ca
Summary: Pull Request resolved: facebookresearch#674 FX tracing is a pre-requisite of GMS. The current dense-only RL model is not FX traceable. In this diff the model is tweaked to enable FX tracing without any potential behavior change. Reviewed By: czxttkl Differential Revision: D38835020 fbshipit-source-id: 96fd835b44bf42bff42459f0d37654a802bd537d
Summary: The unit tests use torch.randn to generate examples, so they have some randomness and might fail due to floating-point precision issues. Proposed fixes: set the torch seed and increase the tolerance to 1e-3. Reviewed By: alexnikulkov Differential Revision: D38950945 fbshipit-source-id: e300bd601720bd5671b4729e3406dbe7200f08f3
Summary: Pull Request resolved: facebookresearch#677 Enable distributed training by reducing the values of `A` and `b` across all trainer instances. For non-distributed training nothing changes because `sync_ddp_if_available` returns the input if distributed training isn't used Differential Revision: D38294567 fbshipit-source-id: af1973c764f85fbc1c27b0e1395ed726c5193649
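A minimal sketch of the cross-trainer reduction described above; `sync_ddp_if_available` is the Lightning utility named in the summary, though its import path varies across Lightning versions:

```python
import torch
from pytorch_lightning.utilities.distributed import sync_ddp_if_available

def aggregate_across_trainers(A: torch.Tensor, b: torch.Tensor):
    # Sum A and b over all trainer instances. With a single (non-distributed)
    # process this is a no-op, because sync_ddp_if_available returns its input.
    A = sync_ddp_if_available(A, reduce_op="sum")
    b = sync_ddp_if_available(b, reduce_op="sum")
    return A, b
```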
…ch#675) Summary: Pull Request resolved: facebookresearch#675 Add dtype conversion to reagent CB batch preprocessor Without this some elements were `double` instead of `float` Reviewed By: czxttkl Differential Revision: D38807573 fbshipit-source-id: d5120eb0555cba85e5e3a1d9d13ed9c8d43d8363
…cebookresearch#678) Summary: Pull Request resolved: facebookresearch#678 Update reagent fully_connected_network.py to comply with fx trace Reviewed By: czxttkl Differential Revision: D38748880 fbshipit-source-id: 3f8f5a85504591e0d7c1bbfbb1bff418502d3bb7
Summary: Pull Request resolved: facebookresearch#676 Interaction features weren't used by any of the first 3 users of LinUCB, so I'm removing them for simplicity Reviewed By: czxttkl Differential Revision: D38807570 fbshipit-source-id: 2360fae3e931847e46121cb9d51cad7c25b13f21
Summary: Use the same device for constant tensor as input tensor to avoid concatenating tensors on separate devices Reviewed By: czxttkl Differential Revision: D39020305 fbshipit-source-id: c042e337591744f0c97cac0d0791763333f38108
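A small sketch of the device (and dtype) alignment described above (the function name and bias-column use case are assumptions):

```python
import torch

def append_constant_column(x: torch.Tensor) -> torch.Tensor:
    # Create the constant tensor on x's device and dtype so torch.cat never
    # mixes CPU and GPU tensors (or float and double).
    ones = torch.ones(x.shape[0], 1, device=x.device, dtype=x.dtype)
    return torch.cat([x, ones], dim=1)
```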
Summary: See N2441818 for more details! Applying a discount factor to Disjoint LinUCB; the algorithm follows the paper https://arxiv.org/pdf/1909.09146.pdf. This brings two benefits: - When incremental training runs for a long time, the covariance matrix A = \sum x x^T and b = \sum x y won't explode. - The algorithm is also optimal if the LinUCB model is non-stationary, i.e., the true underlying linear model may change over time because, for example, users' interests shift. Concerns: - The exploration quantity in this new algorithm requires the matrix multiplication inv_A * A_tilde * inv_A on every update, which is expensive. Solutions: - We can consider a simplified version of the above algorithm: just use inv_A = \sum \gamma x x^T, and for the exploration quantity use \sqrt ( x^T inv_A x ) instead of \sqrt ( x^T inv_A * A_tilde * inv_A x ); this significantly simplifies the algorithm's updates and predictions. - The simplified solution achieves similar or slightly better performance in a simulation study: see N2441818. Reviewed By: alexnikulkov Differential Revision: D39157916 fbshipit-source-id: 0af7303f0d826851e71278c7e135f8a241943f48
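A rough sketch of the simplified discounted variant proposed above (the decay placement and exploration term follow the summary; exact details are in the paper and N2441818):

```python
import torch

def discounted_linucb_update(A: torch.Tensor, b: torch.Tensor,
                             x: torch.Tensor, y: float, gamma: float = 0.99):
    # Decay past statistics each step, so older observations contribute
    # geometrically less: A ~= sum_t gamma^(T-t) x_t x_t^T, b ~= sum_t gamma^(T-t) y_t x_t.
    A = gamma * A + torch.outer(x, x)
    b = gamma * b + y * x
    return A, b

def ucb_score(A: torch.Tensor, b: torch.Tensor, x: torch.Tensor, alpha: float = 1.0):
    inv_A = torch.linalg.inv(A)
    mean = x @ inv_A @ b
    # Simplified exploration term sqrt(x^T inv_A x) rather than the full
    # sqrt(x^T inv_A * A_tilde * inv_A x) from the paper.
    explore = torch.sqrt(x @ inv_A @ x)
    return mean + alpha * explore
```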
Summary: Pull Request resolved: facebookresearch#681 Dtype should be specified when creating a tensor Differential Revision: D39325426 fbshipit-source-id: c06843141182b90fc082c9b0a3c79b9015e57d94
Summary: Remove the if logic in the forward function since it's blocking model publish. Also fix the unit tests affected by removing these two lines. Reviewed By: zxpmirror1994 Differential Revision: D39704111 fbshipit-source-id: 0df8c25bc320a1ed81e08a164a29a3467fcabdfa
Summary: Pull Request resolved: facebookresearch#683 Follow up to https://fb.workplace.com/groups/1013818346200497/posts/1141315933450737 To avoid test failures like
```
ERROR: test_configure_optimizers (reagent.test.training.test_qrdqn.TestQRDQN)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/reagent/training_tests#binary,link-tree/reagent/test/training/test_qrdqn.py", line 179, in test_configure_optimizers
    optimizers = trainer.configure_optimizers()
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/reagent/training_tests#binary,link-tree/reagent/training/qrdqn_trainer.py", line 84, in configure_optimizers
    self.q_network_optimizer.make_optimizer_scheduler(
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/reagent/training_tests#binary,link-tree/reagent/optimizer/union.py", line 62, in make_optimizer_scheduler
    return self.value.make_optimizer_scheduler(params)
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/reagent/training_tests#binary,link-tree/reagent/optimizer/optimizer.py", line 72, in make_optimizer_scheduler
    filtered_args = {
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/reagent/training_tests#binary,link-tree/reagent/optimizer/optimizer.py", line 73, in <dictcomp>
    k: getattr(self, k)
AttributeError: 'Adam' object has no attribute 'fused'
```
Reviewed By: gji1 Differential Revision: D39750850 fbshipit-source-id: eabf013bbd07b56d6c3b7073fb5e2d8839280b26
…-friendly (facebookresearch#682) Summary: Pull Request resolved: facebookresearch#682 The "if" statement in the forward pass causes a warning during tracing. It will be safer to use a separate `_forward_no_coefs_check()` method for the forward pass of a trained model. In this method we don't check the validity of coefficients, instead the coefficients are assumed to be valid because we compute them at the end of the training epoch The behavior of existing methods/properties should remain unchanged. Reviewed By: czxttkl Differential Revision: D39692331 fbshipit-source-id: bcb2a705345ad34e7ab1574c21955dd111267de9
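A minimal sketch of a branch-free inference path like the `_forward_no_coefs_check()` described above (the class structure here is an assumption):

```python
import torch

class TracedLinUCBScorer(torch.nn.Module):
    def __init__(self, coefs: torch.Tensor):
        super().__init__()
        self.register_buffer("_coefs", coefs)

    def _forward_no_coefs_check(self, x: torch.Tensor) -> torch.Tensor:
        # No validity branch: the coefficients are assumed to be finalized at the
        # end of the training epoch, so the traced graph contains no data-dependent "if".
        return x @ self._coefs
```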
Summary: Pull Request resolved: facebookresearch#684 1. We train LinUCB models on two hours of data (7am and 8am) separately and get covariance matrices A1 and A2. 2. We train a LinUCB model at 7am with recurring training enabled, so it launches a recurring run at 8am and produces covariance matrix A. We found that A is around 8*A1 + A2 instead of A1 + A2, where NPROC_PER_NODE = 8 during training. The reason: the current implementation aggregates the A and b computed in each training process. In a recurring run, however, the second training child aggregates the previous A NPROC_PER_NODE times before adding the new A aggregated from the current epoch. This diff fixes that issue. Differential Revision: D39902734 fbshipit-source-id: 58ab1fade9feebd6b0419b18dd085a3944736f23
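A toy numeric illustration of the aggregation bug, assuming NPROC_PER_NODE = 8 and scalar stand-ins for A1 and A2:

```python
NPROC_PER_NODE = 8
A1, A2 = 1.0, 1.0   # scalar stand-ins for the 7am and 8am covariance matrices

# Buggy recurring aggregation: each of the 8 processes carries the previous A1
# into the all-reduce, so A1 is summed NPROC_PER_NODE times.
A_buggy = NPROC_PER_NODE * A1 + A2   # ~ 8*A1 + A2

# Fixed aggregation: all-reduce only the statistics from the current epoch,
# then add them to the previously accumulated A once.
A_fixed = A1 + A2                    # A1 + A2

print(A_buggy, A_fixed)              # 9.0 2.0
```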
…and current_epoch (facebookresearch#686) Summary: Pull Request resolved: facebookresearch#686 A copy of D39902734 for joint LinUCB Differential Revision: D39929408 fbshipit-source-id: 81983d9fc648fd3902a2d8b045ab368a240bac6d
Summary: Pull Request resolved: facebookresearch#685 Mostly a copy of D39157916 for Joint LinUCB model Differential Revision: D39929407 fbshipit-source-id: 7566c00641f1b467c4d92a25d213ac7b0c9b4bbd
Differential Revision: D40343271 fbshipit-source-id: ee93ef1d4e750c70e0260822172b5657f340d342
Summary: Pull Request resolved: facebookresearch#656 fixed acquisition functions for Bayes by Backprop and some general clean (comments, functions, etc) Reviewed By: czxttkl Differential Revision: D36470135 fbshipit-source-id: daa23d5b86e764ee4df327df583a4c4d4d621347
Summary: Pull Request resolved: facebookresearch#687 added comments for MLPEnsebler optimizer Reviewed By: czxttkl Differential Revision: D40538571 fbshipit-source-id: ca55dd9be3889ecc0965c8b32bb229092e63d6dd
This pull request was exported from Phabricator. Differential Revision: D40456561
Summary: X-link: meta-pytorch/torchrec#742 Pull Request resolved: facebookresearch#689 [Free] Authoring Aware ROO on LTV ReAgent model Differential Revision: D40456561 fbshipit-source-id: 797f51078b61dda7c0b7bde5fbec475a8e28191b
Summary: [Free] Authoring Aware ROO on LTV ReAgent model
Differential Revision: D40456561