Conversation

alexnikulkov (Contributor)

Summary: When some arms might be missing, apply a masked softmax to model scores during offline eval to avoid selecting the missing arms.

Differential Revision: D41990957
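The masked softmax described in the summary can be sketched as follows (a pure-Python illustration; the function name and shapes are assumptions, not the actual ReAgent API):

```python
import math

def masked_softmax(scores, mask):
    """Softmax over model scores that assigns zero probability to missing arms.

    scores: list of float model scores, one per arm.
    mask:   parallel list of 1/0 flags; 0 marks a missing arm.
    Assumes at least one arm is present (mask is not all zeros).
    """
    # Send masked-out arms to -inf so exp() maps them to exactly 0.
    masked = [s if m else float("-inf") for s, m in zip(scores, mask)]
    mx = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - mx) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

With `masked_softmax([1.0, 2.0, 3.0], [1, 0, 1])`, the middle (missing) arm gets probability exactly 0, and the remaining probability mass is split between arms 0 and 2.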

Ruizhe Zhao and others added 30 commits July 27, 2021 05:46
Summary:
Pull Request resolved: facebookresearch#513

The previous approach used the fixed slate_size, which includes padded items, so it didn't give us the actual average over valid Q-value estimations.

This diff fixes the issue by calculating the actual slate_size: summing the item mask (1 if an item is valid) over each slate.
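The fix amounts to deriving the denominator from the item mask rather than the padded slate length; a minimal sketch (illustrative names, not the actual ReAgent code):

```python
def mean_valid_q(q_values, item_mask):
    """Average Q-values over valid (non-padded) items in each slate.

    q_values:  list of slates, each a list of per-item Q-value estimates.
    item_mask: parallel list of 1/0 lists; 1 marks a valid item, 0 a padded one.
    """
    means = []
    for qs, mask in zip(q_values, item_mask):
        slate_size = sum(mask)  # actual slate size, not the padded length
        total = sum(q * m for q, m in zip(qs, mask))
        means.append(total / slate_size)
    return means
```

A padded item's Q-value is both zeroed out of the numerator and excluded from the denominator, so it cannot dilute the average.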

Reviewed By: czxttkl

Differential Revision: D29848923

fbshipit-source-id: 2a3fea30cdaa46b85b72fe5b5d054d7b78755a5b
Summary:
Pull Request resolved: facebookresearch#514

As titled. RBF kernel is used in Eqn. 10 in https://jgillenw.com/cikm2018.pdf.

Reviewed By: Strideradu

Differential Revision: D29894690

fbshipit-source-id: 46681ca4e0b5091434834d7f86d9d87c7228da64
Summary:
Our current PathManager is based on fvcore, which is in the process of deprecation and is being replaced by an open source solution, iopath.

This diff is the result of running the provided codemod script on reagent, rl, and rl_exp, followed by a round of autodeps and an fbgs search for 'fvcore'.

https://fb.workplace.com/groups/939200583171018/permalink/1022439911513751/

Reviewed By: czxttkl

Differential Revision: D29974786

fbshipit-source-id: 397fd69ef94d43a7ca07c963c2a46bbbdcf78599
Summary:
Pull Request resolved: facebookresearch#515

title

Reviewed By: czxttkl

Differential Revision: D29922558

fbshipit-source-id: b5ad7863d5c5b15363a5e9daf237800b79c260f2
Summary:
Pull Request resolved: facebookresearch#516

See title. Also:
- modifies the CRR constructor to use the standard target_model format
- fixes a bug with delayed_policy_update
- adds some todos

Reviewed By: czxttkl

Differential Revision: D29970711

fbshipit-source-id: 8d3add660b865d96b365cbda9baf0aa7ea13e879
Summary: Pull Request resolved: facebookresearch#519

Reviewed By: igfox

Differential Revision: D30047652

fbshipit-source-id: b2a4b2542455e43798a8c0b4606be88bcb00f823
…earch#517)

Summary:
Pull Request resolved: facebookresearch#517

If we don't enable `single_selection`, we should find a way to calculate the next slate value from all Q-values of all items on the next slate.

The default way to calculate that is to sum up all the next Q-values and average by the slate_size of the next slate.

Another option is to average by the current slate_size instead.

This wasn't an issue before, since the slate_size was fixed, but after we landed D29848923 (facebookresearch@5351f63) the slate size can vary.

We're not sure theoretically if either averaging method is better, so we propose this diff to allow configuring that.

The new option is called `next_slate_value_norm_method` and it can take:

- `"norm_by_current_slate_size"`: sum the next slate Q-values and average by the **current** slate size;
- `"norm_by_next_slate_size"`: sum the next slate Q-values and average by the **next** slate size;
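The two options can be sketched as follows (hypothetical helper, not the actual implementation):

```python
def next_slate_value(next_q, next_mask, cur_mask, method):
    """Sum the next slate's Q-values and normalize by the configured slate size.

    next_q:    per-item Q-values of the next slate.
    next_mask: 1/0 validity mask for the next slate's items.
    cur_mask:  1/0 validity mask for the current slate's items.
    """
    total = sum(q * m for q, m in zip(next_q, next_mask))
    if method == "norm_by_next_slate_size":
        denom = sum(next_mask)   # average by the *next* slate size
    elif method == "norm_by_current_slate_size":
        denom = sum(cur_mask)    # average by the *current* slate size
    else:
        raise ValueError(f"unknown next_slate_value_norm_method: {method}")
    return total / denom
```

The two methods only differ when the current and next slates have different numbers of valid items.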

cc achechetka solofsson

Reviewed By: czxttkl

Differential Revision: D29986728

fbshipit-source-id: f178b5da1462e4f9cc6995367ed229ab958c477a
Summary:
Currently our recurring training does not cache model outputs instance-to-instance. This means we can't do things like ensure action frequencies don't change too much across deployments.

This diff adds the ability to save past training outputs into a result_history (which can be initialized from external runs), a fixed-size deque. These result_histories are passed to each model validator, though currently they are unused.
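A fixed-size result history of this kind is straightforward with `collections.deque`; a sketch (class and attribute names are illustrative, not the ReAgent API):

```python
from collections import deque

class ResultHistory:
    """Fixed-size history of past training outputs; oldest entries are evicted first."""

    def __init__(self, max_size, initial=()):
        # Seed from external runs if provided; maxlen enforces the fixed size.
        self.results = deque(initial, maxlen=max_size)

    def append(self, result):
        # When full, deque silently drops the oldest entry.
        self.results.append(result)
```

A validator could then, for example, compare the latest action frequencies against the previous entries before approving a deployment.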

Reviewed By: kittipatv

Differential Revision: D30024188

fbshipit-source-id: d5c14dce310ec5e34f7539f7c1a3942eabd79553
Differential Revision: D30120143

fbshipit-source-id: 6a970483211b768fc4439d620ed912825fc1f84b
Summary:
Pull Request resolved: facebookresearch#521

It's no longer used

Reviewed By: kittipatv

Differential Revision: D30133029

fbshipit-source-id: d76dfdcd7ad8874f69d24870f1421d5cb4182827
Summary: The feature importance is based on data perturbation

Reviewed By: kaiwenw

Differential Revision: D30136375

fbshipit-source-id: 94ae08f530925023bb54cee1be8d1f246ae01fca
…straint) regularization (facebookresearch#522)

Summary:
Pull Request resolved: facebookresearch#522

(1) Add coefficient "beta" (regularization factor of policy constraint regularization) on CRR's objective.
(2) Change default value for "DEFAULT_MAX_UNIQUE_ENUM" from 100 to 10

Reviewed By: czxttkl

Differential Revision: D30183147

fbshipit-source-id: 1c18610678482397bdd669fd064a27d34967881f
…nce (facebookresearch#523)

Summary:
Pull Request resolved: facebookresearch#523

as titled

Reviewed By: kaiwenw

Differential Revision: D30182710

fbshipit-source-id: 558b6d7093ea23b6ffb23387b7f48e873013d373
Summary:
Pull Request resolved: facebookresearch#525

ARS utility functions for reagent

Reviewed By: czxttkl

Differential Revision: D30005486

fbshipit-source-id: d7802537bc10cb518cd2d09333681f563f06534f
Summary: fix random seed issue to make sure test is stable

Reviewed By: igfox

Differential Revision: D30314903

fbshipit-source-id: 74fa948cb3a54398d925e779b74d7912a62e64c6
Summary:
Pull Request resolved: facebookresearch#520

Adds dedicated unit test for PPO Trainer, additionally:
- Fixes a bug with fully connected value net
- Fixes some bugs in PPO training around using value net
- Adds possible_action_mask to DuelingQNetwork

Reviewed By: czxttkl

Differential Revision: D30114686

fbshipit-source-id: 3735af1ea65429867d63f7da1462194242ad8254
Differential Revision:
D30114686 (facebookresearch@8d00eb1)

Original commit changeset: 3735af1ea654

fbshipit-source-id: 905ff36cdf587565487b8ad2e623c3cfbd77effc
…evision.

Summary: ^

Reviewed By: yifuwang

Differential Revision: D30346110

fbshipit-source-id: 154e69f233132635e947ddbd252ffbd957ead6f1
Summary:
Pull Request resolved: facebookresearch#526

Adds dedicated unit test for PPO Trainer, additionally:
- Fixes a bug with fully connected value net
- Fixes some bugs in PPO training around using value net
- Adds possible_action_mask to DuelingQNetwork

Note: a continuation of D30114686 (facebookresearch@8d00eb1), which I reverted after it caused some CircleCI failures

Reviewed By: czxttkl

Differential Revision: D30342897

fbshipit-source-id: 9be5e86d234619e97e476e46556a4dee07e3b734
Reviewed By: czxttkl

Differential Revision: D29880479

fbshipit-source-id: 61c241d5570c7b81567974c50068a672e6058278
Summary: Implement DiscreteDqnDataModule as an AutoDataModule.

Reviewed By: czxttkl

Differential Revision: D29835012

fbshipit-source-id: 384413ac3d61cd52285c6a860cff0e0f15e299e0
Summary: Update SAC to support ID-list features

Reviewed By: czxttkl

Differential Revision: D29880917

fbshipit-source-id: b7be1b7727a1749af38e1640d192b15c1b7608d1
Summary:
Pull Request resolved: facebookresearch#527

ActorPredictorUnwrapper takes state features as positional arguments, not ServingFeatureData.

Reviewed By: igfox

Differential Revision: D30428162

fbshipit-source-id: aaa7307cef35200545478c621b7cb3fe9a1f4eea
Summary: Diff D30342897 (facebookresearch@9b25610) swapped out uses of FullyConnected (which takes a tensor as input) with FloatFeatureFullyConnected (which takes FeatureData as input). This broke an assumption made in the predictor wrapper.

Reviewed By: kittipatv

Differential Revision: D30432700

fbshipit-source-id: 732eda23f97cb21f094daed6857fb44dc49316b3
Summary: When `feature_type` is given, the parameters for the Box-Cox transformation are not computed. That causes an error when we try to instantiate the normalization.

Reviewed By: igfox

Differential Revision: D30437833

fbshipit-source-id: 2e9c25a28e6d9cfe85670eb3b6714668f4cefff6
Summary:
Pull Request resolved: facebookresearch#528

Simplify the comments

Reviewed By: xuruiyang

Differential Revision: D30343990

fbshipit-source-id: 8b3c4172a4af9e01c27e8e511486bed68c1032b5
Differential Revision: D30514142

fbshipit-source-id: 4e9d8facc613e67d7806a26a170c8b7545a1c742
Summary:
- Pass down action_names to DiscreteDqnDataModule
- Fix action query formatting

Reviewed By: czxttkl

Differential Revision: D30527307

fbshipit-source-id: b9128b3708dc922f774d7fa97d041c0e16df1088
Summary:
Pull Request resolved: facebookresearch#529

Using torch::jit::load has some privacy issues: T93507328.
Instead, we're supposed to load the model as a caffe2:PyTorchPredictorContainer and then extract the pytorch module.

Reviewed By: kittipatv

Differential Revision: D30285801

fbshipit-source-id: b13330d5a27eec943a46fe13a2be4c203e2e993c
Summary:
Steps:
1. Run the scorer (learned CRR actor/Q-network) on the whole evaluation dataset and get a list of scores (for action 1).
2. Find the percentiles of these scores. This determines a threshold for, e.g., 60% promo.
3. Use this threshold to construct a new predictor wrapper, which outputs 1/0.
4. Replace the original wrapper with this new wrapper, at the same manifold path.

The validator takes a parameter that is null by default. If not specified, the promo threshold is set s.t. the promo ratio matches the dataset.
If specified, the promo threshold is set s.t. the promo ratio equals the specified percentile.
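Steps 2-3 above can be sketched as follows (pure Python with hypothetical helper names; the real predictor wrapper operates on tensors):

```python
def promo_threshold(scores, promo_ratio):
    """Pick a score threshold s.t. roughly `promo_ratio` of scores clear it."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(promo_ratio * len(ranked)))  # how many scores should pass
    return ranked[k - 1]

def binary_wrapper(scores, threshold):
    """The replacement predictor: outputs 1 if the score clears the threshold, else 0."""
    return [1 if s >= threshold else 0 for s in scores]
```

For example, with ten evenly spaced scores and `promo_ratio=0.6`, the threshold lands at the 60th-percentile score and exactly six items receive a 1.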

Reviewed By: DavidV17

Differential Revision: D30584956

fbshipit-source-id: 310f91bc25470904dfcf1b8b6455376334d2a8f0
alexnikulkov and others added 28 commits August 24, 2022 10:50
…ch#675)

Summary:
Pull Request resolved: facebookresearch#675

Add dtype conversion to reagent CB batch preprocessor
Without this some elements were `double` instead of `float`

Reviewed By: czxttkl

Differential Revision: D38807573

fbshipit-source-id: d5120eb0555cba85e5e3a1d9d13ed9c8d43d8363
…cebookresearch#678)

Summary:
Pull Request resolved: facebookresearch#678

Update reagent fully_connected_network.py to comply with fx trace

Reviewed By: czxttkl

Differential Revision: D38748880

fbshipit-source-id: 3f8f5a85504591e0d7c1bbfbb1bff418502d3bb7
Summary:
Pull Request resolved: facebookresearch#676

Interaction features weren't used by any of the first 3 users of LinUCB, so I'm removing them for simplicity

Reviewed By: czxttkl

Differential Revision: D38807570

fbshipit-source-id: 2360fae3e931847e46121cb9d51cad7c25b13f21
Summary: Use the same device for constant tensor as input tensor to avoid concatenating tensors on separate devices

Reviewed By: czxttkl

Differential Revision: D39020305

fbshipit-source-id: c042e337591744f0c97cac0d0791763333f38108
Summary:
See N2441818 for more details!

Applying discounted factor to Disjoint LinUCB, algo follows paper: https://arxiv.org/pdf/1909.09146.pdf.

This brings two benefits:
- When incremental training is run for a long time, covariance matrix A = \sum x x^T and b = \sum x y won't explode
- This algorithm is also optimal if the LinUCB model is non-stationary, i.e., the true underlying linear model may change over time due to, for example, users' interest shift.

Concerns:
- The exploration quantity in this new algo requires matrix multiplication inv_A * A_tilde * inv_A every update, which is expensive.

Solutions:
- We can consider a simplified version of the above algorithm: keep A = \sum \gamma x x^T and use \sqrt ( x^T inv_A x ) as the exploration quantity instead of \sqrt ( x^T inv_A * A_tilde * inv_A x ). This significantly simplifies the algorithm's updates and predictions.
- The above simplified solution is shown to achieve similar / a little better performance in a simulation study: See N2441818
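The simplified variant can be illustrated in one dimension, where the covariance "matrix" reduces to a scalar (a toy sketch under assumed names; the real model is multi-dimensional and keeps per-arm state):

```python
import math

def discounted_linucb_scores(data, gamma=0.9, alpha=1.0, lam=1.0):
    """Simplified discounted LinUCB with a single scalar feature.

    A = lam (ridge term) plus the gamma-discounted sum of x*x stays bounded
    (roughly lam + 1/(1-gamma) for |x| <= 1), so A and b no longer explode
    over long incremental-training runs.
    """
    A, b = lam, 0.0
    scores = []
    for x, y in data:
        theta = b / A  # point estimate, i.e. A^-1 b in one dimension
        # Simplified exploration bonus: sqrt(x^T inv_A x)
        scores.append(theta * x + alpha * math.sqrt(x * x / A))
        A = gamma * A + x * x  # discount old evidence, then add the new point
        b = gamma * b + x * y
    return scores
```

With constant data (x = y = 1), theta converges to 1 and A converges to about 1/(1-gamma), so the UCB score settles near 1 + sqrt(1-gamma) instead of shrinking to zero as it would without discounting.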

Reviewed By: alexnikulkov

Differential Revision: D39157916

fbshipit-source-id: 0af7303f0d826851e71278c7e135f8a241943f48
Summary:
Pull Request resolved: facebookresearch#681

Dtype should be specified when creating a tensor

Differential Revision: D39325426

fbshipit-source-id: c06843141182b90fc082c9b0a3c79b9015e57d94
Summary:
Remove the if logic in the forward function, since it's blocking model publish.

We also need to fix the unit tests affected by removing these two lines.

Reviewed By: zxpmirror1994

Differential Revision: D39704111

fbshipit-source-id: 0df8c25bc320a1ed81e08a164a29a3467fcabdfa
Summary:
Pull Request resolved: facebookresearch#683

Follow up to https://fb.workplace.com/groups/1013818346200497/posts/1141315933450737

To avoid test failures like
```
ERROR: test_configure_optimizers (reagent.test.training.test_qrdqn.TestQRDQN)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/reagent/training_tests#binary,link-tree/reagent/test/training/test_qrdqn.py", line 179, in test_configure_optimizers
    optimizers = trainer.configure_optimizers()
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/reagent/training_tests#binary,link-tree/reagent/training/qrdqn_trainer.py", line 84, in configure_optimizers
    self.q_network_optimizer.make_optimizer_scheduler(
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/reagent/training_tests#binary,link-tree/reagent/optimizer/union.py", line 62, in make_optimizer_scheduler
    return self.value.make_optimizer_scheduler(params)
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/reagent/training_tests#binary,link-tree/reagent/optimizer/optimizer.py", line 72, in make_optimizer_scheduler
    filtered_args = {
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/reagent/training_tests#binary,link-tree/reagent/optimizer/optimizer.py", line 73, in <dictcomp>
    k: getattr(self, k)
AttributeError: 'Adam' object has no attribute 'fused'
```

Reviewed By: gji1

Differential Revision: D39750850

fbshipit-source-id: eabf013bbd07b56d6c3b7073fb5e2d8839280b26
…-friendly (facebookresearch#682)

Summary:
Pull Request resolved: facebookresearch#682

The "if" statement in the forward pass causes a warning during tracing. It will be safer to use a separate `_forward_no_coefs_check()` method for the forward pass of a trained model. In this method we don't check the validity of the coefficients; instead, they are assumed to be valid because we compute them at the end of the training epoch.

The behavior of existing methods/properties should remain unchanged.

Reviewed By: czxttkl

Differential Revision: D39692331

fbshipit-source-id: bcb2a705345ad34e7ab1574c21955dd111267de9
Summary:
Pull Request resolved: facebookresearch#684

1. We train LinUCB models on two hours (7am and 8am) separately and get covariance matrices A1 and A2 for each.
2. We train a LinUCB model at 7am, enable recurring so it launched a recurring training at 8am. Then we get covariance matrix A.

We found that A is around 8*A1 + A2 instead of A1 + A2, where we used NPROC_PER_NODE = 8 when training.

The reason is the following:

The current implementation aggregates the A and b calculated in each training process. However, if we do recurring training, then in the 2nd training run it will aggregate the previous A NPROC_PER_NODE times and then add the new A aggregated from the current epoch.

This diff fixes the above issue.
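The effect can be illustrated with a scalar standing in for the covariance matrix (hypothetical numbers; the real code all-reduces tensors across processes):

```python
NPROC_PER_NODE = 8
A1 = 5.0                           # aggregated covariance from the previous run
deltas = [0.25] * NPROC_PER_NODE   # each worker's new contributions; A2 = 2.0

# Buggy: each worker all-reduces its *cumulative* state (A1 + its delta),
# so the previous A is counted once per worker, giving 8*A1 + A2.
buggy = sum(A1 + d for d in deltas)

# Fixed: all-reduce only the per-worker deltas, then add the previous A once.
fixed = A1 + sum(deltas)
```

Here `buggy` comes out to 42.0 (= 8*5 + 2) while `fixed` is the intended 7.0 (= A1 + A2).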

Differential Revision: D39902734

fbshipit-source-id: 58ab1fade9feebd6b0419b18dd085a3944736f23
…and current_epoch (facebookresearch#686)

Summary:
Pull Request resolved: facebookresearch#686

A copy of D39902734 for joint LinUCB

Differential Revision: D39929408

fbshipit-source-id: 81983d9fc648fd3902a2d8b045ab368a240bac6d
Summary:
Pull Request resolved: facebookresearch#685

Mostly a copy of D39157916 for Joint LinUCB model

Differential Revision: D39929407

fbshipit-source-id: 7566c00641f1b467c4d92a25d213ac7b0c9b4bbd
Differential Revision: D40343271

fbshipit-source-id: ee93ef1d4e750c70e0260822172b5657f340d342
Summary:
Pull Request resolved: facebookresearch#656

fixed acquisition functions for Bayes by Backprop and some general clean (comments, functions, etc)

Reviewed By: czxttkl

Differential Revision: D36470135

fbshipit-source-id: daa23d5b86e764ee4df327df583a4c4d4d621347
Summary:
Pull Request resolved: facebookresearch#687

added comments for MLPEnsebler optimizer

Reviewed By: czxttkl

Differential Revision: D40538571

fbshipit-source-id: ca55dd9be3889ecc0965c8b32bb229092e63d6dd
Differential Revision: D40790977

fbshipit-source-id: 1890df9ba16ce3853b52851e00ba07dfcb3a7f6f
Summary: Fix fx.wrap caused model script issue

Reviewed By: luofangyuyx, sijiac

Differential Revision: D40625493

fbshipit-source-id: dd77d842de1c703044b05fee6f28368d766f7910
Summary: fix jit.script issue after fx.wrap

Reviewed By: sijiac

Differential Revision: D40625820

fbshipit-source-id: 6fe08daf3f25ba3cf38b418069abff189ab2dce6
Summary:
Pull Request resolved: facebookresearch#690

In GPU mode, the function will raise an error, since tensors created by torch.ones or torch.zeros are on the CPU while the rest of the tensors are on the GPU.

Reviewed By: alexnikulkov

Differential Revision: D41062175

fbshipit-source-id: 27a4be58804f72f258476c749c64731de157c7f2
Summary:
X-link: pytorch/pytorch#88701

X-link: meta-pytorch/tnt#269

Pull Request resolved: facebookresearch#691

X-link: meta-pytorch/torchsnapshot#129

X-link: meta-pytorch/torchrec#799

X-link: facebookresearch/detectron2#4649

Context in https://fburl.com/4irjskbe

This change deletes distributed.pyi, so that lintrunner will run mypy on distributed.py for typing check.
It also helps fix a lot of pyre false alarms, so that we can remove those pyre suppressions.

Reviewed By: zhaojuanmao

Differential Revision: D41028360

fbshipit-source-id: 577f212ca4c47e23a8577c4b92385c483f96e2c1
Summary:
Pull Request resolved: facebookresearch#692

Fixing a bug introduced in D41062175

Reviewed By: qfettes

Differential Revision: D41164406

fbshipit-source-id: d6bac862ea37fd9a807f6a344ddd8e6cb0b31c45
…h#693)

Summary: Pull Request resolved: facebookresearch#693

Reviewed By: BerenLuthien

Differential Revision: D41226452

fbshipit-source-id: 57c384670f2fd28c56a9554581a07289dc217868
Summary:
Pull Request resolved: facebookresearch#697

I recently exposed LRScheduler as a public endpoint, as that is the right direction for users. This diff adds LRScheduler as a torch lr scheduler, which it is.

Would fix test errors such as https://www.internalfb.com/intern/testinfra/diagnostics/281475249445597.562950030008200.1668085134/, which were introduced by my landing of D41109279.

Created from CodeHub with https://fburl.com/edit-in-codehub

Reviewed By: czxttkl

Differential Revision: D41187073

fbshipit-source-id: 2637b6a80247c24620cf0ce8310e8181135637cd
Summary:
Pull Request resolved: facebookresearch#695

Add Offline Evaluation for non-stationary Contextual Bandit policies.
This diff includes only the Policy Evaluator algorithms from the LinUCB paper: https://arxiv.org/pdf/1003.0146.pdf (Algorithm 3)
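Algorithm 3 of that paper is the replay evaluator: keep only the logged rounds where the evaluated policy agrees with the (uniformly random) logging policy, and average their rewards. A minimal sketch (assumed names, not the ReAgent implementation):

```python
def replay_evaluate(logged_data, policy):
    """Offline policy evaluator in the spirit of Algorithm 3 of the LinUCB paper.

    logged_data: iterable of (context, logged_arm, reward) tuples collected
                 by a uniformly random logging policy.
    policy:      function mapping a context to the arm the evaluated policy picks.
    """
    total_reward, matches = 0.0, 0
    for context, logged_arm, reward in logged_data:
        if policy(context) == logged_arm:  # keep only agreeing rounds
            total_reward += reward
            matches += 1
    return total_reward / matches if matches else 0.0
```

Because the logging policy is uniformly random, the average reward over the retained rounds is an unbiased estimate of the evaluated policy's reward.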

Reviewed By: BerenLuthien

Differential Revision: D41226450

fbshipit-source-id: 10fae8b9b0fb10d44d8ddf313938028585a94c07
…search#694)

Summary:
Pull Request resolved: facebookresearch#694

`BaseCBTrainerWithEval` integrates Offline Eval into the training process. By default the behavior is the same as before the refactor, but after the `.attach_eval_module()` method gets called, every batch is processed by the eval module before we train on it. The processing includes keeping track of the reward and filtering the training batch.
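The control flow can be sketched as follows (class and method names other than `attach_eval_module` are assumptions, not the ReAgent API):

```python
class BaseCBTrainerWithEvalSketch:
    """Trainer whose batches optionally pass through an offline-eval module."""

    def __init__(self, train_fn):
        self.train_fn = train_fn
        self.eval_module = None  # no eval module attached: behavior unchanged

    def attach_eval_module(self, eval_module):
        self.eval_module = eval_module

    def train_step(self, batch):
        if self.eval_module is not None:
            batch = self.eval_module.ingest(batch)  # track reward, filter batch
        return self.train_fn(batch)


class KeepMatchingArmsEval:
    """Toy eval module: keeps rows where the eval policy matches the logged arm."""

    def __init__(self, policy):
        self.policy = policy
        self.total_reward = 0.0

    def ingest(self, batch):
        kept = [r for r in batch if self.policy(r["context"]) == r["arm"]]
        self.total_reward += sum(r["reward"] for r in kept)
        return kept
```

The trainer only ever sees the filtered batch, while the eval module accumulates the reward of the retained rows.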

Reviewed By: BerenLuthien

Differential Revision: D41239491

fbshipit-source-id: f5c506d14a736a71ddc1b64270d1e8842a23488b
Summary:
Pull Request resolved: facebookresearch#700

In addition to number of observations, keep track of total weight of consumed data

Reviewed By: BerenLuthien

Differential Revision: D41483276

fbshipit-source-id: fab4c95455d7ef611706b9356ffaf3416adc1d6d
Differential Revision: D41989361

fbshipit-source-id: b6ae2f41136bea32201ddf9ec0f0cf8680cf270c
Summary: When some arms might be missing, apply a masked softmax to model scores during offline eval to avoid selecting the missing arms.

Differential Revision: D41990957

fbshipit-source-id: feeba5e1b4ede5e50467b50e58be2361b3858610