Single controller hackathon smd #149
Merged
wensun merged 74 commits into single-controller-hackathon from single-controller-hackathon-smd on Sep 3, 2025
Commits (74)
c5f4b5c  .  (wensun)
60f016a  .  (wensun)
50b3a70  .  (wensun)
bed1179  .  (wensun)
c69e228  .  (wensun)
665b80c  .  (wensun)
8ae2b01  .  (wensun)
752995b  .  (wensun)
83e8a8f  .  (wensun)
4a3a1ac  .  (wensun)
0c45113  .  (wensun)
689fd19  .  (wensun)
f27e5e8  .  (wensun)
3cae5d1  .  (wensun)
3ad15d3  .  (wensun)
3faab3c  .  (wensun)
70730bc  .  (wensun)
d309886  .  (wensun)
9090acd  .  (wensun)
a1d5dfe  .  (wensun)
808dcd1  .  (wensun)
746310a  .  (wensun)
6434316  .  (wensun)
b578daf  .  (wensun)
fa59b25  .  (wensun)
1f6af37  .  (wensun)
1a8adc8  .  (wensun)
8cb130e  .  (wensun)
b65384f  .  (wensun)
68b9bfb  .  (wensun)
ee71caf  start clean up  (wensun)
1e9d123  recreating wrong example beta string  (wensun)
2badf80  bug reproduced, revert back to the working version  (wensun)
918f6b7  creating second bebugging example: kl/policy_kl  (wensun)
b5f5da1  convert back to the correct version and check in the yaml for smd  (wensun)
0abe985  .  (wensun)
a93c8d5  addressed comments from bowen  (wensun)
f4d4cc5  .  (wensun)
4fa329a  add vllm logp and importance weight  (wensun)
e414dcd  .  (wensun)
603d748  .  (wensun)
35a365e  .  (wensun)
549428f  .  (wensun)
be53de6  .  (wensun)
e5f8164  .  (wensun)
2e159c0  .  (wensun)
49a451a  .  (wensun)
3799abb  .  (wensun)
83c4494  .  (wensun)
5572515  .  (wensun)
0536ad2  .  (wensun)
a42f45a  .  (wensun)
96bb36e  .  (wensun)
5690a5d  .  (wensun)
1d21901  .  (wensun)
472519c  .  (wensun)
e11b2a5  .  (wensun)
a693b1d  delete ray debug, not useful  (wensun)
fec31ba  remove some debug print  (wensun)
d810e04  .  (wensun)
8736cb0  first draft of decoupled ppo  (wensun)
3b6d7f2  .  (wensun)
e3212ef  .  (wensun)
ef2b7f4  .  (wensun)
f047c33  .  (wensun)
869923e  .  (wensun)
d6a9922  add importance weight option  (wensun)
28c10bb  .  (wensun)
c917afb  clean up logging  (wensun)
ed9e1f0  more comments  (wensun)
0fd0458  revert the yamls back but added importance weight  (wensun)
1641314  include all math evals  (wensun)
70b40a3  .  (wensun)
693d1f0  addressed comments from bowen  (wensun)