-
Notifications
You must be signed in to change notification settings - Fork 331
π§βπ³ Added Post training an VLM for reasoning with GRPO using TRL
recipe
#312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Check out this pull request onΒ See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
View / edit / reply to this conversation on ReviewNB ariG23498 commented on 2025-07-23T06:15:39Z I think the title should be "Post training a VLM for reasoning with GRPO using TRL" instead of "an VLM"
"In this recipe, we'll demonstrate how to post-train a" instead of "post-training".
We should have "Vision Language Model (VLM)" spelled out right at the beginning somewhere and then use "VLM" |
View / edit / reply to this conversation on ReviewNB ariG23498 commented on 2025-07-23T06:15:40Z Question: Do we still use the |
View / edit / reply to this conversation on ReviewNB ariG23498 commented on 2025-07-23T06:15:41Z Line #4. processor = AutoProcessor.from_pretrained(model_id, use_fast=True, padding_side="left") Is there a particular reason to put sergiopaniego commented on 2025-07-28T16:00:38Z Yes! It's needed so the generations during training are concatenated directly to the input. Otherwise, we could have [PAD] gaps between the input and the generation. I have added a line explaining that since it's relevant :) Thanks for pointing it out!! |
View / edit / reply to this conversation on ReviewNB ariG23498 commented on 2025-07-23T06:15:41Z Line #19. {"type": "image"}, I think we should also add the image to this dictionary. Something like the following:
{"type": "image", "image": example["image"]} |
View / edit / reply to this conversation on ReviewNB ariG23498 commented on 2025-07-23T06:15:42Z Line #11. # Parameters that control de data preprocessing
|
Yes! It's needed so the generations during training are concatenated directly to the input. Otherwise, we could have [PAD] gaps between the input and the generation. I have added a line explaining that since it's relevant :) Thanks for pointing it out!! View entire conversation on ReviewNB |
Ready for review! We can see that the reward goes up below: |
@@ -0,0 +1,3392 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sentence is a bit inverted:
For our particular case where we want the model to learn to reason using images, we use as inputΒ image
Β andΒ problem
Β and as outputΒ solution
Β columns.
Reply via ReviewNB
@@ -0,0 +1,3392 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,3392 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
traininig* in last sentence
also more explanation on these params would be nice, i.e. what hardware limitations, for reasoning what could be the most important etc
Reply via ReviewNB
@@ -0,0 +1,3392 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in here we only see train loss and not reward, if we can't enable reward we could mention slightly as loss is a bit odd
Reply via ReviewNB
Thanks a lot for the comments, super valuable for improvement β€οΈ! Recipe improved based on feedback π |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once @merveenoyan's comments are addressed, it is okay to be merged.
This is a very nice recipe. Kudos on the work!
What does this PR do?
Fixes #311