Single 120b model or many 20b models? #46
turt2live
started this conversation in
gpt-oss-safeguard Implementation
Replies: 1 comment 1 reply
hey hey - Overall, the 20B is well suited for your use case, especially since you want a lot of room to scale (60x peak). The only thing I'd recommend is to define your policy in a more well-defined way, i.e. have clear cases of what's acceptable vs. what isn't, instead of overly broad language! From a cost perspective it's much more efficient to deploy the 20B; you can always scale up to 120B if the current model can't handle your safety complexities.
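To make the "clear cases of what's acceptable vs. what isn't" suggestion concrete, here's a minimal sketch of structuring the policy as an explicit system prompt for an OpenAI-compatible chat endpoint (e.g. vLLM serving the 20B). The policy text, labels, model name, and request shape are illustrative assumptions, not the project's actual policy:

```python
# Hypothetical sketch: classify a chat message against an explicit policy.
# Policy text, labels, and model name are illustrative assumptions.

POLICY = """\
# Room Content Policy

## ALLOWED (label: 0)
- Ordinary conversation, questions, and technical discussion.
- Criticism of ideas or software, even if heated.

## DISALLOWED (label: 1)
- Direct threats or harassment targeting a person.
- Instructions for clearly illegal activity.

Return only the label (0 or 1).
"""

def build_request(message: str, model: str = "gpt-oss-safeguard-20b") -> dict:
    """Build a chat-completions payload: the policy goes in the system
    prompt, and the message to classify is the user turn."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": POLICY},
            {"role": "user", "content": message},
        ],
        "temperature": 0.0,  # deterministic labels for moderation
    }

payload = build_request("let's grab coffee after the meetup")
print(payload["messages"][0]["role"])  # the system turn carries the policy
```

The enumerated ALLOWED/DISALLOWED cases with a fixed label format are what keep the model from having to interpret overly broad language.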
Hey all, we're continuing to experiment with gpt-oss-safeguard and have near-zero experience with running models ourselves. Our use case is to deploy the model in an online chat scenario as an evaluator of message content when our other filters aren't sure whether content is allowable in a room. We'd be starting with a few rooms at first, but looking to expand to more rooms as we build confidence in safeguard's accuracy (and our policy's content). This would be less than 1Hz of traffic hitting the model at first, but would grow to about 3-5Hz over time (possibly 60Hz+ at full scale).
It's our understanding that the 120b model is more accurate than the 20b model but has higher latency, though we're not sure what that means in practice. We're likely only going to get a single GPU to experiment with at first, so the question is: should we start our experimentation with a single 120b model, or deploy a few 20b models on that GPU?
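For context, a back-of-envelope sketch of the token throughput those request rates imply (the per-request token counts here are guesses, not measurements):

```python
# Rough capacity arithmetic; per-request token counts are guessed assumptions.

def tokens_per_second(req_per_s: float, prompt_tokens: int, output_tokens: int) -> float:
    """Total tokens/s the server must process at a given request rate."""
    return req_per_s * (prompt_tokens + output_tokens)

# Assume ~1500 prompt tokens (policy + message) and ~200 output tokens
# (reasoning + label) per moderation request.
for rate in (1, 5, 60):
    print(f"{rate} req/s -> {tokens_per_second(rate, 1500, 200)} tokens/s")
```

Whichever model fits the token throughput at the target rate, within acceptable per-request latency, would drive the 20b-vs-120b choice.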
We're moderately inclined to try deploying multiple 20b models, but would like an informed opinion before we start pressing buttons :)
If interested, our use of safeguard feels pretty standard, though we're embedding it in a highly domain-specific area: matrix-org/policyserv#59