I was trying to modify the TRT-LLM code to add something my model needs that is not implemented yet, and I got quite confused by the number of buffer classes that exist. Can you please explain how they are structured? Here is my current understanding, with questions (see the sketch after the list for my mental model):
Runtime buffers - used for storing the values bound to the TRT engine inputs and outputs.
Decoder buffers - if I understand correctly, "decoder" here does not mean an actual transformer decoder, but rather the sampler that samples discrete tokens from the model's logits and does all the beam-search logic.
Decoder inputs and outputs - I still don't understand why we need these if we already have the decoder buffers.
Encoder buffers - why do we need these as well, if the encoder runs in a separate model instance which has its own runtime buffers?
Req slots - what are those?
Also, why does the executor instance hold multiple sets of buffers that split the work into some "mini batches"?
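To make the question concrete, here is a minimal C++ sketch of my current mental model. The top-level names roughly follow the classes I saw while reading the code, but every member, comment, and the guesses about req slots and mini batches are my own assumptions, not the actual TRT-LLM implementation:

```cpp
// Hypothetical sketch of how I currently picture the buffer hierarchy.
// The class names roughly match what I saw in the batch_manager sources,
// but every member and its stated purpose is my guess, not the real layout.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for an ITensor / device buffer in the real code.
struct Tensor {
    std::vector<int64_t> shape;
};

// (1) Runtime buffers: one tensor per TRT engine input/output binding.
struct RuntimeBuffers {
    std::map<std::string, Tensor> engineInputs;   // e.g. input_ids
    std::map<std::string, Tensor> engineOutputs;  // e.g. logits
};

// (2) Decoder buffers: state for the "decoder", which as far as I can tell
// is really the sampler: it turns logits into discrete tokens and keeps
// beam-search state; it is not a transformer decoder.
struct DecoderBuffers {
    Tensor newTokens;      // tokens sampled this step
    Tensor cumLogProbs;    // per-beam cumulative log-probs for beam search
    Tensor finishedFlags;  // which sequences/beams are done
};

// (3) Decoder inputs/outputs: per-step views passed into and out of the
// sampler; unclear to me why these exist separately from DecoderBuffers.
struct DecoderInput  { Tensor logits; };
struct DecoderOutput { Tensor tokens; };

// (4) Encoder buffers: also unclear, since the encoder runs as a separate
// model instance with its own RuntimeBuffers.
struct EncoderBuffers {
    Tensor encoderOutput;  // cross-attention input for the decoder engine?
};

int main() {
    // (5) My guess at "req slots": a fixed pool of per-request slots, where
    // a slot index identifies a request's state across iterations.
    std::vector<int> reqSlots(64, -1);  // -1 = free slot

    // (6) The executor seems to keep several RuntimeBuffers instances and
    // split active requests into "mini batches", one buffer set each; I
    // don't understand why (overlapping copy and compute? pipelining?).
    std::vector<RuntimeBuffers> perMiniBatchBuffers(2);
    (void)reqSlots;
    (void)perMiniBatchBuffers;
    return 0;
}
```

If this picture is wrong anywhere, corrections to the sketch would already answer most of my questions.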