Hello, I'd like to ask for clarification: in the DeepseekV2AttentionMLA implementation in sglang, is it correct that the prefill phase does not use the KV cache, and that the KV cache is only used in the decode phase? Also, when using MLA, is it advisable to avoid enabling the mixed-chunk feature? Thank you for your guidance.
Replies: 1 comment 1 reply
Prefill does not use the KV cache. Extend and decode do use the KV cache.
Yes. Prefill and decode have different compute and memory-access characteristics, so we optimize them with different forward logic. It is better to run them in separate batches.
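To make the distinction concrete, here is a minimal toy sketch in plain Python. It is not sglang's code and does not model MLA's latent compression; `ToyKVCache`, `causal_attention`, and `forward_toy` are hypothetical names made up for illustration. It assumes that "use the KV cache" means "read previously cached keys/values", and that prefill still writes its keys/values so later steps can read them.

```python
import math


def causal_attention(queries, keys, values, q_offset):
    """Toy single-head causal attention on plain lists of float vectors.

    Query i sits at absolute position q_offset + i and may attend to
    keys[0 : q_offset + i + 1].
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        visible = q_offset + i + 1
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys[:visible]]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values[:visible])) / total
            for d in range(dim)
        ])
    return outputs


class ToyKVCache:
    """Hypothetical stand-in for a per-request KV cache."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, keys, values):
        self.keys.extend(keys)
        self.values.extend(values)


def forward_toy(mode, q, k, v, cache):
    """Dispatch on a hypothetical forward mode: 'prefill', 'extend', or 'decode'."""
    if mode == "prefill":
        # No cached prefix is read: attend over the new tokens only.
        out = causal_attention(q, k, v, q_offset=0)
        cache.append(k, v)  # assumption: K/V are still stored for later steps
        return out
    if mode in ("extend", "decode"):
        if mode == "decode":
            assert len(q) == 1, "decode processes one new token per request"
        prefix_len = len(cache.keys)
        cache.append(k, v)
        # Cached prefix plus new tokens are read here.
        return causal_attention(q, cache.keys, cache.values, q_offset=prefix_len)
    raise ValueError(f"unknown mode: {mode}")


# Usage: prefill two tokens, then decode a third against the cache.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full = forward_toy("prefill", tokens, tokens, tokens, ToyKVCache())

cache = ToyKVCache()
forward_toy("prefill", tokens[:2], tokens[:2], tokens[:2], cache)
step = forward_toy("decode", tokens[2:], tokens[2:], tokens[2:], cache)
assert all(abs(a - b) < 1e-9 for a, b in zip(step[0], full[2]))
```

In this sketch the prefill path touches only the tensors of the current batch, while decode issues one query per request against a growing cache. That difference in access pattern is one way to picture why the two phases might be given separate forward logic and separate batches.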