About MLA KV cache #4156

Answered by ispobock
FL77N asked this question in Q&A
Mar 7, 2025 · 1 comments · 1 reply
In the DeepseekV2AttentionMLA implementation within sglang, is it correct that the prefill phase does not utilize KV cache, and the decode phase is when KV cache is used?

Prefill does not use the KV cache. Extend and decode do use it.
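To make the distinction concrete, here is a toy sketch of the three modes. It is not sglang code: `Mode`, `ToyKVCache`, and `forward` are hypothetical names, and tokens stand in for the real key/value tensors. It assumes that prefill computes K/V for the whole prompt in one pass (so there is nothing to read back), while extend and decode attend over a cached prefix plus the new tokens.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Mode(Enum):
    PREFILL = auto()  # whole prompt, nothing cached yet
    EXTEND = auto()   # new prompt tokens on top of a cached prefix
    DECODE = auto()   # one new token per step


@dataclass
class ToyKVCache:
    # One entry per token already processed (stand-in for cached K/V).
    entries: list = field(default_factory=list)


def forward(mode, new_tokens, cache):
    """Return the tokens attended over, then store the new tokens in the cache."""
    if mode is Mode.PREFILL:
        # K/V for every token is produced in this pass, so nothing is read back.
        context = list(new_tokens)
    else:
        # EXTEND and DECODE read the cached prefix and add the new tokens.
        context = cache.entries + list(new_tokens)
    # Every mode writes its new tokens so that later steps can reuse them.
    cache.entries.extend(new_tokens)
    return context


cache = ToyKVCache()
ctx_prefill = forward(Mode.PREFILL, ["a", "b", "c"], cache)
ctx_extend = forward(Mode.EXTEND, ["d", "e"], cache)
ctx_decode = forward(Mode.DECODE, ["f"], cache)
```

In this sketch prefill still writes to the cache; it just never reads from it.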

Additionally, when employing MLA, would it be advisable to avoid enabling the mixed chunk feature?

Yes. Prefill and decode have different computation and memory access characteristics, so we optimize them with different forward logic. It's better to run them in separate batches.
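A rough back-of-the-envelope estimate shows why the two phases behave so differently. This is a simplified model of plain attention, not of sglang's MLA kernels: it counts only the two matmuls (QK^T and the weighted sum over V), ignores causal masking and softmax, and assumes separate K and V tensors. The function name and the example sizes are made up for illustration.

```python
def attention_flops_per_kv_element(q_len: int, kv_len: int, head_dim: int) -> float:
    """Approximate FLOPs performed per K/V element read, for one head."""
    # QK^T and the weighted sum over V each cost about 2 * q_len * kv_len * head_dim.
    flops = 4 * q_len * kv_len * head_dim
    # K and V each hold kv_len * head_dim elements that must be read.
    kv_elements = 2 * kv_len * head_dim
    return flops / kv_elements


# Prefill: every prompt token is a query, so each K/V element is reused many times.
prefill_intensity = attention_flops_per_kv_element(q_len=1024, kv_len=1024, head_dim=128)

# Decode: a single query token per step, so each K/V element is used once.
decode_intensity = attention_flops_per_kv_element(q_len=1, kv_len=1024, head_dim=128)

print(prefill_intensity, decode_intensity)  # 2048.0 2.0
```

Under these assumptions the ratio reduces to `2 * q_len`, so prefill tends to be compute-bound while decode tends to be bound by memory reads of the cache, which is one reason a kernel tuned for one is a poor fit for the other.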

Answer selected by FL77N