about MLA kv cache #4156

FL77N · 2025-03-07T01:55:56Z


Hello, I wanted to kindly ask for clarification: In the DeepseekV2AttentionMLA implementation within sglang, is it correct that the prefill phase does not utilize KV cache, and the decode phase is when KV cache is used? Additionally, when employing MLA, would it be advisable to avoid enabling the mix chunked feature? Thank you for your guidance.

Answered by ispobock


ispobock · 2025-03-07T14:14:48Z

Maintainer

In the DeepseekV2AttentionMLA implementation within sglang, is it correct that the prefill phase does not utilize KV cache, and the decode phase is when KV cache is used?

Prefill does not use the KV cache. Extend and decode do use it.
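As a rough illustration of that distinction, here is a minimal Python sketch. It is not sglang's code: `ForwardMode`, `visible_kv`, and the list-based cache are hypothetical names made up for this example, and it assumes "use" means "read from", since prefill still has to store its K/V entries so that later decode steps can find them.

```python
from enum import Enum, auto


class ForwardMode(Enum):
    # Hypothetical mode names for illustration only.
    PREFILL = auto()  # fresh prompt, no cached prefix
    EXTEND = auto()   # new tokens appended to an already cached prefix
    DECODE = auto()   # one new token per step


def visible_kv(mode, new_kv, cache):
    """Return the K/V entries attention reads in this step, then store new_kv."""
    if mode is ForwardMode.PREFILL:
        # Nothing useful is cached yet: attention reads only the fresh K/V.
        visible = list(new_kv)
    else:
        # Extend and decode read the cached prefix plus the fresh K/V.
        visible = cache + list(new_kv)
    # Assumption: every mode writes its new entries for later steps.
    cache.extend(new_kv)
    return visible


cache = []
prefill_view = visible_kv(ForwardMode.PREFILL, ["kv0", "kv1"], cache)
decode_view = visible_kv(ForwardMode.DECODE, ["kv2"], cache)
extend_view = visible_kv(ForwardMode.EXTEND, ["kv3", "kv4"], cache)
```

In this toy model only the extend and decode calls depend on what was stored earlier, which is the sense in which prefill "does not use" the cache.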

Additionally, when employing MLA, would it be advisable to avoid enabling the mix chunked feature?

Yes. Prefill and decode have different compute and memory-access characteristics, so we optimize them with different forward logic. It's better to run them in separate batches.
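To make "separate batches" concrete, here is a small Python sketch of the idea. It is only an illustration under stated assumptions, not sglang's scheduler: `split_batches` and the `(request_id, is_decode)` tuples are hypothetical and exist only for this example.

```python
def split_batches(requests):
    """Partition (request_id, is_decode) pairs into a prefill batch and a decode batch."""
    # Each batch can then be run through the forward path tuned for its phase,
    # instead of mixing both phases in one batch.
    prefill_batch = [rid for rid, is_decode in requests if not is_decode]
    decode_batch = [rid for rid, is_decode in requests if is_decode]
    return prefill_batch, decode_batch


pending = [("a", False), ("b", True), ("c", False)]
prefill_batch, decode_batch = split_batches(pending)
```

With the mix chunked feature left disabled, each batch stays single-phase in this way.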


FL77N Mar 10, 2025
Author

Thanks for your reply!
