@cctry
Collaborator

@cctry cctry commented Jan 28, 2026

Motivation

A chunked prefill request frees its slot in req_to_token_pool and is allocated a slot again when its next prefill chunk is prepared.

As a result, if a prefill batch contains multiple requests and req_to_token_pool is at capacity, writing the matched KV indices for another request can overwrite the slot of the chunked request while that slot is still being read on the forward stream.

Example

Prepare & launch prefill batch N:
    req A (first half) --> idx 1

model runner reads idx 1 (forward stream)

Prepare batch N+1:
    req A (second half) --> idx 2
    req B --> idx 1

scheduler writes req B's matched indices to idx 1 while batch N may still be reading it
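The sequence above can be replayed with a toy pool. This is a minimal sketch, assuming a FIFO free list (free() appends, alloc() pops from the front) and a dict standing in for the req_to_token rows. `OldStyleReqToTokenPool` and the KV index values are hypothetical, and the asynchronous forward-stream read is simulated by reading the row after the scheduler's write.

```python
from collections import deque


class OldStyleReqToTokenPool:
    """Hypothetical stand-in for req_to_token_pool before the fix."""

    def __init__(self, slot_ids):
        self.free_slots = deque(slot_ids)        # FIFO free list (assumption)
        self.rows = {i: [] for i in slot_ids}    # slot -> KV indices

    def alloc(self, n):
        return [self.free_slots.popleft() for _ in range(n)]

    def free(self, slot):
        self.free_slots.append(slot)


pool = OldStyleReqToTokenPool([1, 2])

# Batch N: req A (first half) -> idx 1, launched on the forward stream.
(a_slot,) = pool.alloc(1)
pool.rows[a_slot] = [100, 101, 102]   # A's KV indices (made-up values)
in_flight_slot = a_slot               # batch N's forward will read this row

# Old behaviour: the chunked request frees its slot after this chunk.
pool.free(a_slot)

# Prepare batch N+1: A (second half) and B each get a fresh slot.
a_slot2, b_slot = pool.alloc(2)
pool.rows[b_slot] = [7, 8]            # scheduler writes B's matched indices

# Simulated late read by batch N: it now sees B's indices, not A's.
print(a_slot2, b_slot, pool.rows[in_flight_slot])
```

With this free-list order, A's second half lands on idx 2 and B reuses idx 1, reproducing the overwrite in the example.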

Modifications

  1. alloc(reqs: list[Req]) - Now takes a list of requests, sets req.req_pool_idx directly, and reuses the slot if it is already set. cc @hnyls2002
  2. Separated free() from free_mamba_cache(req, ...) in HybridReqToTokenPool - free_mamba_cache only frees the Mamba state, not the request slot. cc @hanming-lu @yizhang2077
  3. release_kv_cache() - Now calls free(req) at the end and handles the case where only the Mamba state was freed early.
  4. Removed the free() calls from process_prefill_chunk and cache_finished_req.
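The new slot lifecycle can be sketched as follows. This is a minimal, hypothetical illustration of the semantics listed above, not the sglang implementation: `Req`, `ReqToTokenPoolSketch`, the list-based free list, and the string placeholder for Mamba state are all assumptions. A request that already holds `req_pool_idx` keeps it across chunks, `free_mamba_cache` drops only the Mamba state, and `free(req)` returns the slot once the request is done.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Req:
    # Hypothetical minimal request: only the fields this sketch needs.
    rid: str
    req_pool_idx: Optional[int] = None


class ReqToTokenPoolSketch:
    """Hypothetical pool illustrating alloc(reqs) / free(req) /
    free_mamba_cache(req); not the actual sglang classes."""

    def __init__(self, slot_ids: List[int]):
        self.free_slots = list(slot_ids)
        self.mamba_state = {}  # slot -> placeholder for per-request Mamba state

    def alloc(self, reqs: List[Req]) -> Optional[List[int]]:
        # Requests that already hold a slot (e.g. a chunked prefill request)
        # keep it; only requests without one draw from the free list.
        need = [r for r in reqs if r.req_pool_idx is None]
        if len(need) > len(self.free_slots):
            return None  # not enough free slots for this batch
        for r in need:
            r.req_pool_idx = self.free_slots.pop(0)
            self.mamba_state[r.req_pool_idx] = f"state-{r.rid}"
        return [r.req_pool_idx for r in reqs]

    def free_mamba_cache(self, req: Req) -> None:
        # Drops only the Mamba state; the request keeps its slot.
        self.mamba_state.pop(req.req_pool_idx, None)

    def free(self, req: Req) -> None:
        # Returns the slot itself; in this sketch it is meant to be called
        # once, at the end of release_kv_cache(), after the request finishes.
        if req.req_pool_idx is None:
            return
        self.free_mamba_cache(req)
        self.free_slots.append(req.req_pool_idx)
        req.req_pool_idx = None


pool = ReqToTokenPoolSketch([1, 2])
a, b = Req("A"), Req("B")

first = pool.alloc([a])        # batch N: A (first half) -> idx 1
second = pool.alloc([a, b])    # batch N+1: A keeps idx 1, B gets idx 2
pool.free_mamba_cache(b)       # early Mamba-only free: B still owns idx 2
pool.free(a)                   # A finished: idx 1 goes back to the free list
print(first, second, b.req_pool_idx, pool.free_slots)
```

Because A never gives up idx 1 between chunks, B cannot be handed a row that an in-flight forward pass is still reading.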

Accuracy Tests

Benchmarking and Profiling

Checklist

@cctry
Collaborator Author

cctry commented Jan 28, 2026

/tag-run-ci-label

@Henrry-CHEN

So if a prefill batch contains two or more requests or request chunks, the Mamba state for these requests may be incorrect?

@cctry
Collaborator Author

cctry commented Jan 28, 2026 via email

@cctry cctry force-pushed the csy/fix_req_to_pool branch from 60ab814 to 30b9b41 January 28, 2026 21:31
@merrymercy merrymercy merged commit 027f314 into main Feb 2, 2026
194 of 214 checks passed
@merrymercy merrymercy deleted the csy/fix_req_to_pool branch February 2, 2026 22:38
6 participants