@xwang365
Support batch size > 1

This PR is intended to support batch size > 1 for the Medusa inference model.
It is only a draft for now and needs further improvement.

Main changes

Update update_inference_inputs, tree_decoding, and generate_candidates to support batch size > 1.

Pad the inputs with [PAD] tokens so that all sequences in a batch have the same length, for example (a sketch of this step follows the diagram):

prompt:
          A B C [PAD] [PAD]
          D E F   H     I
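As an illustration of this padding step, here is a minimal sketch (not the PR's actual code; `pad_batch` and `PAD_ID` are hypothetical names, and a real implementation would take the pad id from the tokenizer):

```python
import torch

PAD_ID = 0  # placeholder; in practice use tokenizer.pad_token_id

def pad_batch(prompts, pad_id=PAD_ID):
    """Right-pad variable-length token-id lists to the longest prompt in the batch."""
    max_len = max(len(p) for p in prompts)
    return torch.tensor([p + [pad_id] * (max_len - len(p)) for p in prompts])

# Mirrors the diagram above: the first prompt is two tokens shorter.
print(pad_batch([[1, 2, 3], [4, 5, 6, 7, 8]]))
# tensor([[1, 2, 3, 0, 0],
#         [4, 5, 6, 7, 8]])
```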

The plan is to squeeze out the [PAD] tokens in the middle of the inputs when new tokens are added to the sequence, for example (a sketch of this step follows the diagrams):

prompt:                                 new_tokens:
          A B C [PAD] [PAD]               X         Y
          D E F   H     I                 Z      [PAD]

new sequence:

          A B C X Y [PAD]    
          D E F H I   Z      
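A minimal sketch of how that squeeze could work (an illustration of the plan, not the PR's implementation; `append_and_squeeze` is a hypothetical helper, and a real version would also rebuild the attention mask and position ids after re-padding):

```python
import torch

PAD_ID = 0  # placeholder pad id, as above

def append_and_squeeze(seqs, new_tokens, pad_id=PAD_ID):
    """Per row: drop [PAD]s, append the accepted new tokens, then right-pad again."""
    rows = []
    for seq, new in zip(seqs.tolist(), new_tokens.tolist()):
        rows.append([t for t in seq if t != pad_id] + [t for t in new if t != pad_id])
    max_len = max(len(r) for r in rows)
    return torch.tensor([r + [pad_id] * (max_len - len(r)) for r in rows])

seqs = torch.tensor([[1, 2, 3, 0, 0],    # A B C [PAD] [PAD]
                     [4, 5, 6, 7, 8]])   # D E F H I
new  = torch.tensor([[9, 10],            # X Y
                     [11, 0]])           # Z [PAD]
print(append_and_squeeze(seqs, new))
# tensor([[ 1,  2,  3,  9, 10,  0],      # A B C X Y [PAD]
#         [ 4,  5,  6,  7,  8, 11]])     # D E F H I Z
```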

Test:

 python -m medusa.inference.inference_test --model 'FasterDecoding/medusa-vicuna-7b-v1.3'

This PR is marked as a draft as more work is required to get it into a mergeable state.

Commit: update bs>1 and add server
