add audio optimization for qwen2.5-omni #13037
Merged
+182
−4
Description
Optimize the audio part of Qwen2.5-Omni; the TTS part is not yet supported.
After installing ipex-llm, then
The image processor converts each 14*14 pixel patch into 1 vision model token, then merges every 4 vision model tokens into 1 language model token. When using video input, the processor automatically sets max_pixels to 768 * 28 * 28.
You can set max_pixels to control the number of tokens an image produces. For example, 1024*14*14 means the image will be converted to 1024 tokens in the vision model, and then to 256 tokens in the language model.
For image:
For video:
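As a rough check of the token arithmetic described above, here is a minimal sketch. The helper name estimate_tokens is hypothetical and not part of ipex-llm or the Qwen processor; it assumes the input has already been resized to exactly max_pixels and ignores the processor's rounding of image sizes. For video, the figure is per frame and ignores any temporal merging the processor may apply.

```python
# Hypothetical helper, not an ipex-llm or Qwen API: estimates token counts
# from a pixel budget, assuming the input is resized to exactly max_pixels.
PATCH_SIZE = 14  # each 14x14 pixel patch -> 1 vision model token
MERGE_SIZE = 4   # every 4 vision model tokens -> 1 language model token


def estimate_tokens(max_pixels: int) -> tuple[int, int]:
    """Return (vision_model_tokens, language_model_tokens) for a pixel budget."""
    vision_tokens = max_pixels // (PATCH_SIZE * PATCH_SIZE)
    lm_tokens = vision_tokens // MERGE_SIZE
    return vision_tokens, lm_tokens


# Image example from the description: 1024*14*14 pixels.
print(estimate_tokens(1024 * 14 * 14))  # (1024, 256)
# Default video budget set by the processor: 768*28*28 pixels per frame.
print(estimate_tokens(768 * 28 * 28))   # (3072, 768)
```

Expressing max_pixels as N*28*28 is a convenient way to cap the language model side at N tokens, since one 28*28 block is four 14*14 patches merged into one token.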