Add GGUF audio bridge support #2
Expose audio-capable helpers on top of the existing VLM/media path and extend the CLI harness for audio generation coverage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR extends the existing VLM/media pathway to support audio inputs by adding C API helpers for audio encode/generation, exposing audio capability metadata, and expanding llama-test-cli to exercise the new audio path.
Changes:
- Add the `ggml_engine_audio` type plus new VLM audio helpers (`ggml_engine_vlm_generate_audio`, `ggml_engine_vlm_encode_audio`) and a bitrate query (`ggml_engine_vlm_audio_bitrate`).
- Implement audio bridging in the VLM implementation by reusing the existing "file bytes → mtmd bitmap" media decoder.
- Extend `llama-test-cli` with optional `--audio` tests for audio encode + generation.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| engine/llama-test-cli.cpp | Adds CLI arg parsing for --audio and introduces audio encode/generation tests for VLM. |
| engine/ggml-engine.h | Extends the public C API with audio structs and new VLM audio-related functions. |
| engine/ggml-engine-vlm.cpp | Implements the new VLM audio helper functions and audio bitrate query. |
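
As a rough illustration of the optional `--audio` argument described for `engine/llama-test-cli.cpp`, the sketch below shows one way such a flag could be looked up. The helper name `example_find_audio_arg` is hypothetical and is not taken from the PR; the actual parser in the CLI may be structured differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the PR's actual parser): returns the value that
 * follows "--audio" on the command line, or NULL when the flag is absent
 * or is the last argument and therefore has no value. */
static const char *example_find_audio_arg(int argc, char **argv) {
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--audio") == 0) {
            return (i + 1 < argc) ? argv[i + 1] : NULL;
        }
    }
    return NULL;
}
```

Returning NULL for both the missing and the dangling flag lets the caller skip the audio tests entirely, which matches the "optional" wording above; a stricter CLI might prefer to report the dangling case as an error.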
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 65d3072e4e
Validate audio buffers before encoding/generation, count audio tokens correctly, harden the CLI audio test, and align the public media comment with actual file-mode behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
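
The buffer validation mentioned in this commit could look roughly like the sketch below. It assumes audio arrives as encoded file bytes, as in the "file bytes → mtmd bitmap" path; the struct `example_audio_buf`, the status enum, and the size cap are all hypothetical stand-ins, since the real `ggml_engine_audio` layout and the checks the PR performs are not shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an audio input passed as raw file bytes.
 * The real struct in ggml-engine.h may have different fields. */
typedef struct {
    const uint8_t *data; /* encoded audio file bytes, not owned */
    size_t         size; /* number of bytes at `data` */
} example_audio_buf;

typedef enum {
    EXAMPLE_AUDIO_OK = 0,
    EXAMPLE_AUDIO_ERR_NULL,
    EXAMPLE_AUDIO_ERR_EMPTY,
    EXAMPLE_AUDIO_ERR_TOO_LARGE
} example_audio_status;

/* Assumed upper bound, chosen only for illustration. */
#define EXAMPLE_AUDIO_MAX_BYTES ((size_t)256 * 1024 * 1024)

/* Reject inputs that cannot be decoded safely before handing them to the
 * encoder or generator: NULL pointers, empty buffers, oversized buffers. */
static example_audio_status example_audio_validate(const example_audio_buf *buf) {
    if (buf == NULL || buf->data == NULL) return EXAMPLE_AUDIO_ERR_NULL;
    if (buf->size == 0) return EXAMPLE_AUDIO_ERR_EMPTY;
    if (buf->size > EXAMPLE_AUDIO_MAX_BYTES) return EXAMPLE_AUDIO_ERR_TOO_LARGE;
    return EXAMPLE_AUDIO_OK;
}
```

Running a check like this at the top of both the encode and the generate entry points keeps the decoder from ever seeing a NULL or zero-length buffer, and gives callers a distinct status for each failure.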
Summary
- `llama-test-cli` with audio encode/generation coverage
- `gguf_lib` and ToolNeuron GGUF audio support