
Add GGUF audio bridge support #2

Open
Godzilla675 wants to merge 2 commits into Siddhesh2377:re-write from Godzilla675:feature/gguf-audio-support

@Godzilla675

Summary

  • expose audio-capable helpers on the existing VLM/media path
  • extend llama-test-cli with audio encode/generation coverage
  • unblock downstream gguf_lib and ToolNeuron GGUF audio support

Expose audio-capable helpers on top of the existing VLM/media path and extend the CLI harness for audio generation coverage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

This PR extends the existing VLM/media pathway to support audio inputs by adding C API helpers for audio encode/generation, exposing audio capability metadata, and expanding llama-test-cli to exercise the new audio path.

Changes:

  • Add ggml_engine_audio type plus new VLM audio helpers (ggml_engine_vlm_generate_audio, ggml_engine_vlm_encode_audio) and a bitrate query (ggml_engine_vlm_audio_bitrate).
  • Implement audio bridging in the VLM implementation by reusing the existing “file bytes → mtmd bitmap” media decoder.
  • Extend llama-test-cli with optional --audio tests for audio encode + generation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| engine/llama-test-cli.cpp | Adds CLI arg parsing for --audio and introduces audio encode/generation tests for VLM. |
| engine/ggml-engine.h | Extends the public C API with audio structs and new VLM audio-related functions. |
| engine/ggml-engine-vlm.cpp | Implements the new VLM audio helper functions and audio bitrate query. |
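
For readers without the diff open, the optional --audio handling described for engine/llama-test-cli.cpp could look roughly like the sketch below. This is not the PR's code: `cli_options_sketch` and `parse_audio_flag` are hypothetical names, and the assumption that --audio takes exactly one path argument is inferred from the description only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical option bag; the real llama-test-cli may store this differently.
struct cli_options_sketch {
    bool        has_audio = false; // true only when --audio <path> was supplied
    std::string audio_path;        // path following --audio, empty otherwise
};

// Scans the argument list for an optional "--audio <path>" pair.
// Returns false when --audio is present but no value follows it.
// Unrelated arguments are ignored so other flags keep working.
bool parse_audio_flag(const std::vector<std::string> & args, cli_options_sketch & out) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "--audio") {
            if (i + 1 >= args.size()) {
                return false; // flag given without a path
            }
            out.has_audio  = true;
            out.audio_path = args[++i]; // consume the path argument
        }
    }
    return true;
}
```

Keeping the flag optional means runs without --audio can skip the audio tests entirely, which matches the "optional --audio tests" wording in the overview.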



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 65d3072e4e


Validate audio buffers before encoding/generation, count audio tokens correctly, harden the CLI audio test, and align the public media comment with actual file-mode behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
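
As an illustration of the "validate audio buffers before encoding/generation" item in this commit, a minimal pre-flight check might look like the following. It is a sketch under stated assumptions, not the code from the commit: `audio_buffer_sketch` is a hypothetical stand-in rather than the real ggml_engine_audio layout, and it assumes each entry carries encoded file bytes plus a byte count.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one audio input; NOT the actual ggml_engine_audio struct.
struct audio_buffer_sketch {
    const std::uint8_t * data; // assumed: encoded file bytes handed to the media decoder
    std::size_t          size; // assumed: number of bytes in data
};

// Hypothetical result codes for the check below.
enum audio_check_sketch {
    AUDIO_OK,
    AUDIO_NULL_ARRAY,   // non-zero count but no array
    AUDIO_EMPTY_BUFFER  // an entry has a null pointer or zero length
};

// Rejects obviously invalid input before any decoder is invoked,
// so the encode/generate paths never dereference a null or empty buffer.
audio_check_sketch validate_audio_buffers(const audio_buffer_sketch * bufs, std::size_t n) {
    if (n > 0 && bufs == nullptr) {
        return AUDIO_NULL_ARRAY;
    }
    for (std::size_t i = 0; i < n; ++i) {
        if (bufs[i].data == nullptr || bufs[i].size == 0) {
            return AUDIO_EMPTY_BUFFER;
        }
    }
    return AUDIO_OK;
}
```

Whether an empty list (n == 0) should count as valid is a design choice; this sketch accepts it, and a caller that requires at least one clip would add that check separately.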
