Migrating from llama.cpp to ORT #8
Draft
This branch exists to show that you can run a VLM with ONNX Runtime on Android using Rust. I used pyke-ort for the ONNX Runtime bindings and leaned heavily on Claude Code for the generation loop. The primary bottleneck for this project is by far the image encoder: it takes up to a minute to encode an image into tokens, and that needs to be optimized further for practical use. The encoder was also the bottleneck under llama.cpp, which is why I used the Vulkan backend on the Pixel 9 there.
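For reference, here is a minimal sketch of what the encoder invocation looks like with the pyke `ort` crate. This assumes the 2.0 release-candidate API (method names have shifted between release candidates), and the model path, the `pixel_values` input name, and the 224×224 input shape are placeholders, not necessarily what the actual model uses:

```rust
// Assumed deps in Cargo.toml: ort = "2.0.0-rc.4", ndarray = "0.15".
use ndarray::Array4;
use ort::{inputs, session::Session};

fn main() -> ort::Result<()> {
    // Build a session for the vision encoder (path is illustrative).
    let session = Session::builder()?
        .commit_from_file("vision_encoder.onnx")?;

    // Dummy preprocessed image: NCHW, batch=1, 3 channels, 224x224.
    // The real shape and normalization depend on the model's preprocessor.
    let pixels = Array4::<f32>::zeros((1, 3, 224, 224));

    // "pixel_values" is an assumed input name; inspect the ONNX graph
    // for the real one.
    let outputs = session.run(inputs!["pixel_values" => pixels.view()]?)?;

    // The first output holds the image embeddings that get spliced into
    // the language model's token stream. This run() call is the
    // minute-long hot spot described above.
    let embeddings = outputs[0].try_extract_tensor::<f32>()?;
    println!("image embedding shape: {:?}", embeddings.shape());
    Ok(())
}
```

The likely next step for the encoder time is registering an Android-friendly execution provider (e.g. NNAPI or XNNPACK) on the session builder, analogous to what Vulkan did for llama.cpp, though I haven't benchmarked that here.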
The Pixel 9 is a previous-generation phone, but it is by no means a "low quality device", and if this can't run well on the Pixel 9, it probably can't run well on Android.