TinyML & Edge AI: On-device inference, model quantization, embedded ML, ultra-low-power AI for microcontrollers and IoT devices.
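Model quantization is the common thread across the projects below: storing float32 weights as int8 shrinks a model roughly 4x and speeds up CPU inference on constrained hardware. A minimal sketch using PyTorch's dynamic quantization API (the toy model and layer choices here are illustrative, not taken from any listed repo):

```python
import torch
import torch.nn as nn

# Toy float32 model standing in for a real network (illustrative only).
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model.eval()

# Dynamic quantization: weights are stored as int8 and activations are
# quantized on the fly at inference time. Works well for Linear/LSTM
# layers on CPU targets such as phones and single-board computers.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
with torch.no_grad():
    print(quantized(x).shape)  # torch.Size([1, 10])
```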
iOS and Android app that runs local LLMs on-device, plus routstr cloud LLMs for anonymous inference
Flutter starter app for NobodyWho, a library designed to run LLMs locally and efficiently on any device.
Mobile AI: iOS CoreML, Android TFLite, on-device inference, ONNX, TensorRT, and ML deployment for smartphones.
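For the Android TFLite path mentioned above, on-device inference follows the same pattern regardless of model: load a .tflite flatbuffer, allocate tensors, copy input, invoke, read output. A hedged Python sketch (the model path and input are placeholders):

```python
import numpy as np
import tensorflow as tf

# Load a converted .tflite model (path is a placeholder).
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Fabricate an input matching the model's expected shape and dtype.
shape = input_details[0]["shape"]
x = np.random.random_sample(shape).astype(input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], x)
interpreter.invoke()
y = interpreter.get_tensor(output_details[0]["index"])
print(y.shape)
```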
Production Android AI with ExecuTorch 1.0 - Deploy PyTorch models to mobile with NPU acceleration and 50KB footprint
Custom llama.cpp fork with a character intelligence engine: control vectors, attention bias, head rescaling, attention temperature, and fast weight memory
High-performance Android SDK for on-device LLM inference (GGUF). Privacy-focused, offline-first, and powered by llama.cpp with a clean Kotlin Coroutines API.
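The SDK above exposes llama.cpp through Kotlin; the equivalent offline GGUF flow in Python, via the llama-cpp-python bindings, is only a few lines (the model path and generation parameters are placeholders):

```python
from llama_cpp import Llama

# Load a GGUF-quantized model from local storage (path is a placeholder).
llm = Llama(model_path="models/phi-3-mini-q4_k_m.gguf", n_ctx=2048)

# Everything below runs fully offline: prompt in, tokens out, no network.
result = llm(
    "Q: Name one benefit of on-device inference. A:",
    max_tokens=64,
    stop=["Q:"],
)
print(result["choices"][0]["text"])
```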
Neural acoustic echo cancellation for Apple platforms using CoreML — Swift package with 128/256/512-unit DTLN-aec models
Real-time SAM2 segmentation on edge devices - 40x faster C++ inference with ONNX Runtime for iOS/Android deployment
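The ONNX Runtime deployment pattern used by projects like the SAM2 port above looks the same in any language binding; a minimal Python sketch (the file name, provider, and shapes are assumptions, not taken from the repo):

```python
import numpy as np
import onnxruntime as ort

# Create an inference session; on mobile this would select the NNAPI or
# CoreML execution provider instead of the default CPU one.
session = ort.InferenceSession(
    "encoder.onnx", providers=["CPUExecutionProvider"]
)

input_name = session.get_inputs()[0].name
# Dummy image tensor; real SAM2 inputs are preprocessed video frames.
x = np.random.rand(1, 3, 1024, 1024).astype(np.float32)

outputs = session.run(None, {input_name: x})
print([o.shape for o in outputs])
```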
Ad generation via offline LLMs with on-device inference, optionally managed by a self-hosted CMS.
The Private Agent OS — search files, run AI agents, connect to 10,000+ tools via the complete protocol stack (MCP, AG-UI, A2UI, A2A). Zero cloud. Zero telemetry. On-device inference.
Swift wrapper for Apple's BNNS graph API — run compiled CoreML models (.mlmodelc) on CPU with zero-copy buffer management
Run small LLMs directly on your device, no cloud computing needed.
React Native SDK for local LLM inference and on-device AI on iOS and Android.
Open source Node.js runtime for local LLM inference, on-device AI, and private model execution.
Web JavaScript SDK for local LLM inference with WebGPU and on-device AI.
WebGPU runtime core for local LLM inference, on-device AI, and client-side model execution.
On-device inference engine for Apple silicon
Offline plant disease diagnosis system powered by MobileNetV3-Large and TensorFlow Lite — 38 disease classes, 14 crop species, ~5.58ms inference on-device. Built with Flutter & Python.
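Latency figures like the ~5.58 ms claim above are straightforward to reproduce for your own model: time repeated invoke() calls after a warm-up pass. A hedged sketch building on the TFLite pattern shown earlier (the model path is a placeholder):

```python
import time
import numpy as np
import tensorflow as tf

# Placeholder path; substitute your own converted .tflite model.
interpreter = tf.lite.Interpreter(model_path="plant_disease.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
x = np.random.random_sample(inp["shape"]).astype(inp["dtype"])

# Warm-up pass so one-time allocation cost is excluded from the timing.
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()

runs = 100
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
elapsed_ms = (time.perf_counter() - start) * 1000 / runs
print(f"mean latency: {elapsed_ms:.2f} ms")
```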
Deep technical writing on edge AI, on-device inference, llama.cpp, GGML, and mobile AI engineering