This repository demonstrates how to run an RKLLM-based model server using Flask. It provides a simple web interface for interacting with models in RKLLM format, making it easy to deploy and test your models on edge devices. Only the Orange Pi 5 with RK3588 is supported.
- Orange Pi 5 board with the latest Armbian, rknpu driver 0.9.8, and Python 3.12
- x86 host machine to convert models to RKLLM format (e.g., .rkllm files)
On the board:
git clone https://github.com/labintsev/flask-rkllm.git
cd flask-rkllm
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
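After installation, a quick sanity check of the board can save debugging time later. The sketch below compares the running Python and the rknpu driver against the versions listed in the requirements. It treats those versions as minimums and assumes the driver reports itself in /sys/kernel/debug/rknpu/version in a form like "RKNPU driver: v0.9.8". Both are assumptions, not something this repository guarantees, and reading that file usually requires root.

```python
import re
import sys
from pathlib import Path

# Assumed location of the driver version file; it may differ between kernels
# and usually needs root permissions to read.
RKNPU_VERSION_FILE = Path("/sys/kernel/debug/rknpu/version")


def parse_driver_version(text):
    """Extract (major, minor, patch) from text such as 'RKNPU driver: v0.9.8'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def check_environment(min_python=(3, 12), min_driver=(0, 9, 8)):
    """Return a list of detected problems; an empty list means none were found."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {sys.version_info[0]}.{sys.version_info[1]} is older than "
            f"{min_python[0]}.{min_python[1]}"
        )
    try:
        version = parse_driver_version(RKNPU_VERSION_FILE.read_text())
    except OSError as exc:
        # Missing file or insufficient permissions: report it, do not crash.
        problems.append(f"could not read {RKNPU_VERSION_FILE}: {exc}")
    else:
        if version is None:
            problems.append("rknpu driver version string not recognized")
        elif version < min_driver:
            problems.append(f"rknpu driver {version} is older than {min_driver}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run it with sudo if the version file is not readable as a regular user. No output means the check found nothing to complain about.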
The model must be converted to RKLLM format on the x86 host machine.
Here is an example.
Then copy the .rkllm file to the Orange Pi board. One option is S3 storage via the s3.py script: connect to the Orange Pi over SSH and download the model file onto the board.
python s3.py --download your/s3/DeepSeek-R1-Distill-Qwen-1.5B_W8A8_RK3588.rkllm
To start the Flask server with a specific RKLLM model, run:
python flask_server.py --rkllm_model_path DeepSeek-R1-Distill-Qwen-1.5B_W8A8_RK3588.rkllm
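For readers curious how a flag like --rkllm_model_path can be handled, here is a minimal sketch using argparse. It is not the actual code in flask_server.py. The function name parse_args and the two validation checks are illustrative assumptions; only the flag name comes from the command above.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse and validate the model path (hypothetical helper, not from the repo)."""
    parser = argparse.ArgumentParser(description="Serve an RKLLM model with Flask")
    parser.add_argument(
        "--rkllm_model_path",
        required=True,
        type=Path,
        help="Path to a converted .rkllm model file",
    )
    args = parser.parse_args(argv)

    # Fail early with a clear message instead of a late runtime error.
    if args.rkllm_model_path.suffix != ".rkllm":
        parser.error(f"expected a .rkllm file, got {args.rkllm_model_path.name}")
    if not args.rkllm_model_path.is_file():
        parser.error(f"model file not found: {args.rkllm_model_path}")
    return args
```

Validating the path before loading the model means a typo in the filename is reported immediately with a usage message.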
Once the server starts, you can open the web UI in a browser. VS Code is recommended because its port forwarding lets you test the web app via localhost.
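If the page does not load, it helps to check first whether anything is listening on the forwarded port. The sketch below assumes port 5000, which is Flask's default. This README does not state the port, so adjust it to whatever the server prints at startup. The helper name is_port_open is illustrative.

```python
import socket


def is_port_open(host="127.0.0.1", port=5000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Calling is_port_open() on your local machine returns False when the forward is not active or the server is not running. A True result only confirms that something accepts connections on that port, not that it is this server.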
- flask_server.py: Main Flask server script
- rkllm_chat.py: RKLLM utilities for interacting with the model
- s3.py: Script to download .rkllm models from S3
- *.rkllm: Example model files
- templates/: Jinja2 templates for Flask
- static/: Frontend assets (HTML, CSS, JS)
Contributions, suggestions, and bug reports are welcome! Please open an issue or submit a pull request.
Apache 2.0.
See the LICENSE file for more details.