An Android SDK for:

- Searching and retrieving `.gguf` LLaMA models from Hugging Face
- Running on-device inference with streamed responses

Designed for private, efficient, mobile-friendly AI deployments.
- Add JitPack to your root `build.gradle`:

```groovy
allprojects {
    repositories {
        ...
        maven { url 'https://jitpack.io' }
    }
}
```

- Add the library to your app module's `build.gradle`:

```groovy
dependencies {
    implementation 'com.github.jaliyanimanthako:test-llama:v1.0.5'
}
```

Programmatically search Hugging Face for `.gguf` models.
- Query Hugging Face with filters (`author`, `tag`, `search`)
- Automatically fetch and parse model metadata
- Support for pagination, sorting, and tree inspection (e.g., file structure)
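These filters correspond to query parameters on the public Hugging Face Hub REST API (`https://huggingface.co/api/models`), which the search classes wrap. As a rough sketch of the kind of URL such a search resolves to, here is a plain-Java helper; the parameter names follow the public Hub API documentation, not this SDK's source, so treat them as illustrative:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class HubQueryExample {
    // Builds a Hugging Face Hub model-search URL of the kind a wrapper like
    // HFModelSearch could issue. Parameter names (search, author, filter,
    // sort, direction, limit) come from the public Hub API docs.
    static String buildSearchUrl(String query, String author, String tag,
                                 String sort, int limit, boolean descending) {
        StringBuilder url = new StringBuilder("https://huggingface.co/api/models");
        url.append("?search=").append(URLEncoder.encode(query, StandardCharsets.UTF_8));
        url.append("&author=").append(URLEncoder.encode(author, StandardCharsets.UTF_8));
        url.append("&filter=").append(URLEncoder.encode(tag, StandardCharsets.UTF_8));
        url.append("&sort=").append(sort);
        url.append("&direction=").append(descending ? "-1" : "1");
        url.append("&limit=").append(limit);
        return url.toString();
    }

    public static void main(String[] args) {
        // Mirrors the searchModels(...) example below: llama models by
        // TheBloke, tagged gguf, sorted by downloads descending, limit 10.
        System.out.println(buildSearchUrl("llama", "TheBloke", "gguf", "downloads", 10, true));
    }
}
```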
| File | Description |
|---|---|
| `HFModelSearch.java` | Main API for model search |
| `HFModelInfo.java` | Describes individual model metadata |
| `HFModelTree.java` | Fetches the model file tree |
| `HFModels.java` | Aggregates API methods |
| `HFEndpoints.java` | Central endpoint constants |
| `ExampleModel.java` | Sample model structure |
| `CustomDateDeserializer.java` | Parses `createdAt` timestamps |
| `CustomDateSerializer.java` | Serializes timestamps for JSON |
```java
HFModelSearch modelSearch = new HFModelSearch();

List<HFModelSearch.ModelSearchResult> results = modelSearch.searchModels(
        "llama",      // query
        "TheBloke",   // author
        "gguf",       // tag
        HFModelSearch.ModelSortParam.DOWNLOADS,
        HFModelSearch.ModelSearchDirection.DESCENDING,
        10,
        false,
        false
);

for (HFModelSearch.ModelSearchResult result : results) {
    Log.d("HF", result.modelId + " - " + result.description);
}
```

Run `.gguf` models with real-time streamed output.
- Load a `.gguf` model from a URI
- Dynamically set system prompts
- Receive partial responses via `LiveData`
- Final callback via `LlamaListener`
- Auto-formatting of `<think>` tags
| File | Description |
|---|---|
| `ModelInference.java` | Main interface for model interaction |
| `ModelLoader.java` | Handles loading `.gguf` model files |
| `LLaMa.java` | Core inference logic using native libraries |
| `GGUFReader.java` | Utilities for reading `.gguf` model metadata |
```java
ModelInference model = ModelInference.getInstance(context);

model.setSystemPrompt("You're a helpful assistant.");
model.setListener(response -> Log.d("LLaMa", "Final: " + response));
model.partialResponse.observe(this, partial -> Log.d("LLaMa", "Partial: " + partial));

model.generateResponse("Where did I leave my book?");
```

```java
model.loadModel(modelUri,
        () -> Log.d("Model", "Model loaded successfully."),
        () -> Log.e("Model", "Failed to load model.")
);
```

```text
ai.aisee.llama
├── HF
│   ├── CustomDateDeserializer.java
│   ├── CustomDateSerializer.java
│   ├── ExampleModel.java
│   ├── HFEndpoints.java
│   ├── HFModelInfo.java
│   ├── HFModelSearch.java
│   ├── HFModelTree.java
│   └── HFModels.java
├── LLaMa
│   ├── GGUFReader.java
│   ├── LLaMa.java
│   ├── ModelInference.java
│   └── ModelLoader.java
```
- Use Hugging Face search to dynamically discover models
- Store downloaded `.gguf` models locally and load via `Uri`
- Combine streamed `LiveData` responses with real-time UI feedback