# index.yaml — LocalAI model gallery (forked from mudler/LocalAI)
---
- name: "gpt-oss-20b-vietmind"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/mradermacher/gpt-oss-20b-VietMind-GGUF
description: |
The **mradermacher/gpt-oss-20b-VietMind-GGUF** model is a 20-billion-parameter large language model for text generation, based on the **gpt-oss** architecture. Quantized GGUF variants (e.g., Q2_K, Q4_K_S) are provided for efficient, low-precision inference, while the original unquantized model remains the base reference. It is designed for tasks such as text generation, translation, and reasoning.
overrides:
parameters:
model: llama-cpp/models/gpt-oss-20b-VietMind.Q4_K_M.gguf
name: gpt-oss-20b-VietMind-GGUF
backend: llama-cpp
template:
use_tokenizer_template: true
known_usecases:
- chat
function:
grammar:
disable: true
description: Imported from https://huggingface.co/mradermacher/gpt-oss-20b-VietMind-GGUF
options:
- use_jinja:true
files:
- filename: llama-cpp/models/gpt-oss-20b-VietMind.Q4_K_M.gguf
sha256: 2bc6992548afe5fe05600cb6455aec4bc8ef28c350c545ab7f3f160db9f0276b
uri: https://huggingface.co/mradermacher/gpt-oss-20b-VietMind-GGUF/resolve/main/gpt-oss-20b-VietMind.Q4_K_M.gguf
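Each gallery entry's `files` list pairs a download `uri` with a `sha256` digest, so a downloaded artifact can be verified before use. A minimal sketch of that flow (an assumption for illustration, not LocalAI's actual installer code; the function names are hypothetical):

```python
# Sketch: fetch a gallery `files` entry and verify its sha256 digest.
# Hypothetical helpers, not LocalAI's real implementation.
import hashlib
import urllib.request


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fetch_and_verify(uri: str, dest: str, expected_sha256: str) -> None:
    """Download `uri` to `dest` and reject it on checksum mismatch."""
    urllib.request.urlretrieve(uri, dest)
    actual = sha256_of(dest)
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch: {actual} != {expected_sha256}")
```

Streaming in chunks keeps memory flat even for multi-gigabyte GGUF files like the ones listed here.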
- name: "rwkv7-g1c-13.3b"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/NaomiBTW/rwkv7-g1c-13.3b-gguf
description: |
The model is **RWKV7 g1c 13.3B**, a large language model optimized for efficiency. It is quantized using **Bartowski's calibration_v5 imatrix dataset** to reduce memory usage while maintaining performance. The base model is **BlinkDL/rwkv7-g1**, and this version is tailored for text-generation tasks, balancing accuracy and efficiency for deployment in various applications.
overrides:
parameters:
model: llama-cpp/models/rwkv7-g1c-13.3b-20251231-Q8_0.gguf
name: rwkv7-g1c-13.3b-gguf
backend: llama-cpp
template:
use_tokenizer_template: true
known_usecases:
- chat
function:
grammar:
disable: true
description: Imported from https://huggingface.co/NaomiBTW/rwkv7-g1c-13.3b-gguf
options:
- use_jinja:true
files:
- filename: llama-cpp/models/rwkv7-g1c-13.3b-20251231-Q8_0.gguf
sha256: e06b3b31cee207723be00425cfc25ae09b7fa1abbd7d97eda4e62a7ef254f877
uri: https://huggingface.co/NaomiBTW/rwkv7-g1c-13.3b-gguf/resolve/main/rwkv7-g1c-13.3b-20251231-Q8_0.gguf
- name: "iquest-coder-v1-40b-instruct-i1"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/mradermacher/IQuest-Coder-V1-40B-Instruct-i1-GGUF
description: |
The **IQuest-Coder-V1-40B-Instruct-i1-GGUF** is a quantized version of the original **IQuestLab/IQuest-Coder-V1-40B-Instruct** model, designed for efficient deployment. It is an **instruction-following large language model** with 40 billion parameters, optimized for tasks like code generation and reasoning.
**Key Features:**
- **Size:** 40B parameters (quantized for efficiency).
- **Purpose:** Instruction-based coding and reasoning.
- **Format:** GGUF (supports multi-part files).
- **Quantization:** Uses advanced techniques (e.g., IQ3_M, Q4_K_M) for balance between performance and quality.
**Available Quantizations:**
- Optimized for speed and size: **i1-Q4_K_M** (recommended).
- Smaller, lower-quality options where file size matters more than output quality.
**Note:** This is a **quantized version** of the original model, but the base model (IQuestLab/IQuest-Coder-V1-40B-Instruct) is the official source. For full functionality, use the unquantized version or verify compatibility with your deployment tools.
overrides:
parameters:
model: llama-cpp/models/IQuest-Coder-V1-40B-Instruct.i1-Q4_K_M.gguf
name: IQuest-Coder-V1-40B-Instruct-i1-GGUF
backend: llama-cpp
template:
use_tokenizer_template: true
known_usecases:
- chat
function:
grammar:
disable: true
description: Imported from https://huggingface.co/mradermacher/IQuest-Coder-V1-40B-Instruct-i1-GGUF
options:
- use_jinja:true
files:
- filename: llama-cpp/models/IQuest-Coder-V1-40B-Instruct.i1-Q4_K_M.gguf
sha256: 0090b84ea8e5a862352cbb44498bd6b4cd38564834182813c35ed84209050b51
uri: https://huggingface.co/mradermacher/IQuest-Coder-V1-40B-Instruct-i1-GGUF/resolve/main/IQuest-Coder-V1-40B-Instruct.i1-Q4_K_M.gguf
- name: "onerec-8b"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/mradermacher/OneRec-8B-GGUF
description: |
The model `mradermacher/OneRec-8B-GGUF` is a quantized version of the base model `OpenOneRec/OneRec-8B`, a large language model designed for tasks like recommendations or content generation. It is optimized for efficiency with various quantization schemes (e.g., Q2_K, Q4_K, Q8_0) and available in multiple sizes (3.5–9.0 GB). The model uses the GGUF format and is licensed under Apache-2.0. Key features include:
- **Base Model**: `OpenOneRec/OneRec-8B` (a pre-trained language model for recommendations).
- **Quantization**: Supports multiple quantized variants (Q2_K, Q3_K, Q4_K, etc.), with the best quality for `Q4_K_S` and `Q8_0`.
- **Sizes**: Available in sizes ranging from 3.5 GB (Q2_K) to 9.0 GB (Q8_0), with faster speeds for lower-bit quantized versions.
- **Usage**: Compatible with GGUF files, suitable for deployment in applications requiring efficient model inference.
- **License**: Apache-2.0, available at [https://huggingface.co/OpenOneRec/OneRec-8B/blob/main/LICENSE](https://huggingface.co/OpenOneRec/OneRec-8B/blob/main/LICENSE).
For detailed specifications, refer to the [model page](https://hf.tst.eu/model#OneRec-8B-GGUF).
overrides:
parameters:
model: llama-cpp/models/OneRec-8B.Q4_K_M.gguf
name: OneRec-8B-GGUF
backend: llama-cpp
template:
use_tokenizer_template: true
known_usecases:
- chat
function:
grammar:
disable: true
description: Imported from https://huggingface.co/mradermacher/OneRec-8B-GGUF
options:
- use_jinja:true
files:
- filename: llama-cpp/models/OneRec-8B.Q4_K_M.gguf
sha256: f19217971ee5a7a909c9217a79d09fb573380f5018e25dcb32693139e59b434f
uri: https://huggingface.co/mradermacher/OneRec-8B-GGUF/resolve/main/OneRec-8B.Q4_K_M.gguf
- name: "minimax-m2.1-i1"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/mradermacher/MiniMax-M2.1-i1-GGUF
description: |
The model **MiniMax-M2.1** (base model: *MiniMaxAI/MiniMax-M2.1*) is a large language model quantized for efficient deployment. It is optimized for speed and memory usage, with quantized GGUF variants offering different performance trade-offs. The quantizations are provided by mradermacher, and the model is licensed under the *modified-mit* license.
Key features:
- **Quantized versions**: Includes low-precision (IQ1, IQ2, Q2_K, etc.) and high-precision (Q4_K_M, Q6_K) options.
- **Usage**: Requires GGUF files; see [TheBloke's documentation](https://huggingface.co/TheBloke/KafkaLM-70B-German-V0.1-GGUF) for details on integration.
- **License**: Modified MIT (see [license link](https://github.com/MiniMax-AI/MiniMax-M2.1/blob/main/LICENSE)).
overrides:
parameters:
model: llama-cpp/models/MiniMax-M2.1.i1-Q4_K_M.gguf
name: MiniMax-M2.1-i1-GGUF
backend: llama-cpp
template:
use_tokenizer_template: true
known_usecases:
- chat
function:
grammar:
disable: true
description: Imported from https://huggingface.co/mradermacher/MiniMax-M2.1-i1-GGUF
options:
- use_jinja:true
files:
- filename: llama-cpp/models/MiniMax-M2.1.i1-Q4_K_M.gguf
sha256: dba387e17ddd9b4559fb6f14459fcece7f00c66bbe4062d7ceea7fb9568e3282
uri: https://huggingface.co/mradermacher/MiniMax-M2.1-i1-GGUF/resolve/main/MiniMax-M2.1.i1-Q4_K_M.gguf
- name: "tildeopen-30b-instruct-lv-i1"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/mradermacher/TildeOpen-30B-Instruct-LV-i1-GGUF
description: |
The **TildeOpen-30B-Instruct-LV-i1-GGUF** is a quantized version of the base model **pazars/TildeOpen-30B-Instruct-LV**, optimized for deployment. It is an instruct-based language model trained on diverse datasets, supporting multiple languages (en, de, fr, pl, ru, it, pt, cs, nl, es, fi, tr, hu, bg, uk, bs, hr, da, et, lt, ro, sk, sl, sv, no, lv, sr, sq, mk, is, mt, ga). Licensed under CC-BY-4.0, it uses the Transformers library and is designed for efficient inference. The quantized version (with imatrix format) is tailored for deployment on devices with limited resources, while the base model remains the original, high-quality version.
overrides:
parameters:
model: llama-cpp/models/TildeOpen-30B-Instruct-LV.i1-Q4_K_M.gguf
name: TildeOpen-30B-Instruct-LV-i1-GGUF
backend: llama-cpp
template:
use_tokenizer_template: true
known_usecases:
- chat
function:
grammar:
disable: true
description: Imported from https://huggingface.co/mradermacher/TildeOpen-30B-Instruct-LV-i1-GGUF
options:
- use_jinja:true
files:
- filename: llama-cpp/models/TildeOpen-30B-Instruct-LV.i1-Q4_K_M.gguf
sha256: 48ed550e9ce7278ac456a43634c2a5804ba273522021434dfa0aa85dda3167b3
uri: https://huggingface.co/mradermacher/TildeOpen-30B-Instruct-LV-i1-GGUF/resolve/main/TildeOpen-30B-Instruct-LV.i1-Q4_K_M.gguf
- name: "allenai_olmo-3.1-32b-think"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/bartowski/allenai_Olmo-3.1-32B-Think-GGUF
description: |
The **Olmo-3.1-32B-Think** model is a large language model (LLM) optimized for efficient inference using quantized versions. It is a quantized version of the original **allenai/Olmo-3.1-32B-Think** model, developed by **bartowski** using the **imatrix** quantization method.
### Key Features:
- **Base Model**: `allenai/Olmo-3.1-32B-Think` (unquantized version).
- **Quantized Versions**: Available in multiple formats (e.g., `Q6_K_L`, `Q4_1`, `bf16`) with varying precision (e.g., Q8_0, Q6_K_L, Q5_K_M). These are derived from the original model using the **imatrix calibration dataset**.
- **Performance**: Optimized for low-memory usage and efficient inference on GPUs/CPUs. Recommended quantization types include `Q6_K_L` (near-perfect quality) or `Q4_K_M` (default, balanced performance).
- **Downloads**: Available via Hugging Face CLI. Split into multiple files if needed for large models.
- **License**: Apache-2.0.
### Recommended Quantization:
- Use `Q6_K_L` for highest quality (near-perfect performance).
- Use `Q4_K_M` for balanced performance and size.
- Avoid lower-quality options (e.g., `Q3_K_S`) unless specific hardware constraints apply.
This model is ideal for deploying on GPUs/CPUs with limited memory, leveraging efficient quantization for practical use cases.
overrides:
parameters:
model: llama-cpp/models/allenai_Olmo-3.1-32B-Think-Q4_K_M.gguf
name: allenai_Olmo-3.1-32B-Think-GGUF
backend: llama-cpp
template:
use_tokenizer_template: true
known_usecases:
- chat
function:
grammar:
disable: true
description: Imported from https://huggingface.co/bartowski/allenai_Olmo-3.1-32B-Think-GGUF
options:
- use_jinja:true
files:
- filename: llama-cpp/models/allenai_Olmo-3.1-32B-Think-Q4_K_M.gguf
sha256: 09ca87494efb75f6658a0c047414cccc5fb29d26a49c650a90af7c8f0412fdac
uri: https://huggingface.co/bartowski/allenai_Olmo-3.1-32B-Think-GGUF/resolve/main/allenai_Olmo-3.1-32B-Think-Q4_K_M.gguf
- name: "huihui-glm-4.6v-flash-abliterated"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/huihui-ai/Huihui-GLM-4.6V-Flash-abliterated-GGUF
description: |
**Huihui-GLM-4.6V-Flash (Abliterated)**
A text-based large language model derived from the **zai-org/GLM-4.6V-Flash** base model, featuring reduced safety filters and uncensored capabilities. Designed for text generation, it supports conversational tasks but excludes image processing.
**Key Features:**
- **Base Model**: GLM-4.6V-Flash (original author: zai-org)
- **Quantized Format**: GGUF (optimized for efficiency).
- **No Image Support**: Only text-based interactions are enabled.
- **Abliteration**: Modified to remove restrictive outputs, prioritizing openness over safety.
**Important Notes:**
- **Risk of Sensitive Content**: Reduced filtering may generate inappropriate or controversial outputs.
- **Ethical Use**: Suitable for research or controlled environments; not recommended for public or commercial deployment without caution.
- **Legal Responsibility**: Users must ensure compliance with local laws and ethical guidelines.
**Use Cases:**
- Experimental text generation.
- Controlled research environments.
- Testing safety filtering mechanisms.
*Note: This model is not suitable for production or public-facing applications without thorough review.*
tags:
- llm
- gguf
- glm
- text-to-text
- instruction-tuned
overrides:
parameters:
model: llama-cpp/models/ggml-model-Q4_K_M.gguf
name: Huihui-GLM-4.6V-Flash-abliterated-GGUF
backend: llama-cpp
template:
use_tokenizer_template: true
known_usecases:
- chat
function:
grammar:
disable: true
mmproj: llama-cpp/mmproj/mmproj-model-f16.gguf
description: Imported from https://huggingface.co/huihui-ai/Huihui-GLM-4.6V-Flash-abliterated-GGUF
options:
- use_jinja:true
files:
- filename: llama-cpp/models/ggml-model-Q4_K_M.gguf
sha256: 14145c3c95a21c7251362ac80d9bde72a3c6e129ca834ac3c57efe2277409699
uri: https://huggingface.co/huihui-ai/Huihui-GLM-4.6V-Flash-abliterated-GGUF/resolve/main/ggml-model-Q4_K_M.gguf
- filename: llama-cpp/mmproj/mmproj-model-f16.gguf
sha256: 1044beaf5cb799d309b1252ac149a985b69f1cf0391f7c8c54e7aed267bc98a9
uri: https://huggingface.co/huihui-ai/Huihui-GLM-4.6V-Flash-abliterated-GGUF/resolve/main/mmproj-model-f16.gguf
- name: "qwen3-coder-30b-a3b-instruct-rtpurbo-i1"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/mradermacher/Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF
description: |
This is a quantized version of the **Qwen3-Coder** large language model, tailored for code generation. The base model, **RTP-LLM/Qwen3-Coder-30B-A3B-Instruct-RTPurbo**, is a 30B-parameter variant optimized for instruction following and code-related tasks, trained on diverse data to excel at programming and logical reasoning. This repository provides a quantized (compressed) version suitable for hardware with limited memory, at some cost in precision; for full fidelity, use the unquantized base model.
tags:
- llm
- code
- instruction-tuned
- text-to-text
- gguf
- qwen3
overrides:
parameters:
model: llama-cpp/models/Qwen3-Coder-30B-A3B-Instruct-RTPurbo.i1-Q4_K_M.gguf
name: Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF
backend: llama-cpp
template:
use_tokenizer_template: true
known_usecases:
- chat
function:
grammar:
disable: true
description: Imported from https://huggingface.co/mradermacher/Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF
options:
- use_jinja:true
files:
- filename: llama-cpp/models/Qwen3-Coder-30B-A3B-Instruct-RTPurbo.i1-Q4_K_M.gguf
sha256: a25f1817a557da703ab685e6b98550cd7ed87e4a74573b5057e6e2f26b21140e
uri: https://huggingface.co/mradermacher/Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF/resolve/main/Qwen3-Coder-30B-A3B-Instruct-RTPurbo.i1-Q4_K_M.gguf
- name: "glm-4.5v-i1"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/mradermacher/GLM-4.5V-i1-GGUF
description: |
This is a **quantized version** of the **GLM-4.5V** large language model, originally developed by **zai-org**. The repository provides multiple quantized variants, optimized for different trade-offs between size, speed, and quality; the base model is a multilingual (Chinese/English) vision-language model, and the quantizations target efficient inference on memory-constrained hardware.
Key features include:
- **Quantization options**: IQ2_M, Q2_K, Q4_K_M, IQ3_M, IQ4_XS, etc., with sizes ranging from 43 GB to 96 GB.
- **Performance**: Optimized for inference, with some variants (e.g., Q4_K_M) balancing speed and quality.
- **Vision support**: The model is a vision model, with mmproj files available in the static repository.
- **License**: MIT-licensed.
This quantized version is ideal for applications requiring compact, efficient models while retaining most of the original capabilities of the base GLM-4.5V.
license: "mit"
tags:
- llm
- gguf
- multimodal
- vision
- image-to-text
- text-to-text
- glm
overrides:
parameters:
model: llama-cpp/models/GLM-4.5V.i1-Q4_K_M.gguf
name: GLM-4.5V-i1-GGUF
backend: llama-cpp
template:
use_tokenizer_template: true
known_usecases:
- chat
function:
grammar:
disable: true
description: Imported from https://huggingface.co/mradermacher/GLM-4.5V-i1-GGUF
options:
- use_jinja:true
files:
- filename: llama-cpp/models/GLM-4.5V.i1-Q4_K_M.gguf
sha256: 0d5786b78b73997f46c11ba2cc11d0f5a36644db0c248caa82fad3fb6f30be1a
uri: https://huggingface.co/mradermacher/GLM-4.5V-i1-GGUF/resolve/main/GLM-4.5V.i1-Q4_K_M.gguf
- &vibevoice
url: "github:mudler/LocalAI/gallery/vibevoice.yaml@master"
icon: https://github.com/microsoft/VibeVoice/raw/main/Figures/VibeVoice_logo_white.png
license: mit
tags:
- text-to-speech
- TTS
name: "vibevoice"
urls:
- https://github.com/microsoft/VibeVoice
# Download voice preset files
# Voice presets are downloaded to: {models_dir}/voices/streaming_model/
# The voices_dir option above tells the backend to look in this location
files:
# English voices
- filename: voices/streaming_model/en-Frank_man.pt
uri: https://raw.githubusercontent.com/microsoft/VibeVoice/main/demo/voices/streaming_model/en-Frank_man.pt
sha256: acaa8f1a4f46a79f8f5660cfb7a3af06ef473389319df7debc07376fdc840e47
- filename: voices/streaming_model/en-Grace_woman.pt
uri: https://raw.githubusercontent.com/microsoft/VibeVoice/main/demo/voices/streaming_model/en-Grace_woman.pt
sha256: 5f0ef02a3f3cace04cf721608b65273879466bb15fe4044e46ec6842190f6bb1
- filename: voices/streaming_model/en-Mike_man.pt
uri: https://raw.githubusercontent.com/microsoft/VibeVoice/main/demo/voices/streaming_model/en-Mike_man.pt
sha256: afb64b580fbc6fab09af04572bbbd2b3906ff8ed35a28731a90b8681e47bdc89
- filename: voices/streaming_model/en-Emma_woman.pt
uri: https://raw.githubusercontent.com/microsoft/VibeVoice/main/demo/voices/streaming_model/en-Emma_woman.pt
sha256: 75b15c481e0d848991f1789620aa9929c583ec2c5f701f8152362cf74498bbf8
- filename: voices/streaming_model/en-Carter_man.pt
uri: https://raw.githubusercontent.com/microsoft/VibeVoice/main/demo/voices/streaming_model/en-Carter_man.pt
sha256: a7bfdf1cd4939c22469bcfc6f427ae9c4467b3df46c2c14303a39c294cfc6897
- filename: voices/streaming_model/en-Davis_man.pt
uri: https://raw.githubusercontent.com/microsoft/VibeVoice/main/demo/voices/streaming_model/en-Davis_man.pt
sha256: 67561d63bfa2153616e4c02fd967007c182593fc53738a6ad94bf5f84e8832ac
- &qwen3vl
url: "github:mudler/LocalAI/gallery/qwen3.yaml@master"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/620760a26e3b7210c2ff1943/-s1gyJfvbE1RgO5iBeNOi.png
license: apache-2.0
tags:
- llm
- gguf
- gpu
- image-to-text
- multimodal
- cpu
- qwen
- qwen3
- thinking
- reasoning
name: "qwen3-vl-30b-a3b-instruct"
urls:
- https://huggingface.co/unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF
description: |
Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.
Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions for flexible, on-demand deployment.
#### Key Enhancements:
* **Visual Agent**: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks.
* **Visual Coding Boost**: Generates Draw.io/HTML/CSS/JS from images/videos.
* **Advanced Spatial Perception**: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI.
* **Long Context & Video Understanding**: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing.
* **Enhanced Multimodal Reasoning**: Excels in STEM/Math—causal analysis and logical, evidence-based answers.
* **Upgraded Visual Recognition**: Broader, higher-quality pretraining lets the model “recognize everything”—celebrities, anime, products, landmarks, flora/fauna, etc.
* **Expanded OCR**: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing.
* **Text Understanding on par with pure LLMs**: Seamless text–vision fusion for lossless, unified comprehension.
#### Model Architecture Updates:
1. **Interleaved-MRoPE**: Full‑frequency allocation over time, width, and height via robust positional embeddings, enhancing long‑horizon video reasoning.
2. **DeepStack**: Fuses multi‑level ViT features to capture fine-grained details and sharpen image–text alignment.
3. **Text–Timestamp Alignment:** Moves beyond T‑RoPE to precise, timestamp‑grounded event localization for stronger video temporal modeling.
This is the weight repository for Qwen3-VL-30B-A3B-Instruct.
overrides:
mmproj: mmproj/mmproj-F16.gguf
parameters:
model: Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf
files:
- filename: Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf
uri: huggingface://unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF/Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf
sha256: 7ea0a652b4bda1c1911a93a79a7cd98b92011dfea078e87328285294b2b4ab44
- filename: mmproj/mmproj-F16.gguf
sha256: 9f248089357599a08a23af40cb5ce0030de14a2e119b7ef57f66cb339bd20819
uri: huggingface://unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF/mmproj-F16.gguf
- !!merge <<: *qwen3vl
name: "qwen3-vl-30b-a3b-thinking"
urls:
- https://huggingface.co/unsloth/Qwen3-VL-30B-A3B-Thinking-GGUF
description: |
Qwen3-VL-30B-A3B-Thinking is the reasoning-enhanced “Thinking” edition of the Qwen3-VL-30B-A3B model.
overrides:
mmproj: mmproj/mmproj-F16.gguf
parameters:
model: Qwen3-VL-30B-A3B-Thinking-Q4_K_M.gguf
files:
- filename: Qwen3-VL-30B-A3B-Thinking-Q4_K_M.gguf
uri: huggingface://unsloth/Qwen3-VL-30B-A3B-Thinking-GGUF/Qwen3-VL-30B-A3B-Thinking-Q4_K_M.gguf
sha256: b5622d28d2deb398558841fb29060f0ad241bd30f6afe79ed3fcf78d5fbf887b
- filename: mmproj/mmproj-F16.gguf
uri: huggingface://unsloth/Qwen3-VL-30B-A3B-Thinking-GGUF/mmproj-F16.gguf
sha256: 7c5d39a9dc4645fc49a39a1c5a96157825af4d1c6e0961bed5d667a65b4b9572
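The entries in this family reuse the `&qwen3vl` anchor via `- !!merge <<: *qwen3vl`: each entry starts from the anchored base mapping and its own keys override the inherited ones. A rough Python analogue of that merge semantics (illustrative only; the dict values are abbreviated from the entries above):

```python
# Rough analogue of YAML anchor + merge-key (`<<:`) resolution as used by
# `- !!merge <<: *qwen3vl`: the entry inherits the anchored base mapping,
# and keys set on the entry itself take precedence.
base = {  # corresponds to the &qwen3vl anchor (abbreviated)
    "url": "github:mudler/LocalAI/gallery/qwen3.yaml@master",
    "license": "apache-2.0",
    "name": "qwen3-vl-30b-a3b-instruct",
}
entry = {  # keys written on the !!merge entry itself
    "name": "qwen3-vl-30b-a3b-thinking",
}
merged = {**base, **entry}  # entry keys win; base fills in the rest
```

This is why each `!!merge` entry only needs to restate `name`, `urls`, `description`, `overrides`, and `files`, while `url`, `license`, `icon`, and `tags` are inherited from the anchor.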
- !!merge <<: *qwen3vl
name: "qwen3-vl-4b-instruct"
urls:
- https://huggingface.co/unsloth/Qwen3-VL-4B-Instruct-GGUF
description: |
Qwen3-VL-4B-Instruct is the 4B parameter model of the Qwen3-VL series.
overrides:
mmproj: mmproj/mmproj-Qwen3-VL-4B-Instruct-F16.gguf
parameters:
model: Qwen3-VL-4B-Instruct-Q4_K_M.gguf
files:
- filename: Qwen3-VL-4B-Instruct-Q4_K_M.gguf
sha256: d4dcd426bfba75752a312b266b80fec8136fbaca13c62d93b7ac41fa67f0492b
uri: huggingface://unsloth/Qwen3-VL-4B-Instruct-GGUF/Qwen3-VL-4B-Instruct-Q4_K_M.gguf
- filename: mmproj/mmproj-Qwen3-VL-4B-Instruct-F16.gguf
sha256: 1b9f4e92f0fbda14d7d7b58baed86039b8a980fe503d9d6a9393f25c0028f1fc
uri: huggingface://unsloth/Qwen3-VL-4B-Instruct-GGUF/mmproj-F16.gguf
- !!merge <<: *qwen3vl
name: "qwen3-vl-32b-instruct"
urls:
- https://huggingface.co/unsloth/Qwen3-VL-32B-Instruct-GGUF
description: |
Qwen3-VL-32B-Instruct is the 32B parameter model of the Qwen3-VL series.
overrides:
mmproj: mmproj/mmproj-Qwen3-VL-32B-Instruct-F16.gguf
parameters:
model: Qwen3-VL-32B-Instruct-Q4_K_M.gguf
files:
- filename: Qwen3-VL-32B-Instruct-Q4_K_M.gguf
uri: huggingface://unsloth/Qwen3-VL-32B-Instruct-GGUF/Qwen3-VL-32B-Instruct-Q4_K_M.gguf
sha256: 92d605566f8661b296251c535ed028ecf81c32e14e06948a3d8bef829e96a804
- filename: mmproj/mmproj-Qwen3-VL-32B-Instruct-F16.gguf
uri: huggingface://unsloth/Qwen3-VL-32B-Instruct-GGUF/mmproj-F16.gguf
sha256: dde7e407cf72e601455976c2d0daa960d16ee34ba3f0c78718c881d8cd8c1052
- !!merge <<: *qwen3vl
name: "qwen3-vl-4b-thinking"
urls:
- https://huggingface.co/unsloth/Qwen3-VL-4B-Thinking-GGUF
description: |
Qwen3-VL-4B-Thinking is the reasoning-enhanced “Thinking” edition of the 4B parameter Qwen3-VL model.
overrides:
mmproj: mmproj/mmproj-Qwen3-VL-4B-Thinking-F16.gguf
parameters:
model: Qwen3-VL-4B-Thinking-Q4_K_M.gguf
files:
- filename: Qwen3-VL-4B-Thinking-Q4_K_M.gguf
sha256: bd73237f16265a1014979b7ed34ff9265e7e200ae6745bb1da383a1bbe0f9211
uri: huggingface://unsloth/Qwen3-VL-4B-Thinking-GGUF/Qwen3-VL-4B-Thinking-Q4_K_M.gguf
- filename: mmproj/mmproj-Qwen3-VL-4B-Thinking-F16.gguf
sha256: 72354fcd3fc75935b84e745ca492d6e78dd003bb5a020d71b296e7650926ac87
uri: huggingface://unsloth/Qwen3-VL-4B-Thinking-GGUF/mmproj-F16.gguf
- !!merge <<: *qwen3vl
name: "qwen3-vl-2b-thinking"
urls:
- https://huggingface.co/unsloth/Qwen3-VL-2B-Thinking-GGUF
description: |
Qwen3-VL-2B-Thinking is the reasoning-enhanced “Thinking” edition of the 2B parameter Qwen3-VL model.
overrides:
mmproj: mmproj/mmproj-Qwen3-VL-2B-Thinking-F16.gguf
parameters:
model: Qwen3-VL-2B-Thinking-Q4_K_M.gguf
files:
- filename: Qwen3-VL-2B-Thinking-Q4_K_M.gguf
uri: huggingface://unsloth/Qwen3-VL-2B-Thinking-GGUF/Qwen3-VL-2B-Thinking-Q4_K_M.gguf
sha256: 6b3c336314bca30dd7efed54109fd3430a0b1bfd177b0300e5f11f8eae987f30
- filename: mmproj/mmproj-Qwen3-VL-2B-Thinking-F16.gguf
sha256: 4eabc90a52fe890d6ca1dad92548782eab6edc91f012a365fff95cf027ba529d
uri: huggingface://unsloth/Qwen3-VL-2B-Thinking-GGUF/mmproj-F16.gguf
- !!merge <<: *qwen3vl
name: "qwen3-vl-2b-instruct"
urls:
- https://huggingface.co/unsloth/Qwen3-VL-2B-Instruct-GGUF
description: |
Qwen3-VL-2B-Instruct is the 2B parameter model of the Qwen3-VL series.
overrides:
mmproj: mmproj/mmproj-Qwen3-VL-2B-Instruct-F16.gguf
parameters:
model: Qwen3-VL-2B-Instruct-Q4_K_M.gguf
files:
- filename: Qwen3-VL-2B-Instruct-Q4_K_M.gguf
sha256: 858fcf2a39dc73b26dd86592cb0a5f949b59d1edb365d1dea98e46b02e955e56
uri: huggingface://unsloth/Qwen3-VL-2B-Instruct-GGUF/Qwen3-VL-2B-Instruct-Q4_K_M.gguf
- filename: mmproj/mmproj-Qwen3-VL-2B-Instruct-F16.gguf
sha256: cd5a851d3928697fa1bd76d459d2cc409b6cf40c9d9682b2f5c8e7c6a9f9630f
uri: huggingface://unsloth/Qwen3-VL-2B-Instruct-GGUF/mmproj-F16.gguf
- !!merge <<: *qwen3vl
name: "huihui-qwen3-vl-30b-a3b-instruct-abliterated"
urls:
- https://huggingface.co/noctrex/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-GGUF
description: |
These are quantizations of Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated, an abliterated (filter-reduced) variant of Qwen3-VL-30B-A3B-Instruct.
overrides:
mmproj: mmproj/mmproj-Huihui-Qwen3-VL-30B-A3B-F16.gguf
parameters:
model: Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-Q4_K_M.gguf
files:
- filename: Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-Q4_K_M.gguf
sha256: 1e94a65167a39d2ff4427393746d4dbc838f3d163c639d932e9ce983f575eabf
uri: huggingface://noctrex/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-GGUF/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-Q4_K_M.gguf
- filename: mmproj/mmproj-Huihui-Qwen3-VL-30B-A3B-F16.gguf
sha256: 4bfd655851a5609b29201154e0bd4fe5f9274073766b8ab35b3a8acba0dd77a7
uri: huggingface://noctrex/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-GGUF/mmproj-F16.gguf
- &jamba
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/65e60c0ed5313c06372446ff/QwehUHgP2HtVAMW5MzJ2j.png
name: "ai21labs_ai21-jamba-reasoning-3b"
url: "github:mudler/LocalAI/gallery/jamba.yaml@master"
license: apache-2.0
tags:
- gguf
- GPU
- CPU
- text-to-text
- jamba
- mamba
urls:
- https://huggingface.co/ai21labs/AI21-Jamba-Reasoning-3B
- https://huggingface.co/bartowski/ai21labs_AI21-Jamba-Reasoning-3B-GGUF
description: |
AI21’s Jamba Reasoning 3B is a top-performing reasoning model that packs leading scores on intelligence benchmarks and highly-efficient processing into a compact 3B build.
      The hybrid design combines Transformer attention with Mamba (a state-space model). Mamba layers are more efficient for sequence processing, while attention layers capture complex dependencies. This mix reduces memory overhead, improves throughput, and makes the model run smoothly on laptops, GPUs, and even mobile devices, while maintaining impressive quality.
overrides:
parameters:
model: ai21labs_AI21-Jamba-Reasoning-3B-Q4_K_M.gguf
files:
- filename: ai21labs_AI21-Jamba-Reasoning-3B-Q4_K_M.gguf
sha256: ac7ec0648dea62d1efb5ef6e7268c748ffc71f1c26eebe97eccff0a8d41608e6
uri: huggingface://bartowski/ai21labs_AI21-Jamba-Reasoning-3B-GGUF/ai21labs_AI21-Jamba-Reasoning-3B-Q4_K_M.gguf
- &granite4
url: "github:mudler/LocalAI/gallery/granite4.yaml@master"
name: "ibm-granite_granite-4.0-h-small"
license: apache-2.0
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/639bcaa2445b133a4e942436/CEW-OjXkRkDNmTxSu8Egh.png
tags:
- gguf
- GPU
- CPU
- text-to-text
urls:
- https://huggingface.co/ibm-granite/granite-4.0-h-small
- https://huggingface.co/bartowski/ibm-granite_granite-4.0-h-small-GGUF
description: |
Granite-4.0-H-Small is a 32B parameter long-context instruct model finetuned from Granite-4.0-H-Small-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.
overrides:
parameters:
model: ibm-granite_granite-4.0-h-small-Q4_K_M.gguf
files:
- filename: ibm-granite_granite-4.0-h-small-Q4_K_M.gguf
sha256: c59ce76239bd5794acdbdf88616dfc296247f4e78792a9678d4b3e24966ead69
uri: huggingface://bartowski/ibm-granite_granite-4.0-h-small-GGUF/ibm-granite_granite-4.0-h-small-Q4_K_M.gguf
- !!merge <<: *granite4
name: "ibm-granite_granite-4.0-h-tiny"
urls:
- https://huggingface.co/ibm-granite/granite-4.0-h-tiny
- https://huggingface.co/bartowski/ibm-granite_granite-4.0-h-tiny-GGUF
description: |
Granite-4.0-H-Tiny is a 7B parameter long-context instruct model finetuned from Granite-4.0-H-Tiny-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.
overrides:
parameters:
model: ibm-granite_granite-4.0-h-tiny-Q4_K_M.gguf
files:
- filename: ibm-granite_granite-4.0-h-tiny-Q4_K_M.gguf
sha256: 33a689fe7f35b14ebab3ae599b65aaa3ed8548c393373b1b0eebee36c653146f
uri: huggingface://bartowski/ibm-granite_granite-4.0-h-tiny-GGUF/ibm-granite_granite-4.0-h-tiny-Q4_K_M.gguf
- !!merge <<: *granite4
name: "ibm-granite_granite-4.0-h-micro"
urls:
- https://huggingface.co/ibm-granite/granite-4.0-h-micro
- https://huggingface.co/bartowski/ibm-granite_granite-4.0-h-micro-GGUF
description: |
Granite-4.0-H-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-H-Micro-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.
overrides:
parameters:
model: ibm-granite_granite-4.0-h-micro-Q4_K_M.gguf
files:
- filename: ibm-granite_granite-4.0-h-micro-Q4_K_M.gguf
sha256: 48376d61449687a56b3811a418d92cc0e8e77b4d96ec13eb6c9d9503968c9f20
uri: huggingface://bartowski/ibm-granite_granite-4.0-h-micro-GGUF/ibm-granite_granite-4.0-h-micro-Q4_K_M.gguf
- !!merge <<: *granite4
name: "ibm-granite_granite-4.0-micro"
urls:
- https://huggingface.co/ibm-granite/granite-4.0-micro
- https://huggingface.co/bartowski/ibm-granite_granite-4.0-micro-GGUF
description: |
Granite-4.0-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-Micro-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.
overrides:
parameters:
model: ibm-granite_granite-4.0-micro-Q4_K_M.gguf
files:
- filename: ibm-granite_granite-4.0-micro-Q4_K_M.gguf
sha256: bd9d7b4795b9dc44e3e81aeae93bb5d8e6b891b7e823be5bf9910ed3ac060baf
uri: huggingface://bartowski/ibm-granite_granite-4.0-micro-GGUF/ibm-granite_granite-4.0-micro-Q4_K_M.gguf
- &ernie
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "baidu_ernie-4.5-21b-a3b-thinking"
license: apache-2.0
tags:
- gguf
- GPU
- CPU
- text-to-text
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/64f187a2cc1c03340ac30498/TYYUxK8xD1AxExFMWqbZD.png
urls:
- https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking
- https://huggingface.co/bartowski/baidu_ERNIE-4.5-21B-A3B-Thinking-GGUF
description: |
Over the past three months, we have continued to scale the thinking capability of ERNIE-4.5-21B-A3B, improving both the quality and depth of reasoning, thereby advancing the competitiveness of ERNIE lightweight models in complex reasoning tasks. We are pleased to introduce ERNIE-4.5-21B-A3B-Thinking, featuring the following key enhancements:
Significantly improved performance on reasoning tasks, including logical reasoning, mathematics, science, coding, text generation, and academic benchmarks that typically require human expertise.
Efficient tool usage capabilities.
Enhanced 128K long-context understanding capabilities.
Note: This version has an increased thinking length. We strongly recommend its use in highly complex reasoning tasks. ERNIE-4.5-21B-A3B-Thinking is a text MoE post-trained model, with 21B total parameters and 3B activated parameters for each token.
overrides:
parameters:
model: baidu_ERNIE-4.5-21B-A3B-Thinking-Q4_K_M.gguf
files:
- filename: baidu_ERNIE-4.5-21B-A3B-Thinking-Q4_K_M.gguf
sha256: f309f225c413324c585e74ce28c55e76dec25340156374551d39707fc2966840
uri: huggingface://bartowski/baidu_ERNIE-4.5-21B-A3B-Thinking-GGUF/baidu_ERNIE-4.5-21B-A3B-Thinking-Q4_K_M.gguf
- &mimo
license: mit
tags:
- gguf
- GPU
- CPU
- text-to-text
icon: https://cdn-uploads.huggingface.co/production/uploads/634262af8d8089ebaefd410e/9Bnn2AnIjfQFWBGkhDNmI.png
name: "aurore-reveil_koto-small-7b-it"
urls:
- https://huggingface.co/Aurore-Reveil/Koto-Small-7B-IT
- https://huggingface.co/bartowski/Aurore-Reveil_Koto-Small-7B-IT-GGUF
description: |
      Koto-Small-7B-IT is an instruct-tuned version of Koto-Small-7B-PT, which was trained on MiMo-7B-Base for almost a billion tokens of creative-writing data. This model is meant for roleplaying and instruct use cases.
overrides:
parameters:
model: Aurore-Reveil_Koto-Small-7B-IT-Q4_K_M.gguf
files:
- filename: Aurore-Reveil_Koto-Small-7B-IT-Q4_K_M.gguf
sha256: c5c38bfa5d8d5100e91a2e0050a0b2f3e082cd4bfd423cb527abc3b6f1ae180c
uri: huggingface://bartowski/Aurore-Reveil_Koto-Small-7B-IT-GGUF/Aurore-Reveil_Koto-Small-7B-IT-Q4_K_M.gguf
- &internvl35
name: "opengvlab_internvl3_5-30b-a3b"
url: "github:mudler/LocalAI/gallery/qwen3.yaml@master"
icon: https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png
urls:
- https://huggingface.co/OpenGVLab/InternVL3_5-30B-A3B
- https://huggingface.co/bartowski/OpenGVLab_InternVL3_5-30B-A3B-GGUF
license: apache-2.0
tags:
- multimodal
- gguf
- GPU
  - CPU
- image-to-text
- text-to-text
description: |
      We introduce InternVL3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. A key innovation is the Cascade Reinforcement Learning (Cascade RL) framework, which enhances reasoning through a two-stage process: offline RL for stable convergence and online RL for refined alignment. This coarse-to-fine training strategy leads to substantial improvements on downstream reasoning tasks, e.g., MMMU and MathVista. To optimize efficiency, we propose a Visual Resolution Router (ViR) that dynamically adjusts the resolution of visual tokens without compromising performance. Coupled with ViR, our Decoupled Vision-Language Deployment (DvD) strategy separates the vision encoder and language model across different GPUs, effectively balancing computational load. These contributions collectively enable InternVL3.5 to achieve up to a +16.0% gain in overall reasoning performance and a 4.05× inference speedup compared to its predecessor, i.e., InternVL3. In addition, InternVL3.5 supports novel capabilities such as GUI interaction and embodied agency. Notably, our largest model, i.e., InternVL3.5-241B-A28B, attains state-of-the-art results among open-source MLLMs across general multimodal, reasoning, text, and agentic tasks—narrowing the performance gap with leading commercial models like GPT-5. All models and code are publicly released.
overrides:
parameters:
model: OpenGVLab_InternVL3_5-30B-A3B-Q4_K_M.gguf
mmproj: mmproj-OpenGVLab_InternVL3_5-30B-A3B-f16.gguf
files:
- filename: OpenGVLab_InternVL3_5-30B-A3B-Q4_K_M.gguf
sha256: c352004ac811cf9aa198e11f698ebd5fd3c49b483cb31a2b081fb415dd8347c2
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-30B-A3B-GGUF/OpenGVLab_InternVL3_5-30B-A3B-Q4_K_M.gguf
- filename: mmproj-OpenGVLab_InternVL3_5-30B-A3B-f16.gguf
sha256: fa362a7396c3dddecf6f9a714144ed86207211d6c68ef39ea0d7dfe21b969b8d
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-30B-A3B-GGUF/mmproj-OpenGVLab_InternVL3_5-30B-A3B-f16.gguf
- !!merge <<: *internvl35
name: "opengvlab_internvl3_5-30b-a3b-q8_0"
urls:
- https://huggingface.co/OpenGVLab/InternVL3_5-30B-A3B
- https://huggingface.co/bartowski/OpenGVLab_InternVL3_5-30B-A3B-GGUF
overrides:
parameters:
model: OpenGVLab_InternVL3_5-30B-A3B-Q8_0.gguf
mmproj: mmproj-OpenGVLab_InternVL3_5-30B-A3B-f16.gguf
files:
- filename: OpenGVLab_InternVL3_5-30B-A3B-Q8_0.gguf
sha256: 79ac13df1d3f784cd5702b2835ede749cdfd274f141d1e0df25581af2a2a6720
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-30B-A3B-GGUF/OpenGVLab_InternVL3_5-30B-A3B-Q8_0.gguf
- filename: mmproj-OpenGVLab_InternVL3_5-30B-A3B-f16.gguf
sha256: fa362a7396c3dddecf6f9a714144ed86207211d6c68ef39ea0d7dfe21b969b8d
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-30B-A3B-GGUF/mmproj-OpenGVLab_InternVL3_5-30B-A3B-f16.gguf
- !!merge <<: *internvl35
name: "opengvlab_internvl3_5-14b-q8_0"
urls:
- https://huggingface.co/OpenGVLab/InternVL3_5-14B
- https://huggingface.co/bartowski/OpenGVLab_InternVL3_5-14B-GGUF
overrides:
parameters:
model: OpenGVLab_InternVL3_5-14B-Q8_0.gguf
mmproj: mmproj-OpenGVLab_InternVL3_5-14B-f16.gguf
files:
- filename: OpenGVLab_InternVL3_5-14B-Q8_0.gguf
sha256: e097b9c837347ec8050f9ed95410d1001030a4701eb9551c1be04793af16677a
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-14B-GGUF/OpenGVLab_InternVL3_5-14B-Q8_0.gguf
- filename: mmproj-OpenGVLab_InternVL3_5-14B-f16.gguf
sha256: c9625c981969d267052464e2d345f8ff5bc7e841871f5284a2bd972461c7356d
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-14B-GGUF/mmproj-OpenGVLab_InternVL3_5-14B-f16.gguf
- !!merge <<: *internvl35
name: "opengvlab_internvl3_5-14b"
urls:
- https://huggingface.co/OpenGVLab/InternVL3_5-14B
- https://huggingface.co/bartowski/OpenGVLab_InternVL3_5-14B-GGUF
overrides:
mmproj: mmproj-OpenGVLab_InternVL3_5-14B-f16.gguf
parameters:
model: OpenGVLab_InternVL3_5-14B-Q4_K_M.gguf
files:
- filename: OpenGVLab_InternVL3_5-14B-Q4_K_M.gguf
sha256: 5bb86ab56ee543bb72ba0cab58658ecb54713504f1bc9d1d075d202a35419032
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-14B-GGUF/OpenGVLab_InternVL3_5-14B-Q4_K_M.gguf
- filename: mmproj-OpenGVLab_InternVL3_5-14B-f16.gguf
sha256: c9625c981969d267052464e2d345f8ff5bc7e841871f5284a2bd972461c7356d
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-14B-GGUF/mmproj-OpenGVLab_InternVL3_5-14B-f16.gguf
- !!merge <<: *internvl35
name: "opengvlab_internvl3_5-8b"
urls:
- https://huggingface.co/OpenGVLab/InternVL3_5-8B
- https://huggingface.co/bartowski/OpenGVLab_InternVL3_5-8B-GGUF
overrides:
mmproj: mmproj-OpenGVLab_InternVL3_5-8B-f16.gguf
parameters:
model: OpenGVLab_InternVL3_5-8B-Q4_K_M.gguf
files:
- filename: OpenGVLab_InternVL3_5-8B-Q4_K_M.gguf
sha256: f3792d241a77a88be986445fed2498489e7360947ae4556e58cb0833e9fbc697
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-8B-GGUF/OpenGVLab_InternVL3_5-8B-Q4_K_M.gguf
- filename: mmproj-OpenGVLab_InternVL3_5-8B-f16.gguf
sha256: 212cc090f81ea2981b870186d4b424fae69489a5313a14e52ffdb2e877852389
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-8B-GGUF/mmproj-OpenGVLab_InternVL3_5-8B-f16.gguf
- !!merge <<: *internvl35
name: "opengvlab_internvl3_5-8b-q8_0"
urls:
- https://huggingface.co/OpenGVLab/InternVL3_5-8B
- https://huggingface.co/bartowski/OpenGVLab_InternVL3_5-8B-GGUF
overrides:
mmproj: mmproj-OpenGVLab_InternVL3_5-8B-f16.gguf
parameters:
model: OpenGVLab_InternVL3_5-8B-Q8_0.gguf
files:
- filename: OpenGVLab_InternVL3_5-8B-Q8_0.gguf
sha256: d81138703d9a641485c8bb064faa87f18cbc2adc9975bbedd20ab21dc7318260
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-8B-GGUF/OpenGVLab_InternVL3_5-8B-Q8_0.gguf
- filename: mmproj-OpenGVLab_InternVL3_5-8B-f16.gguf
sha256: 212cc090f81ea2981b870186d4b424fae69489a5313a14e52ffdb2e877852389
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-8B-GGUF/mmproj-OpenGVLab_InternVL3_5-8B-f16.gguf
- !!merge <<: *internvl35
name: "opengvlab_internvl3_5-4b"
urls:
- https://huggingface.co/OpenGVLab/InternVL3_5-4B
- https://huggingface.co/bartowski/OpenGVLab_InternVL3_5-4B-GGUF
overrides:
mmproj: mmproj-OpenGVLab_InternVL3_5-4B-f16.gguf
parameters:
model: OpenGVLab_InternVL3_5-4B-Q4_K_M.gguf
files:
- filename: OpenGVLab_InternVL3_5-4B-Q4_K_M.gguf
sha256: 7c1612b6896ad14caa501238e72afa17a600651d0984225e3ff78b39de86099c
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-4B-GGUF/OpenGVLab_InternVL3_5-4B-Q4_K_M.gguf
- filename: mmproj-OpenGVLab_InternVL3_5-4B-f16.gguf
sha256: 0f9704972fcb9cb0a4f2c0f4eb7fe4f58e53ccd4b06ec17cf7a80271aa963eb7
      uri: huggingface://bartowski/OpenGVLab_InternVL3_5-4B-GGUF/mmproj-OpenGVLab_InternVL3_5-4B-f16.gguf
- !!merge <<: *internvl35
name: "opengvlab_internvl3_5-4b-q8_0"
urls:
- https://huggingface.co/OpenGVLab/InternVL3_5-4B
- https://huggingface.co/bartowski/OpenGVLab_InternVL3_5-4B-GGUF
overrides:
mmproj: mmproj-OpenGVLab_InternVL3_5-4B-f16.gguf
parameters:
model: OpenGVLab_InternVL3_5-4B-Q8_0.gguf
files:
- filename: OpenGVLab_InternVL3_5-4B-Q8_0.gguf
sha256: ece87031e20486b1a4b86a0ba0f06b8b3b6eed676c8c6842e31041524489992d
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-4B-GGUF/OpenGVLab_InternVL3_5-4B-Q8_0.gguf
- filename: mmproj-OpenGVLab_InternVL3_5-4B-f16.gguf
sha256: 0f9704972fcb9cb0a4f2c0f4eb7fe4f58e53ccd4b06ec17cf7a80271aa963eb7
      uri: huggingface://bartowski/OpenGVLab_InternVL3_5-4B-GGUF/mmproj-OpenGVLab_InternVL3_5-4B-f16.gguf
- !!merge <<: *internvl35
name: "opengvlab_internvl3_5-2b"
urls:
- https://huggingface.co/OpenGVLab/InternVL3_5-2B
- https://huggingface.co/bartowski/OpenGVLab_InternVL3_5-2B-GGUF
overrides:
mmproj: mmproj-OpenGVLab_InternVL3_5-2B-f16.gguf
parameters:
model: OpenGVLab_InternVL3_5-2B-Q8_0.gguf
files:
- filename: OpenGVLab_InternVL3_5-2B-Q8_0.gguf
sha256: 6997c6e3a1fe5920ac1429a21a3ec15d545e14eb695ee3656834859e617800b5
uri: huggingface://bartowski/OpenGVLab_InternVL3_5-2B-GGUF/OpenGVLab_InternVL3_5-2B-Q8_0.gguf
- filename: mmproj-OpenGVLab_InternVL3_5-2B-f16.gguf
sha256: e83ba6e675b747f7801557dc24594f43c17a7850b6129d4972d55e3e9b010359
      uri: huggingface://bartowski/OpenGVLab_InternVL3_5-2B-GGUF/mmproj-OpenGVLab_InternVL3_5-2B-f16.gguf
- &lfm2
url: "github:mudler/LocalAI/gallery/lfm.yaml@master"
name: "lfm2-vl-450m"
license: lfm1.0
tags:
- multimodal
- image-to-text
- gguf
- cpu
- gpu
- edge
icon: https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/7_6D7rWrLxp2hb6OHSV1p.png
urls:
- https://huggingface.co/LiquidAI/LFM2-VL-450M
- https://huggingface.co/LiquidAI/LFM2-VL-450M-GGUF
description: |
LFM2‑VL is Liquid AI's first series of multimodal models, designed to process text and images with variable resolutions. Built on the LFM2 backbone, it is optimized for low-latency and edge AI applications.
We're releasing the weights of two post-trained checkpoints with 450M (for highly constrained devices) and 1.6B (more capable yet still lightweight) parameters.
2× faster inference speed on GPUs compared to existing VLMs while maintaining competitive accuracy
Flexible architecture with user-tunable speed-quality tradeoffs at inference time
Native resolution processing up to 512×512 with intelligent patch-based handling for larger images, avoiding upscaling and distortion
overrides:
parameters:
model: LFM2-VL-450M-F16.gguf
mmproj: mmproj-LFM2-VL-450M-F16.gguf
files:
- filename: LFM2-VL-450M-F16.gguf
sha256: 0197edb886bb25136b52ac47e8c75a1d51e7ba41deda7eb18e8258b193b59a3b
uri: huggingface://LiquidAI/LFM2-VL-450M-GGUF/LFM2-VL-450M-F16.gguf
- filename: mmproj-LFM2-VL-450M-F16.gguf
sha256: 416a085c5c7ba0f8d02bb8326c719a6f8f2210c2641c6bf64194a57c11c76e59
uri: huggingface://LiquidAI/LFM2-VL-450M-GGUF/mmproj-LFM2-VL-450M-F16.gguf
- !!merge <<: *lfm2
name: "lfm2-vl-1.6b"
urls:
- https://huggingface.co/LiquidAI/LFM2-VL-1.6B
- https://huggingface.co/LiquidAI/LFM2-VL-1.6B-GGUF
overrides:
parameters:
model: LFM2-VL-1.6B-F16.gguf
mmproj: mmproj-LFM2-VL-1.6B-F16.gguf
files:
- filename: LFM2-VL-1.6B-F16.gguf
sha256: 0a82498edc354b50247fee78081c8954ae7f4deee9068f8464a5ee774e82118a
uri: huggingface://LiquidAI/LFM2-VL-1.6B-GGUF/LFM2-VL-1.6B-F16.gguf
- filename: mmproj-LFM2-VL-1.6B-F16.gguf
sha256: b637bfa6060be2bc7503ec23ba48b407843d08c2ca83f52be206ea8563ccbae2
uri: huggingface://LiquidAI/LFM2-VL-1.6B-GGUF/mmproj-LFM2-VL-1.6B-F16.gguf
- !!merge <<: *lfm2
name: "lfm2-1.2b"
urls:
- https://huggingface.co/LiquidAI/LFM2-1.2B
- https://huggingface.co/LiquidAI/LFM2-1.2B-GGUF
overrides:
parameters:
model: LFM2-1.2B-F16.gguf
files:
- filename: LFM2-1.2B-F16.gguf
sha256: 0ddedfb8c5f7f73e77f19678bbc0f6ba2554d0534dd0feea65ea5bca2907d5f2
uri: huggingface://LiquidAI/LFM2-1.2B-GGUF/LFM2-1.2B-F16.gguf
- !!merge <<: *lfm2
name: "liquidai_lfm2-350m-extract"
urls:
- https://huggingface.co/LiquidAI/LFM2-350M-Extract
- https://huggingface.co/bartowski/LiquidAI_LFM2-350M-Extract-GGUF
description: |
Based on LFM2-350M, LFM2-350M-Extract is designed to extract important information from a wide variety of unstructured documents (such as articles, transcripts, or reports) into structured outputs like JSON, XML, or YAML.
Use cases:
Extracting invoice details from emails into structured JSON.
Converting regulatory filings into XML for compliance systems.
Transforming customer support tickets into YAML for analytics pipelines.
Populating knowledge graphs with entities and attributes from unstructured reports.
You can find more information about other task-specific models in this blog post.
overrides:
parameters:
model: LiquidAI_LFM2-350M-Extract-Q4_K_M.gguf
files:
- filename: LiquidAI_LFM2-350M-Extract-Q4_K_M.gguf
sha256: 340a7fb24b98a7dbe933169dbbb869f4d89f8c7bf27ee45d62afabfc5b376743
uri: huggingface://bartowski/LiquidAI_LFM2-350M-Extract-GGUF/LiquidAI_LFM2-350M-Extract-Q4_K_M.gguf
- !!merge <<: *lfm2
name: "liquidai_lfm2-1.2b-extract"
urls:
- https://huggingface.co/LiquidAI/LFM2-1.2B-Extract
- https://huggingface.co/bartowski/LiquidAI_LFM2-1.2B-Extract-GGUF
description: |
Based on LFM2-1.2B, LFM2-1.2B-Extract is designed to extract important information from a wide variety of unstructured documents (such as articles, transcripts, or reports) into structured outputs like JSON, XML, or YAML.
Use cases:
Extracting invoice details from emails into structured JSON.
Converting regulatory filings into XML for compliance systems.
Transforming customer support tickets into YAML for analytics pipelines.
Populating knowledge graphs with entities and attributes from unstructured reports.
overrides:
parameters:
model: LiquidAI_LFM2-1.2B-Extract-Q4_K_M.gguf
files:
- filename: LiquidAI_LFM2-1.2B-Extract-Q4_K_M.gguf
sha256: 97a1c5600045e9ade49bc4a9e3df083cef7c82b05a6d47ea2e58ab44cc98b16a
uri: huggingface://bartowski/LiquidAI_LFM2-1.2B-Extract-GGUF/LiquidAI_LFM2-1.2B-Extract-Q4_K_M.gguf
- !!merge <<: *lfm2
name: "liquidai_lfm2-1.2b-rag"
urls:
- https://huggingface.co/LiquidAI/LFM2-1.2B-RAG
- https://huggingface.co/bartowski/LiquidAI_LFM2-1.2B-RAG-GGUF
description: |
Based on LFM2-1.2B, LFM2-1.2B-RAG is specialized in answering questions based on provided contextual documents, for use in RAG (Retrieval-Augmented Generation) systems.
Use cases:
Chatbot to ask questions about the documentation of a particular product.
      Customer support with an internal knowledge base to provide grounded answers.
Academic research assistant with multi-turn conversations about research papers and course materials.
overrides:
parameters:
model: LiquidAI_LFM2-1.2B-RAG-Q4_K_M.gguf
files:
- filename: LiquidAI_LFM2-1.2B-RAG-Q4_K_M.gguf
sha256: 11c93b5ae81612ab532fcfb395fddd2fb478b5d6215e1b46eeee3576a31eaa2d
uri: huggingface://bartowski/LiquidAI_LFM2-1.2B-RAG-GGUF/LiquidAI_LFM2-1.2B-RAG-Q4_K_M.gguf
- !!merge <<: *lfm2
name: "liquidai_lfm2-1.2b-tool"
urls:
- https://huggingface.co/LiquidAI/LFM2-1.2B-Tool
- https://huggingface.co/bartowski/LiquidAI_LFM2-1.2B-Tool-GGUF
description: |
Based on LFM2-1.2B, LFM2-1.2B-Tool is designed for concise and precise tool calling. The key challenge was designing a non-thinking model that outperforms similarly sized thinking models for tool use.
Use cases:
Mobile and edge devices requiring instant API calls, database queries, or system integrations without cloud dependency.
Real-time assistants in cars, IoT devices, or customer support, where response latency is critical.
Resource-constrained environments like embedded systems or battery-powered devices needing efficient tool execution.
overrides:
parameters:
model: LiquidAI_LFM2-1.2B-Tool-Q4_K_M.gguf
files:
- filename: LiquidAI_LFM2-1.2B-Tool-Q4_K_M.gguf
sha256: 6bdf2292a137c12264a065d73c12b61065293440b753249727cec0b6dc350d64
uri: huggingface://bartowski/LiquidAI_LFM2-1.2B-Tool-GGUF/LiquidAI_LFM2-1.2B-Tool-Q4_K_M.gguf
- !!merge <<: *lfm2
name: "liquidai_lfm2-350m-math"
urls:
- https://huggingface.co/LiquidAI/LFM2-350M-Math