Releases · mostlygeek/llama-swap

31 May 00:07

03d58e5

v220 Latest

Another release? Who needs to touch grass?

This release includes a small patch by first-time contributor @Luiszzzor. In addition, a new load and concurrency testing tool has been added to the Playground. It's easier to show than tell, so check out this demo video with a swap matrix example:

[Video: llama-swap-concurrency.mp4]

Side note: I forgot to mention in the video that supporting more than 5 or 6 concurrent requests (depending on your browser) requires a valid TLS certificate. llama-swap supports HTTP/2, but that requires HTTPS. In the demo I run it on my tailnet with tailscale serve 8080, which generates the Let's Encrypt cert for me.
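As a rough illustration of why that cap matters for a load test, here is a minimal sketch. It assumes the browser allows at most 6 simultaneous HTTP/1.1 connections per host and that every request takes about the same time; the function name and the default limit are assumptions for illustration, not part of llama-swap.

```python
import math


def request_waves(total_requests, per_host_limit=6):
    """Return how many sequential "waves" a browser needs over HTTP/1.1.

    per_host_limit is an assumed browser connection cap; with HTTP/2 the
    requests can instead share a single multiplexed connection.
    """
    if per_host_limit < 1:
        raise ValueError("per_host_limit must be at least 1")
    return math.ceil(total_requests / per_host_limit)


# 32 requests behind a cap of 6 run in 6 waves rather than all at once.
print(request_waves(32))
```

Under this model, anything beyond the cap simply queues in the browser, so the test would measure the browser's limit rather than the server's concurrency.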

Changelog

03d58e5 Add load testing tool to the UI (#805)
c790d0e fix: update the concurrency middleware to respond with a JSON payload (#798)

Contributors

Luiszzzor

29 May 22:31

github-actions

v219

4ca9c47

v219 (fixes v218)

Notes

Including details for v218 (broken) and PR #790.

llama-swap has a new routing backend. What started as a small experiment to improve the concurrency handling exploded into a full refactor of the backend. For users, the biggest change is that swapping is more efficient. Requests are collated so that requests for models that are already loaded take precedence over those that are awaiting loading.

It looks like:

new router: A B A B A B -> A A A B B B
old router: A B A B A B -> A B A B A B
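The reordering can be sketched in a few lines. This is only an illustration of the idea, assuming a single loaded model and a simple in-memory queue of model names; the function names are hypothetical and this is not the router's actual implementation.

```python
def collate(queue, loaded):
    """Reorder pending requests so those for the loaded model go first.

    Requests for other models are grouped per model in order of first
    arrival, so each model only needs to be loaded once.
    """
    groups = {}
    for model in queue:
        groups.setdefault(model, []).append(model)

    ordered = []
    if loaded in groups:
        ordered.extend(groups.pop(loaded))
    for batch in groups.values():
        ordered.extend(batch)
    return ordered


def count_swaps(order, loaded):
    """Count how many times the loaded model changes while serving order."""
    swaps = 0
    current = loaded
    for model in order:
        if model != current:
            swaps += 1
            current = model
    return swaps


queue = ["A", "B", "A", "B", "A", "B"]
print(collate(queue, "A"))
print(count_swaps(queue, "A"), count_swaps(collate(queue, "A"), "A"))
```

With model A already loaded, serving the queue in arrival order costs five swaps in this sketch, while the collated order costs one.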

However, just doing that wouldn't require a 12,009 line PR. There were a lot of architectural changes that make developer quality of life a bit easier. Redundant code was removed, repo organization is now centralized around the internal/ packages, new funny loading remarks were added, etc.

Also a new concurrency tester sneaked in under cmd/concurrency-tester.

Changelog

4ca9c47 Makefile,internal/server: various release tweaks
146a9ea ui-svelte: update build directory (#801)

29 May 16:57

github-actions

v218

02e015f

v218 (broken DO NOT USE)

This one has a bug where the release did not include the UI. Use v219 instead.

Changelog

02e015f Introduce new routing backend (#790)
63bc266 Add new power draw column header for rocm-smi monitoring (#788)

22 May 07:20

github-actions

v217

636b53e

v217

Changelog

636b53e Improve rocm-smi performance monitoring (#775)
59cd3b6 Added Windows performance monitoring using nvidia-smi (#773)
5d1e62d Disable auto review feature in coderabbit config
dbb869d Increase inactivity thresholds for stale issues
26bb17e config.example.yaml: Improve matrix vs groups info

17 May 18:48

github-actions

v216

2982dd3

v216

Changelog

2982dd3 ui-svelte: update link to performance discussion thread

17 May 17:28

github-actions

v215

79dc87f

v215

Adds ROCm support to the new experimental performance monitor.
Thank you to @knguyen298 for this patch.

Changelog

79dc87f Add ROCm stats via rocm-smi (#767)

Contributors

knguyen298

15 May 23:49

github-actions

v214

b2fcc2d

v214

This release fixes a couple of small bugs in the UI and the new performance monitor.

Contributors

@krzychdre (#760) for finding and fixing the negative counting in the UI
@cdwaage (#759) for fixing the bug in the nvidia-smi fallback for the performance monitor

Changelog

b2fcc2d ui-svelte: fix cached tokens total counting -1 sentinel (#760)
6a9c4ef fix: use --loop instead of -loop for nvidia-smi (driver 540+ compat) (#759)
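The kind of bug fixed in #760 can be illustrated with a minimal sketch. It assumes that -1 is a sentinel meaning "no cached token count reported"; the function name and sample data are hypothetical, and the real fix lives in the Svelte UI rather than in Python.

```python
def total_cached_tokens(samples):
    """Sum cached token counts, skipping the -1 sentinel.

    Adding the sentinel into the total would drag it down and could even
    make it negative, so negative values are treated as "not reported".
    """
    return sum(n for n in samples if n >= 0)


samples = [120, -1, 30, -1]
print(sum(samples))                  # naive total includes the sentinels
print(total_cached_tokens(samples))  # sentinel-aware total
```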

Contributors

krzychdre and cdwaage

15 May 04:59

github-actions

v213

0c813e4

v213

Changelog

0c813e4 ui-svelte: package updates
fe71e8a proxy,ui-svelte: improve support for v1/messages and v1/responses (#758)

14 May 05:14

github-actions

v212

aac7b87

v212

This release packs in a lot. It introduces a new experimental performance monitor, for Linux machines first. In the UI there is a new tab that shows up to the last hour of statistics.

Additionally, a /metrics endpoint was added for the common Prometheus and Grafana combo. A Grafana dashboard example is provided to get you started.

Other small changes

Versionless API endpoints were added that do not require the v1/ prefix. These help with upstream peers like z.ai that do not follow the v1 versioning convention.
The -watch-config system has been refactored. It now supports mounting the config file directly into a Docker container, which removes the requirement to mount a directory with the config in it.
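To illustrate the versionless endpoints, here is a minimal sketch of treating a path with and without the v1/ prefix as the same route. The function name and the example path are assumptions for illustration; how llama-swap actually registers its routes is not shown in these notes.

```python
def normalize_path(path):
    """Strip a leading /v1/ so versioned and versionless paths match."""
    prefix = "/v1/"
    if path.startswith(prefix):
        return "/" + path[len(prefix):]
    return path


# Both forms resolve to the same versionless route in this sketch.
print(normalize_path("/v1/chat/completions"))
print(normalize_path("/chat/completions"))
```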

Contributions from the community

Many thanks to @bankjaneo (#741), @rhtenhove (#746), and @sousekd (#753).

Changelog

aac7b87 ci: set go-version-file in release workflow
4e606fe ci: fix workflow bugs in release and go-ci
a4b91e0 Changes and fixes before the release (docs/small tweaks) (#750)
3e3646f perf: ignore LACT devices reporting zero VRAM (#753)
a01afe2 ci: use manifest-aware cleanup action for multi-arch :cpu (#751)
174e856 Multi arch cpu (#746)
085b54b proxy: fix data race in /running endpoint and typo in error message (#748)
2be3416 ui: add auto theme switch mode based on system theme (#741)
7e3e94a proxy,ui: add performance monitoring with Prometheus metrics (#743)
e261745 proxy: add versionless API endpoint (#733)
11b7913 llama-swap.go: remove debounce, replace fmt.Printlns (#731)

Contributors

rhtenhove, bankjaneo, and sousekd

02 May 19:22

github-actions

v211

c79114d

v211

Changelog

c79114d proxy: fix logger not checking matrix for processes
