Releases: llm-d/llm-d-inference-scheduler

v0.4.0

01 Dec 16:17
v0.4.0
86f5af7

Docker image is available at:

docker pull ghcr.io/llm-d/llm-d-inference-scheduler:v0.4.0

What's Changed

  • Use a production version of Istio by @shmuelk in #334
  • add vMaroon as code owner by @elevran in #342
  • Upgrade github.com/llm-d/llm-d-kv-cache-manager import to v0.3.0 by @vMaroon in #344
  • add a hold label when PRs are pushed to branch other than main by @nirrozenbaum in #345
  • sync gic to latest v1.0.0 release by @nirrozenbaum in #353
  • deps(actions): bump actions/stale from 9 to 10 by @dependabot[bot] in #350
  • deps(actions): bump actions/setup-go from 5 to 6 by @dependabot[bot] in #351
  • deps(actions): bump crate-ci/typos from 1.35.7 to 1.36.2 by @dependabot[bot] in #348
  • deps(go): bump the go-dependencies group with 7 updates by @dependabot[bot] in #349
  • bump llm-d-kv-cache-manager version by @vMaroon in #359
  • fix: Rename config to kv-cache-utilization-scorer from kv-cache-scorer by @yankay in #358
  • updating release issue-template by @kfswain in #361
  • bump llm-d-kv-cache-manager version (v0.3.2) by @vMaroon in #365
  • deps(go): bump github.com/onsi/ginkgo/v2 from 2.25.3 to 2.26.0 in the go-dependencies group by @dependabot[bot] in #368
  • feat: Add a scoring plugin to distribute new groups evenly by @usize in #357
  • implement PreRequest and PostResponse interface checks by @learner0810 in #372
  • deps(go): bump the kubernetes group with 2 updates by @dependabot[bot] in #369
  • deps(go): bump google.golang.org/grpc from 1.75.1 to 1.76.0 in the go-dependencies group by @dependabot[bot] in #374
  • Supports the ResponseComplete plugin by @learner0810 in #378
  • deps(actions): bump crate-ci/typos from 1.36.2 to 1.38.1 by @dependabot[bot] in #373
  • Fix multi-architecture image issues with Kind by @shmuelk in #362
  • feat: Moved the Routing Sidecar from its own repo to the inference-scheduler repo by @shmuelk in #379
  • Upgrade to use Gateway Inference Extension 1.1.0 rc.1 by @shmuelk in #384
  • deps(go): bump github.com/onsi/ginkgo/v2 from 2.26.0 to 2.27.1 in the go-dependencies group by @dependabot[bot] in #389
  • Ensure that max_completion_tokens=1 in Prefill by @shmuelk in #403
  • Add explanation of inference-scheduler relation to IGW/GIE by @elevran in #393
  • Add test coverage to test-unit Makefile target by @carlory in #391
  • Add regression tests for max_completion_tokens by @pierDipi in #411
  • Makefile refactoring to minimize the number of targets by @shmuelk in #397
  • feat: Add vLLM Data Parallel support to llm-d-inference-scheduler by @shmuelk in #392
  • fix(scorer): prevent potential division by zero in ActiveRequest.Score by @googs1025 in #413
  • Fixed wildcard targets by @shmuelk in #416
  • deps(actions): bump crate-ci/typos from 1.38.1 to 1.39.0 by @dependabot[bot] in #419
  • deps(go): bump github.com/onsi/ginkgo/v2 from 2.27.1 to 2.27.2 in the go-dependencies group by @dependabot[bot] in #417
  • Missed change to the Go code coverage output file names in the Makefile refactoring by @shmuelk in #422
  • Fix: Remove reference to the missing make target by @andreyod in #423
  • deps(actions): bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #420
  • deps(go): bump sigs.k8s.io/controller-runtime from 0.22.3 to 0.22.4 in the kubernetes group by @dependabot[bot] in #418
  • Enhancement: return 503 instead of 502 when decode node is not ready by @Phil-OSophy-42 in #412
  • Remove endpointslices from RBAC by @elevran in #424
  • Fix Image Loading for Podman in E2E Tests by @hdefazio in #406
  • readme meetings update by @nirrozenbaum in #427
  • Fix references to the SideCar's tag by @shmuelk in #428
  • Remove duplicate error logs by @hyeongyun0916 in #429
  • Upgrade to istio-1.28 by @irar2 in #431
  • Complete upgrade to Istio 1.28.0 by @shmuelk in #433
  • Upgrade GIE dependency to 1.1.0 by @shmuelk in #435
  • Remove dev from branch list in PR actions by @elevran in #434
  • Added support for Data Parallel in a Disaggregated Prefill/Decode setup by @shmuelk in #432
  • Remove code coverage from CI workflow by @carlory in #437
  • test: Scale up and down the model server during an end to end test by @shmuelk in #354
  • fix: add validation in ByLabelFactory to prevent invalid configurations by @googs1025 in #440
  • deps(actions): bump golangci/golangci-lint-action from 8 to 9 by @dependabot[bot] in #444
  • change lmcache connector to nixlv2 by @googs1025 in #446
  • fix: Roll back automatic updates to Dockerfiles by @shmuelk in #447
  • deps(go): bump golang.org/x/sync from 0.17.0 to 0.18.0 in the go-dependencies group by @dependabot[bot] in #443
  • fix(profile): validate handler parameters to prevent invalid config by @googs1025 in #449
  • Added chat completions preprocessing support by @guygir in #426
  • docs: add integration guide for external prefill/decode workloads by @googs1025 in #451
  • Define and manage PR lifecycle by @elevran in #450
  • test: End to End test for Data Parallel support by @shmuelk in #442
  • docs: add PD-aware examples for by-label and by-label-selector plugins by @googs1025 in #454
  • deps(actions): bump crate-ci/typos from 1.39.0 to 1.39.2 by @dependabot[bot] in #459
  • Add SGLang Connector for Prefill/Decode Disaggregation (migrated from llm-d-routing-sidecar#64) by @bongwoobak in #456
  • deps(go): bump the kubernetes group with 4 updates by @dependabot[bot] in #460
  • add unit test in scheduler plugin part(by-label, data-parallel-profile-handler, pd-profile-handler) by @googs1025 in #461
  • test: Enable running the end to end tests on K8S clusters other than Kind by @shmuelk in #453
  • Allow the sidecar to sample from a list of prefill host ports by @smarterclayton in #404
  • fix: Fixed issues running locally 'make lint' and 'make test-unit' by @shmuelk in #464
  • cleanup: Followup to Python paths fix by @shmuelk in #468
  • Replace tab with spaces to avoid treating as make target by @elevran in #469
  • minor refactoring of precise-prefix-cache scorer plugin by @vMaroon in #473
  • feat: Add initial metrics and update dependencies by...

v0.4.0-rc.1

24 Nov 12:07
v0.4.0-rc.1
cd7f004

v0.4.0-rc.1 Pre-release

Docker image is available here:

docker pull ghcr.io/llm-d/llm-d-inference-scheduler:v0.4.0-rc.1

What's Changed

  • Use a production version of Istio by @shmuelk in #334
  • add vMaroon as code owner by @elevran in #342
  • Upgrade github.com/llm-d/llm-d-kv-cache-manager import to v0.3.0 by @vMaroon in #344
  • add a hold label when PRs are pushed to branch other than main by @nirrozenbaum in #345
  • sync gic to latest v1.0.0 release by @nirrozenbaum in #353
  • deps(actions): bump actions/stale from 9 to 10 by @dependabot[bot] in #350
  • deps(actions): bump actions/setup-go from 5 to 6 by @dependabot[bot] in #351
  • deps(actions): bump crate-ci/typos from 1.35.7 to 1.36.2 by @dependabot[bot] in #348
  • deps(go): bump the go-dependencies group with 7 updates by @dependabot[bot] in #349
  • bump llm-d-kv-cache-manager version by @vMaroon in #359
  • fix: Rename config to kv-cache-utilization-scorer from kv-cache-scorer by @yankay in #358
  • updating release issue-template by @kfswain in #361
  • bump llm-d-kv-cache-manager version (v0.3.2) by @vMaroon in #365
  • deps(go): bump github.com/onsi/ginkgo/v2 from 2.25.3 to 2.26.0 in the go-dependencies group by @dependabot[bot] in #368
  • feat: Add a scoring plugin to distribute new groups evenly by @usize in #357
  • implement PreRequest and PostResponse interface checks by @learner0810 in #372
  • deps(go): bump the kubernetes group with 2 updates by @dependabot[bot] in #369
  • deps(go): bump google.golang.org/grpc from 1.75.1 to 1.76.0 in the go-dependencies group by @dependabot[bot] in #374
  • Supports the ResponseComplete plugin by @learner0810 in #378
  • deps(actions): bump crate-ci/typos from 1.36.2 to 1.38.1 by @dependabot[bot] in #373
  • Fix multi-architecture image issues with Kind by @shmuelk in #362
  • feat: Moved the Routing Sidecar from its own repo to the inference-scheduler repo by @shmuelk in #379
  • Upgrade to use Gateway Inference Extension 1.1.0 rc.1 by @shmuelk in #384
  • deps(go): bump github.com/onsi/ginkgo/v2 from 2.26.0 to 2.27.1 in the go-dependencies group by @dependabot[bot] in #389
  • Ensure that max_completion_tokens=1 in Prefill by @shmuelk in #403
  • Add explanation of inference-scheduler relation to IGW/GIE by @elevran in #393
  • Add test coverage to test-unit Makefile target by @carlory in #391
  • Add regression tests for max_completion_tokens by @pierDipi in #411
  • Makefile refactoring to minimize the number of targets by @shmuelk in #397
  • feat: Add vLLM Data Parallel support to llm-d-inference-scheduler by @shmuelk in #392
  • fix(scorer): prevent potential division by zero in ActiveRequest.Score by @googs1025 in #413
  • Fixed wildcard targets by @shmuelk in #416
  • deps(actions): bump crate-ci/typos from 1.38.1 to 1.39.0 by @dependabot[bot] in #419
  • deps(go): bump github.com/onsi/ginkgo/v2 from 2.27.1 to 2.27.2 in the go-dependencies group by @dependabot[bot] in #417
  • Missed change to the Go code coverage output file names in the Makefile refactoring by @shmuelk in #422
  • Fix: Remove reference to the missing make target by @andreyod in #423
  • deps(actions): bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #420
  • deps(go): bump sigs.k8s.io/controller-runtime from 0.22.3 to 0.22.4 in the kubernetes group by @dependabot[bot] in #418
  • Enhancement: return 503 instead of 502 when decode node is not ready by @Phil-OSophy-42 in #412
  • Remove endpointslices from RBAC by @elevran in #424
  • Fix Image Loading for Podman in E2E Tests by @hdefazio in #406
  • readme meetings update by @nirrozenbaum in #427
  • Fix references to the SideCar's tag by @shmuelk in #428
  • Remove duplicate error logs by @hyeongyun0916 in #429
  • Upgrade to istio-1.28 by @irar2 in #431
  • Complete upgrade to Istio 1.28.0 by @shmuelk in #433
  • Upgrade GIE dependency to 1.1.0 by @shmuelk in #435
  • Remove dev from branch list in PR actions by @elevran in #434
  • Added support for Data Parallel in a Disaggregated Prefill/Decode setup by @shmuelk in #432
  • Remove code coverage from CI workflow by @carlory in #437
  • test: Scale up and down the model server during an end to end test by @shmuelk in #354
  • fix: add validation in ByLabelFactory to prevent invalid configurations by @googs1025 in #440
  • deps(actions): bump golangci/golangci-lint-action from 8 to 9 by @dependabot[bot] in #444
  • change lmcache connector to nixlv2 by @googs1025 in #446
  • fix: Roll back automatic updates to Dockerfiles by @shmuelk in #447
  • deps(go): bump golang.org/x/sync from 0.17.0 to 0.18.0 in the go-dependencies group by @dependabot[bot] in #443
  • fix(profile): validate handler parameters to prevent invalid config by @googs1025 in #449
  • Added chat completions preprocessing support by @guygir in #426
  • docs: add integration guide for external prefill/decode workloads by @googs1025 in #451
  • Define and manage PR lifecycle by @elevran in #450
  • test: End to End test for Data Parallel support by @shmuelk in #442
  • docs: add PD-aware examples for by-label and by-label-selector plugins by @googs1025 in #454
  • deps(actions): bump crate-ci/typos from 1.39.0 to 1.39.2 by @dependabot[bot] in #459
  • Add SGLang Connector for Prefill/Decode Disaggregation (migrated from llm-d-routing-sidecar#64) by @bongwoobak in #456
  • deps(go): bump the kubernetes group with 4 updates by @dependabot[bot] in #460
  • add unit test in scheduler plugin part(by-label, data-parallel-profile-handler, pd-profile-handler) by @googs1025 in #461
  • test: Enable running the end to end tests on K8S clusters other than Kind by @shmuelk in #453
  • Allow the sidecar to sample from a list of prefill host ports by @smarterclayton in #404
  • fix: Fixed issues running locally 'make lint' and 'make test-unit' by @shmuelk in #464
  • cleanup: Followup to Python paths fix by @shmuelk in #468
  • Replace tab with spaces to avoid treating as make target by @elevran in #469
  • minor refactoring of precise-prefix-cache scorer plugin by @vMaroon in #473
  • feat: Add initial metrics and update dependencies...

v0.3.2

09 Oct 00:08
v0.3.2

In addition to the changes below, these patches include fixes to the kv-cache-manager dependency.

What's Changed

New Contributors

Full Changelog: v0.2.1...v0.3.2

v0.3.2-rc.1

03 Oct 18:51
v0.3.2-rc.1

v0.3.2-rc.1 Pre-release

Small fixes to kv-cache-manager required updated dependencies

v0.3.1

29 Sep 20:52
v0.3.1

Small patch updating the kv-cache-manager dependency to include support in v0.3.

See the full v0.3 changes here:

What's Changed

New Contributors

Full Changelog: v0.2.1...v0.3.1

v0.3.1-rc.1

26 Sep 02:03
v0.3.1-rc.1

v0.3.1-rc.1 Pre-release

Full Changelog: v0.3.0...v0.3.1-rc.1

v0.3.0

24 Sep 19:30
v0.3.0
1889019

Image pull example: docker pull ghcr.io/llm-d/llm-d-inference-scheduler:v0.3.0

What's Changed

New Contributors

Full Changelog: v0.2.1...v0.3.0

v0.3.0-rc.2

17 Sep 11:06
v0.3.0-rc.2
1889019

v0.3.0-rc.2 Pre-release

Image is available here: docker pull ghcr.io/llm-d/llm-d-inference-scheduler:v0.3.0-rc.2

v0.3.0-rc.1

05 Sep 07:49
v0.3.0-rc.1
92619ae

v0.3.0-rc.1 Pre-release

Image is available here: docker pull ghcr.io/llm-d/llm-d-inference-scheduler:v0.3.0-rc.1

What's Changed

New Contributors

Full Changelog: v0.2.0-rc.2...v0.3.0-rc.1

v0.2.1

24 Jul 05:58
v0.2.1
c97e2ea

Image is available here: docker pull ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.1

This patch release is intended to resolve a few bugs.
Justification and breakdown: kubernetes-sigs/gateway-api-inference-extension#1215

Full Changelog: v0.2.0...v0.2.1