- Update tmux version from 3.4 to 3.5a (#3000)
- Enable per-user UID/GID set for containers via user creation and update GraphQL APIs (#3352)
- Update SDK and CLI to support per-user UID/GID configuration (#3361)
- Add timeout configuration for Docker image push (#3412)
- Add configurable directory permission for vfolders to support mounting vfolders on containers with customized UID/GID (#3510)
- Add a new Pydantic-based API handler decorator for request/response validation (#3511)
- Add force delete API for VFolder that bypasses the trash bin (#3546)
- Add storage-watcher API to delete VFolders with elevated permissions (#3548)
- Add a skeleton VFolder handler interface to the manager (#3493)
- Add reject middleware for web security (#2937)
- Optimize the route selection in App Proxy using `random.choices()` based on the native C implementation in CPython (#3199)
- Fix GQL `vfolder_mounts` field resolver of `compute_session` type (#3461)
- Fix empty tag image scan error in docker registry. (#3513)
- Fixed "permission denied" error by creating the `grafana-data` directory with 757 permissions (#3570)
- Fix broken CSS by allowing `unsafe-inline` content security policy. (#3572)
- Updated route pattern to allow any path ending with "login/" for POST requests to `/pipeline/{path:.*login/$}` (#3574)
- Fix vfolder delete SDK function to call 'delete by id' API rather than 'delete by name' API (#3581)
- Check intrinsic time files exist before mount (#3583)
- Fixed to ensure unique values in the mount list of the compute session (#3593)
- Change the installer to download a single consolidated checksum file and use its per-package entries, instead of downloading a separate checksum file for each package. (#3597)
- Remove foreign key constraint from `EndpointRow.image` column. (#3599)
No significant changes.
- Implement fine-grained seccomp profile managed by Backend.AI Agent. (#3019)
- Enable image rescanning by project. (#3237)
- Support auto-scaling of model services by observing proxy and app-specific metrics as configured by autoscaling rules bound to each endpoint (#3277)
- Deprecate the JWT-based `X-BackendAI-SSO` header to reduce complexity in the authentication process for the pipeline service (#3353)
- Add Grafana and Prometheus to Docker Compose (#3458)
- Integrate Pyroscope with Backend.AI (#3459)
- Update SDK to retrieve and use IDs for VFolder API operations instead of names (#3471)
- Refactor container registries' projects traversal logic of the image rescanning. (#2979)
- Fix regression of outdated `vfolder` GQL resolver. (#3047)
- Fix image without metadata label not working (#3341)
- Enforce VFolder name length restriction through the API schema, not by the DB column constraint (#3363)
- Fix password based SSH login not working on sessions based on certain images (#3387)
- Fix purge API to allow deletion of owner-deleted VFolders by directly retrieving VFolders using the folder ID (#3388)
- Fix certain customized images not being pushed to registry properly (#3391)
- Fix formatting errors when logging exceptions raised from the current local process that did not pass our custom serialization step (#3410)
- Fix scanning and loading container images with no labels at all (`null` in the image manifests) (#3411)
- Fix missing CPU architecture name lookup in `LocalRegistry` to directly scan and load container images from the local Docker daemon in dev setups (#3420)
- Utilization idle checker computes kernel resource usages correctly (#3442)
- Filter vfolders by status before initiating a vfolder deletion task (#3446)
- Fix a mis-implementation that has prevented using UUIDs to indicate an exact vfolder when invoking the vfolder REST API (#3451)
- Correct the output logic for required states in the OpenAPI reference documentation (#3460)
- Raise exception if multiple VFolders exist in decorator (#3465)
- Explicitly deprecate the non-Relay container registry GQL. (#3231)
- Upgrade pantsbuild from 2.21 to 2.23, replacing the scie plugin with the intrinsic pex's scie build support (#3377)
- Fix broken session CLI commands due to invalid initialization of `ComputeSession`. (#3222)
- Fix a regression where modifying a model service endpoint's replica count always set it to 1 regardless of the user input (#3337)
- Fix the commit message format when assigning the PR number to an anonymous news fragment (#3309)
- Add `PREPARED` status for compute sessions and kernels to indicate completion of pre-creation tasks such as image pull (#2647)
- Add new `CREATING` session status to represent container creation phase, and redefine `PREPARING` status to specifically indicate pre-container preparation phases (#3114)
- Migrate container registry config storage from `Etcd` to `PostgreSQL` (#1917)
- Add background task that reports manager DB status. (#2566)
- Add manager DB stat API compatible with Prometheus. (#2567)
- Allow regular users to assign agent manually if `hide-agent` configuration is disabled (#2614)
- Implement ID-based client workflow to ContainerRegistry API. (#2615)
- Refactor base ContainerRegistry's `scan_tag` and implement `MEDIA_TYPE_DOCKER_MANIFEST` type handling. (#2620)
- Support GitHub Container Registry. (#2621)
- Support GitLab Container Registry. (#2622)
- Support AWS ECR Public Container Registry. (#2623)
- Support AWS ECR Private Container Registry. (#2624)
- Replace rescan command's `--local` flag with local container registry record. (#2665)
- Add public API webapp to allow external services to query insensitive metrics (#2695)
- Add `project` column to the images table and refactor `ImageRef` logic. (#2707)
- Check if agent has the required image before creating compute kernels (#2721)
- Introduce network feature (#2726)
- Support docker image manifest v2 schema1. (#2815)
- Support setting health check interval for model service. (#2825)
- Add session status checker GQL mutation. (#2836)
- Add `filter` and `order` parameters to Group GQL Relay API. (#2863)
- Add GQL `agent` type and resolver (#2873)
- Add `vast_use_auth_token` config to utilize VASTData API token optionally. (#2901)
- Use a valid value for the `id` field in the GQL schema query resolver for `ContainerRegistry`. (#2908)
- Add GQL Relay domain query schema and resolver (#2934)
- Add `namespace`, `base_image_name`, `tags` and `version` fields to GQL image schema (#2939)
- Allow container user to join extra Linux groups. (#2944)
- Add filtering and ordering by `open_to_public` field in endpoint queries (#2954)
- Hide FastTrack (`pipeline`) menu by default on installation by `install-dev.sh` script. (#3010)
- Support batch session timeout. (#3066)
- Add a `show_non_installed_images` option to show all images regardless of installation on the environment select section in the session/service launcher page. (#3124)
- Allow destroying sessions in `PULLING` status for all users (#3128)
- Show live stats from inference framework when supported (#3133)
- Allow specifying a full shell script string in `start_command` of `model-definition.yaml` while preserving shell variable expansions to allow access to environment variables in service definitions (#3248)
- Rename `endpoint.desired_session_count` to `endpoint.replicas` (#3257)
- Add several commonly used GPU configuration environment variables defined in containers by default: `GPU_TYPE`, `GPU_COUNT`, `GPU_CONFIG`, `GPU_MODEL_NAME` and `TF_GPU_MEMORY_ALLOC` (#3275)
- Populate `BACKEND_MODEL_NAME` environment variable automatically on inference session (#3281)
- Fix container cleanup process failing with error `AttributeError: 'DockerKernel' object has no attribute 'network_driver'` (#3286)
- Convert VFolder deletion from blocking response to event-driven pattern (#3063)
- Explicitly wait for readiness of the Docker daemon and the compose stack before pouring database fixtures in `install-dev.sh`, for installations during the provisioning stage of Codespaces and for integration tests in CI. (#2378)
- Fix silent failure of `DockerAgent.push_image()` and `DockerAgent.pull_image()`. (#2572)
- Fix missing notification of cancellation or failure of background tasks when shutting down the server (#2579)
- Add missing implementation of wsproxy and manager CLI's log-level customization options (#2698)
- Add missing batch execution call after session starts (#2884)
- Fix a regression of the unicode-aware slug update that prevented creation of dot-prefixed (automount) vfolders (#2892)
- Fix invalid image format log spam in Agent (#2894)
- Fix wrong creation of `raw_configs` in `_create_kernels_in_one_agent` (#2896)
- Disallow `None` id encoding in `AsyncNode.to_global_id()`. (#2898)
- Assign valid value to `id` field in `ContainerRegistryNode` GQL schema query resolver. (#2899)
- Update VAST quota rather than raising an error when the quota already exists. (#2900)
- Calculate correct expiration time of VAST auth token and add `vast_force_login` config to enable login before every REST API call (#2911)
- Update Dell EMC OneFS storage backend to correctly initialize the volume object and fix wrong HTTP request arguments (#2918)
- Fix `modify_endpoint()` mutation to handle empty `JSONString` properly for environment variables (#2922)
- Fix `order` GQL query argument parser of `group_nodes` (#2927)
- Set the `postgres_readonly` flag to `false` when beginning generic sessions (#2946)
- Fix wrong container registry migration script. (#2949)
- Let GPFS client keep polling when GPFS job is running (#2961)
- Handle `IndexError` when parsing a string into `BinarySize` (#2962)
- Handle errors when converting `shmem` string value into `BinarySize` (#2972)
- Make image, container_registry table's `project` column nullable and improve container registry storage config migration script. (#2978)
- Fix a wrong parameter when calling `recalc_agent_resource_occupancy()` (#2982)
- Allow the `modify_compute_session` mutation to work without the `priority` field in its input argument and let the mutation validate the `name` value (#2985)
- Fix wrong password limit in container registry migration script. (#2986)
- Fix `architecture` condition not applied when querying `images` rows (#2989)
- Deprecate `project_id` GQL argument and add nullable `scope_id` GQL argument (#2991)
- Strengthen join condition between kernels and images to prevent incorrect matches (#2993)
- Enable session commit to a different registry and project. (#2997)
- Wrong field reference in `ImageNode` resolver (#3002)
- Fix obsolete logic of `untag()` of `HarborRegistry_v2`. (#3004)
- Fix `Agent.compute_containers` GraphQL field by adding missing resolver (#3011)
- Fix `Agent` GQL regression error. (#3013)
- Fix `backend.ai apps` command's faulty argument handling logic. (#3015)
- Check whether a VAST Data quota with a given name exists before creating a quota and change the default value of `force_login` config to true (#3023)
- Fix model service traffic not being distributed equally to all sessions when there are 10 or more replicas (#3027)
- Fix the TUI installer to make the install path always visible (#3029)
- Prevent redis password from being logged. (#3031)
- Fix `get_logs_from_agent()` to raise `InstanceNotFound` exception for kernels not assigned to agents (#3032)
- Fix regression of `ComputeContainer` GraphQL queries due to newly introduced relationship fields (#3042)
- Fix regression of the `AgentSummary` resolver caused by an incorrect `batch_load_func` assignment. (#3045)
- Fix regression of `LegacyComputeSession` GraphQL queries. (#3046)
- Include missing legacy logging module in the pex. (#3054)
- Change the name of deleted vfolders with a timestamp suffix when sending them to DELETE_ONGOING status to allow reuse of the vfolder name, for cases when actual deletion takes a long time (#3061)
- Fix model service not routing traffic based on traffic ratio (#3075)
- Fix the broken `ComputeContainer.batch_load_detail` due to the misuse of `selectinload` as follow-up to #3042 (#3078)
- Fix session `status_info` not being updated correctly when batch executions fail, ensuring failed batch execution states are properly reflected in the sessions table (#3085)
- Fix agent not loading `krunner-extractor` image when Docker instance does not support loading XZ compressed images (#3101)
- Fix outdated image string join logic in `ImageRow.image_ref`. (#3125)
- Allow admins to delete other users' vfolders by enabling vfolder fetching for precondition checks (#3137)
- Fix Libc version not detected on unlabeled images when image has custom entrypoint set (#3173)
- Fix service not started when `[logging].rotation-size` config is set (#3174)
- Allow purging vfolders by enabling name-based queries of deleted VFolders (#3176)
- Fix the issue where the value of occupying slots abnormally multiplies when creating a compute session (#3186)
- Add missing `extra` field to `ContainerRegistryNode` GQL query and mutations. (#3208)
- Fix purge functionality that deletes VFolder records by allowing admins to query other users' VFolders (#3223)
- Fix CLI test failures caused by `yarl.URL._val` type change. (#3235)
- Prevent vfolder `request-download` API from accessing host filesystem. (#3241)
- Fix `1d42c726d8a3` revision execution failing (#3254)
- Always format BinarySize values containing subtle fractions as floating point numbers instead of scientific notation (#3272)
- Fix invalid API version checks in the session creation API of Manager. (#3291)
- Update the package installation documentation to include instructions on adding the manager's RPC key pair. (#2052)
- Upgrade the base CPython version from 3.12.6 to 3.12.8 (#3302)
- Add support for optional payload encryption in the client SDK and CLI as a follow-up to #484 (#493)
- Allow unicode characters in project(user group) name and domain name. (#1663)
- Improve exception logging stability by pre-formatting exception objects instead of pickling/unpickling them (#1759)
- Add new API to create new image from live session (#1973)
- Clear `error_logs` records in the `clear-history` command (#1989)
- Introduce `mgr schema dump-history` and `mgr schema apply-missing-revisions` commands to ease the major upgrade involving deviation of database migration histories (#2002)
- Update `image forget` CLI command to untag image from registry before forgetting it from the database (#2010)
- Update `etcd-client-py` to 0.3.0 (#2014)
- Allow self-ssh in single-node single-container compute sessions. (#2032)
- Prevent deleting mounted folders. (#2036)
- Allow agent to report its internal registry snapshot via UNIX domain socket server (#2038)
- New redis client (experimental) (#2041)
- Expose user info to environment variables (#2043)
- Introduce the `rolling_count` GraphQL field to provide the current rate limit counter for a keypair within the designated time window slice (#2050)
- Deprecate the reliance on HTTP cookies for authenticating the pipeline service, switching to the use of HTTP headers instead (#2051)
- Allow user to explicitly set filename of model definition YAML (#2063)
- Add the `backend.ai plugin scan` command to inspect the plugin scan results from various entrypoint sources (#2070)
- Bring back etcetra-backed Etcd as an option for distributed lock backend (#2079)
- Enable distribute-lock configuration (#2080)
- Cache volume objects in `RootContext.get_volume` (#2081)
- Revamp images GQL query by changing image filtering from flag-based to feature set-based and add `aliases` field to customized image GQL schema (#2136)
- Added missing fields for `keypair_resource_policy` in client-py, models, etc. (#2146)
- Add parameters to `check-presets` SDK function (#2153)
- Add relay-aware `VirtualFolderNode` GQL Query (#2165)
- Also perform basic model service validation process when updating model service via `ModifyEndpoint` (#2167)
- Add support for mounting arbitrary VFolders on model service session (#2168)
- Add support for CentOS 8 based kernels (#2220)
- Clear zombie routes automatically (#2229)
- Add `scaling_group.agent_count_by_status` and `scaling_group.agent_total_resource_slots_by_status` GQL fields to query the count and the resource allocation of agents that belong to a scaling group. (#2254)
- Allow modifying model service session's environment variable setup (#2255)
- Add `endpoint.runtime_variant` column (#2256)
- Add new API to show list of supported inference runtimes (#2258)
- Add support for model service provisioning without `model-definition.yaml` (#2260)
- Allow superadmins to force-update session status through destroy API. (#2275)
- Add session status check & update API. (#2312)
- Add support for fetching container logs of a specific kernel. (#2364)
- Introduce Python native WSProxy (#2372)
- Implement scanning plugin entrypoints of external packages (#2377)
- Add `row_id`, `type` and `container_registry` fields to the `GroupNode` GQL schema. (#2409)
- Add support for PureStorage RapidFiles Toolkit v2 (#2419)
- Add API that extends lifespan of webserver's login session. (#2456)
- Allow bulk association and disassociation of scaling groups with domains, user groups, and key pairs. (#2473)
- Match container's timezone to container host OS when available (#2503)
- Add a pre-setup configuration menu to the TUI installer to allow setting the public-facing address of Backend.AI components (#2541)
- Now Backend.AI can run arbitrary container images without Backend.AI-specific metadata labels by introducing good default values and replacing intrinsic kernel-runner binaries with statically built ones (#2582)
- Allow `Bearer` as valid token type on model service authentication (#2583)
- Introduce automatic creation of a 'model-store' group upon inserting a new domain. (#2611)
- Add support for declaring custom description field for GraphQL `relay` edge types. (#2643)
- Add an `enable_LLM_playground` option to show/hide the LLM playground tab on the serving page. (#2677)
- Add `max_gaudi2_devices_per_container` config on webserver (#2685)
- Add `max_atom_plus_device_per_container` config on webserver (#2686)
- Introduce Account-manager component. (#2688)
- Add query depth limit config of GQL, add page size limit config of GQL Connection, and set default page size of GQL Connection to 10. (#2709)
- Add compute session GQL Relay query schema. (#2711)
- Allow `DataLoaderManager` to get a loader function by the function itself rather than the function name. (#2717)
- Allow filter and order in endpoint list GQL request. (#2723)
- Add new vfolder API to update sharing status. (#2740)
- Avoid raising a type error even if a particular table in the toml file is empty, as long as the default value for all settings exists. (#2782)
- Add an explicit configuration `scaling-group-type` to `agent.toml` so that the agent can distinguish whether it belongs to an SFTP resource group or not (#2796)
- Add per-session priority attributes and `ModifyComputeSession` GraphQL mutation to update session names and priorities (#2840)
- Add dependee/dependent/graph ComputeSessionNode connection queries (#2844)
- Implement the priority-aware scheduler that applies to any arbitrary scheduler plugin (#2848)
- Add support for setting a timeout when pulling Docker images and upgrade aiodocker to version 0.23.0. (#2852)
- Enable robust DB connection handling by allowing `pool-pre-ping` setting. (#1991)
- Enhance update mechanism of session & kernel status. (#2311)
- Remove database-level foreign key constraints in `vfolders.{user,group}` columns to decouple the timing of vfolder deletion and user/group deletion. (#2404)
- Implement storage-host RBAC interface. (#2505)
- Optimize the query latency when fetching a large number of agents with stat metrics from Redis (#2558)
- Split out `ai.backend.logging` package from the `ai.backend.common` to improve reusability and reduce the startup time (i.e., import latencies) (#2760)
- Avoid using `collections.OrderedDict` when not necessary in the manager API and client SDK (#2842)
- Remove no longer used `env-tester-{admin,user,user2}.sh` scripts and all references (#1956)
- Merge `kernels.role` into `sessions.session_type` and check the image compatibility based on comparison with the `ai.backend.role` label (#1587)
- Refactor `PendingSession` Scheduler into `PendingSession` scheduler and `AgentSelector`, and replace `roundrobin` flag with `AgentSelectionStrategy.RoundRobin` policy. (#1655)
- Do not skip updating the session's occupied resources in the DB when a kernel starts. (#1832)
- Fix DDN command output handling when exceeding quotas. (#1901)
- Explicitly specify the storage-side UID/GID when creating qtrees in the NetApp storage backend (#1983)
- Sync mismatch between `kernels.session_name` and `sessions.name` and fix session-rename API to update `session_name` of sibling kernels atomically. (#1985)
- Change function default arguments from mutable object to `None`. (#1986)
- Revert some VFolder APIs response type to remove mismatch between `Content-Type` header and body. (#1988)
- Upgrade pants to 2.21.0.dev4 for Python 3.12 support in their embedded pex/pip versions (#1998)
- Fix Graylog log adapter not working after upgrading to Python 3.12 (#1999)
- Fix `compute_container` GraphQL query resolver functions. (#2012)
- Fix Harbor v2 image scanner skipping the rest of the artifacts when any of the items does not include a tag (#2015)
- Let external log viewers display more accurate, meaningful stack frames of logger invocations. (#2019)
- Fix handling of undefined values in the ModifyImage GraphQL mutation. (#2028)
- Fix container commit not working on certain docker engine versions (#2040)
- Add the omitted client-to-manager request for deleting vfolders in the trash bin. (#2042)
- Fix a buggy restriction on VFolder deletion due to wrong query condition (#2055)
- Fix wrong usage of dataloader in GQL group resolver. (#2056)
- Ensure that vfolders, including automount vfolders, are mounted during session creation only if their status is not set to "DEAD" (i.e., deleted). (#2059)
- Fix wrong calculation of resource usage (#2062)
- Fix VFolder file operation not working when user has been granted access to a shared but deleted VFolder that has the same name as a normal one (#2072)
- Add missing type argument in group query (#2073)
- Let the `backend.ai mgr clear-history` command clear session records as well as kernel records (#2077)
- Fix `compute_session_list` GQL query not responding when there is a large number of sessions (#2084)
- Fix VFolder invitation not being accepted when the invited VFolder shares its name with an already deleted one (#2093)
- Fix orphan model service routes being created (#2096)
- Fix initialization of the resource usage API's kernel-level usage aggregation (#2102)
- Fix model server starting on every kernel (including sub role kernels) on multi-container inference session (#2124)
- Add missing `commit_session_to_file` to `OP_EXC` (#2127)
- Fix wrong SQL query build for GQL Relay node (#2128)
- Pass ImageRef.canonical in `commit_session_to_file` (#2134)
- Handle fileset-already-exists response of `create-filset` API request and make sure to wait between all GPFS job polling iterations (#2144)
- Skip any possible redundant quota update requests when creating new quota (#2145)
- Fix error when calling `check_presets` Client SDK API with an invalid `group` parameter, and rewrite Client SDK to access all APIConfig fields (#2152)
- Ensure that all pending sessions are picked by schedulers (#2155)
- Fix user creation error when any model-store does not exist. (#2160)
- Fix buggy resolver of `model_card` GQL Query. (#2161)
- Fix security vulnerability for `sudo_session_enabled` (#2162)
- Rename `endpoints.model_mount_destiation` to `model_mount_destination` (#2163)
- Wait for real quota scope directory creation after NetApp `create_qtree()` call (#2170)
- Fix wrong per-user concurrency calculation logic (#2175)
- Keep `sync_container_lifecycles()` bgtask alive in a loop. (#2178)
- Fix missing check for group (project) vfolder count limit and error handling with an invalid `group` parameter (#2190)
- Fix model service persisting in `degraded` status forever in rare cases when trying to delete the service (#2191)
- Fix error when querying or mutating GraphQL using `BigInt` field type (#2203)
- Ensure that utilization idleness is checked after a set period. (#2205)
- Fix `backend.ai ssh` command execution when packaged as SCIE/PEX (#2226)
- Fix `endpoints` query not working when trying to load `image_row.aliases`, and fix `endpoints.status` reporting `PROVISIONING` when its status is in `DESTROYING` state (#2233)
- Fix GQL raising error when trying to resolve `endpoints.errors` field occasionally (#2236)
- Fix `ZeroDivisionError` in volume usage calculation by returning 0% when volume capacity is zero (#2245)
- Fix GraphQL to support query to non-installed images (#2250)
- Add missing `push_image` method implementation to Dummy Agent (#2253)
- Rename no-op `access_key` parameter of `endpoint_list` GQL Query to `user_uuid` (#2287)
- Fix `ai.backend.service-ports` label syntax broken when image does not expose built-in service port (#2288)
- Improve stability of `untag_image_from_registry` mutation (#2289)
- SSH not working between kernels started with customized image (#2290)
- Invalid container memory capacity reported (#2291)
- Corrected an issue where the `resource_policy` field in the user model was incorrectly mapped to `domain_name`. (#2314)
- Skip cleaning up containerless kernels that are still creating their containers. (#2317)
- Fix model service sessions created before 24.03.5 failing to spawn (#2318)
- Image commit not working (#2319)
- Fix model service session scheduler (`scale_services()`) failing when sessions bound to an active route are already marked as terminated (#2320)
- Fix container metric collection halted on systems with Cgroups v1 (#2321)
- Run batch execution after the batch session starts. (#2327)
- Add support for configuring `sync_container_lifecycles()` task. (#2338)
- Fix mismatches between responses of `/services/_runtimes` and new model service creation input (#2371)
- Fix incorrect check of values returned from docker stat API. (#2389)
- Shut down the agent properly by removing code that waits for a cancelled task. (#2392)
- Restrict GraphQL query to `user_nodes` field to require `superadmin` privilege (#2401)
- Handle all possible exceptions when scheduling single node session so that the status information of pending session is not empty. (#2411)
- Utilize `ExtendedJSONEncoder` for error logging to handle `UUID` objects in `extra_data` (#2415)
- Change outdated references in event module from `kernels` to `sessions`. (#2421)
- Upgrade `inquirer` to remove dependency on deprecated `distutils`, which breaks execution of the scie builds (#2424)
- Allow querying vfolders with specific statuses for purging. (#2429)
- Update the install-dev scripts to use `pnpm` instead of `npm` to speed up installation and resolve some peculiar version resolution issues related to esbuild. (#2436)
- Fix a packaging issue in the `backendai-webserver` scie executable due to missing explicit requirement of setuptools (#2454)
- Improve pruning of non-physical filesystems when measuring disk usage in agents (#2460)
- Update the install-dev scripts to install `pnpm` if pnpm isn't installed. (#2472)
- Improve error handling of initialization failures in the kernel runner (#2478)
- Fix `BACKEND_MODEL_NAME` environment always overwritten to model name specified at model definition (#2481)
- Do not allow assigning preopen port which collides with image's own service port definition (#2482)
- Fix GET requests with queryparams defined in API spec occasionally throwing 400 Bad Request error (#2483)
- Check null value of user mutation by `Undefined` sentinel value rather than `None`. (#2506)
- Do null check on `groups.total_resource_slots` and `domains.total_resource_slots` value. (#2509)
- Fix heartbeat processing failing when agent reports image with its name not compliant with Backend.AI's naming rule (#2516)
- Corrected a typo (`maanger` corrected to `manager`) in the `check_status()` API response of the storage component (#2523)
- Rename `images.image_filters` GQL Query argument to `images.image_types` (#2555)
- Prevent session status from transitioning to `PULLING` status when image pull is not required (#2556)
- Prevent other user's customized image from being listed as a response of `images` GQL query (#2557)
- Skip resolving malformed `ModelCard` GQL item (#2570)
- Delete sessions DB records when purging project. (#2573)
- Initialize Redis connection pool objects with specified connection opts rather than ignoring them. (#2574)
- Fix `GET /func/folders/{folderName}` API returning string literal `"null"` instead of null value on `user` and `group` fields (#2584)
- Update `GQLPrivilegeCheckMiddleware` to align with upstream changes on `graphql-core` package (#2598)
- Robust type check when idle checker fetches utilization data. (#2601)
- Skip mounting zero-byte lxcfs files when lxcfs is activated to prevent crashes in session containers (#2604)
- Fix typo in minilang query field spec and column map. (#2605)
- Remove duplicate CPU quota arguments when creating containers (#2608)
- Increase `MAX_CMD_LEN` of dropbear to improve compatibility with PyCharm debugger (#2613)
- Silence false Redis timeout warnings when retrying blocking commands if the timeout does not exceed the expected command timeout (#2632)
- Fix a regression of #2483 in the session-download API used by the `backend.ai ssh` command (#2635)
- Implement missing `StrEnumType` handling in `populate_fixture()`. (#2648)
- Let `GET /resource/usage/period` request contain data in query parameter rather than JSON body. (#2661)
- Allow sudo-enabled container users to overwrite `/usr/bin/scp` and `/usr/libexec/sftp-server` by unifying the intrinsic ssh binaries to use the merged `dropbearmulti` executable. (#2667)
- Update `webserver` logout API to respond with HTTP 200 OK (#2681)
- Fix WSProxy not properly handling WebSocket request sent from Firefox (#2684)
- Scan parent directory of created qtree to avoid creating quota on non-existing directory. (#2696)
- Fix `list_files`, `get_fstab_contents`, `get_performance_metric` and `shared_vfolder_info` Python SDK functions not working with `ValidationError` exception printed (#2706)
- Resolve the issue where the vfolder id does not match in `list_shared_vfolders`. (#2731)
- Handle OS Error when deleting vfolders. (#2741)
- Fix typo in Virtual-folder status update code. (#2742)
- Correct `msgpack` deserialization of `ResourceSlot`. (#2754)
- Fix regression error of `session create_from_template` command. (#2761)
- Silence `model_` namespace warnings with pydantic-based model classes (#2765)
- Change the initialization order of PackageContext to apply `target_path` correctly in the TUI installer (#2768)
- Make the regex patterns that update configuration files work correctly with multiline texts in the TUI installer (#2771)
- Omit null parameter when calling `usage-per-period` API. (#2777)
- Delete vfolder invitation and permission rows when deleting vfolders. (#2780)
- Handle container port mismatch when creating kernel. (#2786)
- Explicitly set the protected service ports depending on the resource group type and the service types (#2797)
- Correct session status determiner function. (#2803)
- Fix `endpoint_list.total_count` GQL field returning incorrect value (#2805)
- Fix `Service.create()` SDK method and `service create` CLI command not working with `UnboundLocalError` exception (#2806)
- Refresh expiration time of login session when logging in. (#2816)
- Fix `kernel_id` assignment for main kernel log retrieval (#2820)
- Use a safer TLS version (v1.2) when creating SSL sockets in the logstash handler (#2827)
- Wrong count of concurrent compute sessions. (#2829)
- Create kernels with correct `scaling_group` value. (#2837)
- Fix a regression in progress bar rendering of the TUI installer after upgrading the Textual library (#2867)
- Add note about installing client library with same version as server (#1976)
- Remove deprecated `version` from the docker compose YAML templates in package installation docs. (#2035)
- Fix a typo in the `agent.toml` example of the package-based installation guide that had a duplicate double quote (#2069)
- Upgrade the base runtime (CPython) version from 3.11.6 to 3.12.2 (#1994)
- Upgrade aiodocker to v0.22.0 with minor bug fixes found by improved type annotations (#2339)
- Update the halfstack containers to point the latest stable versions (#2367)
- Upgrade aiodocker to 0.22.1 to fix error handling when trying to extract the log of non-existing containers (#2402)
- Upgrade the base CPython from 3.12.2 to 3.12.4 (#2449)
- Upgrade Python (3.12.4 -> 3.12.6) and common/tool dependencies to prepare for Python 3.13 and apply latest fixes (#2851)
- Wrap RPC authentication error to custom error for better logging. (#1970)
- Add `requested_slots` field to compute session GQL type. (#1984)
- Allow `pydantic.BaseModel` as the API handler return schema. (#1987)
- Fix incorrect version notation of GQL Field. (#1993)
- Add max_pending_session_count field to Keypair resource policy GQL schema (#2013)
- Handle container creation exception and start exception in separate try-except contexts. (#2316)
- Fix the broken workflow call for the action that auto-assigns PR numbers to news fragments (#2358)
- Finally stabilize the hanging tests in our CI due to docker-internal races on TCP port mappings to concurrently spawned fixture containers by introducing monotonically increasing TCP port numbers (#2379)
- Further improve the monotonic port allocation logic for the test containers to remove maximum concurrency restrictions (#2396)
- Add PEX, SCIE binary build configs for the plugin subsystem. (#2422)
- Add POST `/folders` API endpoints to replace DELETE APIs that require a request body, and allow `DELETE` requests to have body data. (#2571)
- Enhance type hints for potential `None` arguments (#2580)
- Add `ai.backend.manager.models.graphql` module for better code base management. (#2669)
- Remove Scheduler related types that are no longer used. (#2705)
- Allow adding required GQL field argument to schema. (#2712)
- Upgrade `readthedocs` build environment to Python 3.12 (#2814)

## 24.03.0rc1 (2024-03-31)

- Allow filtering `compute_session` query by `user_id`. (#1805)
- Allow overriding vfolder mount permissions in API calls and CLI commands to create new sessions, with addition of a generic parser of comma-separated "key=value" list for CLI args and API params (#1838)
- Always enable `ai.backend.accelerator.cuda_open` in the scie-based installer (#1966)
- Use `config["pipeline"]["endpoint"]` as default value of `config["pipeline"]["frontend-endpoint"]` when not provided (#1972)
- Set single agent per kernel resource usage. (#1725)
- Abort container creation when duplicate container port definition exists (#1750)
- To update image metadata, check if the min/max values in `resource_limits` are undefined. (#1941)
- Explicitly disable the user-site package detection in the krunner python commands to avoid potential conflicts with user-installed packages in `.local` directories (#1962)
- Fix `caf54fcc17ab` migration to drop a primary key only if it exists and in `589c764a18f1` migration, add missing table arguments. (#1963)
- Update docstrings in `ai.backend.client.request.Request:fetch()` and `ai.backend.client.request.FetchContextManager` as the support for synchronous context manager has been deprecated. (#1801)
- Resize font-size of footer text in ethical ads in documentation hosted by read-the-docs (#1965)
- Only resize font-size of footer text in ethical ads not in title of content in documentation (#1967)
- Revert response type of service create API. (#1979)