Releases · FilippoAiraldi/mpc-reinforcement-learning
v1.3.1
Changes
- implemented `mpcrl.util.geometry.ConvexPolytopeUniformSampler`, which allows for uniformly sampling points from the interior or surface of n-dimensional convex polytopes (a conceptual sketch follows this list)
- improvements to `mpcrl.wrappers.env.MonitorEpisodes`
- modified `mpcrl.util.control.cbf` and the other Control Barrier Function methods so that they do not need to return a `casadi.Function`
- improvements to docs
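
Conceptually, uniform sampling over a convex polytope can be achieved by triangulating it into simplices and sampling each simplex with probability proportional to its volume. The snippet below is a minimal NumPy/SciPy sketch of that idea; it does not reproduce the `ConvexPolytopeUniformSampler` API.

```python
import numpy as np
from math import factorial
from scipy.spatial import Delaunay

def sample_polytope_interior(vertices, n, seed=None):
    """Uniformly samples `n` points from the convex hull of `vertices` (shape (m, d))."""
    rng = np.random.default_rng(seed)
    d = vertices.shape[1]
    tri = Delaunay(vertices)             # triangulate the polytope into simplices
    simplices = vertices[tri.simplices]  # (n_simplices, d + 1, d)
    edges = simplices[:, 1:, :] - simplices[:, :1, :]
    volumes = np.abs(np.linalg.det(edges)) / factorial(d)
    # pick a simplex proportionally to its volume, then sample it uniformly
    # via Dirichlet(1, ..., 1) barycentric weights
    idx = rng.choice(volumes.size, size=n, p=volumes / volumes.sum())
    weights = rng.dirichlet(np.ones(d + 1), size=n)           # (n, d + 1)
    return np.einsum("nij,ni->nj", simplices[idx], weights)   # (n, d)

# e.g., 1000 uniform samples from the unit square
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
points = sample_polytope_interior(square, 1000, seed=42)
```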
v1.3.0
Changes
Major
- heavily improved documentation, though some portions are still work-in-progress
- updated to `csnlp==1.6.1`
- improved `WarmStartStrategy` to provide initial conditions for non-warmstarted variables
- implemented continuous-time, discrete-time and input-constrained Control Barrier Functions in `mpcrl.util.control.cbf`, `dcbf`, and `iccbf`, respectively (a conceptual sketch of the continuous-time condition follows this list)
- implemented `mpcrl.util.geometry.ConvexPolytopeUniformSampler`
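
For reference, a continuous-time Control Barrier Function `h(x)` for dynamics `xdot = f(x, u)` enforces `dh/dx * f(x, u) + alpha(h(x)) >= 0` for a class-K function `alpha`. The snippet below builds this condition symbolically with CasADi as an illustration only; the variable names and the choice `alpha(h) = gamma * h` are assumptions, not the `mpcrl.util.control.cbf` signature.

```python
import casadi as cs

# simple 2D single-integrator dynamics: xdot = f(x, u) = u
x = cs.SX.sym("x", 2)
u = cs.SX.sym("u", 2)
f = u

# barrier function keeping the state inside the unit disc: h(x) >= 0
h = 1 - cs.sumsqr(x)

# continuous-time CBF condition: dh/dx * f(x, u) + gamma * h(x) >= 0
gamma = 0.5
cbf_condition = cs.jacobian(h, x) @ f + gamma * h

# the symbolic expression can be imposed as a constraint in an MPC problem; a
# casadi.Function is handy for quick numerical checks at sample state-action pairs
cbf_fn = cs.Function("cbf", [x, u], [cbf_condition])
print(cbf_fn([0.5, 0.0], [1.0, 0.0]))
```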
Minor
- added the `is_training` property to agents
- improvements to the `mpcrl.util.control.dlqr` and `lqr` methods (a minimal sketch of the discrete-time case follows this list)
- adjusted dependencies
- fixed tests, warnings, and deprecation messages
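
As a refresher, the infinite-horizon discrete-time LQR gain follows from the discrete algebraic Riccati equation. The snippet below is a small SciPy-based sketch of that computation, not the `mpcrl.util.control.dlqr` implementation.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def dlqr_sketch(A, B, Q, R):
    """Infinite-horizon discrete-time LQR: returns the gain K (u = -K x) and the
    solution P of the discrete algebraic Riccati equation."""
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
    return K, P

# double-integrator example with sampling time 0.1
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
K, P = dlqr_sketch(A, B, Q=np.eye(2), R=np.array([[0.1]]))
```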
v1.2.1
Changes
- Updated dependency to `csnlp >= 1.6.0`
- In case of additive exploration, implemented clipping of the action based on `env.action_space` (if this is a `gymnasium.spaces.Box` instance)
- Now passing an `exploration` in `LstdDpgAgent` is mandatory (otherwise, mathematically, the agent won't learn because the advantage function is always zero)
- `LstdDpgAgent` now supports `hessian_type = natural`
- Implemented `wrappers.agents.Evaluate`, which allows periodically evaluating the performance of the training agent
- Implemented a new exploration class, `OrnsteinUhlenbeckExploration` (a conceptual sketch of such noise with action-space clipping follows this list)
- Implemented `bound_consistency` for `GradientBasedOptimizer` instances (when `True`, ensures that the values of the parameters are clipped within their bounds)
- Minor computation simplifications in `LstdQLearningAgent`
- Fixed some deprecation warnings from `numpy` and `gymnasium`
- Fixed tests and docstrings
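
Ornstein-Uhlenbeck noise yields temporally correlated exploration perturbations. The snippet below is a minimal NumPy sketch of such a process, with the perturbed action clipped to a `gymnasium.spaces.Box`; the class and its parameters are illustrative and do not mirror the `OrnsteinUhlenbeckExploration` API.

```python
import numpy as np
from gymnasium.spaces import Box

class OUNoiseSketch:
    """Mean-reverting Ornstein-Uhlenbeck process: dx = theta * (mu - x) * dt + sigma * dW."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.state = np.full(size, mu, dtype=float)
        self.rng = np.random.default_rng(seed)

    def sample(self):
        drift = self.theta * (self.mu - self.state) * self.dt
        diffusion = self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.state.shape)
        self.state = self.state + drift + diffusion
        return self.state

# additive exploration, clipped to the action space bounds
action_space = Box(low=-1.0, high=1.0, shape=(2,))
noise = OUNoiseSketch(size=2, seed=0)
nominal_action = np.zeros(2)
explored_action = np.clip(nominal_action + noise.sample(), action_space.low, action_space.high)
```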
v1.2.0.post1
Changes
Major
- implemented a base gradient-free agent, `GlobOptLearningAgent`, and a corresponding Bayesian Optimization example based on BoTorch
- implemented off-policy Q-learning (see the `train_offpolicy` method; a generic sketch of the idea follows this list)
- implemented `WarmStartStrategy` (allows for finer control over how the multistart MPC is fed random initial points)
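
Off-policy Q-learning updates the value-function parameters from transitions collected under a different behavioural policy. The snippet below is a generic sketch of such a temporal-difference update on a linear Q-function over a fixed dataset; it is only meant to illustrate the concept and is unrelated to the MPC-based parametrization used by `train_offpolicy`.

```python
import numpy as np

def offpolicy_q_update_sketch(theta, transitions, features, gamma=0.99, lr=1e-2):
    """One pass of TD(0) Q-learning over pre-collected (s, a, r, s') transitions,
    where Q(s, a) = features(s, a) @ theta is a linear action-value approximation."""
    actions = (0, 1)  # assume a small discrete action set for the greedy max
    for s, a, r, s_next in transitions:
        q_sa = features(s, a) @ theta
        q_next = max(features(s_next, a_) @ theta for a_ in actions)
        td_error = r + gamma * q_next - q_sa          # Bellman residual
        theta = theta + lr * td_error * features(s, a)  # semi-gradient step
    return theta

# toy usage: 1D state, 2 actions, random behavioural data
rng = np.random.default_rng(0)
feats = lambda s, a: np.array([1.0, s, s * a, a])
data = [(rng.normal(), rng.integers(2), rng.normal(), rng.normal()) for _ in range(100)]
theta = offpolicy_q_update_sketch(np.zeros(4), data, feats)
```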
Minor
- reworked the internal structure of optimizers (introduced `BaseOptimizer`)
- reworked the internal structure of agents
- better sensitivity computations in Q-learning
- updated tests and docstrings
v1.1.9
Changes
Major
- improvements to `Agent`'s hooking mechanism: it now uses a `dict` to keep track of callbacks instead of nested function wrappers (a minimal sketch of the pattern follows this list)
- improved seeding
- allowed `csnlp.ScenarioBasedMpc` to be used as the MPC by `Agent`
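
Storing callbacks in a dict keyed by hook name (rather than wrapping methods in nested closures) makes it easy to inspect what runs at each event. The snippet below is a minimal, library-agnostic sketch of the pattern; the names are illustrative.

```python
from collections import defaultdict

class HookedAgentSketch:
    """Keeps callbacks in a plain dict, so registered hooks are easy to inspect."""

    def __init__(self):
        self.hooks = defaultdict(list)  # hook name -> list of callbacks

    def register_hook(self, name, callback):
        self.hooks[name].append(callback)

    def _run_hooks(self, name, *args):
        for callback in self.hooks[name]:
            callback(*args)

    def on_episode_start(self, episode):
        self._run_hooks("on_episode_start", episode)

agent = HookedAgentSketch()
agent.register_hook("on_episode_start", lambda ep: print(f"starting episode {ep}"))
agent.on_episode_start(0)
print(dict(agent.hooks))  # the registered callbacks are directly visible for debugging
```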
Minor
- improvements to internal files
- switched to prehooks
- improved readability of the full hessian calculation in `LstdQLearningAgent`
- improvements to imports and docstrings
- updated tests
v1.1.8
Changes
Major
- upgraded dependency to `csnlp==1.5.8`
- reworked inner computations in both `LstdQLearningAgent` and `LstdDpgAgent` for performance and adherence to theory
- reworked the inner workings of callbacks: they are now stored in an internal dict, which makes them easier to debug
- fixed a disruptive bug in the computation of the parameters' bounds for a constrained update
- implemented the `mpcrl.optim` sub-module, containing different optimizers such as
  - Stochastic Gradient Descent
  - Newton's Method
  - Adam
  - RMSprop
- moved the parameters' constrained-update solver to OSQP (QRQP was having scaling issues)
- removed the `LearningRate` class
- implemented `schedulers.Chain`, which allows chaining multiple schedulers into a single one (a conceptual sketch follows this list)
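
Chaining schedulers means running one schedule for a given number of steps and then handing over to the next. The snippet below is a conceptual sketch of that idea; the class and method names are illustrative, not the `schedulers.Chain` API.

```python
class ConstantSketch:
    def __init__(self, value):
        self.value = value

    def step(self):
        pass  # constant schedule: nothing to update

class ExponentialDecaySketch:
    def __init__(self, value, factor):
        self.value, self.factor = value, factor

    def step(self):
        self.value *= self.factor

class ChainSketch:
    """Runs each (scheduler, n_steps) pair in sequence, exposing a single `value`."""

    def __init__(self, schedulers_and_lengths):
        self.segments = list(schedulers_and_lengths)
        self.index = self.steps_in_segment = 0

    @property
    def value(self):
        return self.segments[self.index][0].value

    def step(self):
        scheduler, length = self.segments[self.index]
        scheduler.step()
        self.steps_in_segment += 1
        if self.steps_in_segment >= length and self.index < len(self.segments) - 1:
            self.index += 1            # hand over to the next scheduler
            self.steps_in_segment = 0

# keep the learning rate at 0.1 for 50 updates, then decay it exponentially
chain = ChainSketch([(ConstantSketch(0.1), 50), (ExponentialDecaySketch(0.1, 0.99), 1000)])
for _ in range(100):
    lr = chain.value
    chain.step()
```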
Minor
- added the possibility of passing an integer argument as `experience`, which creates a buffer with the specified size
- improvements to `mpcrl.util.math`
- improvements to `wrappers.agents.Log` (now uses lazy logging)
- fixed bugs in the `on_episode_end` and `on_episode_start` callback hooks
- improvements to examples
v1.1.7
Changes
Major
- further reworked the QP problem that solves for the RL update
- added the flag `remove_bounds_on_initial_action` to remove the bounds on the first action in `Q(s,a)`, so as to avoid LICQ problems
Minor
- added `use_last_action_on_fail` to let the agent use the last successful action if the MPC fails
- small updates to examples
- small improvements to docstrings and testing
v1.1.6
Changes
Major
- removed support for Python 3.8, and added Python 3.11
- implemented the `StepWiseExploration` strategy, a wrapper for exploration strategies (a minimal sketch follows this list)
- added the possibility of using first-order Q-learning
- improved DPG: removed the hessian (it was wrong), changed the default linear solver to `csparse` due to the sparse nature of the system, and added ridge regression
- changed the QP update problem to take the hessian information directly into account in the hessian of the QP
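
A step-wise exploration wrapper holds the perturbation drawn by the underlying strategy constant for a fixed number of steps before drawing a new one. The snippet below is a minimal sketch of this behaviour; the names are illustrative, not the `StepWiseExploration` API.

```python
import numpy as np

class GaussianExplorationSketch:
    def __init__(self, scale, seed=None):
        self.scale = scale
        self.rng = np.random.default_rng(seed)

    def perturbation(self):
        return self.rng.normal(scale=self.scale)

class StepWiseWrapperSketch:
    """Re-uses the wrapped strategy's perturbation for `hold_steps` consecutive steps."""

    def __init__(self, base, hold_steps):
        self.base = base
        self.hold_steps = hold_steps
        self.counter = 0
        self.current = None

    def perturbation(self):
        if self.counter % self.hold_steps == 0:
            self.current = self.base.perturbation()  # draw a fresh perturbation
        self.counter += 1
        return self.current

explorer = StepWiseWrapperSketch(GaussianExplorationSketch(scale=0.1, seed=0), hold_steps=5)
perturbations = [explorer.perturbation() for _ in range(10)]  # changes every 5 steps
```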
Minor
- improved numba usage with caching and parallel computations where possible
- better seeding with the recommended `np.random.SeedSequence` (a short example follows this list)
- simplified the hooking mechanism with lambdas
- fixed a bug in the examples, where the reward was computed on the next state instead of the current state
- updated README and docstrings
- updated tests
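
NumPy's recommended seeding pattern spawns independent child sequences from a single `SeedSequence`, so that, e.g., each agent, environment, or parallel run gets its own statistically independent generator:

```python
import numpy as np

# one entropy source for the whole experiment
root = np.random.SeedSequence(42)

# spawn independent child sequences, e.g. one per parallel training run
children = root.spawn(3)
generators = [np.random.default_rng(child) for child in children]

samples = [rng.standard_normal(2) for rng in generators]  # independent streams
```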
v1.1.4
Changes
- moved and added some control methods, e.g., `dlqr` and `rk4`, to the dedicated file `util.control` (a minimal RK4 sketch follows this list)
- added a flag to disable updates in learning-based agents during evaluation
- fixed bugs
  - callback hooking in learning-based agents
  - rollout experience never consolidated in the DPG agent
- added an additional check on the shape of learnable parameters to avoid unwanted broadcasting
- updated dependency to `csnlp==1.5.6`
- fixed some tests
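
The classic fourth-order Runge-Kutta scheme integrates `xdot = f(t, x)` over a step `h` via four slope evaluations. The snippet below is a generic sketch of that scheme, not the `util.control.rk4` signature.

```python
import numpy as np

def rk4_step(f, t, x, h):
    """One fourth-order Runge-Kutta step for the ODE dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h / 2 * k1)
    k3 = f(t + h / 2, x + h / 2 * k2)
    k4 = f(t + h, x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# e.g., integrate the harmonic oscillator x'' = -x, written in first-order form
f = lambda t, x: np.array([x[1], -x[0]])
x = np.array([1.0, 0.0])
for _ in range(100):
    x = rk4_step(f, 0.0, x, h=0.05)
```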
v1.1.3
Changes
Major
- learnable parameters (i.e., as per `LearnableParameter` and `LearnableParametersDict`) no longer need to be 1D/flattened vectors, but can also be matrices
- added `skip_first` to `UpdateStrategy` to allow skipping the first `n` updates, e.g., to build enough experience before updating (a minimal sketch follows this list)
- updated to `csnlp==1.5.4` and `casadi==3.6.0`
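
Skipping the first few update triggers lets the experience buffer fill up before any parameter update takes place. The snippet below is a minimal sketch of such gating logic; the names are illustrative, not the `UpdateStrategy` API.

```python
class UpdateGateSketch:
    """Triggers an update every `frequency` steps, but skips the first `skip_first` triggers."""

    def __init__(self, frequency, skip_first=0):
        self.frequency = frequency
        self.skip_first = skip_first
        self.steps = 0
        self.triggers = 0

    def can_update(self):
        self.steps += 1
        if self.steps % self.frequency != 0:
            return False
        self.triggers += 1
        return self.triggers > self.skip_first  # ignore the first `skip_first` triggers

gate = UpdateGateSketch(frequency=10, skip_first=3)
update_steps = [t for t in range(100) if gate.can_update()]  # first update at step 39
```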
Minor
- streamlined rollout operations in `LstdDpgAgent`
- renamed the attribute `include_last` to `include_latest` in `ExperienceReplay`
- better type hints and tests