
Releases: FilippoAiraldi/mpc-reinforcement-learning

v1.3.1

15 Nov 21:49

Changes

  • implemented mpcrl.util.geometry.ConvexPolytopeUniformSampler, which allows uniform sampling of points from the interior or surface of n-dimensional convex polytopes
  • improvements to mpcrl.wrappers.env.MonitorEpisodes
  • modified mpcrl.util.control.cbf and the other Control Barrier Function methods so that they no longer need to return a casadi.Function
  • improvements to docs
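For readers unfamiliar with the technique, uniform sampling from a convex polytope is typically done by triangulating it into simplices, picking a simplex with probability proportional to its volume, and then drawing a uniform point inside it via Dirichlet barycentric weights. The sketch below illustrates this idea with NumPy and SciPy; it is an assumption about the approach, not the library's actual implementation, and sample_polytope_uniform is a hypothetical helper.

```python
import numpy as np
from scipy.spatial import Delaunay

def sample_polytope_uniform(vertices, n_samples, rng=None):
    """Uniformly sample points from the convex hull of `vertices`
    (a sketch of the idea; not mpcrl's actual implementation)."""
    rng = np.random.default_rng(rng)
    tri = Delaunay(vertices)              # split the polytope into simplices
    simplices = vertices[tri.simplices]   # (n_simplices, ndim + 1, ndim)
    # each simplex's volume is proportional to |det| of its edge matrix
    edges = simplices[:, 1:] - simplices[:, :1]
    vols = np.abs(np.linalg.det(edges))
    probs = vols / vols.sum()
    # pick simplices in proportion to volume, then sample barycentric weights
    idx = rng.choice(len(simplices), size=n_samples, p=probs)
    w = rng.dirichlet(np.ones(simplices.shape[1]), size=n_samples)
    return np.einsum("nij,ni->nj", simplices[idx], w)

# e.g., sample 1000 points from the unit square
square = np.array([[0.0, 0], [1, 0], [0, 1], [1, 1]])
pts = sample_polytope_uniform(square, 1000, rng=0)
```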

v1.3.0

18 Oct 08:04

Changes

Major

  • heavily improved documentation, though some portions are still a work in progress
  • updated to csnlp==1.6.1
  • improved WarmStartStrategy to provide initial conditions for non-warmstarted variables
  • implemented continuous-time, discrete-time, and input-constrained Control Barrier Functions in mpcrl.util.control.cbf, dcbf, and iccbf, respectively
  • implemented mpcrl.util.geometry.ConvexPolytopeUniformSampler
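As a quick refresher on the discrete-time case: a safe set S = {x : h(x) >= 0} can be rendered forward-invariant by enforcing h(f(x, u)) - h(x) >= -gamma h(x) for some 0 < gamma <= 1. The toy sketch below illustrates this condition on a scalar integrator; dcbf_constraint is a hypothetical helper for illustration, not the mpcrl.util.control.dcbf API (which builds the constraint symbolically with CasADi).

```python
# Sketch of the discrete-time CBF condition: safety requires the returned
# value to be >= 0, i.e., h(f(x, u)) - h(x) >= -gamma * h(x).
def dcbf_constraint(h, f, x, u, gamma=0.5):
    return h(f(x, u)) - h(x) + gamma * h(x)

# toy example: 1D integrator x+ = x + u, safe set x >= 0 (so h(x) = x)
h = lambda x: x
f = lambda x, u: x + u
ok = dcbf_constraint(h, f, x=1.0, u=-0.4)   # ~0.1 >= 0: input keeps state safe
bad = dcbf_constraint(h, f, x=1.0, u=-2.0)  # -1.5 < 0: input violates the CBF
print(ok, bad)
```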

Minor

  • added property is_training to agents
  • improvements to mpcrl.util.control.dlqr and lqr methods
  • adjusted dependencies
  • fixed tests, warnings, and deprecation messages

v1.2.1

17 Jul 14:49

Changes

  • Updated dependency to csnlp >= 1.6.0
  • In case of additive exploration, implemented clipping based on env.action_space (if it is a gymnasium.spaces.Box instance)
  • Passing an exploration strategy to LstdDpgAgent is now mandatory (otherwise, mathematically, the agent cannot learn because the advantage function is always zero)
  • LstdDpgAgent now supports hessian_type = natural
  • Implemented wrappers.agents.Evaluate, which allows the performance of the agent to be evaluated periodically during training
  • Implemented new exploration class: OrnsteinUhlenbeckExploration
  • Implemented bound_consistency for GradientBasedOptimizer instances (when True, ensures parameter values are clipped within their bounds)
  • Minor computation simplifications in LstdQLearningAgent
  • Fixed some deprecation warnings from numpy and gymnasium
  • Fixed tests and docstrings

v1.2.0.post1

11 Apr 14:38

Changes

Major

  • implemented a base gradient-free agent, GlobOptLearningAgent, and a corresponding Bayesian Optimization example based on BoTorch
  • implemented off-policy Q-learning (see method train_offpolicy)
  • implemented WarmStartStrategy, which allows finer control over how the multistart MPC is fed random initial points

Minor

  • reworked internal structure of optimizers (introduced BaseOptimizer)
  • reworked internal structure of agents
  • better sensitivity computations in Q-learning
  • updated tests and docstrings

v1.1.9

29 Dec 16:30

Changes

Major

  • improvements to Agent's hooking mechanism: it now uses a dict to keep track of callbacks instead of nested function wrappers
  • improved seeding
  • allowing csnlp.ScenarioBasedMpc to be used as MPC by Agent

Minor

  • improvements to internal files
  • switched to prehooks
  • improved readability of full hessian calculation in LstdQLearningAgent
  • improvements to imports and docstrings
  • updated tests

v1.1.8

26 Oct 16:52

Changes

Major

  • upgraded dependency to csnlp==1.5.8
  • reworked inner computations in both LstdQLearningAgent and LstdDpgAgent for performance and adherence to theory
  • reworked the inner workings of callbacks: they are now stored in an internal dict, making them easier to debug
  • fixed a disruptive bug in the computation of the parameters' bounds for a constrained update
  • implemented the mpcrl.optim sub-module: it contains different optimizers such as
    • Stochastic Gradient Descent
    • Newton's Method
    • Adam
    • RMSprop
  • moved the parameters' constrained-update solver to OSQP (QRQP was having scaling issues)
  • removed the LearningRate class
  • implemented schedulers.Chain, which chains multiple schedulers into a single one
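The chaining idea can be sketched with plain generators: run one schedule for a fixed number of updates, then hand over to the next. The names chain, constant, and exponential below are illustrative stand-ins, not the schedulers.Chain API.

```python
import itertools

def chain(*schedules):
    """Exhaust each schedule in turn (sketch of the Chain concept)."""
    return itertools.chain(*schedules)

def constant(value, n):
    """Yield the same value for n updates."""
    return (value for _ in range(n))

def exponential(value, factor):
    """Decay the value geometrically, forever."""
    while True:
        yield value
        value *= factor

# hold the learning rate at 0.1 for 3 updates, then decay from 0.05 by half
lr = chain(constant(0.1, 3), exponential(0.05, 0.5))
lrs = [round(next(lr), 4) for _ in range(6)]
print(lrs)  # [0.1, 0.1, 0.1, 0.05, 0.025, 0.0125]
```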

Minor

  • added the possibility of passing an integer argument as experience, which creates a buffer with the specified size
  • improvements to mpcrl.util.math
  • improvements to wrappers.agents.Log (now uses lazy logging)
  • fixed bugs in the on_episode_end and on_episode_start callback hooks
  • improvements to examples
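The integer shorthand for the experience argument behaves like a bounded FIFO buffer. The sketch below emulates it with collections.deque; make_experience is a hypothetical helper, and mpcrl's actual ExperienceReplay class offers more (e.g., sampling).

```python
from collections import deque

def make_experience(arg):
    """If an int is given, build a bounded buffer of that size (sketch)."""
    if isinstance(arg, int):
        return deque(maxlen=arg)
    return arg  # assume an already-constructed buffer was passed

buf = make_experience(3)
for item in range(5):
    buf.append(item)  # once full, the oldest items are evicted
print(list(buf))  # [2, 3, 4]
```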

v1.1.7

13 Sep 09:03

Changes

Major

  • further reworked the QP problem that solves for the RL update
  • added flag remove_bounds_on_initial_action to remove bounds on the first action in Q(s,a), avoiding LICQ issues

Minor

  • added use_last_action_on_fail to let the agent use the last successful action if the MPC fails
  • small updates to examples
  • small improvements to docstrings and testing

v1.1.6

29 Aug 09:04

Changes

Major

  • removed support for Python 3.8, and added Python 3.11
  • implemented StepWiseExploration strategy (a wrapper for exploration strategies)
  • added possibility of using first-order Q-learning
  • improved DPG: removed the hessian (it was wrong), changed the default linear solver to csparse due to the sparse nature of the system, and added ridge regression
  • changed the QP update problem to take the hessian information directly into account in the hessian of the QP

Minor

  • improved numba usage with caching and parallel computations where possible
  • better seeding with the recommended np.random.SeedSequence
  • simplified the hooking mechanism with lambdas
  • fixed a bug in the examples, where the reward was computed on the next state instead of the current one
  • updated README and docstrings
  • updated tests
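NumPy's recommended seeding pattern spawns independent child sequences from a single root seed, so that, for instance, the agent and the environment each get their own reproducible, statistically independent stream:

```python
import numpy as np

# Spawn independent child sequences from one root seed; each child drives
# its own generator, and the whole setup is reproducible from the seed 42.
root = np.random.SeedSequence(42)
agent_seq, env_seq = root.spawn(2)
agent_rng = np.random.default_rng(agent_seq)
env_rng = np.random.default_rng(env_seq)
a_val = agent_rng.random()
e_val = env_rng.random()
print(a_val, e_val)  # two independent streams, distinct values
```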

v1.1.4

20 Jun 15:37

Changes

  • moved and added some control methods (e.g., dlqr and rk4) to the dedicated file util.control
  • added flag to disable updates in learning-based agents during evaluation
  • fixed bugs
    • callback hooking in learning-based agents
    • rollout experience was never consolidated in the DPG agent
  • added additional check on the shape of learnable parameters to avoid unwanted broadcasting
  • updated dependency to csnlp==1.5.6
  • fixed some tests
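For reference, a classic fourth-order Runge-Kutta step looks as follows. This is a generic sketch of the scheme; the actual signature of util.control.rk4 may differ.

```python
def rk4_step(f, x, h):
    """One step of the classic 4th-order Runge-Kutta scheme for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# e.g., integrate dx/dt = -x from x(0) = 1 over one unit of time
x = 1.0
for _ in range(10):
    x = rk4_step(lambda x: -x, x, 0.1)
print(x)  # close to exp(-1), i.e., about 0.3679
```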

v1.1.3

05 Apr 18:18

Changes

Major

  • learnable parameters (i.e., as per LearnableParameter and LearnableParametersDict) no longer need to be 1D/flattened vectors, but can also be matrices
  • added skip_first to UpdateStrategy to allow skipping the first n updates, e.g., to build enough experience before updating
  • updated to csnlp==1.5.4 and casadi==3.6.0

Minor

  • streamlined rollout operations in LstdDpgAgent
  • renamed attribute include_last to include_latest in ExperienceReplay
  • better type hints and tests