I'm trying to train a PPO agent to control a single Crazyflie 2.X in gym-pybullet-drones (Physics.PYB). My action space is direct motor RPMs (4 motors), normalized to [-1,1] and then scaled to [0, max_rpm]. However, the drone becomes highly unstable immediately after takeoff: roll/pitch angles exceed 30° within a few steps and it crashes. The angular velocity seems to saturate at ~115°/s, but the tilt keeps increasing.
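For concreteness, here is a minimal sketch of the action mapping described above, assuming a plain linear rescale from [-1,1] to [0, max_rpm]. The names `MAX_RPM` and `scale_action` are illustrative, and the numeric value is a placeholder rather than the simulator's actual limit.

```python
# Hypothetical placeholder; substitute the max RPM your environment reports.
MAX_RPM = 21000.0

def scale_action(action, max_rpm=MAX_RPM):
    """Map 4 normalized motor commands in [-1, 1] to RPMs in [0, max_rpm]."""
    # Clip first so out-of-range policy outputs cannot exceed the motor limits.
    return [(min(max(a, -1.0), 1.0) + 1.0) * 0.5 * max_rpm for a in action]

rpms = scale_action([-1.0, 0.0, 1.0, 2.0])
```

Under this mapping a zero action corresponds to half of `MAX_RPM`, which is not necessarily the hover RPM.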
I have tried:
- Increasing PID gains (kp_roll/pitch up to 40, kd up to 15)
- Clipping the action to smaller ranges (e.g., [-0.5,0.5])
- Using Physics.DYN instead of PYB
- Reducing control frequency (from 48 Hz to 24 Hz)
- Adding a safety layer that reduces action when tilt > 10°
None of these prevented the crash. The only thing that works is to use a high-level controller (e.g., output position deltas and let a PID convert them to RPMs), but I would like to understand if direct RPM control is inherently unstable in this simulator.
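As an illustration of the tilt-based safety layer mentioned in the list above, the sketch below shrinks the normalized action once the tilt exceeds 10°. The attenuation rule (`limit / tilt`) and the function names are assumptions made for this example, not part of gym-pybullet-drones, and roll/pitch are assumed to be in radians.

```python
import math

TILT_LIMIT_DEG = 10.0  # threshold mentioned in the post

def tilt_deg(roll, pitch):
    """Angle between the body z-axis and the world z-axis, in degrees."""
    cos_tilt = math.cos(roll) * math.cos(pitch)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_tilt))))

def attenuate(action, roll, pitch, limit_deg=TILT_LIMIT_DEG):
    """Scale the normalized action down when tilt exceeds limit_deg."""
    tilt = tilt_deg(roll, pitch)
    if tilt <= limit_deg:
        return list(action)
    gain = limit_deg / tilt  # hypothetical rule: shrink in proportion to excess tilt
    return [a * gain for a in action]

safe = attenuate([1.0, 1.0, 1.0, 1.0], roll=math.radians(20.0), pitch=0.0)
```

A layer like this only shrinks the command magnitude; it does not generate a corrective torque.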
My questions:
- Is it physically/algorithmically possible to achieve stable flight with direct RPM control using RL?
- If yes, what are the key tricks (action scaling, reward shaping, simulation parameters) that make it work?
- If not, is there a known reason (e.g., the physics engine's angular velocity clamping) that makes this approach infeasible?
Any insights or references would be greatly appreciated!