Adding Multidrone Deployment Feature with Zenoh-ROS2#74
Adding Multidrone Deployment Feature with Zenoh-ROS2#74N0OBSTUDENT wants to merge 55 commits intolearnsyslab:devfrom
Conversation
* Add pngs to installed files * Bump workflow pixi version * Fix cam config * Update lock file
…earnsyslab#65) - Fix UnboundErrors raised in finally blocks if creating a ros connector fails - Set SCIPY_ARRAY_API flags in scripts
-barrier is shared via socket -barrier setup in configuration -barrier could start by any deploy node, or standalone barrier process started by host
…lti-drone racing configuration Other dimension problem fixed within deploy
…_drone_racing into zenoh_multidrone
|
And here is the test cases I would like to do, but after we made the structure clear. |
ratheron
left a comment
There was a problem hiding this comment.
Generally nice implementation with proper safe guards. Thank you for the work!
There was a problem hiding this comment.
Please make sure the initial positions etc are equal between the levels
There was a problem hiding this comment.
Why is this necessary for this PR?
| self.taken_off = False | ||
|
|
||
|
|
||
| class RealMultiDroneRaceEnvClient(Env): |
There was a problem hiding this comment.
Why not inherit from RealRaceCoreEnv? Much of the functionality should be identical.
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class CrazyflieWorker: |
There was a problem hiding this comment.
This class should also inherit RealRaceCoreEnv. Or better: This class is used by the real race core env. In that case, we should move it somewhere else.
There was a problem hiding this comment.
General comment on the ros stuff: I am not happy at all with having a ros_ws in a python package. Ros packages should go into whatever catkin workspace is already there on the system, not be included inside a Python package. How do other packages handle this (apart from crazyswarm)? Do we need these messages? Can't we build them out of existing ones?
There was a problem hiding this comment.
Now I am cloning the message from an external repository, following the style of mocap pacakge. We will be sharing the pacakge with simulation on https://github.com/rducrist/drone_racing_msgs
| except Exception as e: | ||
| logger.error(f"Host encountered error: {e}", exc_info=True) |
There was a problem hiding this comment.
Why do we need a bare except here? That's an absolute last resort. Finally triggers anyways
There was a problem hiding this comment.
The problem now is that multiple processes may throw tons of messages. I was trying to make the belonging of the message clearer
|
|
||
|
|
||
| def main( | ||
| config: str = "multi_level2.toml", controller: str | None = None, drone_rank: int | None = None |
There was a problem hiding this comment.
How do we ensure the same configs are selected on all clients? Do the settings need to match somehow, or are heterogeneous settings even desirable?
There was a problem hiding this comment.
Currently there is no way we could ensure that... I suppose the setting do not need to match perfectly for every client. Only the track and drone settings need to be the same
| if drone_rank is None: | ||
| raise ValueError("drone_rank must be specified") |
There was a problem hiding this comment.
None does not make sense in the first place. Why type hint a None if None is not possible? Why default to a value that throws an error?
|
|
||
| Observation space: | ||
| A dictionary containing the state of all drones in the race, mirroring | ||
| :class:`lsy_drone_racing.envs.multi_drone_race.MultiDroneRaceEnv`. |
There was a problem hiding this comment.
That's the sim class, and it should stay
| if self._comm is None: | ||
| self._init_comm() | ||
|
|
||
| current_pos, _, _, _ = self._get_all_drone_states() |
| self._send_state_update( | ||
| np.zeros(4 if self.control_mode == "attitude" else 13), stopped=True | ||
| ) | ||
| time.sleep(0.1) # allow the executor thread to flush the message before shutdown |
There was a problem hiding this comment.
Can we ensure this instead of just waiting? I.e. send something with flush=True?
There was a problem hiding this comment.
I will test whether the problem still exists rightnow. By then I was encountering issues with ros publisher closing too fast before the last message is sent
| @property | ||
| def node(self) -> Node: | ||
| """The underlying rclpy node.""" | ||
| return self._node |
There was a problem hiding this comment.
Why do we need the property then?
There was a problem hiding this comment.
That's my mistake. Changed it in N0OBSTUDENT@2ed0147
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class CrazyflieWorker: |
There was a problem hiding this comment.
Could we turn this into one function instead? We only execute run() anyways
There was a problem hiding this comment.
It is doable, but the function will be as long as a tiolet paper roll
| class RealRaceHost: | ||
| """Base class for multi-drone race hosts. | ||
|
|
||
| Subclasses implement :meth:`load_config`, :meth:`connect_drones`, | ||
| :meth:`host_main_loop`, and :meth:`close` for a specific drone platform. | ||
| """ |
There was a problem hiding this comment.
Why do we subclass this? We're only running crazyflies for now. Also, if anything, this should then be a protocol
There was a problem hiding this comment.
My initial is that we could have different implementation of the Host with common communiation between the client and the host. So I had communicators initialized in RealRaceHost...
| def _suppress_shutdown_thread_errors(): | ||
| """Install a threading.excepthook that silences expected ROS2 shutdown exceptions. | ||
|
|
||
| Replaces the noisy default traceback for :class:`~rclpy.executors.ExternalShutdownException` | ||
| and :class:`KeyboardInterrupt` in background spin threads with a single DEBUG log line. | ||
| Any other uncaught thread exception still goes through the default handler. | ||
| """ | ||
| _original = threading.excepthook | ||
|
|
||
| def _hook(args: threading.ExceptHookArgs) -> None: | ||
| if args.exc_type in (ExternalShutdownException, KeyboardInterrupt) or ( | ||
| args.exc_type.__name__ == "RCLError" | ||
| ): | ||
| logger.debug(f"Thread '{args.thread.name}' stopped (shutdown)") | ||
| else: | ||
| _original(args) | ||
|
|
||
| threading.excepthook = _hook |
There was a problem hiding this comment.
This is fairly complex. Was this a major issue during testing on the real hardware, or why was this included?
51e8bc2 to
4a63d77
Compare
And use only one stop event for now
Now the setup_mocap.sh script clones an external repository including the message at https://github.com/rducrist/drone_racing_msgs
Co-authored-by: Copilot <copilot@github.com>
Multi-drone real race support via host-client ROS2 architecture
Host-client split: Replaces the monolithic
multi_deploy.pywith separatedeploy_host.pyand
deploy_client.pyscripts. The host manages all Crazyflie workers and race coordination;each client runs on its own compute unit and controls a single drone.
New environment classes: Adds
RealMultiDroneRaceEnvHostandRealMultiDroneRaceEnvClient—Gymnasium-compatible environments for the two sides of the host-client protocol.
ROS2 race communication layer: New
RaceCommNodeinros_race_comm.pyhandles allinter-process messaging (host-ready, race-start, client-state, clock calibration) using
lsy_race_msgsROS2 message types.Clock synchronisation: Clients calibrate their clock offset against the host via a ROS2
service before the race starts.
Multi-drone controller wrappers: Adds
AttitudeControllerandAttitudeMPCsubclassesthat slice the per-drone observation from the joint multi-drone obs dict using the drone's
rank.Unknown observations default to NaN: Considering that other drones could be out of mocap range, set the default observation to np.nan
Config updates:
multi_level2.tomlandmulti_level0.tomlupdated to fit the newarchitecture.