
Adding Multidrone Deployment Feature with Zenoh-ROS2 #74

Draft
N0OBSTUDENT wants to merge 55 commits into learnsyslab:dev from N0OBSTUDENT:zenoh_ros2_multidrone

N0OBSTUDENT (Collaborator) commented Mar 27, 2026

Multi-drone real race support via host-client ROS2 architecture

  • Host-client split: Replaces the monolithic multi_deploy.py with separate deploy_host.py
    and deploy_client.py scripts. The host manages all Crazyflie workers and race coordination;
    each client runs on its own compute unit and controls a single drone.

  • New environment classes: Adds RealMultiDroneRaceEnvHost and RealMultiDroneRaceEnvClient
    Gymnasium-compatible environments for the two sides of the host-client protocol.

  • ROS2 race communication layer: New RaceCommNode in ros_race_comm.py handles all
    inter-process messaging (host-ready, race-start, client-state, clock calibration) using
    lsy_race_msgs ROS2 message types.

  • Clock synchronisation: Clients calibrate their clock offset against the host via a ROS2
    service before the race starts.

  • Multi-drone controller wrappers: Adds AttitudeController and AttitudeMPC subclasses
    that slice the per-drone observation from the joint multi-drone obs dict using the drone's rank.

  • Unknown observations default to NaN: because other drones may be out of mocap range,
    observations of drones without current data default to np.nan.

  • Config updates: multi_level2.toml and multi_level0.toml updated to fit the new
    architecture.
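The clock-synchronisation bullet does not spell out how the offset is estimated. One standard approach is Cristian's algorithm: the client timestamps each request before and after the round trip, assumes the network delay is symmetric, and keeps the sample with the shortest round trip. The sketch below illustrates that technique only; `estimate_clock_offset` is a hypothetical helper, and `query_host_time` stands in for whatever ROS2 service call RaceCommNode actually makes.

```python
import time
from typing import Callable


def estimate_clock_offset(
    query_host_time: Callable[[], float], n_samples: int = 10
) -> float:
    """Hypothetical helper: estimate (host clock - client clock) in seconds.

    For each sample the client records t0 before and t1 after asking the host for
    its time. Assuming symmetric delay, the host read its clock at client time
    (t0 + t1) / 2. The sample with the smallest round trip has the tightest error bound.
    """
    best_rtt = float("inf")
    best_offset = 0.0
    for _ in range(n_samples):
        t0 = time.time()
        t_host = query_host_time()  # stands in for the ROS2 calibration service call
        t1 = time.time()
        rtt = t1 - t0
        if rtt < best_rtt:
            best_rtt = rtt
            best_offset = t_host - (t0 + t1) / 2
    return best_offset
```

A client would then map host timestamps onto its own clock with client_time = host_time - offset. Keeping the minimum-RTT sample instead of averaging bounds the error by half of that round trip, provided clock drift during calibration is negligible.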

ratheron and others added 22 commits December 20, 2025 00:01
* Add pngs to installed files

* Bump workflow pixi version

* Fix cam config

* Update lock file
…earnsyslab#65)

- Fix UnboundLocalErrors raised in finally blocks if creating a ROS connector fails
- Set SCIPY_ARRAY_API flags in scripts
- Barrier is shared via a socket
- Barrier setup in configuration
- Barrier can be started by any deploy node, or by a standalone barrier process started by the host
…lti-drone racing configuration

Other dimension problems fixed within deploy
N0OBSTUDENT marked this pull request as draft March 29, 2026 14:26

N0OBSTUDENT (Collaborator Author)

And here are the test cases I would like to run once the structure is clear:
https://docs.google.com/spreadsheets/d/1a_WvakzvkbI9FE5ZeVwlMI_acAK3nL8hTaW2EinpsPs/edit?usp=sharing

ratheron added the enhancement (New feature or request) label Apr 3, 2026

ratheron (Collaborator) left a comment

Generally a nice implementation with proper safeguards. Thank you for the work!

Comment thread config/multi_level2.toml
Collaborator

Please make sure the initial positions, etc., are the same across the levels.

Comment thread lsy_drone_racing/control/attitude_controller_multi.py Outdated
Collaborator

Why is this necessary for this PR?

Comment thread lsy_drone_racing/control/attitude_mpc_multi.py Outdated
Comment thread lsy_drone_racing/control/attitude_controller_multi.py
Comment thread lsy_drone_racing/envs/real_race_env_client.py Outdated
Comment thread lsy_drone_racing/envs/real_race_env_client.py Outdated
self.taken_off = False


class RealMultiDroneRaceEnvClient(Env):
Collaborator

Why not inherit from RealRaceCoreEnv? Much of the functionality should be identical.

logger = logging.getLogger(__name__)


class CrazyflieWorker:
Collaborator

This class should also inherit from RealRaceCoreEnv. Or better: since this class is used by the real race core env, we should move it somewhere else.

Comment thread lsy_drone_racing/envs/real_race_host.py Outdated
amacati (Collaborator) left a comment

Left some comments on the surrounding changes.

TODO @amacati review ros_race_comm, real_race_env_client, real_race_host.

Comment thread lsy_drone_racing/control/attitude_controller_multi.py Outdated
Comment thread lsy_drone_racing/control/attitude_mpc_multi.py Outdated
Comment thread lsy_drone_racing/control/attitude_rl_multi.py Outdated
Collaborator

General comment on the ROS stuff: I am not happy at all with having a ros_ws in a Python package. ROS packages should go into whatever catkin workspace already exists on the system, not be included inside a Python package. How do other packages handle this (apart from crazyswarm)? Do we need these messages? Can't we build them out of existing ones?

Collaborator Author

I now clone the messages from an external repository, following the approach of the mocap package. We will share the package with the simulation via https://github.com/rducrist/drone_racing_msgs

Comment thread scripts/deploy_host.py
Comment on lines +48 to +49
except Exception as e:
    logger.error(f"Host encountered error: {e}", exc_info=True)
Collaborator

Why do we need a catch-all except here? That's an absolute last resort. The finally block runs anyway.

Collaborator Author

The problem is that multiple processes may emit a flood of messages. I was trying to make it clearer which process each message belongs to.
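If the catch-all handler exists mainly to show which process a log line comes from, Python's standard logging module can tag every record with the process name and PID through the format string, which lets exceptions propagate normally to the finally block. A minimal sketch; the format string and `configure_logging` are illustrative choices, not the PR's actual setup.

```python
import logging

# Illustrative format: every record carries the originating process name and PID.
LOG_FORMAT = "[%(processName)s:%(process)d] %(levelname)s %(name)s: %(message)s"


def configure_logging(level: int = logging.INFO, stream=None) -> logging.Handler:
    """Route root-logger output through one handler that tags records with their process."""
    handler = logging.StreamHandler(stream)  # stream=None writes to sys.stderr
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    root = logging.getLogger()
    root.handlers[:] = [handler]  # replace any previously installed handlers
    root.setLevel(level)
    return handler
```

Worker processes can be given readable names via multiprocessing.Process(name=...); with the spawn start method, each child has to call configure_logging itself, since handlers are not inherited.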

Comment thread scripts/deploy_client.py


def main(
    config: str = "multi_level2.toml", controller: str | None = None, drone_rank: int | None = None
Collaborator

How do we ensure the same configs are selected on all clients? Do the settings need to match somehow, or are heterogeneous settings even desirable?

Collaborator Author

Currently there is no way to ensure that. I suppose the settings do not need to match perfectly on every client; only the track and drone settings need to be the same.
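Since only the track and drone settings must agree, one way to enforce this at startup is for each client to hash just those sections and compare the digest with the host's. The sketch below assumes the config is already loaded into a nested dict; the dotted section paths and `config_fingerprint` are hypothetical and would need to match the real TOML layout.

```python
import hashlib
import json
from typing import Any, Iterable


def config_fingerprint(config: dict[str, Any], keys: Iterable[str]) -> str:
    """Hypothetical helper: hash only the config sections that must agree across nodes.

    `keys` are dotted paths into the nested config dict, e.g. "env.track"
    (assumed names; adapt them to the actual TOML layout).
    """
    subset: dict[str, Any] = {}
    for key in keys:
        node: Any = config
        for part in key.split("."):
            node = node[part]  # raises KeyError if this config lacks the section
        subset[key] = node
    # sort_keys makes the digest independent of key order; default=str covers TOML dates
    blob = json.dumps(subset, sort_keys=True, default=str).encode()
    return hashlib.sha256(blob).hexdigest()
```

A client would load its TOML (for example with tomllib.load on Python 3.11+) and refuse to start if its fingerprint differs from the host's. How the host advertises its value, e.g. as an extra field in the host-ready message, is an assumption; lsy_race_msgs may not have such a field.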

Comment thread scripts/deploy_client.py Outdated
Comment on lines +43 to +44
if drone_rank is None:
    raise ValueError("drone_rank must be specified")
Collaborator

None does not make sense in the first place. Why type hint a None if None is not possible? Why default to a value that throws an error?
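Following this comment, drone_rank can simply be a required parameter with no default, so a missing rank fails when the function is called rather than inside main. A sketch of the signature only; the non-negativity check is an illustrative addition and the body is elided. If the script is exposed through a CLI wrapper such as fire, a parameter without a default becomes a required argument. Making it keyword-only (`*, drone_rank: int`) would avoid reordering positional arguments.

```python
from __future__ import annotations


def main(
    drone_rank: int,
    config: str = "multi_level2.toml",
    controller: str | None = None,
) -> None:
    """Deploy one client; drone_rank has no default, so it cannot be forgotten."""
    if drone_rank < 0:  # illustrative sanity check, not part of the PR
        raise ValueError(f"drone_rank must be >= 0, got {drone_rank}")
    # ... load `config`, build the client env for `drone_rank`, run `controller`
```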

Comment thread lsy_drone_racing/envs/real_race_env_client.py Outdated

Observation space:
    A dictionary containing the state of all drones in the race, mirroring
    :class:`lsy_drone_racing.envs.multi_drone_race.MultiDroneRaceEnv`.
Collaborator

That's the sim class, and it should stay
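The observation docstring describes a joint dict holding every drone's state, and the PR description says per-drone wrappers slice their own entry by rank while unknown drones default to NaN. A sketch of both ideas, assuming per-drone arrays whose first axis is the drone index; the key names, dimensions, and helper names are assumptions, not the actual MultiDroneRaceEnv layout.

```python
import numpy as np

# Assumed per-drone keys and their dimensions; each array has one row per drone.
PER_DRONE_DIMS = {"pos": 3, "quat": 4, "vel": 3, "ang_vel": 3}


def empty_joint_obs(n_drones: int) -> dict[str, np.ndarray]:
    """Hypothetical helper: joint observation where every drone is unknown (NaN)."""
    return {k: np.full((n_drones, d), np.nan) for k, d in PER_DRONE_DIMS.items()}


def slice_obs(obs: dict[str, np.ndarray], rank: int) -> dict[str, np.ndarray]:
    """Select drone `rank` from per-drone arrays; pass shared keys (e.g. gates) through."""
    return {k: (v[rank] if k in PER_DRONE_DIMS else v) for k, v in obs.items()}
```

Because NaN propagates through arithmetic, a controller that uses other drones' states (for example, for collision avoidance) should check np.isnan before relying on them.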

if self._comm is None:
    self._init_comm()

current_pos, _, _, _ = self._get_all_drone_states()
Collaborator

Nit: Avoid getter names

Comment thread lsy_drone_racing/envs/real_race_env_client.py Outdated
Comment thread lsy_drone_racing/envs/real_race_env_client.py Outdated
Comment thread lsy_drone_racing/envs/real_race_env_client.py Outdated
self._send_state_update(
    np.zeros(4 if self.control_mode == "attitude" else 13), stopped=True
)
time.sleep(0.1)  # allow the executor thread to flush the message before shutdown
Collaborator

Can we ensure this instead of just waiting? I.e. send something with flush=True?

Collaborator Author

I will test whether the problem still exists now. At the time, I was running into the ROS publisher shutting down before the last message was sent.
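Instead of a fixed sleep, the shutdown path could block until the final message has actually been handed off, with a timeout as a safety net. rclpy publishers also provide wait_for_all_acked(), which in recent ROS 2 distributions blocks until subscribers acknowledge messages on reliable-QoS publishers; whether it is available and supported by the RMW layer in use (here apparently Zenoh) would need checking. The sketch below shows the general pattern without ROS; `Sender` is hypothetical, and its background thread stands in for the executor.

```python
import queue
import threading
from typing import Any, Callable

_STOP = object()  # sentinel that tells the sender thread to exit


class Sender:
    """Hypothetical background sender: each message gets an Event set once handed off."""

    def __init__(self, send_fn: Callable[[Any], None]) -> None:
        self._send_fn = send_fn  # e.g. a publisher's publish method
        self._queue: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            msg, done = self._queue.get()
            if msg is _STOP:
                done.set()
                return
            self._send_fn(msg)  # if this raises, `done` is never set and waiters time out
            done.set()

    def send(self, msg: Any) -> threading.Event:
        done = threading.Event()
        self._queue.put((msg, done))
        return done

    def close(self, timeout: float = 1.0) -> bool:
        """Stop after all queued messages (FIFO); True if the thread finished in time."""
        return self.send(_STOP).wait(timeout)
```

On shutdown, the client would call done = sender.send(final_msg) followed by done.wait(timeout=1.0) instead of time.sleep(0.1), logging a warning if it returns False. Note that this only confirms the message left the local queue; receipt by the host needs a transport-level acknowledgement.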

Comment thread lsy_drone_racing/utils/ros_race_comm.py Outdated
Comment on lines +141 to +144
@property
def node(self) -> Node:
    """The underlying rclpy node."""
    return self._node
Collaborator

Why do we need the property then?

Collaborator Author

That's my mistake. Changed it in N0OBSTUDENT@2ed0147

Comment thread lsy_drone_racing/envs/real_race_host.py Outdated
logger = logging.getLogger(__name__)


class CrazyflieWorker:
Collaborator

Could we turn this into a single function instead? We only execute run() anyway.

Collaborator Author

It is doable, but the function would be as long as a toilet paper roll.

Comment on lines +441 to +446
class RealRaceHost:
    """Base class for multi-drone race hosts.

    Subclasses implement :meth:`load_config`, :meth:`connect_drones`,
    :meth:`host_main_loop`, and :meth:`close` for a specific drone platform.
    """
Collaborator

Why do we subclass this? We're only running Crazyflies for now. Also, if anything, this should be a Protocol.

Collaborator Author

My initial idea was that we could have different Host implementations sharing common communication between the client and the host, so I initialized the communicators in RealRaceHost.
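The reviewer's alternative can be sketched as a typing.Protocol: any host that provides the right methods satisfies it structurally, without inheriting from a base class. The method names come from the RealRaceHost docstring; their signatures are assumptions, and `run_host` and `DummyHost` are hypothetical.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class RaceHost(Protocol):
    """Structural interface for a race host (method signatures assumed)."""

    def load_config(self, path: str) -> None: ...
    def connect_drones(self) -> None: ...
    def host_main_loop(self) -> None: ...
    def close(self) -> None: ...


def run_host(host: RaceHost, config_path: str) -> None:
    """Drive any object that provides the four methods; no inheritance needed."""
    host.load_config(config_path)
    try:
        host.connect_drones()
        host.host_main_loop()
    finally:
        host.close()


class DummyHost:  # hypothetical stand-in, e.g. for tests; note: no base class
    def __init__(self) -> None:
        self.calls: list[str] = []

    def load_config(self, path: str) -> None:
        self.calls.append(f"load {path}")

    def connect_drones(self) -> None:
        self.calls.append("connect")

    def host_main_loop(self) -> None:
        self.calls.append("loop")

    def close(self) -> None:
        self.calls.append("close")
```

With this shape, shared pieces such as the RaceCommNode communicator could be composed into a concrete host (passed in or built by a helper) instead of being initialized in a base class.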

Comment on lines +28 to +45
def _suppress_shutdown_thread_errors():
    """Install a threading.excepthook that silences expected ROS2 shutdown exceptions.

    Replaces the noisy default traceback for :class:`~rclpy.executors.ExternalShutdownException`
    and :class:`KeyboardInterrupt` in background spin threads with a single DEBUG log line.
    Any other uncaught thread exception still goes through the default handler.
    """
    _original = threading.excepthook

    def _hook(args: threading.ExceptHookArgs) -> None:
        if args.exc_type in (ExternalShutdownException, KeyboardInterrupt) or (
            args.exc_type.__name__ == "RCLError"
        ):
            logger.debug(f"Thread '{args.thread.name}' stopped (shutdown)")
        else:
            _original(args)

    threading.excepthook = _hook
Collaborator

This is fairly complex. Was this a major issue during testing on the real hardware, or why was this included?
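If the noisy tracebacks only come from the background spin threads, a narrower alternative to replacing threading.excepthook process-wide is to wrap each thread's target and catch the expected shutdown exceptions there. A sketch in plain Python; `quiet_spin_target` is hypothetical, `spin` stands in for something like an executor's spin method, and in practice the `expected` tuple would also include rclpy's ExternalShutdownException (left out here to keep the example free of ROS imports).

```python
import logging
import threading
from typing import Callable

logger = logging.getLogger(__name__)


def quiet_spin_target(
    spin: Callable[[], None],
    expected: tuple[type[BaseException], ...] = (KeyboardInterrupt,),
) -> Callable[[], None]:
    """Hypothetical helper: wrap a spin function so expected shutdown errors end it quietly."""

    def target() -> None:
        try:
            spin()
        except expected:
            logger.debug("Thread %s stopped (shutdown)", threading.current_thread().name)
        # any other exception propagates and reaches the default threading.excepthook

    return target
```

This keeps the suppression local to the threads that need it, and other threads in the process still get the default hook. The original's name-based RCLError check could be replicated by adding that class to `expected` if it is importable.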


Labels

enhancement New feature or request

5 participants