
Bridge/cross repeater implementation #454


Open
wants to merge 8 commits into
base: dev

Conversation

jbrazio
Contributor

@jbrazio jbrazio commented Jun 28, 2025

Overview

This PR offers a solution for bridging two meshcore networks, enabling connectivity between:

  1. Devices operating on different frequencies (433/868) or same band with different settings;
  2. Connecting a local community to a distant repeater by using one high gain Yagi and one omnidirectional;
  3. Sending the origin signal further away or in a completely different direction by using two high gain Yagi antennas;
  4. .. add your own use case here.

Essentially, we create a bridge by using two repeaters interconnected via a serial port. The serial method was chosen because it ensures full compliance with an off-grid protocol by not relying on external networks or internet-based protocols, such as MQTT, that are prone to failure in emergency situations, while still opening up many new possibilities without compromising the RF only principle.

All types of traffic appear to be functioning properly, including paths, pings, administrative requests and direct messaging between clients on different bands. However, it is important to note that this solution should not be considered production-ready and requires further testing.

Configuration

To enable, open variants/your_board_here/platformio.ini, edit the Repeater configuration and add the following defines:

  -D BRIDGE_OVER_SERIAL=Serial2
  -D BRIDGE_OVER_SERIAL_RX=34
  -D BRIDGE_OVER_SERIAL_TX=25

You can download pre-built test versions of this PR.
Flash them using the option Custom Firmware from the official flasher.

Pinout

- ------+            +------ -
B    RX | ---    --- | RX    B
O       |     \/     |       O    The TX on one board must be
A       |     /\     |       A    connected to the RX on the other board !
R    TX | ---    --- | TX    R         
D       |            |       D
        |            |
A   GND | ---------- | GND   B
- ------+            +------ -

Suggested IO ports to be used by each type of board:

  • Heltec v3: [Serial2, tx=6, rx=5]
  • Waveshare rp2040 LoRa: [Serial2, tx=8, rx=9]
  • LilyGo TLora V2/1.1.6: [Serial2, tx=25, rx=34]

@jbrazio jbrazio changed the title Serial bridge implementation Bridge/cross repeater implementation Jun 30, 2025
@ripplebiz
Collaborator

I think this could be done in a much simpler way. Also, modifying the Dispatcher class is not necessary, or desirable.
You could feed mesh::Packet instances (deserialised over the UART link) into the repeater MyMesh class by using the _mgr->queueInbound() method. You could also do the opposite by adding code to the MyMesh::logRx() method to serialize packets over the UART. So there's no need for the queues, the extra Packet class, etc.

@jbrazio
Contributor Author

jbrazio commented Jul 1, 2025

@ripplebiz

I had a design principle behind it:
RF should always be king and never block for serial bridging

Which led to the current "parallel" architecture and the usage of queues.

If serialisation happens inside logRx() then the RF cycle needs to wait for the serial shenanigans to happen; if this is not an issue then it can indeed be simplified. I'm fine with making it much leaner. But the serial packet abstraction would still be a valid choice, as it makes the data stream easier to parse, and the added CRC keeps things sane. Think about noisy environments, long unshielded serial cables, etc.
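The framing idea described above (a small packet wrapper with a CRC, so that corruption on a noisy line is detected) could be sketched roughly as follows. This is only an illustration: the magic value, the header layout (magic, little-endian length, CRC, then payload), the payload limit and the choice of CRC-16/CCITT-FALSE are all assumptions, and `crc16`, `encodeFrame` and `decodeFrame` are hypothetical names rather than the PR's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed frame layout (illustrative, not taken from the PR's code):
//   [magic hi][magic lo][len lo][len hi][crc lo][crc hi][payload ...]
constexpr uint16_t kMagic = 0xC03E;   // illustrative value without any 0xFF byte
constexpr size_t kHeaderSize = 6;
constexpr size_t kMaxPayload = 255;   // illustrative upper bound

// CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF), picked here for illustration.
uint16_t crc16(const uint8_t* data, size_t len) {
  uint16_t crc = 0xFFFF;
  for (size_t i = 0; i < len; i++) {
    crc ^= static_cast<uint16_t>(data[i] << 8);
    for (int bit = 0; bit < 8; bit++) {
      crc = (crc & 0x8000) ? static_cast<uint16_t>((crc << 1) ^ 0x1021)
                           : static_cast<uint16_t>(crc << 1);
    }
  }
  return crc;
}

// Wraps a payload in a frame; returns an empty vector if the size is invalid.
std::vector<uint8_t> encodeFrame(const std::vector<uint8_t>& payload) {
  if (payload.empty() || payload.size() > kMaxPayload) return {};
  const uint16_t len = static_cast<uint16_t>(payload.size());
  const uint16_t crc = crc16(payload.data(), payload.size());
  std::vector<uint8_t> frame = {
      uint8_t(kMagic >> 8), uint8_t(kMagic & 0xFF),
      uint8_t(len & 0xFF),  uint8_t(len >> 8),
      uint8_t(crc & 0xFF),  uint8_t(crc >> 8)};
  frame.insert(frame.end(), payload.begin(), payload.end());
  return frame;
}

// Validates magic, length and CRC before handing the payload back.
bool decodeFrame(const std::vector<uint8_t>& frame, std::vector<uint8_t>& payload) {
  if (frame.size() < kHeaderSize) return false;
  if (frame[0] != uint8_t(kMagic >> 8) || frame[1] != uint8_t(kMagic & 0xFF)) return false;
  const size_t len = size_t(frame[2]) | (size_t(frame[3]) << 8);
  if (len == 0 || len > kMaxPayload || frame.size() != kHeaderSize + len) return false;
  const uint16_t crc = uint16_t(frame[4] | (frame[5] << 8));
  if (crc16(frame.data() + kHeaderSize, len) != crc) return false;
  payload.assign(frame.begin() + kHeaderSize, frame.end());
  return true;
}
```

With a layout like this, a single flipped bit in the payload makes `decodeFrame` return false, so the receiver can drop the frame instead of injecting garbage into the mesh.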

@glen-gibson

This is definitely something I've been looking for. Thank you @jbrazio

@fdlamotte
Collaborator

fdlamotte commented Jul 7, 2025

This one opens up very cool use cases!

I've made some experiments out of curiosity and got it working with a RAK3272 (based on stm32). I published my own branch here:

fdlamotte@88edaa7

feel free to incorporate the stm32 bits ;)

Results were mixed when I tried with esp32-s3 on the lora side: acks were not going through (or maybe took too long). Results were better with stm32.

@jbrazio
Contributor Author

jbrazio commented Jul 7, 2025

Thanks @fdlamotte, I will merge your code as I didn't have any STM32 boards to test.

@glen-gibson

@fdlamotte and @jbrazio you're both legends. I'm looking forward to studying the code from both of your repos and coming up with something for my own use.

Got most of the C fundamentals down, currently learning Go, and my next stop is C++ for embedded, so this will help a lot. I really appreciate you sharing your work.

@fdlamotte
Collaborator

fdlamotte commented Jul 8, 2025

still playing with your code @jbrazio ;)

I've tried to get rid of modifications in dispatcher as proposed by @ripplebiz

I used logTx, but by using logRx we could make the bridge transparent (at the moment it is seen as two repeaters, one on each interface) or semi-transparent (only one repeater).

Forgot the link: fdlamotte@22656d5

I think I now want a RAK11162 more than ever: https://docs.rakwireless.com/product-categories/wisblock/rak11162/overview/

@jbrazio
Contributor Author

jbrazio commented Jul 8, 2025

I feel that a transparent bridge is less desirable, because at the end of the day we are actually adding an additional hop to the path of the packet.

There is also a debate within our local CORE community: should we add options (flags) to allow packet filtering at the bridge? The most discussed example is the public channel:

  1. Should we stop bridging it ?
  2. Should we add flags to allow owners to choose ?

@fdlamotte
Collaborator

Very interesting topics, and I think this relates to repeaters in general, not only bridges (one more reason not to have transparent bridges) ...

My own interpretation is that we could have some control at the repeater level for filtering nodes/channels, and even some whitelist/blacklist mechanisms depending on source or destination.

Also, we should be able to specify the number of hops we want for a channel message (I think this is possible at the packet level, but I don't know if any application can use this; meshcli doesn't give that control, for instance). That could be interesting for sending public messages locally (0 hops away).

@jbrazio
Contributor Author

jbrazio commented Jul 8, 2025

@ripplebiz please review the latest patch.

@fdlamotte
Collaborator

fdlamotte commented Jul 8, 2025

on my side it seems ok. You should probably discard the changes to dispatcher ...

just doing some tests ... with this version the repeater hangs at some point

@jbrazio
Contributor Author

jbrazio commented Jul 8, 2025

Where does it get stuck ? Is it on serial read ?
Try enabling MESH_DEBUG so the raw packets get dumped.

@fdlamotte
Collaborator

Where does it get stuck ? Is it on serial read ? Try enable MESH_DEBUG so the raw packets get dumped.

I'll try to investigate, first I've rolled back to what I was using this morning ;) as I was not in the same place to check if it was still working ;) ... because the setup is not that simple and an error on my side was possible
...

@fdlamotte
Collaborator

fdlamotte commented Jul 8, 2025

I'm sorry, can't get it to work reliably (lots of acks are missed) and debug is very difficult

It has been hanging less though

edit: removed debug, hangs more quickly, probably a synchro issue ... the queue you had before was good for that kind of problems ;)
edit2: I'll try again using MESH_PACKET_LOGGING

@fdlamotte
Collaborator

fdlamotte commented Jul 8, 2025

better with MESH_PACKET_LOGGING ...

can't go further right now but seems to be blocking while reading

image

left one is blocked ... that's on the stm32, may be able to debug with stlink

@fdlamotte
Collaborator

fdlamotte commented Jul 8, 2025

it hangs at
bytes[i] = buffer[(head + 6 + i) % size];

but I still have to backtrace (the CPU ends up in an exception) and don't know yet what leads there or what the counter values are

@fdlamotte
Collaborator

fdlamotte commented Jul 8, 2025

len = 65535 ;) => that's what causes the boom ...

@jbrazio
Contributor Author

jbrazio commented Jul 8, 2025

Strange.. could this be a stm32 thing ? I've got my dev bridge running and no hangs but it's on rp2040.

len = 65535
This seems an overflow..

@fdlamotte
Collaborator

either overflow, or some return code at -1 on the other side (esp32c3 with espnow for me) to calculate length. And yes this can be related to stm32 (some default behavior not working the same)

I have the tools set up now, I'll continue investigating ...

Strange.. could this be a stm32 thing ? I've got my dev bridge running and no hangs but it's on rp2040.

len = 65535
This seems an overflow..

@fdlamotte
Collaborator

fdlamotte commented Jul 8, 2025

I now discard packets with -1 size and get coherent results

image

but len=-1 packets do come through, and they are not invented by the stm32 ;) This might be specific to espnow.

edit: It's been working consistently on the two sides of the bridge with the fix; next I'll investigate the other side.

@jbrazio
Contributor Author

jbrazio commented Jul 8, 2025

In which line is it breaking ?

spkt.len = pkt->writeTo(spkt.payload);

OR

const uint16_t len = buffer[(head + 2) % size] | (buffer[(head + 3) % size] << 8);

@fdlamotte
Collaborator

In which line is it breaking ?

Packets come with a size of -1

They are transmitted (from the esp side), because spkt.len is a uint and the guard is spkt.len > 0 ;)

Then it breaks at bytes[i] = buffer[(head + 6 + i) % size]; when i becomes too high

In your earlier version this was checked

I think you just have to write : size_t slen = pkt->writeTo(spkt.payload); spkt.len = slen; and then check for slen > 0 and it would be ok on the sender's side ;)
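The failure mode discussed here can be reproduced in isolation: a negative return value stored in an unsigned 16-bit field wraps to 65535 and still passes a `> 0` guard. The sketch below uses hypothetical stand-ins (`SerialPacket`, `writeToStub`, `prepareUnchecked`, `prepareChecked`); the real `writeTo()` signature may differ, and whether it can return a negative value at all is an assumption. A signed intermediate is used so the negative value stays visible to the check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the PR's serial packet structure.
struct SerialPacket {
  uint16_t len = 0;
  uint8_t payload[255] = {};
};

// Hypothetical serializer: returns the number of bytes written, or -1 on failure.
int writeToStub(uint8_t* dest, bool fail) {
  if (fail) return -1;
  dest[0] = 0x0A;
  dest[1] = 0x01;
  return 2;
}

// Problematic pattern: the result goes straight into an unsigned field,
// so -1 becomes 65535 and the "> 0" guard lets it through.
bool prepareUnchecked(SerialPacket& spkt, bool fail) {
  spkt.len = static_cast<uint16_t>(writeToStub(spkt.payload, fail));
  return spkt.len > 0;
}

// Safer pattern: validate the signed result first, then narrow it.
bool prepareChecked(SerialPacket& spkt, bool fail) {
  const int written = writeToStub(spkt.payload, fail);
  if (written <= 0 || written > static_cast<int>(sizeof(spkt.payload))) return false;
  spkt.len = static_cast<uint16_t>(written);
  return true;
}
```

The checked variant also bounds the length against the payload buffer, which is the same sanity check the receiving side needs.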

@fdlamotte
Collaborator

fdlamotte commented Jul 8, 2025

Have some time now to investigate ...

It appears there is no packet with a size of -1 going out from the esp32-c3

but on the stm32 side, I saw packets starting with BEEFFFFF ! If I discard them it's ok ... I'm going back to debugger ;)

image

head at 208

@jbrazio
Contributor Author

jbrazio commented Jul 8, 2025

Try the latest version.
This might be noise on the serial line.. so I've changed magic not to have any "FF" and also make the validation of len more robust.

@fdlamotte
Collaborator

I'll try it ...
In the meantime here is what I've found in read from stm32

image

read can return -1 ... cast to char it will be a 0xFF

@jbrazio
Contributor Author

jbrazio commented Jul 8, 2025

This is indeed a very special case.. how were you getting it so often ?

It needs to have a full packet in the buffer with the right magic, but followed by a broken read (-1 cast to uint8_t right after the magic). Since len was not sanity-checked before the data was copied to the pkt payload, the copy would overflow and overwrite memory, leading to a crash.
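A length check of the kind described here could look like the sketch below. The byte offsets follow the lines quoted earlier in the thread (length at `head + 2` and `head + 3`, payload starting at `head + 6`); everything else, including the ring size, the payload limit and the function name `extractPayload`, is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kRingSize = 512;    // illustrative ring buffer size
constexpr size_t kHeaderSize = 6;    // payload starts at head + 6, as in the quoted line
constexpr size_t kMaxPayload = 255;  // illustrative upper bound

// Copies the payload of the frame starting at `head` into `out`.
// `available` is the number of unread bytes in the ring, counted from `head`.
// Returns the payload length, or 0 if the frame is rejected or still incomplete.
size_t extractPayload(const uint8_t* ring, size_t head, size_t available,
                      uint8_t* out, size_t outCap) {
  if (available < kHeaderSize) return 0;
  const size_t len = size_t(ring[(head + 2) % kRingSize]) |
                     (size_t(ring[(head + 3) % kRingSize]) << 8);
  // Reject impossible lengths before touching the destination buffer.
  if (len == 0 || len > kMaxPayload || len > outCap) return 0;
  if (available < kHeaderSize + len) return 0;
  for (size_t i = 0; i < len; i++) {
    out[i] = ring[(head + kHeaderSize + i) % kRingSize];
  }
  return len;
}
```

A fuller version would distinguish "rejected" from "not enough data yet" so the caller knows whether to skip the magic bytes or simply wait for more input.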

@fdlamotte
Collaborator

fdlamotte commented Jul 8, 2025

This is indeed a very special case.. how were you getting it so often ?

Yes, that's why I'm not really sure ...

But in the end the last version works well with no other mitigation! To be more cautious, you could read the serial port into an int and check for a negative value (and discard in this case) ... but this does not seem to happen any more
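The extra precaution mentioned here (keeping the result of `read()` in an int and discarding negative values) might be sketched like this. Arduino-style streams return the next byte as 0..255, or -1 when nothing is available; `FakeStream` and `readAvailable` are hypothetical names used so the example can run without hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for an Arduino-style stream whose read() returns
// the next byte (0..255) or -1 when no data is available.
class FakeStream {
 public:
  void push(int value) { values_.push_back(value); }
  int read() {
    if (values_.empty()) return -1;
    const int v = values_.front();
    values_.pop_front();
    return v;
  }

 private:
  std::deque<int> values_;
};

// Reads up to `cap` bytes and stops at the first failed read.
size_t readAvailable(FakeStream& s, uint8_t* dest, size_t cap) {
  size_t n = 0;
  while (n < cap) {
    const int c = s.read();  // keep the int: -1 must stay distinguishable from 0xFF
    if (c < 0) break;        // failed read: never store it as a data byte
    dest[n++] = static_cast<uint8_t>(c);
  }
  return n;
}
```

Because the comparison happens before narrowing to `uint8_t`, a genuine 0xFF data byte is still accepted while a failed read is not.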

edit: Won't do more for today ... but it's been very cool ;)

#endif
} else {
pkt->readFrom(bytes, len);
_mgr->queueInbound(pkt, millis());
Collaborator


instead of millis(), should be futureMillis(0)

@@ -292,6 +315,22 @@ class MyMesh : public mesh::Mesh, public CommonCLICallbacks {
}
}
void logTx(mesh::Packet* pkt, int len) override {
Collaborator


Pretty sure this should be logRx()

You want to send received packets over to the other side of the bridge.
If you send transmitted packets, you will also be sending back packets that came FROM the other side of bridge.

Contributor Author


If we use logRx() instead of logTx(), then packets originating ON the bridge itself will not be sent over the bridge (adverts etc.). That makes it impossible to manage bridge A from bridge B. Using logTx() we get duplicated packets on the serial line, but the hashing mechanism will prevent them from being transmitted over RF.

Collaborator


exactly what I've noticed ... and why I used logtx yesterday

@fdlamotte
Collaborator

fdlamotte commented Jul 9, 2025

valid point, but there is a lot more to do when sending from logRx()

I won't be able to tackle this today, but here is a log my bridge was still up this morning ;) :

edit: smaller log ;) just doing clock sync and clock from espnow side towards esp32 repeater (that is bridged)

10:54:33 - 15/5/2024 U: BRIDGE: Write to serial len=127 crc=0xf24c
10:56:14 - 15/5/2024 U: BRIDGE: Read from serial len=22 crc=0x9cd2 valid=true
07:35:56 - 9/7/2025 U: TX, len=55 (type=2, route=D, payload_len=52) [B2 -> 93]
07:35:57 - 9/7/2025 U: BRIDGE: Write to serial len=55 crc=0x261d
07:35:57 - 9/7/2025 U: BRIDGE: Read from serial len=54 crc=0x6b45 valid=true
07:36:01 - 9/7/2025 U: BRIDGE: Read from serial len=22 crc=0x416b valid=true
07:36:02 - 9/7/2025 U: TX, len=39 (type=2, route=D, payload_len=36) [B2 -> 93]
07:36:03 - 9/7/2025 U: BRIDGE: Write to serial len=39 crc=0x646b
07:36:03 - 9/7/2025 U: BRIDGE: Read from serial len=38 crc=0x2793 valid=true
07:36:13 - 9/7/2025 U: TX, len=127 (type=4, route=D, payload_len=125)
07:36:14 - 9/7/2025 U: BRIDGE: Write to serial len=127 crc=0x67b

the ones at 7:36:03 (RX and TX) are indeed suspects of the kind of behavior pointed out by scott

@fdlamotte
Collaborator

A better one:

  • 93 is an espnow node
  • D6 is repeater on espnow side
  • B2 is repeater on lora side (bridged with D6)
  • did a clock from 93 to B2

Log from D6

07:47:59 - 9/7/2025 U RAW: 0A01D6B2930BFF8BF633F26908FFCBB5281A80B6309F27
07:47:59 - 9/7/2025 U: RX, len=23 (type=2, route=D, payload_len=20) SNR=0 RSSI=0 score=0 hash=4BAB4BC228FCACFF [93 -> B2]
07:47:59 - 9/7/2025 U: TX, len=22 (type=2, route=D, payload_len=20) [93 -> B2]
07:47:59 - 9/7/2025 U: BRIDGE: Write to serial len=22 crc=0x0d67
07:48:00 - 9/7/2025 U: BRIDGE: Read from serial len=39 crc=0x7c08 valid=true
07:48:00 - 9/7/2025 U: TX, len=38 (type=2, route=D, payload_len=36) [B2 -> 93]
07:48:00 - 9/7/2025 U: BRIDGE: Write to serial len=38 crc=0x3f30

Log from B2

07:47:47 - 9/7/2025 U: BRIDGE: Read from serial len=22 crc=0x0d67 valid=true
07:47:48 - 9/7/2025 U: TX, len=39 (type=2, route=D, payload_len=36) [B2 -> 93]
07:47:49 - 9/7/2025 U: BRIDGE: Write to serial len=39 crc=0x7c08
07:47:49 - 9/7/2025 U: BRIDGE: Read from serial len=38 crc=0x3f30 valid=true

@jbrazio
Contributor Author

jbrazio commented Jul 9, 2025

@fdlamotte: #454 (comment)

@fdlamotte
Collaborator

fdlamotte commented Jul 9, 2025

will test this afternoon (edit: I get the same traces ;) )

do you think we could have an abstract Bridge (MeshBridge ?) class and one implementation as SerialBridge ? this would be lighter than your first implementation, but also pluggable ;)

@glen-gibson

do you think we could have an abstract Bridge (MeshBridge ?) class and one implementation as SerialBridge ? this would be lighter than your first implementation, but also pluggable ;)

I like this idea. Opens the door to other bridge implementations in a standardised way.

@ripplebiz
Collaborator

valid point, but there is a lot more to do when sending from logRx()

I won't be able to tackle this today, but here is a log my bridge was still up this morning ;) :

edit: smaller log ;) just doing clock sync and clock from espnow side towards esp32 repeater (that is bridged)

10:54:33 - 15/5/2024 U: BRIDGE: Write to serial len=127 crc=0xf24c
10:56:14 - 15/5/2024 U: BRIDGE: Read from serial len=22 crc=0x9cd2 valid=true
07:35:56 - 9/7/2025 U: TX, len=55 (type=2, route=D, payload_len=52) [B2 -> 93]
07:35:57 - 9/7/2025 U: BRIDGE: Write to serial len=55 crc=0x261d
07:35:57 - 9/7/2025 U: BRIDGE: Read from serial len=54 crc=0x6b45 valid=true
07:36:01 - 9/7/2025 U: BRIDGE: Read from serial len=22 crc=0x416b valid=true
07:36:02 - 9/7/2025 U: TX, len=39 (type=2, route=D, payload_len=36) [B2 -> 93]
07:36:03 - 9/7/2025 U: BRIDGE: Write to serial len=39 crc=0x646b
07:36:03 - 9/7/2025 U: BRIDGE: Read from serial len=38 crc=0x2793 valid=true
07:36:13 - 9/7/2025 U: TX, len=127 (type=4, route=D, payload_len=125)
07:36:14 - 9/7/2025 U: BRIDGE: Write to serial len=127 crc=0x67b

the ones at 7:36:03 (RX and TX) are indeed suspects of the kind of behavior pointed out by scott

For the problem of reflecting back packets that were inserted (with queueInbound()) when hooking into logTx(), you could have your own SimpleMeshTables instance and call hasSeen() before queueInbound(). Then, in logTx(), have an if (!hasSeen()) ... write to serial bridge ....
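The suggestion above could be sketched as follows. `SeenTable` and `BridgeFilter` are hypothetical stand-ins: it is assumed here that `hasSeen()` records a packet on first sight and returns false, then returns true for later sightings, and packets are reduced to a 64-bit hash to keep the example small. The real table in the codebase is likely fixed-size rather than an unbounded set.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a seen-packets table. Assumption: hasSeen()
// records the hash on first sight (returning false) and returns true afterwards.
class SeenTable {
 public:
  bool hasSeen(uint64_t packetHash) { return !seen_.insert(packetHash).second; }

 private:
  std::unordered_set<uint64_t> seen_;
};

class BridgeFilter {
 public:
  // Called for a packet that arrived over the serial link, before it is
  // handed to the mesh. Marking it as seen stops it being reflected later.
  void onBridgeRx(uint64_t hash) {
    seen_.hasSeen(hash);
    inbound.push_back(hash);  // stands in for queueInbound()
  }

  // Called from the transmit hook. Only packets the bridge has not already
  // seen are written to the serial link.
  void onMeshTx(uint64_t hash) {
    if (!seen_.hasSeen(hash)) {
      serialOut.push_back(hash);  // stands in for the serial write
    }
  }

  std::vector<uint64_t> inbound;
  std::vector<uint64_t> serialOut;

 private:
  SeenTable seen_;
};
```

In this arrangement a packet received from the other side is forwarded to RF but never echoed back over serial, while locally originated packets (such as adverts) still cross the bridge once.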

Like @fdlamotte mentioned, would be good to have a separate SerialBridge class that inherits from some AbstractBridge class (prob only needs two abstract methods), and this class could have a SimpleMeshTables instance as a member.
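One possible shape for such an abstraction is sketched below. The interface and its two method names (`sendPacket`, `receivePacket`) are purely illustrative, not an agreed design, and `LoopbackBridge` is an in-memory stand-in used in place of a real SerialBridge so the example runs without hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical interface with two abstract methods; names are illustrative.
class AbstractBridge {
 public:
  virtual ~AbstractBridge() = default;
  virtual void sendPacket(const Bytes& raw) = 0;  // mesh -> link
  virtual bool receivePacket(Bytes& raw) = 0;     // link -> mesh, false if none pending
};

// In-memory implementation standing in for a serial-backed bridge.
class LoopbackBridge : public AbstractBridge {
 public:
  void connect(LoopbackBridge* peer) { peer_ = peer; }

  void sendPacket(const Bytes& raw) override {
    if (peer_ != nullptr) peer_->pending_.push_back(raw);
  }

  bool receivePacket(Bytes& raw) override {
    if (pending_.empty()) return false;
    raw = pending_.front();
    pending_.erase(pending_.begin());
    return true;
  }

 private:
  LoopbackBridge* peer_ = nullptr;
  std::vector<Bytes> pending_;
};
```

The repeater code would then depend only on `AbstractBridge`, which leaves room for other transports to be plugged in later.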

@fdlamotte
Collaborator

I like this idea. Opens the door to other bridge implementations in a standardised way.

I'm also thinking about having several bridges on the same repeater ! to have some sort of hub ;)

@jbrazio
Contributor Author

jbrazio commented Jul 10, 2025

Okay this goes a step back into the direction I had it initially which is fine.

Here we are a bit limited by the number of serial ports on the devices, usually 0 is the root, 1 is taken mostly by gps and 2 is here being used for bridging.

To think about a "hub" it would be better implemented using a real network stack. I've been really cautious to use a physical serial connection and not network.. just because I don't want to be the one to blame to bring MqTT to Meshcore. 🥸

@fdlamotte
Collaborator

fdlamotte commented Jul 10, 2025

Here we are a bit limited by the number of serial ports on the devices, usually 0 is the root, 1 is taken mostly by gps and 2 is here being used for bridging.

I was thinking about custom builds ... For me it's one strong point of MC: hw/sw bricks that you can assemble as you wish. Your bridge will fit well in that philosophy.

@fdlamotte
Collaborator

... just because I don't want to be the one to blame to bring MqTT to Meshcore. 🥸

well, messaging over mqtt would be bad (and I agree we should not bridge over the internet in general) ... but I'm sure there are valid uses for mqtt, to monitor the network for instance (just collecting adverts and neighbours could give useful information)

@jbrazio
Contributor Author

jbrazio commented Jul 10, 2025

... just because I don't want to be the one to blame to bring MqTT to Meshcore. 🥸

well, messaging over mqtt would be bad (and I agree we should not bridge over the internet in general) ... but I'm sure there are valid uses for mqtt, to monitor the network for instance (just collecting adverts and neighbours could give useful information)

You already built the right tool for the job ! Your python API, I'm using it to map the network remotely.

@fdlamotte
Collaborator

You already built the right tool for the job ! Your python API, I'm using it to map the network remotely.

I don't know where my mind was going ... I've been using MT again after 6 months (because there is some activity in the region) and you can already see the bad effects it has on me!

No MQTT inside nodes for sure ;)

@jbrazio jbrazio force-pushed the jbrazio/2025_3f11ad35 branch from c533821 to 04042e3 Compare July 28, 2025 23:29
@samuk

samuk commented Aug 8, 2025

Would it be possible to repurpose LoraWAN gateway hardware for this?

E.g. the LR1302. It's relatively affordable at $38.

Could potentially be paired to an ESP32S3 with a Pi header such as Stackypi or similar.

5 participants