
🐛 BUG: clients can communicate with lighthouse but not with each other #1354

Closed
methaqualon opened this issue Mar 16, 2025 · 5 comments

@methaqualon

What version of nebula are you using? (nebula -version)

1.6.1

What operating system are you using?

Debian 12

Describe the Bug

Hello! First of all, thank you very much for a serious product.
Secondly, there is a small problem, and there is reason to believe it may be a bug.

Given:
3 machines, all on Debian 12:
1 - laptop behind NAT. 10.250.0.3
2 - lighthouse server, with a public (static) external IP. 10.250.0.1
3 - "worker" server, with a public (static) external IP. 10.250.0.10
All have their time synchronized via NTP and are in the same time zone. 2 and 3 are in the same data center.

Everything works fine in these directions:
laptop <-> lighthouse
lighthouse <-> worker
Everything can be pinged in these directions, and curl to nginx works without any problems.
But laptop <-> worker does not work at all.

On the laptop, the debug output even shows the worker's real external IP, learned from the lighthouse, but there is no handshake.
This is very strange, considering that the config files are the same, with the exception of the lighthouse, of course.
I would really appreciate some help.
I have changed the "punchy" settings, the advertised addresses, and almost everything else that could be related to this.

They should easily communicate with each other, shouldn't they?

A reminder: it cannot be the firewall (I switched it off completely on every machine after 1.5 days of investigating, with no change), and 1 and 3 can easily communicate with 2.
I hope it is just a mistake in my configs.

Thank you so much for your attention!

Logs from affected hosts

This is the log of my laptop trying to reach the worker server.
It seems like the lighthouse does not know what to do with the packets it needs to forward to worker 3, and drops them.
I can record logs on the lighthouse or the worker if needed.

DEBU[9212] Tunnel status                                 tunnelCheck="map[method:passive state:alive]" vpnIp=10.250.0.1
DEBU[9213] Generated index                               index=1929147183
DEBU[9213] Packet store                                  length=1 stored=true vpnIp=10.250.0.10
INFO[9213] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1929147183 udpAddrs="[WORKERSERVERIP:54358 10.42.1.0:44125 10.42.1.1:44125 192.168.0.106:44125]" vpnIp=10.250.0.10
INFO[9213] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1929147183 udpAddrs="[WORKERSERVERIP:54358 10.42.1.0:44125 10.42.1.1:44125 192.168.0.106:44125]" vpnIp=10.250.0.10
INFO[9214] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1929147183 udpAddrs="[WORKERSERVERIP:54358 10.42.1.0:44125 10.42.1.1:44125 192.168.0.106:44125]" vpnIp=10.250.0.10
DEBU[9214] Packet store                                  length=2 stored=true vpnIp=10.250.0.10
DEBU[9214] Tunnel status                                 tunnelCheck="map[method:passive state:alive]" vpnIp=10.250.0.1
INFO[9214] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1929147183 udpAddrs="[WORKERSERVERIP:54358 10.42.1.0:44125 10.42.1.1:44125 192.168.0.106:44125]" vpnIp=10.250.0.10
DEBU[9215] Tunnel status                                 certName=notNecessary tunnelCheck="map[method:active state:testing]" vpnIp=10.250.0.1
DEBU[9215] Packet store                                  length=3 stored=true vpnIp=10.250.0.10
INFO[9215] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1929147183 udpAddrs="[WORKERSERVERIP:54358 10.42.1.0:44125 10.42.1.1:44125 192.168.0.106:44125]" vpnIp=10.250.0.10
DEBU[9216] Tunnel status                                 tunnelCheck="map[method:passive state:alive]" vpnIp=10.250.0.1
DEBU[9216] Packet store                                  length=4 stored=true vpnIp=10.250.0.10
INFO[9216] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1929147183 udpAddrs="[WORKERSERVERIP:54358 10.42.1.0:44125 10.42.1.1:44125 192.168.0.106:44125]" vpnIp=10.250.0.10
DEBU[9217] Packet store                                  length=5 stored=true vpnIp=10.250.0.10
INFO[9217] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1929147183 udpAddrs="[WORKERSERVERIP:54358 10.42.1.0:44125 10.42.1.1:44125 192.168.0.106:44125]" vpnIp=10.250.0.10

Config files from affected hosts

lighthouse:

pki:
  ca: /nebula-conf/ca.crt
  cert: /nebula-conf/nebula.crt
  key: /nebula-conf/nebula.key

lighthouse:
  am_lighthouse: true

listen:
  host: 0.0.0.0
  port: 4443

punchy:
  punch: true

tun:
  dev: nebula-server
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 500
  mtu: 1300
  routes:
  unsafe_routes:

firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    # Allow all outbound traffic from this node
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: any
      host: any

laptop:

pki:
  ca: ../../ca.crt
  cert: operator-node.crt
  key: operator-node.key

static_host_map:
  "10.250.0.1": ["LIGHTHOUSEIP:PORT"]

lighthouse:
  hosts:
    - "10.250.0.1"

punchy:
  punch: false

  advertise_addrs:
    - "10.250.0.1"

tun:
  dev: nebula-client
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 500
  mtu: 1300
  routes:
  unsafe_routes:

firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    # Allow all outbound traffic from this node
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: any
      host: any

logging:
  level: debug
  format: text
  output: stderr

worker server:

pki:
  ca: nebulaca.crt
  cert: config.crt
  key: config.key

static_host_map:
  "10.250.0.1": ["LIGHTHOUSEIP:PORT"]

lighthouse:
  hosts:
    - "10.250.0.1"

punchy:
  punch: true

  advertise_addrs:
    - "10.250.0.10"

tun:
  dev: nebula-server
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 500
  mtu: 1300
  routes:
  unsafe_routes:

firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    # Allow all outbound traffic from this node
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: any
      host: any
@methaqualon
Author

I can also mention that the latest iOS app cannot reach the worker either,
but it connects to the lighthouse easily.
Unfortunately, that app has no logs (screenshots would just be funny),
so I hope you will trust me =))

@methaqualon
Author

methaqualon commented Mar 16, 2025

Rechecked on the latest 1.9.5: same issue.

UPD:
It only works with a relay configured on the lighthouse. But it should not be like that: the worker also has a static public IP address!

UPD2:
It also works if I put the worker into the static_host_map on the laptop and configure the worker to listen on a specific port.
Shouldn't it just take the IP and port from the lighthouse? The IP appears in the logs on the laptop!
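The workaround described in UPD2 can be sketched roughly as follows. This is a hypothetical fragment, not my exact config: WORKERSERVERIP is a placeholder for the worker's public address (as in the logs above), and 4242 is an assumed fixed port, chosen for illustration.

```yaml
# laptop config: pin the worker's public address alongside the lighthouse
static_host_map:
  "10.250.0.1": ["LIGHTHOUSEIP:PORT"]
  "10.250.0.10": ["WORKERSERVERIP:4242"]

# worker config: listen on a fixed, known port instead of an ephemeral one
listen:
  host: 0.0.0.0
  port: 4242
```

With this in place the laptop no longer depends on learning the worker's address from the lighthouse, which is why the tunnel comes up even though automatic discovery fails.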

@methaqualon
Author

Summary:
Without static_host_map, my NATed laptop can't connect to the server nodes despite getting the correct public IP from the lighthouse (it shows up fine in the logs). It connects instantly if I write the IP into static_host_map. This suggests a bug or flaw in the automatic NAT traversal, or in how it decides what to do.

Way to replicate the error:
Just rent two VPSes, sit in a cafe on WiFi, run Nebula on 3 machines, even with the default example config (customized for the lighthouse and clients, of course), and see.

Unfortunately, it is too inconvenient to add lines to the config file of each small server; my network may grow to 1000 servers, and it would be great if Nebula could resolve their public IPs itself.
I am still hoping that this is my misunderstanding of something in the configs, but I have run out of ideas for what to do about it.

@pri11er

pri11er commented Mar 19, 2025

Just another user here. What I see is incorrect use of "advertise_addrs". It is for advertising listening IPs, not VPN overlay IP addresses.

In the case of the laptop, you have it advertising the Nebula VPN IP of the lighthouse. Remove that.

On the server, you are advertising a VPN IP. That is not the correct use of "advertise_addrs".

Why is punch: false on the laptop?

@methaqualon
Author

> Just another user here. What I see is incorrect use of "advertise_addrs". It is for advertising listening IPs, not VPN overlay IP addresses.
>
> In the case of the laptop, you have it advertising the Nebula VPN IP of the lighthouse. Remove that.
>
> On the server, you are advertising a VPN IP. That is not the correct use of "advertise_addrs".
>
> Why is punch: false on the laptop?

Thanks a lot! I was really confused by the documentation and logs (I kept wondering why Nebula shouldn't just use the ip:port too, since I can see it in the terminal), and your comment was exactly what I needed. You saved me a few more days.
punchy was false by accident; while pasting configs in, I flipped it many times and it made no difference.
Now I have set advertise_addrs on the worker node to its own public IP, deleted it from the laptop, and everything works like a charm.

Thanks!

> advertise_addrs are routable addresses that will be included along with discovered addresses to report in the beacon. The format is "ip:port". port can be 0, in which case the actual listening port will be used instead, useful if listen.port is set to 0.
> This option is mostly useful when there are static IP addresses that the host can be reached on, which Nebula normally can't discover on its own. Examples are port forwarding or multiple paths to the internet.
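Putting pri11er's correction together with that documentation, the fixed worker fragment looks roughly like this. This is a hedged sketch, not a verbatim copy of the working config: WORKER_PUBLIC_IP is a placeholder for the worker's real public address, and the placement under the lighthouse: section follows Nebula's example config rather than the punchy: section where I originally had it.

```yaml
# worker config: advertise the underlay (public) address, not the Nebula
# overlay IP (10.250.0.10), so other nodes learn a reachable ip:port
lighthouse:
  hosts:
    - "10.250.0.1"
  advertise_addrs:
    - "WORKER_PUBLIC_IP:0"   # port 0 means: use the actual listening port
```

The laptop keeps no advertise_addrs at all, since it sits behind NAT and has nothing static to advertise.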

It just didn't seem too obvious what to do.
