
🐛 BUG: clients can communicate with lighthouse but not with each other #1354

Closed
methaqualon opened this issue Mar 16, 2025 · 5 comments

@methaqualon

What version of nebula are you using? (nebula -version)

1.6.1

What operating system are you using?

Debian 12

Describe the Bug

Hello! First of all, thank you very much for a serious product.
Secondly, there is a small problem, and there is reason to believe it may be a bug.

Given:
3 machines, all on Debian 12:
1 - laptop behind NAT. 10.250.0.3
2 - lighthouse server, with a public (static) external IP. 10.250.0.1
3 - "worker" server, with a public (static) external IP. 10.250.0.10
All have their time synchronized via NTP and are in the same time zone. 2 and 3 are in the same data center.

Everything works fine in these directions:
laptop <-> lighthouse
lighthouse <-> worker
Everything can be pinged in these directions, and curl to nginx works without any problems.
But laptop <-> worker does not work at all.

On the laptop, the debug output even shows the worker's real external IP, learned from the lighthouse, but there is no handshake.
This is very strange, considering that the config files are the same, with the exception of the lighthouse, of course.
I would really appreciate some help.
I have changed the "punchy" settings, the advertised addresses, and almost everything else that could be related to this.

They should easily communicate with each other, shouldn't they?

A reminder: it cannot be the firewall (I switched it off completely on every machine after 1.5 days of investigating, with no change), and 1 and 3 can easily communicate with 2.
I hope it is just a mistake in my configs.

Thank you so much for your attention!

Logs from affected hosts

This is the log of my laptop trying to reach the worker server.
It seems like the lighthouse does not know what to do with the packets it needs to forward to worker 3, and drops them.
I can record logs on the lighthouse or the worker if needed.

DEBU[9212] Tunnel status                                 tunnelCheck="map[method:passive state:alive]" vpnIp=10.250.0.1
DEBU[9213] Generated index                               index=1929147183
DEBU[9213] Packet store                                  length=1 stored=true vpnIp=10.250.0.10
INFO[9213] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1929147183 udpAddrs="[WORKERSERVERIP:54358 10.42.1.0:44125 10.42.1.1:44125 192.168.0.106:44125]" vpnIp=10.250.0.10
INFO[9213] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1929147183 udpAddrs="[WORKERSERVERIP:54358 10.42.1.0:44125 10.42.1.1:44125 192.168.0.106:44125]" vpnIp=10.250.0.10
INFO[9214] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1929147183 udpAddrs="[WORKERSERVERIP:54358 10.42.1.0:44125 10.42.1.1:44125 192.168.0.106:44125]" vpnIp=10.250.0.10
DEBU[9214] Packet store                                  length=2 stored=true vpnIp=10.250.0.10
DEBU[9214] Tunnel status                                 tunnelCheck="map[method:passive state:alive]" vpnIp=10.250.0.1
INFO[9214] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1929147183 udpAddrs="[WORKERSERVERIP:54358 10.42.1.0:44125 10.42.1.1:44125 192.168.0.106:44125]" vpnIp=10.250.0.10
DEBU[9215] Tunnel status                                 certName=notNecessary tunnelCheck="map[method:active state:testing]" vpnIp=10.250.0.1
DEBU[9215] Packet store                                  length=3 stored=true vpnIp=10.250.0.10
INFO[9215] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1929147183 udpAddrs="[WORKERSERVERIP:54358 10.42.1.0:44125 10.42.1.1:44125 192.168.0.106:44125]" vpnIp=10.250.0.10
DEBU[9216] Tunnel status                                 tunnelCheck="map[method:passive state:alive]" vpnIp=10.250.0.1
DEBU[9216] Packet store                                  length=4 stored=true vpnIp=10.250.0.10
INFO[9216] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1929147183 udpAddrs="[WORKERSERVERIP:54358 10.42.1.0:44125 10.42.1.1:44125 192.168.0.106:44125]" vpnIp=10.250.0.10
DEBU[9217] Packet store                                  length=5 stored=true vpnIp=10.250.0.10
INFO[9217] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1929147183 udpAddrs="[WORKERSERVERIP:54358 10.42.1.0:44125 10.42.1.1:44125 192.168.0.106:44125]" vpnIp=10.250.0.10

Config files from affected hosts

lighthouse:

pki:
  ca: /nebula-conf/ca.crt
  cert: /nebula-conf/nebula.crt
  key: /nebula-conf/nebula.key

lighthouse:
  am_lighthouse: true

listen:
  host: 0.0.0.0
  port: 4443

punchy:
  punch: true

tun:
  dev: nebula-server
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 500
  mtu: 1300
  routes:
  unsafe_routes:

firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    # Allow all outbound traffic from this node
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: any
      host: any

laptop:

pki:
  ca: ../../ca.crt
  cert: operator-node.crt
  key: operator-node.key

static_host_map:
  "10.250.0.1": ["LIGHTHOUSEIP:PORT"]

lighthouse:
  hosts:
    - "10.250.0.1"

punchy:
  punch: false

  advertise_addrs:
    - "10.250.0.1"

tun:
  dev: nebula-client
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 500
  mtu: 1300
  routes:
  unsafe_routes:

firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    # Allow all outbound traffic from this node
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: any
      host: any

logging:
  level: debug
  format: text
  output: stderr

worker server:

pki:
  ca: nebulaca.crt
  cert: config.crt
  key: config.key

static_host_map:
  "10.250.0.1": ["LIGHTHOUSEIP:PORT"]

lighthouse:
  hosts:
    - "10.250.0.1"

punchy:
  punch: true

  advertise_addrs:
    - "10.250.0.10"

tun:
  dev: nebula-server
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 500
  mtu: 1300
  routes:
  unsafe_routes:

firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    # Allow all outbound traffic from this node
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: any
      host: any
@methaqualon
Author

I can also mention that the latest iOS app cannot reach the worker either,
but it connects to the lighthouse easily.
Unfortunately, that app has no logs (screenshots would just be funny),
so I hope you will trust me =))

@methaqualon
Author

methaqualon commented Mar 16, 2025

Rechecked on the latest 1.9.5: same issue.

UPD:
It only works with a relay configured on the lighthouse. But it should not be like that: the worker also has a static public IP address!

UPD2:
It also works if I put the worker into the static_host_map on the laptop and configure the worker to listen on a specific port.
Shouldn't it just take the IP and port from the lighthouse? The IP appears in the logs on the laptop!
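The workaround described in UPD2 can be sketched roughly as follows. This is a hypothetical fragment, not my exact config: WORKERSERVERIP is a placeholder for the worker's public address (as in the logs above), and 4242 is an assumed fixed port, chosen for illustration.

```yaml
# laptop config: pin the worker's public address alongside the lighthouse
static_host_map:
  "10.250.0.1": ["LIGHTHOUSEIP:PORT"]
  "10.250.0.10": ["WORKERSERVERIP:4242"]

# worker config: listen on a fixed, known port instead of an ephemeral one
listen:
  host: 0.0.0.0
  port: 4242
```

With this in place the laptop no longer depends on learning the worker's address from the lighthouse, which is why the tunnel comes up even though automatic discovery fails.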

@methaqualon
Author

Summary:
Without static_host_map, my NATed laptop can't connect to the server nodes despite getting the correct public IP from the lighthouse (it shows up fine in the logs). It connects instantly if I write the IP into static_host_map. This suggests a bug or flaw in the automatic NAT traversal, or in how it decides what to do.

Way to replicate the error:
Just rent two VPSes, sit in a cafe on WiFi, run Nebula on 3 machines, even with the default example config (customized for the lighthouse and clients, of course), and see.

Unfortunately, it is too inconvenient to add lines to the config file of each small server; my network may grow to 1000 servers, and it would be great if Nebula could resolve their public IPs itself.
I am still hoping that this is my misunderstanding of something in the configs, but I have run out of ideas for what to do about it.

@pri11er

pri11er commented Mar 19, 2025

Just another user here. What I see is incorrect use of "advertise_addrs". It is for advertising listening IPs, not VPN overlay IP addresses.

In the case of the laptop, you have it advertising the Nebula VPN IP of the lighthouse. Remove that.

On the server, you are advertising a VPN IP. That is not the correct use of "advertise_addrs".

Why is punch: false on the laptop?

@methaqualon
Author

> Just another user here. What I see is incorrect use of "advertise_addrs". It is for advertising listening IPs, not VPN overlay IP addresses.
>
> In the case of the laptop, you have it advertising the Nebula VPN IP of the lighthouse. Remove that.
>
> On the server, you are advertising a VPN IP. That is not the correct use of "advertise_addrs".
>
> Why is punch: false on the laptop?

Thanks a lot! I was really confused by the documentation and logs (I kept wondering why Nebula shouldn't just use the ip:port too, since I can see it in the terminal), and your comment was exactly what I needed. You saved me a few more days.
punchy was false by accident; while pasting configs in, I flipped it many times and it made no difference.
Now I have set advertise_addrs on the worker node to its own public IP, deleted it from the laptop, and everything works like a charm.

Thanks!

> advertise_addrs are routable addresses that will be included along with discovered addresses to report in the beacon. The format is "ip:port". port can be 0, in which case the actual listening port will be used instead, useful if listen.port is set to 0.
> This option is mostly useful when there are static IP addresses that the host can be reached on, which Nebula normally can't discover on its own. Examples are port forwarding or multiple paths to the internet.
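Putting pri11er's correction together with that documentation, the fixed worker fragment looks roughly like this. This is a hedged sketch, not a verbatim copy of the working config: WORKER_PUBLIC_IP is a placeholder for the worker's real public address, and the placement under the lighthouse: section follows Nebula's example config rather than the punchy: section where I originally had it.

```yaml
# worker config: advertise the underlay (public) address, not the Nebula
# overlay IP (10.250.0.10), so other nodes learn a reachable ip:port
lighthouse:
  hosts:
    - "10.250.0.1"
  advertise_addrs:
    - "WORKER_PUBLIC_IP:0"   # port 0 means: use the actual listening port
```

The laptop keeps no advertise_addrs at all, since it sits behind NAT and has nothing static to advertise.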

It just didn't seem too obvious what to do.
