dnsmasq: strict-order functionality not working as expected #8416

Open
2 tasks done
gspannu opened this issue Mar 8, 2025 · 13 comments
Labels
support Community support

Comments

@gspannu

gspannu commented Mar 8, 2025

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

dnsmasq in OPNsense 25.1.1/25.1.2 appears to have a bug: the strict-order functionality does not work as expected.

OPNsense respects strict-order and forwards the DNS request to the first listed server (rather than to all servers in parallel). However, if the first server does not respond, dnsmasq does not forward the request to the next server; it waits for a long time and then the query times out.

To Reproduce

Steps to reproduce the behavior:

  • I have BlockyDNS running on port 53035.
  • I have a custom dnsmasq configuration file (0-custom.conf) with the following entries.
    The server list on the GUI page is blank.
server=192.168.1.1#53035
server=1.1.1.1
add-mac
add-subnet=32,128

If I enable the strict-order option in the GUI, my requests are only forwarded to 192.168.1.1:53035 and all works as expected.

However, if I disable my ‘Blocky DNS Instance’ running on port 53035, then the query just times out after a long time, rather than being forwarded to 1.1.1.1 as expected.

Expected behavior

dnsmasq to forward the query to the next listed dnsserver (1.1.1.1) rather than timing out.
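The failover described here can be sketched in a few lines of Python. This is only an illustration of the expected behaviour, not dnsmasq's actual implementation; the names `build_query`, `send_udp` and `resolve_in_order` are made up for this example, and the 5 second timeout is an assumption.

```python
import socket
import struct
from typing import Callable, Optional, Sequence, Tuple

Server = Tuple[str, int]  # (ip, port)


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS A-record query with the recursion-desired flag set."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def send_udp(server: Server, payload: bytes, timeout: float) -> bytes:
    """Send one UDP datagram and wait for a reply; raises OSError on failure."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, server)
        data, _ = sock.recvfrom(4096)
        return data


def resolve_in_order(
    servers: Sequence[Server],
    payload: bytes,
    send: Callable[[Server, bytes, float], bytes] = send_udp,
    timeout: float = 5.0,  # assumed value, chosen for illustration
) -> Optional[Tuple[Server, bytes]]:
    """Try each server strictly in list order and return the first reply."""
    for server in servers:
        try:
            return server, send(server, payload, timeout)
        except OSError:  # socket timeouts are a subclass of OSError
            continue  # no answer: fall through to the next server
    return None  # every server failed
```

With the servers from the report, `resolve_in_order([("192.168.1.1", 53035), ("1.1.1.1", 53)], build_query("example.com"))` would be expected to return the reply from 1.1.1.1 once the first server has timed out.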

Additional context

In addition, since a lot of changes are going into dnsmasq, may I request that these features be added to the dnsmasq GUI:

  • GUI option to specify a port number along with the dns server address (default 53 if unspecified)
  • GUI options for add-mac
  • GUI options for add-subnet

Environment
OPNsense 25.1.2

@AdSchellevis
Member

I kind of expect we can drop the order again, as strict-order doesn't apply to server statements but merely to entries found in resolv.conf, according to the manual page:

By default, dnsmasq will send queries to any of the upstream servers it knows about and tries to favour servers that are known to be up. Setting this flag forces dnsmasq to try each query with each server strictly in the order they appear in /etc/resolv.conf

The option you might be looking for is --all-servers, but this doesn't warrant any ordering:

By default, when dnsmasq has more than one upstream server available, it will send queries to just one server. Setting this flag forces dnsmasq to send all queries to all available servers. The reply from the server which answers first will be returned to the original requester.
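For contrast with strict ordering, the "send to all, first reply wins" behaviour that the manual page describes for --all-servers can be sketched as follows. This is an illustration only, not dnsmasq code; `query_all` and the injected `send` callable are hypothetical names, and the timeout value is an assumption.

```python
import concurrent.futures
from typing import Callable, Optional, Sequence, Tuple

Server = Tuple[str, int]  # (ip, port)
Send = Callable[[Server, bytes, float], bytes]


def query_all(
    servers: Sequence[Server],
    payload: bytes,
    send: Send,
    timeout: float = 5.0,  # assumed value, chosen for illustration
) -> Optional[Tuple[Server, bytes]]:
    """Send the query to every server at once; return the first successful reply."""
    if not servers:
        return None
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(servers))
    futures = {pool.submit(send, s, payload, timeout): s for s in servers}
    try:
        for done in concurrent.futures.as_completed(futures, timeout=timeout):
            if done.exception() is None:
                return futures[done], done.result()
    except concurrent.futures.TimeoutError:
        pass  # nobody answered in time
    finally:
        pool.shutdown(wait=False)  # do not block on the slower servers
    return None
```

Note that nothing in this approach prefers one server over another: whichever answers first is used, which is why it gives no ordering guarantee.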

@gspannu
Author

gspannu commented Mar 8, 2025

dnsmasq does respect ‘strict-order’ for the server statements in the custom conf file, and OPNsense also works this way. The problem occurs when the 1st server is unavailable: the query is supposed to be passed on to the next server in the list, but currently it just times out (after trying the 1st server).

‘all-servers’ is not what I am looking for; all-servers is the default behaviour of dnsmasq. I am specifically looking for ‘strict-order’, as this is the only mechanism to deliver a fail-safe backup.
dnsmasq first tries the 1st listed server and only if it fails does it try the next one, thereby providing full control over which DNS servers to forward the queries to.

Also, as written earlier - OPNsense 24.x was working exactly like this, something changed during the 24.7 cycle and the new behaviour has carried over to 25.x

If dnsmasq was to work as you state, then what feature does ‘strict-order’ provide?
… unless I have misunderstood something basic.

@AdSchellevis
Member

I am merely pointing you to the documentation; if the upstream documentation is wrong, then my comment obviously is too.

@fichtner
Member

fichtner commented Mar 8, 2025

@fichtner fichtner added the support Community support label Mar 8, 2025
@gspannu
Author

gspannu commented Mar 8, 2025

https://forum.opnsense.org/index.php?topic=44966.msg226435#msg226435

Hi @fichtner
Should this issue not be labelled as a bug, rather than support?

Not questioning your judgement (as you guys are a lot smarter than most), just an observation…

@Monviech
Member

Monviech commented Mar 8, 2025

From what I read, it seems like strict-order really only influences resolv.conf.

Yet server directives are also processed somewhat sequentially from the configuration file, so the order can have some influence, though not in any strict way. The details are a little hazy.

@Monviech
Member

@AdSchellevis

From empirical tests it looks like the upstream documentation is wrong:

  • When strict order is off, all servers that forward a specific domain get queried in parallel
  • When strict order is on, the sequence in the configuration file is honored and only the first one gets the first query
Strict order off
Sequence 1, Domain: google.com, Server: 8.8.8.8
Sequence 2, Domain: google.com, Server: 8.8.4.4

Test1:

igc1_vlan7_DSL pppoe0 | 2025-03-12 10:32:12.953004 | length 56: 80.151.100.99.6436 > 8.8.8.8.53: UDP, length 24
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:32:12.953025 | length 56: 80.151.100.99.6436 > 8.8.4.4.53: UDP, length 24
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:32:12.965418 | length 72: 8.8.4.4.53 > 80.151.100.99.6436: UDP, length 40
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:32:12.965665 | length 72: 8.8.8.8.53 > 80.151.100.99.6436: UDP, length 40
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:32:12.976444 | length 56: 80.151.100.99.54462 > 8.8.4.4.53: UDP, length 24
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:32:12.989172 | length 128: 8.8.4.4.53 > 80.151.100.99.54462: UDP, length 96

Test2:

igc1_vlan7_DSL pppoe0 | 2025-03-12 10:33:42.260311 | length 58: 80.151.100.99.55913 > 8.8.8.8.53: UDP, length 26
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:33:42.260339 | length 58: 80.151.100.99.55913 > 8.8.4.4.53: UDP, length 26
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:33:42.271929 | length 74: 8.8.4.4.53 > 80.151.100.99.55913: UDP, length 42
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:33:42.272170 | length 74: 8.8.8.8.53 > 80.151.100.99.55913: UDP, length 42
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:33:42.283059 | length 58: 80.151.100.99.4510 > 8.8.4.4.53: UDP, length 26
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:33:42.295461 | length 86: 8.8.4.4.53 > 80.151.100.99.4510: UDP, length 54
Strict order ON
Sequence 1, Domain: google.com, Server: 8.8.8.8
Sequence 2, Domain: google.com, Server: 8.8.4.4

Test1:

igc1_vlan7_DSL pppoe0 | 2025-03-12 10:34:42.500490 | length 61: 80.151.100.99.56961 > 8.8.8.8.53: UDP, length 29
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:34:42.512486 | length 157: 8.8.8.8.53 > 80.151.100.99.56961: UDP, length 125
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:34:42.523878 | length 61: 80.151.100.99.60813 > 8.8.8.8.53: UDP, length 29
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:34:42.535493 | length 229: 8.8.8.8.53 > 80.151.100.99.60813: UDP, length 197

Test2:

igc1_vlan7_DSL pppoe0 | 2025-03-12 10:35:51.435828 | length 58: 80.151.100.99.44103 > 8.8.8.8.53: UDP, length 26
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:35:51.448252 | length 90: 8.8.8.8.53 > 80.151.100.99.44103: UDP, length 58
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:35:51.459323 | length 58: 80.151.100.99.6135 > 8.8.8.8.53: UDP, length 26
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:35:51.471005 | length 131: 8.8.8.8.53 > 80.151.100.99.6135: UDP, length 99
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:35:53.680684 | length 69: 80.151.100.99.32379 > 8.8.8.8.53: UDP, length 37
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:35:53.680740 | length 69: 80.151.100.99.32379 > 8.8.8.8.53: UDP, length 37
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:35:53.693745 | length 109: 8.8.8.8.53 > 80.151.100.99.32379: UDP, length 77
igc1_vlan7_DSL pppoe0 | 2025-03-12 10:35:53.695005 | length 121: 8.8.8.8.53 > 80.151.100.99.32379: UDP, length 89
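To compare captures like these without reading them line by line, a small script can list which upstream servers each outgoing query was sent to. This is a rough sketch that assumes the `length N: src.port > dst.port: UDP, length M` layout shown above; `parse_packet` and `upstreams_by_source_port` are names invented for this example.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional

# Matches e.g. "length 56: 80.151.100.99.6436 > 8.8.8.8.53: UDP, length 24"
PACKET_RE = re.compile(
    r"length \d+: (?P<src>\d+(?:\.\d+){3})\.(?P<sport>\d+) > "
    r"(?P<dst>\d+(?:\.\d+){3})\.(?P<dport>\d+): UDP, length (?P<size>\d+)"
)


class Packet(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int
    size: int


def parse_packet(line: str) -> Optional[Packet]:
    """Extract addresses, ports and UDP payload size from one capture line."""
    m = PACKET_RE.search(line)
    if m is None:
        return None
    return Packet(m["src"], int(m["sport"]), m["dst"], int(m["dport"]), int(m["size"]))


def upstreams_by_source_port(lines: Iterable[str], dns_port: int = 53) -> Dict[int, List[str]]:
    """Group outgoing DNS queries by source port, listing the servers contacted."""
    seen: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        pkt = parse_packet(line)
        if pkt is not None and pkt.dport == dns_port:
            seen[pkt.sport].append(pkt.dst)
    return dict(seen)
```

A source port that maps to more than one server means the same query went to several upstreams; the timestamps are still needed to tell whether that happened in parallel or one after another.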

@AdSchellevis
Member

@Monviech ok, thanks for testing, so we'll leave the sequence in there then.

@gspannu
Author

gspannu commented Mar 12, 2025

From empirical tests it looks like the upstream documentation is wrong:

@Monviech

Would you be kind enough to do an additional test?

strict-order is ON and assign Sequence 1 to some dummy IP address
(some IP address that will definitely not respond to a DNS query).

Example...

Strict order ON
Sequence 1, Domain: dummy.net, Server: 192.168.254.254
Sequence 2, Domain: google.com, Server: 8.8.4.4

This will hopefully show whether the DNS query gets forwarded to the next server (8.8.4.4 in your example) or whether it just times out retrying the first server.

@Monviech
Member

Monviech commented Mar 12, 2025

It looks like strict order works as expected for me. I have configured 3 servers:

Sequence 1, Domain: example.com, Server: 192.168.99.99
Sequence 2, Domain: example.com, Server: 192.168.22.22
Sequence 3, Domain: example.com, Server: 8.8.4.4

This is the result of a new query that was not cached:

igc1_vlan7_DSL pppoe0 | 2025-03-12 15:27:32.900649 | length 60: 80.151.100.99.30194 > 192.168.99.99.53: UDP, length 28
igc1_vlan7_DSL pppoe0 | 2025-03-12 15:27:37.955230 | length 60: 80.151.100.99.30194 > 192.168.22.22.53: UDP, length 28
igc1_vlan7_DSL pppoe0 | 2025-03-12 15:27:42.989161 | length 60: 80.151.100.99.30194 > 8.8.4.4.53: UDP, length 28
igc1_vlan7_DSL pppoe0 | 2025-03-12 15:27:43.001018 | length 124: 8.8.4.4.53 > 80.151.100.99.30194: UDP, length 92
igc1_vlan7_DSL pppoe0 | 2025-03-12 15:27:43.012388 | length 60: 80.151.100.99.40259 > 192.168.99.99.53: UDP, length 28

First try: 192.168.99.99.53 -> Timeout after 5s
Second try: 192.168.22.22 -> Timeout after 5s
Third try: 8.8.4.4 -> Woo response, my client got an answer
(These all happened during one single nslookup, did not have to reinitiate it.)
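The "after 5s" figures can be checked directly from the capture timestamps. The snippet below is a small worked calculation; it assumes the date and time are written with a space between them, and `gaps_seconds` is a name made up for this example.

```python
from datetime import datetime
from typing import List, Sequence


def gaps_seconds(timestamps: Sequence[str]) -> List[float]:
    """Return the time in seconds between each pair of consecutive timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    times = [datetime.strptime(t, fmt) for t in timestamps]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# Send times of the three outgoing queries in the capture above.
sent = [
    "2025-03-12 15:27:32.900649",  # to 192.168.99.99
    "2025-03-12 15:27:37.955230",  # to 192.168.22.22
    "2025-03-12 15:27:42.989161",  # to 8.8.4.4
]
print(gaps_seconds(sent))
```

The two gaps come out at roughly 5.05 s and 5.03 s, so the client waited about 10 seconds in total before the third server answered.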

@Monviech
Member

Monviech commented Mar 12, 2025

Maybe your client does not time out and retry correctly. By default, dnsmasq relies on the client to retry; it does not generate retries itself. Try out this option:

https://thekelleys.org.uk/dnsmasq/docs/dnsmasq-man.html
--fast-dns-retry=[<initial retry delay in ms>[,<time to continue retries in ms>]]

@gspannu
Author

gspannu commented Mar 12, 2025

It looks like strict order works as expected for me. I have configured 3 servers:

Sequence 1, Domain: example.com, Server: 192.168.99.99
Sequence 2, Domain: example.com, Server: 192.168.22.22
Sequence 3, Domain: example.com, Server: 8.8.4.4
….

First try: 192.168.99.99.53 -> Timeout after 5s
Second try: 192.168.22.22 -> Timeout after 5s
Third try: 8.8.4.4 -> Woo response, my client got an answer
(These all happened during one single nslookup, did not have to reinitiate it.)

Thanks for the update. 👍

One last test please…

What happens when you use a non-default port, let's say 53035 instead of the default 53?

Sequence 1, Domain: example.com, Server: 192.168.99.99:53035
Sequence 2, Domain: example.com, Server: 192.168.22.22
Sequence 3, Domain: example.com, Server: 8.8.4.4

My custom-dnsmasq.conf contains the following

server=192.168.1.1#53035
server=1.1.1.1
add-mac
add-subnet=32,128

So, I am wondering whether the behaviour I see has to do with custom ports!

Q1. Does the query still get forwarded to the next in sequence, if you use custom ports?
Q2. Is the IP address (192.168.99.99) non-reachable, or is it that there is nothing running on port 53 (in your example)?


Also, could you look at providing GUI support to enter the (optional) port number in the System > Settings > General > DNS Server field?

Maybe an additional edit box to enter an optional port number, or allow a port number in the DNS server edit box itself.
E.g. 192.168.99.99:53035, 192.168.22.22:53535, 1.1.1.1, 8.8.8.8:53
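As a rough idea of what such a field would have to do, the sketch below converts entries in the `ip:port` form suggested here into the `server=ip#port` lines used in the custom config above. It is purely hypothetical (OPNsense has no such function as far as this thread shows), handles IPv4 only, and `to_dnsmasq_server` is an invented name.

```python
import ipaddress


def to_dnsmasq_server(entry: str) -> str:
    """Convert "ip" or "ip:port" (IPv4 only) into a dnsmasq server= line."""
    host, sep, port = entry.strip().partition(":")
    ipaddress.IPv4Address(host)  # raises ValueError if host is not valid IPv4
    if not sep:
        return f"server={host}"  # no port given: dnsmasq falls back to 53
    port_no = int(port)  # raises ValueError if the port is not a number
    if not 0 < port_no < 65536:
        raise ValueError(f"port out of range: {port_no}")
    return f"server={host}#{port_no}"


field = "192.168.99.99:53035, 192.168.22.22:53535, 1.1.1.1, 8.8.8.8:53"
for item in field.split(","):
    print(to_dnsmasq_server(item))
```

IPv6 addresses would need a different input syntax, since they contain colons themselves.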

@Monviech
Member

Can't you try this test yourself now? I think I have tested enough, and I gave more hints on what to try.

It could be true that all DNS Servers must be exactly the same for this to work correctly.
