Pihole 6 docker / dnsmasq component segfaults during run #2219

Open
reneploetz opened this issue Feb 20, 2025 · 9 comments

@reneploetz

reneploetz commented Feb 20, 2025

Since switching to Pi-hole 6, I have been experiencing repeated crashes that cause container restarts. The debug output below was captured following the steps at https://docs.pi-hole.net/ftldns/gdb/

Versions

  • Core version is v6.0 (Latest: v6.0.1)
  • Web version is v6.0 (Latest: v6.0)
  • FTL version is v6.0 (Latest: v6.0)

FTL commit: eaa7dbb
FTL date: 2025-02-18 17:19:26 +0000

Platform

  • OS and version: Alpine Linux
  • Platform: Docker (pihole/pihole, ImageId 2cfd24401bf6)

Actual behavior / bug

GDB session inside Docker:

gdb -p $(cat /run/pihole-FTL.pid)
GNU gdb (GDB) 14.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 57
[New LWP 58]
[New LWP 59]
[New LWP 60]
[New LWP 61]
[New LWP 62]
[New LWP 63]
[New LWP 64]
[New LWP 65]
__cp_end () at src/thread/x86_64/syscall_cp.s:29

warning: 29     src/thread/x86_64/syscall_cp.s: No such file or directory
(gdb) continue
Continuing.
[Detaching after vfork from child process 208]
[Detaching after vfork from child process 209]
[Detaching after vfork from child process 210]
[Detaching after vfork from child process 211]
[Detaching after vfork from child process 212]
[Detaching after vfork from child process 213]
[Detaching after vfork from child process 214]
[Detaching after vfork from child process 215]
[Detaching after vfork from child process 216]
[Detaching after vfork from child process 217]
[Detaching after vfork from child process 218]
[Detaching after vfork from child process 219]
[Detaching after vfork from child process 220]
[Detaching after vfork from child process 221]
[Detaching after vfork from child process 222]
[Detaching after vfork from child process 223]

Thread 1 "pihole-FTL" received signal SIGSEGV, Segmentation fault.
get_nominal_size (p=0x7f6b44fbbc70 "Ee\211o39.", end=0x7f6b44fbbc7c "") at src/malloc/mallocng/meta.h:169
warning: 169    src/malloc/mallocng/meta.h: No such file or directory
(gdb) backtrace
#0  get_nominal_size (p=0x7f6b44fbbc70 "Ee\211o39.", end=0x7f6b44fbbc7c "") at src/malloc/mallocng/meta.h:169
#1  __libc_free (p=0x7f6b44fbbc70) at src/malloc/mallocng/free.c:110
#2  0x00000000007f4b2c in free (p=<optimized out>) at src/malloc/free.c:5
#3  0x00000000004cf4cf in free_frec (f=0x7f6b44dd85b0) at /app/src/dnsmasq/forward.c:3207
#4  0x00000000004d1de0 in return_reply (now=now@entry=1740083409, forward=<optimized out>, header=header@entry=0x7f6b450f0570, n=n@entry=69, 
    status=<optimized out>, status@entry=524288) at /app/src/dnsmasq/forward.c:1582
#5  0x00000000004d2eac in reply_query (fd=51, now=now@entry=1740083409) at /app/src/dnsmasq/forward.c:1353
#6  0x00000000004bd177 in check_dns_listeners (now=now@entry=1740083409) at /app/src/dnsmasq/dnsmasq.c:1929
#7  0x00000000004bfa50 in main_dnsmasq (argc=<optimized out>, argv=<optimized out>) at /app/src/dnsmasq/dnsmasq.c:1298

@reneploetz changed the title from "Segfault during run" to "Pihole 6 docker / dnsmasq component segfaults during run" on Feb 20, 2025
@DL6ER
Member

DL6ER commented Feb 20, 2025

Thank you for your report and the detailed backtrace. This is a crash deep inside dnsmasq. We'll investigate.

@DL6ER
Member

DL6ER commented Feb 20, 2025

Sorry for the trouble, but "repeated crashes" makes me hopeful that we can fix this soon. To investigate further where this double-free corruption comes from, could you also run the steps at https://docs.pi-hole.net/ftldns/valgrind/ and report the results back?
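For readers less familiar with this class of bug: in both backtraces, musl's mallocng aborts inside get_nominal_size() (via a_crash) while free_frec() is releasing a pointer. musl does this when it detects inconsistent heap metadata, for example after a double free or a buffer overrun. The C sketch below is a hypothetical illustration only, not dnsmasq's actual free_frec() from forward.c; the type struct fake_frec and the functions fake_free_frec() and fake_frec_stash() are invented names. It shows one common defensive pattern: clearing an owned pointer right after freeing it, so that a second release path ends up calling free(NULL), which the C standard defines as a no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a forwarded-query record. The fields are
 * illustrative only and are NOT dnsmasq's real struct frec layout. */
struct fake_frec {
    unsigned char *stash;   /* heap buffer owned by the record */
    size_t stash_len;       /* size of that buffer in bytes */
};

/* Release the record's heap-owned buffer. Resetting the pointer to NULL
 * after free() makes a repeated call harmless, because free(NULL) does
 * nothing, instead of handing the allocator an already-freed block. */
void fake_free_frec(struct fake_frec *f)
{
    free(f->stash);
    f->stash = NULL;
    f->stash_len = 0;
}

/* Give the record a fresh buffer of len bytes, releasing any previous
 * buffer first. Returns 0 on success and -1 if malloc() fails. */
int fake_frec_stash(struct fake_frec *f, size_t len)
{
    fake_free_frec(f);
    f->stash = malloc(len);
    if (f->stash == NULL)
        return -1;
    f->stash_len = len;
    return 0;
}
```

Note that clearing the pointer does not fix a logic error in which two code paths both believe they own the same buffer; it only turns a heap-corrupting double free into a harmless no-op, which can also hide the underlying bug.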

@reneploetz
Author

reneploetz commented Feb 20, 2025

I'm unsure whether this helps, as I have no real experience with valgrind.

c19a95121dff:/# setcap -r /usr/bin/pihole-FTL
File '/usr/bin/pihole-FTL' has no capablity to remove
c19a95121dff:/# valgrind --trace-children=yes --leak-check=full --track-origins=yes --vgdb=full --log-file=valgrind.log -s /usr/bin/pihole-FTL no-daemon
2025-02-20 22:06:21.247 UTC [17M] INFO: ########## FTL started on c19a95121dff! ##########
2025-02-20 22:06:21.275 UTC [17M] INFO: FTL branch: master
2025-02-20 22:06:21.277 UTC [17M] INFO: FTL version: v6.0
2025-02-20 22:06:21.278 UTC [17M] INFO: FTL commit: eaa7dbb4
2025-02-20 22:06:21.279 UTC [17M] INFO: FTL date: 2025-02-18 17:19:26 +0000
2025-02-20 22:06:21.279 UTC [17M] INFO: FTL user: root
2025-02-20 22:06:21.280 UTC [17M] INFO: Compiled for linux/amd64 (compiled on CI) using cc (Alpine 14.2.0) 14.2.0
2025-02-20 22:06:21.427 UTC [17M] INFO: 3 FTLCONF environment variables found (3 used, 0 invalid, 0 ignored)
2025-02-20 22:06:21.430 UTC [17M] INFO:    [✓] FTLCONF_dns_upstreams is used
2025-02-20 22:06:21.431 UTC [17M] INFO:    [✓] FTLCONF_dns_queryLogging is used
2025-02-20 22:06:21.432 UTC [17M] INFO:    [✓] FTLCONF_dhcp_active is used
2025-02-20 22:06:21.532 UTC [17M] INFO: Wrote config file:
2025-02-20 22:06:21.532 UTC [17M] INFO:  - 152 total entries
2025-02-20 22:06:21.533 UTC [17M] INFO:  - 145 entries are default
2025-02-20 22:06:21.534 UTC [17M] INFO:  - 7 entries are modified
2025-02-20 22:06:21.535 UTC [17M] INFO:  - 2 entries are forced through environment
2025-02-20 22:06:21.692 UTC [17M] INFO: Parsed config file /etc/pihole/pihole.toml successfully
2025-02-20 22:06:21.694 UTC [17M] WARNING: Unable to read PID from file: No such file or directory
2025-02-20 22:06:21.695 UTC [17M] INFO: PID file does not exist or not readable 
2025-02-20 22:06:21.695 UTC [17M] INFO: No other running FTL process found.
2025-02-20 22:06:21.698 UTC [17M] WARNING: Insufficient permissions to set process priority to -10 (CAP_SYS_NICE required), process priority remains at 0
2025-02-20 22:06:21.740 UTC [17M] WARNING: Starting pihole-FTL as user root is not recommended
2025-02-20 22:06:21.743 UTC [17M] INFO: PID of FTL process: 17
c19a95121dff:/# echo $?
1

c19a95121dff:/# cat valgrind.log

==40== Memcheck, a memory error detector
==40== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==40== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==40== Command: /sbin/ip address show
==40== Parent PID: 17
==40== 
==40== Syscall param write(buf) points to uninitialised byte(s)
==40==    at 0x405E868: ??? (in /lib/ld-musl-x86_64.so.1)
==40==    by 0x405B501: ??? (in /lib/ld-musl-x86_64.so.1)
==40==    by 0x40A1B43: ???
==40==  Address 0x1ffefffc61 is on thread 1's stack
==40==  Uninitialised value was created by a stack allocation
==40==    at 0x13B323: ??? (in /bin/busybox)
==40== 
==40== 
==40== HEAP SUMMARY:
==40==     in use at exit: 11,697 bytes in 13 blocks
==40==   total heap usage: 16 allocs, 3 frees, 28,127 bytes allocated
==40== 
==40== 268 (84 direct, 184 indirect) bytes in 1 blocks are definitely lost in loss record 8 of 11
==40==    at 0x48AD723: malloc (in /usr/libex
vex amd64->IR: unhandled instruction bytes: 0xF4 0x80 0x3A 0x0 0x74 0x1 0xF4 0xBE 0x1 0x0
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==17== valgrind: Unrecognised instruction at address 0x7f53c7.
==17==    at 0x7F53C7: a_crash (atomic_arch.h:108)
==17==    by 0x7F53C7: get_nominal_size (meta.h:169)
==17==    by 0x7F53C7: __libc_free (free.c:110)
==17==    by 0x4CF4CE: free_frec (forward.c:3207)
==17==    by 0x4D2EAB: reply_query (forward.c:1353)
==17==    by 0x4BFA4F: main_dnsmasq (dnsmasq.c:1298)
==17==    by 0x4020C6: main (main.c:123)
==17== Your program just tried to execute an instruction that Valgrind
==17== did not recognise.  There are two possible reasons for this.
==17== 1. Your program has a bug and erroneously jumped to a non-code
==17==    location.  If you are running Memcheck and you just saw a
==17==    warning about a bad jump, it's probably your program's fault.
==17== 2. The instruction is legitimate but Valgrind doesn't handle it,
==17==    i.e. it's Valgrind's fault.  If you think this is the case or
==17==    you are not sure, please let us know and we'll try to fix it.
==17== Either way, Valgrind will now raise a SIGILL signal which will
==17== probably kill your program.
==17== 
==17== HEAP SUMMARY:
==17==     in use at exit: 0 bytes in 0 blocks
==17==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==17== 
==17== All heap blocks were freed -- no leaks are possible
==17== 
==17== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@DL6ER
Member

DL6ER commented Feb 21, 2025

We already have a proposed fix. Could you please run

sudo pihole checkout ftl update/dnsmasq

to check whether this fixes the bug for you?

@reneploetz
Author

reneploetz commented Feb 21, 2025

I cannot use that command inside a Docker container, but I fetched the binary from here: https://github.com/pi-hole/FTL/actions/runs/13451409271 ("pihole-FTL-amd64-binary") and replaced the one inside the container with it.

For reproduction purposes:
I can reproduce this using the extended tests on "https://www.dnsleaktest.com".
In case it is relevant, my DNS chain is: my system <-> Unbound (DNS over TLS) <-> Pi-hole <-> Unbound (resolving against Cloudflare)

2025-02-21 11:26:16.933 UTC [54M] INFO: ########## FTL started on b7056cd73ba1! ##########
2025-02-21 11:26:16.933 UTC [54M] INFO: FTL branch: update/dnsmasq
2025-02-21 11:26:16.933 UTC [54M] INFO: FTL version: vDev-a94e1d7
2025-02-21 11:26:16.933 UTC [54M] INFO: FTL commit: a94e1d70
2025-02-21 11:26:16.933 UTC [54M] INFO: FTL date: 2025-02-20 22:59:04 +0000
2025-02-21 11:26:16.933 UTC [54M] INFO: FTL user: pihole
2025-02-21 11:26:16.933 UTC [54M] INFO: Compiled for linux/amd64 (compiled on CI) using cc (Alpine 14.2.0) 14.2.0
2025-02-21 11:26:16.935 UTC [54M] INFO: 3 FTLCONF environment variables found (3 used, 0 invalid, 0 ignored)
2025-02-21 11:26:16.935 UTC [54M] INFO:    [✓] FTLCONF_dns_upstreams is used
2025-02-21 11:26:16.935 UTC [54M] INFO:    [✓] FTLCONF_dns_queryLogging is used
2025-02-21 11:26:16.935 UTC [54M] INFO:    [✓] FTLCONF_dhcp_active is used
2025-02-21 11:26:16.936 UTC [54M] INFO: Wrote config file:
2025-02-21 11:26:16.936 UTC [54M] INFO:  - 152 total entries
2025-02-21 11:26:16.936 UTC [54M] INFO:  - 144 entries are default
2025-02-21 11:26:16.936 UTC [54M] INFO:  - 8 entries are modified
2025-02-21 11:26:16.936 UTC [54M] INFO:  - 2 entries are forced through environment
2025-02-21 11:26:16.937 UTC [54M] INFO: Parsed config file /etc/pihole/pihole.toml successfully
2025-02-21 11:26:16.937 UTC [54M] INFO: PID file does not exist or not readable
2025-02-21 11:26:16.937 UTC [54M] INFO: No other running FTL process found.
2025-02-21 11:26:16.937 UTC [54M] WARNING: Insufficient permissions to set process priority to -10 (CAP_SYS_NICE required), process priority remains at 0
2025-02-21 11:26:16.938 UTC [54M] INFO: PID of FTL process: 54
2025-02-21 11:26:16.939 UTC [54M] INFO: listening on 0.0.0.0 port 53
2025-02-21 11:26:16.939 UTC [54M] INFO: listening on :: port 53
2025-02-21 11:26:16.939 UTC [54M] INFO: PID of FTL process: 54
2025-02-21 11:26:16.940 UTC [54M] INFO: Database version is 21
2025-02-21 11:26:16.940 UTC [54M] INFO: Database successfully initialized
2025-02-21 11:26:16.971 UTC [54M] WARNING: Insufficient permissions to set system time (CAP_SYS_TIME required), NTP client not available
2025-02-21 11:26:16.971 UTC [54/T83] INFO: NTP server listening on 0.0.0.0:123 (IPv4)
2025-02-21 11:26:16.971 UTC [54/T84] INFO: NTP server listening on :::123 (IPv6)
2025-02-21 11:26:16.971 UTC [54M] INFO: FTL is running as user pihole (UID 1000)
2025-02-21 11:26:16.971 UTC [54M] INFO: Reading certificate from /etc/pihole/tls.pem ...
2025-02-21 11:26:16.971 UTC [54M] INFO: Using SSL/TLS certificate file /etc/pihole/tls.pem
2025-02-21 11:26:16.971 UTC [54M] INFO: Web server ports:
2025-02-21 11:26:16.971 UTC [54M] INFO:   - 80 (HTTP, IPv4, optional)
2025-02-21 11:26:16.971 UTC [54M] INFO:   - 80 (HTTP, IPv6, optional)
2025-02-21 11:26:16.971 UTC [54M] INFO:   - 443 (HTTPS, IPv4, optional)
2025-02-21 11:26:16.971 UTC [54M] INFO:   - 443 (HTTPS, IPv6, optional)
2025-02-21 11:26:16.971 UTC [54M] INFO: Restored 0 API sessions from the database
2025-02-21 11:26:16.973 UTC [54M] INFO: Blocking status is enabled
2025-02-21 11:26:17.072 UTC [54/T85] INFO: Compiled 1 allow and 0 deny regex for 0 client in 0.1 msec
2025-02-21 11:26:38.715 UTC [54/T86] INFO: Debugger attached (180: gdb), entering dnsmasq debug mode
b7056cd73ba1:/# gdb -p $(cat /run/pihole-FTL.pid)
GNU gdb (GDB) 14.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 54
[New LWP 83]
[New LWP 84]
[New LWP 85]
[New LWP 86]
[New LWP 87]
[New LWP 88]
[New LWP 89]
[New LWP 90]
__cp_end () at src/thread/x86_64/syscall_cp.s:29

warning: 29     src/thread/x86_64/syscall_cp.s: No such file or directory
(gdb) continue
Continuing.
[Detaching after vfork from child process 188]
[Detaching after vfork from child process 189]
[Detaching after vfork from child process 190]
[Detaching after vfork from child process 191]

Thread 1 "pihole-FTL" received signal SIGSEGV, Segmentation fault.
get_nominal_size (p=0x7f24c105bed0 "\310A\230\307\001", end=0x7f24c105bedc "") at src/malloc/mallocng/meta.h:169
warning: 169    src/malloc/mallocng/meta.h: No such file or directory
(gdb) backtrace
#0  get_nominal_size (p=0x7f24c105bed0 "\310A\230\307\001", end=0x7f24c105bedc "") at src/malloc/mallocng/meta.h:169
#1  __libc_free (p=0x7f24c105bed0) at src/malloc/mallocng/free.c:110
#2  0x00000000007f4bac in free (p=<optimized out>) at src/malloc/free.c:5
#3  0x00000000004cf50f in free_frec (f=0x7f24c0f0a690) at /app/src/dnsmasq/forward.c:3209
#4  0x00000000004d1e40 in return_reply (now=now@entry=1740137284, forward=<optimized out>, header=header@entry=0x7f24c1191570, n=n@entry=102, 
    status=<optimized out>, status@entry=524288) at /app/src/dnsmasq/forward.c:1584
#5  0x00000000004d2f0c in reply_query (fd=42, now=now@entry=1740137284) at /app/src/dnsmasq/forward.c:1355
#6  0x00000000004bd1b7 in check_dns_listeners (now=now@entry=1740137284) at /app/src/dnsmasq/dnsmasq.c:1929
#7  0x00000000004bfa90 in main_dnsmasq (argc=<optimized out>, argv=<optimized out>) at /app/src/dnsmasq/dnsmasq.c:1298
#8  0x00000000004020c7 in main (argc=<optimized out>, argv=0x7ffe2dbf8338) at /app/src/main.c:123

@DL6ER
Member

DL6ER commented Feb 21, 2025

> FTL commit: a94e1d7

Okay, so this definitely includes the latest fix. I will try roughly the same setup you have, but I am traveling right now, so I cannot (read: should not) reconfigure DNS resolution at home remotely and risk upsetting people there by breaking the Internet connection.

> I can reproduce this using the extended tests on "https://www.dnsleaktest.com/".

I have run the test three times now. It always returns only my residential IP address and has never triggered a crash.

@reneploetz
Author

No worries. My current setup uses caching on the Unbound resolvers, so it's not as if I have no connectivity at all. It's "only" that I see the container restarting every once in a while, though I cannot really say which request causes this, or whether it's the number of concurrent requests.

@PromoFaux
Member

> I cannot use that command inside a Docker container, but I fetched the binary from here: pi-hole/FTL/actions/runs/13451409271 ("pihole-FTL-amd64-binary") and replaced the one inside the container with it.

Just as an FYI: you can build local images with alternative FTL binaries as described here: https://docs.pi-hole.net/docker/build-image/#as-an-alternative-to-pihole-checkout

@DL6ER
Member

DL6ER commented Feb 21, 2025

@reneploetz I pushed another commit; it'd be great if you could test this one, too. I am still unable to reproduce this crash myself.

Edit: What you did with valgrind seems correct, but I'm very surprised there is so little output. These files are usually very large.
