FTL v54b4ad93 crashes sometimes (several times today) for unknown reasons #2112
Could you please run https://deploy-preview-338--pihole-docs.netlify.app/ftldns/debugging/ has all the details incl. a special section how to do it inside a |
I can see the same happening both for nightly and development tags. This is what is shown after the mentioned debugging procedure inside the docker container:
It doesn't look very helpful. |
Seems you need to add |
I have another with
|
Okay, so |
This issue has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/pi-hole-stopped-responding/73931/4 |
Thanks for your patience. However, when also adding
|
This means there was no crash. Why FTL exited with exit code 01 will be found in |
This is what the log contains:
Note that this happens on the current development image, as well as yesterday's and today's nightly image. The log above is from the latest nightly because, although it crashes, FTL recovers more quickly there, whereas development does not and sometimes takes down the container. |
from e.g. the first line says the crash happened in the process with PID 3270, which is a fork of process 52. Unfortunately,
The first crash you have reported here had the crash in According to the link above, Linux supports setting
allowing edit Clarified above and suggested a different command to run |
might work, too, but I am not sure whether the system stays healthy while the main process is suspended. It may have other consequences. Easiest would probably be waiting until we get a crash not in a fork so |
With
With
|
Maybe it's a good idea to create a development build that includes debug symbols just to have a valid stack trace output? |
Here is "my" crash:
|
We already have the full set of debug symbols in the release builds, otherwise, you would not have seen the function names and related code lines in:
But this also shows that following forks/children won't work for us here. The location where you entered @schuettecarsten Could you attach the debugger as well? This latest crash was in a fork, too, however, the first one (the very first post in this issue ticket) was in a normal thread of the main process ( |
Maybe here:
or
or
I'm not experienced with gdb. Thank you for your patience. If these backtraces are not helpful, I apologize |
Signal 17 is |
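For reference, a signal number from a log line can be mapped to its symbolic name with Python's standard `signal` module. On common Linux architectures (x86, ARM), signal 17 is `SIGCHLD`, which the kernel sends to a parent process when one of its children stops or terminates. The numbering is platform-dependent, so the lookup below should be run on the same kind of system that produced the log. This is a minimal sketch and not part of FTL; the helper name `signal_name` is made up for illustration.

```python
import signal


def signal_name(number: int) -> str:
    """Return the symbolic name of a signal number on the current platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # Numbers without a named constant (e.g. most real-time signals)
        return f"unknown signal {number}"


if __name__ == "__main__":
    # The result depends on the platform this runs on.
    print(17, signal_name(17))
```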
@bermeitinger-b I don't think there is anything wrong with the network table TBH, my current assumption is that somehow handling of the strings got broken and your database got populated with garbage data. This would likely explain the crash you are seeing, too. But I am still not sure how/where it happens. I have meanwhile manually inspected the entire string processing code paths twice and did not find anything odd. Please also enable |
Here an excerpt after flushing the network table:
|
Yes, this brings us a lot closer. Please restart
you can try to search for the first occurrence of this strange string like
|
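If the exact garbage string is hard to type or differs between runs, an alternative is to look for the first log line that contains any byte outside printable ASCII. The following is only a sketch of that idea, not the command suggested above: the function name `first_garbled_line` is hypothetical, and the log path has to be supplied by the caller. Note that legitimate UTF-8 text (e.g. internationalized names) would be flagged as well.

```python
from typing import Iterable, Optional, Tuple


def first_garbled_line(lines: Iterable[bytes]) -> Optional[Tuple[int, bytes]]:
    """Return (1-based line number, raw line) of the first line containing
    bytes other than printable ASCII or tab, or None if every line is clean."""
    for number, raw in enumerate(lines, start=1):
        body = raw.rstrip(b"\r\n")
        if any(b != 0x09 and not 0x20 <= b <= 0x7E for b in body):
            return number, raw
    return None
```

It can be used on a file opened in binary mode, e.g. `with open(path, "rb") as fh: print(first_garbled_line(fh))`, where `path` points to the log file in question.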
It'd also be helpful if you could run
and see if any errors regarding hash tables are reported in |
Thanks. I've restarted and it prints a lot of
Typically, this is an indicator that it's not a literal E. After many of these lines, it looks like the output changes:
About the hash collisions:
Seems okay, however, I'm not sure why it scans 100 clients. There should only be 3 (localhost, DoH, DoT). |
Looking at the logs, we are in a bit of a chicken-and-egg problem as the database already has some broken stuff which then "contaminates" FTL's strings on the history import during startup. I'd suggest disabling database importing for the moment,
and then restart again. You may also want to clean out the log file first to get rid of the binary stuff that is in there right now. edit The 100 clients come about because FTL detects clients based on their (string) IP address and when the IP address is garbage, a new client is added for each new garbage string. |
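To illustrate why garbage IP strings inflate the client count, here is a small sketch in Python. It is not FTL's actual code (FTL is written in C), and `count_clients` is a hypothetical helper: it only mimics the mechanism described above, where clients are keyed by their IP address string, and shows how validating the strings first would keep the count at the real number of clients.

```python
import ipaddress


def count_clients(ip_strings, validate=False):
    """Count distinct clients keyed by their IP address string.

    With validate=True, strings that do not parse as an IPv4 or IPv6
    address are skipped instead of creating a new client entry.
    """
    clients = {}
    for ip in ip_strings:
        if validate:
            try:
                ipaddress.ip_address(ip)
            except ValueError:
                continue  # garbage string, do not register a client
        clients[ip] = clients.get(ip, 0) + 1
    return len(clients)


# Two real clients plus three distinct garbage strings (made-up sample data)
seen = ["127.0.0.1", "::1", "127.0.0.1", "\x01E\x7f", "E\x02", "garb@ge"]
print(count_clients(seen), count_clients(seen, validate=True))
```

Without validation every distinct garbage string becomes its own client, which matches the symptom of far more clients than actually exist.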
I've deleted the old database and started new. It generated the following log:
The rest is the same as in the first logs above. As above, it did not restart automatically. The log file is not binary anymore, so the network table looks correct. |
Did you have the debugger |
Maybe:
and right after:
Then, the whole container dies. Edit: After some time, the glyphs reappear in the network table. I don't have a backtrace for those but will check the log. |
I fully understand and I'm thankful for your deep investigation. I can reproduce the crash by deleting |
FTL crashed for unknown reasons several times today.
Here is a full log: