-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add new custom event loop for Linux I/O layer #456
base: master
Are you sure you want to change the base?
Conversation
4cf0f12
to
24680d8
Compare
@decarv Rebase, squash, and CI |
24680d8
to
45da25d
Compare
45da25d
to
87204f9
Compare
@decarv Please, look at the CI |
Please, see if you can get the liburing version check in place as well |
f57a6f3
to
4a8a32c
Compare
4a8a32c
to
4258ac0
Compare
Issue with shutdown ? |
Yes. I will investigate shutdown issues. However, from the CI output it could be anything, really. |
Try and a TRACE log file for gcc and clang, and see if there are pointers there... |
e1b42f5
to
ceb9ec3
Compare
Please take a look at the latest commits. I decided to keep kqueue in this same PR for simplicity. I have just setup the testing environment for kqueue, so I haven't tested anything yet. |
src/include/ev.h
Outdated
{ | ||
enum ev_type type; /**< Event type. */ | ||
int slot; /**< *CURRENTLY UNUSED* Slot number associated with the poll. */ | ||
#if HAVE_URING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a union ?
Just a quick pass on the first part of the patch in order to give some comments that are probably general for the entire patch |
I still have issues with ipv6 listening addresses: pgagroal is unresponsive if |
I will investigate that further to understand what is happening. I don't see how ev.c is doing this (although it has to be it).
Thank you. Fixed. Already on the next commit. |
ceb9ec3
to
d5d6680
Compare
I think that setting
So, Edit: I see this with both |
What does
mean ? |
@fluca1978 I didn't pay attention to that. Sorry. Can you please try to reproduce again and show the steps? Maybe share the .conf and .databases? I cannot reproduce. That output happens when pgbench run is cancelled with Ctrl+C. That happens upstream also. @jesperpedersen Do you mean have "FD: -1" mid run when maxing out the connections? I don't see it happening here. Can you share more information? |
Just that one child exited with exit status 0. That message is printed by the main loop after receiving a SIGCHLD. |
No, I mean after the run, so if the run is short enough it means that the connections havn't timed out, which means that only |
@decarv It could be a related issue since if you force it with Anyway, test with |
So, it is more like TRACE ? Or is it important to keep at this level ? |
It s a trace. I will change it.
Sorry. I still do not understand. Luca's connections have all been used. I will do a run with
|
@decarv I mean, when I do a similar run locally on my machine -- not Luca's actual run |
So you mean debug build, max_connections = 8, and limit file for the db I'm testing: 8 8 0. |
Yes, but We need 8 valid ones |
Here it is a run, built in debug mode, with databases configured as
I don't see any And here follows the test with
after a fresh start of both PostgreSQL and |
I took some time to test commit 2e31f0b on FreeBSD, with the same configuration as above ( When using a parallelism of
but note the
However, if I increase
|
Thanks a lot @fluca1978. |
This is the test of 2e31f0b on Rocky Linux 9. The first run, after restart of postgresql and a fresh start of pgagroal, went fine.
First run
Second run
Compilationpgagroal has been compiled with:
I attach the log and my configuration. |
@fluca1978 thanks for that. I was able to reproduce this with I have also configured my ssl on postgres. That does not seem to be the issue though. By the way, this is my postgres.conf file. It is pretty default. If you have any suggestions for improvement or changes to improve testing and align my setup more closely with yours, please let me know. |
Great to know you are on the right track!
My ssl configuration is turned off:
so I don't think either it is related to ssl, at this point.
Apart ssl, the only thing different in my setup is that I've PL/Java installed, even if not in the database used to test with pgbench and pgagroal:
but even disabling I can confirm that epoll is working fine, i.e., can run always without errors. |
Think I have found the issue. Reads are blocking waiting for server response in pgagroal_write_discard_all (and potentially other functions). I think io_uring might be picking up the receive. I will try to fix it by tearing it down after ev_loop_break. I will update you. |
But, tearing it down after ev_loop_break will break pooling, right ? |
I meant on the worker. The worker tears it and reconstructs it every time.
Having a per process ring is very cheap.
…On Sat, 22 Feb 2025 at 06:58 Jesper Pedersen ***@***.***> wrote:
But, tearing it down after ev_loop_break will break pooling, right ?
—
Reply to this email directly, view it on GitHub
<#456 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ALHAW2OLXBECAM63CQNZ7MT2RBC4FAVCNFSM6AAAAABMPLHTZ6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNZWGEZDOOJTGA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
[image: jesperpedersen]*jesperpedersen* left a comment
(agroal/pgagroal#456)
<#456 (comment)>
But, tearing it down after ev_loop_break will break pooling, right ?
—
Reply to this email directly, view it on GitHub
<#456 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ALHAW2OLXBECAM63CQNZ7MT2RBC4FAVCNFSM6AAAAABMPLHTZ6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNZWGEZDOOJTGA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Introduce ev.h and ev.c, establishing the foundation for the new custom event loop, `pgagroal_ev`. Replace previous dependencies on libev with the custom event loop. For Linux, implement support for io_uring with a fallback to epoll if io_uring is unavailable. For BSD, implement support for kqueue. Changelog ========= 2025-02-24: - Fix "hang" after running pgbench, caused by using both read_message directly on sockets and io_uring. For this, I modified `worker.c`. 2025-02-13: - Fix ev_backend log in `configuration.c`. We cannot use pgagroal_log_* functions during read configuration because the log file is not set yet. Before this change, setting log type to file leads to segmentation fault. 2025-01-18: - delete unneeded file `bidi.c`. - cleanup 2025-01-17: - Fix regression added to the Debug build. `waitpid()` inside `#if DEBUG` would block waiting for the running child processes. Add WNOHANG to the function options so the main process does not block. - Remove '\n' from `pagroal_log_*()` - Run uncrustify.sh - Remove `printf()` debugging - Modify CD/CI pipeline to improve function verify_running and enable the CD/CI for kqueue, which was disabled. - Change `pgagroal_log_error()` that informs errors in `kill()` to `pgagroal_log_debug()`. I suppose these errors should happen if pgagroal is not closely following up the death of its children. Therefore, there is no need to flood the log with error messages. Keep as debug because it is informative of something that could be improved. 2025-01-16: - Remove `pgagroal_ev_*_stop()` functions from main.c and vault.c as these functions are not intended to be called from a child process inheriting the ev loop. This does not make sense in io_uring and for epoll and kqueue, just closing the fds is enough as a precaution to prevent their use. - Enhance debugging steps in `sigchld_handler()` to help debug child processes termination. - Add error handling to `kill()` to help debug child processes termination. - Allow return error on `pgagroal_ev_io_stop()` instead of exit so the caller function can handle errors appropriately. - Add a call to `setpgid()` when worker is created to ensure child processes do not exit immediately if a SIGINT is issued in the controlling terminal. - Add a call to `pgagroal_ev_fork()` when a worker is created. 2025-01-13: - Fix ASan report in `pgagroal_ev_io_stop()`
4432229
to
63b8ec5
Compare
I will fix CI shortly. I think a new version of postgres was released and I cannot assume the same directories exist. Anyhow, @fluca1978, can you test commit 63b8ec5? The changelog will be posted in the first comment. Edit.: |
@decarv works on my side, great job! I'm able to run without any error the following any time I want:
I've tried building in debug and production mode, both work (I tested production to have more connections than the default 8). |
@decarv You can just make sure that PostgreSQL 17 is used for all testing - maybe get ideas from pgmoneta; like https://github.com/pgmoneta/pgmoneta/blob/main/.github/workflows/ci.yml |
Great! :)
Nothing specific, feel free to test it as a whole by running your usual workload. Try to put some pressure on it, try cancelling in the middle to see how it recovers, etc. A while ago I started typing these tests on a shell script so we can test the event loop. I wil finish that.
Good! Will take a look. IIRC, currently we test start, stop, one pgbench run. Let's see if we can improve that. |
I found a possible problem. Test (interrupted):
and now repeat:
Excerpt from
I tried to wait a few minutes, but the status remains the same. Running a
|
Yes. I could reproduce this with master branch as well. Can you confirm? With io_uring it hangs on the same place as before. With epoll, on epoll_wait. |
Introduce ev.h and ev.c, establishing the
foundation for the new custom event loop,
pgagroal_ev
.Replace previous dependencies on libev with the
custom event loop.
Implement support for io_uring with a fallback to
epoll if io_uring is unavailable.
Changelog
63b8ec5 (2025-02-24):
read_message directly on sockets and io_uring. For this, I
modified
worker.c
. I couldn't simply use pgagroal_ev_loop_destroy before the last exchange of messages between the worker and the server because the worker rely on allocd memory of which pgagroal_ev_loop_destroy gets rid. So epoll and io_uring will have to be treated differently until I find a sane way to do that.2025-02-13:
configuration.c
. We cannot usepgagroal_log_* functions during read configuration because the
log file is not set yet. Before this change, setting log type to
file leads to segmentation fault.
2025-01-18:
bidi.c
.2025-01-17:
waitpid()
inside
#if DEBUG
would block waiting for the runningchild processes. Add WNOHANG to the function options so the
main process does not block.
pagroal_log_*()
printf()
debuggingenable the CD/CI for kqueue, which was disabled.
pgagroal_log_error()
that informs errors inkill()
to
pgagroal_log_debug()
. I suppose these errors should happenif pgagroal is not closely following up the death of its children.
Therefore, there is no need to flood the log with error messages.
Keep as debug because it is informative of something that could
be improved.
2025-01-16:
pgagroal_ev_*_stop()
functions from main.c and vault.cas these functions are not intended to be called from a child
process inheriting the ev loop. This does not make sense in
io_uring and for epoll and kqueue, just closing the fds is enough
as a precaution to prevent their use.
sigchld_handler()
to help debugchild processes termination.
kill()
to help debug child processestermination.
pgagroal_ev_io_stop()
instead of exitso the caller function can handle errors appropriately.
setpgid()
when worker is created to ensurechild processes do not exit immediately if a SIGINT is issued
in the controlling terminal.
pgagroal_ev_fork()
when a worker is created.2025-01-13:
pgagroal_ev_io_stop()