Make PostgreSQL multi-threaded

I'm using this issue to keep track of things that we need to do to make PostgreSQL multi-threaded. To be updated as we go.

Discussion on pgsql-hackers: https://www.postgresql.org/message-id/flat/31cc6df9-53fe-3cd9-af5b-ac0d801163f4%40iki.fi

PG Wiki page: https://wiki.postgresql.org/wiki/Multithreading (mostly from pgconf.dev 2024)

Prior art: Konstantin's old branch: https://github.com/postgrespro/postgresql.pthreads

Some very preliminary hacking is at https://github.com/hlinnaka/postgres/tree/threading. I used an approach similar to Konstantin's, labeling all global variables.


See also: https://github.com/cmu-db/peloton/wiki/Postgres-Modifications


TODOs:

- [ ] label all global variables with markers like 'session_local', 'postmaster_guc' etc. to mark what they are used for.
- [ ] have a tool that checks that all global variables have been labelled
- [ ] extension support: add something to the control file to label whether an extension can run in multi-threaded mode
- [ ] lots more, add tasks here later

# Global variables

We have a lot of global and static variables:

    $ objdump -t bin/postgres | grep -e "\.data" -e "\.bss" | grep -v "data.rel.ro" | wc -l
    1666

Some of them are pointers to shared memory structures and can stay as 
they are. But many of them are per-connection state. The most 
straightforward conversion for those is to turn them into thread-local 
variables, like Konstantin did in [0].

It might be good to have some kind of a Session context struct that we 
pass everywhere, or maybe have a single thread-local variable to hold 
it. Many of the global variables would become fields in the Session. But 
that's future work.
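
A minimal sketch of both ideas, assuming C11 thread-local storage and POSIX threads. The `session_local` macro, the `Session` struct, its fields and `CurrentSession` are hypothetical names for illustration, not existing PostgreSQL symbols:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical marker from the TODO list; here it simply expands to
 * C11 thread-local storage. */
#define session_local _Thread_local

/* Hypothetical Session struct collecting former per-connection globals.
 * The fields are illustrative only. */
typedef struct Session
{
    int backend_id;
    int work_mem_kb;
} Session;

/* A single thread-local pointer to the current thread's session. */
static session_local Session *CurrentSession = NULL;

/* Thread entry point: installs its own Session and modifies it. */
static void *
session_main(void *arg)
{
    assert(arg != NULL);
    CurrentSession = (Session *) arg;   /* visible only in this thread */
    CurrentSession->work_mem_kb += CurrentSession->backend_id;
    return NULL;
}

/* Runs two sessions on separate threads; returns 0 on success. */
static int
run_two_sessions(Session *a, Session *b)
{
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, session_main, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, session_main, b) != 0)
    {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}
```

Each thread sees its own `CurrentSession`; the thread that calls `run_two_sessions()` never assigns it, so its copy stays NULL.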

# Extensions

A lot of extensions also contain global variables or other things that 
break in a multi-threaded environment. We need a way to label extensions 
that support multi-threading. And in the future, also extensions that 
*require* a multi-threaded server.


Let's add flags to the control file to mark if the extension is 
thread-safe and/or process-safe. If you try to load an extension that's 
not compatible with the server's mode, throw an error.
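
The load-time check could look roughly like this. It is only a sketch: the `thread_safe` and `process_safe` flags, the `ServerMode` enum and `check_extension_compat()` are hypothetical names, and a real implementation would report the error through ereport() rather than returning a string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: how the server is currently running. */
typedef enum ServerMode
{
    SERVER_MODE_PROCESS,
    SERVER_MODE_THREAD
} ServerMode;

/* Hypothetical flags parsed from an extension's control file. */
typedef struct ExtensionControl
{
    const char *name;
    bool        thread_safe;    /* may be loaded in a multi-threaded server */
    bool        process_safe;   /* may be loaded in a multi-process server */
} ExtensionControl;

/*
 * Returns NULL if the extension may be loaded in the given mode,
 * otherwise a static message describing the incompatibility.
 */
static const char *
check_extension_compat(const ExtensionControl *ext, ServerMode mode)
{
    assert(ext != NULL);

    if (mode == SERVER_MODE_THREAD && !ext->thread_safe)
        return "extension is not marked as thread-safe";
    if (mode == SERVER_MODE_PROCESS && !ext->process_safe)
        return "extension requires a multi-threaded server";
    return NULL;
}
```

Two separate flags cover both cases mentioned above: extensions that merely support threads set both, while extensions that require a multi-threaded server set only `thread_safe`.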

We might need new functions in addition _PG_init, called at connection 
startup and shutdown. And background worker API probably needs some changes.

# Exposed PIDs


We expose backend process PIDs to users in a few places, for example 
pg_stat_activity.pid and pg_terminate_backend(). They need 
to be replaced, or we can assign a fake PID to each connection when 
running in multi-threaded mode.
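
A rough sketch of the fake-PID option, assuming a fixed-size session table protected by a mutex. All names here (`SessionSlot`, `register_session()`, `request_terminate()`, `FAKE_PID_BASE`) are made up for illustration. The sketch ignores wraparound and collisions with real PIDs, which a real implementation would have to handle.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_SESSIONS  64
#define FAKE_PID_BASE 100000    /* arbitrary starting value for this sketch */

/* Hypothetical per-connection slot. */
typedef struct SessionSlot
{
    bool in_use;
    int  fake_pid;
    bool terminate_requested;
} SessionSlot;

static SessionSlot slots[MAX_SESSIONS];
static int next_fake_pid = FAKE_PID_BASE;
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

/* Registers a new session; returns its fake PID, or -1 if the table is full. */
static int
register_session(void)
{
    int pid = -1;

    pthread_mutex_lock(&slots_lock);
    assert(next_fake_pid >= FAKE_PID_BASE);   /* wraparound is not handled */
    for (int i = 0; i < MAX_SESSIONS; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i].in_use = true;
            slots[i].terminate_requested = false;
            slots[i].fake_pid = pid = next_fake_pid++;
            break;
        }
    }
    pthread_mutex_unlock(&slots_lock);
    return pid;
}

/*
 * Rough analogue of pg_terminate_backend(pid): flags the session for
 * termination. Returns false if no session has that fake PID.
 */
static bool
request_terminate(int fake_pid)
{
    bool found = false;

    pthread_mutex_lock(&slots_lock);
    for (int i = 0; i < MAX_SESSIONS; i++)
    {
        if (slots[i].in_use && slots[i].fake_pid == fake_pid)
        {
            slots[i].terminate_requested = true;
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&slots_lock);
    return found;
}
```

The session thread would then poll `terminate_requested` at its usual interrupt check points instead of reacting to a signal.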


# Signals


We use signals for communication between backends. SIGURG in latches, 
and SIGUSR1 in procsignal, for example. Those primitives need to be 
rewritten with some other signalling mechanism in multi-threaded mode. 
In principle, it's possible to set per-thread signal handlers, and send 
a signal to a particular thread (pthread_kill), but I think it's better 
to just rewrite them.
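
As one possible replacement mechanism, here is a minimal latch built on a pthread mutex and condition variable instead of signals. This is only a sketch; `ThreadLatch` and the `thread_latch_*` functions are hypothetical names that loosely mirror the set/wait/reset pattern of the existing latch API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signal-free latch for threads in one process. */
typedef struct ThreadLatch
{
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            is_set;
} ThreadLatch;

static void
thread_latch_init(ThreadLatch *latch)
{
    assert(latch != NULL);
    pthread_mutex_init(&latch->mutex, NULL);
    pthread_cond_init(&latch->cond, NULL);
    latch->is_set = false;
}

/* Sets the latch and wakes up any waiters. */
static void
thread_latch_set(ThreadLatch *latch)
{
    pthread_mutex_lock(&latch->mutex);
    latch->is_set = true;
    pthread_cond_broadcast(&latch->cond);
    pthread_mutex_unlock(&latch->mutex);
}

/* Blocks until the latch is set; returns immediately if it already is. */
static void
thread_latch_wait(ThreadLatch *latch)
{
    pthread_mutex_lock(&latch->mutex);
    while (!latch->is_set)      /* loop guards against spurious wakeups */
        pthread_cond_wait(&latch->cond, &latch->mutex);
    pthread_mutex_unlock(&latch->mutex);
}

/* Clears the latch so the next wait blocks again. */
static void
thread_latch_reset(ThreadLatch *latch)
{
    pthread_mutex_lock(&latch->mutex);
    latch->is_set = false;
    pthread_mutex_unlock(&latch->mutex);
}

/* Demo helper: a thread that just sets the latch passed to it. */
static void *
waker_thread(void *arg)
{
    thread_latch_set((ThreadLatch *) arg);
    return NULL;
}
```

One caveat: pthread mutexes are not async-signal-safe, so unlike the signal-based latch this one cannot be set from a signal handler.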


We also document that you can send SIGINT, SIGTERM or SIGHUP to an 
individual backend process. I think we need to deprecate that, and maybe 
come up with some convenient replacement. For example, a new pg_kill 
executable could send a message containing the backend ID to a Unix 
domain socket.


# Restart on crash


If a backend process crashes, postmaster terminates all other backends 
and restarts the system. That's hard (impossible?) to do safely if 
everything runs in one process. We can continue to have a separate 
postmaster process that just monitors the main process and restarts it 
on crash.
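
A bare-bones sketch of such a monitor, assuming POSIX fork() and waitpid(). `supervise()` and `demo_server_main()` are hypothetical names. The sketch restarts the child after any abnormal exit and leaves out everything else a real postmaster would have to do, such as reinitializing shared memory.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical entry point of the multi-threaded main process. */
typedef int (*server_main_fn) (int attempt);

/*
 * Runs server_main in a child process and restarts it whenever it dies
 * abnormally, up to max_starts times. Returns the number of times the
 * child was started, or -1 on fork/waitpid failure.
 */
static int
supervise(server_main_fn server_main, int max_starts)
{
    int starts = 0;

    assert(server_main != NULL && max_starts > 0);

    while (starts < max_starts)
    {
        int   status;
        pid_t child = fork();

        if (child < 0)
            return -1;
        if (child == 0)
            _exit(server_main(starts));     /* child: run the server */

        starts++;
        while (waitpid(child, &status, 0) < 0)
        {
            if (errno != EINTR)
                return -1;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            break;                          /* clean shutdown, stop */
        /* killed by a signal or nonzero exit: loop around and restart */
    }
    return starts;
}

/* Demo: simulate a crash on the first attempt, clean exit on the second. */
static int
demo_server_main(int attempt)
{
    if (attempt == 0)
        raise(SIGKILL);     /* simulated crash, without a core dump */
    return 0;
}
```

With the demo function, `supervise(demo_server_main, 5)` starts the child twice: once for the simulated crash and once for the clean run.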


# Thread-safe libraries


Need to switch to thread-safe versions of library functions, e.g. 
uselocale() instead of setlocale().
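
For illustration, a small sketch of the uselocale() pattern, assuming a POSIX.1-2008 system such as Linux with glibc. The helper name `format_double_c_locale()` is made up. Unlike setlocale(), which changes the locale for the whole process, uselocale() only affects the calling thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats value with two decimals using the "C" locale, switching the
 * locale for the calling thread only. Returns 0 on success, -1 on error.
 */
static int
format_double_c_locale(char *buf, size_t buflen, double value)
{
    locale_t c_loc;
    locale_t old;
    int      n;

    assert(buf != NULL && buflen > 0);

    c_loc = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
    if (c_loc == (locale_t) 0)
        return -1;

    old = uselocale(c_loc);     /* affects only this thread */
    n = snprintf(buf, buflen, "%.2f", value);
    uselocale(old);             /* restore before freeing c_loc */
    freelocale(c_loc);

    return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

Other threads keep whatever locale they had, so a session can switch locales without racing against the rest of the server.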


The Python interpreter has a Global Interpreter Lock (GIL). It's not possible 
to create two completely independent Python interpreters in the same 
process; there will be some lock contention on the GIL. Fortunately, the 
Python community just accepted https://peps.python.org/pep-0684/. That's 
exactly what we need: it makes it possible for separate interpreters to 
have their own GILs. It's not clear to me if that's in Python 3.12 
already, or under development for some future version, but by the time 
we make the switch in Postgres, there probably will be a solution in 
CPython.


At a quick glance, I think Perl and Tcl are fine: you can have multiple 
interpreters in one process. We still need to check any other libraries we use.
