Commit 0929b1d

Merge pull request #3642 from m-ou-se/thread-spawn-hook
[RFC] Thread spawn hook (inheriting thread locals)
2 parents 273e41a + dde6144 commit 0929b1d

1 file changed: +278 -0 lines changed

text/3642-thread-spawn-hook.md

@@ -0,0 +1,278 @@
1+
- Feature Name: `thread_spawn_hook`
2+
- Start Date: 2024-05-22
3+
- RFC PR: [rust-lang/rfcs#3642](https://github.com/rust-lang/rfcs/pull/3642)
4+
- Rust Issue: [rust-lang/rust#132951](https://github.com/rust-lang/rust/issues/132951)
5+
6+
# Summary
7+
8+
Add `std::thread::add_spawn_hook` to register a hook that runs for newly spawned threads.
9+
This will effectively provide us with "inheriting thread locals", a much requested feature.
10+
11+
```rust
12+
thread_local! {
13+
static MY_THREAD_LOCAL: Cell<u32> = Cell::new(0);
14+
}
15+
16+
std::thread::add_spawn_hook(|_| {
17+
// Get the value of MY_THREAD_LOCAL in the spawning thread.
18+
let value = MY_THREAD_LOCAL.get();
19+
20+
// Set the value of MY_THREAD_LOCAL in the newly spawned thread.
21+
move || MY_THREAD_LOCAL.set(value)
22+
});
23+
```
24+
25+
# Motivation
26+
27+
Thread local variables are often used for scoped "global" state.
28+
For example, a testing framework might store the status or name of the current
29+
unit test in a thread local variable, such that multiple tests can be run in
30+
parallel in the same process.
31+
32+
However, this information will not be preserved across threads when a unit test
33+
spawns a new thread, which is problematic.
34+
35+
The solution seems to be "inheriting thread locals": thread locals that are
36+
automatically inherited by new threads.
37+
38+
However, adding this property to thread local variables is not easily possible.
39+
Thread locals are initialized lazily. And by the time they are initialized, the
40+
parent thread might have already disappeared, such that there is no value left
41+
to inherit from.
42+
Additionally, even if the parent thread was still alive, there is no way to
43+
access the value in the parent thread without causing race conditions.
44+
45+
Allowing hooks to be run as part of spawning a thread allows precise control
46+
over how thread locals are "inherited".
47+
One could simply `clone()` them, but one could also add additional information
48+
to them, or even add relevant information to some (global) data structure.
49+
50+
For example, not only could a custom testing framework keep track of unit test
51+
state even across spawned threads, but a logging/debugging/tracing library could
52+
keep track of which thread spawned which thread to provide more useful
53+
information to the user.
54+
55+
# Public Interface
56+
57+
For adding a hook:
58+
59+
```rust
60+
// In std::thread:
61+
62+
/// Registers a function to run for every newly spawned thread.
63+
///
64+
/// The hook is executed in the parent thread, and returns a function
65+
/// that will be executed in the new thread.
66+
///
67+
/// The hook is called with the `Thread` handle for the new thread.
68+
///
69+
/// The hook will only be added for the current thread and is inherited by the threads it spawns.
70+
/// In other words, adding a hook has no effect on already running threads (other than the current
71+
/// thread) and the threads they might spawn in the future.
72+
///
73+
/// The hooks will run in order, starting with the most recently added.
74+
///
75+
/// # Usage
76+
///
77+
/// ```
78+
/// std::thread::add_spawn_hook(|_| {
79+
/// ..; // This will run in the parent (spawning) thread.
80+
/// move || {
81+
/// ..; // This will run in the child (spawned) thread.
82+
/// }
83+
/// });
84+
/// ```
85+
///
86+
/// # Example
87+
///
88+
/// A spawn hook can be used to "inherit" a thread local from the parent thread:
89+
///
90+
/// ```
91+
/// use std::cell::Cell;
92+
///
93+
/// thread_local! {
94+
/// static X: Cell<u32> = Cell::new(0);
95+
/// }
96+
///
97+
/// // This needs to be done once in the main thread before spawning any threads.
98+
/// std::thread::add_spawn_hook(|_| {
99+
/// // Get the value of X in the spawning thread.
100+
/// let value = X.get();
101+
/// // Set the value of X in the newly spawned thread.
102+
/// move || X.set(value)
103+
/// });
104+
///
105+
/// X.set(123);
106+
///
107+
/// std::thread::spawn(|| {
108+
/// assert_eq!(X.get(), 123);
109+
/// }).join().unwrap();
110+
/// ```
111+
pub fn add_spawn_hook<F, G>(hook: F)
112+
where
113+
F: 'static + Send + Sync + Fn(&Thread) -> G,
114+
G: 'static + Send + FnOnce();
115+
```
116+
117+
And for opting out when spawning a thread:
118+
119+
```rust
120+
// In std::thread:
121+
122+
impl Builder {
123+
/// Disables running and inheriting [spawn hooks](add_spawn_hook).
124+
///
125+
/// Use this if the parent thread is in no way relevant for the child thread.
126+
/// For example, when lazily spawning threads for a thread pool.
127+
pub fn no_hooks(mut self) -> Builder;
128+
}
129+
```
130+
131+
# Implementation
132+
133+
The implementation is a *thread local* linked list of hooks, which is inherited by newly spawned threads.
134+
This means that adding a hook will only affect the current thread and all (direct and indirect) future child threads of the current thread.
135+
It will not globally affect all already running threads.
136+
137+
Functions that spawn a thread, such as `std::thread::spawn`, will eventually call
138+
`spawn_unchecked_`, which will call the hooks in the parent thread, after the
139+
child `Thread` object has been created, but before the child thread has been
140+
spawned. The resulting `FnOnce` objects are stored and passed on to the child
141+
thread afterwards, which will execute them one by one before continuing with its
142+
main function.
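
The mechanism described above can be approximated on stable Rust. The following is a minimal user-space sketch, not the standard library's implementation: `HOOKS`, `add_hook`, `spawn_with_hooks`, and `inherited_value` are hypothetical names, a `Vec` stands in for the linked list, and the `&Thread` argument is left out for brevity.

```rust
use std::cell::{Cell, RefCell};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

// Closure a hook hands back, to be run in the child thread.
type ChildFn = Box<dyn FnOnce() + Send>;
// A hook: runs in the parent thread and produces a `ChildFn`.
type Hook = Arc<dyn Fn() -> ChildFn + Send + Sync>;

thread_local! {
    // Hypothetical stand-in for the thread-local hook list.
    static HOOKS: RefCell<Vec<Hook>> = RefCell::new(Vec::new());
    static X: Cell<u32> = Cell::new(0);
}

// Hypothetical analogue of `add_spawn_hook` (no `&Thread` argument).
fn add_hook<F, G>(hook: F)
where
    F: 'static + Send + Sync + Fn() -> G,
    G: 'static + Send + FnOnce(),
{
    // One allocation per registered hook.
    let hook: Hook = Arc::new(move || Box::new(hook()) as ChildFn);
    HOOKS.with(|hooks| hooks.borrow_mut().push(hook));
}

// Hypothetical analogue of `std::thread::spawn` that honours the hooks.
fn spawn_with_hooks<T, F>(f: F) -> JoinHandle<T>
where
    F: 'static + Send + FnOnce() -> T,
    T: 'static + Send,
{
    // Parent side: snapshot the list, then run the hooks, most recent first.
    let hooks: Vec<Hook> = HOOKS.with(|hooks| hooks.borrow().clone());
    let child_fns: Vec<ChildFn> = hooks.iter().rev().map(|hook| hook()).collect();
    thread::spawn(move || {
        // Child side: inherit the hook list, run the stored closures,
        // then continue with the thread's main function.
        HOOKS.with(|list| *list.borrow_mut() = hooks);
        for child_fn in child_fns {
            child_fn();
        }
        f()
    })
}

// Registers an "inheriting" hook for X and reads X from a grandchild thread.
fn inherited_value() -> u32 {
    add_hook(|| {
        let value = X.get();
        move || X.set(value)
    });
    X.set(123);
    spawn_with_hooks(|| spawn_with_hooks(|| X.get()).join().unwrap())
        .join()
        .unwrap()
}
```

Calling `inherited_value()` yields `123` even from the grandchild thread, because the child inherits both the hook list and the value before it spawns its own thread.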
143+
144+
# Downsides
145+
146+
- The implementation requires allocation for each hook (to store them in the
147+
list of hooks), and an allocation each time a hook runs for a newly spawned thread
148+
(to store the resulting closure).
149+
150+
- A library that wants to make use of inheriting thread locals will have to
151+
register a global hook (e.g. at the start of `main`),
152+
and will need to keep track of whether its hook has already been added.
153+
154+
- The hooks will not run if threads are spawned through e.g. pthread directly,
155+
bypassing the Rust standard library.
156+
(However, this is already the case for output capturing in libtest:
157+
that does not work across threads when not spawned by libstd.)
158+
159+
# Rationale and alternatives
160+
161+
## Global vs thread local effect
162+
163+
Unlike e.g. libc's `atexit()`, which has a global effect, `add_spawn_hook` has a thread local effect.
164+
165+
This means that adding a hook will only affect the current thread and all (direct and indirect) future child threads of the current thread.
166+
In other words, adding a hook has no effect on already running threads (other than the current thread) and the threads they might spawn in the future.
167+
168+
An alternative could be to have a global set of hooks that affects all newly spawned threads, on any existing and future thread.
169+
170+
Both are relatively easy and efficient to implement (as long as removing hooks
171+
is not an option).
172+
173+
The global behavior was proposed in an earlier version of this RFC,
174+
but the library-api team expressed a preference for exploring a "more local" solution.
175+
176+
Having a "lexically local" solution doesn't seem to be possible other than for scoped threads, however,
177+
since threads can outlive their parent thread and then spawn more threads.
178+
179+
A thread local effect (affecting all future child threads) seems to be the most "local" behavior we can achieve here.
180+
181+
## Add but no remove
182+
183+
Having only an `add_spawn_hook` but not a `remove_spawn_hook` keeps things
184+
simple, by not needing a way to identify a specific hook (through a
185+
handle or a name).
186+
187+
If a hook only needs to execute conditionally, one can make use of an
188+
`if` statement.
189+
190+
If no hooks should be executed or inherited, one can use `Builder::no_hooks`.
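
As an illustration of a conditional hook, here is a small sketch that runs on stable Rust. Because `add_spawn_hook` is not available there, the hook is written as a plain function and driven by hand. `INHERIT`, `conditional_hook`, and `spawn_and_read` are hypothetical names, and the `&Thread` argument is omitted. Since a hook returns a single closure type, the condition is evaluated in the parent and the branch itself sits inside the returned closure.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical per-thread switch deciding whether X is inherited.
    static INHERIT: Cell<bool> = Cell::new(false);
    static X: Cell<u32> = Cell::new(0);
}

// Shaped like a spawn hook: runs in the parent, returns the child-side closure.
fn conditional_hook() -> impl FnOnce() + Send + 'static {
    // Decide in the parent whether this spawn should inherit anything.
    let value = if INHERIT.get() { Some(X.get()) } else { None };
    // Both cases return the same closure type; the branch lives inside it.
    move || {
        if let Some(value) = value {
            X.set(value);
        }
    }
}

// Drives the hook manually: parent part first, child part in the new thread.
fn spawn_and_read(inherit: bool) -> u32 {
    INHERIT.set(inherit);
    X.set(7);
    let child_fn = conditional_hook();
    thread::spawn(move || {
        child_fn();
        X.get()
    })
    .join()
    .unwrap()
}
```

With these definitions, `spawn_and_read(true)` returns `7` and `spawn_and_read(false)` returns the child's default of `0`.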
191+
192+
## Requiring storage on spawning
193+
194+
Because the hooks run on the parent thread first, before the child thread is
195+
spawned, the results of those hooks (the functions to be executed in the child)
196+
need to be stored. This will require heap allocations (although it might be
197+
possible for an optimization to save small objects on the stack up to a certain
198+
size).
199+
200+
An alternative interface that wouldn't require any storage is possible, but has
201+
downsides. Such an interface would spawn the child thread *before* running the
202+
hooks, and allow the hooks to execute a closure on the child (before it moves on
203+
to its main function). That looks roughly like this:
204+
205+
```rust
206+
std::thread::add_spawn_hook(|child| {
207+
// Get the value on the parent thread.
208+
let value = MY_THREAD_LOCAL.get();
209+
// Set the value on the child thread.
210+
child.exec(|| MY_THREAD_LOCAL.set(value));
211+
});
212+
```
213+
214+
This could be implemented without allocations, as the function executed by the
215+
child can now be borrowed from the parent thread.
216+
217+
However, this means that the parent thread will have to block until the child
218+
thread has been spawned, and block for each hook to be finished on both threads,
219+
significantly slowing down thread creation.
220+
221+
Considering that spawning a thread involves several allocations and syscalls,
222+
it doesn't seem very useful to try to minimize an extra allocation when that
223+
comes at a significant cost.
224+
225+
## `impl` vs `dyn` in the signature
226+
227+
An alternative interface could use `dyn` instead of generics, as follows:
228+
229+
```rust
230+
pub fn add_spawn_hook(
231+
hook: Box<dyn Send + Sync + Fn(&Thread) -> Box<dyn FnOnce() + Send>>
232+
);
233+
```
234+
235+
However, this mostly has downsides: it requires the user to write `Box::new` in
236+
a few places, and it prevents us from ever implementing some optimization tricks
237+
to, for example, use a single allocation for multiple hook results.
238+
239+
## A regular function vs some lang feature
240+
241+
Just like `std::panic::set_hook`, `std::thread::add_spawn_hook` is just a regular function.
242+
243+
An alternative would be to have some special attribute, like `#[thread_spawn_hook]`,
244+
similar to `#[panic_handler]` in `no_std` programs, or to make use of
245+
a potential future [global registration feature](https://github.com/rust-lang/rust/issues/125119).
246+
247+
While such things might make sense in a `no_std` world, spawning threads (like
248+
panic hooks) is a `std`-only feature, where we can use global state and allocations.
249+
250+
The only potential advantage of such an approach might be a small reduction in overhead,
251+
but this potential overhead is insignificant compared to the overall cost of spawning a thread.
252+
253+
The downsides are plenty, including limitations on what your hook can do and return,
254+
needing a macro or special syntax to register a hook, potential issues with dynamic linking,
255+
additional implementation complexity, and possibly having to block on a language feature.
256+
257+
# Unresolved questions
258+
259+
- Should the return value of the hook be an `Option`, for when the hook does not
260+
require any code to be run in the child?
261+
262+
- Should the hook be able to access/configure more information about the child
263+
thread? E.g. set its stack size.
264+
(Note that settings that can be changed afterwards by the child thread, such as
265+
the thread name, can already be set by simply setting it as part of the code
266+
that runs on the child thread.)
267+
268+
# Future possibilities
269+
270+
- Using this in libtest for output capturing (instead of today's
271+
implementation that has special hardcoded support in libstd).
272+
273+
# Relevant history
274+
275+
- The original reason I wrote [RFC 3184 "Thread local Cell methods"](https://github.com/rust-lang/rfcs/pull/3184)
276+
was to simplify thread spawn hooks (which I was experimenting with at the time).
277+
Without that RFC, thread spawn hooks would look something like `let v = X.with(|x| x.get()); || X.with(|x| x.set(v))`, instead of just `let v = X.get(); || X.set(v)`,
278+
which is far less ergonomic (and behaves subtly differently). This is the reason I waited with this RFC until that RFC was merged and stabilized.
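
To make the ergonomic difference concrete, the sketch below writes the same parent-to-child hand-off both ways on stable Rust, using a plain `thread::spawn` instead of a spawn hook. The function names are hypothetical, and the sketch does not try to demonstrate the subtle behavioral difference mentioned above.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    static X: Cell<u32> = Cell::new(0);
}

// Pre-RFC 3184 style: every access goes through `with` and a closure.
fn inherit_using_with() -> u32 {
    X.with(|x| x.set(5));
    let v = X.with(|x| x.get());
    thread::spawn(move || {
        X.with(|x| x.set(v));
        X.with(|x| x.get())
    })
    .join()
    .unwrap()
}

// RFC 3184 style: `get` and `set` directly on the thread local.
fn inherit_using_cell_methods() -> u32 {
    X.set(5);
    let v = X.get();
    thread::spawn(move || {
        X.set(v);
        X.get()
    })
    .join()
    .unwrap()
}
```

Both functions return `5`; the second form is the one the spawn hook examples in this RFC rely on.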
