Add relaxed atomics #13822

maia-s · 2025-08-28T13:10:19Z

This PR adds relaxed versions of all the atomic functions to SDL. This is useful for #13806, and for apps in general when you want to access an atomic without synchronizing any other memory, which can be faster (e.g. on ARM). Relaxed functions are currently implemented for GCC, Clang and MSVC (ARM only), and fall back to the regular synced version if relaxed atomics aren't available on the current platform.
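The fallback idea can be sketched roughly as follows. This is not the PR's code: `atomic_cell`, `cell_get_relaxed` and `cell_set_relaxed` are hypothetical names, and the fallback branch assumes a compiler with C11 `<stdatomic.h>`.

```c
/* Hypothetical helpers, not SDL API. */
#if defined(__GNUC__) || defined(__clang__)
/* GCC and Clang builtins take the memory ordering as an argument. */
typedef int atomic_cell;

static int cell_get_relaxed(atomic_cell *c) {
    return __atomic_load_n(c, __ATOMIC_RELAXED);
}

static void cell_set_relaxed(atomic_cell *c, int v) {
    __atomic_store_n(c, v, __ATOMIC_RELAXED);
}
#else
/* Fallback (assumes C11 atomics): the default operations are SeqCst,
   which is stronger than requested but still correct. */
#include <stdatomic.h>
typedef atomic_int atomic_cell;

static int cell_get_relaxed(atomic_cell *c) {
    return atomic_load(c);
}

static void cell_set_relaxed(atomic_cell *c, int v) {
    atomic_store(c, v);
}
#endif
```

Callers use the same two functions on every platform and only get the cheaper ordering where the compiler can provide it.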

This hardcodes relaxed memory ordering, similar to how the current functions hardcode seqcst. As an alternative to this, we could add a memory ordering enum and functions that take that as an argument. That'd let users use acquire and release ordering too, which is often desirable.

slouken · 2025-08-28T14:33:46Z

I'm not sure these are worth adding a whole bunch of API entry points to SDL. @icculus, thoughts?

maia-s · 2025-08-28T14:49:27Z

Adding the memory ordering as an argument would be more generally useful to apps, but of course we'd have to add the enum too for that, and some logic for MSVC to choose the right functions.

In general you usually want to use Acquire/Release for atomics, but it doesn't make a difference on x86 since strong ordering comes for free there.

icculus · 2025-08-28T16:18:55Z

> @icculus, thoughts?

Isn't a large part of the reason for atomics to guarantee memory ordering? Help me understand the value of this.

maia-s · 2025-08-28T18:44:57Z

Yes, but the issue is that SDL currently only allows for one kind of memory ordering, the strongest one (sequentially consistent ordering, or SeqCst for short). This is less of an issue on x86/x86-64, because those processors are strongly ordered: ordinary loads and stores already behave like Acquire and Release there, so only SeqCst stores cost anything extra. It can make a big difference on ARM/ARM64 and other weakly ordered processors, though. (Apple's M-series processors have a mode that enables x86-style strong ordering to make x86-64 emulation in Rosetta 2 more efficient, but that can't be enabled for normal code AFAIK.)

Memory ordering is easiest to explain if you think about atomics as synchronizing TWO pieces of data:

1. The atomic variable itself
2. Other data. This is further split into:
   2a. Data written to before the atomic is accessed
   2b. Data read from after the atomic is accessed

The atomic variable itself is always synchronized, and you'll never get a partially written value. Once a value is stored to an atomic variable, you'll get that value when you load that atomic variable in the same or another thread. This is the same for all memory orderings. Note that this does not on its own guarantee the ordering of anything else with respect to that atomic, not even other atomics.

The SeqCst ordering synchronizes everything. This is the only ordering SDL supports today. When you access an atomic with SeqCst, it acts as a total memory barrier. This costs little extra on x86 as explained above, but can be expensive on other archs like ARM.

At the other extreme is the Relaxed ordering (I'll capitalize the orderings to distinguish them from the regular English words). Relaxed ordering synchronizes ONLY the atomic variable itself. Other memory is not synchronized at all. On ARM64, this makes a big difference: accessing a Relaxed atomic is exactly as efficient as accessing normal memory (on 32-bit ARM I think it's slightly less efficient, but still better than a full sync). With Relaxed ordering, atomic loads and stores can be reordered and even omitted if one is determined to be redundant, but the atomic variable itself is consistent in all threads.

(In particular, using only Relaxed ordering: say one thread sets atomic variable A and then atomic variable B, and another thread reads B and sees that it has been set. Reading A afterwards is not guaranteed to return the value that was, in code, written before the write to B, because the accesses might have been reordered. A will have either its old value or its new value, though, never anything in between, and the new value will eventually become visible.)
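A minimal sketch of a Relaxed-only use, assuming the GCC/Clang `__atomic` builtins (`hits`, `record_hit` and `hits_so_far` are hypothetical names, not SDL API). An event counter only needs the atomic itself to be consistent: increments are never lost, but nothing is promised about other memory.

```c
#include <stdint.h>

/* Only ever accessed through the __atomic builtins below. */
static uint32_t hits;

/* Count one event. No other data is published through this counter,
   so Relaxed ordering is sufficient. */
static void record_hit(void) {
    __atomic_fetch_add(&hits, 1, __ATOMIC_RELAXED);
}

/* Read the current count. The value is never torn, but it says nothing
   about the state of any other memory. */
static uint32_t hits_so_far(void) {
    return __atomic_load_n(&hits, __ATOMIC_RELAXED);
}
```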

There are two other memory orderings of note, Acquire and Release, which work together. An atomic load with Acquire ordering synchronizes with an atomic store with Release ordering on the same atomic variable, such that anything written to memory before the atomic store (2a) is visible after the atomic load (2b) once the load has observed the stored value. This only works when Acquire and Release are used like that on the same atomic variable. If you use Acquire on one atomic and Release on another, it doesn't mean anything with regard to synchronization.
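As a sketch of that pairing, again assuming the GCC/Clang `__atomic` builtins (`payload`, `ready`, `publish` and `try_consume` are hypothetical names): the writer fills in ordinary data and then does a Release store, and the reader does an Acquire load before touching the data.

```c
#include <stdbool.h>

static int payload; /* ordinary, non-atomic data (2a / 2b) */
static int ready;   /* the atomic flag (1) */

/* Writer side: write the data first, then publish it with Release. */
static void publish(int value) {
    payload = value;                               /* 2a */
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

/* Reader side: only read the data after an Acquire load saw the flag. */
static bool try_consume(int *out) {
    if (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE)) {
        return false; /* nothing published yet */
    }
    *out = payload;                                /* 2b */
    return true;
}
```

With Relaxed on either side, the reader could see the flag set and still read a stale `payload`.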

(There's another ordering called Consume, but it's been deprecated, so I won't talk about that.)

Acquire/Release is usually what you want when you want to sync data using atomics. It's faster than SeqCst on non-X86 because it's just one memory read barrier and one memory write barrier instead of two full memory barriers.

Relaxed ordering is not useful for data synchronization (2), but it's still useful for synchronizing the atomic itself (1). In SDL, we could use this e.g. for accessing the main thread id in SetMainReady/IsMainThread, or for reading and initializing the first timestamp in GetTicks/NS. Those only need synchronization of the atomic itself, so the memory barriers don't do anything useful, and skipping them makes it as fast as a regular variable on ARM64 in particular and still much faster than SeqCst on other archs.
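The first-timestamp case could look roughly like this. It is a sketch under assumptions, not SDL's implementation: `start_tick` and `get_start_tick` are hypothetical names, `now` stands in for a real clock read, and 0 is assumed to never be a valid timestamp.

```c
#include <stdint.h>

static uint64_t start_tick; /* 0 means "not initialized yet" */

/* Return the first timestamp any thread recorded. */
static uint64_t get_start_tick(uint64_t now) {
    uint64_t seen = __atomic_load_n(&start_tick, __ATOMIC_RELAXED);
    if (seen == 0) {
        /* Try to install our value. If another thread won the race,
           the builtin writes that thread's value into `seen`. */
        if (__atomic_compare_exchange_n(&start_tick, &seen, now, 0,
                                        __ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
            seen = now;
        }
    }
    return seen;
}
```

No other memory depends on `start_tick`, so every thread agreeing on the same value is all that is needed here.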

After thinking on it a bit, I think it'd be nice to expose the memory ordering as an argument for these functions instead of restricting it to Relaxed only like the PR does currently. Later on, the implementation of SDL itself could also benefit from this on ARM and other platforms by using Acquire/Release instead of SeqCst.

maia-s · 2025-08-29T12:19:26Z

I made a test program to demonstrate. You'll have to run this on ARM or other non-x86 arch to get meaningful results*. Compile with either USE_SEQ_CST, USE_ACQ_REL or USE_RELAXED defined.

On macOS with an M2 Pro, best out of 1000 runs:

USE_SEQ_CST: 911 917 ns
USE_ACQ_REL: 888 042 ns (not a big difference, but there's nothing to sync here)
USE_RELAXED: 485 333 ns (about half of USE_SEQ_CST)

(* Actually SEQ_CST is significantly slower than the other two on my Linux x86-64 laptop too, probably because compilers emit a SeqCst store on x86 as an xchg, or a mov plus mfence, instead of a plain mov.)

```c
#include <SDL3/SDL.h>
#include <SDL3/SDL_main.h>
#include <stdio.h>

#define ITERATIONS 1000000

#ifdef USE_SEQ_CST
#define LOAD_ORDERING __ATOMIC_SEQ_CST
#define STORE_ORDERING __ATOMIC_SEQ_CST
#elif defined(USE_ACQ_REL)
#define LOAD_ORDERING __ATOMIC_ACQUIRE
#define STORE_ORDERING __ATOMIC_RELEASE
#elif defined(USE_RELAXED)
#define LOAD_ORDERING __ATOMIC_RELAXED
#define STORE_ORDERING __ATOMIC_RELAXED
#else
#error "define one of USE_SEQ_CST, USE_ACQ_REL or USE_RELAXED"
#endif

static int atomic;

static int thread_fn(void* data) {
    (void)data;
    for (int i = 0; i < ITERATIONS; ++i) {
        __atomic_store_n(&atomic, i, STORE_ORDERING);
        __asm__ volatile(""); // prevent optimizing out redundant relaxed stores
    }
    return 0;
}

int main(int argc, char* argv[]) {
    (void)argc;
    (void)argv;

    if (!SDL_Init(0)) {
        fprintf(stderr, "SDL_Init failed: %s\n", SDL_GetError());
        return 1;
    }

    Uint64 t0 = SDL_GetTicksNS();

    SDL_Thread* thread = SDL_CreateThread(thread_fn, "store", NULL);
    if (!thread) {
        fprintf(stderr, "SDL_CreateThread failed: %s\n", SDL_GetError());
        SDL_Quit();
        return 1;
    }

    while (__atomic_load_n(&atomic, LOAD_ORDERING) != ITERATIONS - 1) {}

    Uint64 ns = SDL_GetTicksNS() - t0;
    printf("%llu ns\n", (unsigned long long)ns);

    SDL_DetachThread(thread);
    SDL_Quit();
    return 0;
}
```

icculus · 2025-08-29T14:12:33Z

Okay, I'm sold, this sounds useful.

icculus · 2025-08-29T14:14:04Z

If we're being honest, most if not all of our own internal uses only need Relaxed atomics, too, I suspect.

maia-s · 2025-08-29T16:24:00Z

Thanks! Do you want me to add an argument for the ordering so Acquire/Release can also be supported without more API symbols, or is this good as is? (I accidentally disabled atomic_load support for PS2 so I'll push a fix for that in a bit. update: also rebased to current main)

icculus · 2025-09-01T17:21:26Z

I think I wouldn't complicate it with the extra parameter (and if we want acquire/release later, we should add new symbols at that point too).

nfries88 · 2025-09-03T03:07:48Z

Acquire and Release memory orders are the minimum required for spinlocks (locking is CAS-Acquire and unlocking is Store-Release), and several commonly used lock-free data structures can also get away without SeqCst but require more than Relaxed. While Relaxed is indeed useful for things like one-time initialization accesses, it's pretty limited in utility elsewhere without also having the other memory orders. My recommendation would be to add them all.
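For reference, a minimal spinlock along those lines, assuming the GCC/Clang `__atomic` builtins (the `spinlock` type and the `spin_*` functions are hypothetical, not SDL's `SDL_SpinLock` implementation):

```c
#include <stdbool.h>

typedef struct { int locked; } spinlock; /* 0 = unlocked, 1 = locked */

/* Lock attempt: CAS with Acquire on success, so the critical section
   cannot be reordered before the lock is taken. */
static bool spin_try_lock(spinlock *s) {
    int expected = 0;
    return __atomic_compare_exchange_n(&s->locked, &expected, 1, 0,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static void spin_lock(spinlock *s) {
    while (!spin_try_lock(s)) {
        /* Wait on a cheap Relaxed load until the lock looks free. */
        while (__atomic_load_n(&s->locked, __ATOMIC_RELAXED)) {}
    }
}

/* Unlock: Store-Release, so writes made inside the critical section are
   visible to the next thread that acquires the lock. */
static void spin_unlock(spinlock *s) {
    __atomic_store_n(&s->locked, 0, __ATOMIC_RELEASE);
}
```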

maia-s · 2025-09-03T06:40:48Z

I can add those if Sam and Ryan want that.

Compare and swap would be a bit awkward without ordering arguments since it takes two, one for success and one for failure. E.g. you can use Release on success and Relaxed on failure so you don't pay for a sync if it's not needed. Tbf it'd be a bit awkward anyway since the orderings should be compile time constants or they may fall back to SeqCst.
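To illustrate the two orderings, here is a sketch of pushing onto a lock-free stack with the GCC/Clang builtin, which takes a success ordering and a failure ordering (`node`, `head` and `push` are hypothetical names for this example):

```c
#include <stddef.h>

typedef struct node {
    struct node *next;
    int value;
} node;

static node *head; /* top of the stack, only accessed atomically */

static void push(node *n) {
    node *old = __atomic_load_n(&head, __ATOMIC_RELAXED);
    do {
        /* Link the new node in before trying to publish it. */
        n->next = old;
    } while (!__atomic_compare_exchange_n(&head, &old, n, 1 /* weak */,
                                          __ATOMIC_RELEASE,   /* success: publish the node */
                                          __ATOMIC_RELAXED)); /* failure: nothing published, retry */
}
```

On failure the builtin reloads the current head into `old`, so the loop retries without paying for a barrier it doesn't need.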

icculus · 2025-09-04T22:23:28Z

My opinion is we hold off on that and just do the Relaxed version, but I'll defer to Sam on this.

slouken · 2025-09-04T23:08:05Z

I'm going to bump this out to the 3.6 milestone where we can think about it in a relaxed manner. ;)

slouken · 2025-09-13T14:16:36Z

I think if we're going to add various memory semantics then we should add functions that take a parameter that specifies what memory semantic it has.

maia-s · 2025-09-15T14:10:50Z

> I think if we're going to add various memory semantics then we should add functions that take a parameter that specifies what memory semantic it has.

Okay, I've added a commit that implements this. Not sure what to name them, but I went with a WithOrder suffix for now. C11 atomics use an _explicit suffix for these (e.g. atomic_load_explicit).
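A rough sketch of what an ordering parameter can look like on top of the GCC/Clang builtins. The enum and function names here are made up for illustration and are not the names used in the commit. Switching on the parameter keeps the ordering passed to each builtin a compile-time constant.

```c
/* Hypothetical ordering enum, not the PR's actual API. */
typedef enum {
    ORDER_RELAXED,
    ORDER_ACQUIRE,
    ORDER_RELEASE,
    ORDER_SEQ_CST
} mem_order;

static inline int load_with_order(int *p, mem_order order) {
    switch (order) {
    case ORDER_RELAXED: return __atomic_load_n(p, __ATOMIC_RELAXED);
    case ORDER_ACQUIRE: return __atomic_load_n(p, __ATOMIC_ACQUIRE);
    /* Release is not valid for a load, so anything else uses SeqCst. */
    default:            return __atomic_load_n(p, __ATOMIC_SEQ_CST);
    }
}

static inline void store_with_order(int *p, int v, mem_order order) {
    switch (order) {
    case ORDER_RELAXED: __atomic_store_n(p, v, __ATOMIC_RELAXED); break;
    case ORDER_RELEASE: __atomic_store_n(p, v, __ATOMIC_RELEASE); break;
    /* Acquire is not valid for a store, so anything else uses SeqCst. */
    default:            __atomic_store_n(p, v, __ATOMIC_SEQ_CST); break;
    }
}
```

When a call site passes a constant such as `ORDER_ACQUIRE` and the function is inlined, the compiler can fold the switch down to the single matching builtin.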

maia-s force-pushed the relaxed-atomics branch 2 times, most recently from 743b6ff to ffc4d29 Compare August 28, 2025 13:27

sezero reviewed Aug 28, 2025

maia-s force-pushed the relaxed-atomics branch 2 times, most recently from ed8d14b to 87aeb75 Compare August 28, 2025 13:57

maia-s force-pushed the relaxed-atomics branch from 87aeb75 to 03dbf73 Compare August 28, 2025 14:43

maia-s force-pushed the relaxed-atomics branch from 03dbf73 to 6d5910e Compare August 29, 2025 16:27

slouken added this to the 3.6.0 milestone Sep 4, 2025

Add relaxed atomics

a324e39

maia-s force-pushed the relaxed-atomics branch from 6d5910e to 5207f6b Compare September 15, 2025 14:08

maia-s force-pushed the relaxed-atomics branch 5 times, most recently from 8cb2098 to 939d6cb Compare September 15, 2025 14:23

maia-s force-pushed the relaxed-atomics branch 4 times, most recently from 2451c66 to 2e52bad Compare September 16, 2025 09:46

Ordered atomics

d56a50d

maia-s force-pushed the relaxed-atomics branch from 2e52bad to d56a50d Compare September 16, 2025 09:48
