Atomic record fields #39
I strongly support this, it looks like a good solution for a problem that seemed intractable to me at first glance.
For example, if we have `type 'a t = { atomic mutable state : 'a st }`,
then `[%atomic.field state]` is a polymorphic constant of type
`('a t, 'a st) Atomic.Field.t`.
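To make the "polymorphic constant" reading concrete, here is a minimal sketch that runs on a stock compiler. It is only a model, not the RFC's implementation: the `field` type, its `Field` constructor, and the `get`/`set` helpers are hypothetical names, and the atomic field is simulated with a boxed `Atomic.t`.

```ocaml
(* Hypothetical model: a field descriptor is a projection to a boxed atomic
   cell. The real proposal would use an unboxed offset instead. *)
type ('r, 'v) field = Field of ('r -> 'v Atomic.t)

type 'a st = Idle | Busy of 'a

(* Stand-in for [type 'a t = { atomic mutable state : 'a st }]. *)
type 'a t = { state : 'a st Atomic.t }

(* Stand-in for [%atomic.field state]: one constant, usable at every ['a]. *)
let state : ('a t, 'a st) field = Field (fun r -> r.state)

let get (Field proj) r = Atomic.get (proj r)
let set (Field proj) r v = Atomic.set (proj r) v

let () =
  let ri : int t = { state = Atomic.make Idle } in
  let rs : string t = { state = Atomic.make (Busy "x") } in
  set state ri (Busy 1);
  (* The same [state] constant is used at two different instances. *)
  assert (get state ri = Busy 1);
  assert (get state rs = Busy "x")
```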
I wonder how hard it would be to have a third type parameter, which would be instantiated to a different (opaque) type for every field of a record. This way, when two fields have the same type, it would be impossible to mistakenly access the wrong one.
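A rough sketch of what such a third parameter could look like, using a phantom tag per field. Everything here is an assumption for illustration: the three-parameter `field` type, the `head_tag`/`tail_tag` types, and the boxed `Atomic.t` cells standing in for atomic fields.

```ocaml
(* Hypothetical three-parameter descriptor: ['tag] is a phantom parameter that
   differs for every field, even when the fields have the same type. *)
type ('r, 'v, 'tag) field = Field of ('r -> 'v Atomic.t)

type q = { head : int Atomic.t; tail : int Atomic.t }

(* One abstract tag type per field. *)
type head_tag
type tail_tag

let head : (q, int, head_tag) field = Field (fun r -> r.head)
let tail : (q, int, tail_tag) field = Field (fun r -> r.tail)

let get (Field proj) r = Atomic.get (proj r)

(* A helper that only accepts the [head] descriptor. *)
let read_head (f : (q, int, head_tag) field) r = get f r

(* [read_head tail r] is rejected by the type checker, because
   [tail_tag] and [head_tag] are distinct abstract types. *)
```

Generic helpers such as `get` stay polymorphic in the tag, so the extra parameter only matters for code that wants to pin down one specific field.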
We can expose atomic arrays with the same mechanism: a builtin type
`'a Atomic.Array.t`, with a primitive function to build an index from
an integer:
I like the idea of having a uniform interface for atomic fields and arrays. It would also make it natural to use the same runtime primitives for both, reducing the number of primitives to create.
We can reuse the `Field.t` type and its atomic-operation API, but note
that the index here may be outside the bounds of the array, and will
have to be checked on access. (The index is not bound-checked by the
`index` function, as it may be called on different arrays later.) For
`Field.t` values at record type, no such bound checking is necessary.
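The "unchecked at creation, checked on access" behaviour can be sketched as follows. This is a simplified model under stated assumptions: `Atomic_array` and its functions are hypothetical names, the array is an ordinary array of boxed `Atomic.t` cells, and the index is a plain wrapper rather than a `Field.t`.

```ocaml
module Atomic_array = struct
  (* Model: an ordinary array of boxed atomic cells. *)
  type 'a t = 'a Atomic.t array

  (* An index is just an integer; it is not tied to any particular array. *)
  type index = Index of int

  let make n v : 'a t = Array.init n (fun _ -> Atomic.make v)

  (* No bound check here: the index may be used on several arrays later. *)
  let index i = Index i

  let check arr i =
    if i < 0 || i >= Array.length arr then
      invalid_arg "Atomic_array: index out of bounds"

  (* The check happens on every access, against the array actually used. *)
  let get arr (Index i) =
    check arr i;
    Atomic.get (Array.unsafe_get arr i)

  let set arr (Index i) v =
    check arr i;
    Atomic.set (Array.unsafe_get arr i) v
end
```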
I’m reminded (by @polytypic) that non-bound-checked accesses would be desirable as well for performance-critical code.
Thinking more about this, I notice that there is a weakness in my proposal related to arrays. When using `Atomic.Field.t` for records, we do not want any bound-checking, as the indices are in-bounds by construction. So the `Atomic.Field.t` primitives will not perform any bound-checking. We can reuse the same primitives to implement array access, but the specific proposal I made in the PR leads to non-bound-checked array accesses by default, not to bound-checked array accesses. I think this is wrong: I'm happy to offer both, but bound-checked accesses should be the default.

I see two approaches to solve this:

1. We could of course expose different primitives in `Atomic.Array.t`, which would perform bound-checking first and then call the same runtime primitives as those of the `Atomic.Field.t` API. (More precisely: for each access function, offer a checked and an unchecked version in `Atomic.Array.t`.) I think that this would be perfectly reasonable, and/but it duplicates the API surface which I thought we could avoid.
2. We could change our `index` function to have `index : 'a t -> int -> ('a Array.t, 'a) Field.t`, and also `val unsafe_index : int -> ('a Array.t, 'a) Field.t`, with `index` in charge of doing bound-checking against a specific array. This is a bit wonky: the index you get has been checked against one array and it is in fact unsafe/unchecked to use it against another array. So only the fragment `Atomic.Field.<op> arr (Atomic.Array.index arr i)` is safe, where you use the index for indexing right away, on the same array. This avoids the API duplication but the API is a bit weird. This would be acceptable for an advanced, expert-users-seeking-performance-only internal API, but I am not fond of the extra complexity (it is more work to explain and to use), so I would rather support (1).

My gut feeling is thus that the best move would be to remove this paragraph on supporting arrays, which was tentative and does not pan out as I had hoped, and simply bite the bullet and offer an `Atomic.Array` submodule without any of the field stuff getting involved. (... and we could also have `Atomic.ContendedArray` if we wanted to, for a fourth copy of the list of primitives.)
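A small sketch of approach (2) may help show why it is described as wonky. The module `Arr` and all of its functions are hypothetical, and arrays are modeled with boxed `Atomic.t` cells; the point is only where the bound check sits.

```ocaml
module Arr = struct
  type 'a t = 'a Atomic.t array

  (* The index remembers nothing about the array it was checked against. *)
  type idx = Idx of int

  let make n v : 'a t = Array.init n (fun _ -> Atomic.make v)

  (* Checked constructor: validates [i] against one specific array. *)
  let index arr i =
    if i < 0 || i >= Array.length arr then invalid_arg "Arr.index";
    Idx i

  (* Unchecked constructor, for callers that know [i] is in bounds. *)
  let unsafe_index i = Idx i

  (* Access performs no check at all: it trusts the index. *)
  let get arr (Idx i) = Atomic.get (Array.unsafe_get arr i)
end

(* Safe pattern: build the index and use it right away on the same array. *)
let read arr i = Arr.get arr (Arr.index arr i)
```

Nothing in the types prevents an `idx` checked against a long array from being passed to `Arr.get` on a shorter one, which is exactly the unchecked situation described above.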
I’m reminded (by @polytypic) that non-bound-checked accesses would be desirable as well for performance-critical code.
See Saturn's spsc_queue for example. All accesses are made with an index modulo the size of the array, so there is no need for a bound check.
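The reasoning can be illustrated with a tiny ring-buffer sketch. This is not Saturn's actual code: the names `size_exponent`, `mask`, `slot`, `write` and `read` are chosen for illustration, and the buffer is assumed to have a power-of-two size so that masking is equivalent to a modulo.

```ocaml
(* Assumption: the buffer size is a power of two. *)
let size_exponent = 4
let size = 1 lsl size_exponent
let mask = size - 1

(* Any counter, however large (or even negative), maps to [0, size). *)
let slot counter = counter land mask

let buffer = Array.init size (fun _ -> Atomic.make 0)

(* Because [slot] is always in bounds, the array access itself needs no
   bound check. *)
let write counter v = Atomic.set (Array.unsafe_get buffer (slot counter)) v
let read counter = Atomic.get (Array.unsafe_get buffer (slot counter))
```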
@bclement-ocp has an alternate proposal, where we use an

```ocaml
(* in library code *)
module Atomic : sig
  ...
  module Loc : sig
    type 'a t
    val get : 'a t -> 'a
    val compare_and_set : 'a t -> 'a -> 'a -> bool
  end
end

(* in user code *)
type t = {
  id : int;
  mutable state : st [@atomic];
}

let rec evolve v =
  let cur = v.state in (* <- strong atomic read *)
  let next = next_state cur in
  if not (Atomic.Loc.compare_and_set [%atomic.field v.state] cur next)
  then evolve v
```

Upsides of this proposal:

Downside: internally this is represented as a (value, offset)
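For comparison, here is how the same retry loop can be written today with a boxed `Atomic.t` field, which compiles on a stock compiler. The `st` constructors and `next_state` are made up for the example; the proposal above would let `state` live directly in the record instead of in a separate atomic cell.

```ocaml
(* Hypothetical state machine, only for illustration. *)
type st = Start | Running | Done

let next_state = function
  | Start -> Running
  | Running -> Done
  | Done -> Done

(* Today's encoding: the field holds a pointer to a separate atomic cell. *)
type t = {
  id : int;
  state : st Atomic.t;
}

let rec evolve v =
  let cur = Atomic.get v.state in
  let next = next_state cur in
  (* Retry if another domain changed [state] between the read and the CAS. *)
  if not (Atomic.compare_and_set v.state cur next) then evolve v
```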
I thought more about the suggestion of @bclement-ocp, and I see four

Running example:

```ocaml
type t = {
  id : int;
  mutable state : st [@atomic];
}
```

The designs

1. A new two-parameter type
Even if you do 2 you might as well define the types like:

```ocaml
type ('r, 'v) field
type 'v loc = Loc : 'r * ('r, 'v) field -> 'v loc
```

Then people can still get 1 if they like.
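As a sanity check that this packing works, here is a runnable sketch. The GADT is the one from the comment; the concrete representation of `field` (a projection to a boxed `Atomic.t`) and the `node`/`count` example are assumptions made only so the code compiles on a stock compiler.

```ocaml
(* Assumed representation of a field descriptor, for illustration only. *)
type ('r, 'v) field = Field of ('r -> 'v Atomic.t)

(* The existential pair from the comment: a record together with one of its
   fields, with the record type ['r] hidden. *)
type 'v loc = Loc : 'r * ('r, 'v) field -> 'v loc

(* One-parameter operations, defined by unpacking the pair. *)
let get : type v. v loc -> v =
  fun (Loc (r, Field proj)) -> Atomic.get (proj r)

let compare_and_set : type v. v loc -> v -> v -> bool =
  fun (Loc (r, Field proj)) seen next ->
    Atomic.compare_and_set (proj r) seen next

(* Hypothetical record with one atomic field. *)
type node = { count : int Atomic.t }

let count : (node, int) field = Field (fun n -> n.count)
let loc_of_count n = Loc (n, count)
```

Note that in this model building a `loc` allocates the pair, whereas the two-parameter `field` value can be shared across all records of the same type.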
I like how (2) fits simply into the compiler, using only things that already exist (extension points, types, compiler primitives). In comparison, (4) seems to add a lot of complexity for a gain that is not so big to me, namely no API duplication. I think we can easily explain to our users that
Thanks, this is useful feedback. I agree that (2) is simpler. It's also probably not too hard to implement (4) on top of (2) later if we decide to do it, so in a sense (2) is a minimum viable proposal.
For the record, I disagree with the Speaking of this, I like the fact that atomic arrays can use the same access functions as atomic record fields; but to satisfy the separation of atomic and non-atomic locations, we need to allow atomic locations to be created only from a Edit.: I just realized that it’s already the case in your proposal.
In my mind, anything to do with atomics, relaxed memory semantics and their compilation fall under the broad category of memory model. Recall that atomics in OCaml were introduced in the PLDI 2017 “memory model” paper. I’m using the tag only to find memory model related RFCs easily.
In trunk, all atomic functions exposed in the runtime are also exposed as language primitives in our intermediate representations (lambda, clambda). But except for `Patomic_load`, which benefits from dedicated code generation, they are all transformed into C calls on all backends. The present PR simplifies the code noticeably by removing the intermediate primitives, by producing C calls directly in lambda/translprim.ml. This reduces the amount of boilerplate to modify to implement atomic record fields (ocaml/RFCs#39). Co-authored-by: Clément Allain
Uses of existing atomic primitives %atomic_foo, which act on single-field references, are now translated into %atomic_foo_field, which act on a pointer and an offset -- passed as separate arguments. In particular, note that the arity of the internal Lambda primitive Patomic_load increases by one with this patchset. (Initially we renamed it into Patomic_load_field but this creates a lot of churn for no clear benefits.) We also support primitives of the form %atomic_foo_loc, which expect a pair of a pointer and an offset (as a single argument), as we proposed in the RFC on atomic fields ocaml/RFCs#39 (but there is no language-level support for atomic record fields yet). Co-authored-by: Clément Allain
This type will be used for ['a Atomic.Loc.t], as proposed in the RFC ocaml/RFCs#39. We implement this here to be able to use it in the stdlib later, after a bootstrap.
There is now an implementation of this RFC -- the
I was asked to comment about whether there might be some reason to prefer the more complex initial design, with two type parameters representing an offset into a block of some type, over the design with a single type parameter that logically holds a pair of a block and offset into that block. My initial thought was that there probably aren't that many applications for the added generality of having first-class offsets to atomic fields. However, I then went running last night and realized there might actually be some practically important use cases for the more complex design.

Consider the following FGL sketch:

```ocaml
module FGL : sig
  type state [@@immediate]
  val init : unit -> state
  val lock : 'block -> ('block, state) Atomic.Loc.t -> unit
  val unlock : 'block -> ('block, state) Atomic.Loc.t -> unit
end = struct
  type state = int
  (* ... *)
end
```

The idea is that an FGL takes only one word and can e.g. be embedded into the nodes of a data structure without allocations. This allows efficient Fine-Grained Locking approaches. Only a couple of bits are needed to represent the state of an efficient lock and the queue of awaiters can be stored externally just like with a futex. The

Addition: You actually want to use both the block address and the field offset to identify the FGL. This way you will be able to have multiple FGLs per block. You might also use bits from the offset as part of the hash value.

For reference:
You can, of course, implement something similar with the other design:

```ocaml
module FGL : sig
  type state [@@immediate]
  val init : unit -> state
  val lock : 'unique_key -> state Atomic.Loc.t -> unit
  val unlock : 'unique_key -> state Atomic.Loc.t -> unit
end = struct
  type state = int
  (* ... *)
end
```

One issue here is now that the
Thanks @polytypic. Your reply got me thinking about how we can move from one design to the other (the value-and-offset design and the offset-only design).

From the offset-only design, it is possible to implement the value-and-offset design with a GADT, as pointed out by @lpw25 at #39 (comment). From the value-and-offset design, it is possible to recover the field, but not its typing information:

The part that is giving me pause about the offset-only design is the fact that the type-inference code is more complex, as we have to deal with type-based disambiguation, and probably require a type annotation from the user. It is easy to type-check

```ocaml
# type t = { x : int };;
# type u = { x : int };;
# let (vt : t), (vu : u) = { x = 0 }, { x = 1 };;
val vt : t = {x = 0}
val vu : u = {x = 1}
# let access r f = f r;;
val access : 'a -> ('a -> 'b) -> 'b = <fun>
# access vt (fun r -> r.x), access vu (fun r -> r.x);;
- : int * int = (0, 1)
# #principal true;;
# access vt (fun r -> r.x), access vu (fun r -> r.x);;
Warning 18 [not-principal]: this type-based field disambiguation is not principal.
- : int * int = (0, 1)
```
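For illustration, this is the kind of user-side annotation that sidesteps the problem: when the record type is written on the function parameter, the field is resolved from the annotation rather than from inference order, so we would expect no principality warning. The helper names `get_t` and `get_u` are made up for the example.

```ocaml
type t = { x : int }
type u = { x : int }

let access r f = f r

(* The annotation on [r] fixes which [x] is meant, independently of how
   [access] is later applied. *)
let get_t = fun (r : t) -> r.x
let get_u = fun (r : u) -> r.x

let vt : t = { x = 0 }
let vu : u = { x = 1 }

let result = (access vt get_t, access vu get_u)
```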
I wouldn't expect this to be a practically significant impediment. I believe it is likely that the vast majority of uses of atomic record fields will be hidden inside data structure implementations written by people who are relative experts in these matters. The FGL example, which takes an offset as a parameter, also falls into this category. The main use case for FGL would be writing data structures with fine-grained locking, which requires some expertise, and would usually be hidden behind the data structure abstraction.
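To give a feel for the "one word of lock state" idea behind the FGL discussion, here is a deliberately naive sketch. It is a plain spin lock over a boxed `int Atomic.t`, not the futex-like design described earlier in the thread: there is no external queue of awaiters, and the module name `Spin_fgl` and `try_lock` are hypothetical.

```ocaml
module Spin_fgl : sig
  type state [@@immediate]
  val init : unit -> state
  val try_lock : state Atomic.t -> bool
  val lock : state Atomic.t -> unit
  val unlock : state Atomic.t -> unit
end = struct
  (* The whole lock state fits in one immediate word. *)
  type state = int

  let unlocked = 0
  let locked = 1

  let init () = unlocked

  (* Succeeds only if the lock was free. *)
  let try_lock cell = Atomic.compare_and_set cell unlocked locked

  (* Naive spinning; a real implementation would back off or park the
     waiter in an external queue instead. *)
  let rec lock cell = if not (try_lock cell) then lock cell

  let unlock cell = Atomic.set cell unlocked
end
```

With atomic record fields, the `state Atomic.t` argument would be replaced by a location pointing inside the node itself, so that embedding the lock costs no extra allocation.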
Today I thought more about how we would go for design one (

```ocaml
type 'v loc = Loc : 'r * ('r, 'v) Field.t -> 'v loc
```

There is no point in exposing atomic operations on

There are two upsides to having atomic operations take a

I see two clear downsides to

I considered three ways out of this quagmire:

Note: If we decide to go for a design with
In ocaml/ocaml#13707 I implement

Let me quote my preliminary conclusions at the end of the description of that PR:
A proposed design for atomic record fields.
Rendered version.
In summary: