Tagged integers #676
Comments
See also Ken Jin's thoughts on tagging, in a Google Doc and a PR (python/cpython#118450).
The tagging scheme used in that PR is compatible with this. It needs very few modifications to tag everything to
Shouldn't that be "(value = ptr-3)"? Or am I missing something?
It is
I found this https://coredumped.dev/2024/09/09/what-is-the-best-pointer-tagging-method/ article today, the lower byte tagging is interesting.
With deferred reference counting, we gain some freedom in how we represent references to objects.
One common optimization in VMs, from early Smalltalk days to V8 and JavaScriptCore, is tagged pointers.
The idea is to use one or more of the spare bits in a pointer to distinguish pointers from other values.
Given how ubiquitous integers are, they are the obvious candidate for tagging.
Tagging floats is also appealing, but is probably too awkward to be viable, although here's an idea on how it might be done.
Given that we plan to convert stack references from PyObject * to an opaque struct anyway, adding tagged ints becomes considerably easier.
Stack and heap references
Converting from a reference struct to a PyObject * is potentially expensive if the struct contains a tagged int, as we need to allocate and free boxed ints. To reduce the cost we want to minimize the conversions. That means that most collection objects should contain reference structs, not PyObject *s. We don't need to change them all at once though.
Ideally, the only conversions would occur at the boundary between the VM and third party code, although even that cost can be largely eliminated.
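To make the conversion cost concrete, here is a minimal sketch of what turning a reference struct into a plain object pointer might involve. Everything in it is an assumption for illustration: `sref_t`, `boxed_int`, `sref_from_int` and `sref_to_object` are hypothetical names, not CPython API, and the sketch assumes the low bit marks an unboxed int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a heap-allocated int object (not PyLongObject). */
typedef struct { intptr_t value; } boxed_int;

/* Hypothetical reference struct; ASSUMES low bit 1 marks an unboxed int. */
typedef struct { uintptr_t bits; } sref_t;

static sref_t sref_from_int(intptr_t v) {
    /* Shift as unsigned to avoid undefined behaviour on negative values. */
    return (sref_t){ ((uintptr_t)v << 1) | 1u };
}

/* Convert a reference to a plain object pointer.
 * A real pointer is returned as-is. A tagged int has to be boxed, which
 * costs an allocation now and a free later; *needs_free tells the caller
 * whether it owns a temporary box. Returns NULL if allocation fails. */
static void *sref_to_object(sref_t r, bool *needs_free) {
    *needs_free = false;
    if ((r.bits & 1u) == 0) {
        return (void *)r.bits;          /* already a pointer: free of charge */
    }
    boxed_int *box = malloc(sizeof *box);
    if (box == NULL) {
        return NULL;
    }
    /* Assumes arithmetic right shift of signed values, as on mainstream compilers. */
    box->value = (intptr_t)r.bits >> 1;
    *needs_free = true;
    return box;
}
```

Only the tagged-int path pays for `malloc` and `free`, which is why keeping reference structs inside collections, rather than converting on every access, should reduce the overhead.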
The tagging scheme
The obvious scheme is to set the low bit to 1 for unboxed ints, storing the actual value in the remaining bits.
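A minimal sketch of this low-bit scheme follows. The names (`ref_t`, `INT_TAG`, `ref_from_int` and so on) are hypothetical and not taken from CPython. It assumes object pointers are at least 2-byte aligned, so that their low bit is always 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opaque reference: either an object pointer or an unboxed int. */
typedef struct { uintptr_t bits; } ref_t;

#define INT_TAG ((uintptr_t)1)   /* low bit 1 => unboxed int */

static inline bool ref_is_int(ref_t r) {
    return (r.bits & INT_TAG) != 0;
}

static inline ref_t ref_from_int(intptr_t v) {
    /* The value must fit in one bit fewer than a machine word.
     * Shift as unsigned to avoid undefined behaviour on negative values. */
    return (ref_t){ ((uintptr_t)v << 1) | INT_TAG };
}

static inline intptr_t ref_to_int(ref_t r) {
    /* Assumes arithmetic right shift of signed values, as on mainstream compilers. */
    return (intptr_t)r.bits >> 1;
}

static inline ref_t ref_from_ptr(void *p) {
    return (ref_t){ (uintptr_t)p };   /* aligned pointer: low bit already 0 */
}

static inline void *ref_to_ptr(ref_t r) {
    return (void *)r.bits;
}
```

With this layout pointers are used unchanged, while every int operation has to strip and re-apply the tag.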
It actually makes more sense, however, to tag pointers and not ints, for a few reasons.
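One well-known benefit of giving ints the all-zero tag, shown here as an illustration and not as a restatement of the reasons from the issue, is that tagged ints can be added directly without untagging. The sketch below uses hypothetical names (`pref_t`, `PTR_TAG`, `pref_add_ints`) and relies on `__builtin_add_overflow`, a GCC/Clang builtin, for overflow detection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reference struct: low bit 1 => pointer, low bit 0 => int. */
typedef struct { uintptr_t bits; } pref_t;

#define PTR_TAG ((uintptr_t)1)

static inline bool pref_is_int(pref_t r) {
    return (r.bits & PTR_TAG) == 0;
}

static inline pref_t pref_from_int(intptr_t v) {
    return (pref_t){ (uintptr_t)v << 1 };        /* tag bits are zero */
}

static inline intptr_t pref_to_int(pref_t r) {
    /* Assumes arithmetic right shift of signed values. */
    return (intptr_t)r.bits >> 1;
}

static inline pref_t pref_from_ptr(void *p) {
    return (pref_t){ (uintptr_t)p + PTR_TAG };
}

static inline void *pref_to_ptr(pref_t r) {
    return (void *)(r.bits - PTR_TAG);
}

/* Add two tagged ints without untagging: (a << 1) + (b << 1) == (a + b) << 1.
 * Returns false on overflow so the caller can fall back to boxed ints. */
static inline bool pref_add_ints(pref_t a, pref_t b, pref_t *out) {
    intptr_t sum;
    if (__builtin_add_overflow((intptr_t)a.bits, (intptr_t)b.bits, &sum)) {
        return false;
    }
    out->bits = (uintptr_t)sum;
    return true;
}
```

The subtraction in `pref_to_ptr` looks like extra work, but when a field is read through the pointer, a compiler can usually fold the constant into the load's offset.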
If we want to reserve a second bit for tagging other values, or maybe marking deferred references, then we have the following tagging scheme: