Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tagged integers #676

Open
markshannon opened this issue May 8, 2024 · 5 comments
Open

Tagged integers #676

markshannon opened this issue May 8, 2024 · 5 comments

Comments

@markshannon
Copy link
Member

markshannon commented May 8, 2024

With deferred reference counting, we gain some freedom in how we represent references to objects.
One common optimization in VMs, from early Smalltalk days to V8 and Javascript Core, is tagged pointers.
The idea is to use one or more of the spare bits in a pointer to distinguish pointers from other values.

Given how ubiquitous integers are, they are the obvious candidate for tagging.
Tagging floats is also appealing, but is probably too awkward to be viable, but here's an idea on how it might be done

Given that we plan to convert stack references from PyObject * to an opaque struct anyway, adding tagged ints becomes considerably easier.

Stack and heap references

Converting from a reference struct to a PyObject * is potentially expensive if the struct contains a tagged int, as we need to allocate and free boxed ints. To reduce the cost we want to minimize the conversions. That means that most collection objects should contain reference structs, not PyObject *s. We don't need to change them all at once though.

Ideally, the only conversions would occur at the boundary between the VM and third party code. Although even that cost can be be largely eliminated

The tagging scheme

The obvious scheme is to set the low bit to 1 for unboxed ints, storing the actual value in the remaining bits.
It actually makes more sense, however to tag pointers and not ints for a few reasons.

  1. It means that we don't need to offset the int by 1 when performing arithmetic
  2. The cost of the offset for pointers is effectively zero, as almost all pointer dereferences include a constant offset. Changing the constant offset by one is free
  3. It help finds bugs. Casting a tagged pointer to an untagged pointer is incorrect, and by tagging pointers will almost certainly cause a crash

If we want to reserve a second bit for tagging other values, or maybe marking deferred references, then we have the following tagging scheme:

Tag Meaning
00 Unboxed int
01 Reserved
10 Reserved
11 Tagged pointer (value = ptr-1)
@gvanrossum
Copy link
Collaborator

See also Ken Jin's thoughts on tagging, in a Google Doc and a PR (python/cpython#118450).

@Fidget-Spinner
Copy link
Collaborator

The tagging scheme used in that PR is compatible with this. It needs very few modifications to tag everything to 0b11 like what this issue is suggesting.

@gvanrossum
Copy link
Collaborator

Tagged pointer (value = ptr-1)

Shouldn't that be "(value = ptr-3)"? Or am I missing something?

@markshannon
Copy link
Member Author

Tagged pointer (value = ptr-1)

Shouldn't that be "(value = ptr-3)"? Or am I missing something?

It is ptr-1: xxxx00 - 1 == yyyy11, where yyyy == xxxx - 1

@aivarsk
Copy link

aivarsk commented Sep 9, 2024

I found this https://coredumped.dev/2024/09/09/what-is-the-best-pointer-tagging-method/ article today, the lower byte tagging is interesting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants