@vstinner
Member

vstinner commented Oct 10, 2025

Move the public PyUnicodeWriter API and the private _PyUnicodeWriter API to a new Objects/unicode_writer.c file.

Rename a few helper functions to share them between unicodeobject.c and unicode_writer.c, such as resize_compact() or unicode_result().

@vstinner
Member Author

cc @serhiy-storchaka @malemburg

@malemburg
Member

As mentioned before, I don't think turning these parts into separate object files is a good idea.

Have you benchmarked the effect of putting the writer into a separate object file vs. keeping it in the unicodeobject.c file (via #includes)?

@vstinner
Member Author

> Have you benchmarked the effect of putting the writer into a separate object file vs. keeping it in the unicodeobject.c file (via #includes)?

I'm not sure if it's relevant since PyUnicodeWriter is not used by unicodeobject.c.

I ran a benchmark on repr(tuple) (which is implemented with PyUnicodeWriter): there is no impact on performance. The difference is just noise in the benchmark.

ref is the main branch and split is this PR:

| Benchmark       | ref     | split                 |
|-----------------|:-------:|:---------------------:|
| repr tuple-1    | 254 ns  | 265 ns: 1.04x slower  |
| repr tuple-10   | 786 ns  | 783 ns: 1.00x faster  |
| repr tuple-50   | 3.02 us | 3.03 us: 1.00x slower |
| repr tuple-100  | 5.78 us | 5.84 us: 1.01x slower |
| Geometric mean  | (ref)   | 1.01x slower          |

Benchmark hidden because not significant (1): repr tuple-5

Note: I didn't build with LTO+PGO optimizations, which should reduce the noise further.

import pyperf

runner = pyperf.Runner()
# Time repr() on tuples of increasing size; repr(tuple) is implemented with PyUnicodeWriter.
for size in (1, 5, 10, 50, 100):
    runner.timeit(f'repr tuple-{size}',
        setup=f't = (1,)*{size}',
        stmt='repr(t)')
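
As a rough sanity check on the "Geometric mean" row, the split/ref ratio can be recomputed from the rounded timings shown in the table above. This is only a sketch: it uses the four displayed rows, it does not include the hidden `repr tuple-5` result (whose values are not shown), and the helper name `geometric_mean_ratio` is made up for illustration, so it may not match exactly how pyperf computes its figure.

```python
import math

# Rounded timings in nanoseconds, copied from the table above as (ref, split).
# The hidden "repr tuple-5" benchmark is not included: its values are not shown.
timings_ns = {
    "repr tuple-1": (254.0, 265.0),
    "repr tuple-10": (786.0, 783.0),
    "repr tuple-50": (3020.0, 3030.0),
    "repr tuple-100": (5780.0, 5840.0),
}


def geometric_mean_ratio(timings):
    """Return the geometric mean of split/ref ratios (> 1.0 means split is slower)."""
    ratios = [split / ref for ref, split in timings.values()]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


gmean = geometric_mean_ratio(timings_ns)
print(f"{gmean:.2f}x")
```

With these rounded inputs the ratio comes out close to 1.01, which is consistent with the reported "1.01x slower".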

@malemburg
Member

malemburg commented Oct 14, 2025

Thanks for running the benchmark.

I'm more concerned about uses of unicodeobject.c code in the new unicode_writer.c than about uses of the writer APIs in unicodeobject.c.
