Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of http.cookiejar.join_header_words #128982

Closed
picnixz opened this issue Jan 18, 2025 · 5 comments
Closed

Improve performance of http.cookiejar.join_header_words #128982

picnixz opened this issue Jan 18, 2025 · 5 comments
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@picnixz
Copy link
Member

picnixz commented Jan 18, 2025

Feature or enhancement

Proposal:

Currently, we use if not re.search(r"^\w+$", v):. This can be replaced by if not v.isalnum() and '_' not in v. Benchmarks are as follows:

$ ./python -m pyperf timeit -s 'x = [[("text/plain", None), ("charset", "iso-8859-1\"")]]; from http.cookiejar import join_header_words as f' 'f(x)'
Mean +- std dev: [ref] 795 ns +- 8 ns -> [new] 604 ns +- 20 ns: 1.32x faster

It doesn't help having a compiled regex by the way.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@picnixz picnixz added performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jan 18, 2025
@picnixz picnixz self-assigned this Jan 18, 2025
encukou pushed a commit that referenced this issue Feb 26, 2025
…er_words` for an efficient alternative (GH-128983)

The function does not anymore rely on a regular expression
to find alphanumeric characters and underscores.
@encukou
Copy link
Member

encukou commented Feb 26, 2025

Thank you!

@encukou encukou closed this as completed Feb 26, 2025
@serhiy-storchaka
Copy link
Member

This is incorrect change. For example, it produces incorrect results for "foo;_" and "foo_bar".

@encukou
Copy link
Member

encukou commented Feb 26, 2025

Oof, thanks for noticing.
I'll revert and add a test.

encukou added a commit to encukou/cpython that referenced this issue Feb 26, 2025
…ejar.join_header_words` for an efficient alternative (pythonGH-128983)"

This reverts commit 56e1900.
encukou added a commit that referenced this issue Feb 26, 2025
…kiejar.join_header_words for an efficient alternative (GH-128983)" and add tests (GH-130584)

* Revert "gh-128982: Substitute regular expression in `http.cookiejar.join_header_words` for an efficient alternative (GH-128983)"

This reverts commit 56e1900.

* Add tests
@serhiy-storchaka
Copy link
Member

\w here looks arbitrary. And it is. There are other red flags. See #130631.

@picnixz
Copy link
Member Author

picnixz commented Mar 1, 2025

I'll close this one and we'll move to 130631

@picnixz picnixz closed this as not planned Won't fix, can't repro, duplicate, stale Mar 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants