<regex>: regex_traits<_Elem> uses an inadmissible value of type char_class_type to represent character class "w" #5242

Open
muellerj2 opened this issue Jan 17, 2025 · 0 comments

regex_traits<_Elem> uses static_cast<ctype_base::mask>(-1) to represent the character class "w":

_REGEX_CHAR_CLASS_NAME("w", static_cast<ctype_base::mask>(-1)),

This is an inadmissible choice, because it violates [re.grammar]/9:

The results from multiple calls to traits_inst.lookup_classname can be bitwise or'ed together and subsequently passed to traits_inst.isctype.

Specifically, or'ing the char_class_type for "w" with the char_class_type for any other character class always produces the value for "w" again, even if the combination should match more characters.
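The absorption can be illustrated with a small self-contained model. This is only a sketch: `mask_t`, `kSpace`, `kWord`, `is_word`, and `toy_isctype` are hypothetical stand-ins, not the STL's actual definitions, and the model assumes that the all-ones value is treated as a sentinel meaning "w" rather than as a set of individual class bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ctype_base::mask (the real type is implementation-defined).
using mask_t = std::uint16_t;

// Hypothetical bit for the "s" (whitespace) class.
constexpr mask_t kSpace = 0x0001;
// All bits set, mirroring static_cast<ctype_base::mask>(-1) used for "w".
constexpr mask_t kWord = static_cast<mask_t>(-1);

// Word characters in this toy model: ASCII alphanumerics plus underscore.
constexpr bool is_word(char ch) {
    return (ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'z') ||
           (ch >= 'A' && ch <= 'Z') || ch == '_';
}

// Toy isctype. Assumption: the all-ones mask is a sentinel for "w";
// any other mask is tested bit by bit (only the "s" bit is modeled here).
constexpr bool toy_isctype(char ch, mask_t m) {
    return m == kWord ? is_word(ch) : ((m & kSpace) != 0 && ch == ' ');
}

// Or'ing any class into the all-ones value leaves it unchanged.
static_assert((kWord | kSpace) == kWord, "the \"w\" value absorbs other classes");

// "s" alone matches a space, but the combination "w" | "s" does not,
// although [re.grammar]/9 requires the combined value to be usable.
static_assert(toy_isctype(' ', kSpace), "\"s\" matches a space");
static_assert(!toy_isctype(' ', static_cast<mask_t>(kWord | kSpace)),
              "\"w\" | \"s\" fails to match a space in this model");
```

In this model the information that "s" was requested is lost as soon as it is combined with "w", which is the behavior the issue describes.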

Additional remarks

I think resolving this issue will break ABI. However, it should be possible to mitigate the problems caused by this issue.
