Contributor

@anivar anivar commented Nov 29, 2025

Implements the ES2025 RegExp.escape() static method for safely escaping special regex characters in strings.

Implementation

  • Uses LambdaConstructor pattern for modern Rhino static method definition
  • Escapes syntax characters (. * + ? ^ $ | etc.) with backslash prefix
  • Escapes control characters (\t \n \v \f \r) with special sequences
  • Escapes other punctuators (,-=<>#&!%:;@~'`") using \xNN hex notation
  • Escapes initial digits/letters to prevent ambiguity with legacy sequences
  • Throws TypeError for non-string arguments (no automatic coercion)
  • Uses explicit hex codes (0x002c) for clarity and maintainability
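The rules listed above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the class name `RegExpEscapeSketch` and its helpers are hypothetical, the input is assumed to already be a Java `String` (so the TypeError check on the JavaScript side is not modeled), and whitespace, line terminator, and surrogate handling are left out.

```java
// Hypothetical sketch of the escaping rules in the description above.
// Not the PR's implementation; names and structure are assumptions.
final class RegExpEscapeSketch {
    // ECMAScript SyntaxCharacter set plus '/' : each gets a backslash prefix.
    private static final String SYNTAX_CHARS = "^$\\.*+?()[]{}|/";
    // "Other punctuators" from the description: each becomes \xNN.
    private static final String OTHER_PUNCTUATORS = ",-=<>#&!%:;@~'`\"";

    private static boolean isAsciiDigitOrLetter(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Every character escaped this way in this sketch lies in 0x21..0x7A,
    // so the hex form always has exactly two digits.
    private static String hexEscape(char c) {
        return "\\x" + Integer.toHexString(c);
    }

    // Returns the two-character control escape, or null if c is not one.
    private static String controlEscape(char c) {
        switch (c) {
            case '\t': return "\\t";
            case '\n': return "\\n";
            case 0x0B: return "\\v"; // vertical tab; Java has no '\v' literal
            case '\f': return "\\f";
            case '\r': return "\\r";
            default:   return null;
        }
    }

    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String control = controlEscape(c);
            if (i == 0 && isAsciiDigitOrLetter(c)) {
                out.append(hexEscape(c));        // initial digit or letter
            } else if (SYNTAX_CHARS.indexOf(c) >= 0) {
                out.append('\\').append(c);      // syntax character
            } else if (control != null) {
                out.append(control);             // control character
            } else if (OTHER_PUNCTUATORS.indexOf(c) >= 0) {
                out.append(hexEscape(c));        // other punctuator
            } else {
                out.append(c);                   // everything else unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a.b")); // \x61\.b
        System.out.println(escape("1+1")); // \x31\+1
        System.out.println(escape("x-y")); // \x78\x2dy
    }
}
```

Only the first character is subject to the digit/letter rule, which is why the trailing `b`, `1`, and `y` in the examples pass through unchanged.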

Testing

  • 15 Java unit tests created, all passing
  • test262 compliance: 19/20 tests passing (95%)
  • Overall RegExp test failures reduced from 975 to 956

Impact

Provides a standardized, spec-compliant way to escape user input for use in regular expressions, improving code safety and developer experience.

Adds support for RegExp.escape() which escapes special regex characters
in strings for safe use in regular expressions.

Implementation:
- Escapes syntax chars (.*+?^$|()[]{}\/) with backslash
- Escapes control chars (\t\n\v\f\r) with special sequences
- Escapes initial digits/letters with \xNN to prevent ambiguity
- Escapes surrogates with \uXXXX format
- Escapes other punctuators (,-=<>#&!%:;@~'`") with \xNN
- Escapes whitespace and line terminators appropriately
- Throws TypeError for non-string inputs
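The whitespace, line terminator, and surrogate items above are stated only briefly, so here is one possible reading as a sketch. It assumes the ES2025 behavior as I understand it: the string is walked by code point, only unpaired surrogates are escaped, values up to 0xFF use \xNN, and larger values use \uXXXX with lowercase hex. The class `HexEscapeSketch` and its methods are hypothetical and are not taken from this PR; in a full implementation the control escapes (\t \n \v \f \r) would be handled before this step.

```java
// Hypothetical sketch: hex and unicode escapes for whitespace, line
// terminators, and lone surrogates. Assumptions are noted in the lead-in.
final class HexEscapeSketch {
    // ECMAScript WhiteSpace and LineTerminator, approximated with
    // java.lang.Character: explicit code points plus Unicode category Zs.
    static boolean isEcmaWhiteSpaceOrLineTerminator(int cp) {
        switch (cp) {
            case 0x09: case 0x0A: case 0x0B: case 0x0C: case 0x0D:
            case 0xFEFF: case 0x2028: case 0x2029:
                return true;
            default:
                return Character.getType(cp) == Character.SPACE_SEPARATOR;
        }
    }

    static String encodeAll(String s) {
        StringBuilder out = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); ) {
            // codePointAt joins a valid surrogate pair into one code point
            // and returns an unpaired surrogate as its own value.
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            boolean loneSurrogate =
                    cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE;
            if (loneSurrogate || isEcmaWhiteSpaceOrLineTerminator(cp)) {
                // Two hex digits for small values, four otherwise.
                out.append(cp <= 0xFF
                        ? String.format("\\x%02x", cp)
                        : String.format("\\u%04x", cp));
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeAll("a b")); // a\x20b
    }
}
```

Iterating by code point rather than by `char` matters here: a well-formed surrogate pair (for example an emoji) is passed through untouched, while a stray half is escaped.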

Testing:
- 15 Java unit tests (all passing)
- test262: 19/20 tests passing (95% compliance)
- Manual verification confirms spec-compliant output
@anivar force-pushed the es2025-regexp-escape branch from 0149244 to 37218ff on November 29, 2025 22:48
RegExp test failures improved from 956 to 955 after rebasing on latest master.
Collaborator

@gbrail gbrail left a comment

Please do a "DRY" check on that character classification code. Thanks!

}

/** Check if character is a decimal digit (0-9) */
private static boolean isDecimalDigit(char c) {

I'm sure that we already have implementations of these checks either elsewhere in the codebase, or in the Java Character class. Can you please check first to see if we can reuse those or make them more generic rather than encode this here? I won't be surprised if we need a few new ones but checks like isWhiteSpace and isDecimalDigit are certainly not unique to this method. Thanks!
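One point that may help with this check: the `java.lang.Character` predicates are not drop-in equivalents of the ECMAScript character classes, so reuse may need a thin wrapper. The sketch below only demonstrates standard JDK behavior. The class `CharClassCheckSketch` and the helper `isAsciiDecimalDigit` are hypothetical, and whether Rhino already has equivalent helpers elsewhere is not something this example establishes.

```java
// Hypothetical sketch comparing java.lang.Character predicates with the
// narrower or broader sets that ECMAScript uses.
final class CharClassCheckSketch {
    // ECMAScript DecimalDigit is ASCII 0-9 only.
    static boolean isAsciiDecimalDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        char arabicIndicThree = '\u0663'; // U+0663, Unicode category Nd
        char noBreakSpace = '\u00A0';     // U+00A0, Unicode category Zs

        // Character.isDigit accepts any Nd digit, not just 0-9.
        System.out.println(Character.isDigit(arabicIndicThree));      // true
        System.out.println(isAsciiDecimalDigit(arabicIndicThree));    // false

        // ECMAScript WhiteSpace includes both TAB and NO-BREAK SPACE,
        // but neither JDK predicate accepts both.
        System.out.println(Character.isWhitespace(noBreakSpace));     // false
        System.out.println(Character.isSpaceChar(noBreakSpace));      // true
        System.out.println(Character.isWhitespace('\t'));             // true
        System.out.println(Character.isSpaceChar('\t'));              // false
    }
}
```

So a shared helper is still a reasonable outcome of the DRY check, but it would likely need to be an ECMAScript-specific one rather than a direct call to `Character.isDigit` or `Character.isWhitespace`.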
