RE: Eliminate Negative-Lookbehind for Leading Escapes #52

Open
4 tasks
tajmone opened this issue Feb 25, 2024 · 1 comment
Labels
💀 RegEx Syntax RegEx incompatible with new engine ⭐ RegEx Topic: Syntax RegExs

Comments

@tajmone
Owner

tajmone commented Feb 25, 2024

Many of the bad RegExs reported when running "Syntax Test - Regex Compatibility" (i.e. incompatible with the new ST RegEx engine, due to the presence of look-behind statements) seem to perform checks for the presence of a leading escape character.

The negative look-behind can safely be removed from such RegExs, provided that the syntax ensures that the context that handles escape sequences is always executed before them. In other words, we must be positive that no unhandled escape sequence can slip through and reach the contexts that currently rely on a negative look-behind for an escape backslash ((?<!\\)).

In order to achieve this, we first need to disentangle the order of execution of these contexts, to ensure that escapes are always handled before the contexts in question are attempted. In this respect, we might even consider moving the context that handles escape sequences into the prototype context, as long as we remember to manually disable the prototype in those contexts where escaping doesn't apply.
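The ordering idea can be illustrated outside of Sublime Text with a minimal Python sketch. It is not the syntax definition itself: the rule names, the `RULES` table, and both helper functions are hypothetical, and a single underscore stands in for any formatting delimiter. The point is only that when an escape rule is tried first at every position, the delimiter rule never sees an escaped character and needs no `(?<!\\)`.

```python
import re

# Hypothetical rule table: the escape rule comes first, so at every position
# it gets the chance to consume a backslash plus the character after it.
RULES = [
    ("escape", re.compile(r"\\.")),
    ("delimiter", re.compile(r"_")),  # no (?<!\\) needed here
]

def delimiter_positions(text):
    """Return indexes of unescaped underscores, trying RULES in order."""
    positions = []
    i = 0
    while i < len(text):
        for name, pattern in RULES:
            m = pattern.match(text, i)
            if m:
                if name == "delimiter":
                    positions.append(i)
                i = m.end()  # skip everything the rule consumed
                break
        else:
            i += 1  # plain character, no rule applies
    return positions

def lookbehind_positions(text):
    """Same question answered with the look-behind pattern, for comparison."""
    return [m.start() for m in re.finditer(r"(?<!\\)_", text)]

sample = r"a _b_ \_c"
print(delimiter_positions(sample))   # [2, 4]
print(lookbehind_positions(sample))  # [2, 4]
```

On this input both approaches agree. They are not guaranteed to agree on every input (a run of several backslashes is one case where they can differ), which is part of what the syntax tests would need to pin down.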

  • Wait until Issue Group One-liner Bracketed Constructs #53 is completed and implemented, since most of these contexts belong to that category, and its implementation will pave the way for these changes.
  • Identify all the contexts which use a negative look-behind to check for a leading escape character, and list them in this Issue (even if they also contain another look-behind assertion, unrelated to escapes).
  • Study how to rearrange context inclusions to ensure that the context handling escape sequences is always executed before the contexts identified above.
  • Implement enough syntax tests to cover as many edge cases and nested contexts as possible, to ensure that the syntax doesn't break, and that escape sequences are properly intercepted once the (?<!\\) look-behind patterns are dropped.
@tajmone tajmone added this to the Fix Bad RegExs milestone Feb 25, 2024
@tajmone tajmone added ⭐ RegEx Topic: Syntax RegExs 💀 RegEx Syntax RegEx incompatible with new engine labels Feb 25, 2024
@tajmone
Owner Author

tajmone commented Feb 25, 2024

Arguments Against Using prototype

Although adding the context that handles backslash escaping to the prototype context sounds like a tempting quick solution, there are some considerations to take into account:

  1. AsciiDoc also needs to account for "double escaping" (e.g. \\__func__), which escapes the two underscores after the backslashes, to ensure that neither of them is treated as a formatting delimiter.
  2. The way preprocessor directives are escaped is a bit more complicated: a single \ at the beginning of the line ensures that the entire directive is ignored and treated as raw text. Our syntax currently doesn't skip the entire directive, so some part of it might produce a false positive match (e.g. the square brackets being treated as a macro).
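The first point can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the rule names, the catch-all `\\.` escape pattern, and the dedicated `\\__` rule are hypothetical and are not taken from the current syntax. Delimiter pairing is not modelled at all. It only shows that a generic "backslash plus any character" rule reads \\__func__ as an escaped backslash followed by a live `__` delimiter, which is not the behaviour described above.

```python
import re

GENERIC_ESCAPE = re.compile(r"\\.")    # hypothetical catch-all escape rule
DOUBLE_ESCAPE = re.compile(r"\\\\__")  # hypothetical dedicated rule for \\__
DELIMITER = re.compile(r"__")

def scan(text, rules):
    """Tokenize text by trying each (name, pattern) rule in order."""
    tokens = []
    i = 0
    while i < len(text):
        for name, pattern in rules:
            m = pattern.match(text, i)
            if m:
                tokens.append((name, m.group()))
                i = m.end()
                break
        else:
            i += 1  # plain character, not reported
    return tokens

generic_only = [("escape", GENERIC_ESCAPE), ("delimiter", DELIMITER)]
with_double = [("double-escape", DOUBLE_ESCAPE)] + generic_only

sample = r"\\__func__"
# Generic rule alone: the leading "__" is wrongly reported as a delimiter.
print(scan(sample, generic_only))
# Dedicated rule first: the leading "__" is consumed together with the
# backslashes. The trailing "__" still shows up because pairing is ignored.
print(scan(sample, with_double))
```

A rule of the second kind has to take precedence over the generic escape, which is the sort of ordering constraint that may be awkward to express if the generic rule lives in prototype.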

Chances are that, once the context that handles escape sequences develops into a "smart context" that can account for the above, it might not play well with being included in prototype.

It might still be possible to split the handling of backslash escapes across different contexts, in order to meet all the mentioned goals, but it could turn out to be trickier than it seems at first. In any case, it's something that needs to be considered thoroughly, and covered by extensive syntax tests.

