
incorrect match with repeated group #300


Open
faceprint opened this issue Jan 8, 2024 · 3 comments

Comments

@faceprint

faceprint commented Jan 8, 2024

A regex with a repeated group and a lazily quantified space character doesn't work as I expect in CTRE. Boost and the online regex demo tools behave as expected. Here's as simple an example as I could easily create:

regex: (([A-Z][a-z]+ *?){1,4}) g

The intent is to match 1 to 4 capitalized words in front of a word starting with g.

https://gcc.godbolt.org/z/v6KWrasYv

CTRE fails to match multiple capitalized words, and only matches the last one.

@hanickadot
Owner

Thanks, will look into it. It will take a while as I'm currently swamped.

@sladecek

I am adding a simpler regex that reproduces the same bug. The problem is probably in the backtracking optimization, which does not work when two repeats are nested.

#include <string>
#include <iostream>
#include <ctre.hpp>

int main()
{

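    // Nested repeats: a lazy b*? inside a greedy (...)+ group.
    constexpr ctll::fixed_string regex{"(ab*?)+"};
    auto [match, grp1] = ctre::match<regex>("ab");
    if (match)
    {
        // Wrap the captured group in matching single quotes.
        std::cout
            << "match; grp1='" << grp1.to_string() << "'" << std::endl;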
    }
    else
    {
        std::cout << "no match" << std::endl;
    }
}

@EnigmaTriton

I stumbled upon a similar issue, and given @sladecek’s description of nested repeats, it looks like the same bug.

This comment is only meant as possible help in identifying the issue. If it does not help, it can be ignored. :)

I’m trying to isolate private keys in a PEM file, so I need to find -----BEGIN RSA PRIVATE KEY----- and similar headers (ENCRYPTED PRIVATE KEY, simply PRIVATE KEY, or any variation of this “prefix”, including multiple words before PRIVATE KEY).

First, I tried ctre::multiline_search<R"(-----BEGIN( ([A-Z0-9 ]+ )?PRIVATE KEY-----))">(input_data); which did not match (the outer capture group is there to match -----END\1; the inner parentheses are only there for the “prefix”).

Then, I tried ctre::multiline_search<R"(-----BEGIN( ([A-Z0-9]+ )*PRIVATE KEY-----))">(input_data); which works (although there is still some nesting involved).

For the moment, I will just track this issue and check whether a proposed fix for the previous expressions works for my use case. I will open a separate issue only if it proves to be unrelated.
