[Question] Is there an equivalent to regex_replace? #332

Open
jmarrec opened this issue Nov 8, 2024 · 4 comments

jmarrec commented Nov 8, 2024

I was wondering if there's a facility such as https://en.cppreference.com/w/cpp/regex/regex_replace in CTRE?

So far I have only been able to come up with something that isn't great:

static constexpr auto pattern = ctll::fixed_string{" "};

std::string str = get_string();

std::string result;
bool first = true;
for (auto match : ctre::split<pattern>(str)) {
    if (!first) {
        result += "_";
    } else {
        first = false;
    }
    result += std::string{match.get<0>()};
}
fmt::print("{}\n", result);

https://godbolt.org/z/9PMzef97v

A better version, but one that requires C++23 (both views::join_with and ranges::to are C++23 library features): https://godbolt.org/z/9vcTWePGe

pbs3141 commented Nov 19, 2024

See #250

DubbleClick commented Mar 10, 2025

This... works. It's not good and definitely shouldn't be used, but it kind of works for simple cases.

template <typename CharT, ctll::fixed_string Pattern, typename... Modifiers>
constexpr std::basic_string<CharT> ctre_simple_regex_replace(const std::basic_string_view<CharT> input, const std::basic_string_view<CharT> replacement)
{
   auto r = ctre::split<Pattern, Modifiers...>(input);
   return r | std::views::as_rvalue | std::views::join_with(replacement) | std::ranges::to<std::basic_string<CharT>>();
}

template <ctll::fixed_string Pattern, ctll::fixed_string Replacement, typename... Modifiers>
constexpr std::string ctre_regex_replace(const std::string_view subject)
{
    // this is actually SLOWER than the stringstream version, so don't use it.
    static constexpr ctll::fixed_string special_tokens = R"(\$(\d+|'|&|`|\$))";
    if constexpr (!ctre::search<special_tokens>(Replacement) && false) {
        return ctre_simple_regex_replace<char, Pattern, Modifiers...>(subject, Replacement | std::ranges::to<std::string>());
    }

    std::string result;
    result.reserve(subject.size() * 2);
    auto search_start = subject.begin();
    const auto replacement = Replacement | std::ranges::to<std::string>(); // warning: conversion from char32_t to char

    for (auto match : ctre::search_all<Pattern, Modifiers...>(subject)) {
        // append between last match and this match
        result.append(&*search_start, match.begin() - search_start);
        std::string replaced_match(replacement);
        struct Pair {
            std::string key;
            std::string value;
        };
        std::vector<Pair> replacements;

        constexpr auto cnt = decltype(match)::count();
        // count() includes the whole match (group 0), so 9 capture groups means cnt == 10
        static_assert(cnt <= 10, "Only up to 9 capture groups are supported");
        constexpr auto has_escaped_dollar = ctre::search<R"(\$\$)">(Replacement);
        if constexpr (has_escaped_dollar)
            replacements.emplace_back("$$", "###ESCAPED_DOLLAR###");
        if constexpr (ctre::search<R"(\$&)">(Replacement))
            replacements.emplace_back("$&", match.to_string());
        if constexpr (ctre::search<R"(\$')">(Replacement))
            replacements.emplace_back("$'", std::string(match.end(), subject.end()));
        if constexpr (ctre::search<R"(\$`)">(Replacement))
            replacements.emplace_back("$`", std::string(subject.begin(), match.begin()));
        // if constexpr (ctre::search<R"(\$0)">(Replacement))
        //     replacements.emplace_back("$0", match.to_string());
        if constexpr (ctre::search<R"(\$1)">(Replacement) && cnt > 1)
            replacements.emplace_back("$1", match.template get<1>().to_string());
        if constexpr (ctre::search<R"(\$2)">(Replacement) && cnt > 2)
            replacements.emplace_back("$2", match.template get<2>().to_string());
        if constexpr (ctre::search<R"(\$3)">(Replacement) && cnt > 3)
            replacements.emplace_back("$3", match.template get<3>().to_string());
        if constexpr (ctre::search<R"(\$4)">(Replacement) && cnt > 4)
            replacements.emplace_back("$4", match.template get<4>().to_string());
        if constexpr (ctre::search<R"(\$5)">(Replacement) && cnt > 5)
            replacements.emplace_back("$5", match.template get<5>().to_string());
        if constexpr (ctre::search<R"(\$6)">(Replacement) && cnt > 6)
            replacements.emplace_back("$6", match.template get<6>().to_string());
        if constexpr (ctre::search<R"(\$7)">(Replacement) && cnt > 7)
            replacements.emplace_back("$7", match.template get<7>().to_string());
        if constexpr (ctre::search<R"(\$8)">(Replacement) && cnt > 8)
            replacements.emplace_back("$8", match.template get<8>().to_string());
        if constexpr (ctre::search<R"(\$9)">(Replacement) && cnt > 9)
            replacements.emplace_back("$9", match.template get<9>().to_string());
        if constexpr (has_escaped_dollar)
            replacements.emplace_back("###ESCAPED_DOLLAR###", "$");

        for (const auto& [key, value] : replacements) {
            size_t pos = 0;
            while ((pos = replaced_match.find(key, pos)) != std::string::npos) {
                replaced_match.replace(pos, key.length(), value);
                pos += value.length();
            }
        }

        // append replaced match
        result.append(replaced_match);
        search_start = match.end();
    }

    // Add remaining text
    if (search_start != subject.end()) {
        const auto remaining_length = std::distance(search_start, subject.end());
        result.append(&*search_start, remaining_length);
    }
    return result;
}
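
One weakness of the sequential find/replace passes above is that text inserted by an earlier pass is scanned again by later ones, so a captured value that itself contains "$2" (or an input containing the placeholder string) would be rewritten. A single left-to-right pass over the format avoids that. The following is a minimal sketch under stated assumptions: `expand_format` is a hypothetical helper, not a CTRE API, it takes the already extracted pieces as plain string views, and it only handles single-digit group references.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of CTRE): expands $$, $&, $`, $' and $1..$9
// in one pass, so substituted text is never rescanned.
// groups[0] is the whole match, groups[1..9] are the capture groups.
std::string expand_format(std::string_view format,
                          std::string_view prefix,  // text before the match ($`)
                          std::string_view suffix,  // text after the match ($')
                          const std::vector<std::string_view>& groups)
{
    std::string out;
    for (std::size_t i = 0; i < format.size(); ++i) {
        const char c = format[i];
        if (c != '$' || i + 1 == format.size()) {
            out += c;  // ordinary character, or a lone '$' at the very end
            continue;
        }
        const char next = format[i + 1];
        if (next == '$') {
            out += '$';
        } else if (next == '&') {
            if (!groups.empty()) out += groups[0];
        } else if (next == '`') {
            out += prefix;
        } else if (next == '\'') {
            out += suffix;
        } else if (next >= '1' && next <= '9') {
            const auto index = static_cast<std::size_t>(next - '0');
            if (index < groups.size()) out += groups[index];  // missing group expands to nothing
        } else {
            out += c;  // unknown token: keep the '$' and handle `next` normally
            continue;
        }
        ++i;  // skip the token character that was just consumed
    }
    return out;
}
```

Inside the `search_all` loop, the prefix, suffix and group views would come from the match object; how best to collect them generically is left open here.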

@DubbleClick

And it's still (a lot) faster than std::regex for simple replacements. For long replacements, however, it's a terrible idea.

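// Benchmark; also needs ctre_regex_replace from the previous comment.
#include <chrono>
#include <iostream>
#include <ranges>
#include <regex>
#include <string>
#include <string_view>
#include <vector>
#include <ctre.hpp>

using namespace std::string_literals; // for the "..."s suffix below

const auto global_subject = "RegExr was created by gskinner.com. \n"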
    "Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\n"
    "The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\n"
    "Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English."s;

constexpr auto max_iterations = 10000;

void benchmark_ctre()
{
    constexpr ctll::fixed_string pattern = R"((\b\w+)\s+(\w+\b))";
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < max_iterations; ++i) {
        auto result = ctre_regex_replace<pattern, "_">(global_subject);
        if (i == 0) {
            //std::cout << result << std::endl;
        }
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "simple CTRE time: " << std::chrono::duration<double>(end - start).count() << "s" << std::endl;
    start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < max_iterations; ++i) {
        auto result = ctre_regex_replace<pattern, "$2 <-> $1">(global_subject);
        if (i == 0) {
            //std::cout << result << std::endl;
        }
    }
    end = std::chrono::high_resolution_clock::now();
    std::cout << "complex CTRE time: " << std::chrono::duration<double>(end - start).count() << "s" << std::endl;
}

void benchmark_std_regex()
{
    const std::regex pattern(R"((\b\w+)\s+(\w+\b))");

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < max_iterations; ++i) {
        auto result = std::regex_replace(global_subject, pattern, "_");
        if (i == 0) {
            //std::cout << result << std::endl;
        }
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "simple std::regex time: " << std::chrono::duration<double>(end - start).count() << "s" << std::endl;
    start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < max_iterations; ++i) {
        auto result = std::regex_replace(global_subject, pattern, "$2 <-> $1");
        if (i == 0) {
            //std::cout << result << std::endl;
        }
    }
    end = std::chrono::high_resolution_clock::now();
    std::cout << "complex std::regex time: " << std::chrono::duration<double>(end - start).count() << "s" << std::endl;
}

int main()
{
    benchmark_ctre();
    benchmark_std_regex();
    return 0;
}

Output:

simple CTRE time: 0.0669281s
complex CTRE time: 0.118469s
simple std::regex time: 1.54931s
complex std::regex time: 1.76555s

For replacements that greatly inflate the size of the resulting string, though, the advantage shrinks to the point where using std::regex is almost preferable.

Example replacement:

const auto global_subject = "RegExr was created by gskinner.com. \n"
    "Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\n"
    "The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\n"
    "Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\n"
    "The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\n"
    "Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\n"
    "The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\n"
    "Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\n"
    "The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\n"
    "Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\n"
    "The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\n"
    "Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\n"
    "The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\n"
    "Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\n"
    "The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\n"
    "Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\n"
    "The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\n"
    "Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\n"
    "The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\n"
    "Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\n"
    "The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\n"
    "Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\n"
    "The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\n"
    "Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\n"
    "The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\n"
    "Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English."s;
const auto replacement = "$' $2 <-> $1 <-> $& $`"s; // the replacement format, not the regex

Output:

simple CTRE time: 0.508804s
complex CTRE time: 11.9429s
simple std::regex time: 12.1643s
complex std::regex time: 23.6284s
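
The four timing loops in the benchmark follow the same shape, so they could be factored into one helper. This is a sketch only: `time_seconds` is a hypothetical name, and it uses std::chrono::steady_clock rather than high_resolution_clock because steady_clock is guaranteed to be monotonic, which is what interval measurements need. Like the original loops, it does nothing to stop the optimizer from discarding unused results.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` `iterations` times and returns the
// elapsed wall-clock time in seconds, measured on a monotonic clock.
template <typename Work>
double time_seconds(int iterations, Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

Each benchmark case then becomes a single call that passes a lambda wrapping the replace call, with the returned value printed next to its label.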

@DubbleClick

@hanickadot I know that it's not at the top of your priority list because you want to stay compatible with all sorts of ranges, but having three built-in replacement options for 8-bit, 16-bit and 32-bit wide characters would be very convenient. You would probably also come up with a vastly superior solution to mine.
