I really think the documentation should clearly state that this isn't a packrat parser #1081

rlpowell · 2025-02-05T07:06:29Z

rlpowell
Feb 5, 2025

Because almost all PEG parsers are packrat parsers, it never occurred to me that Pest wouldn't be one. That's why I spent, uh, a lot of hours on #1080: I was trying to get enough information out of Pest to understand why something that takes https://github.com/lojban/ilmentufa/ (PEGJS) about 0.1 seconds was taking Pest ~390 seconds. (Fun fact: the input "klama" (that's the entire input, 5 characters) takes Pest 9 seconds, while the input "zarci" (again, 5 characters) takes Pest 390 seconds. They both follow approximately the same path through the grammar; I have no idea what the issue is.)

For those of us who have been around PEGs for a while (I was around when the original Ford paper was published), packrat is assumed, even though it's not required.

I enjoyed the work I did on that PR, and I learned a bunch, but given that Pest isn't going to be able to handle my grammar (I think if you look at https://github.com/lojban/ilmentufa/blob/master/camxes.peg , you'll agree that automatic optimization probably isn't going to do the trick), that time was time that did not advance me towards my actual goals.

I think somewhere prominent y'all should say something like "WARNING: Pest does not use packrat parsing, it instead tries to handle exponential parsing cases with automated optimizations. This is known to not work in some cases; see #685 ".

tomtau · 2025-02-06T00:30:40Z

tomtau
Feb 6, 2025
Maintainer

I opened this issue on the book repo: pest-parser/book#57

Having said that, the fact that pest doesn't use packrat parsing now doesn't mean it won't use it in the future... I feel a bit doubtful about the performance overhead, especially in the context of pest3 which may have more overhead from the typed tree generation, so it may be potentially interesting to revisit the packrat vs automatic optimizations tradeoff.

rlpowell · 2025-02-09T18:49:28Z

rlpowell
Feb 9, 2025
Author

So just for my own amusement, I tried the stupidest memoization I could think of, to see if it helped. I figured hey, I already did the tracing stuff, which means that a single call is wrapping every rule check, perfect place to put memoization, right? The diffs are at https://gist.github.com/rlpowell/43ff4ab1550004b8a802406b49f7b4d2 , but all it really is is I added:

memo: Box<HashMap<String, &'static str>>,

to ParserState (yes, it's a &str, I was going for quick and dirty) and lines like this to the tracing module wherever a production failed:

 (*new_state.memo).insert(memo_string.clone(), "Err");

memo_string there is the rule plus the position.

This took my 5 character pathological example from ~390 seconds to 0.4 seconds. (Doing the equivalent thing with successful productions just doesn't work, and given how well failed-only worked I didn't bother to figure out why.)
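For readers who want to see the effect in isolation, here is a minimal, self-contained sketch of failure-only memoization. It is not Pest code and not the code from the gist: the `Matcher` type, the toy grammar, and the `(rule name, position)` tuple key (in place of the `String` key described above) are all assumptions made for illustration. The grammar is chosen so that a failing rule gets retried at the same position by the next alternative, which is the case a failure cache helps with.

```rust
use std::collections::HashSet;

/// Hypothetical toy matcher for the single-rule PEG
///   R <- "(" R ")" "!" / "(" R ")" "?" / "a"
/// On inputs like "(((a)))" the first two alternatives both re-parse the
/// inner R before failing, so the work doubles at every nesting level.
struct Matcher<'a> {
    input: &'a [u8],
    /// (rule name, position) pairs already known to fail.
    failed: HashSet<(&'static str, usize)>,
    use_memo: bool,
    /// How many times the body of R was actually evaluated.
    evaluations: u64,
}

impl<'a> Matcher<'a> {
    fn new(input: &'a str, use_memo: bool) -> Self {
        Matcher {
            input: input.as_bytes(),
            failed: HashSet::new(),
            use_memo,
            evaluations: 0,
        }
    }

    /// Match one literal byte at `pos`, returning the next position.
    fn lit(&self, pos: usize, byte: u8) -> Option<usize> {
        if self.input.get(pos) == Some(&byte) {
            Some(pos + 1)
        } else {
            None
        }
    }

    /// The sequence "(" R ")" followed by `suffix`.
    fn wrapped(&mut self, pos: usize, suffix: u8) -> Option<usize> {
        let p = self.lit(pos, b'(')?;
        let p = self.rule_r(p)?;
        let p = self.lit(p, b')')?;
        self.lit(p, suffix)
    }

    /// Rule R, with an optional cache of failures only.
    fn rule_r(&mut self, pos: usize) -> Option<usize> {
        if self.use_memo && self.failed.contains(&("R", pos)) {
            return None; // memo hit: R is already known to fail here
        }
        self.evaluations += 1;
        let mut result = self.wrapped(pos, b'!');
        if result.is_none() {
            result = self.wrapped(pos, b'?');
        }
        if result.is_none() {
            result = self.lit(pos, b'a');
        }
        if result.is_none() && self.use_memo {
            self.failed.insert(("R", pos));
        }
        result
    }
}

/// Parse `depth` nested parentheses around "a"; return the match result
/// and the number of rule evaluations.
fn run(depth: usize, use_memo: bool) -> (Option<usize>, u64) {
    let input = format!("{}a{}", "(".repeat(depth), ")".repeat(depth));
    let mut m = Matcher::new(&input, use_memo);
    let result = m.rule_r(0);
    (result, m.evaluations)
}

fn main() {
    let (_, plain) = run(16, false);
    let (_, memo) = run(16, true);
    println!("evaluations without memo: {plain}");
    println!("evaluations with failure memo: {memo}");
}
```

Without the cache, the evaluation count roughly doubles with each extra nesting level; with it, the count grows linearly, because each failing `(rule, position)` pair is evaluated only once. Successful matches are deliberately not cached here, mirroring the failed-only experiment.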

So, yeah. I'd prefer to stay with Pest, partly because sunk costs and partly because it's the most popular Rust PEG parser, so if there's a version of this that y'all would be willing to mainline, please let me know; I have zero interest in maintaining my own fork.

If you do want me to polish this up, I'd love a pointer to a real-time place to discuss it, as that pretty clearly isn't the Discord.

FWIW, it turns out ( https://docs.rs/peg/latest/peg/#caching-and-left-recursion ) that rust-peg is also not a packrat parser by default, but you can mark individual productions for caching. It seems very Rust-y that the top two PEG parsers are both like "make your grammar performant; that's not our job". :) rust-peg also has left-recursion support via caching, which my project also needs, so if Pest doesn't work out I'll go there. But I dunno, maybe y'all want to do a similar thing where productions can be marked for caching.

1 reply

tomtau Feb 10, 2025
Maintainer

> If you do want me to polish this up, I'd love a pointer to a real-time place to discuss it, as that pretty clearly isn't the Discord.

@rlpowell thanks for looking into this. It's great that even the simplest memoization works for that grammar.

I think there are two distinct follow-ups:

1. For the existing pest codebase, it'd be good to run the benchmarks at https://github.com/pest-parser/pest/tree/master/grammars/benches to see the memoization performance impact:
   - if it doesn't look too bad, it could go to the pest main repo under default
   - if it could have a significant negative parsing performance impact in some grammars, it could go to the pest main repo under a "memo" feature flag
2. Including it in the next breaking version: "memoization and left-recursion support" (pest3#24). As I said, I felt a bit doubtful about the original decision, given it leads to some accidental complexity (an extra requirement on user grammars, the need for the extra incomplete ad-hoc grammar validation and optimization code, etc.), so it's worth revisiting it in that context: does it diverge from the original pest2 and get memo by default? Or would it make sense to include it on a per-annotated-rule basis like in rust-peg?

rlpowell · 2025-02-09T18:55:50Z

rlpowell
Feb 9, 2025
Author

Just for amusement, the output from parsing those two 5-character strings (two separate runs) had 4400 memo hits.

Yes, I'm aware the grammar is terrible; my plan is to set up a testing suite of inputs to the grammar and their results so I can tweak it and make sure I'm not making changes that change the output.
