Replies: 3 comments 1 reply
-
I opened this issue on the book repo: pest-parser/book#57 Having said that, the fact that pest doesn't use packrat parsing now doesn't mean it won't use it in the future... I feel a bit doubtful about the performance overhead, especially in the context of pest3 which may have more overhead from the typed tree generation, so it may be potentially interesting to revisit the packrat vs automatic optimizations tradeoff. |
Beta Was this translation helpful? Give feedback.
-
So just for my own amusement, I tried the stupidest memoization I could think of, to see if it helped. I figured hey, I already did the tracing stuff, which means that a single call is wrapping every rule check, perfect place to put memoization, right? The diffs are at https://gist.github.com/rlpowell/43ff4ab1550004b8a802406b49f7b4d2 , but all it really is is I added:
to ParserState (yes, it's a &str, I was going for quick and dirty) and lines like this to the tracing module wherever a production failed:
This took my 5 character pathological example from ~390 seconds to 0.4 seconds. (Doing the equivalent thing with successful productions just doesn't work, and given how well failed-only worked I didn't bother to figure out why.) So, yeah. I'd prefer to stay with Pest, partly because sunk costs and partly because it's the most popular Rust PEG parser, so if there's a version of this that y'all would be willing to mainline, please let me know; I have zero interest in maintaining my own fork. If you do want me to polish this up, I'd love a pointer to a real-time place to discuss it, as that pretty clearly isn't the Discord. FWIW, it turns out ( https://docs.rs/peg/latest/peg/#caching-and-left-recursion ) that rust-peg is also not a packrat parser by default, but you can mark individual productions for caching. It seems very Rust-y that the top two PEG parsers are both like "make your grammar performant that's not our job". :) rust-peg also has left-recursion support via caching, which my project also needs, so if Pest doesn't work out I'll go there. But I dunno, maybe y'all want to do a similar thing where productions can be marked for caching |
Beta Was this translation helpful? Give feedback.
-
Just for amusement, the output from parsing those two 5-character strings (two separate runs) had 4400 memo hits. Yes, I'm aware the grammar is terrible; my plan is to set up a testing suite of inputs to the grammar and their results so I can tweak it and make sure I'm not making changes that change the output. |
Beta Was this translation helpful? Give feedback.
-
Because almost all PEG parsers are packrat parsers, it never occurred to me that Pest wouldn't be one, which is why I spent uh a lot of hours on #1080 , because I was trying to get enough information out of Pest to understand why something that takes https://github.com/lojban/ilmentufa/ (PEGJS) about 0.1 seconds was taking Pest ~390 seconds. (Fun fact: the input "klama" (that's the entire input, 5 characters) takes Pest 9 seconds, the input "zarci" (again, 5 characters) takes Pest 390 seconds; they both follow approximately the same path through the grammar, I have no idea what the issue is).
For those of us who have been around PEGs for a while (I was around when the original Ford paper was published), packrat is assumed, even though it's not required.
I enjoyed the work I did on that PR, and I learned a bunch, but given that Pest isn't going to be able to handle my grammar (I think if you look at https://github.com/lojban/ilmentufa/blob/master/camxes.peg , you'll agree that automatic optimization probably isn't going to do the trick), that time was time that did not advance me towards my actual goals.
I think somewhere prominent y'all should say something like "WARNING: Pest does not use packrat parsing, it instead tries to handle exponential parsing cases with automated optimizations. This is known to not work in some cases; see #685 ".
Beta Was this translation helpful? Give feedback.
All reactions