Codex/add lalr parsing path to parser.py#354
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a significant and well-engineered improvement to the parser by adding a fast LALR parsing path with a fallback to the existing Earley parser for robustness. The implementation is thorough, including necessary grammar adjustments, backend selection logic, and caching. The accompanying documentation updates and new tests are excellent and greatly improve the project's quality.
However, I found a critical logical bug in the new transformation logic within the goals method in vibeprolog/parser.py. This change incorrectly alters the logical evaluation of combined conjunction and disjunction goals, which could lead to incorrect program behavior. I have also included a minor suggestion to improve code readability.
| # When a leading conjunction was captured as part of a disjunction, | ||
| # peel off the first goal to match the intended goal separation. | ||
| if len(result) == 1: | ||
| candidate = result[0] | ||
| if isinstance(candidate, Compound) and candidate.functor == ';': | ||
| left, right = candidate.args | ||
| if isinstance(left, (Compound, ParenthesizedComma)) and getattr(left, "functor", None) == ',': | ||
| leading_terms = self._collect_comma_terms(left) | ||
| if len(leading_terms) >= 2: | ||
| first_goal = leading_terms[0] | ||
| if isinstance(first_goal, List): | ||
| return result | ||
| rebuilt_left = self._rebuild_comma_chain(leading_terms[1:]) | ||
| if rebuilt_left is not None: | ||
| result = [first_goal, Compound(";", (rebuilt_left, right))] | ||
| return result |
There was a problem hiding this comment.
This transformation introduces a logical bug. It incorrectly changes the associativity of conjunctions and disjunctions. The code transforms a goal of the form (A, B) ; C into a sequence of goals A, (B ; C).
These two forms are not logically equivalent:
(A, B) ; Cmeans(A AND B) OR CA, (B ; C)meansA AND (B OR C)
Consider the case where A is false and C is true:
- The original expression
(false, B) ; trueevaluates totrue. - The transformed expression
false, (B ; true)evaluates tofalse.
According to standard Prolog operator precedence, the comma (,) binds more tightly than the semicolon (;), so a, b ; c is parsed as (a, b) ; c. This should be treated as a single goal for the engine to resolve. The previous behavior was correct, and this new logic should be removed.
| preferred_backend = backend or self._preferred_backend | ||
| backend_order = ( | ||
| ("lalr", "earley") if preferred_backend == "auto" else (preferred_backend,) | ||
| ) |
There was a problem hiding this comment.
This logic to determine the backend_order is a bit dense. It can be rewritten using a standard if/else block to improve readability, in line with the style guide's principle that code should be easy to understand.
| preferred_backend = backend or self._preferred_backend | |
| backend_order = ( | |
| ("lalr", "earley") if preferred_backend == "auto" else (preferred_backend,) | |
| ) | |
| preferred_backend = backend or self._preferred_backend | |
| if preferred_backend == "auto": | |
| backend_order = ("lalr", "earley") | |
| else: | |
| backend_order = (preferred_backend,) |
References
- Code should be easy to understand for all team members. (link)
There was a problem hiding this comment.
⚠️ 1 Issue Found
| Severity | Issue | Location |
|---|---|---|
| CRITICAL | Removing OP_SYMBOL from atom breaks parsing of graphic atoms in LALR mode | vibeprolog/parser.py:1938 |
Recommendation: Address critical issue before merge
Review Details
Files: vibeprolog/parser.py (1 issue)
Checked: Security, bugs, performance, error handling
Inline Comment:
CRITICAL: Removing OP_SYMBOL from atom breaks parsing of graphic atoms in LALR mode
Graphic atoms (sequences of graphic characters not defined as operators) cannot be parsed as atoms in LALR mode because OP_SYMBOL is removed from the atom rule. This breaks parsing of valid Prolog atoms like '++' or other graphic sequences.
def _prepare_grammar_for_backend(self, grammar: str, backend: str) -> str:
return grammar # Keep grammar the same for both backends
Or find an alternative way to resolve lexer conflicts without breaking atom parsing.
No description provided.