-
Notifications
You must be signed in to change notification settings - Fork 534
Compactify syntax diagrams #1818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In compacting grammar productions, the `BorrowExpression` production needs to preserve the case that allows `&$EXPR`, but did not. Let's fix that.
Let's fix up the grammar compactions to follow the indentation and spacing style we use for this grammar (especially as the indentation is preserved into the rendering), remove superfluous parentheses, and add line breaks in alternations and elsewhere to add clarity.
Using a number of strategies, there are some other grammar compactions and regularizations that stand out as similarly possible, so let's do those. In doing these, we prefer to pick a factoring of the grammar that calls for the least lookahead or backtracking.
When I get a chance I can take a closer look, but I want to exercise caution with this. Some of these productions were intentionally written in a "not maximally minimized" form, as certain alternations were intended to have semantically meaningful distinctions (but for brevity didn't bother giving them distinct names, which might be worthwhile). Some of these are definitely improvements, though, so will need to go through them carefully. |
Yes, I debated too on a number of these. It's something to discuss. |
RegOperand -> (ParamName `=`)? | ||
( | ||
DirSpec `(` RegSpec `)` Expression | ||
| DualDirSpec `(` RegSpec `)` DualDirSpecExpression | ||
| `sym` PathExpression | ||
| `const` Expression | ||
| `label` `{` Statements? `}` | ||
RegOperand -> | ||
(ParamName `=`)? ( | ||
( | ||
DirSpec `(` RegSpec `)` | ||
| `const` | ||
) Expression | ||
| DualDirSpec `(` RegSpec `)` DualDirSpecExpression | ||
| `sym` PathExpression | ||
| `label` `{` Statements? `}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an example of one that I debated on for awhile, in terms of whether the transformation was worth it. The question I asked myself was whether transforming it in this way added insight.
In the end, I decided I think it does actually. E.g., we need to decide on what syntax to use for injecting constant pointers -- e.g. whether const
or sym
is better for that -- and in that context, syntactically, it is interesting that Expression
shares both const
and e.g. in(<reg>)
as common prefixes, but not sym
.
That is perhaps the main thing to consider here. This PR now approximates a locally most-compressed factoring, by trying to keep shared prefixes and shared postfixes together (with a preference for shared prefixes). We could adopt that as our guiding rule, and then also say that anywhere that makes us unhappy, we should just be naming the production separately anyway, and that might work out OK. The diagrams often end up looking better by naming more things. |
In the follow on commits, I had updated the text for these productions to make them as readable and consistent as I could, given this system of trying to consolidate shared prefixes and suffixes where possible. In discussion today, however, @ehuss and I decided that these still weren't good enough (at commit 13e008327). In particular, the statement of the productions that is there now in the Reference, while it repeats a lot of terminals and nonterminals, is framed in such a way that each alternation represents a different "kind" of item (e.g., many of the alternations match the different AST nodes we generate in the compiler), and the grammar looks more like a readable syntactic Rust pattern. I might take another pass to see about naming more of the nonterminals, but unless we come up with a better idea here, I think most of this is going to get reverted in this PR. So in that context, over in #1805 (comment), @joshtriplett, you had said:
I wonder whether you might have some ideas, in looking at this PR, about what we might be able to do here. |
I'm working on revision that leans more heavily into how the compiler thinks about it. I'm going through the I'm finding, in doing this, that I do tend to prefer more names to fewer. One of the goals of the Reference is to provide common language for talking about things, and this does seem maybe helpful in that respect. I'm going with names similar to those in the compiler, when those are reasonable, as it is always convenient when there is shared nomenclature between lang and compiler. |
There'll always be tension between the graphical and textual representation. While I agree with #1805 (comment) in spirit, I'd find it perfectly reasonable to reject this PR (or at least 7d720ec), as it makes the textual representation much more difficult to understand than it eases the graphical one. In addition, I myself would find PRs such as this very hard to review. |
@traviscross I think the principle of giving things distinct names if they can't be unified is a good idea. |
Let's just close out this PR as we're not going with the approach here. |
Fixes #1805
This makes the grammar rules more compact by removing superfluous elements. It leads to much simpler syntax diagrams at the expense of more difficult to follow (nested) text representation. See #1805 for some examples.
I visually compared every before-and-after to make sure that the grammar rules represent the same syntax. Please give it firm look anyway.