Skip to content

Compactify syntax diagrams #1818

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Conversation

lukaslueg
Copy link
Contributor

Fixes #1805

This makes the grammar rules more compact by removing superfluous elements. It leads to much simpler syntax diagrams at the expense of more difficult to follow (nested) text representation. See #1805 for some examples.

I visually compared every before-and-after to make sure that the grammar rules represent the same syntax. Please give it firm look anyway.

@rustbot rustbot added the S-waiting-on-review Status: The marked PR is awaiting review from a maintainer label May 8, 2025
In compacting grammar productions, the `BorrowExpression` production
needs to preserve the case that allows `&$EXPR`, but did not.  Let's
fix that.
Let's fix up the grammar compactions to follow the indentation and
spacing style we use for this grammar (especially as the indentation
is preserved into the rendering), remove superfluous parentheses, and
add line breaks in alternations and elsewhere to add clarity.
Using a number of strategies, there are some other grammar compactions
and regularizations that stand out as similarly possible, so let's do
those.

In doing these, we prefer to pick a factoring of the grammar that
calls for the least lookahead or backtracking.
@ehuss
Copy link
Contributor

ehuss commented May 12, 2025

When I get a chance I can take a closer look, but I want to exercise caution with this. Some of these productions were intentionally written in a "not maximally minimized" form, as certain alternations were intended to have semantically meaningful distinctions (but for brevity didn't bother giving them distinct names, which might be worthwhile).

Some of these are definitely improvements, though, so will need to go through them carefully.

@traviscross
Copy link
Contributor

traviscross commented May 12, 2025

Yes, I debated too on a number of these. It's something to discuss.

Comment on lines -76 to +84
RegOperand -> (ParamName `=`)?
(
DirSpec `(` RegSpec `)` Expression
| DualDirSpec `(` RegSpec `)` DualDirSpecExpression
| `sym` PathExpression
| `const` Expression
| `label` `{` Statements? `}`
RegOperand ->
(ParamName `=`)? (
(
DirSpec `(` RegSpec `)`
| `const`
) Expression
| DualDirSpec `(` RegSpec `)` DualDirSpecExpression
| `sym` PathExpression
| `label` `{` Statements? `}`
Copy link
Contributor

@traviscross traviscross May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an example of one that I debated on for awhile, in terms of whether the transformation was worth it. The question I asked myself was whether transforming it in this way added insight.

In the end, I decided I think it does actually. E.g., we need to decide on what syntax to use for injecting constant pointers -- e.g. whether const or sym is better for that -- and in that context, syntactically, it is interesting that Expression shares both const and e.g. in(<reg>) as common prefixes, but not sym.

@traviscross
Copy link
Contributor

traviscross commented May 12, 2025

...as certain alternations were intended to have semantically meaningful distinctions (but for brevity didn't bother giving them distinct names, which might be worthwhile).

That is perhaps the main thing to consider here. This PR now approximates a locally most-compressed factoring, by trying to keep shared prefixes and shared postfixes together (with a preference for shared prefixes). We could adopt that as our guiding rule, and then also say that anywhere that makes us unhappy, we should just be naming the production separately anyway, and that might work out OK. The diagrams often end up looking better by naming more things.

@traviscross
Copy link
Contributor

traviscross commented May 13, 2025

In the follow on commits, I had updated the text for these productions to make them as readable and consistent as I could, given this system of trying to consolidate shared prefixes and suffixes where possible. In discussion today, however, @ehuss and I decided that these still weren't good enough (at commit 13e008327). In particular, the statement of the productions that is there now in the Reference, while it repeats a lot of terminals and nonterminals, is framed in such a way that each alternation represents a different "kind" of item (e.g., many of the alternations match the different AST nodes we generate in the compiler), and the grammar looks more like a readable syntactic Rust pattern.

I might take another pass to see about naming more of the nonterminals, but unless we come up with a better idea here, I think most of this is going to get reverted in this PR. So in that context, over in #1805 (comment), @joshtriplett, you had said:

All of the proposed diagram simplifications look great to me. If the resulting text is less readable, I think we have some options to try to improve the readability.

I wonder whether you might have some ideas, in looking at this PR, about what we might be able to do here.

@traviscross
Copy link
Contributor

traviscross commented May 14, 2025

I'm working on revision that leans more heavily into how the compiler thinks about it. I'm going through the rustc AST and naming more productions to model the state that it keeps. This is, I suspect, another principled way to go about it, in a bit of the opposite direction.

I'm finding, in doing this, that I do tend to prefer more names to fewer. One of the goals of the Reference is to provide common language for talking about things, and this does seem maybe helpful in that respect. I'm going with names similar to those in the compiler, when those are reasonable, as it is always convenient when there is shared nomenclature between lang and compiler.

@lukaslueg
Copy link
Contributor Author

There'll always be tension between the graphical and textual representation. While I agree with #1805 (comment) in spirit, I'd find it perfectly reasonable to reject this PR (or at least 7d720ec), as it makes the textual representation much more difficult to understand than it eases the graphical one. In addition, I myself would find PRs such as this very hard to review.
Having said all that, I'd encourage to reconsider if the diagrams should have their own visual-transformation-pass.

@joshtriplett
Copy link
Member

@traviscross I think the principle of giving things distinct names if they can't be unified is a good idea.

@traviscross
Copy link
Contributor

Let's just close out this PR as we're not going with the approach here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: The marked PR is awaiting review from a maintainer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify syntax diagrams
5 participants