Skip to content

Conversation

@titzer
Copy link
Contributor

@titzer titzer commented Dec 8, 2025

This implements parser support and (I think) the intended semantics that custom page sizes that are 0 or not powers of two are malformed, and powers of two that are not 1 or 65536 are invalid.

@rossberg PTAL. There is a shift/reduce conflict here due to the factoring of the pagetype/memorytype having to do with the syntactic sugar for data segments. I think it requires left-factoring the grammar and rearranging it a little; I wasn't sure if you a convention you were following for how that's done.

Copy link
Member

@rossberg rossberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this sort of parser conflict arises whenever you allow a parenthesised phrase (like pagetype here) to be empty that can be followed by another parenthesised clause. Because of the paren token, the parser would need a look-ahead of 2 to know what case it's in.

The only way I know to resolve this in an LALR(1) grammar is by not using an empty production but instead duplicating the use sites of the symbol, once with and once without the extra phrase. That's what I did in other such cases anyway.

A couple of high-level comments:

  • Given that the binary format for pagetype is in logarithmic representation, the AST should probably reflect that. That's what we do for alignment annotations as well. Then some weird cases (like PageT 0) are avoided by construction.

  • Please expand tabs.


pagetype :
| LPAR PAGESIZE NAT RPAR
{ let v' =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify this using the nat32 helper and Lib.Int.is_power_of_two, cf. the action for the align production.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using power_of_two now, will rework for nat32.

@titzer
Copy link
Contributor Author

titzer commented Dec 9, 2025

Yeah, this sort of parser conflict arises whenever you allow a parenthesised phrase (like pagetype here) to be empty that can be followed by another parenthesised clause. Because of the paren token, the parser would need a look-ahead of 2 to know what case it's in.

The only way I know to resolve this in an LALR(1) grammar is by not using an empty production but instead duplicating the use sites of the symbol, once with and once without the extra phrase. That's what I did in other such cases anyway.

Thanks, there was a quick fix.

A couple of high-level comments:

* Given that the binary format for pagetype is in logarithmic representation, the AST should probably reflect that. That's what we do for alignment annotations as well. Then some weird cases (like PageT 0) are avoided by construction.

I had considered this at first but started off using the size as an int. I'll rework it to use a nat32 of the logarithm.

* Please expand tabs.

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants