Refactor jsx mode in parser #7751

Merged · 16 commits merged into rescript-lang:master on Aug 6, 2025

Conversation

@nojaf (Member) commented Aug 2, 2025

I was experimenting with the parser related to JSX and noticed that we have a somewhat convoluted mechanism for handling /> and - in identifier names.

First, I created a token dump tool in res_parser, which was previously missing.

Currently, the parser employs a sequence of Scanner.set_jsx_mode p.Parser.scanner and Scanner.pop_mode p.scanner Jsx calls while processing elements to distinguish between parsing JSX and non-JSX code. However, this mode is mainly used in the following cases:

  • Allowing a - inside an identifier. This logic belongs in the parser, but it currently resides in the scanner, which feels inappropriate.
  • Combining a < + / into a </ token. I would prefer using a lookahead for when a < is encountered. This would clarify that it's specific to JSX parsing. Even though LessThanSlash exists, there is still a separate LessThan + Slash check, which makes the code a bit messy.

This is an effort to streamline the JSX mode.
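
To make the second point above concrete, here is a minimal sketch of the kind of lookahead meant. The scanner type, its fields, and peek_slash are hypothetical names used purely for illustration; the actual change works on the scanner in res_scanner.ml and uses a reconsider-style helper shown further down.

(* Hypothetical sketch: peek at the character right after the current '<'.
   If it is '/', the '<' starts a closing tag "</"; otherwise it starts a
   JSX child or a regular less-than. *)
type scanner = {src: string; mutable offset: int}

let peek_slash (s : scanner) : bool =
  s.offset < String.length s.src && s.src.[s.offset] = '/'

With a helper like this the decision can be made at the point where the parser consumes the <, instead of relying on a scanner-wide JSX mode.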

PS: to run the local analysis tests, I had to revert to the legacy clean command. This is for the best until we figure out #7707

| LessThan ->
(* Imagine: <div> <Navbar /> <
* is `<` the start of a jsx-child? <div …
* or is it the start of a closing tag? </div>
* reconsiderLessThan peeks at the next token and
* determines the correct token to disambiguate *)
let token = Scanner.reconsider_less_than p.scanner in
nojaf (Member, Author) commented:

This is what bothers me a bit: there is LessThanSlash above, yet we still need to do the reconsider_less_than call.

let attr_expr = parse_primary_expr ~operand:(parse_atomic_expr p) p in
Some (Parsetree.JSXPropValue ({txt = name; loc}, optional, attr_expr))
| _ -> Some (Parsetree.JSXPropPunning (false, {txt = name; loc})))
(* {...props} *)
| Lbrace -> (
Scanner.pop_mode p.scanner Jsx;
nojaf (Member, Author) commented:

This is rather confusing when you are in a nested jsx scenario:

<div>
  <p>
    {foo}
  </p>
</div>

Popping Jsx from p here is not enough: the Jsx mode pushed for <div> also has to be popped before we are actually out of Jsx mode.
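
To make the nesting problem concrete, here is a toy model of the stacked modes; it is not the scanner's actual code, just an illustration of why a single pop is not enough.

(* Toy model: <div> and <p> each push Jsx, so the single pop performed at the
   '{' of {foo} still leaves the scanner in Jsx mode. *)
type mode = Jsx | Regular

let in_jsx stack = List.mem Jsx stack

let () =
  let stack = [Jsx; Jsx] in    (* pushed for <div> and then <p> *)
  let stack = List.tl stack in (* the pop done for the {foo} child *)
  assert (in_jsx stack)        (* still in Jsx mode *)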

posCursor:[30:12] posNoWhite:[30:11] Found expr:[30:9->32:10]
JSX <di:[30:10->30:12] div[32:6->32:9]=...[32:6->32:9]> _children:None
posCursor:[30:12] posNoWhite:[30:11] Found expr:[30:9->30:12]
JSX <di:[30:10->30:12] > _children:None
nojaf (Member, Author) commented:

This change is because there is a slightly different AST for:

<div>
  <di
</div>

It used to be

[
  structure_item (A.res[1,0+0]..[3,12+6])
    Pstr_eval
    expression (A.res[1,0+0]..[3,12+6])
      Pexp_jsx_container_element "div" (A.res[1,0+1]..[1,0+4])
      jsx_props =
        []
      > [1,0+4]
      jsx_children =
        [
          expression (A.res[2,6+2]..[3,12+6])
            Pexp_jsx_container_element "di" (A.res[2,6+3]..[2,6+5])
            jsx_props =
              [
                div              ]
            > [3,12+5]
            jsx_children =
              []
        ]
]

and now is

[
  structure_item (A.res[1,0+0]..[3,12+6])
    Pstr_eval
    expression (A.res[1,0+0]..[3,12+6])
      Pexp_jsx_container_element "div" (A.res[1,0+1]..[1,0+4])
      jsx_props =
        []
      > [1,0+4]
      jsx_children =
        [
          expression (A.res[2,6+2]..[2,6+5])
            Pexp_jsx_unary_element "di" (A.res[2,6+3]..[2,6+5])
            jsx_props =
              []
        ]
]

I think this is more correct. The unary (new) versus container (old) distinction doesn't matter that much, since neither can really be determined for <di.
However, the container used to have a weird div prop, which it no longer has.

pkg-pr-new bot commented Aug 2, 2025

rescript

npm i https://pkg.pr.new/rescript-lang/rescript@7751

@rescript/darwin-arm64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/darwin-arm64@7751

@rescript/darwin-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/darwin-x64@7751

@rescript/linux-arm64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/linux-arm64@7751

@rescript/linux-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/linux-x64@7751

@rescript/win32-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/win32-x64@7751

commit: 0ea79fd

@nojaf nojaf marked this pull request as ready for review August 2, 2025 12:00
@nojaf nojaf requested a review from cristianoc August 2, 2025 12:00
@nojaf nojaf changed the title Add token dump printer Refactor jsx mode in parser Aug 2, 2025
@nojaf (Member, Author) commented Aug 4, 2025

Hi @cristianoc, sorry for my eagerness, could you take a look at these changes?

@cristianoc (Collaborator) commented:

Are there differences in whitespace behavior? Are these intended (for composite tokens)? Or are there possible ambiguities when single characters are tokenised in isolation?
I'm traveling so I could not take a look in detail, but these are the things that come to mind.

@nojaf (Member, Author) commented Aug 4, 2025

Or possible ambiguities when single characters are tokenised in isolation.

No, actually not, there is no other way the language can encounter </ besides JSX.
So that made me wonder if we needed the jsx mode in the first place.

Safe travels, will ask someone else for a review.
(Don't be shy to take a look at this once you are back 😇)

@zth , @shulhi , @aspeddro any volunteers?

@nojaf nojaf requested review from Copilot and removed request for cristianoc August 4, 2025 18:12
Copilot AI left a comment:

Pull Request Overview

This PR refactors the JSX mode handling in the ReScript parser by removing the convoluted JSX mode mechanism from the scanner and moving JSX-specific logic to the parser. The refactoring streamlines JSX parsing by eliminating the need for Scanner.set_jsx_mode and Scanner.pop_mode calls throughout the codebase, and introduces lookahead functionality for better JSX token handling.

Key changes:

  • Removes JSX mode from the scanner and replaces it with parser-level JSX identifier handling
  • Introduces lookahead functions (peekMinus, peekSlash) for better JSX token disambiguation
  • Adds a token debugger tool for development purposes

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Reviewed files:

  • compiler/syntax/src/res_scanner.ml: removes JSX mode handling and adds lookahead functions for minus and slash characters
  • compiler/syntax/src/res_core.ml: implements JSX identifier parsing in the parser with lookahead-based logic
  • compiler/syntax/src/res_token.ml: removes the LessThanSlash token type
  • compiler/syntax/src/res_token_debugger.ml: adds a new token debugging utility
  • tests/syntax_tests/data/parsing/errors/expressions/expected/jsx.res.txt: updates the expected error message
  • various package.json files: revert to the legacy clean command for analysis tests

Comment on lines 740 to 747
p.token <- token
| Uident txt when Scanner.peekMinus p.scanner ->
let buffer = Buffer.create (String.length txt) in
Buffer.add_string buffer txt;
Parser.next p;
let name = visit buffer |> Buffer.contents in
let token = Token.Uident name in
p.token <- token
Copilot AI commented Aug 5, 2025:

[nitpick] Direct mutation of parser state (p.token <-) breaks encapsulation and makes the code harder to reason about. Consider using a proper parser method or returning the modified token instead of mutating parser state directly.

Suggested change:
-p.token <- token
-| Uident txt when Scanner.peekMinus p.scanner ->
-  let buffer = Buffer.create (String.length txt) in
-  Buffer.add_string buffer txt;
-  Parser.next p;
-  let name = visit buffer |> Buffer.contents in
-  let token = Token.Uident name in
-  p.token <- token
+set_token p token
+| Uident txt when Scanner.peekMinus p.scanner ->
+  let buffer = Buffer.create (String.length txt) in
+  Buffer.add_string buffer txt;
+  Parser.next p;
+  let name = visit buffer |> Buffer.contents in
+  let token = Token.Uident name in
+  set_token p token

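The excerpt above relies on a visit helper that is not shown in this thread. Below is a rough sketch of what such an ident-joining loop could look like; it is an assumption, not the PR's actual code, and the real helper apparently takes only the buffer (so it presumably closes over the parser).

(* Hypothetical sketch: keep consuming "-" followed by an identifier,
   appending both to the buffer, and stop at the first other token. *)
let rec visit (p : Parser.t) (buffer : Buffer.t) : Buffer.t =
  match p.Parser.token with
  | Minus -> (
    Parser.next p;
    match p.Parser.token with
    | Lident txt | Uident txt ->
      Buffer.add_char buffer '-';
      Buffer.add_string buffer txt;
      Parser.next p;
      visit p buffer
    | _ -> buffer)
  | _ -> buffer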


@cristianoc (Collaborator) commented:

Hitting AI with AI

https://chatgpt.com/share/68933fb4-cef4-8011-a04c-a30808a82b89

Jump to the end for the relevant summary

@nojaf (Member, Author) commented Aug 6, 2025

@cristianoc thanks, hit it again 😅
Think it is fine now.

@cristianoc (Collaborator) left a review:

Looks good to go!

@nojaf nojaf enabled auto-merge (squash) August 6, 2025 13:22
@nojaf nojaf merged commit c738f75 into rescript-lang:master Aug 6, 2025
52 of 53 checks passed
@cristianoc (Collaborator) commented:

@nojaf here's one more cleanup one could consider:

Awesome—here’s a no-mutation refactor you can actually drop in and reason about. It consumes tokens and returns structured values; no code rewrites p.token (other than the usual Parser.next p to advance).

Core idea
• Parse JSX tag names via helpers that consume HEAD ( "-" IDENT )* and return the combined name + location (+ shape).
• The parser keeps reading the real token stream; nothing injects synthetic tokens into p.token.

Types

type jsx_ident_kind = [`Lower | `Upper]

type jsx_tag_name =
  | Lower of {name: string; loc: Loc.t} (* "a-b-c" *)
  | QualifiedLower of {path: Longident.t; name: string; loc: Loc.t} (* V.X.y-z *)
  | Upper of {path: Longident.t; loc: Loc.t} (* V.X.Component *)

Helpers (no mutation, consume & return)

(* Read a single ident at the current token, returning its text, loc, and kind. Does not advance. *)
let peek_ident (p : Parser.t) : (string * Loc.t * jsx_ident_kind) option =
  match p.Parser.token with
  | Token.Lident txt -> Some (txt, Parser.loc p, `Lower)
  | Token.Uident txt -> Some (txt, Parser.loc p, `Upper)
  | _ -> None

(* Consume one Lident/Uident, error if none. *)
let expect_ident (p : Parser.t) : (string * Loc.t * jsx_ident_kind) option =
  match peek_ident p with
  | None -> None
  | Some (txt, loc, k) -> Parser.next p; Some (txt, loc, k)

(* Consume ("-" IDENT)*, appending to [buf], updating [last_end], diagnosing a trailing '-'. *)
let rec read_hyphen_chain (p : Parser.t) (buf : Buffer.t) (last_end : Lexing.position ref) : unit =
  match p.Parser.token with
  | Token.Minus ->
    let minus_loc = Parser.loc p in
    Parser.next p; (* after '-' *)
    begin match peek_ident p with
    | Some (txt, loc, _) ->
      Buffer.add_char buf '-';
      Buffer.add_string buf txt;
      last_end := Loc.end_ loc;
      Parser.next p; (* consume ident *)
      read_hyphen_chain p buf last_end
    | None ->
      Parser.err p (Diagnostics.message_at minus_loc "JSX identifier cannot end with a hyphen")
    end
  | _ -> ()

Read local name (a-b-c or X-Y — head determines kind)

let read_local_jsx_name (p : Parser.t) : (string * Loc.t * jsx_ident_kind) option =
  match expect_ident p with
  | None -> None
  | Some (head, head_loc, kind) ->
    let buf = Buffer.create (String.length head + 8) in
    Buffer.add_string buf head;
    let start_pos = Loc.start head_loc in
    let last_end = ref (Loc.end_ head_loc) in
    read_hyphen_chain p buf last_end;
    let name = Buffer.contents buf in
    let loc = Loc.span start_pos !last_end in
    Some (name, loc, kind)

Read qualified-or-local tag name (covers a-b, V.Component, V.x-y)

let read_jsx_tag_name (p : Parser.t) : jsx_tag_name option =
  match peek_ident p with
  | Some (_, _, `Lower) ->
    (* Plain lowercase tag with optional hyphens *)
    read_local_jsx_name p |> Option.map (fun (name, loc, _) -> Lower {name; loc})
  | Some (seg, seg_loc, `Upper) ->
    (* Could be: Upper path ('.' Uident)*, OR QualifiedLower path '.' Lident ('-' ident)* *)
    let start_pos = Loc.start seg_loc in
    let rev_segs = ref [seg] in
    let last_end = ref (Loc.end_ seg_loc) in
    Parser.next p; (* consume first Uident *)

    let rec loop_path () =
      match p.Parser.token with
      | Token.Dot ->
        Parser.next p; (* after '.' *)
        begin match peek_ident p with
        | Some (txt, loc, `Upper) ->
          rev_segs := txt :: !rev_segs;
          last_end := Loc.end_ loc;
          Parser.next p;
          loop_path ()
        | Some (_, _, `Lower) ->
          (* QualifiedLower: path already in rev_segs, now read final lowercase with hyphens *)
          begin match read_local_jsx_name p with
          | Some (lname, l_loc, _) ->
            let path = Longident.of_rev_list (List.rev !rev_segs) in
            let loc = Loc.span start_pos (Loc.end_ l_loc) in
            Some (QualifiedLower {path; name = lname; loc})
          | None -> None
          end
        | None ->
          Parser.err p (Diagnostics.message "expected identifier after '.' in JSX tag name");
          None
        end
      | _ ->
        (* Pure Upper path (component) *)
        let path = Longident.of_rev_list (List.rev !rev_segs) in
        let loc = Loc.span start_pos !last_end in
        Some (Upper {path; loc})
    in
    loop_path ()
  | None -> None

Call sites (no p.token <- … anywhere)

Opening/self-closing tags

(* ... after consuming '<' ... *)
let parse_jsx_opening_or_self_closing_element (p : Parser.t) =
  match read_jsx_tag_name p with
  | None ->
    Parser.err p (Diagnostics.message "expected JSX tag name")
    (* recover… *)
  | Some tag ->
    (* p.token now points to the first token after the name: props or '>' or '/>' *)
    let props = parse_jsx_props p in
    match p.Parser.token with
    | Token.SlashGreater -> Parser.next p; Ast.jsx_self_closing tag props
    | Token.Greater -> Parser.next p; Ast.jsx_opening tag props
    | _ ->
      Parser.err p (Diagnostics.message "expected '>' or '/>' after JSX tag name")
      (* recover… *)

Closing tags (compare names by value, not tokens)

(* ... after consuming '</' ... *)
let parse_jsx_closing_element (p : Parser.t) =
  match read_jsx_tag_name p with
  | None ->
    Parser.err p (Diagnostics.message "expected JSX closing tag name")
    (* recover… *)
  | Some closing ->
    (match p.Parser.token with
    | Token.Greater -> Parser.next p
    | _ -> Parser.err p (Diagnostics.message "expected '>' after closing tag"));
    closing

(* When finishing an element, ensure names match: *)
let names_equal (open_ : jsx_tag_name) (close_ : jsx_tag_name) =
  match open_, close_ with
  | Lower a, Lower b -> String.equal a.name b.name
  | QualifiedLower a, QualifiedLower b ->
    Longident.equal a.path b.path && String.equal a.name b.name
  | Upper a, Upper b -> Longident.equal a.path b.path
  | _ -> false

Props (if you allow data-/aria-, reuse the same local reader)

let read_jsx_prop_name (p : Parser.t) : (string * Loc.t) option =
  match read_local_jsx_name p with
  | Some (name, loc, `Lower) -> Some (name, loc)
  | Some (_, _, `Upper) ->
    Parser.err p (Diagnostics.message "JSX prop names must be lowercase");
    None
  | None -> None

Why this is robust (and what to check)
• No synthetic tokens injected. The only state change is advancing with Parser.next p. The token stream stays faithful to the lexer, which is great for error recovery, incremental parsing, and source maps.
• Progress is explicit. On each -, we Parser.next before looking for the next ident. No chance of stalling on a-b-c.
• Trailing - is diagnosed. We report and stop instead of swallowing it.
• Qualified forms covered. Upper ('.' Upper)* for components; Upper ('.' Upper)* '.' Lower ('-' ident)* for things like <V.x-y>. Pure lowercase tags handled via Lower.
• Clear post-conditions at call sites: after read_jsx_tag_name, p.token is the token after the tag name. That’s easy to reason about.

Edge cases you can (and should) test
• ✅ , <V.x-y></V.x-y>, <V.Component />, nested combos.
• ❌ , >, <V.a-></V.a->> (should all emit the hyphen error).
• Mismatched closing: (should error via names_equal).
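
As a quick, purely illustrative check of the mismatched-closing case against the types above (Loc.none is assumed here as a dummy location; substitute whatever your Loc module provides):

(* Illustrative only: names_equal should accept matching lowercase names
   and reject mismatched ones. *)
let () =
  let lower name = Lower {name; loc = Loc.none} in
  assert (names_equal (lower "a-b") (lower "a-b"));
  assert (not (names_equal (lower "a-b") (lower "a-c")))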

Evaluation

Does it look good?
Yep. This is the “clean” version: no mutation to parser state other than consuming tokens, crisp invariants, and all the correctness issues we discussed are handled in one place. It’s also easy to evolve—if you later decide hyphens are only allowed after lowercase segments, you can enforce that policy right where QualifiedLower is parsed.

Trade-offs:
• Slightly more code than the mutation trick, but much easier to read and test.
• You’ll need tiny plumbing in the AST to carry jsx_tag_name (or convert it immediately to whatever the existing AST nodes expect).

If you want me to adapt this precisely to your Loc/Diagnostics/Ast types (or wire it into your exact parse_jsx_* functions), paste those type signatures and I’ll tailor it.

@nojaf (Member, Author) commented Aug 8, 2025

Hmm, I think I like it. So the plan is to tailor the AST for element names and prop names, thus removing the need for a combined token?
Will check it out sometime.

@nojaf nojaf mentioned this pull request Aug 10, 2025