
refactor: custom lexer #437


Open · wants to merge 14 commits into main

Conversation

@psteinroe (Collaborator) commented Jul 1, 2025

  • adds a new tokenizer crate that turns a string into simple tokens
  • adds a new lexer + lexer_codegen that uses the tokenizer to lex into a new SyntaxKind enum

the new implementation is

  • much more performant (no extra string allocations, no calls into a C library)
  • works with broken strings (!!!!)
  • custom-made to our use case (e.g. the LineEnding variant comes with a count; see the sketch below)
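To make the LineEnding-with-count point concrete, here is a minimal sketch; the variant shape and names are illustrative only, not the PR's actual definitions (the real SyntaxKind is generated by lexer_codegen):

// Illustrative only: one token for a run of newlines, carrying the count,
// instead of one token per newline.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyntaxKind {
    Ident,
    Semicolon,
    Whitespace,
    LineEnding { count: usize },
}

fn lex_newlines(input: &str) -> Vec<SyntaxKind> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\n' {
            // Collapse consecutive newlines into a single counted token.
            let mut count = 1;
            while chars.peek() == Some(&'\n') {
                chars.next();
                count += 1;
            }
            tokens.push(SyntaxKind::LineEnding { count });
        }
        // ...handling of all other token kinds elided
    }
    tokens
}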

in a follow-up, we will be able to:

  • parse custom parameters that popular tools use
  • pre-process to remove unsupported constructs
  • parse non-SQL content (e.g. commands) via a simple custom parser

todos:

  • use the new lexer in the splitter
  • make sure we support all the different parameter formats popular tools use -> will do it in a follow-up
  • tests

@psteinroe changed the title refactor: parser refactor: lexer Jul 1, 2025
@psteinroe requested a review from juleswritescode July 4, 2025 16:00
@psteinroe marked this pull request as ready for review July 4, 2025 16:00
@psteinroe changed the title refactor: lexer refactor: custom lexer Jul 4, 2025
@@ -24,6 +24,7 @@ biome_rowan = "0.5.7"
 biome_string_case = "0.5.8"
 bpaf = { version = "0.9.15", features = ["derive"] }
 crossbeam = "0.8.4"
+enum-iterator = "2.1.0"
Collaborator: we already have strum, I think it does the same thing
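For context, the two crates do overlap for this use case. A minimal sketch of each, using a throwaway enum (not the PR's types):

// enum-iterator 2.x: derive Sequence, enumerate with all().
#[derive(Debug, Clone, Copy, enum_iterator::Sequence)]
enum Demo {
    A,
    B,
}

fn all_variants() -> impl Iterator<Item = Demo> {
    enum_iterator::all::<Demo>()
}

// strum equivalent (with its default `derive` feature):
//
//   use strum::IntoEnumIterator;
//
//   #[derive(Debug, Clone, Copy, strum::EnumIter)]
//   enum Demo { A, B }
//
//   Demo::iter() then yields every variant.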

let mut ends_with_semicolon = false;

// Iterate through tokens in reverse to find the last non-whitespace token
for idx in (0..lexed.len()).rev() {
Collaborator: how about matches!(iter.filter(..).next_back(), Some(semi)) here?
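A sketch of that suggestion; the kind names are hypothetical stand-ins for whatever the PR actually uses:

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyntaxKind {
    Ident,
    Whitespace,
    Semicolon,
}

// True if the last non-whitespace token is a semicolon. Works without an
// explicit reverse loop because slice iterators (and their Filter
// adapters) are double-ended, so next_back() scans from the end.
fn ends_with_semicolon(tokens: &[SyntaxKind]) -> bool {
    matches!(
        tokens
            .iter()
            .filter(|k| **k != SyntaxKind::Whitespace)
            .next_back(),
        Some(&SyntaxKind::Semicolon)
    )
}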


/// Returns an iterator over token kinds
pub fn tokens(&self) -> impl Iterator<Item = SyntaxKind> + '_ {
    (0..self.len()).map(move |i| self.kind(i))
Collaborator: self.kind.iter().copied() ?
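That is, iterate the backing storage directly instead of indexing. A sketch, assuming the kinds live in a Vec<SyntaxKind> field (the indexed access suggests this, but it is an assumption):

struct Lexed {
    kind: Vec<SyntaxKind>, // assumed field, reusing the SyntaxKind sketch above
}

impl Lexed {
    // Iterator over token kinds without any index arithmetic.
    fn tokens(&self) -> impl Iterator<Item = SyntaxKind> + '_ {
        self.kind.iter().copied()
    }
}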


/// Returns the kind of token at the given index
pub fn kind(&self, idx: usize) -> SyntaxKind {
    assert!(idx < self.len());
Collaborator: do you want to add a message here? Otherwise it might be better to simply let the access on line 53 panic; then you at least get an index-out-of-bounds message, right?
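Both options from the comment, sketched against the assumed Lexed struct above:

impl Lexed {
    fn len(&self) -> usize {
        self.kind.len()
    }

    // Option 1: keep the assert but give it a descriptive message.
    fn kind(&self, idx: usize) -> SyntaxKind {
        assert!(
            idx < self.len(),
            "token index {idx} out of bounds (len {})",
            self.len()
        );
        self.kind[idx]
        // Option 2: drop the assert entirely; self.kind[idx] already
        // panics with an index-out-of-bounds message on its own.
    }
}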

.collect()
}

pub(crate) fn text_range(&self, i: usize) -> std::ops::Range<usize> {
Collaborator: I think in all cases the std::ops::Range<usize> gets mapped to a TextRange; maybe better to put the logic into range(..)?
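A sketch of that refactor using biome_rowan's TextRange. It assumes Lexed stores byte offsets in a start: Vec<u32> and that token i spans start[i]..start[i + 1]; both are inferred from range_text below, not confirmed:

use biome_rowan::{TextRange, TextSize};

impl Lexed {
    // Return a TextRange directly so call sites no longer map a
    // std::ops::Range<usize> themselves.
    fn range(&self, i: usize) -> TextRange {
        let lo: u32 = self.start[i];
        let hi: u32 = self.start[i + 1];
        TextRange::new(TextSize::from(lo), TextSize::from(hi))
    }
}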

Comment on lines +101 to +106
fn range_text(&self, r: std::ops::Range<usize>) -> &str {
    assert!(r.start < r.end && r.end <= self.len());
    let lo = self.start[r.start] as usize;
    let hi = self.start[r.end] as usize;
    &self.text[lo..hi]
}
Collaborator: as far as I can tell this is only used in text, so maybe put the logic there directly instead of adding an indirection?
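That inlining could look roughly like this (the caller's signature is assumed from the helper's; text is also the assumed name of the source-string field, which Rust allows alongside a method of the same name):

impl Lexed {
    // Text covered by tokens r.start..r.end, with the former range_text
    // logic folded into its single caller.
    fn text(&self, r: std::ops::Range<usize>) -> &str {
        assert!(r.start < r.end && r.end <= self.len());
        let lo = self.start[r.start] as usize;
        let hi = self.start[r.end] as usize;
        &self.text[lo..hi]
    }
}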
