[RESP] Optimize command parsing with SIMD fast path and O(1) hash table lookup by badrishc · Pull Request #1658 · microsoft/garnet

badrishc · 2026-04-01T01:09:40Z

Description of Change

Replaces the two legacy command parsing methods, FastParseArrayCommand (~950 lines of nested switch/if chains) and SlowParseCommand (~815 lines of sequential SequenceEqual comparisons), with a tiered architecture:

SIMD Vector128 fast path: matches ~18 hot commands (GET, SET, DEL, PING, INCR, etc.) by comparing the full RESP encoding in a single 16-byte vector comparison. Cost: 3 ops per candidate.
MRU cache: a 2-slot per-session cache that catches repeated commands (HSET, LPUSH, ZADD) at the same 3-op cost as SIMD.
Scalar ulong fast path: handles hot commands too long for SIMD (PUBLISH, SETRANGE, GETRANGE) and variable-arg commands (EXPIRE, SETEXNX, GETEX, PEXPIRE).
CRC32 hash table (RespCommandHashLookup): O(1) lookup for all ~400 built-in commands. 512-entry table (~16KB, L1-resident) with linear probing and per-parent subcommand tables.

Also unifies BITOP subcommand handling, adds debug assertions, hardens the hash table against edge cases (empty names, oversized names), and adds ValidatePrimaryTable() for startup integrity checks.

Key files:

RespCommand.cs — Refactored FastParseCommand (SIMD + scalar + MRU), removed FastParseArrayCommand and SlowParseCommand
RespCommandHashLookup.cs — New: Hash table engine (lookup, insert, validate, subcommand dispatch)
RespCommandHashLookupData.cs — New: Command registration (PopulatePrimaryTable, subcommand arrays)
RespCommandSimdPatterns.cs — New: SIMD Vector128 patterns and RespPattern() helper
CommandParsingBenchmark.cs — New: BDN benchmark covering all parser tiers

CommandParsingBenchmark Results (Params=None, batch of 100):

Command	Tier	dev (μs)	PR (μs)	Delta
ParsePING	SIMD	1.576	1.427	−9.5%
ParseGET	SIMD	2.145	1.847	−13.9%
ParseSET	SIMD	2.823	2.453	−13.1%
ParseINCR	SIMD	2.152	2.034	−5.5%
ParseEXISTS	SIMD	8.891	2.319	−73.9%
ParseSETEX	SIMD	3.376	3.469	+2.8%
ParsePUBLISH	Scalar	—	3.469	new
ParseEXPIRE	Scalar	3.076	3.505	+13.9%
ParseHSET	Hash+MRU	10.035	3.867	−61.5%
ParseLPUSH	Hash+MRU	8.981	3.176	−64.6%
ParseZADD	Hash+MRU	9.599	3.843	−60.0%
ParseZRANGEBYSCORE	Hash	12.455	8.539	−31.4%
ParseZREMRANGEBYSCORE	Hash	13.983	9.289	−33.6%
ParseHINCRBYFLOAT	Hash	11.104	9.311	−16.1%
ParseSUBSCRIBE	Hash	9.649	7.196	−25.4%
ParseGEORADIUS	Hash	20.397	10.478	−48.6%
ParseSETIFMATCH	Hash	28.701	8.230	−71.3%

ParseCommand [AggressiveInlining]
│
├── FastParseCommand [AggressiveInlining]
│   │
│   │   // On SIMD hardware (Vector128.IsHardwareAccelerated && remainingBytes >= 16):
│   ├── SIMD pattern matching
│   │   Matches ~18 hot commands by comparing the full RESP encoding
│   │   (*N\r\n$L\r\nCMD\r\n) as a single 16-byte vector.
│   │   Cost: 1 load + 1 AND + 1 EqualsAll per candidate.
│   │   Hit → return immediately (readHead advanced past command header + name).
│   │
│   ├── MRU cache (2-slot, Vector128-based)
│   │   Catches repeated commands not in the SIMD table (e.g., HSET, LPUSH, ZADD).
│   │   Populated by ParseCommand after ArrayParseCommand resolves a command.
│   │   Slot 1 promoted to slot 0 on hit (LRU).
│   │   Hit → return immediately.
│   │
│   │   // Always (SIMD or not), if buffer starts with *N\r\n$L\r\n:
│   ├── Scalar path (ulong comparisons)
│   │   Three sections:
│   │   (1) Same fixed-arg hot commands as SIMD — fallback when remainingBytes < 16
│   │   (2) Hot commands too long for SIMD (name > 6 chars: PUBLISH, SETRANGE, etc.)
│   │   (3) Hot variable-arg commands (SETEXNX, GETEX, EXPIRE, PEXPIRE)
│   │   Hit → return immediately.
│   │
│   │   // If buffer does NOT start with *N\r\n$L\r\n:
│   └── Inline command check (FastParseInlineCommand)
│       Matches PING\r\n and QUIT\r\n (no array framing).
│       Hit → return immediately.
│
│   // FastParseCommand returned NONE — command not matched by any fast path
│
├── ArrayParseCommand [NoInlining]
│   │
│   ├── MakeUpperCase + retry FastParseCommand
│   │   If the command was lowercase, uppercases in-place and retries
│   │   FastParseCommand. Catches lowercase get, set, ping, etc.
│   │   Hit → return immediately.
│   │
│   ├── Parse RESP array header (*N\r\n)
│   │   Reads the array length. If buffer doesn't start with '*',
│   │   treats as malformed inline command and returns INVALID.
│   │
│   └── HashLookupCommand [NoInlining]
│       │
│       ├── Extract and uppercase command name ($len\r\nNAME\r\n)
│       │
│       ├── Hash table lookup (RespCommandHashLookup.Lookup)
│       │   CRC32 hash, 512-entry table (~16KB, L1-resident), linear probing.
│       │   Covers all ~400 built-in commands. O(1).
│       │
│       ├── If hash miss → TryParseCustomCommand
│       │   Runtime-registered commands (CustomTxn, CustomProcedure, etc.).
│       │   If miss → return INVALID.
│       │
│       └── If has subcommands → HandleSubcommandLookup [NoInlining]
│           Extract and uppercase subcommand name, look up in per-parent
│           hash table (CLUSTER, CLIENT, ACL, CONFIG, COMMAND, BITOP, etc.).
│           If miss → return INVALID with command-specific error message.
│
├── Update MRU cache (on SIMD hardware, if ArrayParseCommand resolved a
│   non-custom command — captures the 16-byte RESP pattern so
│   FastParseCommand's MRU check matches it on subsequent calls)
│
├── Parse arguments (parseState.Read for each remaining token)
│
└── Return command + argument count
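
The SIMD step in the tree above (1 load + 1 AND + 1 EqualsAll per candidate) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SimdPatternSketch`, `Build`, and `Matches` are hypothetical names, and the pattern is built at runtime from a string, whereas the PR describes static precomputed patterns and a `RespPattern()` helper whose signature is not shown here.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Text;

// Hypothetical sketch; names are illustrative and not taken from the PR.
public static class SimdPatternSketch
{
    // Encodes "*N\r\n$L\r\nCMD\r\n" into a 16-byte pattern plus a mask that
    // zeroes the trailing bytes when the encoding is shorter than 16 bytes.
    public static (Vector128<byte> Pattern, Vector128<byte> Mask, int Length) Build(string name, int tokenCount)
    {
        var encoded = Encoding.ASCII.GetBytes($"*{tokenCount}\r\n${name.Length}\r\n{name}\r\n");
        if (encoded.Length > 16)
            throw new ArgumentException("RESP header + name must fit in 16 bytes", nameof(name));

        var pattern = new byte[16];
        var mask = new byte[16];
        encoded.CopyTo(pattern, 0);
        mask.AsSpan(0, encoded.Length).Fill(0xFF); // bytes past the encoding stay 0 and are ignored

        return (Vector128.Create(pattern), Vector128.Create(mask), encoded.Length);
    }

    // Returns true when the first 16 bytes of the buffer match the pattern under the mask.
    public static bool Matches(ReadOnlySpan<byte> buffer, Vector128<byte> pattern, Vector128<byte> mask)
    {
        if (buffer.Length < 16)
            return false; // too short for a vector load; a scalar path would handle this case

        var input = Vector128.LoadUnsafe(ref MemoryMarshal.GetReference(buffer)); // 1 load
        return Vector128.EqualsAll(input & mask, pattern);                        // 1 AND + 1 EqualsAll
    }
}
```

For example, `Build("GET", 2)` encodes `*2\r\n$3\r\nGET\r\n`, which is 13 bytes long, so the last three mask bytes are zero and whatever follows the command name in the buffer does not affect the comparison.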

Add RespCommandHashLookup: a cache-friendly O(1) hash table for RESP command name resolution. Uses hardware CRC32 (SSE4.2/ARM) for hashing, 32-byte cache-line-aligned entries, and linear probing within L1 cache.

Key changes:
- New RespCommandHashLookup.cs: static hash table (512 entries, 16KB) mapping uppercase command name bytes to RespCommand enum values
- Per-parent subcommand hash tables for CLUSTER, CONFIG, CLIENT, ACL, COMMAND, SCRIPT, LATENCY, SLOWLOG, MODULE, PUBSUB, MEMORY, BITOP
- ArrayParseCommand now uses hash lookup for primary commands instead of the ~950-line FastParseArrayCommand nested switch/if-else chains
- BITOP pseudo-subcommands (AND/OR/XOR/NOT/DIFF) handled inline via dedicated ParseBitopSubcommand method with hash-based subcommand lookup
- Subcommand dispatch (CLUSTER, CONFIG, etc.) falls through to existing SlowParseCommand for full backward compatibility
- FastParseCommand hot path (GET, SET, PING, DEL) is completely untouched

Performance: O(1) hash lookup (~10-12 cycles) replaces O(n) sequential comparisons (~30-300 cycles) for the long tail of ~170+ commands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
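
To make the lookup scheme in this commit concrete, here is a rough sketch of a fixed-size table with a CRC32-based hash and linear probing. It is an assumption-laden illustration only: `CommandHashSketch` and its members are hypothetical, names are stored as byte arrays instead of the packed 32-byte entries the commit describes, command ids are plain ints instead of RespCommand values, and a bitwise software step stands in where SSE4.2 is unavailable.

```csharp
using System;
using System.Runtime.Intrinsics.X86;
using System.Text;

// Hypothetical sketch; not the layout or API of RespCommandHashLookup.
public sealed class CommandHashSketch
{
    private const int TableSize = 512;      // power of two, so masking replaces modulo
    private const int Mask = TableSize - 1;

    private readonly byte[][] names = new byte[TableSize][];
    private readonly int[] ids = new int[TableSize];

    private static uint Hash(ReadOnlySpan<byte> name)
    {
        var crc = 0u;
        foreach (var b in name)
        {
            if (Sse42.IsSupported)
            {
                crc = Sse42.Crc32(crc, b);  // hardware CRC32C step
            }
            else
            {
                crc ^= b;                   // portable bitwise CRC32C step for this sketch
                for (var i = 0; i < 8; i++)
                    crc = (crc & 1u) != 0 ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
            }
        }
        return crc;
    }

    // Registers an uppercase command name; throws on empty names, duplicates, or a full table.
    public void Add(string name, int id)
    {
        var key = Encoding.ASCII.GetBytes(name);
        if (key.Length == 0)
            throw new ArgumentException("empty command name", nameof(name));

        var slot = (int)(Hash(key) & Mask);
        for (var probes = 0; probes < TableSize; probes++)
        {
            if (names[slot] is null)
            {
                names[slot] = key;
                ids[slot] = id;
                return;
            }
            if (names[slot].AsSpan().SequenceEqual(key))
                throw new ArgumentException("duplicate command name", nameof(name));
            slot = (slot + 1) & Mask;       // linear probing: try the next slot
        }
        throw new InvalidOperationException("table full");
    }

    // Returns the registered id, or -1 when the name is not present.
    public int Lookup(ReadOnlySpan<byte> name)
    {
        if (name.IsEmpty)
            return -1;

        var slot = (int)(Hash(name) & Mask);
        for (var probes = 0; probes < TableSize; probes++)
        {
            var candidate = names[slot];
            if (candidate is null)
                return -1;                  // an empty slot ends the probe chain
            if (name.SequenceEqual(candidate))
                return ids[slot];
            slot = (slot + 1) & Mask;
        }
        return -1;
    }
}
```

A startup check in the spirit of the validation the PR mentions could call `Lookup` for every registered name and confirm that each one round-trips to its own id.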

Add three optimization tiers to RESP command parsing:

Tier 1 - SIMD Vector128 FastParseCommand:
- 30 static Vector128<byte> patterns matching full RESP encoding (*N\r\n$L\r\nCMD\r\n)
- Single 16-byte load + masked comparison validates header + command in one op
- Covers top commands: GET, SET, DEL, TTL, PING, INCR, DECR, EXISTS, etc.
- Falls through to existing scalar ulong switch for variable-arg commands

Tier 2 - CRC32 hash table (RespCommandHashLookup):
- 512-entry cache-line-aligned table (16KB, L1-resident) with hardware CRC32 hash
- O(1) lookup for ~200 primary commands + 12 subcommand tables
- Replaces ~950-line FastParseArrayCommand nested switch/if-else chains
- BITOP pseudo-subcommands handled via dedicated ParseBitopSubcommand

Tier 3 - SlowParseCommand (existing):
- Subcommand dispatch for admin commands (CLUSTER, CONFIG, ACL, etc.)

Additional optimizations:
- HashLookupCommand uses GetCommand instead of GetUpperCaseCommand (MakeUpperCase already uppercased the buffer, avoiding redundant work)
- TryParseCustomCommand moved after hash lookup (built-in commands are far more common than custom extensions)
- FastParseCommand hot path preserved as scalar fallback for edge cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…add PING to parsing benchmark

Benchmarks ParseRespCommandBuffer directly to measure pure parsing throughput. Commands categorized by their position in the OLD parser:
- Tier 1a SIMD: PING, GET, SET, INCR, EXISTS
- Tier 1b Scalar: SETEX, EXPIRE
- FastParseArrayCommand top: HSET, LPUSH, ZADD
- FastParseArrayCommand deep: ZRANGEBYSCORE, ZREMRANGEBYSCORE, HINCRBYFLOAT
- SlowParseCommand: SUBSCRIBE, GEORADIUS, SETIFMATCH

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2-entry MRU cache sits after SIMD patterns but before scalar switch in FastParseCommand. Caches the last 2 matched command patterns as Vector128 + mask, enabling 3-op cache hits for repeated Tier 1b/2 commands (HSET, LPUSH, ZADD etc.) that would otherwise fall through to the scalar switch or hash table.

Cache is populated on successful ArrayParseCommand resolution and excludes: synthetic ParseRespCommandBuffer calls (ACL checks), subcommand results, and custom commands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
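
A two-slot cache along the lines described in this commit might look like the sketch below. All names (`MruCommandCacheSketch`, `TryMatch`, `Record`) are hypothetical, command ids are plain ints, and the empty-slot guard via `ConsumedBytes == 0` is this sketch's own choice for handling zero-initialized slots; the PR's actual guard is not shown in the page.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical sketch of a per-session 2-slot MRU cache; not the PR's implementation.
public sealed class MruCommandCacheSketch
{
    private struct Slot
    {
        public Vector128<byte> Pattern;
        public Vector128<byte> Mask;
        public int CommandId;
        public int ConsumedBytes; // 0 marks an empty (zero-initialized) slot
    }

    private Slot slot0;
    private Slot slot1;

    private static bool Hit(in Slot slot, Vector128<byte> input) =>
        slot.ConsumedBytes != 0 && Vector128.EqualsAll(input & slot.Mask, slot.Pattern);

    // Checks both slots; a hit in slot 1 promotes it to slot 0.
    public bool TryMatch(ReadOnlySpan<byte> buffer, out int commandId, out int consumedBytes)
    {
        commandId = 0;
        consumedBytes = 0;
        if (buffer.Length < 16)
            return false;

        var input = Vector128.LoadUnsafe(ref MemoryMarshal.GetReference(buffer));
        if (!Hit(in slot0, input))
        {
            if (!Hit(in slot1, input))
                return false;
            (slot0, slot1) = (slot1, slot0); // promote the matching slot
        }

        commandId = slot0.CommandId;
        consumedBytes = slot0.ConsumedBytes;
        return true;
    }

    // Called after a slower tier resolved the command; captures the first
    // consumedBytes bytes (header + name) as the cached pattern.
    public void Record(ReadOnlySpan<byte> buffer, int consumedBytes, int commandId)
    {
        if (buffer.Length < 16 || consumedBytes <= 0 || consumedBytes > 16)
            return; // pattern would not fit in one vector; skip caching

        var maskBytes = new byte[16];
        maskBytes.AsSpan(0, consumedBytes).Fill(0xFF);
        var mask = Vector128.Create(maskBytes);
        var input = Vector128.LoadUnsafe(ref MemoryMarshal.GetReference(buffer));

        slot1 = slot0; // the previous most-recent entry moves down
        slot0 = new Slot
        {
            Pattern = input & mask,
            Mask = mask,
            CommandId = commandId,
            ConsumedBytes = consumedBytes,
        };
    }
}
```

Without the empty-slot guard, a zeroed mask and zeroed pattern would match every input, so some form of check on uninitialized slots is needed in any design of this shape.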

…to badrishc/fast-parses

Eliminate SlowParseCommand for all subcommand routing. HandleSubcommandLookup now uses per-parent hash tables for CLUSTER, CLIENT, ACL, CONFIG, COMMAND, SCRIPT, LATENCY, SLOWLOG, MODULE, PUBSUB, MEMORY subcommands.

Key fixes:
- Fix CLUSTER SET-CONFIG-EPOCH hash entry (was SETCONFIGEPOCH, missing hyphens)
- Handle edge cases: COMMAND with 0 args, case-insensitive GETKEYS/USAGE
- Error message formatting: GenericErrUnknownSubCommand for CLUSTER/LATENCY, GenericErrUnknownSubCommandNoHelp for others
- Remove writeErrorOnFailure guard from MRU cache (unnecessary)
- Use consumedBytes (readHead - cmdStartOffset) for cache entry sizing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…le parsing

Replace references to FastParseArrayCommand/SlowParseCommand with hash table instructions. New commands now just need one Add() call in PopulatePrimaryTable(). Document subcommand table wiring and warn about wire-protocol spelling (hyphens etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…fixes
- Add Debug.Assert for command name length/positivity in hash table ops
- Add startup ValidateSubTable: verifies every subcommand entry round-trips correctly through the hash table (catches typos like SET-CONFIG-EPOCH)
- Clean up InsertIntoTable: remove redundant double-assignment of NameWord1/2, add explicit zero-init and clear comments on word layout contract
- Fix comment in HashLookupCommand: document that MakeUpperCase only uppercases the first token, subcommands need GetUpperCaseCommand
- Add comment documenting MRU cache zero-initialization safety

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR upgrades Garnet’s RESP command parsing pipeline by introducing SIMD-accelerated matching for the hottest commands, plus a cache-friendly hash table (with subcommand tables) to replace the previous deep switch/linear-scan parsing logic. It also adds benchmark coverage and updates contributor documentation to reflect the new recommended parsing extension points.

Changes:

Added RespCommandHashLookup (primary + subcommand hash tables) and integrated it into ArrayParseCommand.
Reworked FastParseCommand to add SIMD Vector128 pattern matching and a per-session 2-slot MRU cache.
Added a dedicated BenchmarkDotNet benchmark for parser-only throughput and updated docs/guides for adding commands.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.


File	Description
website/docs/dev/fast-parsing-plan.md	Adds a detailed parsing optimization/design document.
libs/server/Resp/Parser/RespCommandHashLookup.cs	New static hash-table-based command/subcommand lookup implementation.
libs/server/Resp/Parser/RespCommand.cs	Integrates SIMD fast path + MRU cache + hash lookup parsing; removes legacy slow parsing paths.
benchmark/BDN.benchmark/Operations/CommandParsingBenchmark.cs	Adds parsing-only microbenchmarks across tiers.
.github/skills/add-garnet-command/SKILL.md	Updates contributor guidance to use the new hash lookup path.
.github/copilot-instructions.md	Updates “add parsing logic” instructions to reference the new hash lookup table.


…to badrishc/fast-parses

Add new commands from dev to RespCommandHashLookup:
- Vector Set: VADD, VCARD, VDIM, VEMB, VGETATTR, VINFO, VISMEMBER, VLINKS, VRANDMEMBER, VREM, VSETATTR, VSIM
- Range Index (dot-prefixed wire names): RI.CREATE, RI.SET, RI.GET, RI.DEL, RI.RANGE, RI.SCAN, RI.EXISTS, RI.CONFIG, RI.METRICS
- String: SETWITHETAG
- Cluster subcommands: rename AOFSYNC -> ADVANCE_TIME, add MLOG_KEY_TIME, RESERVE
- Internal-only RIPROMOTE, RIRESTORE never come from wire (no hash entry needed)

Keeps FastParseCommand small enough for AggressiveInlining to take effect, reducing call overhead on the common short-buffer scalar path.

kevin-montrose · 2026-05-01T19:55:51Z

+        private static readonly CommandEntry[] bitopSubTable;
+        private static readonly int bitopSubTableMask;
+
+        static RespCommandHashLookup()


Static initializers can cause the JIT to inject checks around static methods. Since this path is so hot, let's use a module initializer instead.

Small example of code check diff:

WithStaticInit.Bar:

L0000 push rbx
L0001 sub rsp, 0x20
L0005 mov ebx, ecx
L0007 mov rcx, 0x7ffe7cccccc8
L0011 mov edx, 2
L0016 call 0x00007ffedade5e20
L001b mov rcx, 0x7ffe7cccccc8
L0025 mov edx, 2
L002a call 0x00007ffedae04ef0
L002f mov eax, [rax+8]
L0032 cdq
L0033 idiv ebx
L0035 mov eax, edx
L0037 add rsp, 0x20
L003b pop rbx
L003c ret

WithModuleInit.Bar:

L0000 push rbx
L0001 sub rsp, 0x20
L0005 mov ebx, ecx
L0007 mov rcx, 0x7ffe7cccccc8
L0011 mov edx, 3
L0016 call 0x00007ffedae04ef0
L001b mov eax, [rax+8]
L001e cdq
L001f idiv ebx
L0021 mov eax, edx
L0023 add rsp, 0x20
L0027 pop rbx
L0028 ret

.NET 8, x64
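
For readers unfamiliar with the suggestion, a module initializer version of a static lookup table could be sketched like this. `LookupTableSketch` and its contents are hypothetical and unrelated to the PR's tables; the sketch only shows the `[ModuleInitializer]` mechanics (C# 9 / .NET 5 or later) and makes no claim about the code the JIT emits for it.

```csharp
#nullable disable // the field below is assigned by the module initializer, not at declaration
using System.Runtime.CompilerServices;

// Hypothetical sketch; the type has no static constructor and no static field initializers.
public static class LookupTableSketch
{
    private static int[] table;

    // Runs once when the containing module is loaded, before other code in it executes.
    [ModuleInitializer]
    internal static void Init()
    {
        var squares = new int[512];
        for (var i = 0; i < squares.Length; i++)
            squares[i] = i * i;
        table = squares;
    }

    public static int Get(int index) => table[index & 511];
}
```

Note that giving `table` an inline initializer would reintroduce a type initializer, which is the thing the review comment is trying to avoid on the hot path.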

kevin-montrose · 2026-05-01T19:58:42Z

@@ -0,0 +1,503 @@
+// Copyright (c) Microsoft Corporation.


nit: there's a mix of explicit local types and var in here - let's prefer var everywhere in this change.

kevin-montrose · 2026-05-01T19:59:53Z

+        // SIMD Vector128 patterns for FastParseCommand.
+        // Each encodes the full RESP header + command: *N\r\n$L\r\nCMD\r\n
+        // Masks zero out trailing bytes for patterns shorter than 16 bytes.
+        private static readonly Vector128<byte> s_mask13 = Vector128.Create(


These three .AsByte() calls are superfluous.

badrishc and others added 14 commits March 25, 2026 17:33

plan

2d08b9d

update

1a24e00

Merge remote, resolve conflicts keeping optimized HashLookupCommand; …

2d4e4b2

…add PING to parsing benchmark

update

c38d1e6

update

80875e0

Merge branch 'badrishc/fast-parses' of github.com:microsoft/garnet in…

b4415db

…to badrishc/fast-parses

updates

4ead4a0

Copilot AI review requested due to automatic review settings April 1, 2026 01:09

Copilot started reviewing on behalf of badrishc April 1, 2026 01:10

Copilot AI reviewed Apr 1, 2026


badrishc added 6 commits April 1, 2026 09:26

code review updates

56d8f3b

Merge branch 'dev' into badrishc/fast-parses

b813807

nits

0b119f1

Merge branch 'badrishc/fast-parses' of github.com:microsoft/garnet in…

52c5474

…to badrishc/fast-parses

updates

2c2661c

Merge remote-tracking branch 'origin/dev' into badrishc/fast-parses

507f5a0

badrishc changed the title from "Improve parser" to "[RESP] Optimize command parsing with SIMD fast path and O(1) hash table lookup" Apr 1, 2026

badrishc and others added 6 commits April 1, 2026 14:34

nits

b868080

fixes

b95ac6a

fix flaky test

f6845a4

small cleanup

5db102b

Merge branch 'dev' into badrishc/fast-parses

60bcf4a

badrishc added 2 commits April 30, 2026 12:54

Refactor: extract SIMD path to NoInlining SimdFastParse helper

65e50e1


Fix PING regression: remove Vector128 gate on scalar fast path

c3d6bd3

kevin-montrose self-requested a review May 1, 2026 19:26

kevin-montrose requested changes May 1, 2026


Merge remote-tracking branch 'origin/dev' into badrishc/fast-parses

a04665b

badrishc wants to merge 29 commits into dev from badrishc/fast-parses


3 participants
