-
Notifications
You must be signed in to change notification settings - Fork 2.1k
fix: add Unicode support for file path mentions in slash commands #7241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add Featherless to provider names list - Create Featherless model types with popular models - Implement FeatherlessHandler using OpenAI-compatible API - Add Featherless schema and configuration - Wire up Featherless in API handler Closes #7237
- Added Unicode flag (u) to mentionRegex and mentionRegexGlobal to properly match Unicode characters - Added comprehensive tests for various Unicode scripts (Chinese, Japanese, Korean, Arabic, Cyrillic, etc.) - Updated documentation to clarify Unicode support in file paths - Fixes #7240
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing my own code is like debugging in production - technically possible but morally questionable.
- Ensures that trailing punctuation marks (like commas, periods, etc.) are not included in the match, allowing punctuation to follow the mention without being part of it. | ||
|
||
- **Regex Breakdown**: | ||
- `/@`: | ||
- `/@`: | ||
- **@**: The mention must start with the '@' symbol. | ||
|
||
- `((?:\/|\w+:\/\/)[^\s]+?|problems\b|git-changes\b)`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentional? The comment in line 12 still shows the old regex pattern structure without mentioning the Unicode flag. Since we're updating the documentation to mention Unicode support, shouldn't we also update the actual pattern shown in the breakdown to reflect the 'u' flag that's now part of the pattern?
@@ -48,6 +48,7 @@ export const providerNames = [ | |||
"fireworks", | |||
"io-intelligence", | |||
"roo", | |||
"featherless", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unrelated to the Unicode file mention fix described in the PR. Adding a new Featherless provider is a completely separate feature that should be in its own PR. Mixing unrelated changes makes it harder to review and can introduce unexpected issues.
@@ -0,0 +1,81 @@ | |||
import type { ModelInfo } from "../model.js" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This entire file is unrelated to fixing Unicode support for file mentions. Could we separate the Featherless provider addition into a different PR to keep changes focused?
@@ -0,0 +1,19 @@ | |||
import { type FeatherlessModelId, featherlessDefaultModelId, featherlessModels } from "@roo-code/types" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another file that's unrelated to the Unicode fix. Also, if this provider implementation stays (though it should be in a separate PR), where are the tests for it?
@@ -25,6 +25,19 @@ describe("mentionRegex and mentionRegexGlobal", () => { | |||
{ input: "@a1b2c3d", expected: ["@a1b2c3d"] }, // Git commit hash (short) | |||
{ input: "@a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0", expected: ["@a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0"] }, // Git commit hash (long) | |||
|
|||
// Unicode file paths (Chinese, Japanese, Korean, Arabic, etc.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great test coverage for various Unicode scripts! Consider adding edge cases like:
- RTL text mixed with LTR paths (e.g., Hebrew/Arabic text with English paths)
- Zero-width characters and combining marks
- Complex emoji with skin tone modifiers or ZWJ sequences
- Unicode normalization differences (NFC vs NFD)
|
||
*/ | ||
export const mentionRegex = | ||
/(?<!\\)@((?:\/|\w+:\/\/)(?:[^\s\\]|\\ )+?|[a-f0-9]{7,40}\b|problems\b|git-changes\b|terminal\b)(?=[.,;:!?]?(?=[\s\r\n]|$))/ | ||
export const mentionRegexGlobal = new RegExp(mentionRegex.source, "g") | ||
/(?<!\\)@((?:\/|\w+:\/\/)(?:[^\s\\]|\\ )+?|[a-f0-9]{7,40}\b|problems\b|git-changes\b|terminal\b)(?=[.,;:!?]?(?=[\s\r\n]|$))/u |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has the performance impact of the Unicode flag been tested with large texts containing many mentions? The 'u' flag can affect regex performance, especially with complex patterns and long strings.
This PR fixes issue #7240 where slash commands could not read files with Chinese (or other Unicode) characters in their names.
Problem
The mention regex pattern was only matching ASCII characters in file paths, which prevented users from mentioning files with Unicode characters (Chinese, Japanese, Korean, Arabic, etc.) in their names.
Solution
u
) tomentionRegex
andmentionRegexGlobal
to enable proper Unicode character matching[^\s\\]
already matches non-whitespace characters, and with the Unicode flag, it now correctly handles all Unicode charactersTesting
Screenshots
As shown in the issue, users were unable to mention files with Chinese names. This fix enables proper recognition of Unicode characters in file paths.
Fixes #7240
Important
This PR adds Unicode support for file path mentions in slash commands and introduces the Featherless provider with new AI models.
mentionRegex
andmentionRegexGlobal
incontext-mentions.ts
.context-mentions.spec.ts
for Unicode scripts: Chinese, Japanese, Korean, Arabic, Cyrillic, mixed languages, and emoji.FeatherlessHandler
infeatherless.ts
and updatesindex.ts
to include it.featherlessModels
with various AI models infeatherless.ts
.provider-settings.ts
to includefeatherless
in provider schemas.This description was created by
for 1938e89. You can customize this summary. It will automatically update as commits are pushed.