-
Couldn't load subscription status.
- Fork 641
main: using regex for choosing a parser for given file name #4270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a5a3a28 to
c50f467
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4270 +/- ##
==========================================
+ Coverage 85.89% 85.90% +0.01%
==========================================
Files 250 251 +1
Lines 62341 62510 +169
==========================================
+ Hits 53545 53699 +154
- Misses 8796 8811 +15 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR extends the --map-<LANG> option to support regular expression matching for file names, addressing limitations where glob patterns and extension matching are insufficient for generic file names. The implementation adds a new regex-based mapping type alongside existing extension and pattern mappings.
Key Changes:
- Introduced regex pattern support using
%regex%[i]syntax for language file mappings - Added new
rexprcodemodule to handle regex compilation and matching - Extended optlib2c to generate C code from regex mapping definitions
Reviewed Changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| main/rexprcode.c | New module implementing regex pattern compilation, matching, and encoding |
| main/rexprcode_p.h | Public interface for regex code operations |
| main/parse.c | Core integration of regex matching into language detection logic |
| main/parse.h | Added rExprSrc structure definition and REXPR_LAST_ENTRY macro |
| main/parse_p.h | Extended langmapType enum with LMAP_REXPR flag |
| main/options.c | Command-line option parsing for regex patterns with icase flag support |
| optlib/rpmMacros.ctags | Example usage replacing commented-out patterns with regex |
| optlib/rpmMacros.c | Generated C code with regex mapping definitions |
| misc/optlib2c | Extended Perl script to parse and generate regex mapping code |
| source.mak | Build system updates for new source files |
| win32/ctags_vs2013.vcxproj | Visual Studio project file updates |
| win32/ctags_vs2013.vcxproj.filters | Visual Studio filter file updates |
| Tmain/list-map-rexprs.d/* | Test cases for new --list-map-rexprs option |
| Tmain/versioning.d/stdout-expected.txt | Updated test output expectations |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
misc/optlib2c
Outdated
| unless ($_[0]->{'langdef'} eq $1); | ||
| my $spec = $2; | ||
| if ($spec =~ /\((.*)\)/) { | ||
| if ($spec =~ /%(.+)%(i)?/) { |
Copilot
AI
Oct 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex pattern %(.+)% is greedy and will match incorrectly if the expression contains '%' characters, even with escaping. For example, %a%b%c% would capture a%b%c instead of a. The pattern should be non-greedy: %(.+?)%(i)?/ or better yet, should properly handle escaped '%' characters in the capture group.
| if ($spec =~ /%(.+)%(i)?/) { | |
| if ($spec =~ /%(.+?)%(i)?/) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 1 comment.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
misc/optlib2c
Outdated
| unless ($_[0]->{'langdef'} eq $1); | ||
| my $spec = $2; | ||
| if ($spec =~ /\((.*)\)/) { | ||
| if ($spec =~ /%(.+)%(i)?/) { |
Copilot
AI
Oct 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex should use a non-greedy quantifier .+? instead of .+ to prevent matching across multiple patterns when there are multiple % characters in the input. This could cause incorrect parsing of escaped % characters.
| if ($spec =~ /%(.+)%(i)?/) { | |
| if ($spec =~ /%(.+?)%(i)?/) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 5 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 3 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
misc/optlib2c
Outdated
| if ($spec =~ /%(.+?)%(i|\{icase\})?/) { | ||
| my $rexpr = { expr => $1, | ||
| iCase => (defined $2 && ($2 eq 'i' || $2 eq 'icase'))? 1: 0 }; |
Copilot
AI
Oct 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex allows {icase} as an alternative to i, but this syntax is not documented in the PR description or help text. Either document this alternative syntax or remove it to avoid confusion.
| if ($spec =~ /%(.+?)%(i|\{icase\})?/) { | |
| my $rexpr = { expr => $1, | |
| iCase => (defined $2 && ($2 eq 'i' || $2 eq 'icase'))? 1: 0 }; | |
| if ($spec =~ /%(.+?)%(i)?/) { | |
| my $rexpr = { expr => $1, | |
| iCase => (defined $2 && $2 eq 'i')? 1: 0 }; |
|
|
||
| static flagDefinition langmapRexprFlagDef[] = { | ||
| { 'i', "icase", langmap_rexpr_icase_short, langmap_rexpr_icase_long, | ||
| NULL, "applied in a case-insensitive manner"}, |
Copilot
AI
Oct 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The long flag name is 'icase', but the optlib2c script also accepts '{icase}' syntax (line 296 in misc/optlib2c). These should be consistent, or the alternative syntax should be documented.
| NULL, "applied in a case-insensitive manner"}, | |
| NULL, "applied in a case-insensitive manner (accepts both 'icase' and '{icase}' syntax)"}, |
231a606 to
f865e33
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 2 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 21 out of 23 changed files in this pull request and generated no new comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 22 out of 24 changed files in this pull request and generated no new comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 30 out of 32 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
man/ctags.1.rst.in
Outdated
|
|
||
| If you run @CTAGS_NAME_EXECUTABLE@ with ``@CTAGS_NAME_EXECUTABLE@ -R src``, | ||
| the match is performed with ``src/lib/data.c`` and ``src/lib/logic.c`` If you | ||
| give ``--langmap='YourParser:%src/lib/.*\.c%'``, @CTAGS_NAME_EXECUTABLE@ |
Copilot
AI
Oct 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected option name from '--langmap' to '--map-YourParser'.
| give ``--langmap='YourParser:%src/lib/.*\.c%'``, @CTAGS_NAME_EXECUTABLE@ | |
| give ``--map-YourParser='%src/lib/.*\.c%'``, @CTAGS_NAME_EXECUTABLE@ |
docs/man/ctags.1.rst
Outdated
| If you run ctags with ``ctags -R src``, | ||
| the match is performed with ``src/lib/data.c`` and ``src/lib/logic.c`` If you | ||
| give ``--langmap='YourParser:%src/lib/.*\.c%'``, ctags |
Copilot
AI
Oct 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected option name from '--langmap' to '--map-YourParser'.
| give ``--langmap='YourParser:%src/lib/.*\.c%'``, ctags | |
| give ``--map-YourParser='%src/lib/.*\.c%'``, ctags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 30 out of 32 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
5e33670 to
2242572
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 35 out of 37 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 35 out of 37 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 35 out of 37 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Masatake YAMATO <[email protected]>
Signed-off-by: Masatake YAMATO <[email protected]>
…nsExtensionNew Delete baseFilenameSansExtensionNew() from the source tree. Signed-off-by: Masatake YAMATO <[email protected]>
The original code used a boolean value to toggle how filenames were mapped to the parser by glob-like pattern or by extension. To support the third way mapping a file name to a parser, by regular expression pattern, we will use an enum value instead of Boolean. Signed-off-by: Masatake YAMATO <[email protected]>
Signed-off-by: Masatake YAMATO <[email protected]>
This change extends --map-<LANG> option to support regular expression matching with the full file name. The original --map-<LANG> option supports the glob based matching and the extension comparison with the file basename. However, two methods are not enough if the file names are too generic. See universal-ctags#3287 . The regular expression passed to --map-<LANG> must be surrounded by % character like --map-RpmMacros='%(.*/)?macros\.d/macros\.([^/]+)$%' If you want to match in a case-insensitive way, append `i' after the second % like --map-RpmMacros='%(.*/)?macros\.d/macros\.([^/]+)$%i' If you want to use % as part of an expression, put \ before % for escaping. Signed-off-by: Masatake YAMATO <[email protected]>
…options Signed-off-by: Masatake YAMATO <[email protected]>
Signed-off-by: Masatake YAMATO <[email protected]>
Signed-off-by: Masatake YAMATO <[email protected]>
Signed-off-by: Masatake YAMATO <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 30 out of 32 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
man/ctags.1.rst.in
Outdated
|
|
||
| ``--list-maps[=(<language>|all)]`` | ||
| Lists the file name patterns, the file extensions, and the relative-path | ||
| regular extensions which associate a file name with a language for either the |
Copilot
AI
Oct 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The term 'regular extensions' on line 1302 should be 'regular expressions' to match terminology used elsewhere in the documentation.
| regular extensions which associate a file name with a language for either the | |
| regular expressions which associate a file name with a language for either the |
docs/man/ctags.1.rst
Outdated
|
|
||
| ``--list-maps[=(<language>|all)]`` | ||
| Lists the file name patterns, the file extensions, and the relative-path | ||
| regular extensions which associate a file name with a language for either the |
Copilot
AI
Oct 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The term 'regular extensions' on line 1302 should be 'regular expressions' to match terminology used elsewhere in the documentation.
| regular extensions which associate a file name with a language for either the | |
| regular expressions which associate a file name with a language for either the |
… parser Signed-off-by: Masatake YAMATO <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 30 out of 32 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
main: using regex for choosing a parser for the given file name
This change extends --map- option to support regular
expression matching with the full file name.
The original --map- option supports glob based matching
and extension comparison with the file basename.
However, two methods are not enough if the file names are too
generic. See #3287 .
The regular expression passed to --map- must be surround
by % character like
--map-RpmMacros='%(.*/)?macros.d/macros.([^/]+)$%'
If you want to match in a case-insensitive way, append `i' after the second % like
--map-RpmMacros='%(.*/)?macros.d/macros.([^/]+)$%i'
If you want to use % as part of an expression, put \ before % for escaping.
TODO: