@yhk1038 yhk1038 commented Dec 24, 2025

Summary

  • Add support for non-ASCII (Unicode) characters in method names
  • Replace \w regex pattern with [\p{L}\p{N}_] to match Unicode letters and numbers
  • Korean, Japanese, and other Unicode method names now work correctly
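The limitation behind this change can be reproduced in plain Ruby, where `\w` matches only `[a-zA-Z0-9_]`. The snippet below is a minimal illustration, not code from this PR; the constant names `ASCII_NAME` and `UNICODE_NAME` are made up for the example.

```ruby
# Ruby's \w is ASCII-only: it matches [a-zA-Z0-9_] and nothing else.
ASCII_NAME = /\A\w+\z/

# \p{L} matches any Unicode letter and \p{N} any Unicode number.
UNICODE_NAME = /\A[\p{L}\p{N}_]+\z/

%w[hello 안녕하세요 こんにちは].each do |name|
  puts "#{name}: \\w=#{ASCII_NAME.match?(name)} unicode=#{UNICODE_NAME.match?(name)}"
end
```

Only `hello` satisfies the `\w` pattern, while all three names satisfy the Unicode character class.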

Changes

  • parser.rb: Add IDENTIFIER_CHAR and METHOD_NAME_PATTERN constants, update method detection regex
  • compiler.rb: Update type erasure regex to handle Unicode method names
  • parser_spec.rb: Add test cases for Korean, Japanese, and mixed ASCII/Unicode method names
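As a rough sketch of how the two constants could fit together: only the names `IDENTIFIER_CHAR` and `METHOD_NAME_PATTERN` come from this PR. Their values, the `DEF_PATTERN` regex, and the `method_name` helper are assumptions made for illustration and may differ from the actual `parser.rb`.

```ruby
# Assumed definitions; the PR names these constants but their exact values are not shown here.
IDENTIFIER_CHAR     = '[\p{L}\p{N}_]'
METHOD_NAME_PATTERN = "#{IDENTIFIER_CHAR}+[?!]?"

# Hypothetical detection regex: matches `def name` or `def self.name`
# and captures the method name, including a trailing ? or !.
DEF_PATTERN = /\A\s*def\s+(?:self\.)?(#{METHOD_NAME_PATTERN})/

# Hypothetical helper: returns the method name, or nil if the line is not a definition.
def method_name(line)
  line[DEF_PATTERN, 1]
end

puts method_name("def 안녕하세요(name: String): String")
puts method_name("  def self.こんにちは")
puts method_name("def 비_영어_함수명___테스트1!")
```

This sketch is deliberately loose: it would also accept a name that starts with a digit and it ignores setter names ending in `=`.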

Test plan

  • Parser correctly parses Unicode method names
  • Compiler strips type annotations from Unicode methods
  • RBS generator includes Unicode method signatures
  • All existing tests pass (2136 examples, 0 failures)
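The type-erasure check can be pictured with a simplified sketch. It assumes annotations of the form `def name(param: Type): ReturnType`; the `UNICODE_METHOD_NAME` and `SIGNATURE` constants and the `erase_types` helper are hypothetical, and the real `compiler.rb` regex is not shown in this PR text.

```ruby
# Hypothetical name pattern using the Unicode-aware character class.
UNICODE_METHOD_NAME = '[\p{L}\p{N}_]+[?!]?'

# Assumed signature shape: def name(params): ReturnType
SIGNATURE = /\A(\s*def\s+(?:self\.)?#{UNICODE_METHOD_NAME})\s*\((.*)\)\s*(?::\s*\S.*)?\z/

# Strips parameter and return type annotations from a single definition line.
# Simplification: parameters are split on commas, so generic types that
# contain commas are not handled.
def erase_types(line)
  m = SIGNATURE.match(line)
  return line unless m

  params = m[2].split(",").map { |param| param.strip.sub(/\s*:\s*.+\z/, "") }
  "#{m[1]}(#{params.join(', ')})"
end

puts erase_types("def 안녕하세요(name: String, age: Integer): String")
```

With a `\w`-based name pattern the Korean definition would not match and would pass through with its annotations intact.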

Fixes #11

Add test cases for non-ASCII method name parsing:
- Korean characters (안녕하세요)
- Mixed ASCII and Unicode (비_영어_함수명___테스트1!)
- Japanese characters (こんにちは)
- Class methods with Unicode names

These tests fail at this commit because the parser relies on \w, which
matches only ASCII word characters in Ruby.

Related to #11

Replace \w regex pattern with [\p{L}\p{N}_] to support non-ASCII
characters (Korean, Japanese, etc.) in method names.

Changes:
- Add IDENTIFIER_CHAR and METHOD_NAME_PATTERN constants
- Update parser.rb to detect Unicode method definitions
- Update compiler.rb to strip type annotations from Unicode methods

Fixes #11

Add parse_conditional method to BodyParser that properly parses
if/unless/elsif/else blocks into IR::Conditional nodes. This enables
the type inference system to collect all possible return values from
conditional branches and unify them into union types.

The fix handles:
- Simple if/else blocks
- elsif chains (parsed as nested if)
- unless statements
- Nested conditionals at correct depth
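The cases listed above can be sketched with a small line-based parser. This is an illustration under assumptions, not the project's `BodyParser`: the `Conditional` struct stands in for `IR::Conditional`, the input is assumed to be one statement per line, and modifier forms such as `x if y` are not handled.

```ruby
# Stand-in for IR::Conditional; the field names are assumptions.
Conditional = Struct.new(:kind, :condition, :then_body, :else_body)

# Parses the conditional that starts at lines[index].
# Returns [node, index_of_first_line_after_the_conditional].
def parse_conditional(lines, index)
  keyword, condition = lines[index].strip.split(/\s+/, 2)
  node = Conditional.new(keyword == "unless" ? :unless : :if, condition, [], [])
  branch = node.then_body
  index += 1

  while index < lines.length
    line = lines[index].strip
    case line
    when /\A(?:if|unless)\b/
      # Nested conditional: recurse so it consumes its own `end`.
      child, index = parse_conditional(lines, index)
      branch << child
      next
    when /\Aelsif\b/
      # elsif becomes a nested :if in the else branch and shares our `end`.
      child, index = parse_conditional(lines, index)
      node.else_body << child
      return [node, index]
    when "else"
      branch = node.else_body
    when "end"
      return [node, index + 1]
    else
      branch << line
    end
    index += 1
  end

  [node, index]
end

source = ["if a", "1", "elsif b", "if c", "2", "end", "else", "3", "end", "after"]
tree, next_index = parse_conditional(source, 0)
puts tree.else_body.first.condition
puts source[next_index]
```

Because each nested conditional is consumed by its own recursive call, the outer `else` and `end` are attributed to the correct depth.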

Fixes #13

Modified collect_returns_recursive to return a termination flag.
When a return statement is encountered, subsequent code in the same
block is now correctly identified as unreachable and excluded from
type inference.

This ensures that methods like:
  def test
    return false
    if condition
      "string"
    end
  end

are inferred as returning `bool` instead of `bool | String`.
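The termination flag can be sketched as follows. The node types (`ReturnStmt`, `Value`, `Branch`) and the helpers `collect_returns`, `implicit_types`, and `infer_return_type` are hypothetical stand-ins for the project's IR and inference code; only the idea of returning a flag alongside the collected types comes from the commit message.

```ruby
ReturnStmt = Struct.new(:type)                    # explicit `return <expr>`
Value      = Struct.new(:type)                    # any other expression
Branch     = Struct.new(:then_body, :else_body)   # if/else with two bodies

# Returns [types, terminated]. `terminated` is true when every path through
# `body` hits a return, so any code after it is unreachable.
def collect_returns(body)
  types = []
  body.each do |node|
    case node
    when ReturnStmt
      types << node.type
      return [types, true]
    when Branch
      then_types, then_done = collect_returns(node.then_body)
      else_types, else_done = collect_returns(node.else_body)
      types.concat(then_types, else_types)
      return [types, true] if then_done && else_done
    end
  end
  [types, false]
end

# Types of the implicit (last-expression) result of a body.
def implicit_types(body)
  last = body.last
  case last
  when nil        then ["nil"]
  when Value      then [last.type]
  when ReturnStmt then []
  when Branch     then implicit_types(last.then_body) + implicit_types(last.else_body)
  end
end

def infer_return_type(body)
  types, terminated = collect_returns(body)
  types += implicit_types(body) unless terminated
  types.uniq.join(" | ")
end

# The example from the commit message: `return false` followed by a dead `if`.
body = [ReturnStmt.new("bool"), Branch.new([Value.new("String")], [])]
puts infer_return_type(body)
```

Since the first statement terminates the block, the trailing `if` is never visited and contributes no types.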
@yhk1038 yhk1038 merged commit a32530d into main Dec 24, 2025
8 checks passed
@yhk1038 yhk1038 deleted the fix/unicode-method-names branch December 24, 2025 04:29
Development

Successfully merging this pull request may close these issues.

Parser fails to handle non-ASCII (Unicode) method names
