
Compiler incorrectly uses UTF-8 length instead of ASCII for ascii() constant evaluation #3449

@Gusarich

Description


The constant evaluator for the ascii() built-in function incorrectly interprets strings with \xHH escapes as UTF-8, causing it to count certain bytes (e.g., \xFF) as multiple bytes instead of a single byte. This misinterpretation leads to erroneous "ascii string is too long" compile-time errors for valid ASCII strings within the 32-byte limit.

Minimal Example:

const SeventeenFFs: String = "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF"; // 17 bytes
const AsciiValue17: Int = ascii(SeventeenFFs); // Should compile successfully

contract TestAsciiLengthBug {
    val: Int;
    init() {
        self.val = AsciiValue17;
    }
}

Compiler Output:

Error: Cannot evaluate expression to a constant: ascii string is too long, expected up to 32 bytes, got 34

Expected Behavior:
Each \xFF escape sequence must be counted as exactly one byte (ASCII), resulting in a total length of 17 bytes, which is within the 32-byte limit.

Explanation:
The current implementation incorrectly calculates the byte length using UTF-8 encoding, so each non-ASCII code point (e.g., \xFF, i.e. U+00FF) is encoded as two bytes and the total exceeds the allowed limit. The correct behavior is to treat each \xHH escape as exactly one byte, following ASCII semantics.
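The discrepancy can be reproduced outside the compiler. The following sketch (TypeScript, assuming Node.js; the variable names are illustrative and not taken from the compiler source) contrasts UTF-8 byte counting with per-code-point counting for the 17-character string above:

import { Buffer } from "node:buffer";

// 17 code points, each U+00FF, mirroring the \xFF escapes above.
const seventeenFFs = "\u00FF".repeat(17);

// Buggy calculation: UTF-8 encodes U+00FF as two bytes (0xC3 0xBF),
// so the measured length is 34, matching "got 34" in the error.
const utf8Length = Buffer.byteLength(seventeenFFs, "utf8"); // 34

// Expected calculation: each \xHH escape denotes one raw byte,
// so the length is 17, well within the 32-byte limit.
const asciiLength = seventeenFFs.length; // 17

console.log(utf8Length, asciiLength);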


LLM Fuzzing discovery (see #2490)
