[Breaking change]: BinaryReader.ReadString() will return "\uFFFD" on malformed encoded string sequences. #42564

jozkee · 2024-09-10T21:48:07Z

Description

dotnet/runtime#80331 introduced a minor breaking change that affects only malformed encoded payloads.

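Prior to .NET 9, BinaryReader.ReadString() returned an empty string for the malformed payload [0x01, 0xC2]: a one-byte length prefix followed by 0xC2, the lead byte of a two-byte UTF-8 sequence whose continuation byte is missing.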

On .NET 9, it returns "\uFFFD", the REPLACEMENT CHARACTER (U+FFFD), which is used to replace an unknown, unrecognized, or unrepresentable character. We accepted this change because it affects only malformed payloads and is consistent with the Unicode standard.
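To make the malformed payload concrete, here is a small illustrative sketch (not part of the original report; variable names are arbitrary). It writes a valid one-character string with BinaryWriter to show the length-prefixed layout that ReadString() expects, then decodes the lone lead byte 0xC2 with Encoding.UTF8.GetString, whose default replacement fallback turns the truncated sequence into a single U+FFFD.

```csharp
using System;
using System.IO;
using System.Text;

// BinaryWriter.Write(string) emits a 7-bit encoded byte-length prefix
// followed by the string's bytes in the writer's encoding (UTF-8 here).
var ms = new MemoryStream();
using (var bw = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
{
    bw.Write("\u00E9"); // U+00E9 encodes to the two UTF-8 bytes C3 A9
}
byte[] valid = ms.ToArray();
Console.WriteLine(BitConverter.ToString(valid)); // 02-C3-A9

// The malformed payload [0x01, 0xC2] declares a length of 1 and keeps only
// the lead byte, so the two-byte sequence is truncated. Decoding that lone
// byte with the default replacement fallback yields a single U+FFFD.
string decoded = Encoding.UTF8.GetString(new byte[] { 0xC2 });
Console.WriteLine(decoded == "\uFFFD"); // True
```

In other words, the .NET 9 result of ReadString() matches what the encoding's own replacement fallback produces for the same bytes.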

Version

.NET 9 Preview 7

Previous behavior

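```csharp
using System;
using System.IO;

// Length prefix 0x01, then 0xC2: a truncated two-byte UTF-8 sequence.
var ms = new MemoryStream(new byte[] { 0x01, 0xC2 });
using (var br = new BinaryReader(ms))
{
    string s = br.ReadString();
    Console.WriteLine(s == "\uFFFD"); // false
    Console.WriteLine(s.Length); // 0
}
```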

New behavior

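```csharp
using System;
using System.IO;

// Same payload: length prefix 0x01, then the lone lead byte 0xC2.
var ms = new MemoryStream(new byte[] { 0x01, 0xC2 });
using (var br = new BinaryReader(ms))
{
    string s = br.ReadString();
    Console.WriteLine(s == "\uFFFD"); // true
    Console.WriteLine(s.Length); // 1
}
```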

Type of breaking change

This change is a behavioral change: existing binaries might behave differently at run time.

Reason for change

The change is a side effect of a performance improvement in dotnet/runtime#80331, and it affects only a rare scenario (malformed input).

Recommended action

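If you want to keep the previous behavior, where an incomplete byte sequence at the end of the string was omitted, call TrimEnd('\uFFFD') on the result. Note that this also removes any U+FFFD characters that were validly encoded at the end of the original string.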
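A minimal sketch of that workaround, assuming you wrap ReadString() in a helper of your own; the name ReadStringTrimmingReplacement is hypothetical and not a .NET API. It uses the char overload of string.TrimEnd, which is available on all supported .NET versions.

```csharp
using System;
using System.IO;

// Hypothetical helper (not a .NET API): reads a length-prefixed string and
// strips trailing U+FFFD characters to approximate the pre-.NET 9 result for
// payloads that end in an incomplete byte sequence.
// Caveat: this also strips U+FFFD characters that were validly encoded
// (EF BF BD) at the end of the original string.
static string ReadStringTrimmingReplacement(BinaryReader reader) =>
    reader.ReadString().TrimEnd('\uFFFD');

// Malformed payload from this issue: length 1, then a lone lead byte 0xC2.
using var ms = new MemoryStream(new byte[] { 0x01, 0xC2 });
using var br = new BinaryReader(ms);
string s = ReadStringTrimmingReplacement(br);
Console.WriteLine(s.Length); // 0
```

Because the trim runs only at the end of the string, replacement characters produced for invalid bytes in the middle of a payload are left as they are.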

Feature area

Core .NET libraries

Affected APIs

BinaryReader.ReadString()

Associated WorkItem - 320280

jozkee · 2024-09-10T21:56:19Z

FWIW: This is somewhat undefined behavior and is inconsistent with other decoding APIs in BinaryReader. Using [0xC2], ReadChar() throws EndOfStreamException and ReadChars(1) returns an empty array.

cc @adamsitnik @GrabYourPitchforks @jeffhandley @teo-tsirpanis

jozkee added the doc-idea, breaking-change, and Pri1 labels Sep 10, 2024

jozkee assigned gewarren Sep 10, 2024

dotnet-bot added the ⌚ Not Triaged label Sep 10, 2024

jozkee mentioned this issue Sep 11, 2024 in dotnet/runtime#93500, "Fix regression or document breaking change in BinaryReader" (Closed)


gewarren added the 🗺️ reQUEST label and removed the ⌚ Not Triaged label Oct 1, 2024

sequestor bot added the 📌 seQUESTered label and removed the 🗺️ reQUEST label Oct 2, 2024


gewarren mentioned this issue Oct 4, 2024 in #42833, "Two core library breaking changes" (Merged)

dotnet-policy-service bot added the in-pr label

gewarren closed this as completed in #42833 Oct 4, 2024
