Filter characters before byte conversion #416

vismayku · 2023-08-17T23:26:43Z

This change causes SessionOutputBufferImpl to filter out all characters that
cannot be correctly converted to ISO-8859-1 by simple downcasting to a
byte.
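Why a plain cast is not enough: a Java char is 16 bits wide, and a (byte) cast keeps only the low 8 bits. Any character above U+00FF therefore becomes an unrelated byte without any error. The hypothetical DowncastDemo class below prints a few cases. Note that U+010D (č) downcasts to 0x0D, a carriage return. For comparison, String.getBytes(StandardCharsets.ISO_8859_1) substitutes '?' for characters it cannot map.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows that a (byte) cast is not a valid ISO-8859-1 encoder,
// because it silently keeps just the low 8 bits of the 16-bit char.
final class DowncastDemo {

    // The naive conversion that the PR replaces.
    static byte naiveCast(final char c) {
        return (byte) c;
    }

    public static void main(final String[] args) {
        final char[] samples = {'A', '\u00E9', '\u0141', '\u010D', '\u20AC'};
        for (final char c : samples) {
            final int b = naiveCast(c) & 0xFF; // view the byte as unsigned
            System.out.printf("U+%04X -> 0x%02X%n", (int) c, b);
        }
        // The JDK encoder replaces unmappable characters with the charset's replacement byte ('?').
        final byte[] encoded = "\u010D".getBytes(StandardCharsets.ISO_8859_1);
        System.out.printf("getBytes(ISO_8859_1) -> 0x%02X%n", encoded[0] & 0xFF);
    }
}
```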

The fix is inspired by #116.
The above-mentioned fix was applied to the sync clients only. This pull request makes a similar change to the async client. Once it is approved, I will raise a similar request for the 4.x branch.

ok2c

@vismayku

Please fix the build. Make sure you can run mvn clean verify locally.
Please make a symmetric change to the else clause of the same if statement:

                    for (int i = 0; i < lineBuffer.length(); i++) {
                        buffer().put((byte) lineBuffer.charAt(i));
                    }
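A minimal sketch of what the symmetric change could look like, assuming a java.nio.ByteBuffer as the target and a plain CharSequence in place of httpcore's CharArrayBuffer. The class and method names are hypothetical, and this is not the actual SessionOutputBufferImpl code. Both the array-backed branch and the put() branch route every char through the same filter, so heap and direct buffers end up holding identical bytes.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: both branches apply the same filter before converting to a byte.
final class SymmetricWriteSketch {

    // Keep TAB, visible ASCII and visible ISO-8859-1; replace everything else with '?'.
    static byte toByte(final int c) {
        if ((c >= 0x20 && c <= 0x7E) || (c >= 0xA0 && c <= 0xFF) || c == 0x09) {
            return (byte) c;
        }
        return (byte) '?';
    }

    // Assumes the caller has already ensured dst.remaining() >= line.length().
    static void writeLine(final CharSequence line, final ByteBuffer dst) {
        final int len = line.length();
        if (dst.hasArray()) {
            // Fast path: write straight into the backing array, then advance the position.
            final byte[] b = dst.array();
            final int off = dst.position();
            final int arrayOffset = dst.arrayOffset();
            for (int i = 0; i < len; i++) {
                b[arrayOffset + off + i] = toByte(line.charAt(i));
            }
            dst.position(off + len);
        } else {
            // Fallback (e.g. direct buffers): same filter, one put() per char.
            for (int i = 0; i < len; i++) {
                dst.put(toByte(line.charAt(i)));
            }
        }
    }

    public static void main(final String[] args) {
        final ByteBuffer heap = ByteBuffer.allocate(16);
        final ByteBuffer direct = ByteBuffer.allocateDirect(16);
        writeLine("caf\u00E9 \u010D", heap);
        writeLine("caf\u00E9 \u010D", direct);
        heap.flip();
        direct.flip();
        System.out.println(heap.equals(direct)); // true: both paths emit the same bytes
    }
}
```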

vismayku · 2023-08-18T15:11:33Z

Thank you for the feedback. I will work on it.

vismayku · 2023-08-18T16:11:34Z

Made sure mvn clean, compile, and verify all run successfully.
Also addressed the requested change.

arturobernalg · 2023-08-18T16:51:44Z

httpcore5/src/main/java/org/apache/hc/core5/http/impl/nio/SessionOutputBufferImpl.java

@@ -171,12 +171,26 @@ public void writeLine(final CharArrayBuffer lineBuffer) throws CharacterCodingEx
                    final int off = buffer().position();
                    final int arrayOffset = buffer().arrayOffset();
                    for (int i = 0; i < len; i++) {
-                        b[arrayOffset + off + i]  = (byte) lineBuffer.charAt(i);
+                        final int c = lineBuffer.charAt(i);
+                        if ((c >= 0x20 && c <= 0x7E) || // Visible ASCII


Is it just me, or does this seem duplicated? Perhaps we could move this to a LangUtils class or something similar to avoid repetition. I've noticed the same logic both here and in ByteArrayBuffer.

ok2c

@vismayku I agree with @arturobernalg. There are now three instances of the same logic repeated verbatim. I propose extracting the common logic into TextUtils. Please mark the new method @Internal; it should not be considered part of the public API.
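A sketch of the proposed extraction, assuming a helper named castAsByte (the name is an assumption here) on a TextUtils-like class. Because the sketch cannot depend on httpcore, it declares a local Internal annotation as a stand-in for org.apache.hc.core5.annotation.Internal. Each former copy of the logic, such as the one in ByteArrayBuffer, would then delegate to the helper, as the simplified ByteArrayBufferSketch does.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

// Local stand-in for httpcore's @Internal marker so this sketch compiles without dependencies.
@Documented
@Retention(RetentionPolicy.CLASS)
@Target({ElementType.METHOD, ElementType.TYPE})
@interface Internal {
}

final class TextUtilsSketch {

    private TextUtilsSketch() {
    }

    /**
     * Converts a char to a single byte, keeping TAB, visible ASCII and visible
     * ISO-8859-1 characters and replacing everything else with '?'.
     * Marked internal: not part of the public API.
     */
    @Internal
    static byte castAsByte(final int c) {
        if ((c >= 0x20 && c <= 0x7E) || // Visible ASCII
            (c >= 0xA0 && c <= 0xFF) || // Visible ISO-8859-1
             c == 0x09) {               // TAB
            return (byte) c;
        }
        return (byte) '?';
    }
}

// Hypothetical simplified buffer: one former copy of the logic, now delegating to the helper.
final class ByteArrayBufferSketch {
    private final byte[] array;
    private int len;

    ByteArrayBufferSketch(final int capacity) {
        this.array = new byte[capacity];
    }

    void append(final CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            array[len++] = TextUtilsSketch.castAsByte(s.charAt(i));
        }
    }

    byte[] toByteArray() {
        return Arrays.copyOf(array, len);
    }
}
```

Keeping the helper internal presumably leaves room to adjust the replacement policy later without committing to it as public API.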

vismayku · 2023-08-21T14:58:49Z

ACK. I'm working on it.

vismayku · 2023-08-21T16:52:58Z

I'm seeing an unrelated test failure. I see the same failure even with a freshly checked-out repo.


[ERROR] Failures: 
[ERROR]   ClassicRequestBuilderTest.constructor:65 expected: <REDACTED> but was: <[REDACTED]>

arturobernalg · 2023-08-21T19:58:20Z

httpcore5/src/main/java/org/apache/hc/core5/util/TextUtils.java

+        if ((c >= 0x20 && c <= 0x7E) || // Visible ASCII
+            (c >= 0xA0 && c <= 0xFF) || // Visible ISO-8859-1
+             c == 0x09) {               // TAB
+            return (byte) c;


@vismayku I think you need to guard against unexpected truncation by checking the range of the input before performing the cast.

    if (c <= 127) { // Ensure it's within byte range
        return (byte) c;
    }

Also, please squash your commits into a single commit for a cleaner commit history.

@vismayku I think you need to guard against unexpected truncation by checking the range of the input before performing the cast.

@arturobernalg I think the existing implementation already does that. All non-printable characters, as well as all characters outside ISO-8859-1, get converted to '?'. This is the expected behavior.
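The claim that the whitelist already prevents truncation can be checked exhaustively, since a char has only 65,536 values. This sketch assumes the same filter as the TextUtils diff, with '?' as the replacement, and confirms that every output byte is either the input character itself or '?'. It also counts how many characters the whitelist keeps above 127; a plain c <= 127 guard would reject those, namely the visible ISO-8859-1 range U+00A0 to U+00FF. The class name is hypothetical.

```java
// Hypothetical exhaustive check of the whitelist filter over every char value.
final class NoTruncationCheck {

    // Same whitelist as the TextUtils change; everything else becomes '?'.
    static byte castAsByte(final int c) {
        if ((c >= 0x20 && c <= 0x7E) || (c >= 0xA0 && c <= 0xFF) || c == 0x09) {
            return (byte) c;
        }
        return (byte) '?';
    }

    // True if, for every char, the unsigned output byte is either the char itself or '?'.
    static boolean neverTruncates() {
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            final int out = castAsByte(c) & 0xFF;
            if (out != c && out != '?') {
                return false;
            }
        }
        return true;
    }

    // Number of chars above 127 that the whitelist passes through unchanged.
    static int keptAbove127() {
        int count = 0;
        for (int c = 128; c <= Character.MAX_VALUE; c++) {
            if ((castAsByte(c) & 0xFF) == c) {
                count++;
            }
        }
        return count;
    }

    public static void main(final String[] args) {
        System.out.println(neverTruncates()); // true
        System.out.println(keptAbove127());   // 96, i.e. U+00A0..U+00FF
    }
}
```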

ok2c

@vismayku @arturobernalg The change-set looks good to me.

vismayku · 2023-08-22T13:54:40Z

@ok2c Sorry for the ping, but I have a follow-up question.
I would also like to contribute a similar change to the 4.x major version. Which specific major.minor version should I work on?

ok2c · 2023-08-22T16:53:48Z

@ok2c Sorry for the ping, but I have a follow-up question. I would also like to contribute a similar change to the 4.x major version. Which specific major.minor version should I work on?

@vismayku We are not going to make any changes to the 4.x code beyond critical security and protocol fixes. This is not one of those.

Filter characters before byte conversion · 17d51b5

vismayku marked this pull request as ready for review August 17, 2023 23:27

Filter characters before byte conversion · c61ba53

ok2c requested changes Aug 18, 2023

Filter characters before byte conversion · b0884d7

arturobernalg requested changes Aug 18, 2023

vismayku requested a review from ok2c August 18, 2023 20:25

ok2c requested changes Aug 19, 2023

Filter characters before byte conversion · e391e58

vismayku requested review from arturobernalg and ok2c August 21, 2023 16:50

github-advanced-security bot found potential problems Aug 21, 2023

arturobernalg requested changes Aug 21, 2023

ok2c approved these changes Aug 21, 2023

arturobernalg approved these changes Aug 21, 2023

ok2c merged commit 4e72f40 into apache:master Aug 22, 2023
9 checks passed

ok2c pushed a commit that referenced this pull request Aug 22, 2023

Filter characters before byte conversion (#416) · b209e7b

ok2c pushed a commit that referenced this pull request Sep 15, 2023

Filter characters before byte conversion (#416) · f255736
