Commit 1226588

Fix UTF8 data generator in libcudf benchmarks utility (#20465)
Fixes the `string_generator` utility logic to produce valid random UTF8 bytes. The 2nd byte requires the top 2 bits to be `10`.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Nghia Truong (https://github.com/ttnghia)

URL: #20465
1 parent 3b29d65 commit 1226588

1 file changed, +1 -1 lines changed

cpp/benchmarks/common/generate_input.cu

Lines changed: 1 addition & 1 deletion
@@ -464,7 +464,7 @@ struct string_generator {
       if (i == end - 1 && ch >= '\x7F') ch = ' ';  // last element ASCII only.
       if (ch >= '\x7F') {  // x7F is at the top edge of ASCII
         chars[i++] = '\xC4';  // these characters are assigned two bytes
-        ch = ch | 0x80;
+        ch = (ch >> 2) | 0x80;
       }
       chars[i] = static_cast<char>(ch);
     }
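
The effect of the one-line change can be checked in isolation. The sketch below is not part of libcudf: the helper names are hypothetical, and it assumes `ch` behaves as an unsigned 8-bit value in the range 0x7F to 0xFF (the diff does not show the actual type of `ch`). It compares the second byte produced by the old and new expressions against the UTF-8 rule that a continuation byte has the form `10xxxxxx`.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; none of these exist in libcudf.

// A UTF-8 continuation byte must have its top two bits equal to 10.
constexpr bool is_continuation_byte(std::uint8_t b) { return (b & 0xC0) == 0x80; }

// Old expression: only forces the top bit on, so bit 6 of the input survives
// and inputs such as 0x7F or 0xC0 yield a byte of the form 11xxxxxx.
constexpr std::uint8_t second_byte_before(std::uint8_t ch)
{
  return static_cast<std::uint8_t>(ch | 0x80);
}

// New expression: shifting right by 2 clears the top two bits (assuming an
// unsigned value), so OR-ing in 0x80 always yields a byte in 0x80..0xBF.
constexpr std::uint8_t second_byte_after(std::uint8_t ch)
{
  return static_cast<std::uint8_t>((ch >> 2) | 0x80);
}

// Returns true if every input in [0x7F, 0xFF] maps to a valid continuation byte.
constexpr bool all_valid(std::uint8_t (*make_second_byte)(std::uint8_t))
{
  for (int ch = 0x7F; ch <= 0xFF; ++ch) {
    if (!is_continuation_byte(make_second_byte(static_cast<std::uint8_t>(ch)))) { return false; }
  }
  return true;
}
```

Under these assumptions, `all_valid(second_byte_after)` holds while `all_valid(second_byte_before)` does not, which matches the commit message: paired with the lead byte `'\xC4'`, only the shifted form is guaranteed to give a well-formed two-byte sequence.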
