
Commit 7b6e5ef

caohy1988 and claude committed
test: add regression test for mixed-unit offload false positive
Add test_multibyte_under_char_and_byte_limits_stays_inline, which exercises the specific #5561 regression: max_length=10000 with 3K emoji chars (12K bytes). Under the old mixed-unit min(), the byte count exceeded the character-based threshold, triggering a false offload. With the fix, the character count and the byte count are each under their respective limits, and the text stays inline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 00b8188 commit 7b6e5ef

1 file changed

Lines changed: 31 additions & 0 deletions


tests/unittests/plugins/test_bigquery_agent_analytics_plugin.py

@@ -7498,3 +7498,34 @@ async def test_no_offloader_falls_back_to_truncate(self):
     assert is_truncated
     assert parts[0]["storage_mode"] == "INLINE"
     assert "TRUNCATED" in parts[0]["text"]
+
+  @pytest.mark.asyncio
+  async def test_multibyte_under_char_and_byte_limits_stays_inline(self):
+    """Multi-byte text under both char limit and byte limit stays inline.
+
+    This is the specific regression case from #5561: with the old
+    mixed-unit min(), max_length=10000 became the offload_threshold,
+    and byte_len (12K) > 10000 triggered a false offload even though
+    char_len (3K) < max_length and byte_len (12K) < inline_text_limit
+    (32KB).
+    """
+    mock_offloader = mock.AsyncMock()
+    parser = bigquery_agent_analytics_plugin.HybridContentParser(
+        offloader=mock_offloader,
+        trace_id="t",
+        span_id="s",
+        max_length=10000,
+    )
+
+    # 3K emoji chars → ~12K bytes
+    text = "\U0001f600" * 3000
+    assert len(text) < 10000  # under char limit
+    assert len(text.encode("utf-8")) > 10000  # bytes > max_length
+    assert len(text.encode("utf-8")) < 32 * 1024  # under byte limit
+
+    content = types.Content(parts=[types.Part(text=text)])
+    _, parts, _ = await parser._parse_content_object(content)
+
+    # Should NOT offload: under both real limits
+    mock_offloader.upload_content.assert_not_called()
+    assert parts[0]["storage_mode"] == "INLINE"
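
The unit mismatch that this test guards against can be reproduced outside the plugin. The following is a minimal sketch, not the plugin's actual code: the constants and the two helper functions are hypothetical names chosen for illustration, and the "mixed-unit" variant only mirrors the behavior described in the commit message (a single threshold taken as min() of a character limit and a byte limit, then compared against a byte count).

```python
# Hypothetical stand-ins for the limits named in the test docstring.
MAX_LENGTH_CHARS = 10_000            # character limit (max_length=10000)
INLINE_TEXT_LIMIT_BYTES = 32 * 1024  # byte limit (32KB)


def fits_inline_mixed_units(text: str) -> bool:
    """Assumed shape of the old check: one threshold for two different units."""
    # min() mixes a char count with a byte count, so the char limit wins here.
    threshold = min(MAX_LENGTH_CHARS, INLINE_TEXT_LIMIT_BYTES)
    # The byte length is then compared against what is really a char limit.
    return len(text.encode("utf-8")) <= threshold


def fits_inline_per_unit(text: str) -> bool:
    """Each measurement is compared against the limit expressed in its own unit."""
    char_len = len(text)                  # Unicode code points
    byte_len = len(text.encode("utf-8"))  # UTF-8 encoded size
    return char_len <= MAX_LENGTH_CHARS and byte_len <= INLINE_TEXT_LIMIT_BYTES


# U+1F600 is one code point but four bytes in UTF-8.
text = "\U0001f600" * 3000
print(len(text), len(text.encode("utf-8")))  # 3000 12000
print(fits_inline_mixed_units(text))         # False (the false offload)
print(fits_inline_per_unit(text))            # True
```

With ASCII-only text the two checks agree, because character count and byte count coincide; the divergence only appears once the text contains multi-byte characters, which is why the regression test uses emoji.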
