[CORE-16628]: Cloud Topics: HTT and more scale tests #30818

Draft
oleiman wants to merge 2 commits into dev from ct/core-16628/htt-and-st

@oleiman

@oleiman oleiman commented Jun 16, 2026

Member

Cloud topics serve cold reads by fetching L1 objects from object storage through a
per-shard S3 connection pool. The cloud_io scheduler arbitrates that pool under its
reservation policy: the produce path's L0 uploads (producer_upload) keep a reserved floor
even when cold fetches (consumer_fetch) saturate the pool. This PR adds test coverage
verifying that produce stays healthy under cold-read pressure for cloud topics, as part of
the tier-9 cloud-topics scaleup (CORE-16628).
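To illustrate the reserved-floor idea, here is a toy Python model. It is a sketch under stated assumptions, not the cloud_io scheduler: `ReservedPool`, `try_acquire`, and `release` are hypothetical names, and the admission rule is one plausible way to honor a floor. The only values taken from this PR are the pool size of 8 and the producer_upload floor of 2.

```python
from dataclasses import dataclass, field


@dataclass
class ReservedPool:
    """Toy connection pool with per-class reserved floors (hypothetical model)."""

    capacity: int
    floors: dict[str, int]  # e.g. {"producer_upload": 2}
    in_use: dict[str, int] = field(default_factory=dict)

    def _unmet_reservations(self, requester: str) -> int:
        # Connections that must stay free so *other* classes can reach their floors.
        return sum(
            max(0, floor - self.in_use.get(cls, 0))
            for cls, floor in self.floors.items()
            if cls != requester
        )

    def try_acquire(self, cls: str) -> bool:
        free = self.capacity - sum(self.in_use.values())
        if free <= 0:
            return False
        within_own_floor = self.in_use.get(cls, 0) < self.floors.get(cls, 0)
        # Beyond its own floor, a class may only take connections that are
        # not needed to satisfy someone else's unmet reservation.
        if within_own_floor or free > self._unmet_reservations(cls):
            self.in_use[cls] = self.in_use.get(cls, 0) + 1
            return True
        return False

    def release(self, cls: str) -> None:
        if self.in_use.get(cls, 0) <= 0:
            raise ValueError(f"{cls} holds no connections")
        self.in_use[cls] -= 1


if __name__ == "__main__":
    pool = ReservedPool(capacity=8, floors={"producer_upload": 2})
    granted = sum(pool.try_acquire("consumer_fetch") for _ in range(20))
    print(granted)  # 6: two connections stay reserved for producer_upload
    print(pool.try_acquire("producer_upload"))  # True
```

In this model a flood of consumer_fetch requests stops at 6 of 8 connections, so producer_upload can always obtain its 2, regardless of how large the pool is.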

CDT scale test

(scale_tests/cloud_topics_cold_read_scale_test.py) is a coarse regression
gate. A steady producer keeps a cloud topic warm while a multi-reader consumer group
re-reads it from offset 0 in a loop; the topic's data exceeds the cloud cache, so the
reads keep missing and fetch L1 cold, contending for the per-shard pool. The pool is
deliberately small (cloud_storage_max_connections=8, with the shipped reservation, so
producer_upload is floored at 2): at CDT scale you can't saturate the default
20-connection pool, and because the floor mechanism is scale-invariant, a smaller pool
exercises the same behavior. The test asserts that produce holds ≥70% of its offered rate
under that contention, with self-confirming guards that the reads were genuinely cold
(pulled back more data than the cache holds) and the pool was genuinely contended (had
waiters). The precise reservation-vs-passthrough A/B comparison lives in the
bench-runner tier-9 configs; the floor itself is unit-tested in
cloud_io/tests/scheduler_test.cc.
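The gate's three checks can be sketched as below. This is an illustration of the logic described above, not the test's code: `RunStats`, its fields, and `check_cold_read_gate` are hypothetical names, and how the real test collects these numbers is not shown here. Only the 70% threshold comes from this description.

```python
from dataclasses import dataclass

MIN_PRODUCE_FRACTION = 0.70  # threshold stated in the PR description


@dataclass(frozen=True)
class RunStats:
    """Hypothetical summary of one run; field names are assumptions."""

    offered_bytes_per_s: float
    achieved_bytes_per_s: float
    cold_bytes_fetched: int
    cache_bytes: int
    pool_waiters_observed: int


def check_cold_read_gate(stats: RunStats) -> list[str]:
    """Return the reasons the gate fails; an empty list means it passes."""
    failures = []
    # Guard 1: the reads must have pulled back more data than the cache holds,
    # otherwise they could have been served warm.
    if stats.cold_bytes_fetched <= stats.cache_bytes:
        failures.append("reads were not cold")
    # Guard 2: the pool must have had waiters, otherwise nothing was contended.
    if stats.pool_waiters_observed <= 0:
        failures.append("pool was never contended")
    # The actual assertion: produce holds its share of the offered rate.
    if stats.achieved_bytes_per_s < MIN_PRODUCE_FRACTION * stats.offered_bytes_per_s:
        failures.append("produce fell below the required fraction of offered rate")
    return failures
```

Checking the guards first means a run that never stressed the pool fails loudly instead of passing vacuously.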

The scale test SIGKILLs the brokers at teardown: under sustained pool saturation the
reconciler wedges on orphaned multipart uploads and a graceful shutdown hangs
(CORE-16648). The produce assertion is already decided by then, so the test abandons the
cluster rather than block on the wedge; a code comment marks the force-stop for removal
once CORE-16648 is fixed.
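As a generic illustration of a force-stop at teardown, the sketch below kills local child processes without attempting a graceful shutdown. It assumes a POSIX host and plain `subprocess.Popen` handles; `force_stop` is a hypothetical helper, and the real test stops brokers through its own test framework rather than through this code.

```python
import subprocess
import sys


def force_stop(procs: list[subprocess.Popen]) -> list[int]:
    """SIGKILL every process that is still running and reap it.

    Returns each process's exit code. No graceful shutdown is attempted,
    so a process wedged in its shutdown path cannot block teardown.
    """
    codes = []
    for proc in procs:
        if proc.poll() is None:
            proc.kill()  # sends SIGKILL on POSIX
        codes.append(proc.wait(timeout=10))
    return codes


if __name__ == "__main__":
    # Stand-in for a broker: a child process that would otherwise run for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(force_stop([child]))
```

The trade-off is the one the description names: the verdict is already decided by teardown, so skipping a clean shutdown loses nothing the test depends on.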

High-throughput cloud stage

(redpanda_cloud_tests/high_throughput_test.py::test_cloud_topics_cold_read) is the
real-cloud analog of the tiered-storage consuming stage. It runs steady produce on a
storage.mode=cloud topic at max tier ingress while a large backlog (~4 min of ingress,
sized to exceed the batch cache) drains cold from object storage, and asserts that
produce keeps flowing and the backlog drains. It runs against a real Redpanda Cloud
cluster and requires cloud topics to be enabled on the tier. This is a
throughput/functionality check at tier scale, not the pool-saturation floor gate.
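The backlog sizing is simple arithmetic, sketched here with made-up numbers. The 4-minute figure comes from the description; the ingress rate and cache size in the usage lines are placeholders, not values for any real tier, and both function names are hypothetical.

```python
BACKLOG_SECONDS = 4 * 60  # "~4 min of ingress" from the PR description


def backlog_bytes(ingress_bytes_per_s: int, seconds: int = BACKLOG_SECONDS) -> int:
    """Bytes of backlog produced by `seconds` of ingress at the given rate."""
    return ingress_bytes_per_s * seconds


def backlog_exceeds_cache(backlog: int, cache_bytes: int) -> bool:
    """True when the backlog cannot fit in the cache, so draining it must read cold."""
    return backlog > cache_bytes


if __name__ == "__main__":
    MiB = 1024 * 1024
    ingress = 100 * MiB  # placeholder ingress rate, not a real tier value
    cache = 8 * 1024 * MiB  # placeholder cache size
    backlog = backlog_bytes(ingress)
    print(backlog // MiB)  # 24000
    print(backlog_exceeds_cache(backlog, cache))  # True
```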

Testing: the scale test passes in CDT; the high-throughput stage runs against a
provisioned cloud cluster per tier.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • none

@oleiman oleiman self-assigned this Jun 16, 2026
@oleiman oleiman changed the title from "Ct/core 16628/htt and st" to "[CORE-16628]: Cloud Topics: HTT and more scale tests" Jun 16, 2026
@oleiman

oleiman commented Jun 17, 2026

Member Author

/cdt
rp_version=build
tests/rptest/scale_tests/cloud_topics_cold_read_scale_test.py

@oleiman

oleiman commented Jun 18, 2026

Member Author

/cdt
tests/rptest/scale_tests/cloud_topics_cold_read_scale_test.py

@oleiman oleiman force-pushed the ct/core-16628/htt-and-st branch from b67093f to 49a6427 Compare June 18, 2026 02:48
@oleiman

oleiman commented Jun 18, 2026

Member Author

/cdt
tests/rptest/scale_tests/cloud_topics_cold_read_scale_test.py

@oleiman oleiman force-pushed the ct/core-16628/htt-and-st branch from 49a6427 to 7e1cdeb Compare June 18, 2026 03:43
@oleiman

oleiman commented Jun 18, 2026

Member Author

/cdt
tests/rptest/scale_tests/cloud_topics_cold_read_scale_test.py

@oleiman oleiman force-pushed the ct/core-16628/htt-and-st branch from 7e1cdeb to ff4f4d0 Compare June 18, 2026 04:17
@oleiman

oleiman commented Jun 18, 2026

Member Author

/cdt
tests/rptest/scale_tests/cloud_topics_cold_read_scale_test.py

@oleiman oleiman force-pushed the ct/core-16628/htt-and-st branch from ff4f4d0 to ddae5c4 Compare June 18, 2026 05:51
@oleiman

oleiman commented Jun 18, 2026

Member Author

/cdt
tests/rptest/scale_tests/cloud_topics_cold_read_scale_test.py

@oleiman oleiman force-pushed the ct/core-16628/htt-and-st branch from ddae5c4 to 0a354e4 Compare June 18, 2026 07:21
@oleiman

oleiman commented Jun 18, 2026

Member Author

/cdt
tests/rptest/scale_tests/cloud_topics_cold_read_scale_test.py

@oleiman oleiman force-pushed the ct/core-16628/htt-and-st branch from 0a354e4 to f019710 Compare June 18, 2026 17:11
@oleiman

oleiman commented Jun 18, 2026

Member Author

/cdt
tests/rptest/scale_tests/cloud_topics_cold_read_scale_test.py

@oleiman oleiman force-pushed the ct/core-16628/htt-and-st branch from f019710 to db9b33f Compare June 18, 2026 20:38
@oleiman

oleiman commented Jun 18, 2026

Member Author

/cdt
tests/rptest/scale_tests/cloud_topics_cold_read_scale_test.py

@oleiman oleiman force-pushed the ct/core-16628/htt-and-st branch from db9b33f to c745e84 Compare June 18, 2026 21:41
@oleiman

oleiman commented Jun 18, 2026

Member Author

/cdt
tests/rptest/scale_tests/cloud_topics_cold_read_scale_test.py

@oleiman oleiman force-pushed the ct/core-16628/htt-and-st branch from c745e84 to 422a8e2 Compare June 18, 2026 23:05
@oleiman

oleiman commented Jun 18, 2026

Member Author

/cdt
tests/rptest/scale_tests/cloud_topics_cold_read_scale_test.py

@oleiman oleiman force-pushed the ct/core-16628/htt-and-st branch from 422a8e2 to 6e40ca8 Compare June 19, 2026 00:50
@oleiman

oleiman commented Jun 19, 2026

Member Author

/cdt
tests/rptest/scale_tests/cloud_topics_cold_read_scale_test.py

@oleiman oleiman force-pushed the ct/core-16628/htt-and-st branch from 6e40ca8 to 0f9c2a9 Compare June 19, 2026 04:02
oleiman added 2 commits June 18, 2026 21:10
A CDT scale gate for the cloud_io reservation floor. A producer keeps a
cloud topic warm at a moderate rate while a multi-reader consumer group
re-reads it from offset 0 on a loop. The topic's data exceeds the cloud
cache, so the reads keep missing and fetch L1 cold, contending a small
per-shard S3 pool against the produce-path L0 uploads.

The test asserts produce stays healthy under that contention, protected
by producer_upload's reserved floor, with self-confirming guards that
the reads were genuinely cold (more than the cache) and the pool was
genuinely contended (had waiters). A coarse regression gate, not a
reservation-vs-passthrough A/B.

Brokers are force-stopped at teardown: under sustained pool saturation
the reconciler wedges on orphaned multipart uploads and a graceful
shutdown hangs. That bug is tracked in CORE-16648, and we'll reinstate
the usual shutdown ceremony when a fix lands.

stage_cloud_topics_cold_read + test_cloud_topics_cold_read: a cloud-topics
analog of stage_tiered_storage_consuming that runs on a real Redpanda Cloud
cluster at the sold tier (cloud topics is available there). Steady produce at
max tier ingress + an RpkConsumer draining the backlog cold from oldest;
asserts produce advances and the backlog drains. Backlog volume and drain
timeout are calibration knobs.
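A drain check bounded by a timeout can be sketched as a polling loop. This is an assumption about the general shape, not the stage's implementation: `wait_for_drain` and its `get_lag` callback are hypothetical, and the real stage drains with an RpkConsumer as noted above.

```python
import time
from typing import Callable


def wait_for_drain(
    get_lag: Callable[[], int],
    timeout_s: float,
    poll_s: float = 1.0,
) -> bool:
    """Poll `get_lag` until it reports no backlog or the timeout expires.

    Returns True if the backlog drained in time, False otherwise.
    `timeout_s` plays the role of the drain-timeout calibration knob.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_lag() <= 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    remaining = iter([3, 2, 1, 0])  # stand-in for a shrinking backlog
    print(wait_for_drain(lambda: next(remaining), timeout_s=5, poll_s=0))  # True
```

Checking the lag before the deadline means a backlog that is already empty passes even with a zero timeout.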
@oleiman oleiman force-pushed the ct/core-16628/htt-and-st branch from 0f9c2a9 to 3c9000f Compare June 19, 2026 04:11