
admin/v2: add public IcebergService #30846

Open
WillemKauf wants to merge 1 commit into redpanda-data:dev from WillemKauf:iceberg-core

@WillemKauf

Contributor

Add a read-only admin v2 surface over the Datalake subsystem. Unlike the DatalakeService, this service is intended for public consumption by rpk and Console, giving users monitoring and health reporting for their Iceberg deployments.

Currently, the IcebergService provides a way to:

  1. Query catalog connectivity on the fly.
  2. Probe the datalake::coordinator for per-partition Iceberg state, which lets us report per-partition metrics such as translation lag and commit lag. Callers can use this data to report, e.g., the top N laggiest partitions in the cluster.
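
A minimal sketch of the "top N laggiest partitions" use case on the caller side. The `PartitionStatus` record and its field names are hypothetical stand-ins for whatever the response actually carries, and the sample values are made up for illustration:

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionStatus:
    # Hypothetical record; these field names are illustrative and are
    # not taken from the actual iceberg.proto definition.
    topic: str
    partition: int
    commit_lag: int


def top_n_laggiest(statuses, n):
    """Return the n partitions with the largest commit lag, largest first."""
    return heapq.nlargest(n, statuses, key=lambda s: s.commit_lag)


# Made-up sample data standing in for a per-partition status response.
statuses = [
    PartitionStatus("orders", 0, 120),
    PartitionStatus("orders", 1, 5),
    PartitionStatus("clicks", 0, 900),
    PartitionStatus("clicks", 1, 0),
]

for s in top_n_laggiest(statuses, 2):
    print(f"{s.topic}/{s.partition}: commit lag {s.commit_lag}")
```

`heapq.nlargest` avoids sorting the full list, which helps when the cluster has many partitions and N is small.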

No rpk or Console bindings are provided yet.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • none

Copilot AI review requested due to automatic review settings June 18, 2026 18:20
@WillemKauf WillemKauf requested review from a team as code owners June 18, 2026 18:20
@WillemKauf WillemKauf requested review from nguyen-andrew and removed request for a team June 18, 2026 18:20

Copilot AI left a comment

Contributor

Pull request overview

This PR adds a new public, read-only Admin API v2 surface (IcebergService) to expose Datalake/Iceberg health and per-partition translation/commit status for consumption by external tooling (e.g. rpk, Console).

Changes:

  • Registers a new Admin v2 service (IcebergService) in the Redpanda admin server wiring.
  • Implements GetIcebergStatus to return per-topic/per-partition translated + committed offsets and computed commit lag, plus catalog reachability.
  • Adds a new Admin v2 protobuf definition and Bazel build targets for the service and its generated code.
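
As a rough illustration of how a commit lag could be derived from the two offsets, the sketch below assumes the lag is the number of translated offsets not yet committed to the catalog. That definition, the function name, and the handling of missing values are assumptions made for illustration; the PR's actual computation in `iceberg.cc` may differ.

```python
from typing import Optional


def commit_lag(
    translated_offset: Optional[int], committed_offset: Optional[int]
) -> Optional[int]:
    """Assumed definition: offsets translated but not yet committed."""
    if translated_offset is None or committed_offset is None:
        # Without both offsets there is nothing meaningful to report.
        return None
    # Clamp at zero so a committed offset ahead of a stale translated
    # offset does not produce a negative lag.
    return max(0, translated_offset - committed_offset)


print(commit_lag(1500, 1200))
print(commit_lag(None, 1200))
```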

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/v/redpanda/BUILD | Links the new iceberg admin service library into the redpanda target. |
| src/v/redpanda/application_admin.cc | Registers iceberg_service_impl with the admin server. |
| src/v/redpanda/admin/services/iceberg/iceberg.h | Declares the new Admin v2 iceberg service implementation. |
| src/v/redpanda/admin/services/iceberg/iceberg.cc | Implements GetIcebergStatus, mapping coordinator state and probing catalog health. |
| src/v/redpanda/admin/services/iceberg/BUILD | Adds Bazel target for the new service implementation. |
| proto/redpanda/core/admin/v2/iceberg.proto | Introduces the public Admin v2 IcebergService protobuf API. |
| proto/redpanda/core/admin/v2/BUILD | Adds Bazel proto targets for the new iceberg.proto. |

Comment thread src/v/redpanda/admin/services/iceberg/iceberg.cc
@vbotbuildovich

Collaborator

CI test results

test results on build#85997

| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FLAKY(PASS) | ShadowLinkingRandomOpsTest | test_node_operations | {"failures": true, "workload_set": "cloud_combos"} | integration | https://buildkite.com/redpanda/redpanda/builds/85997#019edc13-de0a-4637-8c4d-b9cbaae9feb0 | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0183, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingRandomOpsTest&test_method=test_node_operations |
