
Conversation

@paulohtb6 (Collaborator) commented Oct 8, 2025

Description

Adds Shadowing docs.
Adds emergency runbook.

Resolves https://redpandadata.atlassian.net/browse/DOC-1665
Review deadline: Oct 17th

Page previews

Shadowing
Shadowing guide

Checks

  • New feature
  • Content gap
  • Support Follow-up
  • Small fix (typos, links, copyedits, etc)

netlify bot commented Oct 8, 2025

Deploy Preview for redpanda-docs-preview ready!

Name | Link
🔨 Latest commit | aedbad9
🔍 Latest deploy log | https://app.netlify.com/projects/redpanda-docs-preview/deploys/68fbe2b5e25cb80008eb009b
😎 Deploy Preview | https://deploy-preview-1381--redpanda-docs-preview.netlify.app

coderabbitai bot commented Oct 8, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


📝 Walkthrough

  • Added a navigation entry for a new Shadowing guide under Redpanda deployment manual.
  • Introduced a comprehensive Shadowing documentation page covering architecture, scope, prerequisites, setup, configuration, filtering, monitoring, failover behavior, and best practices, with CLI/Admin API examples.
  • Added an emergency runbook page for disaster failover of Shadow Links, including assessment, verification, failover execution (cluster-wide or selective), monitoring, app reconfiguration, troubleshooting, recovery, and post-incident steps.
  • Included enterprise licensing cross-reference in the emergency guide.
  • No changes to exported/public code entities.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Admin as Operator
  participant Prim as Primary Cluster
  participant Shadow as Shadow Cluster
  participant Ctrl as Admin API / rpk
  participant Sec as Auth/TLS
  participant Obs as Monitoring

  rect rgb(235, 245, 255)
  note over Admin,Ctrl: Configure Shadowing
  Admin->>Ctrl: Create shadow link (templates, filters)
  Ctrl->>Sec: Authenticate / TLS handshake
  Ctrl->>Prim: Apply link config
  Prim-->>Shadow: Establish replication channel
  end

  rect rgb(245, 255, 235)
  note over Prim,Shadow: Ongoing Replication (normal ops)
  Prim-->>Shadow: Replicate topics/configs/ACLs/schema
  Prim-->>Shadow: Preserve offsets/timestamps (where applicable)
  Admin->>Ctrl: rpk/admin queries (status/metrics)
  Ctrl-->>Obs: Emit metrics/alerts
  end

  rect rgb(255, 245, 235)
  note right of Admin: Planned ops are handled in Shadowing guide
  end
sequenceDiagram
  autonumber
  actor Admin as Operator
  participant Prim as Primary Cluster
  participant Shadow as Shadow Cluster
  participant Ctrl as Admin API / rpk
  participant Apps as Applications/Clients
  participant Obs as Monitoring

  rect rgb(255, 245, 235)
  note over Admin,Prim: Emergency Failover Runbook
  Admin->>Prim: Assess incident, document state
  Admin->>Shadow: Verify readiness/health
  Admin->>Ctrl: Initiate failover (full or selective)
  Ctrl->>Shadow: Transition shadow links (FAILING_OVER→ACTIVE)
  Shadow-->>Obs: Report progress/status
  end

  rect rgb(245, 255, 235)
  note over Apps,Shadow: Post-failover
  Admin->>Apps: Update bootstrap/endpoints, TLS/ACLs
  Apps->>Shadow: Reconnect and resume traffic
  Admin->>Ctrl: Verify topics/consumer groups/offsets
  end

  alt Issues detected
    Obs-->>Admin: Alerts (PAUSED, stuck states, auth failures)
    Admin->>Ctrl: Troubleshoot per runbook steps
  else Stable
    Admin->>Prim: Plan recovery/back-sync later
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Title Check | ✅ Passed | The title succinctly conveys the main change by stating that shadowing documentation is being added and does not include extraneous details, making it clear and focused on the core update.
Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped.
Description Check | ✅ Passed | The pull request description follows the required template structure with all necessary sections completed. The Description section includes the JIRA ticket reference (DOC-1665) and review deadline (Oct 17th), clearly stating what is being added (Shadowing docs and emergency runbook). The Page previews section provides two properly formatted preview links following the specified pattern. The Checks section includes the required checkboxes with "New feature" appropriately selected. The description is concise and complete, providing sufficient context for reviewers to understand the scope of the changes.


@paulohtb6 paulohtb6 marked this pull request as ready for review October 15, 2025 02:44
@paulohtb6 paulohtb6 requested a review from a team as a code owner October 15, 2025 02:44
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
modules/ROOT/nav.adoc (1)

88-91: Add the emergency failover doc to navigation

Shadowing entry looks good. Add a sibling nav item for the emergency runbook so users can find it.

Example:

 **** xref:deploy:redpanda/manual/high-availability.adoc[High Availability]
 **** xref:deploy:redpanda/manual/resilience/shadowing.adoc[Shadowing]
+**** xref:deploy:redpanda/manual/resilience/emergency-shadowing.adoc[Emergency Shadowing Failover]
 **** xref:deploy:redpanda/manual/sizing-use-cases.adoc[Sizing Use Cases]
modules/deploy/pages/redpanda/manual/resilience/shadowing.adoc (2)

290-299: Avoid promoting plaintext secrets in examples

Add a callout suggesting env vars or file-based secrets for credentials (and mTLS certs/keys), not inline plaintext.

Example:

  • Prefer env vars (RPK_SASL_PASSWORD) or reference secret files
  • Link to security guidance on managing secrets

38-38: Diagram TODO

If you need help, I can draft a diagram (draw.io/mermaid) showing active→shadow replication, preserved offsets/timestamps, and replicated artifacts.

modules/deploy/pages/redpanda/manual/resilience/emergency-shadowing.adoc (1)

74-83: Call out irreversibility before executing failover

Add an [IMPORTANT] note that failover promotion is irreversible and has no automatic fallback. Place it immediately before the commands.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 464513b and dafed89.

📒 Files selected for processing (3)
  • modules/ROOT/nav.adoc (1 hunks)
  • modules/deploy/pages/redpanda/manual/resilience/emergency-shadowing.adoc (1 hunks)
  • modules/deploy/pages/redpanda/manual/resilience/shadowing.adoc (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Redirect rules - redpanda-docs-preview
  • GitHub Check: Header rules - redpanda-docs-preview
  • GitHub Check: Pages changed - redpanda-docs-preview
🔇 Additional comments (6)
modules/deploy/pages/redpanda/manual/resilience/emergency-shadowing.adoc (2)

6-10: Enterprise license note is consistent; LGTM

Keep this partial include at the top across both docs for consistency.


48-56: Verify rpk shadow subcommands and flags

Confirm that the rpk shadow list, status, failover, delete, and resume subcommands and their flags (--all, --topic, --no-confirm) used in emergency-shadowing.adoc (and in the corresponding sections of shadowing.adoc) match the current output of rpk shadow --help.

modules/deploy/pages/redpanda/manual/resilience/shadowing.adoc (4)

330-425: Verify ShadowLinkConfig schema alignment

Ensure the YAML example’s field names (client_options, authentication_configuration, topic_metadata_sync_options, synced_shadow_topic_properties, consumer_offset_sync_options, security_sync_options) exactly match the ShadowLinkConfig schema in the Admin API or rpk CLI.


54-57: Verify and cite Shadowing’s minimum version requirement

  • Confirm that Shadowing was introduced in Redpanda v25.3 and update the prerequisite if needed.
  • Add a link to the official v25.3 release notes or product specification where this requirement is defined.

557-576: Confirm shadow-link metrics are documented and standardize type/units

Verify that each redpanda_shadow_link_* metric appears in modules/reference/pages/public-metrics-reference.adoc, and update every description to explicitly specify the Prometheus type (counter vs gauge) and units (bytes, records, offsets).


231-237: Verify that rpk shadow config generate exists and supports the --output flag

Confirm this subcommand and its --output flag are implemented in the CLI; update the docs if they’re missing.

@paulohtb6 paulohtb6 changed the base branch from main to beta October 16, 2025 15:16
@bharathv

@paulohtb6 I'm having a hard time finding these changes in https://deploy-preview-1381--redpanda-docs-preview.netlify.app/current/get-started/intro-to-events/. Can you please point me to the exact URL?

@paulohtb6 (Collaborator Author)

@bharathv Hey Bharath. The changes are in the Page previews section of the PR description.

Copying them here too:
Page previews
Shadowing
Shadowing runbook

@@ -0,0 +1,212 @@
= Shadowing Runbook
Contributor

@paulohtb6 I think the page should be renamed too, so "emergency" is not in the URL. Also, the term runbook feels internal to me. What do you think about Failover for Disaster Recovery or Disaster Recovery Guide? cc @Feediver1

Collaborator Author

Changed to "Shadowing Guide". Let me know if it's ok or if I should change more.


Redpanda v25.3 introduces xref:deploy:redpanda/manual/resilience/shadowing.adoc[Shadowing], an Enterprise-licensed disaster recovery solution that provides asynchronous, offset-preserving replication between distinct Redpanda clusters. Shadowing enables cross-region data protection by replicating topic data, configurations, consumer group offsets, ACLs, and Schema Registry data with byte-level fidelity.

The shadow cluster operates in read-only mode while continuously receiving updates from the source cluster. During a disaster, you can fail over individual topics or an entire shadow link to make resources fully writable for production traffic. See xref:deploy:redpanda/manual/resilience/shadowing-guide.adoc[Emergency Shadowing Guide] for emergency procedures.
Contributor

Suggest shadowing-guide.adoc[] (empty link text) since this page keeps getting renamed.

Contributor

Also suggest "failover" (one word) here, since most of the docs use that. I still think we should discuss overall usage, but for now I'd keep it all consistent.


Labels

None yet

Projects

None yet


6 participants