Shadowing new structure #1409
Merged
Changes from 9 commits
24 commits
66ea1e7 major reorg on dr/shadowing (paulohtb6)
43222cc fix links (paulohtb6)
82e6f66 fix links and organization (paulohtb6)
ce971a9 modify names (paulohtb6)
60114ff link fixing (paulohtb6)
bc7c83d adjust index (paulohtb6)
d880a9a try new structure under manage (paulohtb6)
57fc513 adjust structure (paulohtb6)
63e9e77 fix links (paulohtb6)
077d54e Update modules/deploy/pages/redpanda/manual/disaster-recovery/shadowi… (paulohtb6)
9e30b19 fix titles (paulohtb6)
ae5b0d2 fix whatsnew (paulohtb6)
01700d8 fix note (paulohtb6)
1240f58 adjust levels (paulohtb6)
dd77cd9 add aliases (paulohtb6)
ca29f4a change order (paulohtb6)
d2e1a13 fix title (paulohtb6)
98054c8 remove link names (paulohtb6)
e378143 updates to nav (paulohtb6)
52f58ae Update modules/deploy/pages/redpanda/manual/disaster-recovery/shadowi… (paulohtb6)
8672af7 Update modules/deploy/pages/redpanda/manual/disaster-recovery/shadowi… (paulohtb6)
642d7ef Update modules/deploy/pages/redpanda/manual/disaster-recovery/shadowi… (paulohtb6)
5a4d418 Update modules/deploy/pages/redpanda/manual/disaster-recovery/shadowi… (paulohtb6)
de59acc Update modules/deploy/pages/redpanda/manual/disaster-recovery/shadowi… (paulohtb6)
5 changes: 5 additions & 0 deletions
modules/deploy/pages/redpanda/manual/disaster-recovery/index.adoc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| = Disaster Recovery | ||
| :description: Set up disaster recovery for Redpanda clusters using Shadowing for cross-region replication. | ||
| :env-linux: true | ||
| :page-layout: index | ||
| :page-categories: Management, High Availability, Disaster Recovery |
8 changes: 3 additions & 5 deletions
...da/manual/resilience/shadowing-guide.adoc → ...recovery/shadowing/disaster-response.adoc
98 changes: 98 additions & 0 deletions
modules/deploy/pages/redpanda/manual/disaster-recovery/shadowing/failover.adoc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,98 @@ | ||
| = Failover | ||
| :description: Execute Redpanda Disaster Recovery (Shadowing) failover procedures to transform shadow topics into fully writable resources during disasters. | ||
|
||
| :page-categories: Management, High Availability, Disaster Recovery | ||
|
|
||
| include::shared:partial$enterprise-license.adoc[] | ||
|
|
||
| include::shared:partial$emergency-shadowing-callout.adoc[] | ||
|
|
||
| Failover is the process of modifying shadow topics or an entire shadow cluster from read-only replicas to fully writable resources, and ceasing replication from the source cluster. You can fail over individual topics for selective workload migration or fail over the entire cluster for comprehensive disaster recovery. This critical operation transforms your shadow resources into operational production assets, allowing you to redirect application traffic when the source cluster becomes unavailable. | ||
|
|
||
| == Failover behavior | ||
|
|
||
| When you initiate failover, Redpanda performs the following operations: | ||
|
|
||
| 1. **Stops replication**: Halts all data fetching from the source cluster for the specified topics or entire shadow link | ||
| 2. **Fails over topics**: Converts read-only shadow topics into regular, writable topics | ||
| 3. **Updates topic state**: Changes topic status from `ACTIVE` to `FAILING_OVER`, then `FAILED_OVER` | ||
|
|
||
| Topic failover is irreversible. Once failed over, topics cannot return to shadow mode, and automatic fallback to the original source cluster is not supported. | ||
|
|
||
| == Failover granularity options | ||
|
|
||
| Redpanda supports failover at different levels of granularity to match your disaster recovery needs: | ||
|
|
||
| **Individual topic failover** applies only to specific shadow topics while leaving other topics in the shadow link still replicating. Use this approach when you need to selectively fail over specific workloads or when testing failover procedures. | ||
|
|
||
| **Complete shadow link failover (cluster failover)** applies to all shadow topics associated with the shadow link simultaneously, effectively failing over the entire cluster's replicated data. Use this approach during a complete regional disaster when you need to activate the entire shadow cluster as your new production environment. | ||
|
|
||
| **Force delete shadow link (emergency failover)** is an irreversible operation that immediately fails over all topics in the link, bypassing the normal failover state transitions. This action should only be used as a last resort when topics are stuck in transitional states and you need immediate access to all replicated data. | ||
|
|
||
| == Failover states | ||
|
|
||
| === Shadow link states | ||
|
|
||
| The shadow link itself has a simple state model: | ||
|
|
||
| * **`ACTIVE`**: Shadow link is operating normally, replicating data | ||
|
|
||
| Shadow links do not have dedicated failover states. Instead, the link's operational status is determined by the collective state of its shadow topics. | ||
|
|
||
| === Shadow topic states | ||
|
|
||
| Individual shadow topics progress through specific states during failover: | ||
|
|
||
| * **`ACTIVE`**: Normal replication state before failover | ||
| * **`FAULTED`**: Shadow topic has encountered an error and is not replicating | ||
| * **`FAILING_OVER`**: Failover initiated, replication stopping | ||
| * **`FAILED_OVER`**: Failover completed successfully, topic fully writable | ||
|
|
||
|
|
||
|
|
||
| == Post-failover cluster behavior | ||
|
|
||
| After successful failover, your shadow cluster exhibits the following characteristics: | ||
|
|
||
| **Topic accessibility:** | ||
|
|
||
| * Failed-over topics become fully writable and readable. | ||
| * Applications can produce and consume messages normally. | ||
| * All Kafka APIs are available for failed-over topics. | ||
| * Original offsets and timestamps are preserved. | ||
|
|
||
| **Shadow link status:** | ||
|
|
||
| * The shadow link remains but stops replicating data. | ||
| * Link status shows topics in `FAILED_OVER` state. | ||
| * You can safely delete the shadow link after successful failover. | ||
|
|
||
| **Operational limitations:** | ||
|
|
||
| * No automatic fallback mechanism to the original source cluster. | ||
| * Data transforms remain disabled until you manually re-enable them. | ||
| * Audit log history from the source cluster is not available (new audit logs begin immediately). | ||
|
|
||
| == Failover considerations and limitations | ||
|
|
||
| **Data consistency:** | ||
|
|
||
| * Some data loss may occur due to replication lag at the time of failover. | ||
| * Consumer group offsets are preserved, allowing applications to resume from their last committed position. | ||
| * In-flight transactions at the source cluster are not replicated and will be lost. | ||
|
|
||
| **Recovery point objective (RPO):** | ||
|
|
||
| The amount of potential data loss depends on replication lag when disaster occurs. Monitor lag metrics to understand your effective RPO. | ||
|
|
||
| **Network partitions:** | ||
|
|
||
| If the source cluster becomes accessible again after failover, do not attempt to write to both clusters simultaneously. Writing to both clusters causes their data and metadata to diverge, which leads to data inconsistencies. | ||
|
|
||
| **Testing requirements:** | ||
|
|
||
| Regularly test failover procedures in non-production environments to validate your disaster recovery processes and measure your recovery time objective (RTO). | ||
|
|
||
| == Next steps | ||
|
|
||
| * **Execute failover**: For step-by-step failover procedures, see xref:./setup-failover.adoc[Set Up Failover] | ||
| * **Emergency situations**: For rapid disaster response, see xref:./disaster-response.adoc[Disaster Response Procedures] | ||
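
The shadow topic states described in `failover.adoc` can be read as a small state machine. The following is a minimal sketch of that model in Python, not Redpanda code: the state names come from the page, while the `ALLOWED` table, the helper functions, and in particular the transitions into and out of `FAULTED` are assumptions made for illustration.

```python
from enum import Enum


class TopicState(Enum):
    """Shadow topic states as named in failover.adoc."""
    ACTIVE = "ACTIVE"
    FAULTED = "FAULTED"
    FAILING_OVER = "FAILING_OVER"
    FAILED_OVER = "FAILED_OVER"


# Hypothetical transition table for illustration only.
# Documented path: ACTIVE -> FAILING_OVER -> FAILED_OVER.
# ASSUMPTION: the edges touching FAULTED are guesses, not documented behavior.
ALLOWED = {
    TopicState.ACTIVE: {TopicState.FAULTED, TopicState.FAILING_OVER},
    TopicState.FAULTED: {TopicState.ACTIVE, TopicState.FAILING_OVER},
    TopicState.FAILING_OVER: {TopicState.FAILED_OVER},
    # Terminal state: the page says topic failover is irreversible.
    TopicState.FAILED_OVER: set(),
}


def can_transition(current: TopicState, target: TopicState) -> bool:
    """Return True if the sketch allows moving from current to target."""
    return target in ALLOWED[current]


def is_writable(state: TopicState) -> bool:
    """Only a fully failed-over topic accepts writes in this model."""
    return state is TopicState.FAILED_OVER


if __name__ == "__main__":
    print(can_transition(TopicState.ACTIVE, TopicState.FAILING_OVER))
    print(can_transition(TopicState.FAILED_OVER, TopicState.ACTIVE))
```

The empty set for `FAILED_OVER` encodes the irreversibility rule: once a topic is failed over, no transition returns it to shadow mode.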
77 changes: 77 additions & 0 deletions
...ploy/pages/redpanda/manual/disaster-recovery/shadowing/how-shadowing-works.adoc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| = How Shadowing works | ||
| :description: Set up disaster recovery for Redpanda clusters using Shadowing for cross-region replication. | ||
| :env-linux: true | ||
| :page-categories: Management, High Availability, Disaster Recovery | ||
|
|
||
| include::shared:partial$enterprise-license.adoc[] | ||
|
|
||
| Shadowing is Redpanda's enterprise-grade disaster recovery solution that establishes asynchronous, offset-preserving replication between two distinct Redpanda clusters. A cluster can create a dedicated client that continuously replicates source cluster data, including offsets, timestamps, and cluster metadata. This creates a read-only shadow cluster that you can quickly fail over to handle production traffic during a disaster. | ||
|
|
||
| include::shared:partial$emergency-shadowing-callout.adoc[] | ||
|
|
||
| Unlike traditional replication tools that re-produce messages, Shadowing copies data at the byte level, ensuring shadow topics contain identical copies of source topics with preserved offsets and timestamps. | ||
|
|
||
| Shadowing replicates: | ||
|
|
||
| * **Topic data**: All records with preserved offsets and timestamps | ||
| * **Topic configurations**: Partition counts, retention policies, and other xref:reference:properties/topic-properties.adoc[topic properties] | ||
| * **Consumer group offsets**: Enables seamless consumer resumption after failover | ||
| * **Access Control Lists (ACLs)**: User permissions and security policies | ||
| * **Schema Registry data**: Schema definitions and compatibility settings | ||
|
|
||
| == How Shadowing fits into disaster recovery | ||
|
|
||
| Shadowing addresses enterprise disaster recovery requirements driven by regulatory compliance and business continuity needs. Organizations typically want to minimize both recovery time objective (RTO) and recovery point objective (RPO). Shadowing's asynchronous replication helps you achieve both goals by reducing data loss during regional outages and enabling rapid application recovery. | ||
|
|
||
| The architecture follows an active-passive pattern. The source cluster processes all production traffic while the shadow cluster remains in read-only mode, continuously receiving updates. If a disaster occurs, you can fail over the shadow topics using the Admin API or `rpk`, making them fully writable. At that point, you can redirect your applications to the shadow cluster, which becomes the new production cluster. | ||
|
|
||
| Shadowing complements Redpanda's existing availability and recovery capabilities. xref:deploy:redpanda/manual/high-availability.adoc[High availability] actively protects your day-to-day operations, handling reads and writes seamlessly during node or availability zone failures within a region. Shadowing is your safety net for catastrophic regional disasters. While xref:deploy:redpanda/manual/disaster-recovery/whole-cluster-restore.adoc[Whole Cluster Restore] provides point-in-time recovery from xref:manage:tiered-storage.adoc[Tiered Storage], Shadowing delivers near real-time, cross-region replication for mission-critical applications that require rapid failover with minimal data loss. | ||
|
|
||
| // TODO: insert diagram. Possibly with a .gif animation showing Cluster A being written to and Cluster B being replicated, with a data flow arrow and geo-separation. The diagram must show icons or labels for the topics, configurations, offsets, ACLs, and schemas that are being copied | ||
|
|
||
| == Limitations | ||
|
|
||
| Shadowing is designed for active-passive disaster recovery scenarios. Each shadow cluster can maintain only one shadow link. | ||
|
|
||
| Shadowing operates exclusively in asynchronous mode and doesn't support active-active configurations. This means there will always be some replication lag. You cannot write to both clusters simultaneously. | ||
|
|
||
| xref:develop:data-transforms/index.adoc[Data transforms] are disabled on shadow clusters while Shadowing is active. During a disaster, xref:manage:audit-logging.adoc[audit log] history from the source cluster is lost, though the shadow cluster begins generating new audit logs immediately after the failover. | ||
|
|
||
| After you fail over shadow topics, automatic fallback to the original source cluster is not supported. | ||
|
|
||
| [CAUTION] | ||
| ==== | ||
| Do not modify synced topic properties on shadow topics. These properties revert to source topic values. | ||
| ==== | ||
|
|
||
| == Setup and configuration | ||
|
|
||
| Choose your implementation approach: | ||
|
|
||
| * **xref:./setup.adoc[Setup and Configuration]** - Initial shadow configuration, authentication, and topic selection | ||
| * **xref:./monitor.adoc[Monitoring and Operations]** - Health checks, lag monitoring, and operational procedures | ||
| * **xref:./failover.adoc[Planned Failover]** - Controlled disaster recovery testing and migrations | ||
| * **xref:./disaster-response.adoc[Disaster response procedures]** - Rapid disaster response procedures | ||
|
|
||
| == Disaster readiness checklist | ||
|
|
||
| Before a disaster occurs, ensure you have: | ||
|
|
||
| * [ ] Access to shadow cluster administrative credentials | ||
| * [ ] Shadow link names, configuration details, and networking documented | ||
| * [ ] Application connection strings for the shadow cluster prepared | ||
| * [ ] Tested failover procedures in a non-production environment | ||
|
|
||
| == Next steps | ||
|
|
||
| After setting up Shadowing for your Redpanda clusters, consider these additional steps: | ||
|
|
||
| * **Test your disaster recovery procedures**: Regularly practice failover scenarios in a non-production environment. See xref:./disaster-response.adoc[Disaster response procedures] for step-by-step disaster procedures. | ||
|
|
||
| * **Monitor shadow link health**: Set up alerting on replication lag and shadow topic states to ensure early detection of replication issues. See xref:./monitor.adoc[Monitoring and Operations]. | ||
|
|
||
| * **Implement automated failover**: Consider developing automation scripts that can detect outages and initiate failover based on predefined criteria. | ||
|
|
||
| * **Review security policies**: Ensure your ACL filters replicate the appropriate security settings for your disaster recovery environment. | ||
|
|
||
| * **Document your configuration**: Maintain up-to-date documentation of your shadow link configuration, including network settings, authentication details, and filter definitions. |
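
Because Shadowing is asynchronous, the effective RPO depends on replication lag at the moment of the outage. The following is a rough sketch of how lag could be turned into an RPO estimate. It is not a Redpanda API: the `PartitionOffsets` type, its field names, and the idea of dividing record lag by a steady produce rate are all assumptions for illustration, and the input numbers are made up.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PartitionOffsets:
    """Hypothetical per-partition snapshot; field names are illustrative."""
    source_high_watermark: int  # latest offset on the source cluster
    shadow_high_watermark: int  # latest replicated offset on the shadow cluster


def record_lag(p: PartitionOffsets) -> int:
    """Records present on the source but not yet on the shadow (never negative)."""
    return max(0, p.source_high_watermark - p.shadow_high_watermark)


def total_record_lag(partitions: Iterable[PartitionOffsets]) -> int:
    """Sum of record lag across all partitions."""
    return sum(record_lag(p) for p in partitions)


def estimated_rpo_seconds(partitions: Iterable[PartitionOffsets],
                          produce_rate_per_sec: float) -> float:
    """Approximate seconds of data at risk, assuming a steady produce rate."""
    if produce_rate_per_sec <= 0:
        raise ValueError("produce_rate_per_sec must be positive")
    return total_record_lag(partitions) / produce_rate_per_sec


if __name__ == "__main__":
    snapshot = [
        PartitionOffsets(1000, 950),
        PartitionOffsets(2000, 2000),
        PartitionOffsets(500, 480),
    ]
    print(total_record_lag(snapshot))             # 70
    print(estimated_rpo_seconds(snapshot, 35.0))  # 2.0
```

A time-based estimate, which compares the newest record timestamps on each cluster, would avoid the steady-rate assumption. The record-count version is shown here only because it needs fewer inputs.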
6 changes: 6 additions & 0 deletions
modules/deploy/pages/redpanda/manual/disaster-recovery/shadowing/index.adoc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| = Shadowing | ||
| :description: Set up disaster recovery for Redpanda clusters using Shadowing for cross-region replication. | ||
| :env-linux: true | ||
| :page-layout: index | ||
| :page-categories: Management, High Availability, Disaster Recovery | ||
|
|