Commit 0529075

committed Dec 11, 2024
Added diagrams to overview
1 parent bcb0094 commit 0529075

7 files changed, +18 -4 lines changed
 

‎docs/solutions/ha-architecture.md

+2 -2
@@ -1,8 +1,8 @@
-# Architecture layout
+# Architecture

 As we discussed in the [overview of high availability](high-availability.md), the minimalist approach to a highly-available deployment is to have a three-node PostgreSQL cluster with cluster management and failover mechanisms, a load balancer, and a backup / restore solution.

-The following diagram shows this architecture.
+The following diagram shows this architecture with the tools we recommend.

 ![Architecture of the three-node, single primary PostgreSQL cluster](../_images/diagrams/ha-architecture-patroni.png)

‎docs/solutions/ha-measure.md

+6 -1
@@ -1,6 +1,11 @@
 # Measuring high availability

-The need for high availability is determined by the business requirements, potential risks, and operational limitations. The level of high availability depends on how much downtime you can bear without negatively impacting your users and how much data loss you can tolerate during the system outage.
+The need for high availability is determined by the business requirements, potential risks, and operational limitations (e.g. the more components you add to your infrastructure, the more complex and time-consuming it is to maintain).
+
+The level of high availability depends on the following:
+
+* how much downtime you can bear without negatively impacting your users and
+* how much data loss you can tolerate during the system outage.

 Availability is measured by establishing a measurement time frame and dividing the time the system was available within it by the length of that time frame. This ratio will rarely be one, which is equal to 100% availability. At Percona, we don’t consider a solution to be highly available if it is not at least 99% ("two nines") available.
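
The ratio described above can be worked through in a few lines of Python. This is a minimal sketch: the function names and the 365-day measurement window are assumptions chosen for illustration and do not come from the documentation being changed.

```python
# Hypothetical helpers illustrating the availability ratio; names are illustrative only.

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumed measurement time frame: one 365-day year


def availability(time_frame_s: float, downtime_s: float) -> float:
    """Return the fraction of the time frame during which the system was available."""
    if time_frame_s <= 0:
        raise ValueError("time frame must be positive")
    if not 0 <= downtime_s <= time_frame_s:
        raise ValueError("downtime must lie within the time frame")
    return (time_frame_s - downtime_s) / time_frame_s


def allowed_downtime(target: float, time_frame_s: float) -> float:
    """Return the downtime budget (in seconds) implied by an availability target."""
    if not 0 <= target <= 1:
        raise ValueError("target must be a ratio between 0 and 1")
    return (1 - target) * time_frame_s


if __name__ == "__main__":
    budget_hours = allowed_downtime(0.99, SECONDS_PER_YEAR) / 3600
    print(f"99% over one year allows about {budget_hours:.1f} hours of downtime")
```

Under these assumptions, a "two nines" target leaves a budget of roughly 87.6 hours of downtime per year; each additional nine divides that budget by ten.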

‎docs/solutions/high-availability.md

+10 -1
@@ -17,7 +17,7 @@ High availability is the ability of the system to operate continuously without t

 ### How to achieve it?

-A short answer is: add redundancy to your deployment, eliminate a single point of failure and have the mechanism to transfer the services from a failed member to the healthy one.
+A short answer is: add redundancy to your deployment, eliminate a single point of failure (SPOF) and have a mechanism to transfer the services from a failed member to a healthy one.

 For a long answer, let's break it down into steps.

@@ -27,12 +27,16 @@ First, you should have more than one copy of your data. This means, you need to

 You typically deploy these instances on separate servers or nodes. An example of such a deployment is the three-instance cluster consisting of one primary and two replica nodes. The replicas receive the data via the replication mechanism.

+![Primary-replica setup](../_images/diagrams/ha-overview-replication.png)
+
 PostgreSQL natively supports logical and streaming replication. For high availability, we recommend streaming replication as it happens in real time, minimizing the delay between the primary and replica nodes.

 #### Step 2. Failover

 Next, you may face a situation where the primary node is down or not responding. The reasons vary, from hardware or network issues to software failures, power outages, and scheduled maintenance. In this case, you must have a way to detect the failure and to transfer operation from the primary node to one of the replicas. This process is called failover.

+![Failover](../_images/diagrams/ha-overview-failover.png)
+
 You can perform a manual failover. This suits environments where downtime does not impact operations or revenue. However, it requires dedicated personnel and may lead to additional downtime.

 Another option is automated failover, which significantly minimizes downtime and is less error-prone than a manual one. Automated failover can be accomplished by adding an open-source failover tool to your deployment.
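
To make the failover step concrete, the following Python sketch simulates the core decision an automated failover mechanism has to make: notice that the primary is unhealthy and promote a healthy replica. It is only an illustration under stated assumptions. The `Node` class, its `replayed_position` field, and the "promote the most up-to-date replica" rule are hypothetical and are not the behavior of any specific failover tool; a real tool must also coordinate the decision across nodes so that two nodes never act as primary at once, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical view of a cluster member as seen by a failover manager."""
    name: str
    role: str               # "primary" or "replica"
    healthy: bool
    replayed_position: int  # assumed measure of how much replicated data was applied


def fail_over(nodes: List[Node]) -> Optional[Node]:
    """Return the node acting as primary, promoting a replica if the primary is down."""
    primary = next((n for n in nodes if n.role == "primary"), None)
    if primary is not None and primary.healthy:
        return primary  # nothing to do

    # Only healthy replicas are eligible for promotion.
    candidates = [n for n in nodes if n.role == "replica" and n.healthy]
    if not candidates:
        return None  # no healthy member left to promote

    # Assumed rule: prefer the replica that has applied the most data.
    new_primary = max(candidates, key=lambda n: n.replayed_position)
    if primary is not None:
        primary.role = "replica"  # the failed node rejoins as a replica once recovered
    new_primary.role = "primary"
    return new_primary


if __name__ == "__main__":
    cluster = [
        Node("node1", "primary", healthy=False, replayed_position=100),
        Node("node2", "replica", healthy=True, replayed_position=90),
        Node("node3", "replica", healthy=True, replayed_position=100),
    ]
    promoted = fail_over(cluster)
    print("acting primary:", promoted.name if promoted else "none")
```

Choosing the replica that has applied the most data is one plausible way to limit data loss during promotion; other selection rules are possible.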
@@ -41,12 +45,17 @@ Another option is automated failover, which significantly minimizes downtime and

 Instead of a single node, you now have a cluster. How do you enable users to connect to the cluster and ensure they always connect to the correct node, especially when the primary node changes? One option is to configure DNS resolution that resolves to the IPs of all cluster nodes. A drawback here is that the primary node alone handles all requests. As your system grows, so does the load, which may overload the primary node and degrade performance.

+![Load-balancer](../_images/diagrams/ha-overview-load-balancer.png)
+
 Another option is to use a load-balancing proxy. Instead of connecting directly to the IP address of the primary node, which can change during a failover, you use a proxy that acts as a single point of entry for the entire cluster. This proxy knows which node is currently the primary and directs all incoming write requests to it. At the same time, it can distribute read requests among the replicas to evenly spread the load and improve performance.
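
The routing behavior of such a proxy can be sketched in Python as follows. This is a simplified model, not the configuration of any particular proxy: the `ClusterRouter` class, its method names, and the round-robin policy for reads are assumptions made for illustration.

```python
from itertools import cycle
from typing import Iterable


class ClusterRouter:
    """Hypothetical single entry point: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary: str, replicas: Iterable[str]) -> None:
        self.primary = primary
        self._set_replicas(list(replicas))

    def _set_replicas(self, replicas: list) -> None:
        self._replicas = replicas
        # Round-robin iterator over replicas; None means reads fall back to the primary.
        self._rotation = cycle(replicas) if replicas else None

    def route(self, is_write: bool) -> str:
        """Return the node that should serve the request."""
        if is_write or self._rotation is None:
            return self.primary
        return next(self._rotation)

    def promote(self, new_primary: str) -> None:
        """Repoint writes after a failover and drop the new primary from the read pool.

        The old primary is not re-added here; this sketch assumes it stays out of
        rotation until it is recovered and registered again.
        """
        self.primary = new_primary
        self._set_replicas([r for r in self._replicas if r != new_primary])


if __name__ == "__main__":
    router = ClusterRouter("node1", ["node2", "node3"])
    print("write ->", router.route(is_write=True))
    print("reads ->", [router.route(is_write=False) for _ in range(4)])
    router.promote("node2")
    print("write after failover ->", router.route(is_write=True))
```

Clients keep using the same entry point before and after `promote`, which is the property that makes a proxy preferable to pointing applications at the primary's IP address directly.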

+
 #### Step 4. Backups

 Even with replication and failover mechanisms in place, it’s crucial to have regular backups of your data. Backups provide a safety net for catastrophic failures that affect both the primary and replica nodes. While replication ensures data is synchronized across multiple nodes, it does not protect against data corruption, accidental deletions, or malicious attacks that can affect all nodes.

+![Backup tool](../_images/diagrams/ha-overview-backup.png)
+
 Having regular backups ensures that you can restore your data to a previous state, preserving data integrity and availability even in the worst-case scenarios. Store your backups in separate, secure locations and regularly test them to ensure that you can quickly and accurately restore them when needed. This additional layer of protection is essential to maintaining continuous operation and minimizing data loss.

 As a result, you end up with the following components for a minimalistic highly-available deployment:
