Commit f4fa3ee

Update README.md
1 parent 54e8451 commit f4fa3ee

1 file changed

+8 -7 lines changed

0x19-postmortem/README.md

Lines changed: 8 additions & 7 deletions
@@ -1,8 +1,9 @@
-iPostmortem: E-commerce Website Checkout Outage (Invented Scenario)
-Issue Summary
-Duration: Thursday, May 9, 2024, 17:30 PST - Friday, May 10, 2024, 01:00 PST (7.5 hours)
+# Postmortem: E-commerce Website Checkout Outage (Invented Scenario)
+## Issue Summary
+### Duration: Thursday, May 9, 2024, 17:30 PST - Friday, May 10, 2024, 01:00 PST (7.5 hours)
 Impact: The e-commerce website's checkout process was unavailable during the outage window. Users attempting to purchase items encountered errors and were unable to complete their transactions. This is estimated to have impacted approximately 20% of website visitors during peak evening hours, resulting in lost sales and customer frustration.
-Root Cause: A database connection pool configuration error led to an unexpected database connection spike, overwhelming the database server and causing it to crash.
+
+### Root Cause: A database connection pool configuration error led to an unexpected database connection spike, overwhelming the database server and causing it to crash.
 Timeline
 17:30 PST: Monitoring alerts indicated a significant increase in database connection errors on the e-commerce website.
 17:35 PST: The engineering team investigated the alerts and observed a surge in failed checkout transactions.
@@ -13,16 +14,16 @@ Timeline
 20:00 PST: A temporary fix was implemented by adjusting the connection pool size to a more appropriate value. The database server recovered and checkout functionality was restored.
 23:00 PST: A permanent configuration change was implemented to prevent future occurrences.
 01:00 PST: The engineering team confirmed system stability and cleared the incident.
-Root Cause and Resolution
+### Root Cause and Resolution
 The root cause of the outage was a misconfigured database connection pool. The pool was allowing a much higher number of concurrent connections than the database server could handle effectively. This surge in connections overloaded the database server, causing it to crash and become unresponsive.
 The issue was resolved by:
 Temporary Fix (20:00 PST): Adjusting the connection pool size to a lower value allowed the database server to recover and resume operations.
 Permanent Fix (23:00 PST): A permanent configuration change was implemented to set the connection pool size to a more appropriate value based on expected load. This will prevent future occurrences of connection overload.
-Corrective and Preventative Measures
+### Corrective and Preventative Measures
 Review and update database configuration documentation: The connection pool configuration error highlights the need for thorough documentation and regular review of critical system configurations.
 Implement automated monitoring: Automated monitoring for database connection pool metrics and server health can provide early warnings of potential issues before they cause outages.
 Database performance tuning: The database server should be regularly analyzed and optimized to ensure it can handle expected traffic volume.
 Conduct code reviews: Code review processes should emphasize infrastructure configuration management to identify potential issues before deployment.
-This postmortem outlines the cause and resolution of the e-commerce website checkout outage. By implementing the corrective and preventative measures outlined above, we aim to improve system resiliency and prevent similar incidents in the future.
+#### This postmortem outlines the cause and resolution of the e-commerce website checkout outage. By implementing the corrective and preventative measures outlined above, we aim to improve system resiliency and prevent similar incidents in the future.
 
 

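The postmortem attributes the outage to a pool that allowed more concurrent connections than the database could serve, and it recommends monitoring pool metrics. The scenario is invented and names no stack, so the following is only a minimal Python sketch of those two ideas. `BoundedPool`, `connect`, `max_size`, `acquire_timeout`, and `utilization` are hypothetical names chosen for illustration, not part of the README or of any particular database driver.

```python
import threading
from contextlib import contextmanager


class BoundedPool:
    """Hypothetical pool that never holds more than max_size connections."""

    def __init__(self, connect, max_size, acquire_timeout=1.0):
        self._connect = connect            # assumed factory that opens one connection
        self._max_size = max_size          # hard cap, sized to what the database can serve
        self._timeout = acquire_timeout    # seconds to wait for a free slot
        self._slots = threading.BoundedSemaphore(max_size)
        self._lock = threading.Lock()
        self._idle = []                    # open connections waiting for reuse
        self._in_use = 0

    @contextmanager
    def connection(self):
        # Fail fast instead of opening an extra connection when the pool is full.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("connection pool exhausted")
        conn = None
        try:
            with self._lock:
                if self._idle:
                    conn = self._idle.pop()
                self._in_use += 1
            if conn is None:
                conn = self._connect()
            yield conn
        finally:
            with self._lock:
                self._in_use -= 1
                if conn is not None:
                    self._idle.append(conn)
            self._slots.release()

    def utilization(self):
        """Fraction of the pool currently checked out; a candidate alerting metric."""
        with self._lock:
            return self._in_use / self._max_size
```

A caller would wrap each checkout query in `with pool.connection() as conn:` and export `pool.utilization()` to whatever monitoring system is in place. Under these assumptions, a full pool surfaces as a `TimeoutError` in the application rather than as additional load on the database server.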