# AWS Aurora PostgreSQL

Amazon Aurora is a fully managed relational database engine that's compatible with PostgreSQL. It is part of Amazon Relational Database Service (Amazon RDS), and the code, tools, and applications you use with existing PostgreSQL databases can be used with Aurora. For some workloads, Aurora can deliver up to three times the throughput of PostgreSQL without requiring changes to most existing applications.

Aurora's PostgreSQL-compatible engine is customized for its high-performance distributed storage subsystem. The cluster volume grows automatically as needed, up to a maximum of 128 tebibytes (TiB), and Aurora automates clustering and replication, which are typically among the most challenging parts of database administration.

This runbook will guide you through connecting to your Aurora PostgreSQL cluster, troubleshooting common issues, and monitoring your database's performance.

## Connecting to Your Database

### Connect via AWS CLI
Check the status and endpoint of your Aurora cluster:

```sh
aws rds describe-db-clusters --query "DBClusters[?DBClusterIdentifier=='<cluster_identifier>'].[Status, Endpoint, ReaderEndpoint]" --output table
```

> Expect to see the status of the cluster along with the primary and reader endpoints.
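If the cluster is still being created or modified, its status will not be `available` yet. The sketch below polls the same `describe-db-clusters` call until the status becomes `available`; `wait_for_cluster` and its arguments are hypothetical names chosen for illustration, and the default attempt count and delay are arbitrary assumptions.

```sh
# Hypothetical helper: wait_for_cluster <cluster_identifier> [max_attempts] [delay_seconds]
# Polls the cluster status with the AWS CLI until it reports "available" or attempts run out.
wait_for_cluster() {
  local cluster="$1" attempts="${2:-20}" delay="${3:-30}" i=1 status
  while [ "$i" -le "$attempts" ]; do
    status=$(aws rds describe-db-clusters --db-cluster-identifier "$cluster" \
      --query "DBClusters[0].Status" --output text) || return 1
    echo "attempt ${i}: ${status}"
    if [ "$status" = "available" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_for_cluster <cluster_identifier> 10 30` checks up to 10 times, 30 seconds apart, and returns a non-zero status if the cluster never becomes available.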

### Connect via PostgreSQL
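Use `psql` to connect to the cluster's writer endpoint, or to its reader endpoint for read-only sessions. Aurora PostgreSQL listens on port 5432 unless the cluster was configured with a different port:

```sh
psql -h <host> -p 5432 -U <username> -d <database>
```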

Replace `<host>`, `<username>`, and `<database>` with your database's details.
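To avoid copying endpoints by hand, a minimal sketch can look up the endpoint and port with the AWS CLI and then open an encrypted `psql` session. The helper name `aurora_psql` and its argument order are hypothetical; `psql` will still prompt for the password unless it is supplied through `~/.pgpass` or `PGPASSWORD`.

```sh
# Hypothetical helper: aurora_psql <cluster_identifier> <username> <database> [writer|reader]
# Looks up the endpoint and port with the AWS CLI, then opens psql over TLS.
aurora_psql() {
  local cluster="$1" user="$2" db="$3" role="${4:-writer}" field="Endpoint" host port
  if [ "$role" = "reader" ]; then
    field="ReaderEndpoint"
  fi
  host=$(aws rds describe-db-clusters --db-cluster-identifier "$cluster" \
    --query "DBClusters[0].${field}" --output text) || return 1
  port=$(aws rds describe-db-clusters --db-cluster-identifier "$cluster" \
    --query "DBClusters[0].Port" --output text) || return 1
  # sslmode=require encrypts the session but does not verify the server certificate;
  # use sslmode=verify-full with the RDS CA bundle (sslrootcert=...) to verify it.
  psql "host=${host} port=${port} dbname=${db} user=${user} sslmode=require"
}
```

Passing `reader` as the fourth argument connects through the reader endpoint, which load-balances across Aurora replicas.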

## Troubleshooting Common Issues

### Connection Issues

If you cannot connect to the Aurora PostgreSQL cluster:
1. **Check Cluster Status**: Use the AWS CLI to confirm that the cluster's status is `available`:

```sh
aws rds describe-db-clusters --query "DBClusters[?DBClusterIdentifier=='<cluster_identifier>'].[Status, Endpoint, ReaderEndpoint]" --output table
```

2. **Verify Security Groups**: Confirm that the security group attached to the cluster allows inbound TCP traffic on the database port (5432 by default) from your client's IP address or subnet:

```sh
aws ec2 describe-security-groups --group-ids <security_group_id> --query "SecurityGroups[*].[GroupId, IpPermissions]" --output table
```

3. **Monitor Active Connections**: List the current sessions to check whether the database is overloaded or close to its `max_connections` limit:

```sql
SELECT pid, usename, datname, client_addr, application_name, state
FROM pg_stat_activity;
```
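It can also help to confirm from the client machine that the endpoint answers at all. The sketch below wraps `pg_isready`, which ships with the PostgreSQL client tools and exits with 0 (accepting connections), 1 (rejecting), 2 (no response), or 3 (no attempt made). The helper name `check_pg_reachable` and the 5-second timeout are assumptions for illustration.

```sh
# Hypothetical helper: check_pg_reachable <host> [port]
# Wraps pg_isready (part of the PostgreSQL client tools) and translates its documented exit codes.
check_pg_reachable() {
  local host="$1" port="${2:-5432}" rc=0
  pg_isready -h "$host" -p "$port" -t 5 >/dev/null || rc=$?
  case "$rc" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections (for example, during startup)" ;;
    2) echo "no response (check security groups, network routing, and the endpoint)" ;;
    3) echo "no attempt made (invalid parameters)" ;;
    *) echo "unexpected pg_isready exit code: ${rc}" ;;
  esac
  return "$rc"
}
```

A "no response" result usually points at the network path (security groups, subnets, routing) rather than at the database itself.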

### High Latency or Slow Queries

1. **Identify Slow Queries**: List active sessions, longest-running first, and investigate the execution plans of the slowest ones:

```sql
-- pg_stat_activity has no "waiting" column since PostgreSQL 9.6; use wait_event_type and wait_event
SELECT pid, now() - query_start AS duration, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start; -- oldest start time (longest-running) first
```

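2. **Enable Slow Query Logging**: Log statements that take longer than a threshold (1000 ms in this example). Aurora does not allow `ALTER SYSTEM`, so set `log_min_duration_statement` in a custom DB cluster parameter group attached to the cluster (default parameter groups cannot be modified). Logged statements appear in the PostgreSQL log, which you can view in the RDS console or export to CloudWatch Logs:

```sh
# log_min_duration_statement is in milliseconds: 1000 logs statements running longer than 1 second
aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name <cluster_parameter_group> \
  --parameters "ParameterName=log_min_duration_statement,ParameterValue=1000,ApplyMethod=immediate"
```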

3. **Analyze Query Performance**: Update the planner statistics for a table whose queries are getting poor execution plans:

```sql
ANALYZE VERBOSE <table_name>;
```
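To investigate the execution plan of a slow query found in step 1, `EXPLAIN (ANALYZE, BUFFERS)` shows actual row counts, timings, and buffer usage. Because `EXPLAIN ANALYZE` really executes the statement, this minimal sketch wraps it in a transaction that is rolled back; the helper name `explain_query` and the conninfo-string argument are assumptions for illustration.

```sh
# Hypothetical helper: explain_query <conninfo> <sql>
# EXPLAIN ANALYZE executes the statement, so it runs inside a transaction that is rolled back.
# Most side effects are undone, but some (for example, sequence increments) are not.
explain_query() {
  local conninfo="$1" sql="$2"
  psql "$conninfo" -v ON_ERROR_STOP=1 <<EOF
BEGIN;
EXPLAIN (ANALYZE, BUFFERS) ${sql};
ROLLBACK;
EOF
}
```

For example: `explain_query "host=<host> dbname=<database> user=<username> sslmode=require" "SELECT * FROM <table_name> WHERE id = 42"`.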

### Deadlock & Blocking Issues

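1. **Check for Lock Waits and Deadlocks**: PostgreSQL detects deadlocks automatically and aborts one of the transactions involved, so check for sessions waiting on locks that have not been granted and for the number of deadlocks recorded:

```sql
-- Sessions waiting for an ungranted lock (pg_locks has no user columns, so join pg_stat_activity)
SELECT l.locktype, l.relation::regclass AS relation, l.mode, l.pid, a.usename, a.application_name
FROM pg_locks AS l
JOIN pg_stat_activity AS a ON a.pid = l.pid
WHERE NOT l.granted;

-- Deadlocks detected in the current database since statistics were last reset
SELECT datname, deadlocks FROM pg_stat_database WHERE datname = current_database();
```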

2. **Identify Blocking Queries**: Find queries that are blocking other queries:

```sql
-- pg_blocking_pids() (PostgreSQL 9.6+) returns the PIDs of the sessions blocking the given PID
SELECT blocked.pid AS blocked_pid, blocked.usename AS blocked_user,
       blocking.pid AS blocking_pid, blocking.usename AS blocking_user,
       blocked.query AS blocked_query, blocking.query AS blocking_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid));
```
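Once a blocking PID is identified, PostgreSQL's `pg_cancel_backend()` stops that session's current query and `pg_terminate_backend()` ends the whole session. The sketch below is a hypothetical wrapper (`stop_backend` is not part of any tool) that validates the PID before calling either function through `psql`. Signalling another role's session requires appropriate privileges, such as membership in `pg_signal_backend`.

```sh
# Hypothetical helper: stop_backend <conninfo> <pid> [cancel|terminate]
# "cancel" stops the session's current query (pg_cancel_backend);
# "terminate" ends the whole session (pg_terminate_backend).
stop_backend() {
  local conninfo="$1" pid="$2" mode="${3:-cancel}" fn="pg_cancel_backend"
  case "$pid" in
    ''|*[!0-9]*) echo "pid must be a positive integer" >&2; return 2 ;;
  esac
  if [ "$mode" = "terminate" ]; then
    fn="pg_terminate_backend"
  fi
  psql "$conninfo" -At -c "SELECT ${fn}(${pid});"
}
```

Trying `cancel` first is the gentler option; `terminate` rolls back the session's open transaction.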

## Monitoring & Backup Management

### Backup Verification

Ensure your backups are being created and managed as expected.
1. **List Snapshots**: Use this AWS CLI command to check for available snapshots:

```sh
aws rds describe-db-cluster-snapshots --db-cluster-identifier <cluster_identifier> --query "DBClusterSnapshots[].[DBClusterSnapshotIdentifier, SnapshotCreateTime]" --output table
```

2. **Verify Retention Policy**: Check the backup retention period (Aurora supports 1 to 35 days) to confirm that backups are kept according to your policy:

```sh
aws rds describe-db-clusters --db-cluster-identifier <cluster_identifier> --query "DBClusters[0].[BackupRetentionPeriod]" --output table
```
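The retention check can be turned into a pass/fail test for scripts or scheduled jobs. This is a minimal sketch under the assumption that your policy is expressed as a minimum number of days; `check_retention` is a hypothetical helper name.

```sh
# Hypothetical helper: check_retention <cluster_identifier> <required_days>
# Compares the cluster's BackupRetentionPeriod with a minimum required by your policy.
check_retention() {
  local cluster="$1" required="$2" actual
  actual=$(aws rds describe-db-clusters --db-cluster-identifier "$cluster" \
    --query "DBClusters[0].BackupRetentionPeriod" --output text) || return 1
  if [ "$actual" -ge "$required" ]; then
    echo "OK: retention is ${actual} day(s)"
  else
    echo "FAIL: retention is ${actual} day(s), policy requires ${required}"
    return 1
  fi
}
```

For example, `check_retention <cluster_identifier> 7` exits non-zero when fewer than 7 days of backups are retained.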

### Disk Space Usage

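1. **Monitor Storage Usage**: Aurora's cluster volume grows automatically, so the `FreeStorageSpace` metric used for standard RDS instances is not the right signal here. Track the cluster volume size with the cluster-level `VolumeBytesUsed` metric, and each instance's local storage (used for temporary tables and logs) with `FreeLocalStorage`:

```sh
# Cluster volume size in bytes (the date syntax assumes GNU date)
aws cloudwatch get-metric-statistics --namespace "AWS/RDS" --metric-name "VolumeBytesUsed" --dimensions Name=DBClusterIdentifier,Value=<cluster_identifier> --statistics Average --period 300 --start-time $(date -u -d '1 hour ago' +"%Y-%m-%dT%H:%M:%SZ") --end-time $(date -u +"%Y-%m-%dT%H:%M:%SZ")
# Free local storage in bytes on a single instance
aws cloudwatch get-metric-statistics --namespace "AWS/RDS" --metric-name "FreeLocalStorage" --dimensions Name=DBInstanceIdentifier,Value=<instance_identifier> --statistics Minimum --period 300 --start-time $(date -u -d '1 hour ago' +"%Y-%m-%dT%H:%M:%SZ") --end-time $(date -u +"%Y-%m-%dT%H:%M:%SZ")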
```

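2. **Reclaim Disk Space**: `VACUUM` makes the space held by dead tuples reusable, `VACUUM FULL` rewrites tables to release unused space, and `REINDEX` rebuilds bloated indexes. Regular maintenance like this also helps performance:

```sql
VACUUM;
VACUUM FULL; -- Takes an ACCESS EXCLUSIVE lock on each table; run it during a maintenance window
REINDEX DATABASE <database>; -- <database> must be the database you are connected to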
```
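Routine (non-`FULL`) vacuuming can also be run from a shell with `vacuumdb`, the command-line wrapper for `VACUUM` that ships with the PostgreSQL client tools. Plain `VACUUM` does not take the `ACCESS EXCLUSIVE` lock that `VACUUM FULL` does. This is a sketch: `vacuum_analyze_db` is a hypothetical name, and the port is assumed to come from `PGPORT` or default to 5432.

```sh
# Hypothetical helper: vacuum_analyze_db <host> <username> <database> [jobs]
# Runs plain VACUUM plus ANALYZE on every table, using several parallel connections.
vacuum_analyze_db() {
  local host="$1" user="$2" db="$3" jobs="${4:-2}"
  vacuumdb --host="$host" --port="${PGPORT:-5432}" --username="$user" \
    --dbname="$db" --analyze --jobs="$jobs"
}
```

Each parallel job opens its own connection, so keep `jobs` small on instances that are already near their connection limit.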

### Monitor Storage Usage by Tables

Use this query to list each table's total size, including its indexes and TOAST data, largest first:

```sql
SELECT relname AS "Table", pg_size_pretty(pg_total_relation_size(relid)) AS "Size"
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
```

## Advanced Monitoring

### Check Replication Status

Aurora replicas share the cluster volume instead of using PostgreSQL streaming replication, so they generally do not appear in `pg_stat_replication`; use the `aurora_replica_status()` function or the `AuroraReplicaLag` CloudWatch metric for them. On the writer, `pg_stat_replication` still shows streaming and logical replication clients:

```sql
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;
```
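For the Aurora replicas themselves, `AuroraReplicaLag` is an instance-level CloudWatch metric reported in milliseconds. The sketch below fetches its maximum over the last 15 minutes for one replica; `replica_lag_ms`, the 15-minute window, and the 60-second period are assumptions for illustration, and the `date -d` syntax requires GNU date.

```sh
# Hypothetical helper: replica_lag_ms <replica_instance_identifier>
# Prints the highest AuroraReplicaLag datapoint (milliseconds) from the last 15 minutes.
# The date -d syntax assumes GNU date; "None" is printed when no datapoints exist.
replica_lag_ms() {
  local instance="$1"
  aws cloudwatch get-metric-statistics \
    --namespace "AWS/RDS" --metric-name "AuroraReplicaLag" \
    --dimensions "Name=DBInstanceIdentifier,Value=${instance}" \
    --statistics Maximum --period 60 \
    --start-time "$(date -u -d '15 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --query "max(Datapoints[].Maximum)" --output text
}
```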

### Monitor WAL (Write-Ahead Logging) Statistics

Track cluster-wide WAL generation (WAL records, full-page images, and bytes generated) with the `pg_stat_wal` view, available in PostgreSQL 14 and later:

```sql
SELECT * FROM pg_stat_wal;
```
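Because `pg_stat_wal` counters are cumulative, a rate is more useful than a single reading. This sketch samples `wal_bytes` twice and prints the average bytes generated per second; the helper name `wal_rate` and the 60-second default interval are hypothetical, and it assumes `pg_stat_wal` is populated on your engine version.

```sh
# Hypothetical helper: wal_rate <conninfo> [interval_seconds]
# Samples pg_stat_wal.wal_bytes twice and prints the average WAL bytes generated per second.
wal_rate() {
  local conninfo="$1" interval="${2:-60}" before after
  before=$(psql "$conninfo" -At -c "SELECT wal_bytes FROM pg_stat_wal;") || return 1
  sleep "$interval"
  after=$(psql "$conninfo" -At -c "SELECT wal_bytes FROM pg_stat_wal;") || return 1
  echo $(( (after - before) / interval ))
}
```

A sudden jump in the rate often lines up with bulk writes, large index builds, or `VACUUM FULL`.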

### Autovacuum Status

Monitor autovacuum to ensure dead tuples are being cleaned up regularly. Tables with many dead tuples and an old or missing `last_autovacuum` need attention:

```sql
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```

---

## Design Decisions

* Aurora clusters can only be provisioned on internal or private subnets.
* A KMS key is created for encryption and retained after cluster deletion.
* Tags are copied to snapshots.
* Daily snapshots are configured.
* The root username and password are automatically generated to reduce exposure.
* The username is generated when not restoring from a snapshot; otherwise, the snapshot's username is used ([note](https://github.com/hashicorp/terraform-provider-aws/pull/9505/files#diff-9d869fc908da636b09ac45e62cd373de7223e04ab7a2279385d6ea31004fcbacR92)).
* The password is reset on snapshot restore.
* No schema is created by default.
* No blue/green support, as blue/green deployments are not supported for [PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/blue-green-deployments-overview.html) yet.
* Instance AZs are auto-assigned by AWS.
* Two artifacts are produced: one for the writer and one for the readers. If there are no readers, the writer will be present here so you can
* For applications that don't use a load-balanced reader, the writer endpoint can be read from
* The minimum retention period for backups is 1 day, as backups [cannot be disabled in Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Managing.Backups.html).

---

## Additional Resources

- [AWS Aurora Postgres User Guide](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraPostgreSQL.html)
- [AWS Aurora User Guide](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html)
- [AWS Aurora Serverless v2 Guide](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html)
- [TLS w/ Serverless v2](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2-administration.html#aurora-serverless-v2.tls)
- [PostgreSQL Documentation](https://www.postgresql.org/docs/)