When server access logging is enabled for a frequently accessed bucket, a large number of log files are generated every day. Because these files are small and numerous, they are poor candidates for transition to "cheaper" storage classes like Standard-IA or Glacier.
The module deploys a serverless app that reduces the storage cost of these access log files by rolling multiple files up into a single compressed tarball. The compressed output is about 8% of the original size. The resulting files are also larger and far fewer, which makes them suitable for cheaper storage classes and saves even more.
The app is designed to run in a multi-account setup. It assumes IAM roles in other accounts to read and write S3 buckets. A producer lists files in the buckets and puts tasks into an SQS queue, and the queue invokes workers to process the tasks.
It is recommended to run a copy of the app in each region, because S3 charges for inter-region data transfer.
```hcl
module "s3-access-log-roller" {
  source = "Samsung/s3-rollup/aws"

  slug                = "us-west-2"
  maximum_concurrency = 10
  memory_size         = 1024

  # Allow the app to assume these roles
  s3_access_roles = [
    "arn:aws:iam::111111111111:role/s3-rollup-bucket-access", # dev
    "arn:aws:iam::222222222222:role/s3-rollup-bucket-access", # staging
    "arn:aws:iam::333333333333:role/s3-rollup-bucket-access", # prod
  ]

  # Run the app daily
  enable_eventbridge_schedule = true
  eventbridge_invocation_payload = {
    dev = jsonencode({
      s3_role = "arn:aws:iam::111111111111:role/s3-rollup-bucket-access"
      prefixes = [
        "s3://dev-logs/s3/",
      ]
    })
    staging = jsonencode({
      s3_role = "arn:aws:iam::222222222222:role/s3-rollup-bucket-access"
      prefixes = [
        "s3://staging-logs/s3/",
      ]
    })
    prod = jsonencode({
      s3_role = "arn:aws:iam::333333333333:role/s3-rollup-bucket-access"
      prefixes = [
        "s3://prod-logs/s3/",
      ]
    })
  }
}
```
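To follow the suggestion of running one copy of the app per region, the module can be instantiated once per region with an aliased provider. The following is a minimal sketch, not a tested configuration: the provider aliases, the second region, and the per-region bucket names are assumptions made for illustration, and it assumes the module uses the default `aws` provider passed in from the caller.

```hcl
# Provider aliases are hypothetical names chosen for this sketch.
provider "aws" {
  alias  = "usw2"
  region = "us-west-2"
}

provider "aws" {
  alias  = "use1"
  region = "us-east-1"
}

locals {
  # Placeholder role ARN taken from the usage example above.
  s3_access_roles = [
    "arn:aws:iam::111111111111:role/s3-rollup-bucket-access",
  ]
}

module "s3-access-log-roller-usw2" {
  source = "Samsung/s3-rollup/aws"
  providers = {
    aws = aws.usw2
  }

  # A distinct slug per copy keeps resource names apart.
  slug            = "us-west-2"
  s3_access_roles = local.s3_access_roles

  eventbridge_invocation_payload = {
    dev = jsonencode({
      s3_role  = local.s3_access_roles[0]
      prefixes = ["s3://dev-logs-us-west-2/s3/"] # hypothetical bucket
    })
  }
}

module "s3-access-log-roller-use1" {
  source = "Samsung/s3-rollup/aws"
  providers = {
    aws = aws.use1
  }

  slug            = "us-east-1"
  s3_access_roles = local.s3_access_roles

  eventbridge_invocation_payload = {
    dev = jsonencode({
      s3_role  = local.s3_access_roles[0]
      prefixes = ["s3://dev-logs-us-east-1/s3/"] # hypothetical bucket
    })
  }
}
```

Each copy should list only prefixes of buckets in its own region, so that log files are read and written without crossing regions.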
If the logging buckets already contain a large number of files, it is suggested to set `enable_eventbridge_schedule = false` and run the producer manually until all existing files are processed. Lambda has a maximum timeout of 15 minutes, so a single scheduled run may not get through the whole backlog. After the backlog is cleared, enable the EventBridge schedule to run the app daily.
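A first deployment against buckets with a backlog might look like the sketch below. It only shows the schedule toggle described above. How the producer is invoked manually (console, CLI, or otherwise) is not covered by the module inputs and is left out.

```hcl
module "s3-access-log-roller" {
  source = "Samsung/s3-rollup/aws"

  slug = "us-west-2"

  s3_access_roles = [
    "arn:aws:iam::111111111111:role/s3-rollup-bucket-access", # dev
  ]

  # Step 1: keep the daily schedule off while the producer is run
  # manually against the existing backlog.
  enable_eventbridge_schedule = false

  # Step 2 (once the backlog is cleared): set the flag to true and add
  # eventbridge_invocation_payload as in the usage example.
}
```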
Name | Version |
---|---|
terraform | >= 1.0 |
archive | >= 2.0 |
aws | >= 4.0 |
Name | Version |
---|---|
archive | >= 2.0 |
aws | >= 4.0 |
No modules.
Name | Type |
---|---|
aws_iam_role.main | resource |
aws_iam_role_policy.main | resource |
aws_lambda_event_source_mapping.main | resource |
aws_lambda_function.main | resource |
aws_scheduler_schedule.main | resource |
aws_sqs_queue.dlq | resource |
aws_sqs_queue.main | resource |
archive_file.lambda | data source |
aws_iam_policy_document.assume_role | data source |
aws_iam_policy_document.main | data source |
aws_region.current | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
enable_eventbridge_schedule | If true, producer Lambda scans S3 prefixes every day to make tasks. | bool | true | no |
ephemeral_storage_size | Lambda function ephemeral storage size in MiB. | number | 10240 | no |
eventbridge_invocation_payload | Map of account alias => JSON payload to pass to Lambda function by EventBridge. | map(string) | {} | no |
maximum_concurrency | How many Lambda function instances can be launched concurrently by SQS. | number | 10 | no |
memory_size | Lambda function memory size in MiB. | number | 1024 | no |
runtime | Lambda function runtime. See: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html | string | "python3.8" | no |
s3_access_roles | ARNs of IAM roles that the function assumes to read/write S3 buckets. These child roles can only be created after the parent role is created. | list(string) | n/a | yes |
security_group_ids | IDs of security groups to attach to Lambda function. Only valid if var.subnet_ids is not null. | list(string) | null | no |
slug | Used for naming resources. | string | n/a | yes |
sqs_message_retention_seconds | How long the message stays in queue before being purged. | number | 1209600 | no |
sqs_visibility_timeout | How long the message stays invisible when it has been received. Must be greater than Lambda timeout. | number | 900 | no |
subnet_ids | IDs of subnets to run Lambda function in. If null, Lambda runs in Amazon-managed VPC. | list(string) | null | no |
timeout | Lambda function timeout in seconds. | number | 600 | no |
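Since `sqs_visibility_timeout` must be greater than `timeout`, it can help to derive one from the other when raising the Lambda timeout. A minimal sketch, where the local name and the 300-second margin are assumptions (the margin mirrors the gap between the two defaults, 600 and 900):

```hcl
locals {
  # Hypothetical local: Lambda timeout in seconds (the Lambda maximum is 900).
  rollup_lambda_timeout = 900
}

module "s3-access-log-roller" {
  source = "Samsung/s3-rollup/aws"

  slug = "us-west-2"

  s3_access_roles = [
    "arn:aws:iam::111111111111:role/s3-rollup-bucket-access", # dev
  ]

  timeout = local.rollup_lambda_timeout

  # Keep the message invisible for longer than a worker can run, so that
  # a task is not handed to a second worker while the first is still busy.
  sqs_visibility_timeout = local.rollup_lambda_timeout + 300
}
```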
Name | Description |
---|---|
iam_role_arn | IAM role ARN of the app. This is needed for setting up bucket access roles. |
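The `iam_role_arn` output is what a bucket access role in another account needs to trust. The sketch below shows one way such a child role could be defined in the account that owns the log bucket. The variable name, role name, bucket name, and the list of S3 actions are all assumptions for illustration. The actions the app actually requires are not stated in this document and should be checked against the module source.

```hcl
# Hypothetical input: set this to the module's iam_role_arn output.
variable "rollup_app_role_arn" {
  type        = string
  description = "ARN of the s3-rollup app role (module output iam_role_arn)."
}

# Trust policy: only the app role may assume this child role.
data "aws_iam_policy_document" "trust" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = [var.rollup_app_role_arn]
    }
  }
}

resource "aws_iam_role" "bucket_access" {
  name               = "s3-rollup-bucket-access"
  assume_role_policy = data.aws_iam_policy_document.trust.json
}

# Assumed permissions: list the bucket, then read, write, and delete
# objects under the access log prefix.
data "aws_iam_policy_document" "bucket_access" {
  statement {
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::dev-logs"]
  }

  statement {
    actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = ["arn:aws:s3:::dev-logs/s3/*"]
  }
}

resource "aws_iam_role_policy" "bucket_access" {
  name   = "bucket-access"
  role   = aws_iam_role.bucket_access.id
  policy = data.aws_iam_policy_document.bucket_access.json
}
```

Because the trust policy names the app role as its principal, the app role has to exist before this child role can be created, which matches the ordering note on `s3_access_roles`.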
The module is maintained by Zhuoyun Wei from Samsung Research Canada.