New serverless pattern - lambda-SQS-best-Practices-CDK #2733
Changes from 13 commits
13a7709
1fdbff3
4396ea5
4e05ab5
9e9583e
3b866cb
4307f14
4bfa260
4ef3fd0
60d3472
b6dd2ed
b459017
5a78339
dc2a322
8919c6b
101de8c
824b01d
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,207 @@ | ||
# Lambda SQS Best Practices with AWS CDK | ||
|
||
This pattern demonstrates how to process Amazon SQS messages with AWS Lambda using AWS CDK. It serves as a reference architecture for building robust, observable, and maintainable serverless applications. It integrates AWS Lambda Powertools for structured logging, custom metrics, and distributed tracing with X-Ray. Error handling relies on automatic retries and a Dead Letter Queue (DLQ), and a CloudWatch Dashboard supports operational monitoring. Security is enforced through least-privilege IAM roles. The pattern also includes batch message processing, configurable timeouts, and message validation. | ||
|
||
|
||
<img src="./resources/Lambda-SQS-Best-Practice.png" alt="Architecture" width="100%"/> | ||
|
||
Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example. | ||
|
||
## Requirements | ||
|
||
* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources. | ||
* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured | ||
* [Git Installed](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) | ||
* [Node.js 20 or greater](https://nodejs.org/en/download/) installed | ||
* [AWS CDK](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) installed | ||
|
||
## Deployment Instructions | ||
|
||
1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository: | ||
``` | ||
git clone https://github.com/aws-samples/serverless-patterns | ||
``` | ||
1. Change directory to the pattern directory: | ||
``` | ||
cd serverless-patterns/lambda-sqs-best-practices-cdk | ||
``` | ||
|
||
1. Install the CDK dependencies: | ||
``` | ||
npm install | ||
``` | ||
|
||
1. Install the Lambda dependencies: | ||
``` | ||
cd lambda | ||
npm install | ||
``` | ||
|
||
1. Deploy the CDK stack: | ||
``` | ||
cd .. | ||
cdk deploy | ||
``` | ||
|
||
Note: If you are using CDK for the first time, bootstrap CDK in your account with the following command: | ||
|
||
``` | ||
cdk bootstrap aws://ACCOUNT-NUMBER-1/REGION-1 | ||
``` | ||
|
||
## How it works | ||
|
||
This pattern sets up: | ||
|
||
1. An SQS queue with a Dead Letter Queue (DLQ) for failed message handling | ||
2. A Lambda function with: | ||
- AWS Lambda Powertools integration | ||
- Structured logging | ||
- Custom metrics | ||
- X-Ray tracing | ||
3. A CloudWatch Dashboard for operational monitoring | ||
4. Least-privilege permissions implemented on roles and policies | ||
<img src="./resources/Least-priviledge.png" alt="Architecture" width="100%"/> | ||
If the purpose is to showcase the least privilege, then please ask users to navigate to
Added |
||
[ensured by implementing individual inline policies that add only the required permissions to the role] | ||
|
||
|
||
The Lambda function: | ||
- Processes messages in batches | ||
- Validates message format | ||
- Simulates downstream API calls with random failures (5% failure rate) | ||
- Demonstrates handling of external service dependencies | ||
- Handles errors gracefully | ||
- Reports metrics and traces | ||
- Uses structured logging | ||
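
The behaviours listed above can be sketched as a small handler. This is a minimal sketch rather than the handler shipped in this pattern: `validateMessage`, `flakyDownstream`, `createHandler`, and the `orderId` field are hypothetical names, and the partial batch response only takes effect if the SQS event source mapping has `ReportBatchItemFailures` enabled, which is assumed here.

```javascript
// Sketch of an SQS batch handler that reports partial batch failures.
// All function names and the "orderId" field are hypothetical.

// Reject bodies that are not JSON objects carrying a non-empty string "orderId".
function validateMessage(body) {
  let parsed;
  try {
    parsed = JSON.parse(body);
  } catch (err) {
    throw new Error('Message body is not valid JSON');
  }
  const isObject = parsed !== null && typeof parsed === 'object';
  if (!isObject || typeof parsed.orderId !== 'string' || parsed.orderId === '') {
    throw new Error('Message is missing a string "orderId" field');
  }
  return parsed;
}

// Stand-in for a downstream API call that fails about 5% of the time.
async function flakyDownstream(payload, random = Math.random) {
  if (random() < 0.05) {
    throw new Error('Simulated downstream API failure');
  }
  return { ok: true, orderId: payload.orderId };
}

// Build a handler around an injectable downstream call so tests can be deterministic.
function createHandler(callDownstream = flakyDownstream) {
  return async function handler(event) {
    const batchItemFailures = [];
    for (const record of event.Records) {
      try {
        const payload = validateMessage(record.body);
        await callDownstream(payload);
      } catch (err) {
        // Log the failure as JSON and report only this record back to SQS for retry.
        console.error(JSON.stringify({
          level: 'ERROR',
          messageId: record.messageId,
          error: err.message,
        }));
        batchItemFailures.push({ itemIdentifier: record.messageId });
      }
    }
    return { batchItemFailures };
  };
}

const handler = createHandler();
```

Returning only the failed `messageId` values means successfully processed records in the same batch are deleted from the queue instead of being retried with the failures.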
|
||
Failed messages are: | ||
- Logged with error details | ||
- Sent to DLQ after 3 retries | ||
- Monitored via CloudWatch metrics | ||
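
The retry-then-DLQ flow can be illustrated with a simplified in-memory model. This sketch does not call SQS; `simulateRedrive` and its parameters are hypothetical, and it assumes a redrive policy with a `maxReceiveCount` of 3, matching the "3 retries" figure above.

```javascript
// Simplified in-memory model of an SQS redrive policy (not the AWS SDK).
// simulateRedrive and its parameters are hypothetical names for illustration.
function simulateRedrive(message, processFn, maxReceiveCount = 3) {
  let receiveCount = 0;
  while (receiveCount < maxReceiveCount) {
    receiveCount += 1; // each delivery increments the message's receive count
    try {
      processFn(message, receiveCount);
      return { outcome: 'processed', receiveCount };
    } catch (err) {
      // On failure the message becomes visible again after the visibility timeout.
    }
  }
  // Receive attempts are exhausted, so the message is moved to the DLQ.
  return { outcome: 'dead-lettered', receiveCount };
}

// A poison-pill message fails on every attempt.
const poison = simulateRedrive({ id: 'bad' }, () => {
  throw new Error('invalid message');
});

// A transient failure succeeds on the second attempt.
const transient = simulateRedrive({ id: 'ok' }, (msg, attempt) => {
  if (attempt === 1) throw new Error('transient failure');
});

console.log(poison.outcome, poison.receiveCount);
console.log(transient.outcome, transient.receiveCount);
```

In this model the poison pill is dead-lettered after its third receive, while the transient failure is processed on its second receive and never reaches the DLQ.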
|
||
## Testing | ||
|
||
The pattern includes a load testing script to verify functionality: | ||
|
||
1. Set the Queue URL environment variable: | ||
Consider adding print at the end echo $QUEUE_URL
Ideally this script requires us to export environment variables before running the script else prompt us to set valid value. However, sure I can add console.log() for each value inside the script as well
Added console.log which prints the Source Queue URL, DLQ URL and Region from the script |
||
``` | ||
export QUEUE_URL=$(aws cloudformation describe-stacks --stack-name LambdaSqsBestPracticesCdkStack --query 'Stacks[0].Outputs[?OutputKey==`QueueUrl`].OutputValue' --output text) | ||
|
||
export DLQ_URL=$(aws cloudformation describe-stacks --stack-name LambdaSqsBestPracticesCdkStack --query 'Stacks[0].Outputs[?OutputKey==`DlqUrl`].OutputValue' --output text) | ||
|
||
export AWS_REGION=us-east-1 # or your AWS region | ||
``` | ||
|
||
2. Run the test script | ||
Success Scenario | ||
``` | ||
npm run test:success | ||
``` | ||
Sample result | ||
<img src="./resources/Success-script-sample.png" alt="Architecture" width="100%"/> | ||
|
||
Refer to the dashboard to verify that all messages were processed successfully and that no messages are in the DLQ | ||
As discussed, we are also simulating external API failure from inside the code. |
||
<img src="./resources/All-messages-processed.png" alt="Architecture" width="100%"/> | ||
|
||
Also check the DLQ count on the dashboard | ||
<img src="./resources/No-messages-sent-to-DLQ.png" alt="Architecture" width="100%"/> | ||
|
||
Failure Scenario | ||
``` | ||
npm run test:dlq | ||
``` | ||
Sample result | ||
<img src="./resources/DLQ-Script-smaple-processing.png" alt="Architecture" width="100%"/> | ||
|
||
Verify the same using the dashboard | ||
<img src="./resources/dashboard-mesage-processing.png" alt="Architecture" width="100%"/> | ||
|
||
<img src="./resources/dashboard-mesage-processing-2.png" alt="Architecture" width="100%"/> | ||
What is the difference between these 2 snapshots?
I have tried to showcase what happened to complete batch and retries. The i have mentioned the same using receive count
Please refer dashboard 2, which shows the external API failure message was processed successfully and remaining 10 were retried 2 more times and post hitting receive count to 3 the 10 messages moved to DLQ |
||
|
||
Additionally, confirm that the messages are in the DLQ | ||
Messages won't show on the screen, until you poll the messages. I see it is highlighted in the snapshot, worth to mention.
noted
Added |
||
<img src="./resources/DLQ-in-messaging.png" alt="Architecture" width="100%"/> | ||
|
||
Note: Refer to the Monitoring Guide to locate the “SQS-Processing-Dashboard” | ||
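
The checks a load-test script might run before sending messages can be sketched as follows. This is an illustration under stated assumptions, not the script shipped with the pattern: `readConfig`, `buildBatches`, the message shape, and the sample URLs are hypothetical. The batch size of 10 reflects the SQS `SendMessageBatch` limit of 10 entries per request.

```javascript
// Sketch of pre-flight checks for a load-test script. All names are hypothetical.
const REQUIRED_VARS = ['QUEUE_URL', 'DLQ_URL', 'AWS_REGION'];

// Fail fast with a clear message when an environment variable is missing or empty.
function readConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return { queueUrl: env.QUEUE_URL, dlqUrl: env.DLQ_URL, region: env.AWS_REGION };
}

// Split messages into groups of at most 10 entries, each with a unique Id.
function buildBatches(messages, batchSize = 10) {
  const batches = [];
  for (let i = 0; i < messages.length; i += batchSize) {
    const entries = messages.slice(i, i + batchSize).map((message, offset) => ({
      Id: `msg-${i + offset}`,
      MessageBody: JSON.stringify(message),
    }));
    batches.push(entries);
  }
  return batches;
}

// Placeholder values; in a real run these come from the exported variables.
const sampleEnv = {
  QUEUE_URL: 'https://sqs.us-east-1.amazonaws.com/123456789012/example-queue',
  DLQ_URL: 'https://sqs.us-east-1.amazonaws.com/123456789012/example-dlq',
  AWS_REGION: 'us-east-1',
};

const config = readConfig(sampleEnv);
console.log('Source queue URL:', config.queueUrl);
console.log('DLQ URL:', config.dlqUrl);
console.log('Region:', config.region);

const messages = Array.from({ length: 25 }, (_, n) => ({ orderId: `order-${n}` }));
console.log('Batches to send:', buildBatches(messages).length);
```

Printing the resolved values before sending makes it easy to spot an empty `QUEUE_URL`, which happens when the CloudFormation output query returns nothing.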
|
||
## Monitoring Guide | ||
|
||
Locating Resources | ||
|
||
``` | ||
1. Navigate to AWS CloudFormation Console | ||
2. Select the stack "LambdaSqsBestPracticesCdkStack" | ||
3. Go to the "Resources" tab | ||
4. Here you can find: | ||
- All resources created by the stack | ||
- Direct links to each resource's console | ||
- Resource physical IDs and types | ||
- Current status of each resource | ||
``` | ||
|
||
CloudWatch Logs | ||
|
||
``` | ||
1. Navigate to CloudWatch Console > Log Groups | ||
2. Find /aws/lambda/BatchProcessingLambdaFunction | ||
3. View structured logs with: | ||
* Batch processing information | ||
* Error details | ||
``` | ||
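
Structured logs are useful because each line is a JSON object that can be filtered by field. The following is a plain-JavaScript stand-in, not the Powertools Logger API; `formatLog`, `traceMessage`, and the field names are hypothetical. It shows how log entries that share a `messageId` can be collected to follow one message across retries.

```javascript
// Plain-JavaScript stand-in for a structured logger (this is not the Powertools Logger API).
// formatLog, traceMessage, and all field names are hypothetical.
function formatLog(level, message, fields = {}) {
  return JSON.stringify({
    level,
    message,
    timestamp: new Date().toISOString(),
    ...fields,
  });
}

// Return the parsed entries that mention a given messageId, in the order they were logged.
function traceMessage(logLines, messageId) {
  return logLines
    .map((line) => JSON.parse(line))
    .filter((entry) => entry.messageId === messageId);
}

const logLines = [
  formatLog('INFO', 'Processing record', { messageId: 'id-1', receiveCount: 1 }),
  formatLog('ERROR', 'Validation failed', { messageId: 'id-1', receiveCount: 1 }),
  formatLog('INFO', 'Processing record', { messageId: 'id-2', receiveCount: 1 }),
  formatLog('ERROR', 'Validation failed', { messageId: 'id-1', receiveCount: 2 }),
];

for (const entry of traceMessage(logLines, 'id-1')) {
  console.log(entry.receiveCount, entry.level, entry.message);
}
```

The same filter can be expressed in CloudWatch Logs by searching the log group for the message ID.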
|
||
Example deep-dive walkthrough of structured logging for a batch: | ||
Is there a way you can show the life cycle of the message by taking a messageID? and then take it all the way to trace?
I have highlighted the Message ID and type of retry. The external API failure was successfully processed on retry. where the invalid message ( poison pill ) will be sent to DLQ. I will try to include complete retry cycle ( all 3 invokes - original and 2 retries ) and message in DLQ for poison pill message.
I have showed this for a poison pill record from initial invoke, retry 1, retry 2 and message sent to DLQ |
||
1. Batch information before starting processing | ||
<img src="./resources/batch-info.png" alt="Architecture" width="100%"/> | ||
|
||
2. Success information | ||
<img src="./resources/Success-info.png" alt="Architecture" width="100%"/> | ||
|
||
3. Error information of failure | ||
<img src="./resources/Error-info.png" alt="Architecture" width="100%"/> | ||
|
||
4. Batch processing info | ||
<img src="./resources/Batch-processing-info.png" alt="Architecture" width="100%"/> | ||
|
||
5. Failed items returned to the queue for reprocessing | ||
<img src="./resources/Failed-items.png" alt="Architecture" width="100%"/> | ||
|
||
6. Failed Item retried [note messageID and time for retry] | ||
<img src="./resources/Failed-item-retry.png" alt="Architecture" width="100%"/> | ||
|
||
7. Additionally, in case of failed retries/poison pill | ||
<img src="./resources/poison-pill.png" alt="Architecture" width="100%"/> | ||
|
||
Message is moved to the DLQ | ||
<img src="./resources/message-in-DLQ.png" alt="Architecture" width="100%"/> | ||
|
||
Custom tracing can also be used to get quick information on batch processing | ||
<img src="./resources/trace-info.png" alt="Architecture" width="100%"/> | ||
|
||
|
||
Metrics Dashboard | ||
|
||
``` | ||
1. Go to CloudWatch > Dashboards | ||
2. Find the dashboard “SQS-Processing-Dashboard” | ||
3. Monitor: | ||
* Message processing success rate | ||
* Batch size and processing time | ||
* Error rates | ||
* Queue metrics: source queue depth, message processing speed, and DLQ message count | ||
* Lambda performance including duration | ||
``` | ||
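
Custom metrics such as success counts and batch duration can reach CloudWatch as Embedded Metric Format (EMF) records written to the function's log output. The sketch below builds one such record by hand to show its shape. It is not the Powertools Metrics API, and `buildBatchMetrics`, `successRate`, the namespace, and the metric names are hypothetical rather than the ones this pattern's dashboard uses.

```javascript
// Builds one CloudWatch Embedded Metric Format (EMF) record for a processed batch.
// Function names, the namespace, and the metric names are hypothetical.
function buildBatchMetrics({ namespace, service, succeeded, failed, durationMs }) {
  return {
    _aws: {
      Timestamp: Date.now(),
      CloudWatchMetrics: [
        {
          Namespace: namespace,
          Dimensions: [['service']],
          Metrics: [
            { Name: 'SuccessfulMessages', Unit: 'Count' },
            { Name: 'FailedMessages', Unit: 'Count' },
            { Name: 'BatchProcessingTime', Unit: 'Milliseconds' },
          ],
        },
      ],
    },
    // Dimension value and metric values live at the top level of the record.
    service,
    SuccessfulMessages: succeeded,
    FailedMessages: failed,
    BatchProcessingTime: durationMs,
  };
}

// Success rate as a percentage; an empty batch counts as fully successful.
function successRate(succeeded, failed) {
  const total = succeeded + failed;
  return total === 0 ? 100 : (succeeded / total) * 100;
}

const record = buildBatchMetrics({
  namespace: 'ExampleSqsProcessing',
  service: 'batch-processor',
  succeeded: 9,
  failed: 1,
  durationMs: 420,
});

console.log(JSON.stringify(record));
console.log('Success rate (%):', successRate(9, 1));
```

A dashboard widget can then derive a success-rate graph from the two count metrics with a metric math expression instead of having the function publish the percentage itself.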
|
||
|
||
<img src="./resources/SQS_operational_dashboard.png" alt="Architecture" width="100%"/> | ||
|
||
## Cleanup | ||
|
||
To remove all deployed resources: | ||
|
||
``` | ||
cdk destroy | ||
``` | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
#!/usr/bin/env node | ||
const cdk = require('aws-cdk-lib'); | ||
const { LambdaSqsBestPracticesCdkStack } = require('../lib/lambda-sqs-best-practices-cdk-stack'); | ||
|
||
const app = new cdk.App(); | ||
new LambdaSqsBestPracticesCdkStack(app, 'LambdaSqsBestPracticesCdkStack', {}); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
{ | ||
"app": "node bin/lambda-sqs-best-practices-cdk.js", | ||
"watch": { | ||
"include": [ | ||
"**" | ||
], | ||
"exclude": [ | ||
"README.md", | ||
"cdk*.json", | ||
"jest.config.js", | ||
"package*.json", | ||
"yarn.lock", | ||
"node_modules", | ||
"test" | ||
] | ||
}, | ||
"context": { | ||
"@aws-cdk/aws-lambda:recognizeLayerVersion": true, | ||
"@aws-cdk/core:checkSecretUsage": true, | ||
"@aws-cdk/core:target-partitions": [ | ||
"aws", | ||
"aws-cn" | ||
], | ||
"@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true, | ||
"@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true, | ||
"@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true, | ||
"@aws-cdk/aws-iam:minimizePolicies": true, | ||
"@aws-cdk/core:validateSnapshotRemovalPolicy": true, | ||
"@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true, | ||
"@aws-cdk/aws-s3:createDefaultLoggingPolicy": true, | ||
"@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true, | ||
"@aws-cdk/aws-apigateway:disableCloudWatchRole": true, | ||
"@aws-cdk/core:enablePartitionLiterals": true, | ||
"@aws-cdk/aws-events:eventsTargetQueueSameAccount": true, | ||
"@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true, | ||
"@aws-cdk/aws-iam:importedRoleStackSafeDefaultPolicyName": true, | ||
"@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true, | ||
"@aws-cdk/aws-route53-patters:useCertificate": true, | ||
"@aws-cdk/customresources:installLatestAwsSdkDefault": false, | ||
"@aws-cdk/aws-rds:databaseProxyUniqueResourceName": true, | ||
"@aws-cdk/aws-codedeploy:removeAlarmsFromDeploymentGroup": true, | ||
"@aws-cdk/aws-apigateway:authorizerChangeDeploymentLogicalId": true, | ||
"@aws-cdk/aws-ec2:launchTemplateDefaultUserData": true, | ||
"@aws-cdk/aws-secretsmanager:useAttachedSecretResourcePolicyForSecretTargetAttachments": true, | ||
"@aws-cdk/aws-redshift:columnId": true, | ||
"@aws-cdk/aws-stepfunctions-tasks:enableEmrServicePolicyV2": true, | ||
"@aws-cdk/aws-ec2:restrictDefaultSecurityGroup": true, | ||
"@aws-cdk/aws-apigateway:requestValidatorUniqueId": true, | ||
"@aws-cdk/aws-kms:aliasNameRef": true, | ||
"@aws-cdk/aws-autoscaling:generateLaunchTemplateInsteadOfLaunchConfig": true, | ||
"@aws-cdk/core:includePrefixInUniqueNameGeneration": true, | ||
"@aws-cdk/aws-efs:denyAnonymousAccess": true, | ||
"@aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby": true, | ||
"@aws-cdk/aws-lambda-nodejs:useLatestRuntimeVersion": true, | ||
"@aws-cdk/aws-efs:mountTargetOrderInsensitiveLogicalId": true, | ||
"@aws-cdk/aws-rds:auroraClusterChangeScopeOfInstanceParameterGroupWithEachParameters": true, | ||
"@aws-cdk/aws-appsync:useArnForSourceApiAssociationIdentifier": true, | ||
"@aws-cdk/aws-rds:preventRenderingDeprecatedCredentials": true, | ||
"@aws-cdk/aws-codepipeline-actions:useNewDefaultBranchForCodeCommitSource": true, | ||
"@aws-cdk/aws-cloudwatch-actions:changeLambdaPermissionLogicalIdForLambdaAction": true, | ||
"@aws-cdk/aws-codepipeline:crossAccountKeysDefaultValueToFalse": true, | ||
"@aws-cdk/aws-codepipeline:defaultPipelineTypeToV2": true, | ||
"@aws-cdk/aws-kms:reduceCrossAccountRegionPolicyScope": true, | ||
"@aws-cdk/aws-eks:nodegroupNameAttribute": true, | ||
"@aws-cdk/aws-ec2:ebsDefaultGp3Volume": true, | ||
"@aws-cdk/aws-ecs:removeDefaultDeploymentAlarm": true, | ||
"@aws-cdk/custom-resources:logApiResponseDataPropertyTrueDefault": false, | ||
"@aws-cdk/aws-s3:keepNotificationInImportedBucket": false, | ||
"@aws-cdk/aws-ecs:enableImdsBlockingDeprecatedFeature": false, | ||
"@aws-cdk/aws-ecs:disableEcsImdsBlocking": true, | ||
"@aws-cdk/aws-ecs:reduceEc2FargateCloudWatchPermissions": true, | ||
"@aws-cdk/aws-dynamodb:resourcePolicyPerReplica": true, | ||
"@aws-cdk/aws-ec2:ec2SumTImeoutEnabled": true, | ||
"@aws-cdk/aws-appsync:appSyncGraphQLAPIScopeLambdaPermission": true, | ||
"@aws-cdk/aws-rds:setCorrectValueForDatabaseInstanceReadReplicaInstanceResourceId": true, | ||
"@aws-cdk/core:cfnIncludeRejectComplexResourceUpdateCreatePolicyIntrinsics": true, | ||
"@aws-cdk/aws-lambda-nodejs:sdkV3ExcludeSmithyPackages": true, | ||
"@aws-cdk/aws-stepfunctions-tasks:fixRunEcsTaskPolicy": true, | ||
"@aws-cdk/aws-ec2:bastionHostUseAmazonLinux2023ByDefault": true, | ||
"@aws-cdk/aws-route53-targets:userPoolDomainNameMethodWithoutCustomResource": true, | ||
"@aws-cdk/aws-elasticloadbalancingV2:albDualstackWithoutPublicIpv4SecurityGroupRulesDefault": true, | ||
"@aws-cdk/aws-iam:oidcRejectUnauthorizedConnections": true, | ||
"@aws-cdk/core:enableAdditionalMetadataCollection": true, | ||
"@aws-cdk/aws-lambda:createNewPoliciesWithAddToRolePolicy": true, | ||
"@aws-cdk/aws-s3:setUniqueReplicationRoleName": true, | ||
"@aws-cdk/aws-events:requireEventBusPolicySid": true | ||
} | ||
} |
Avoid using terms such as production-ready, enterprise-ready, and enterprise-grade
Noted, will work on this
Updated to more generic wordings for pattern description