New serverless pattern - lambda-SQS-best-Practices-CDK #2733

207 changes: 207 additions & 0 deletions lambda-sqs-best-practices-cdk/README.md
@@ -0,0 +1,207 @@
# Lambda SQS Best Practices with AWS CDK

This pattern demonstrates a production-ready implementation of AWS Lambda processing messages from Amazon SQS using AWS CDK. It serves as a reference architecture for building robust, observable, and maintainable serverless applications, featuring AWS Lambda Powertools integration for enhanced observability through structured logging, custom metrics, and distributed tracing with X-Ray. The pattern implements comprehensive error handling with automatic retries and Dead Letter Queue (DLQ) configuration, along with a detailed CloudWatch Dashboard for operational monitoring. Security is enforced through least privilege IAM roles, while operational excellence is maintained through proper resource configurations and cost optimizations. This enterprise-grade solution includes batch message processing, configurable timeouts, message validation, and a complete monitoring strategy, making it ideal for teams building production serverless applications that require high reliability, observability, and maintainability.

Contributor: Avoid using the terms production-ready, enterprise-ready, and enterprise-grade.

Contributor Author: Noted, will work on this.

Contributor Author: Updated the pattern description to use more generic wording.



<img src="./resources/Lambda-SQS-Best-Practice.png" alt="Architecture" width="100%"/>

Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.

## Requirements

* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured
* [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) installed
* [Node.js 20 or greater](https://nodejs.org/en/download/) installed
* [AWS CDK](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) installed

## Deployment Instructions

1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository:
```
git clone https://github.com/aws-samples/serverless-patterns
```
1. Change directory to the pattern directory:
```
cd serverless-patterns/lambda-sqs-best-practices-cdk
```

1. Install the CDK dependencies:
```
npm install
```

1. Install the Lambda dependencies:
```
cd lambda
npm install
```

1. Deploy the CDK stack:
```
cd ..
cdk deploy
```

Note: If you are using CDK for the first time, bootstrap CDK in your account using the command below:

```
cdk bootstrap aws://ACCOUNT-NUMBER-1/REGION-1
```

## How it works

This pattern sets up:

1. An SQS queue with a Dead Letter Queue (DLQ) for failed message handling
2. A Lambda function with:
   - AWS Lambda Powertools integration
   - Structured logging
   - Custom metrics
   - X-Ray tracing
3. A CloudWatch Dashboard for operational monitoring
4. Least privilege permissions implemented on roles and policies
<img src="./resources/Least-priviledge.png" alt="Architecture" width="100%"/>

Contributor: If the purpose is to showcase least privilege, please ask users to navigate to IAM --> Roles --> LambdaSqsBestPracticesCdk** and then review the managed and customer inline policies.

Contributor Author: Added

[ ensured by implementing individual inline policies that add only the required permissions to the role ]


The Lambda function:
- Processes messages in batches
- Validates message format
- Simulates downstream API calls with random failures (5% failure rate)
- Demonstrates handling of external service dependencies
- Handles errors gracefully
- Reports metrics and traces
- Uses structured logging
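
The handler source is not shown in this README, so the following is only a minimal sketch of how batch processing with per-message validation might look. It assumes the event source mapping reports partial batch failures (the `batchItemFailures` response shape); the `orderId` field, `validateBody`, and `createHandler` are hypothetical names chosen for illustration.

```javascript
// Hypothetical validation rule: the body must be a JSON object with a string "orderId".
function validateBody(body) {
  let parsed;
  try {
    parsed = JSON.parse(body);
  } catch (err) {
    throw new Error('Message body is not valid JSON');
  }
  if (parsed === null || typeof parsed !== 'object' || typeof parsed.orderId !== 'string') {
    throw new Error('Message body is missing a string "orderId"');
  }
  return parsed;
}

// Builds an SQS batch handler around a processing function (for example, a
// downstream API call). Each record is handled independently, and only the
// failed message IDs are reported back so that SQS redelivers just those.
function createHandler(processMessage) {
  return async function handler(event) {
    const batchItemFailures = [];
    for (const record of event.Records) {
      try {
        const payload = validateBody(record.body);
        await processMessage(payload);
      } catch (err) {
        // Structured error line keyed by message ID for later lookup in the logs.
        console.error(JSON.stringify({
          level: 'ERROR',
          messageId: record.messageId,
          error: err.message,
        }));
        batchItemFailures.push({ itemIdentifier: record.messageId });
      }
    }
    return { batchItemFailures };
  };
}
```

In a real function the result of `createHandler(...)` would be exported as the Lambda handler. Returning an empty `batchItemFailures` array tells Lambda the whole batch succeeded.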

Failed messages are:
- Logged with error details
- Sent to the DLQ after 3 failed receive attempts (the original invocation plus 2 retries)
- Monitored via CloudWatch metrics
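
To make the retry path concrete, here is a small in-memory simulation of redelivery, assuming a redrive policy with a maximum receive count of 3. It does not call SQS; `deliverWithRedrive` and `makeFlakyCall` are hypothetical helpers, and the 5% rate mirrors the simulated downstream failure rate described above.

```javascript
// Returns a function that throws with probability `failureRate`.
// `random` is injectable so the behaviour can be made deterministic.
function makeFlakyCall(failureRate, random = Math.random) {
  return () => {
    if (random() < failureRate) {
      throw new Error('Simulated downstream API failure');
    }
  };
}

// Simulates redelivery of one message: every failed attempt counts as a
// receive, and the message goes to the DLQ once the limit is exhausted.
function deliverWithRedrive(message, processMessage, maxReceiveCount = 3) {
  let receiveCount = 0;
  while (receiveCount < maxReceiveCount) {
    receiveCount += 1;
    try {
      processMessage(message, receiveCount);
      return { destination: 'processed', receiveCount };
    } catch (err) {
      // Real SQS would make the message visible again after the visibility timeout.
    }
  }
  return { destination: 'dlq', receiveCount };
}

// A transient failure succeeds on the second receive.
const transientFailure = (msg, attempt) => {
  if (attempt === 1) throw new Error('Simulated downstream API failure');
};
console.log(deliverWithRedrive({ id: 'a' }, transientFailure));
// -> { destination: 'processed', receiveCount: 2 }

// A poison pill fails every time and ends up in the DLQ.
const poisonPill = () => { throw new Error('Invalid message format'); };
console.log(deliverWithRedrive({ id: 'b' }, poisonPill));
// -> { destination: 'dlq', receiveCount: 3 }

// With a 5% failure rate, most messages are processed on the first receive.
console.log(deliverWithRedrive({ id: 'c' }, makeFlakyCall(0.05)).destination);
```

The two fixed cases correspond to the two kinds of failure this pattern exercises: a simulated external API error that recovers on retry, and an invalid message that never will.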

## Testing

The pattern includes a load testing script to verify functionality:

1. Set the queue URL, DLQ URL, and Region environment variables:

Contributor: Consider adding print statements at the end:

echo $QUEUE_URL
echo $DLQ_URL
echo $AWS_REGION

Contributor Author: Ideally, this script requires the environment variables to be exported before it runs, and otherwise prompts for valid values. However, I can add a console.log() for each value inside the script as well.

Contributor Author: Added console.log calls that print the source queue URL, DLQ URL, and Region from the script.

```
export QUEUE_URL=$(aws cloudformation describe-stacks --stack-name LambdaSqsBestPracticesCdkStack --query 'Stacks[0].Outputs[?OutputKey==`QueueUrl`].OutputValue' --output text)

export DLQ_URL=$(aws cloudformation describe-stacks --stack-name LambdaSqsBestPracticesCdkStack --query 'Stacks[0].Outputs[?OutputKey==`DlqUrl`].OutputValue' --output text)

export AWS_REGION=us-east-1 # or your AWS region
```

2. Run the test script
Success Scenario
```
npm run test:success
```
Sample result
<img src="./resources/Success-script-sample.png" alt="Architecture" width="100%"/>

Refer to the dashboard to verify that all messages were processed successfully and that no messages are in the DLQ

Contributor: Any reason why there is a failure in this case?

Screenshot 2025-07-20 at 10 45 14 AM

Contributor Author: As discussed, we also simulate external API failures inside the code, so this could be related to that. The message should be processed successfully on the next retry.

<img src="./resources/All-messages-processed.png" alt="Architecture" width="100%"/>

Also refer to the DLQ count on the dashboard
<img src="./resources/No-messages-sent-to-DLQ.png" alt="Architecture" width="100%"/>

Failure Scenario
```
npm run test:dlq
```
Sample result
<img src="./resources/DLQ-Script-smaple-processing.png" alt="Architecture" width="100%"/>

Verify the same using the dashboard
<img src="./resources/dashboard-mesage-processing.png" alt="Architecture" width="100%"/>

<img src="./resources/dashboard-mesage-processing-2.png" alt="Architecture" width="100%"/>

Contributor: What is the difference between these 2 snapshots?

Contributor Author: I have tried to showcase what happened to the complete batch and its retries. I have indicated this using the receive count.

Contributor Author: Please refer to dashboard 2, which shows that the external API failure message was processed successfully, while the remaining 10 were retried 2 more times and, once their receive count reached 3, were moved to the DLQ.


Additionally, confirm the messages in the DLQ

Contributor: Messages won't show on the screen until you poll for them. I see it is highlighted in the snapshot; it is worth mentioning.

Contributor Author: Noted

Contributor Author: Added

<img src="./resources/DLQ-in-messaging.png" alt="Architecture" width="100%"/>

Note: Refer to the Monitoring Guide to locate the “SQS-Processing-Dashboard”
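
The load testing script itself is not part of this README. As a rough sketch of the setup such a script might perform, the helpers below check that the three environment variables from step 1 are present and group message bodies into batches of at most 10 entries, which is the limit for a single SQS `SendMessageBatch` call. The function names and the `msg-N` ID scheme are assumptions made for illustration.

```javascript
const REQUIRED_VARS = ['QUEUE_URL', 'DLQ_URL', 'AWS_REGION'];

// Reads the exported variables and fails fast if any is missing or empty.
function readConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name] || env[name].trim() === '');
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return { queueUrl: env.QUEUE_URL, dlqUrl: env.DLQ_URL, region: env.AWS_REGION };
}

// Splits message bodies into groups sized for SendMessageBatch.
// Each entry needs an Id that is unique within its batch.
function toBatchEntries(bodies, batchSize = 10) {
  const batches = [];
  for (let i = 0; i < bodies.length; i += batchSize) {
    const entries = bodies.slice(i, i + batchSize).map((body, offset) => ({
      Id: `msg-${i + offset}`,
      MessageBody: JSON.stringify(body),
    }));
    batches.push(entries);
  }
  return batches;
}
```

A script built on these would call `readConfig()` first, print the resolved values, and then send each batch returned by `toBatchEntries` to the source queue.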

## Monitoring Guide

Locating Resources

```
1. Navigate to AWS CloudFormation Console
2. Select the stack "LambdaSqsBestPracticesCdkStack"
3. Go to the "Resources" tab
4. Here you can find:
- All resources created by the stack
- Direct links to each resource's console
- Resource physical IDs and types
- Current status of each resource
```

CloudWatch Logs

```
1. Navigate to CloudWatch Console > Log Groups
2. Find /aws/lambda/BatchProcessingLambdaFunction
3. View structured logs with:
* Batch processing information
* Error details
```
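
For orientation, a structured log entry is simply one JSON object per line, which makes fields such as the message ID searchable in CloudWatch Logs. The sketch below is a plain JavaScript stand-in and not the Powertools Logger API; the service name and field names are hypothetical.

```javascript
// Builds one JSON log line with a fixed set of top-level fields plus any extras.
function formatLog(level, message, fields = {}, now = new Date()) {
  return JSON.stringify({
    level,
    message,
    timestamp: now.toISOString(),
    service: 'sqs-batch-processor', // hypothetical service name
    ...fields,
  });
}

console.log(formatLog('INFO', 'Batch received', { batchSize: 10 }));
console.log(formatLog('ERROR', 'Message failed validation', {
  messageId: 'example-message-id',
  receiveCount: 2,
}));
```

Because every entry carries the same keys, a single message can be followed across its original invocation and its retries by filtering on `messageId`.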

Example deep-dive walkthrough of structured logging for a batch:

Contributor: Is there a way you can show the life cycle of a message by taking a messageID, and then follow it all the way to the trace?

Contributor Author: I have highlighted the message ID and the type of retry. The external API failure was processed successfully on retry, whereas the invalid message (poison pill) is sent to the DLQ. I will try to include the complete retry cycle (all 3 invokes: the original and 2 retries) and the message in the DLQ for the poison pill message.

Contributor Author: I have shown this for a poison pill record: the initial invoke, retry 1, retry 2, and the message sent to the DLQ.


1. Batch information before starting processing
<img src="./resources/batch-info.png" alt="Architecture" width="100%"/>

2. Success information
<img src="./resources/Success-info.png" alt="Architecture" width="100%"/>

3. Error information of failure
<img src="./resources/Error-info.png" alt="Architecture" width="100%"/>

4. Batch processing info
<img src="./resources/Batch-processing-info.png" alt="Architecture" width="100%"/>

5. Failed items returned to the queue for reprocessing
<img src="./resources/Failed-items.png" alt="Architecture" width="100%"/>

6. Failed item retried [note the messageID and the retry time]
<img src="./resources/Failed-item-retry.png" alt="Architecture" width="100%"/>

7. Additionally, in case of failed retries/poison pill
<img src="./resources/poison-pill.png" alt="Architecture" width="100%"/>

Message is moved to the DLQ
<img src="./resources/message-in-DLQ.png" alt="Architecture" width="100%"/>

Custom tracing can be used as well to get quick information on batch processing
<img src="./resources/trace-info.png" alt="Architecture" width="100%"/>


Metrics Dashboard

```
1. Go to CloudWatch > Dashboards
2. Find the dashboard “SQS-Processing-Dashboard”
3. Monitor:
* Message processing success rate
* Batch size and processing time
* Error rates
* Queue metrics: source queue depth, message processing speed, and DLQ message count
* Lambda performance, including duration
```
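
Custom metrics like the ones on this dashboard can reach CloudWatch through the Embedded Metric Format (EMF), where a specially shaped JSON log line is turned into metrics. The sketch below builds such a record by hand, assuming EMF is the transport; the namespace, service name, and metric names are hypothetical and not taken from this pattern's code.

```javascript
// Builds a CloudWatch Embedded Metric Format (EMF) record.
// `metrics` is a list of { name, unit, value } objects.
function buildEmfRecord(namespace, service, metrics, timestamp = Date.now()) {
  const record = {
    _aws: {
      Timestamp: timestamp, // milliseconds since the epoch
      CloudWatchMetrics: [
        {
          Namespace: namespace,
          Dimensions: [['service']],
          Metrics: metrics.map(({ name, unit }) => ({ Name: name, Unit: unit })),
        },
      ],
    },
    service, // value of the "service" dimension
  };
  // Metric values live at the top level, keyed by metric name.
  for (const { name, value } of metrics) {
    record[name] = value;
  }
  return record;
}

// Printing the record to stdout from a Lambda function is enough for
// CloudWatch to extract the metrics.
console.log(JSON.stringify(buildEmfRecord('SqsProcessing', 'batch-processor', [
  { name: 'ProcessedMessages', unit: 'Count', value: 9 },
  { name: 'FailedMessages', unit: 'Count', value: 1 },
  { name: 'BatchDuration', unit: 'Milliseconds', value: 120 },
])));
```

A success rate widget can then be derived on the dashboard from the processed and failed counts rather than being emitted as its own metric.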


<img src="./resources/SQS_operational_dashboard.png" alt="Architecture" width="100%"/>

## Cleanup

To remove all deployed resources:

```
cdk destroy
```

@@ -0,0 +1,6 @@
#!/usr/bin/env node
const cdk = require('aws-cdk-lib');
const { LambdaSqsBestPracticesCdkStack } = require('../lib/lambda-sqs-best-practices-cdk-stack');

const app = new cdk.App();
new LambdaSqsBestPracticesCdkStack(app, 'LambdaSqsBestPracticesCdkStack', {});
88 changes: 88 additions & 0 deletions lambda-sqs-best-practices-cdk/cdk.json
@@ -0,0 +1,88 @@
{
"app": "node bin/lambda-sqs-best-practices-cdk.js",
"watch": {
"include": [
"**"
],
"exclude": [
"README.md",
"cdk*.json",
"jest.config.js",
"package*.json",
"yarn.lock",
"node_modules",
"test"
]
},
"context": {
"@aws-cdk/aws-lambda:recognizeLayerVersion": true,
"@aws-cdk/core:checkSecretUsage": true,
"@aws-cdk/core:target-partitions": [
"aws",
"aws-cn"
],
"@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true,
"@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true,
"@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true,
"@aws-cdk/aws-iam:minimizePolicies": true,
"@aws-cdk/core:validateSnapshotRemovalPolicy": true,
"@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true,
"@aws-cdk/aws-s3:createDefaultLoggingPolicy": true,
"@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true,
"@aws-cdk/aws-apigateway:disableCloudWatchRole": true,
"@aws-cdk/core:enablePartitionLiterals": true,
"@aws-cdk/aws-events:eventsTargetQueueSameAccount": true,
"@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true,
"@aws-cdk/aws-iam:importedRoleStackSafeDefaultPolicyName": true,
"@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true,
"@aws-cdk/aws-route53-patters:useCertificate": true,
"@aws-cdk/customresources:installLatestAwsSdkDefault": false,
"@aws-cdk/aws-rds:databaseProxyUniqueResourceName": true,
"@aws-cdk/aws-codedeploy:removeAlarmsFromDeploymentGroup": true,
"@aws-cdk/aws-apigateway:authorizerChangeDeploymentLogicalId": true,
"@aws-cdk/aws-ec2:launchTemplateDefaultUserData": true,
"@aws-cdk/aws-secretsmanager:useAttachedSecretResourcePolicyForSecretTargetAttachments": true,
"@aws-cdk/aws-redshift:columnId": true,
"@aws-cdk/aws-stepfunctions-tasks:enableEmrServicePolicyV2": true,
"@aws-cdk/aws-ec2:restrictDefaultSecurityGroup": true,
"@aws-cdk/aws-apigateway:requestValidatorUniqueId": true,
"@aws-cdk/aws-kms:aliasNameRef": true,
"@aws-cdk/aws-autoscaling:generateLaunchTemplateInsteadOfLaunchConfig": true,
"@aws-cdk/core:includePrefixInUniqueNameGeneration": true,
"@aws-cdk/aws-efs:denyAnonymousAccess": true,
"@aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby": true,
"@aws-cdk/aws-lambda-nodejs:useLatestRuntimeVersion": true,
"@aws-cdk/aws-efs:mountTargetOrderInsensitiveLogicalId": true,
"@aws-cdk/aws-rds:auroraClusterChangeScopeOfInstanceParameterGroupWithEachParameters": true,
"@aws-cdk/aws-appsync:useArnForSourceApiAssociationIdentifier": true,
"@aws-cdk/aws-rds:preventRenderingDeprecatedCredentials": true,
"@aws-cdk/aws-codepipeline-actions:useNewDefaultBranchForCodeCommitSource": true,
"@aws-cdk/aws-cloudwatch-actions:changeLambdaPermissionLogicalIdForLambdaAction": true,
"@aws-cdk/aws-codepipeline:crossAccountKeysDefaultValueToFalse": true,
"@aws-cdk/aws-codepipeline:defaultPipelineTypeToV2": true,
"@aws-cdk/aws-kms:reduceCrossAccountRegionPolicyScope": true,
"@aws-cdk/aws-eks:nodegroupNameAttribute": true,
"@aws-cdk/aws-ec2:ebsDefaultGp3Volume": true,
"@aws-cdk/aws-ecs:removeDefaultDeploymentAlarm": true,
"@aws-cdk/custom-resources:logApiResponseDataPropertyTrueDefault": false,
"@aws-cdk/aws-s3:keepNotificationInImportedBucket": false,
"@aws-cdk/aws-ecs:enableImdsBlockingDeprecatedFeature": false,
"@aws-cdk/aws-ecs:disableEcsImdsBlocking": true,
"@aws-cdk/aws-ecs:reduceEc2FargateCloudWatchPermissions": true,
"@aws-cdk/aws-dynamodb:resourcePolicyPerReplica": true,
"@aws-cdk/aws-ec2:ec2SumTImeoutEnabled": true,
"@aws-cdk/aws-appsync:appSyncGraphQLAPIScopeLambdaPermission": true,
"@aws-cdk/aws-rds:setCorrectValueForDatabaseInstanceReadReplicaInstanceResourceId": true,
"@aws-cdk/core:cfnIncludeRejectComplexResourceUpdateCreatePolicyIntrinsics": true,
"@aws-cdk/aws-lambda-nodejs:sdkV3ExcludeSmithyPackages": true,
"@aws-cdk/aws-stepfunctions-tasks:fixRunEcsTaskPolicy": true,
"@aws-cdk/aws-ec2:bastionHostUseAmazonLinux2023ByDefault": true,
"@aws-cdk/aws-route53-targets:userPoolDomainNameMethodWithoutCustomResource": true,
"@aws-cdk/aws-elasticloadbalancingV2:albDualstackWithoutPublicIpv4SecurityGroupRulesDefault": true,
"@aws-cdk/aws-iam:oidcRejectUnauthorizedConnections": true,
"@aws-cdk/core:enableAdditionalMetadataCollection": true,
"@aws-cdk/aws-lambda:createNewPoliciesWithAddToRolePolicy": true,
"@aws-cdk/aws-s3:setUniqueReplicationRoleName": true,
"@aws-cdk/aws-events:requireEventBusPolicySid": true
}
}