Cloud Formation Stack Glitch Vulnerability Report #8558
Sidzeppelin95
started this conversation in
Architechtures
I encountered a severe vulnerability while deploying a SAM template from the AWS quick start templates. While looking for the right template for the application I was working on at the time, I found an error in the 9th template, the Lambda Response Streaming template. When I ran the sam build command in CloudShell, one of the blocks in the CloudFormation stack showed an error. I don't remember the exact message now, but it was a code injection (CWE-94) type of vulnerability/warning: the template could still work, but it could have been compromised.
a. High Level Findings Breakdown: The error had to do with stack resource allocation: a glitch caused the particular block to be pushed behind or ahead of its place in the stack, or not retrieved from the source at all. I recreated the same stack several times with different options, starting from sam init, to look for changes in the error, and I found some technical oddities that gave me an idea of what the error could be.
Proof of concept -
The error can be described as a FIFO/LIFO stack insertion warning/vulnerability in which the affected element does not read its resources, so that stack block does not generate the required output. Errors and warnings were displayed in the resource type column, pointing to an issue with the Lambda streaming function. It raised InvalidSamTemplateException: because of some glitch in the backend source, the template could not be transformed into a standard CloudFormation template. It can also be termed SERVICE_ERROR and LOCAL_SERVICE_ERROR. The source of the error is the stack resource not being properly updated, i.e. the data streaming function does not meet the functionality required to generate the stack resource, hence it does not publish the required output in the affected area.
Scope –
Experimenting with and resolving the faulty stack block in the template, then publishing the template.
b. Risk & Growth Analysis:
Severity of Vulnerability: CVSS Rating 7.4
Report ID and title: Green, Pending Program Review
There are several risks that could have been exploited and spread across the cloud resources had this not been prioritized. The following factors determine the severity of this issue and define the risk.
Unpatched Stack Block: If this overlooked stack block is not found and fixed, it could allow hackers to steal the template's resource base.
Data Theft during template use: If a developer at a large company starts building an application from this template and a hacker notices the activity through this loophole, the hacker could potentially take their data.
Corruption of Resources: Hackers could also corrupt adjacent stack resources by gaining access through this block or by bypassing the security algorithm in the process.
Manipulation of Network and Resources: Hackers could gain access to the resource ARN or URL and harm the network and resource availability of the application in use. This could also let them disrupt the network IP address and/or protocols, and delete content from or insert malicious content into resources crucial to the application.
Credential Theft and Impersonation: Hackers could conceal their presence within the block, gradually acquire user credentials, and misuse them for personal gain or to threaten the network's security.
Exploitation of Resources and Credentials: Furthermore, hackers could exploit the resources in the stack and other user credentials to send threats to the company or individual involved in that particular project.
Generating Replica of the template resources: Lastly, if the problem persisted or was overlooked for a period of time, hackers could generate a replica of the template and offer it as their own in the quick start templates menu. This would be a serious disaster if some unsuspecting user deployed it: the whole application would become a liability, and other cloud services integrated with it would also become vulnerable to this attack.
In summary, these vulnerabilities pose significant risks, allowing hackers to gain unauthorized access, steal sensitive data, corrupt resources, and impersonate users, potentially causing severe harm to the affected systems and organizations.
c. References: Here’s a screenshot of the described template build stack which shows all in green now.
Engagement Background: When I started experimenting to find out where the error was coming from and what exactly it was, I discovered the AWS code space, where I could read the template code in Python for my own understanding.
a. Methodology: After repeatedly reproducing the error, I got to the root of the problem. I kept changing the steps that come before building the application to see whether that changed the error, which it did. That let me point out where and how the error could be arising, so I looked in the code where the observed behavior was coming from and started resolving it. I did not raise an issue at the time because I found the problem interesting enough to continue until it was resolved. The overall methodology was a top-down approach, so that I could filter out the inner workings responsible for this error, find the correct way to resolve it, and publish the correct message.
b. Classification & Severity: There was a missing argument in the code and a missing special character, which I added once I got access to the code inside the AWS environment using CodeCommit and CodePipeline. The best way to classify the error would be as a syntax, runtime and logical error: part of the code (an argument) was missing, and a character ('&') was also missing from a particular area in the code. This would go unnoticed in many cases; it took some time to figure out and add the & to fix the issue. The CVSS rating for this vulnerability is 7.4, which falls in the High band (7.0 to 8.9) of the CVSS v3 scale; as per my understanding I would still treat this multi-faceted issue as critical. The criticality arises from the potential consequences: a developer might overlook the warning, deploy the template, and assign other essential resources to the flawed area, which could lead to operational failures, data corruption, or security breaches.
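For reference, the score-to-rating mapping can be sketched in a few lines. The bands below are the qualitative severity scale from the CVSS v3.x specification; the function name `cvss_severity` is only an illustrative helper and is not part of any AWS tooling.

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"


# The score quoted in this report
print(cvss_severity(7.4))
```

On this scale only scores of 9.0 and above are rated Critical.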
Findings: While fixing this error, I saw an interesting result when I completed the first iteration: the red in the error box turned yellow. This was done by adding a missing argument in the template code. It was a recursive action with primary resource allocation to the stack ID/block that was missing from the stack. The stack resources are generated by the SAM translator, which converts the models or events into readable and writable resource types. The function's argument had a parameter that was meant to take the input data object through to the CloudFormation template block without any objections, but it was stuck. As I observed, a pointer had been defined that needs to take input data from the streaming resource in the code. This should match as per the code, but with the argument missing the pointer did not know which resource to stream the data from, hence the error. So the path of the data source to be streamed had to be cleared by initializing the parameter and streaming this value as a resource into the cloud environment.
a. Findings Overview: The allocated resource was a data streaming type stack block that was initialized but not fulfilled in the block. It had an argument issue and a character glitch that kept occurring unnoticed on the server backend. Because of this error, everything generated from the stack element (strings, variables and data types) had a corrupted element. For example, the CodeUri generated from the template YAML/JSON file would differ between a template with a complete stack and one with a missing element; or, put differently, the element is there but the serverless model is not able to fetch it from the source.
b. Vulnerability Details: The vulnerability can be defined as a memory address or resource allocated in a stack that is present but unreadable or flawed when the stack is called. The memory address takes the form of a Lambda function that streams data relevant to the template and sends it to that stack block for further use. Since there is an issue with this particular stack block, it can become a victim of stack smashing: an untrusted user can corrupt the stack in such a way as to inject executable code into the running program and take control of the process. In addition, the vulnerability may be reintroduced on subsequent updates to the application, since the development team will continue to use the vulnerable package. This could harm the configurability of the deployed application, the Dockerfile and other dependencies the application uses. SAM CLI relies on infrastructure as code, so any discrepancy in the CloudFormation stack could potentially disrupt the whole infrastructure the application stands on. This can also be termed a buffer or stack overflow type vulnerability, but as I kept digging deeper I found another glitch (a syntax error). This was probably a developer oversight: a simple syntax slip, or forgetting to add the character & before some string literal or variable.
Problem Resolution: After carefully analyzing the problem and understanding its inner mechanics, I tried validating the template using aws cloudformation validate-template --template-body file://filename.json. That surfaced some errors in the template code, so I tried validating the logical IDs and parameters. In the process I realized that the resource's source wasn't actually reaching the primary stack through the layer in the template. This is where I started fixing the problem by going deep into the code, following the steps below.
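As a rough illustration of the validation step, here is a minimal Python sketch that runs a local structural pre-check on a JSON template and then builds the same validate-template command quoted above. The helper names (`precheck_template`, `validate_command`) and the sample template are hypothetical; the pre-check only looks at basic structure and is not a substitute for the real AWS validation.

```python
import json
from typing import List


def precheck_template(template_text: str) -> List[str]:
    """Return a list of basic structural problems found in a JSON template."""
    try:
        template = json.loads(template_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(template, dict):
        return ["template root must be a JSON object"]
    resources = template.get("Resources")
    if not isinstance(resources, dict) or not resources:
        return ["template has no Resources section"]
    problems = []
    for logical_id, resource in resources.items():
        # Every resource needs a string Type such as AWS::Lambda::Function
        if not isinstance(resource, dict) or not isinstance(resource.get("Type"), str):
            problems.append(f"{logical_id}: missing or non-string Type")
    return problems


def validate_command(path: str) -> List[str]:
    """Build the AWS CLI call used in this report (argument list for subprocess)."""
    return ["aws", "cloudformation", "validate-template",
            "--template-body", f"file://{path}"]


# Hypothetical template with one well-formed and one broken resource
sample = '{"Resources": {"MyFunction": {"Type": "AWS::Lambda::Function"}, "Broken": {}}}'
print(precheck_template(sample))
print(" ".join(validate_command("filename.json")))
```

Running the pre-check first keeps obvious structural mistakes from reaching the AWS call; the command list can then be passed to `subprocess.run` in an environment where the AWS CLI is configured.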
a. Accessing the AWS Code Space: By looking at the code for the Lambda Response Streaming template, I gained a deeper understanding of the inner workings and of how the stack resource was being deployed in the stack next to it. I copied the code into the compiler to start debugging the exact area to be worked on. Obviously the code wasn't going to produce any result locally, since the cloud environment and server are not available on my machine, but I had to look carefully through the place where the error was arising and try to compile that part of the code somehow. First I identified the faulty area by observation, then ran it through vscode.dev to debug the issue.
b. Steps Induced: To check whether I could debug the template from CloudShell itself, I looked up some commands to get more clarity about the error and resolve it. I found the troubleshooting guide and followed its steps while debugging.
i. I used the AWS CloudFormation linter command to validate the template. The console displayed the following errors: unresolved resource dependencies, resource does not exist, and every condition must be a string. I worked on the problem until these errors were resolved.
ii. To proceed with troubleshooting, I opened the CloudFormation console to learn more about the error, and what I found was consistent with my observations. Now I could give a logical explanation for the faulty stack block.
iii. Then I related this information to the template code to find out whether the stack parameters or variables for the resource entry/exit point (data streaming) had issues with the CloudFormation stack. I found that the stack element was not empty: it was returning something whose value the interpreter could not read, convert or reconstruct into the required data stream with its associated resource and logical IDs (having defined variables and parameters). Now I had to examine the code to determine the error and make sure that the assigned cloud resource was being read and written into the application template being deployed. The following resource types were being deployed in the application.
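================================== ========================================
CloudFormation Resource Type       Logical ID
================================== ========================================
AWS::ApiGatewayV2::Api             ServerlessHttpApi
AWS::ApiGatewayV2::Stage           ServerlessHttpApiApiGatewayDefaultStage
AWS::Lambda::Permission            MyFunctionThumbnailApiPermission
================================== ========================================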
ServerlessHttpApi* resources are generated once per stack. Note that one parent stack may contain multiple classes and objects stacked together, working as a function in the required stream.
iv. By examining the code I could pinpoint the area where the problem was arising, so I changed one of the arguments for the auto-scaling signal, the one for the input/output data buffer. This fixed part of the problem, but the error persisted in another form, which was confusing. I went back to re-examining the code after a stack update in which some new features completed rollback and others failed. After executing the stack resource into the template to generate the required resource output, I got the following result, obtained using the get method and publish to display the output. It shows the stack block being updated.
================================== ================================
CloudFormation Resource Type       Logical ID
================================== ================================
AWS::Lambda::Function              MyFunction
AWS::IAM::Role                     MyFunctionRole
================================== ================================
Inside the stack element, the following function modes were stacked together inside a block. They are API, Function, SimpleTable, Application, LambdaLayerVersion, HttpApi and StateMachine.
v. For further interpretation, I checked the CloudWatch trails for the data logs and error blocks to find out exactly what was going on in the backend of the resource function in question. At this point I was sure about replacing the CloudFormation resource with a new, updated stack ID. So I created a new environment variable for the Python runtime environment that could capture the exact resource ID with the associated code, to transfer the needed data streaming functions into the function ARN of the template. Moreover, the Lambda function that was created had an event tying its streaming functionality to an API Gateway, so these additional resources had to be generated as well.
================================== ================================
CloudFormation Resource Type       Logical ID
================================== ================================
AWS::ApiGateway::RestApi           ServerlessRestApi
AWS::ApiGateway::Stage             ServerlessRestApiProdStage
AWS::ApiGateway::Deployment        ServerlessRestApiDeploymentSHA (10 Digits of SHA256 of Swagger)
AWS::Lambda::Permission            MyFunctionThumbnailApiPermissionProd (Prod is the default Stage Name for implicit APIs)
================================== ================================
vi. After updating the stack resource, check the communication signal from the server to the cloud console using cfn-signal. This shows whether the stack has been updated from the backend and can connect to the cloud network without interruptions.
vii. Next I had to set up a new stack policy to parse the data in the updated stack and protect its content with a rollback configuration. I also had to check the stack status code and verify it against the correct logical and resource IDs that need to be entered and displayed. Once I got into the troubleshooting drill, I ran the diagnosis back and forth to get the desired output. Everything was stabilizing, but I still couldn't quite resolve one not-so-typical issue.
viii. In the CloudFormation template I had to check the AWS-specific parameter properties for each assigned resource that the log files and error type pointed to. Looking at the code space for the template, I could not figure out the problem at first, but I kept trying to execute the command until the code compiled successfully.
ix. After a few iterations, I spotted a simple error: the resource's entry and exit points were stuck at the parameter function assigned to them. Where it was declared there was a single & before it, but it required && for streaming data to and fro. I changed the code, tried again, and it worked.
x. The parameter that carried this resource from the cloud code space to the CloudFormation stack was unable to return to its original path after processing some data values or strings (whatever the resource contained), and required another declaration to stabilize the parameter and its associated resource.
xi. This was a tedious process that took some time and brainstorming to solve. I updated the stack resource ID and stack policy to make the CloudFormation template live in the cloud space. Then I built and deployed the template again to check whether it had started working as it should.
update_policy.py
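from collections import namedtuple

# Fields of the CodeDeploy update policy applied to a Lambda alias
CodeDeployLambdaAliasUpdate = namedtuple(
    "CodeDeployLambdaAliasUpdate",
    ["ApplicationName", "DeploymentGroupName", "BeforeAllowTrafficHook", "AfterAllowTrafficHook"],
)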
xii. This generated a new template YAML/JSON file with a new CodeUri assigned to it, due to the changes in the template's stack ID and policy.
xiii. Next I had to test and validate the integrated package module, so I built a default application from the template to see whether the template could communicate with the resources to be used as services by the user.
xiv. Next I generated the configuration file in the virtual environment for testing, ran pytest --cov samcli --cov to check code coverage, and when the tests passed I used echo "path of config file" in the development environment.
xv. I checked the telemetry status and ran the functional tests and linter commands for static and dynamic analysis. All of this was updated using CodeCommit and CodePipeline in AWS.
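The "unresolved resource dependencies" and "resource does not exist" errors from step i can be approximated locally. The sketch below is a hypothetical helper, not the linter's actual implementation: it walks a template dictionary and reports every Ref, Fn::GetAtt or DependsOn target that is not declared under Resources or Parameters. The sample template is invented for illustration. On an untransformed SAM template it would also flag implicitly generated resources, so it is best run on plain or already transformed CloudFormation.

```python
from typing import List


def find_unresolved(template: dict) -> List[str]:
    """List Ref / Fn::GetAtt / DependsOn targets that the template never declares."""
    declared = set(template.get("Resources", {})) | set(template.get("Parameters", {}))
    missing = set()

    def check(name):
        # Pseudo parameters such as AWS::Region are always available
        if isinstance(name, str) and not name.startswith("AWS::") and name not in declared:
            missing.add(name)

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "Ref":
                    check(value)
                elif key == "Fn::GetAtt":
                    # Accepts both ["Logical", "Attr"] and "Logical.Attr" forms
                    if isinstance(value, list) and value:
                        check(value[0])
                    elif isinstance(value, str):
                        check(value.split(".")[0])
                elif key == "DependsOn":
                    for target in ([value] if isinstance(value, str) else value):
                        check(target)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(template.get("Resources", {}))
    return sorted(missing)


# Hypothetical template: the role referenced by the function is never declared
sample_template = {
    "Resources": {
        "MyFunction": {
            "Type": "AWS::Lambda::Function",
            "Properties": {"Role": {"Fn::GetAtt": ["MyFunctionRole", "Arn"]}},
        },
        "MyFunctionUrl": {
            "Type": "AWS::Lambda::Url",
            "DependsOn": "MyFunction",
            "Properties": {"TargetFunctionArn": {"Ref": "MyFunction"}},
        },
    }
}
print(find_unresolved(sample_template))
```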
These were the steps I followed to resolve this problem, so now I had to change some files.
I submitted this report to AWS Security more than two years ago and got mixed signals from them, including that one of the resources belonged to the Amazon retail website, and more. Even after all that, I didn't get any recognition for it in any form. I also submitted a report to the Trust and Safety team for abuse but still haven't received any response in over 3 months. Kindly suggest.
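The logical IDs listed in the resource tables of steps iii to v appear to be plain concatenations: the function's logical ID, then the event's logical ID, then a fixed suffix. The sketch below reproduces that pattern under this assumption. `generated_logical_ids` is a hypothetical helper written for this report, not a SAM translator API, and it does not cover the deployment ID, which carries a hash suffix.

```python
from typing import Dict, Optional


def generated_logical_ids(function_id: str,
                          event_id: Optional[str] = None,
                          stage: Optional[str] = None) -> Dict[str, str]:
    """Derive the logical IDs that appear in the resource tables of this report."""
    ids = {
        "AWS::Lambda::Function": function_id,
        "AWS::IAM::Role": f"{function_id}Role",
    }
    if event_id:
        # An implicit REST API appends the stage name (Prod by default per the table)
        ids["AWS::Lambda::Permission"] = f"{function_id}{event_id}Permission{stage or ''}"
    return ids


# Names taken from the tables: MyFunction with a ThumbnailApi event on the Prod stage
for resource_type, logical_id in generated_logical_ids("MyFunction", "ThumbnailApi", "Prod").items():
    print(resource_type, logical_id)
```

Comparing the IDs derived this way against the ones shown in the CloudFormation console is a quick way to spot a resource that was expected but never generated.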