Add HTTP 413 response when incoming request is too large #19212
synapse/http/site.py
Outdated
    if len(content_length_headers) != 1:
        logger.warning(
            "Dropping request from %s because it contains multiple Content-Length headers: %s %s",
            self.client,
            command.decode("ascii", errors="replace"),
            self.get_redacted_uri(),
        )
        self.loseConnection()
        return
Explain why we do this.
In the SBG we say this type of thing:
element-hq/sbg -> crates/matrix-sbg-module-oauth-par/src/lib.rs#L597-L624
// If there are 0 or multiple `client_id` query parameters, return
// an error. We don't want to even try to pick the right one if
// there are multiple as we could run into problems similar to
// request smuggling vulnerabilities which rely on the mismatch of
// how different systems interpret information.
        # we should get a 415
        self.assertRegex(transport.value().decode(), r"^HTTP/1\.1 415 ")

    def test_content_length_too_large(self) -> None:
(test hasn't been added yet)
synapse/http/site.py
Outdated
            command.decode("ascii", errors="replace"),
            self.get_redacted_uri(),
        )
        self.loseConnection()
I think it would be nice to respond with an actual HTTP status code and a good error explaining why.
In the case of multiple Content-Length headers -> 400 Bad Request
In the case of no Content-Length header -> 411 Length Required
Yeah, in the case of multiple headers we can do that.
In the case of no header, we can't jump to that conclusion so easily. It depends on the HTTP version: Content-Length is required in HTTP/1.0, not required in HTTP/1.1 if there is a Transfer-Encoding: chunked header, and not required in HTTP/2.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Length
We could bake in all these additional checks if you think it doesn't blow this PR up. I'm not sure if twisted already handles these cases or not.
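A rough sketch of what those additional checks could look like, covering only the cases listed above. The function name, its parameters, and the `expects_body` flag are hypothetical and not part of this PR or of Twisted; whether Twisted already enforces any of this is not settled here.

```python
from typing import Optional, Sequence


def status_for_missing_or_duplicate_length(
    http_version: str,
    content_length_values: Sequence[bytes],
    transfer_encoding_values: Sequence[bytes] = (),
    expects_body: bool = True,
) -> Optional[int]:
    """Return an HTTP error status to respond with, or None to let the request through.

    Hypothetical helper: only models the cases discussed in this thread.
    """
    # Multiple Content-Length headers are ambiguous, so reject them outright.
    if len(content_length_values) > 1:
        return 400  # Bad Request

    # Exactly one header: parsing the value is out of scope for this sketch.
    if len(content_length_values) == 1:
        return None

    # No Content-Length from here on.
    if not expects_body:
        # Assumption: requests without a body (e.g. GET) need no length.
        return None
    if http_version == "HTTP/2":
        # HTTP/2 frames carry their own length information.
        return None
    chunked = any(b"chunked" in value.lower() for value in transfer_encoding_values)
    if http_version == "HTTP/1.1" and chunked:
        return None
    return 411  # Length Required
```

With this shape, the caller would only need to map a non-None result onto a response before closing the connection.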
synapse/http/site.py
Outdated
    content_length_headers = self.requestHeaders.getRawHeaders(b"Content-Length")
    if content_length_headers is not None:
        if len(content_length_headers) != 1:
            logger.warning(
                "Dropping request from %s because it contains multiple Content-Length headers: %s %s",
                self.client,
                command.decode("ascii", errors="replace"),
                self.get_redacted_uri(),
            )
            self.loseConnection()
            return

        try:
            content_length = int(content_length_headers[0])
It would be nicer if this kind of thing was a bit more contained in a block instead of separately validating and then doing a cheeky lookup again.
As an example of how we do this in Rust:
element-hq/sbg -> crates/matrix-sbg-module-oauth-par/src/lib.rs#L597-L624
It could be pulled out into a nested function here 🤷 - Since we should be using this kind of pattern wherever we're looking at headers, it could also be a nice helper utility which raises a certain exception type which can be caught here and we react as we do now here.
I pulled it out into a helper function. Let me know if that's what you had in mind.
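For readers following along, here is a minimal sketch of the "validate and parse in one place, raise a dedicated exception" pattern suggested above. It is not the helper from this PR: `InvalidHeaderError` and `parse_content_length` are hypothetical names, and the input mirrors what `getRawHeaders` returns in the snippet under review (a list of raw values, or None when the header is absent).

```python
from typing import Optional, Sequence


class InvalidHeaderError(Exception):
    """Hypothetical exception carrying the HTTP status the caller should send."""

    def __init__(self, status: int, reason: str) -> None:
        super().__init__(reason)
        self.status = status
        self.reason = reason


def parse_content_length(raw_values: Optional[Sequence[bytes]]) -> Optional[int]:
    """Return the Content-Length as an int, or None if the header is absent.

    Raises InvalidHeaderError instead of guessing when the header is
    duplicated or is not a plain non-negative integer.
    """
    if raw_values is None:
        return None
    if len(raw_values) != 1:
        # Never pick one of several values: different systems may pick
        # differently, which is how request smuggling mismatches arise.
        raise InvalidHeaderError(400, "Multiple Content-Length headers")
    value = raw_values[0].strip()
    if not value.isdigit():
        raise InvalidHeaderError(400, "Content-Length is not a non-negative integer")
    return int(value)
```

The caller can then catch `InvalidHeaderError` in one place, log, and respond or drop the connection as it does today.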
        # we should get a 415
        self.assertRegex(transport.value().decode(), r"^HTTP/1\.1 415 ")

    def test_content_length_too_large(self) -> None:
I did a bunch of testing and Synapse (Twisted) handles this by truncating the HTTP request at the size specified by the Content-Length. So the request will go through to the server correctly but the body will be silently truncated. The rest of the bytes are dropped.
Perhaps the test could send a request with a basic JSON body that gets cut off, and expect M_NOT_JSON 🤷
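To illustrate why a cut-off body would surface as a JSON error, here is a small standalone sketch. It does not involve Twisted or Synapse at all: it only models "keep the first Content-Length bytes" with a slice, and the example payload is made up.

```python
import json


def truncate_body(body: bytes, declared_length: int) -> bytes:
    """Model the observed behaviour: bytes past the declared length are dropped."""
    return body[:declared_length]


def is_valid_json(body: bytes) -> bool:
    """Return True if the bytes parse as JSON."""
    try:
        json.loads(body)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


full = json.dumps({"msgtype": "m.text", "body": "hello"}).encode("utf-8")
# Declare a Content-Length that is 5 bytes shorter than the real body.
truncated = truncate_body(full, len(full) - 5)

print(is_valid_json(full))
print(is_valid_json(truncated))
```

The truncated payload loses its closing quote and brace, so a server that parses it would be expected to reject it as not being JSON.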
    content_length,
    self.content.tell(),
    self.get_method(),
    self.get_redacted_uri(),
Check failure: Code scanning / CodeQL
Clear-text logging of sensitive information (High)
This expression logs sensitive data (password).
Copilot Autofix (AI, about 11 hours ago)
Sensitive information embedded in URIs—such as passwords, session IDs, OIDC tokens, "state", "code", or other authentication-related parameters—must be properly redacted before the URI is logged. The current redact_uri utility only removes access_token and client_secret query parameters. To comprehensively fix this, update the redact_uri function (in synapse/http/__init__.py) to also redact the following (at minimum):
- password
- state
- code
- sid
- any other parameters frequently used in OIDC or SSO flows
These new regex replacements should be written analogously to the existing ones for query parameters. Throughout the codebase, the only spot where URIs are logged relies on get_redacted_uri() which wraps redact_uri, so improving redact_uri applies the fix everywhere the logging is done.
Required changes:
- In synapse/http/__init__.py, update the redact_uri function to include patterns for the new sensitive parameters: password, state, code, and sid.
- No changes required in synapse/http/site.py, as all URI logging routes through get_redacted_uri().
- No changes required for imports or method signatures.
- No changes required for test files.
@@ -34,14 +34,23 @@
         super().__init__(504, msg)


-ACCESS_TOKEN_RE = re.compile(r"(\?.*access(_|%5[Ff])token=)[^&]*(.*)$")
-CLIENT_SECRET_RE = re.compile(r"(\?.*client(_|%5[Ff])secret=)[^&]*(.*)$")
+ACCESS_TOKEN_RE = re.compile(r"([?&]access(_|%5[Ff])token=)[^&]*")
+CLIENT_SECRET_RE = re.compile(r"([?&]client(_|%5[Ff])secret=)[^&]*")
+PASSWORD_RE = re.compile(r"([?&]password=)[^&]*")
+STATE_RE = re.compile(r"([?&]state=)[^&]*")
+CODE_RE = re.compile(r"([?&]code=)[^&]*")
+SID_RE = re.compile(r"([?&]sid=)[^&]*")


 def redact_uri(uri: str) -> str:
     """Strips sensitive information from the uri replaces with <redacted>"""
-    uri = ACCESS_TOKEN_RE.sub(r"\1<redacted>\3", uri)
-    return CLIENT_SECRET_RE.sub(r"\1<redacted>\3", uri)
+    # redact access_token, client_secret, password, state, code, sid
+    uri = ACCESS_TOKEN_RE.sub(r"\1<redacted>", uri)
+    uri = CLIENT_SECRET_RE.sub(r"\1<redacted>", uri)
+    uri = PASSWORD_RE.sub(r"\1<redacted>", uri)
+    uri = STATE_RE.sub(r"\1<redacted>", uri)
+    uri = CODE_RE.sub(r"\1<redacted>", uri)
+    uri = SID_RE.sub(r"\1<redacted>", uri)
+    return uri


 class QuieterFileBodyProducer(FileBodyProducer):
    content_length,
    self._max_request_body_size,
    self.get_method(),
    self.get_redacted_uri(),
Check failure: Code scanning / CodeQL
Clear-text logging of sensitive information (High)
This expression logs sensitive data (password).
Copilot Autofix (AI, about 11 hours ago)
To securely log URIs without exposing sensitive information, the redaction must target all standard authentication/secret parameters regardless of case or encoding, and also strip passwords or secrets from path segments (e.g., /user/<password>/something; this rarely occurs in well-designed APIs, but it's best to mitigate the risk).
Enhance the redact_uri() function in synapse/http/__init__.py to additionally scrub query parameters containing names such as password, token, session, sid, state, code, and make redaction case-insensitive and decoding-aware.
The fix is limited to the code snippets shown, as requested.
Required changes:
- In synapse/http/__init__.py, update the regular expressions in the redaction routine to:
  - Cover a wider variety of parameter names,
  - Be case-insensitive,
  - Properly handle percent-encoded or other encodings in query strings,
  - Possibly redact secrets in path segments (optional: see note below).
- Ensure get_redacted_uri in synapse/http/site.py is using the updated function.
Note:
Redacting values in path segments is not usually needed unless APIs route secrets in the path (which is uncommon for correct RESTful APIs). The code does not show explicit path secrets, but to be thorough, a regex can be added for path-based redaction if required.
@@ -34,14 +34,33 @@
         super().__init__(504, msg)


-ACCESS_TOKEN_RE = re.compile(r"(\?.*access(_|%5[Ff])token=)[^&]*(.*)$")
-CLIENT_SECRET_RE = re.compile(r"(\?.*client(_|%5[Ff])secret=)[^&]*(.*)$")
+SENSITIVE_PARAM_NAMES = [
+    "access_token",
+    "client_secret",
+    "password",
+    "session",
+    "sid",
+    "state",
+    "code",
+    "token",
+    # Add any other parameter names that may contain secrets
+]
+# Build regex to catch sensitive query params, handling percent-encoding and case-insensitivity
+# This pattern matches sensitive param names after '?' or '&', possibly percent-encoded
+SENSITIVE_PARAM_REGEXPS = [
+    re.compile(
+        r"([?&])(" + p.replace("_", r"(_|%5[fF])") + r")=([^&]*)",
+        re.IGNORECASE,
+    )
+    for p in SENSITIVE_PARAM_NAMES
+]


 def redact_uri(uri: str) -> str:
     """Strips sensitive information from the uri replaces with <redacted>"""
-    uri = ACCESS_TOKEN_RE.sub(r"\1<redacted>\3", uri)
-    return CLIENT_SECRET_RE.sub(r"\1<redacted>\3", uri)
+    # Iteratively apply all sensitive param redactions
+    for param_re in SENSITIVE_PARAM_REGEXPS:
+        uri = param_re.sub(r"\1\2=<redacted>", uri)
+    return uri


 class QuieterFileBodyProducer(FileBodyProducer):
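For reference, the behaviour of the two existing patterns (the lines both autofix suggestions would remove) can be checked in isolation. This sketch copies `ACCESS_TOKEN_RE`, `CLIENT_SECRET_RE` and `redact_uri` from those removed lines; the example URIs are made up for illustration.

```python
import re

# Copied from the existing (pre-autofix) lines shown in the diffs above.
ACCESS_TOKEN_RE = re.compile(r"(\?.*access(_|%5[Ff])token=)[^&]*(.*)$")
CLIENT_SECRET_RE = re.compile(r"(\?.*client(_|%5[Ff])secret=)[^&]*(.*)$")


def redact_uri(uri: str) -> str:
    """Strips sensitive information from the uri replaces with <redacted>"""
    # Group 1 is everything up to and including "access_token=",
    # group 3 is the rest of the query string after the secret value.
    uri = ACCESS_TOKEN_RE.sub(r"\1<redacted>\3", uri)
    return CLIENT_SECRET_RE.sub(r"\1<redacted>\3", uri)


# Hypothetical URIs, for illustration only.
print(redact_uri("/sync?access_token=abc&since=1"))
print(redact_uri("/token?client%5Fsecret=s3cret"))
print(redact_uri("/login?password=hunter2"))
```

The first two URIs have their secret values replaced, while the `password` parameter in the third passes through unchanged, which is the gap the alert points at.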
Related to #17035, when Synapse receives a request that is larger than the maximum size allowed, it aborts the connection without ever sending back an HTTP response.
I dug into our usage of Twisted and how best to report such an error, and this is what I came up with.
It would be ideal to be able to report the status from within handleContentChunk, but that is called too early in the Twisted HTTP handling code, before things have been set up enough to properly write a response.
I tested this change locally (with both the C-S and S-S APIs) and they now receive a 413 response in addition to the connection being closed.
Hopefully this will aid in being able to quickly detect when #17035 is occurring as the current situation makes it very hard to narrow things down to that specific issue without making a lot of assumptions.
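As a rough illustration of what a client would see on the wire, here is a standalone sketch that builds a minimal 413 response. This is not the code in this PR, which goes through Twisted's request machinery; `build_too_large_response` is a hypothetical name, and the reason phrase, the `M_TOO_LARGE` error code and the error message wording are assumptions.

```python
import json


def build_too_large_response(max_size: int) -> bytes:
    """Build a raw HTTP/1.1 413 response with a small JSON error body.

    Hypothetical helper for illustration only.
    """
    body = json.dumps(
        {
            "errcode": "M_TOO_LARGE",  # assumed error code
            "error": f"Request content is too large (>{max_size} bytes)",
        }
    ).encode("utf-8")
    head = (
        "HTTP/1.1 413 Content Too Large\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        # The connection is closed after responding, as described above.
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body
```

A test in the style of the existing 415 check could then match the status line with a pattern like `r"^HTTP/1\.1 413 "`.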