Member

@devonh devonh commented Nov 21, 2025

Related to #17035: when Synapse receives a request that is larger than the maximum allowed size, it aborts the connection without ever sending back an HTTP response.
I dug into our usage of Twisted and how best to report such an error, and this is what I came up with.

It would be ideal to report the status from within handleContentChunk, but that is called too early in the Twisted HTTP handling code, before enough has been set up to properly write a response.
I tested this change locally (with both the C-S and S-S APIs) and they now receive a 413 response in addition to the connection being closed.
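
As a rough illustration of the behaviour described above (not the actual change in this PR), the sketch below builds the raw bytes of a bodyless 413 response that also signals the connection will be closed. `build_error_response` and `respond_if_too_large` are hypothetical names; the real code goes through Twisted's request machinery rather than hand-written bytes.

```python
from http import HTTPStatus
from typing import Optional


def build_error_response(status: HTTPStatus) -> bytes:
    """Serialise a bodyless HTTP/1.1 error response (hypothetical helper)."""
    lines = [
        f"HTTP/1.1 {status.value} {status.phrase}",
        "Content-Length: 0",
        # Tell the peer that the connection is going away after this response.
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def respond_if_too_large(
    content_length: int, max_request_body_size: int
) -> Optional[bytes]:
    """Return a 413 response if the declared length exceeds the limit."""
    if content_length > max_request_body_size:
        return build_error_response(HTTPStatus.REQUEST_ENTITY_TOO_LARGE)
    return None
```

A client that previously only saw a dropped connection would, under this sketch, first read a status line starting with `HTTP/1.1 413 `.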

Hopefully this makes it quicker to detect when #17035 is occurring; as things stand, it is very hard to narrow a failure down to that specific issue without making a lot of assumptions.

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

@devonh devonh requested a review from a team as a code owner November 21, 2025 17:14
@MadLittleMods MadLittleMods added the A-Federation and A-to-device-messages (EDU messages sent exactly once to a specific set of devices. Related to E2EE) labels Nov 24, 2025
@devonh devonh requested a review from MadLittleMods November 27, 2025 00:06
Comment on lines 162 to 170
    if len(content_length_headers) != 1:
        logger.warning(
            "Dropping request from %s because it contains multiple Content-Length headers: %s %s",
            self.client,
            command.decode("ascii", errors="replace"),
            self.get_redacted_uri(),
        )
        self.loseConnection()
        return
Contributor

Explain why we do this.

In the SBG we say this type of thing:

element-hq/sbg -> crates/matrix-sbg-module-oauth-par/src/lib.rs#L597-L624

// If there are 0 or multiple `client_id` query parameters, return
// an error. We don't want to even try to pick the right one if
// there are multiple as we could run into problems similar to
// request smuggling vulnerabilities which rely on the mismatch of
// how different systems interpret information.
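
To make the concern concrete, here is a small sketch of two hypothetical parsers that resolve duplicate `Content-Length` headers differently. Neither function is taken from Synapse or the SBG; they only show how two systems on the same request path could disagree about where the body ends.

```python
from typing import Sequence


def length_first_wins(values: Sequence[bytes]) -> int:
    """A parser that trusts the first Content-Length value (hypothetical)."""
    return int(values[0])


def length_last_wins(values: Sequence[bytes]) -> int:
    """A parser that trusts the last Content-Length value (hypothetical)."""
    return int(values[-1])


# Two Content-Length headers on the same request.
duplicate_headers = [b"5", b"50"]

proxy_view = length_first_wins(duplicate_headers)
backend_view = length_last_wins(duplicate_headers)

# The two systems disagree about the body length, so bytes that one treats
# as body the other may treat as the start of a new request.
assert proxy_view != backend_view
```

Rejecting the request outright avoids having to guess which interpretation the other side used.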

        # we should get a 415
        self.assertRegex(transport.value().decode(), r"^HTTP/1\.1 415 ")

    def test_content_length_too_large(self) -> None:
Contributor

(test hasn't been added yet)

            command.decode("ascii", errors="replace"),
            self.get_redacted_uri(),
        )
        self.loseConnection()
Contributor

@MadLittleMods MadLittleMods Dec 1, 2025

I think it would be nice to respond with an actual HTTP status code and a good error explaining why.

In the case of multiple Content-Length headers -> 400 Bad Request

In the case of no Content-Length header -> 411 Length Required

Member Author

Yeah, in the case of multiple headers we can do that.
In the case of no header, we can't jump to that conclusion so easily. It depends on the HTTP version: required in HTTP 1, not required in HTTP 1.1 if there is a Transfer-Encoding: chunked header, and not required in HTTP 2.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Length

We could bake in all these additional checks if you think it doesn't blow this PR up. I'm not sure whether Twisted already handles these cases or not.
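
The cases discussed in this thread could be sketched roughly as follows. This encodes only the rules as stated in the comments above. It is not a description of what Twisted or Synapse does, and it ignores requests that carry no body at all (such as a typical GET). The function name and the `http_version` strings are assumptions made for illustration.

```python
from typing import Optional, Sequence


def content_length_error_status(
    content_length_headers: Optional[Sequence[bytes]],
    http_version: str,
    chunked: bool,
) -> Optional[int]:
    """Return an HTTP error status, or None if the headers look acceptable."""
    # More than one Content-Length header: ambiguous, so reject outright.
    if content_length_headers is not None and len(content_length_headers) > 1:
        return 400

    # No Content-Length header at all.
    if not content_length_headers:
        if http_version == "HTTP/2":
            return None
        if http_version == "HTTP/1.1" and chunked:
            return None
        return 411

    # Exactly one header; its value is validated elsewhere.
    return None
```

A real check would also need to decide what to do for methods that normally have no body, which this sketch deliberately leaves out.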

Comment on lines 160 to 173
content_length_headers = self.requestHeaders.getRawHeaders(b"Content-Length")
if content_length_headers is not None:
    if len(content_length_headers) != 1:
        logger.warning(
            "Dropping request from %s because it contains multiple Content-Length headers: %s %s",
            self.client,
            command.decode("ascii", errors="replace"),
            self.get_redacted_uri(),
        )
        self.loseConnection()
        return

    try:
        content_length = int(content_length_headers[0])
Contributor

It would be nicer if this kind of thing was a bit more contained in a block instead of separately validating and then doing a cheeky lookup again.

As an example of how we do this in Rust:
element-hq/sbg -> crates/matrix-sbg-module-oauth-par/src/lib.rs#L597-L624

It could be pulled out into a nested function here 🤷. Since we should be using this kind of pattern wherever we're looking at headers, it could also be a nice helper utility that raises a specific exception type, which we catch here and react to as we do now.
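
A minimal sketch of what such a helper might look like, assuming the raw values arrive as a list of bytes or `None` (as with `getRawHeaders` in the snippet above). `HeaderValidationError`, `get_single_header` and `parse_content_length` are hypothetical names, not existing Synapse utilities.

```python
from typing import Optional, Sequence


class HeaderValidationError(Exception):
    """Raised when a header is malformed; carries a suggested HTTP status."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


def get_single_header(
    raw_values: Optional[Sequence[bytes]], name: str
) -> Optional[bytes]:
    """Return the only value of a header, or None if the header is absent."""
    if raw_values is None:
        return None
    if len(raw_values) != 1:
        raise HeaderValidationError(
            400, f"Expected exactly one {name} header, got {len(raw_values)}"
        )
    return raw_values[0]


def parse_content_length(raw_values: Optional[Sequence[bytes]]) -> Optional[int]:
    """Validate and parse Content-Length in one place."""
    value = get_single_header(raw_values, "Content-Length")
    if value is None:
        return None
    # bytes.isdigit() is False for empty values, signs and whitespace.
    if not value.isdigit():
        raise HeaderValidationError(400, "Content-Length is not a non-negative integer")
    return int(value)
```

The caller then only needs a single `try`/`except HeaderValidationError` around `parse_content_length(...)`, and validation and lookup can no longer drift apart.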

Member Author

I pulled it out into a helper function. Let me know if that's what you had in mind.

        # we should get a 415
        self.assertRegex(transport.value().decode(), r"^HTTP/1\.1 415 ")

    def test_content_length_too_large(self) -> None:
Contributor

I did a bunch of testing and Synapse (Twisted) handles this by truncating the HTTP request at the specified size of the Content-Length. So the request will go through to the server correctly but the body will be silently truncated. The rest of the bytes are dropped.

Perhaps a request with a basic JSON body that will be cut off, where we expect M_NOT_JSON 🤷
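
The truncation described above can be imitated in plain Python to see why a cut-off JSON body would end up as a parse error. This is only a stand-in for the behaviour reported in the comment, not Twisted's code; `parse_with_declared_length` is a hypothetical helper and the sample body is made up.

```python
import json
from typing import Any, Tuple


def parse_with_declared_length(body: bytes, content_length: int) -> Tuple[str, Any]:
    """Keep only the first `content_length` bytes, then try to parse JSON."""
    # Bytes past the declared length are dropped, as described above.
    received = body[:content_length]
    try:
        return ("ok", json.loads(received))
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
        return ("M_NOT_JSON", None)


sample_body = b'{"msgtype": "m.text", "body": "hello"}'
full = parse_with_declared_length(sample_body, len(sample_body))
cut_off = parse_with_declared_length(sample_body, 10)
```

Here `full` parses normally, while `cut_off` sees only `{"msgtype"` and is reported as not being JSON.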

content_length,
self.content.tell(),
self.get_method(),
self.get_redacted_uri(),

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text.

Copilot Autofix

AI about 11 hours ago

Sensitive information embedded in URIs—such as passwords, session IDs, OIDC tokens, "state", "code", or other authentication-related parameters—must be properly redacted before the URI is logged. The current redact_uri utility only removes access_token and client_secret query parameters. To comprehensively fix this, update the redact_uri function (in synapse/http/__init__.py) to also redact the following (at minimum):

  • password
  • state
  • code
  • sid
  • any other parameters frequently used in OIDC or SSO flows

These new regex replacements should be written analogously to the existing ones for query parameters. Throughout the codebase, the only spot where URIs are logged relies on get_redacted_uri() which wraps redact_uri, so improving redact_uri applies the fix everywhere the logging is done.

Required changes:

  • In synapse/http/__init__.py, update the redact_uri function to include patterns for the new sensitive parameters: password, state, code, and sid.
  • No changes required in synapse/http/site.py, as all URI logging routes through get_redacted_uri().
  • No changes required for imports or method signatures.
  • No changes required for test files.
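
For reference, the claim about what the current `redact_uri` covers can be checked with the two patterns exactly as they appear in the removed lines of the patch in this thread. The example URIs are made up, and whether a `password` query parameter ever reaches this logging path in practice is a separate question that this sketch does not answer.

```python
import re

# Patterns copied from the lines removed by the patch in this thread.
ACCESS_TOKEN_RE = re.compile(r"(\?.*access(_|%5[Ff])token=)[^&]*(.*)$")
CLIENT_SECRET_RE = re.compile(r"(\?.*client(_|%5[Ff])secret=)[^&]*(.*)$")


def redact_uri(uri: str) -> str:
    """Strips sensitive information from the uri replaces with <redacted>"""
    uri = ACCESS_TOKEN_RE.sub(r"\1<redacted>\3", uri)
    return CLIENT_SECRET_RE.sub(r"\1<redacted>\3", uri)


# access_token is replaced, the rest of the query string is kept.
token_uri = redact_uri("/sync?access_token=abc123&since=s1")

# A password parameter is not covered by either pattern and passes through.
password_uri = redact_uri("/login?password=hunter2")
```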

Suggested changeset 1
synapse/http/__init__.py
Outside changed files

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/synapse/http/__init__.py b/synapse/http/__init__.py
--- a/synapse/http/__init__.py
+++ b/synapse/http/__init__.py
@@ -34,14 +34,23 @@
         super().__init__(504, msg)
 
 
-ACCESS_TOKEN_RE = re.compile(r"(\?.*access(_|%5[Ff])token=)[^&]*(.*)$")
-CLIENT_SECRET_RE = re.compile(r"(\?.*client(_|%5[Ff])secret=)[^&]*(.*)$")
+ACCESS_TOKEN_RE = re.compile(r"([?&]access(_|%5[Ff])token=)[^&]*")
+CLIENT_SECRET_RE = re.compile(r"([?&]client(_|%5[Ff])secret=)[^&]*")
+PASSWORD_RE = re.compile(r"([?&]password=)[^&]*")
+STATE_RE = re.compile(r"([?&]state=)[^&]*")
+CODE_RE = re.compile(r"([?&]code=)[^&]*")
+SID_RE = re.compile(r"([?&]sid=)[^&]*")
 
-
 def redact_uri(uri: str) -> str:
     """Strips sensitive information from the uri replaces with <redacted>"""
-    uri = ACCESS_TOKEN_RE.sub(r"\1<redacted>\3", uri)
-    return CLIENT_SECRET_RE.sub(r"\1<redacted>\3", uri)
+    # redact access_token, client_secret, password, state, code, sid
+    uri = ACCESS_TOKEN_RE.sub(r"\1<redacted>", uri)
+    uri = CLIENT_SECRET_RE.sub(r"\1<redacted>", uri)
+    uri = PASSWORD_RE.sub(r"\1<redacted>", uri)
+    uri = STATE_RE.sub(r"\1<redacted>", uri)
+    uri = CODE_RE.sub(r"\1<redacted>", uri)
+    uri = SID_RE.sub(r"\1<redacted>", uri)
+    return uri
 
 
 class QuieterFileBodyProducer(FileBodyProducer):
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
content_length,
self._max_request_body_size,
self.get_method(),
self.get_redacted_uri(),

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text.

Copilot Autofix

AI about 11 hours ago

To securely log URIs without exposing sensitive information, the redaction must target all standard authentication/secret parameters regardless of case or encoding, and also strip passwords or secrets from path segments (e.g., /user/<password>/something; this rarely occurs in well-designed APIs, but it is best to mitigate the risk).

Enhance the redact_uri() function in synapse/http/__init__.py to additionally scrub query parameters containing names such as password, token, session, sid, state, code, and make redaction case-insensitive and decoding-aware.
The fix is limited to the code snippets shown, as requested.

Required changes:

  • In synapse/http/__init__.py, update the regular expressions in the redaction routine to:
    • Cover a wider variety of parameter names,
    • Be case-insensitive,
    • Properly handle percent-encoded or other encodings in query strings,
    • Possibly redact secrets in path segments (optional: see note below).
  • Ensure get_redacted_uri in synapse/http/site.py is using the updated function.

Note:
Redacting values in path segments is not usually needed unless APIs route secrets in the path (which is uncommon for correct RESTful APIs). The code does not show explicit path secrets, but to be thorough, a regex can be added for path-based redaction if required.
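
As an alternative to maintaining one regular expression per parameter, case-insensitive and percent-encoding-aware redaction can be sketched with the standard library's query-string parser. This is a different technique from the regex-based `redact_uri`, offered only as an illustration: `redact_query` and the `SENSITIVE_PARAMS` set are hypothetical, and the function re-serialises the query string, so the logged URI may be normalised rather than byte-identical to the original.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameter names to scrub; not taken from Synapse.
SENSITIVE_PARAMS = frozenset(
    {"access_token", "client_secret", "password", "sid", "state", "code"}
)


def redact_query(uri: str) -> str:
    """Replace the values of sensitive query parameters with <redacted>."""
    parts = urlsplit(uri)
    # parse_qsl percent-decodes names, so "access%5Ftoken" becomes "access_token".
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    redacted = [
        (name, "<redacted>" if name.lower() in SENSITIVE_PARAMS else value)
        for name, value in pairs
    ]
    # safe="<>" keeps the placeholder readable instead of percent-encoding it.
    return urlunsplit(parts._replace(query=urlencode(redacted, safe="<>")))
```

Matching on decoded, lower-cased names sidesteps the `(_|%5[fF])` special-casing, at the cost of rewriting the query string.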


Suggested changeset 1
synapse/http/__init__.py
Outside changed files

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/synapse/http/__init__.py b/synapse/http/__init__.py
--- a/synapse/http/__init__.py
+++ b/synapse/http/__init__.py
@@ -34,14 +34,33 @@
         super().__init__(504, msg)
 
 
-ACCESS_TOKEN_RE = re.compile(r"(\?.*access(_|%5[Ff])token=)[^&]*(.*)$")
-CLIENT_SECRET_RE = re.compile(r"(\?.*client(_|%5[Ff])secret=)[^&]*(.*)$")
+SENSITIVE_PARAM_NAMES = [
+    "access_token",
+    "client_secret",
+    "password",
+    "session",
+    "sid",
+    "state",
+    "code",
+    "token",
+    # Add any other parameter names that may contain secrets
+]
+# Build regex to catch sensitive query params, handling percent-encoding and case-insensitivity
+#   This pattern matches sensitive param names after '?' or '&', possibly percent-encoded
+SENSITIVE_PARAM_REGEXPS = [
+    re.compile(
+        r"([?&])(" + p.replace("_", r"(_|%5[fF])") + r")=([^&]*)",
+        re.IGNORECASE,
+    )
+    for p in SENSITIVE_PARAM_NAMES
+]
 
-
 def redact_uri(uri: str) -> str:
     """Strips sensitive information from the uri replaces with <redacted>"""
-    uri = ACCESS_TOKEN_RE.sub(r"\1<redacted>\3", uri)
-    return CLIENT_SECRET_RE.sub(r"\1<redacted>\3", uri)
+    # Iteratively apply all sensitive param redactions
+    for param_re in SENSITIVE_PARAM_REGEXPS:
+        uri = param_re.sub(r"\1\2=<redacted>", uri)
+    return uri
 
 
 class QuieterFileBodyProducer(FileBodyProducer):
EOF

Labels

A-Federation, A-to-device-messages (EDU messages sent exactly once to a specific set of devices. Related to E2EE), A-Validation

3 participants