Collaborator

@steven10a steven10a commented Nov 20, 2025

Making URL detection more robust in response to JS issue 39

  • Normalized allowed_schemes entries for matching
  • Reworked _is_url_allowed to handle host-only entries by using allowed_schemes
  • Added blocking of URLs that contain a password when block_userinfo=true
  • Better port matching with validation
  • Support for scheme-less inputs for improved usability (example.com is not blocked when https://example.com is in the allow list, but http://example.com is)
  • Expanded test coverage
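
The block_userinfo item above can be illustrated with the standard library alone. This is a minimal sketch, not the code from this PR: has_userinfo is a hypothetical helper, and it assumes the URL already carries a scheme so that urlsplit fills in the network location.

```python
from urllib.parse import urlsplit


def has_userinfo(url: str) -> bool:
    """Return True if the URL carries a username and/or a password (hypothetical helper)."""
    parts = urlsplit(url)
    # For "https://:secret@example.com" urlsplit reports an empty username and
    # the password "secret", so checking the username alone would miss it.
    return bool(parts.username) or bool(parts.password)


if __name__ == "__main__":
    for candidate in (
        "https://user:pw@example.com",   # username and password
        "https://:secret@example.com",   # password only
        "https://example.com/path",      # no userinfo
    ):
        print(candidate, has_userinfo(candidate))
```

Checking both attributes is the point of the sketch: a password-only URL has a falsy username, so a username-only test would let it through.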

Copilot AI review requested due to automatic review settings November 20, 2025 16:21
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copilot finished reviewing on behalf of steven10a November 20, 2025 16:25
Copilot AI left a comment

Pull Request Overview

This PR enhances URL detection robustness by adding scheme normalization, improving userinfo blocking, and expanding URL matching capabilities to handle full URLs with paths, ports, and query strings.

Key changes:

  • Added normalization for allowed_schemes to handle various input formats (e.g., "HTTPS://", "http:", " https ")
  • Extended block_userinfo to block URLs containing only passwords without usernames
  • Expanded _is_url_allowed to properly handle full URLs, path matching with segment boundaries, port matching, query strings, and fragments
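
The scheme normalization described in the first item could look roughly like the following. This is a sketch under assumptions, not the validator added in urls.py: normalize_scheme and normalize_allowed_schemes are hypothetical names, and str.removesuffix requires Python 3.9 or later.

```python
def normalize_scheme(raw: str) -> str:
    """Reduce inputs such as "HTTPS://", "http:" or " https " to a bare lowercase scheme."""
    cleaned = raw.strip().lower()
    # Drop a trailing "://" or ":" if the caller included one.
    cleaned = cleaned.removesuffix("://").removesuffix(":")
    if not cleaned:
        raise ValueError(f"empty scheme: {raw!r}")
    return cleaned


def normalize_allowed_schemes(entries):
    """Normalize every entry and collapse duplicates into a set."""
    return {normalize_scheme(entry) for entry in entries}


if __name__ == "__main__":
    print(sorted(normalize_allowed_schemes(["HTTPS://", "http:", " https "])))
```

Returning a set means that differently spelled entries for the same scheme collapse into one value, which keeps later membership checks simple.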

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
src/guardrails/checks/text/urls.py | Added scheme normalization validator, improved userinfo blocking, enhanced _is_url_allowed with comprehensive URL component matching including paths, ports, queries, and fragments, and added detailed docstrings
tests/unit/checks/test_urls.py | Added comprehensive test coverage for password-only blocking, scheme normalization, path segment boundaries, scheme-less matching, CIDR blocks, port matching, query/fragment matching, and edge case handling
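
As an illustration of what "path segment boundaries" means here, the sketch below accepts a URL path only when it equals the allow-list path or continues it after a "/". It is an assumption about the intended semantics, not the logic in urls.py, and path_allowed is a hypothetical name.

```python
def path_allowed(allowed_path: str, url_path: str) -> bool:
    """True if url_path equals allowed_path or sits beneath it on a "/" boundary."""
    base = allowed_path.rstrip("/")
    if not base:
        # An allow-list path of "" or "/" places no restriction on the URL path.
        return True
    return url_path == base or url_path.startswith(base + "/")


if __name__ == "__main__":
    print(path_allowed("/api", "/api/v1"))   # beneath the allowed path
    print(path_allowed("/api", "/apiv2"))    # shares a prefix but not a segment
```

A plain startswith check on "/api" would also accept "/apiv2"; appending "/" before comparing is what enforces the boundary.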

@steven10a steven10a requested a review from Copilot November 20, 2025 16:35
@steven10a
Collaborator Author

@codex review

Copilot finished reviewing on behalf of steven10a November 20, 2025 16:38
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@steven10a steven10a requested a review from Copilot November 20, 2025 16:59
@steven10a
Collaborator Author

@codex review

Copilot finished reviewing on behalf of steven10a November 20, 2025 17:01
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


@chatgpt-codex-connector
Codex Review: Didn't find any major issues. Keep it up!

@steven10a steven10a requested a review from Copilot November 20, 2025 17:14
Copilot finished reviewing on behalf of steven10a November 20, 2025 17:20
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


@steven10a steven10a requested a review from Copilot November 20, 2025 18:36
@steven10a
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Copilot finished reviewing on behalf of steven10a November 20, 2025 18:40
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


@steven10a
Collaborator Author

@codex review

@steven10a steven10a requested a review from Copilot November 20, 2025 18:56
Copilot finished reviewing on behalf of steven10a November 20, 2025 19:02
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.



DEFAULT_PORTS = {
"http": 80,
"https": 443,
Copilot AI Nov 20, 2025

[nitpick] Limited default port coverage: The DEFAULT_PORTS dictionary only includes mappings for HTTP (80) and HTTPS (443), but the code supports additional schemes like FTP, data, javascript, vbscript, and mailto (as seen in the detection patterns and special scheme handling). FTP typically uses port 21 by default. Consider adding FTP's default port or documenting that only HTTP/HTTPS have default port handling.

Suggested change
- "https": 443,
+ "https": 443,
+ "ftp": 21,

Collaborator Author

[nit]
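
For context on this thread, here is a minimal sketch of how default ports can feed into port matching. It is not the PR's implementation: effective_port and ports_match are hypothetical helpers, DEFAULT_PORTS mirrors only the two entries quoted above, and treating a missing port as the scheme's default is an assumption.

```python
from urllib.parse import urlsplit

# Only the two schemes quoted in the snippet under review.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str):
    """Return the explicit port, else the scheme's default, else None."""
    parts = urlsplit(url)
    port = parts.port  # raises ValueError for non-numeric or out-of-range ports
    return port if port is not None else DEFAULT_PORTS.get(parts.scheme)


def ports_match(allowed_url: str, candidate_url: str) -> bool:
    """Compare effective ports; an invalid port on either side never matches."""
    try:
        return effective_port(allowed_url) == effective_port(candidate_url)
    except ValueError:
        return False


if __name__ == "__main__":
    print(ports_match("https://example.com", "https://example.com:443"))
    print(ports_match("https://example.com", "https://example.com:99999"))
    print(ports_match("ftp://example.com", "ftp://example.com:21"))
```

In this sketch the FTP pair does not match, because a scheme without a default entry resolves to no port at all. That is the gap the nitpick points at.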

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@steven10a steven10a requested a review from Copilot November 20, 2025 19:23
@steven10a
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines +253 to +255
# Security validations - only validate scheme if it was explicitly provided
if has_explicit_scheme and original_scheme not in config.allowed_schemes:
    return None, f"Blocked scheme: {original_scheme}", has_explicit_scheme

[P1] Skip scheme validation for schemeless URLs

The new scheme check only runs when has_explicit_scheme is true, so bare domains now bypass config.allowed_schemes entirely. With the default allowed_schemes={"https"}, an allow list of ["example.com"], and input text containing just example.com, _validate_url_security accepts the URL (parsed as HTTP) because has_explicit_scheme is false, and _is_url_allowed then marks it allowed. Previously schemeless URLs were validated against the allowed schemes (default https-only), so this change silently permits HTTP links whenever the user omits the scheme, weakening the https-only policy.

Collaborator Author

WAI (working as intended). Same comment as before.
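
The behavior debated in this thread can be reproduced in isolation. The sketch below is a simplified stand-in for the check quoted above, not the body of _validate_url_security: check_scheme is a hypothetical function, it recognizes only "scheme://" prefixes, and the set[str] annotation assumes Python 3.9 or later.

```python
import re

# Matches an explicit "scheme://" prefix; bare domains do not match.
_SCHEME_RE = re.compile(r"^([a-zA-Z][a-zA-Z0-9+.-]*)://")


def check_scheme(raw: str, allowed_schemes: set[str]):
    """Return (ok, reason); only explicit schemes are checked against allowed_schemes."""
    match = _SCHEME_RE.match(raw.strip())
    if match is None:
        # No scheme was typed, so the allowed_schemes policy is not consulted.
        return True, "no explicit scheme; check skipped"
    scheme = match.group(1).lower()
    if scheme not in allowed_schemes:
        return False, f"Blocked scheme: {scheme}"
    return True, "scheme allowed"


if __name__ == "__main__":
    for text in ("example.com", "https://example.com", "http://example.com"):
        print(text, check_scheme(text, {"https"}))
```

With allowed_schemes={"https"}, the bare domain passes, https passes, and http is blocked, which is the trade-off the review comment flags and the author accepts.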

Copilot finished reviewing on behalf of steven10a November 20, 2025 19:29
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


# Handle domain matching
allowed_domain = allowed_entry.replace("www.", "")
url_domain = url_host.replace("www.", "")
allowed_domain = allowed_host.replace("www.", "")
Copilot AI Nov 20, 2025

Using replace("www.", "") removes all occurrences of "www." from the hostname, not just the prefix. This could lead to unexpected behavior. For example, "www.www.example.com" would be treated as equivalent to "example.com" when matching allow list entries. Consider using removeprefix("www.") instead to only remove the "www." prefix, or document this behavior explicitly if it's intentional.

Suggested change
- allowed_domain = allowed_host.replace("www.", "")
+ allowed_domain = allowed_host.removeprefix("www.")

Collaborator Author

I think this is a [nit], or actually beneficial. Extra instances of www. would clearly be a typo and do not add anything malicious. Without removing the extra www., we would have more unnecessary mismatches.
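
To make the difference under discussion concrete, this short snippet compares the two string methods on the hostname from the review comment and on one further hostname added here purely for illustration. str.removeprefix requires Python 3.9 or later.

```python
repeated = "www.www.example.com"
print(repeated.replace("www.", ""))       # example.com
print(repeated.removeprefix("www."))      # www.example.com

# Illustrative hostname (not from the PR) where "www." appears inside a label.
embedded = "notwww.example.com"
print(embedded.replace("www.", ""))       # notexample.com
print(embedded.removeprefix("www."))      # notwww.example.com
```

The first pair shows the behavior both comments describe. The second pair shows that replace also rewrites a "www." that is not a leading label, while removeprefix leaves such a hostname unchanged.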

return False

url_host = url_host.lower()
url_domain = url_host.replace("www.", "")
Copilot AI Nov 20, 2025

Using replace("www.", "") removes all occurrences of "www." from the hostname, not just the prefix. This could lead to unexpected behavior. For example, "www.www.example.com" would be treated as equivalent to "example.com" when matching allow list entries. Consider using removeprefix("www.") instead to only remove the "www." prefix, or document this behavior explicitly if it's intentional.

Suggested change
- url_domain = url_host.replace("www.", "")
+ url_domain = url_host.removeprefix("www.")

Collaborator Author

WAI. Same comment as above

@gabor-openai gabor-openai self-requested a review November 21, 2025 01:08
Collaborator

@gabor-openai gabor-openai left a comment

TY

@gabor-openai gabor-openai merged commit bf067db into main Nov 21, 2025
9 checks passed
@gabor-openai gabor-openai deleted the dev/steven/url_tlds branch November 21, 2025 17:18