-
Notifications
You must be signed in to change notification settings - Fork 17
Enhancing URL detection #57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances URL detection robustness by adding scheme normalization, improving userinfo blocking, and expanding URL matching capabilities to handle full URLs with paths, ports, and query strings.
Key changes:
- Added normalization for
allowed_schemesto handle various input formats (e.g., "HTTPS://", "http:", " https ") - Extended
block_userinfoto block URLs containing only passwords without usernames - Expanded
_is_url_allowedto properly handle full URLs, path matching with segment boundaries, port matching, query strings, and fragments
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/guardrails/checks/text/urls.py |
Added scheme normalization validator, improved userinfo blocking, enhanced _is_url_allowed with comprehensive URL component matching including paths, ports, queries, and fragments, and added detailed docstrings |
tests/unit/checks/test_urls.py |
Added comprehensive test coverage for password-only blocking, scheme normalization, path segment boundaries, scheme-less matching, CIDR blocks, port matching, query/fragment matching, and edge case handling |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Codex Review: Didn't find any major issues. Keep it up! ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| DEFAULT_PORTS = { | ||
| "http": 80, | ||
| "https": 443, |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Limited default port coverage: The DEFAULT_PORTS dictionary only includes mappings for HTTP (80) and HTTPS (443), but the code supports additional schemes like FTP, data, javascript, vbscript, and mailto (as seen in the detection patterns and special scheme handling). FTP typically uses port 21 by default. Consider adding FTP's default port or documenting that only HTTP/HTTPS have default port handling.
| "https": 443, | |
| "https": 443, | |
| "ftp": 21, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| # Security validations - only validate scheme if it was explicitly provided | ||
| if has_explicit_scheme and original_scheme not in config.allowed_schemes: | ||
| return None, f"Blocked scheme: {original_scheme}", has_explicit_scheme |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skip scheme validation for schemeless URLs
The new scheme check only runs when has_explicit_scheme is true, so bare domains now bypass config.allowed_schemes entirely. With the default allowed_schemes={"https"}, an allow list of ["example.com"], and input text containing just example.com, _validate_url_security accepts the URL (parsed as HTTP) because has_explicit_scheme is false, and _is_url_allowed then marks it allowed. Previously schemeless URLs were validated against the allowed schemes (default https-only), so this change silently permits HTTP links whenever the user omits the scheme, weakening the https-only policy.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WAI. Same comment as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Handle domain matching | ||
| allowed_domain = allowed_entry.replace("www.", "") | ||
| url_domain = url_host.replace("www.", "") | ||
| allowed_domain = allowed_host.replace("www.", "") |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using replace("www.", "") removes all occurrences of "www." from the hostname, not just the prefix. This could lead to unexpected behavior. For example, "www.www.example.com" would be treated as equivalent to "example.com" when matching allow list entries. Consider using removeprefix("www.") instead to only remove the "www." prefix, or document this behavior explicitly if it's intentional.
| allowed_domain = allowed_host.replace("www.", "") | |
| allowed_domain = allowed_host.removeprefix("www.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is [nit] or actually beneficial. Extra instances of www. would clearly be a typo and does not add anything malicious. Without removing the extra www. we would have higher unnecessary mismatches
| return False | ||
|
|
||
| url_host = url_host.lower() | ||
| url_domain = url_host.replace("www.", "") |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using replace("www.", "") removes all occurrences of "www." from the hostname, not just the prefix. This could lead to unexpected behavior. For example, "www.www.example.com" would be treated as equivalent to "example.com" when matching allow list entries. Consider using removeprefix("www.") instead to only remove the "www." prefix, or document this behavior explicitly if it's intentional.
| url_domain = url_host.replace("www.", "") | |
| url_domain = url_host.removeprefix("www.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WAI. Same comment as above
gabor-openai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TY
Making URL detection more robust in response to JS issue 39
allowed_schemesentries for matching_is_url_allowedto handle host only entries by using theallowed_schemesblock_userinfo=trueexample.comdoesn't get blocked when https://example.com is in the allow list. But http://example.com does)