Fix safe_url_string to preserve IPv6 brackets in netloc #253

Merged
Gallaecio merged 1 commit into scrapy:master from danishashko:fix/safe-url-string-ipv6
Feb 19, 2026

danishashko commented Feb 18, 2026

Fixes #193

Problem

safe_url_string manually rebuilds the netloc component from the parts returned by urlsplit(). The hostname attribute always strips square brackets from IPv6 addresses (e.g. [2402:4e00:40:40::2:3b6] becomes 2402:4e00:40:40::2:3b6). The brackets were not restored when rebuilding netloc_bytes, so the reconstructed URL was malformed.

On subsequent calls (e.g. inside Scrapy middlewares reconstructing a Request from the stored URL), urlsplit() on the bracket-less URL parsed the first IPv6 segment as the host and the remainder as a port, raising ValueError: Port could not be cast to integer value.
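The failure mode can be reproduced with the standard library alone. The sketch below is illustrative, not w3lib's code: `naive_rebuild` is a hypothetical helper that mimics rebuilding the netloc from `hostname` and `port` without restoring brackets, and it ignores credentials, query, and fragment for brevity.

```python
from urllib.parse import urlsplit


def naive_rebuild(url: str) -> str:
    """Hypothetical helper: rebuild a URL from urlsplit() parts without
    restoring IPv6 brackets, mimicking the bug described above."""
    parts = urlsplit(url)
    netloc = parts.hostname or ""  # brackets are already gone here
    if parts.port is not None:
        netloc += f":{parts.port}"
    return f"{parts.scheme}://{netloc}{parts.path}"


original = "http://[2402:4e00:40:40::2:3b6]/"
print(urlsplit(original).hostname)  # the bare address, without brackets

broken = naive_rebuild(original)
print(broken)  # the host is no longer bracketed

reparsed = urlsplit(broken)
print(reparsed.hostname)  # only the first segment of the address
try:
    reparsed.port  # the rest of the address is treated as a port
except ValueError as exc:
    print(f"ValueError: {exc}")
```

Note that the second `urlsplit()` call itself succeeds; the error only surfaces when `port` is accessed, which is why the problem tends to show up later, away from the code that produced the malformed URL.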

Fix

Detect IPv6 addresses by checking for : in the hostname and restore the brackets before appending to netloc_bytes. IPv4 addresses and DNS hostnames never contain :, so the check is unambiguous.
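A minimal sketch of that check, under some assumptions: `rebuild_netloc` is a hypothetical name, it works on `str` rather than the bytes that `netloc_bytes` implies, and it leaves out credentials and encoding, so it should not be read as the actual w3lib change.

```python
from urllib.parse import urlsplit


def rebuild_netloc(url: str) -> str:
    """Hypothetical helper: rebuild host[:port], re-adding IPv6 brackets."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # A colon can only appear in an IPv6 literal; IPv4 addresses and
    # DNS names never contain one, so this test is enough.
    if ":" in host:
        host = f"[{host}]"
    if parts.port is not None:
        host += f":{parts.port}"
    return host


for url in (
    "http://[2402:4e00:40:40::2:3b6]",
    "http://[::1]:8080/path",
    "http://example.com/",
    "http://127.0.0.1:80/",
):
    print(url, "->", rebuild_netloc(url))
```

With the brackets restored, the rebuilt netloc can be parsed again and yields the same host and port, which is the property the bug broke.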

Tests

  • Removed http://[2a01:5cc0:1:2::4] from KNOWN_SAFE_URL_STRING_URL_ISSUES (the xfail for this issue) since it now passes.
  • Added four new parametrized cases: the address from the bug report with and without a port, plus ::1 with and without a port and path.

codecov bot commented Feb 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.97%. Comparing base (61db33b) to head (a396d67).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #253   +/-   ##
=======================================
  Coverage   97.96%   97.97%           
=======================================
  Files           9        9           
  Lines         491      493    +2     
  Branches       83       84    +1     
=======================================
+ Hits          481      483    +2     
  Misses          6        6           
  Partials        4        4           
Files with missing lines Coverage Δ
w3lib/url.py 99.57% <100.00%> (+<0.01%) ⬆️

wRAR left a comment

Thanks!

Gallaecio merged commit 1ed46a8 into scrapy:master on Feb 19, 2026
24 checks passed

Development

Successfully merging this pull request may close these issues.

safe_url_string handling IPv6 URLs

3 participants