Non-space whitespace characters are removed from anchor URL (#266)

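Leading and trailing whitespace characters that are *not* HTML space characters (for example U+000B LINE TABULATION and U+3000 IDEOGRAPHIC SPACE) are removed from the link value along with the space characters. As a result, extracting or following the link fails: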
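```perl
use strict;
use warnings;
use feature 'say';
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
# U+000B is not an HTML5 space character, yet it is stripped from the href
$mech->update_html(qq'<a href="\x0b">link</a>');
say length $mech->links->[0]->URI->as_string; # 0
# U+3000 is not an HTML5 space character either, and is stripped too
$mech->update_html(qq'<a href="\x{3000}">link</a>');
say length $mech->links->[0]->URI->as_string; # 0
```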
According to the HTML5 spec, only *space characters*, i.e. `/[\x09\x0a\x0c\x0d\x20]/`, are stripped from around a URL:
> https://www.w3.org/TR/html52/infrastructure.html#infrastructure-urls
> A string is a valid URL potentially surrounded by spaces if, after stripping leading and trailing white space from it, it is a valid URL.
> A string is a valid non-empty URL potentially surrounded by spaces if, after stripping leading and trailing white space from it, it is a valid non-empty URL.
> 
> Re: stripping leading and trailing white space
> https://www.w3.org/TR/html52/infrastructure.html#strip-leading-and-trailing-white-space
> When a user agent is to strip leading and trailing white space from a string, the user agent must remove all space characters that are at the start or end of the string.
> 
> Re: space characters
> https://www.w3.org/TR/html52/infrastructure.html#space-characters
> The space characters, for the purposes of this specification, are U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR).
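To make the difference concrete, here is a minimal sketch in core Perl contrasting stripping with the HTML5 space-character set against stripping with `\s`. The helper names `html5_strip` and `perl_strip` are hypothetical, invented for this illustration; they are not part of WWW::Mechanize or URI. It assumes Perl 5.18 or later, where `\s` also matches U+000B.

```perl
use strict;
use warnings;

# Hypothetical helper: strip only the HTML5 "space characters"
# (TAB, LF, FF, CR, SPACE) from both ends, as in the spec text quoted above.
sub html5_strip {
    my ($str) = @_;
    $str =~ s/\A[\x09\x0A\x0C\x0D\x20]+//;
    $str =~ s/[\x09\x0A\x0C\x0D\x20]+\z//;
    return $str;
}

# Hypothetical helper: strip everything matching Perl's \s from both ends,
# which also covers U+000B and Unicode white space such as U+3000.
sub perl_strip {
    my ($str) = @_;
    $str =~ s/\A\s+//;
    $str =~ s/\s+\z//;
    return $str;
}

# Print the code points of each input and the remaining length after each strip.
for my $href ("\x0b", "\x{3000}", " \x0b ") {
    printf "%-10s html5=%d perl=%d\n",
        sprintf('%vX', $href), length html5_strip($href), length perl_strip($href);
}
```

With the HTML5 rule the non-space whitespace survives (and would then be percent-encoded as part of the URL), whereas `\s` removes it entirely and leaves an empty href.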

`URI->new()` is the cause. As its documentation says, it removes leading and trailing white space, i.e. characters matching `\s`, and what `\s` matches depends on the Unicode version that each Perl release conforms to.
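One way to confirm that the constructor, rather than WWW::Mechanize itself, does the stripping is to pass the same values to `URI->new()` directly. This sketch assumes only the `URI` module (already a WWW::Mechanize dependency) and Perl 5.18 or later, so that `\s` matches U+000B.

```perl
use strict;
use warnings;
use URI;

# Feed the problematic href values straight to URI->new, bypassing
# WWW::Mechanize, and report how much of each survives.
for my $href ("\x0b", "\x{3000}", "page\x{3000}") {
    my $uri = URI->new($href);
    # '%vX' prints the input's code points, e.g. "B" or "3000".
    printf "%-18s -> %d chars\n", sprintf('%vX', $href), length $uri->as_string;
}
```

If the constructor strips `\s` as documented, the first two inputs should come out empty and the third as just `page`, matching the behaviour seen through `$mech->links`.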
