Skip to content

Conversation

@bdtech
Copy link
Contributor

@bdtech bdtech commented May 23, 2019

Why: When checking if a redirect is present, an exact match is currently required (with or without the trailing slash). This currently requires two redirects to be created to adequately cover both possible URL's path.

Fix: This change facilitates trailing slash agnostic support - a redirect for path /test is the same as /test/. Query strings are also supported in this manner.

Unit Tests Passed: OK (9 tests, 18 assertions). Also, cURL checks performed to confirm as well.

Addresses Issue #50 @GaryJones @rebeccahum

When checking if a redirect is present, an exact match is currently required (with or without the trailing slash). This change faciliates trailing slash agnostic support - a redirect for path /test is the same as /test/. Query strings are also supported in this manner.

Unit Tests Passed: OK (9 tests, 18 assertions). Also cURL checks performed to confirm as well.
@bdtech bdtech marked this pull request as ready for review May 23, 2019 15:28
bdtech added 2 commits May 23, 2019 14:22
resovled conflict and now will re-apply trailing slash support since upstream branch code was cleaned up (moved into new includes file).
Copy link
Contributor

@GaryJones GaryJones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added feedback.

* @param bool | $trailing_slash_toggle If TRUE, will return URL toggling the trailing slash on/off (i.e add slash if not present).
* @return string | Will return the current URL path based via $_SERVER.
*/
static function get_current_url_path( $trailing_slash_toggle = false ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're adding a new function here. Where are the corresponding unit tests to show that it works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested numerous URLs with cURL - will proceed to create the unit test. I was confident the function did not add much complexity to risk unexpected results.

require __DIR__ . '/includes/class-wpcom-legacy-redirector.php';

WPCOM_Legacy_Redirector::start();
WPCOM_Legacy_Redirector::start(); No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set your editor to ensure that one clear linespace is left at the end of files, as per PSR-1 / WPCS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. I have the default WordPress style active but need to find this setting in Storm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2019-05-24 at 08 19 22

* Returns current URL's relative path including the query string. Optionally add or remove trailig slash to get variant URL.
* @param bool | $trailing_slash_toggle If TRUE, will return URL toggling the trailing slash on/off (i.e add slash if not present).
* @return string | Will return the current URL path based via $_SERVER.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

phpcs --standard=WordPress-Docs for this file gives (amongst others):

449 | ERROR | [x] Expected 2 space(s) before asterisk; 3 found (Squiz.Commenting.DocCommentAlignment.SpaceBeforeStar)
 450 | ERROR | [x] Expected 2 space(s) before asterisk; 3 found (Squiz.Commenting.DocCommentAlignment.SpaceBeforeStar)
 450 | ERROR | [x] There must be exactly one blank line before the tags in a doc comment (Generic.Commenting.DocComment.SpacingBeforeTags)
 451 | ERROR | [x] Expected 2 space(s) before asterisk; 3 found (Squiz.Commenting.DocCommentAlignment.SpaceBeforeStar)
 452 | ERROR | [x] Expected 2 space(s) before asterisk; 3 found (Squiz.Commenting.DocCommentAlignment.SpaceBeforeStar)

(You have extra single spaces before the tabs in four lines.)

I'd also add that there should be no | between the type and the variable name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replicated the comment block from the nearby function. I'll rectify.

*/
static function get_current_url_path( $trailing_slash_toggle = false ) {

$url = parse_url( $_SERVER['REQUEST_URI'], PHP_URL_PATH );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not wp_parse_url()?

You also need to check that the $_SERVER['REQUEST_URI'] is set, and sanitize it if so.

Copy link
Contributor Author

@bdtech bdtech May 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are solid recommendations, I will improve it. Keep in mind the URL is already escaped via get_request_uri() and is not being stored in the DB.

Wanted to initially keep the code changes minimal since this was existing/working functionality then take a second pass time permitting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$_SERVER['REQUEST_URI' is filled by wp_fix_server_vars() by default so it won't be empty. I'll proceed to check just in case.

Copy link
Contributor Author

@bdtech bdtech May 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GaryJones Why is it necessary to manually parse and re-add the query string to the URL? REQUEST_URI already includes the relative path including query_strings.

}

if ( ! empty( $_SERVER['QUERY_STRING'] ) ) {
$url .= '?' . $_SERVER['QUERY_STRING'];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to sanitize this user input?

'http://example.com',
),

'redirect_simple_trailslash' => array(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests that a FROM string with a trailing slash can be inserted.

How about one with a trailing slash and a querystring?

Or one with two trailing slashes at the end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested these scenarios with cURL. I'll create additional unit tests.

$url = rtrim( $url, '/' );
}
else {
$url .= "/";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the use of double-quotes here?


// Add the trailing slash if not present; or vice-versa to toggle it.
if ( $trailing_slash_toggle ) {
if ( substr( $url , -1 ) == '/' ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the decision to use == instead of ===?

if ( !$redirect_uri ) {
// trailing slash agnostic support to check if redirect exists by toggling slash based on current URL.
$request_path = apply_filters( 'wpcom_legacy_redirector_request_path', self::get_current_url_path( TRUE ) );
$redirect_uri = self::get_redirect_uri( $request_path );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current flow is to lookup /foo (or /foo/), normalise it, check it's not a WP_Error, parse it to get query params and save them, hash the URL, check the cache for a post ID, get the redirect post, then optionally add querystring args back on to the URL / permalink.

The additional code here then repeats it all again for /foo/ (or /foo).

Have you considered any alternative approaches, where the request URL is amended before any lookups are done?

Copy link
Contributor Author

@bdtech bdtech May 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GaryJones There are a few possibilities.

What about checking if the site’s permalink structure includes a trailing slash? There’s a function to check this user_trailingslashit().

This can then be applied consistently to the request URL being checked through maybe_do_redirect(). You’ll get a match that way the first time, with or without the trailing slash, since a site-wide standard will be enforced.

The caveat is we’d have to decide if we want to support allowing separate redirects for both variants (/foo/ and /foo).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct - lookup what the current permalink has, and change the request URL value to include / not include that, before the wpcom_legacy_redirector_request_path filter has run so that clients can still override our logic, but it happens all before the first lookup.

Here, we make the decision to enforce the trailing slash or not, but let the client make the decision if they agree or not.

Copy link
Contributor Author

@bdtech bdtech May 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GaryJones We would be able to apply this policy of defaulting the trailing slash for maybe_do_redirect(), prior to wpcom_legacy_redirector_request_path, based on the site's permalink rule for trailing slashes. Just as you've outlined.

This should reduce the likelihood of the performance hit of repeating get_redirect_uri() as you mentioned. Essentially we are prioritizing the trailing slash lookup with what should jive with the site's (current) permalink settings. This obviously depends if the redirect was created with the matching trailing slash policy in the first place.

However, the caveat is there will still be a performance hit for actual 404's (that don't have a redirect) since we'd still have to possibly check for both /foo and /foo/

Ideally, the trailing slash policy should be already be enforced by the web server. That's also the most prudent route for SEO.

Copy link
Contributor Author

@bdtech bdtech May 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To improve performance, what about the idea of adding a cache key that stores the actual redirect destination? This would return the value of get_redirect_uri().

If the URL has a query string we could set the TTL lower (HOUR_IN_SECONDS) so it can pick up any possible future changes from the filter wpcom_legacy_redirector_preserve_query_params

This will improve the current flow and not require get_redirect_uri() processing on every subsequent request. We'd immediately check the cache for an exact match based on the current URL to speed things up tremendously.

@GaryJones GaryJones added status: awaiting feedback Awaiting feedback from reporter type: enhancement New feature or request labels May 24, 2019
@GaryJones GaryJones modified the milestones: 1.4.0, Future Release May 24, 2019
@GaryJones
Copy link
Contributor

Since this is going to change a fundamental behaviour of the plugin, I've moved this to a future milestone (likely 2.0.0) - we shouldn't rush this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status: awaiting feedback Awaiting feedback from reporter type: enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants