fix: prevent the backfill from running forever. #1065

Merged: 7 commits into develop from fix/author-term-backfill, Oct 24, 2024


eddiesshop
Contributor

@eddiesshop eddiesshop commented Oct 18, 2024

Description

There's an edge case where a post can still be assigned to an author that no longer exists. This throws the backfill script into an infinite loop: the author term for that post is never found or created, so the underlying problem of missing author-term records is never resolved. At the end of each pass through the while loop, the script asks for the "remaining posts which need author terms" and gets the same rows back over and over.

This fix addresses this in 2 ways:

  1. If an author is not found, we insert a postmeta record indicating that the post should be skipped for processing. This prevents the loop from returning the same post for processing over and over.
  2. Checks have been added so that the script can't go beyond what should be the maximum number of rows needing to be addressed.
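
The two safeguards above can be sketched as a small simulation. This is not the plugin's PHP code: the function name `backfill_author_terms`, the in-memory `skipped` set (standing in for the postmeta skip marker), and the batch size are all hypothetical, chosen only to show why the loop now terminates.

```python
from typing import Dict, List, Set


def backfill_author_terms(
    posts: Dict[int, int],   # post ID -> post_author user ID
    users: Set[int],         # IDs of users that still exist
    batch_size: int = 2,
) -> Dict[str, List[int]]:
    """Simulate a batched backfill with a skip marker and a row cap."""
    has_term: Set[int] = set()  # posts whose author term now exists
    skipped: Set[int] = set()   # stand-in for the "skip" postmeta record

    def remaining() -> List[int]:
        # Posts that still need an author term and are not marked as skipped.
        return sorted(p for p in posts if p not in has_term and p not in skipped)

    max_rows = len(remaining())  # upper bound on rows that can need work
    processed = 0
    batch = remaining()[:batch_size]
    while batch:
        for post_id in batch:
            # Safeguard 2: never process more rows than could need work.
            if processed >= max_rows:
                raise RuntimeError("processed more rows than should exist")
            processed += 1
            if posts[post_id] in users:
                has_term.add(post_id)
            else:
                # Safeguard 1: mark the post so it is not returned again.
                skipped.add(post_id)
        batch = remaining()[:batch_size]
    return {"backfilled": sorted(has_term), "skipped": sorted(skipped)}
```

Without the `skipped` set, `remaining()` would keep returning the posts whose author is missing, which is the infinite loop described above. The row cap is a second line of defence in case some other condition keeps a row in the result set.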

Deploy Notes

Are there any new dependencies added that should be taken into account when deploying to WordPress.org?
No.

Steps to Test

  1. Create a handful of new posts. Make sure they're published.
  2. Confirm what the maximum User ID is with SELECT MAX(ID) FROM wp_users.
  3. Update the post_author column for those posts to IDs that do not exist in the wp_users table (any ID above MAX(ID)).
  4. Run the command (wp co-authors-plus create-author-terms-for-posts). Be prepared to kill it. Notice how the reported progress goes beyond 100%.
  5. Check out the PR.
  6. Re-run the command from Step 4. This time, notice how the posts with non-existing authors have been skipped for processing, and the command finishes running on its own.
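
Steps 2 and 3 can be rehearsed against a throwaway database before touching a real site. The sketch below uses an in-memory SQLite database as a stand-in for MySQL, with the `wp_users` and `wp_posts` tables cut down to the columns the steps mention; the helper names `make_demo_db` and `orphan_posts` and the sample IDs are hypothetical.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny stand-in for wp_users and wp_posts (reduced columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wp_users (ID INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE wp_posts ("
        "ID INTEGER PRIMARY KEY, post_author INTEGER, post_status TEXT)"
    )
    conn.executemany("INSERT INTO wp_users (ID) VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO wp_posts (ID, post_author, post_status) "
        "VALUES (?, ?, 'publish')",
        [(101, 1), (102, 2), (103, 3)],
    )
    return conn


def orphan_posts(conn: sqlite3.Connection, post_ids: List[int]) -> int:
    """Point the given posts at user IDs above MAX(ID), which cannot exist."""
    max_id = conn.execute("SELECT MAX(ID) FROM wp_users").fetchone()[0]
    for offset, post_id in enumerate(post_ids, start=1):
        conn.execute(
            "UPDATE wp_posts SET post_author = ? WHERE ID = ?",
            (max_id + offset, post_id),
        )
    conn.commit()
    return max_id
```

On a real site the same two statements would be run against MySQL, and the table prefix may differ from `wp_`.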

There's an edge case where a post can still be assigned to an author that no longer exists. This throws the backfill script into an infinite loop: the author term for that post is never found or created, so the underlying problem of missing author-term records is never resolved. At the end of each pass through the while loop, the script asks for the "remaining posts which need author terms" and gets the same rows back over and over.

This fix addresses this in 2 ways:
1. If an author is not found, we look for the most prolific author on the site and assign the posts to them. If there is no prolific author, one is created. And if one can't be created, an exception is thrown so that the script can't proceed.
2. Checks have been added so that the script can't go beyond what should be the maximum number of rows needing to be addressed.

brookegs commented Oct 18, 2024 via email

This approach is more faithful to the current condition of the site. If the post author doesn't exist on the site, you wouldn't be able to see the post in question in an author archive anyway. Skipping the post instead of reassigning it to the first available admin user is a cleaner solution.
Member

@naxoc naxoc left a comment


This looks good to go!

@iuravic iuravic self-requested a review October 24, 2024 16:30
@leogermani leogermani merged commit fc8067a into develop Oct 24, 2024
9 checks passed
@leogermani leogermani deleted the fix/author-term-backfill branch October 24, 2024 21:23
leogermani added a commit that referenced this pull request Nov 5, 2024
* Increase composer.json required PHP version to 7.4

* Update README to match required PHP version 7.4

* Remove PHP 7.1 from integration tests

* PHP 7.4: Use array_key_first()

Slightly cleaner to use the native function for getting the first key's value from an array.

* PHP 7.4: Use instanceof

* PHP 7.4: Use null coalescing

* PHP 7.4: Add return types

* PHP 7.4: Collapse nested dirname() calls

* CI: Remove MySQL workaround for PHP <= 7.3

* Increase WordPress required version to 5.9

* Update integration tests to use WordPress 5.9

* Remove unnecessary phpunit versions for WordPress 5.9

* CI: Update tested versions

It doesn't make sense to test WP versions with unsupported PHP versions (e.g. WP 5.9 with PHP 8.3).

* Composer: Update dev-dependencies

* PHPCS: Consolidate config into config file

The PHPCS in the composer.json was duplicating but obscuring some aspects of what was in the `phpcs.xml.dist` file. This change consolidates the Composer commands and the config file.

* Support for Yoast %%name%% variable

* CI: Update deploy.yml

Increase actions/checkout dependency version.

* CI: Update integrate.yml action versions

* Contents edited to consolidate instructions within the Wiki and bring more attention to its existence (#1055)

* add: created a new CLI cmd to backfill missing author terms for posts. (#1060)

* add: created a new CLI cmd to backfill missing author terms for posts.

* add: adding some comments to the new and old backfill commands.

The comments are meant to clarify the key differences between the two commands, and that the new one should be preferred over the old one.

* add: batching is the default, pass `--unbatched` flag to run w/o it.

---------

Co-authored-by: Gary Jones <[email protected]>
Co-authored-by: Alec Geatches <[email protected]>

* Fix/missing wp user type (#988)

* fix: preventing loss of fact that a guest author might also be a WP_User

* fix: making the update operation dependent on $append flag.

This might be a problematic decision. But the way I justify this change is that if you are appending co-authors, there may already be a WP_User set as the author. So we don't really have to care whether one is passed or not. Because of this, we do not need to forcibly return a `false` flag since that is confusing to the caller, especially because we actually do save the guest authors which are given in the call! Instead, if the $append flag is false, we should expect that at least one user will be a WP_User. In that case, if none is passed in, then there is a mismatch of the intended authors. Because now, the `wp_posts.post_author` column will have an old `wp_users.ID` which remains set and most likely isn't the intent of the caller.

* fix: attempting DB update only when $new_author is not empty.

Also, returning the actual response from the DB, to make this call even more accurate in terms of what is actually happening at the DB layer.

* fix: need to ensure pure WP_User is processed correctly as post_author.

A pure WP_User (i.e. a WP_User that IS NOT linked to a Guest Author) needs to be handled specially.

* fix: a necessary refactor of the `get_coauthor_by` function.

This refactor is absolutely necessary in order for all the previous fixes to work as expected. Without this fix, what happens is that when you use `get_coauthor_by` by searching with a Guest Author, if that Guest Author has a valid link to a WP_User, it is summarily ignored. Functions like `add_coauthors` expect at least one coauthor to be a valid WP_User so that the `wp_posts.post_author` column can be appropriately updated. The only case where this function is returning an expected value is when you search by the WP_User first. When it arrives at `$guest_author = $this->guest_authors->get_guest_author_by( $key, $value, $force );`, `$guest_author === false`. It is then forced to move to the switch statement to find a user via their WP_User data.

With this refactor, `get_coauthor_by` will now check if the `linked_account` attribute is set. If so, it will attempt to find the corresponding user for the Guest Account. It still gives priority to returning a Guest Author. When a Guest Author is not found, it will search for a WP_User. If found, it will also search to see if a linked Guest Author account exists. If it does, it will return that Guest Author object instead, without losing the fact that this account also has a WP_User associated with it.
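
The lookup order described in this commit message can be sketched as follows. This is a simplified model, not the plugin's PHP: the `WPUser` and `GuestAuthor` classes, the dictionary lookups by login only, and the `None` return for "not found" are assumptions made for illustration. The `linked_account` and `wp_user` attribute names are taken from the commit messages.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class WPUser:
    user_login: str


@dataclass
class GuestAuthor:
    user_login: str
    linked_account: Optional[str] = None  # login of the linked WP_User, if any
    wp_user: Optional[WPUser] = None      # filled in when the link resolves


def get_coauthor_by(
    login: str,
    guest_authors: Dict[str, GuestAuthor],
    users: Dict[str, WPUser],
) -> Union[GuestAuthor, WPUser, None]:
    # 1. Guest Authors take priority.
    guest = guest_authors.get(login)
    if guest is not None:
        if guest.linked_account:
            # Keep the fact that this Guest Author is also a WP_User.
            guest.wp_user = users.get(guest.linked_account)
        return guest

    # 2. Otherwise fall back to a WP_User lookup.
    user = users.get(login)
    if user is None:
        return None

    # 3. If a Guest Author is linked to that user, return it instead.
    for candidate in guest_authors.values():
        if candidate.linked_account == user.user_login:
            candidate.wp_user = user
            return candidate

    # 4. A pure WP_User with no linked Guest Author.
    return user
```

Either entry point (searching by the Guest Author or by the linked WP_User) ends at the same Guest Author object with its `wp_user` populated, which is the property the commit message says callers such as `add_coauthors` rely on.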

* fix: returning a plain WP_User if guest authors is not enabled.

I forgot to run tests on my previous commit. This satisfies the test Test_CoAuthors_Plus::test_get_coauthor_by_when_guest_authors_not_enabled which is expecting a WP_User when the plugin is not enabled.

* feat: adding additional tests for co-authors-plus.php functionality.

* fix: preventing loss of fact that a guest author might also be a WP_User

* fix: making the update operation dependent on $append flag.

This might be a problematic decision. But the way I justify this change is that if you are appending co-authors, there may already be a WP_User set as the author. So we don't really have to care whether one is passed or not. Because of this, we do not need to forcibly return a `false` flag since that is confusing to the caller, especially because we actually do save the guest authors which are given in the call! Instead, if the $append flag is false, we should expect that at least one user will be a WP_User. In that case, if none is passed in, then there is a mismatch of the intended authors. Because now, the `wp_posts.post_author` column will have an old `wp_users.ID` which remains set and most likely isn't the intent of the caller.

* fix: attempting DB update only when $new_author is not empty.

Also, returning the actual response from the DB, to make this call even more accurate in terms of what is actually happening at the DB layer.

* fix: need to ensure pure WP_User is processed correctly as post_author.

A pure WP_User (i.e. a WP_User that IS NOT linked to a Guest Author) needs to be handled specially.

* fix: a necessary refactor of the get_coauthor_by function.

This refactor is absolutely necessary in order for all the previous fixes to work as expected. Without this fix, what happens is that when you use `get_coauthor_by` by searching with a Guest Author, any link to a WP_User the Guest Author may have is summarily ignored. Functions like `add_coauthors` expect at least one coauthor to be a valid WP_User so that the `wp_posts.post_author` column can be appropriately updated. The only case where this function is currently returning an expected value is when you search by a WP_User account/field first. When it arrives at `$guest_author = $this->guest_authors->get_guest_author_by( $key, $value, $force );`, `$guest_author === false`. It is then forced to move to the switch statement to find a user via their WP_User data.

With this refactor, `get_coauthor_by` will now check if the `linked_account` attribute is set. If so, it will then attempt to find the corresponding WP_User for the Guest Author. Crucially, it still gives priority to returning a Guest Author. When a Guest Author is not found, it will then attempt to search for a WP_User. If found, it will also search to see if a linked Guest Author account exists. If it does, it will return that Guest Author object instead, without losing the fact that this account also has a WP_User associated with it.

* fix: renaming user_login's for new authors introduced for new tests.

These user_login's were causing other tests to fail because you cannot create another user with the same user_login.

* fix: removing use of assertObjectHasProperty

Older versions of PHPUnit do not have this function available. Updating to a workaround: `assertTrue( property_exists( $obj, 'prop' ) )`

* fix: typo in function call

* fix: using strict comparison instead of function call `is_null`

* fix: using more descriptive assertion for array validation.

* fix: using `create_and_get` post factory func, to avoid query call.

* fix: removing use of newly introduced is_wp_user property.

Relying instead on wp_user property which has already been used before.

* fix: PHPCS fixes and added commentary/descriptions to docblocks.

* fix: some small quick fixes for formatting and documentation

* fix: removing repetitive test.

* add: new assertion func that determines if an obj is not a WP_User class

* add: new assertion to help determine if a Post has the correct Authors

* add: new test solely for CoAuthorPlus::get_coauthor_by().

By fully testing CoAuthorPlus::get_coauthor_by(), we can remove some repetitive assertions that don't directly relate to what's being tested.

* fix: was passing string values when I should've been passing Author objs

* fix: using a data provider for very similar tests

---------

Co-authored-by: Gary Jones <[email protected]>

* bumping version to 3.6.2 (#1064)

* bumping version to 3.6.2

* Update CHANGELOG.md

Co-authored-by: Gary Jones <[email protected]>

* add changelog link

---------

Co-authored-by: Gary Jones <[email protected]>

* fix: prevent the backfill from running forever. (#1065)

* fix: prevent the backfill from running forever.

There's an edge case where a post can still be assigned to an author that no longer exists. This throws the backfill script into an infinite loop: the author term for that post is never found or created, so the underlying problem of missing author-term records is never resolved. At the end of each pass through the while loop, the script asks for the "remaining posts which need author terms" and gets the same rows back over and over.

This fix addresses this in 2 ways:
1. If an author is not found, we look for the most prolific author on the site and assign the posts to them. If there is no prolific author, one is created. And if one can't be created, an exception is thrown so that the script can't proceed.
2. Checks have been added so that the script can't go beyond what should be the maximum number of rows needing to be addressed.

* fix: obtaining the first available admin user account instead.

* fix: updating output to reflect that the ID belongs to an Admin account.

* fix: this function should be private

* fix: switching tactic to skipping posts that have missing post_author.

This approach is more faithful to the current condition of the site. If the post author doesn't exist on the site, you wouldn't be able to see the post in question in an author archive anyway. Skipping the post instead of reassigning it to the first available admin user is a cleaner solution.

* fix: removed unused references from a past commit

* fix: appeasing PHPCS

* Bump versions to 3.6.3 (#1070)

---------

Co-authored-by: Alec Geatches <[email protected]>
Co-authored-by: Gary Jones <[email protected]>
Co-authored-by: claudiulodro <[email protected]>
Co-authored-by: Yoli Hodde <[email protected]>
Co-authored-by: Eddie Carrasco <[email protected]>