RCORE-2174 Bootstrap store is not being reset if initial subscription bootstrap is interrupted by role change #7831

Merged: 10 commits, Jul 1, 2024
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -6,7 +6,8 @@

### Fixed
* <How do the end-user experience this issue? what was the impact?> ([#????](https://github.com/realm/realm-core/issues/????), since v?.?.?)
- * None.
+ * Fix data from a previous interrupted bootstrap was potentially being included with the bootstrap data during retry attempt
Collaborator

Couldn't this potentially cause diverging history?

Contributor Author

Not immediately, but manually removing the extra entries may cause a diverging history error when merged with the server, since it expects those to not be in the local realm.

Collaborator

> Not immediately, but manually removing the extra entries may cause a diverging history error

Not sure what you mean.

After thinking a bit more, I think it can actually cause orphaned objects, not diverging history.

Contributor Author

Yes, orphaned objects is what I was trying to say

Contributor

@kmorkos correct me if I'm wrong, but I think this will just lead to compensating writes rather than diverging history.

But also, this changelog entry is super confusing. The audience for changelog entries is SDK engineers and end users who likely don't know about the pending bootstrap store. Perhaps something like:

  • If a sync session was interrupted by a disconnect while downloading a bootstrap, more writes than necessary may have been made to the database when the sync session reconnected, and there may be objects stored that do not match the actual state of the server - potentially leading to compensating writes.

Also, I don't think this started in 14.8.0 - I think this started in v12.0.0 https://github.com/realm/realm-core/blob/master/CHANGELOG.md#1200-release-notes.

Collaborator

@jbreams I think we can refer to those objects as orphaned objects, as we do in other places; otherwise I agree with your suggestion for the changelog entry.

Contributor

I think the bug is that the client may end up with objects that the server doesn't think it has (if the object was in the new query's view during the first attempt, but then moved out of that query's view before the second attempt).

From there, one of three things could happen:

  • The object moves back into the client's query view at some point in the future, and we are eventually consistent™️
  • The object never moves back into the client's query view, and the client indefinitely holds on to a stale view of an object it was never supposed to have
  • The client tries modifying the object at some point and gets a compensating write, because the server interprets it as modifying an object outside of the client's query view

I think for all intents and purposes @jbreams' description is more accurate than referring to them as "orphaned objects", unless that terminology is used elsewhere to refer to the above scenario.
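
To make the scenario above concrete, here is a minimal toy model of it. It is only a sketch under stated assumptions: `ToyBootstrapStore`, `download` and `bootstrap_with_retry` are hypothetical names invented for illustration and are not realm-core APIs, and the "server view" is reduced to a list of object ids.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the pending bootstrap store: it only buffers
// the ids of objects that were downloaded but not yet applied.
struct ToyBootstrapStore {
    std::vector<std::string> pending;
    void clear() { pending.clear(); }
};

// Buffers at most `limit` objects from the server's current query view.
// Returns true if the whole view was downloaded (the bootstrap completed).
bool download(ToyBootstrapStore& store, const std::vector<std::string>& server_view, std::size_t limit)
{
    const std::size_t n = std::min(limit, server_view.size());
    for (std::size_t i = 0; i < n; ++i)
        store.pending.push_back(server_view[i]);
    return n == server_view.size();
}

// Runs an interrupted attempt followed by a retry and returns the set of
// objects the client would end up applying.
std::set<std::string> bootstrap_with_retry(bool reset_store_on_retry)
{
    ToyBootstrapStore store;

    // First attempt: the view holds "alice" and "bob", but the session is
    // interrupted after a single object has been buffered.
    download(store, {"alice", "bob"}, 1);

    // Before the retry, "alice" moves out of the query view
    // (for example because of a role change).
    if (reset_store_on_retry)
        store.clear();

    // Second attempt: the view now only holds "bob" and completes.
    download(store, {"bob"}, 10);

    return std::set<std::string>(store.pending.begin(), store.pending.end());
}
```

With `reset_store_on_retry == false` the result contains both "alice" and "bob", so the client holds an object the server does not think it has; with `true` only "bob" remains.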

Contributor Author

I reworded the changelog entry based on Jonathan's recommendation - hopefully it is clearer now.

+ and complete bootstraps were potentially not being applied if the session restarted once fully downloaded. ([#7827](https://github.com/realm/realm-core/issues/7827), since 14.8.0)
Collaborator

I guess the data would be re-downloaded, right?


### Breaking changes
* None.
4 changes: 1 addition & 3 deletions src/realm/sync/client.cpp
@@ -850,7 +850,7 @@ bool SessionImpl::process_flx_bootstrap_message(const SyncProgress& progress, Do
}

try {
- process_pending_flx_bootstrap();
+ process_pending_flx_bootstrap(); // throws
}
catch (const IntegrationException& e) {
on_integration_failure(e);
@@ -869,8 +869,6 @@ void SessionImpl::process_pending_flx_bootstrap()
if (!m_is_flx_sync_session || m_state != State::Active) {
return;
}
- // Should never be called if session is not active
- REALM_ASSERT_EX(m_state == SessionImpl::Active, m_state);
Collaborator

this assertion didn't really make sense

auto bootstrap_store = m_wrapper.get_flx_pending_bootstrap_store();
if (!bootstrap_store->has_pending()) {
return;
12 changes: 11 additions & 1 deletion src/realm/sync/noinst/client_impl_base.cpp
@@ -1527,6 +1527,16 @@ void Session::cancel_resumption_delay()
if (unbind_process_complete())
initiate_rebind(); // Throws

+ try {
+ process_pending_flx_bootstrap(); // throws
+ }
+ catch (const IntegrationException& error) {
+ on_integration_failure(error);
Contributor

what happens if we have an integration failure here? I guess the client will just continue to resume the session but without applying the bootstrap?

Contributor Author

Yes - it's the same if there is an integration failure during activate()

+ }
+ catch (...) {
+ on_integration_failure(IntegrationException(exception_to_status()));
+ }

m_conn.one_more_active_unsuspended_session(); // Throws
if (m_try_again_activation_timer) {
m_try_again_activation_timer.reset();
@@ -1733,7 +1743,7 @@ void Session::activate()
m_conn.one_more_active_unsuspended_session(); // Throws

try {
- process_pending_flx_bootstrap();
+ process_pending_flx_bootstrap(); // throws
}
catch (const IntegrationException& error) {
on_integration_failure(error);
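
The pattern discussed in this file's review thread (run the pending bootstrap, route any failure to a handler, then carry on resuming the session) can be sketched in isolation. This is a minimal sketch, not the realm-core implementation: `ToyIntegrationError` and `apply_guarded` are hypothetical names, and converting unknown exceptions into a generic error is an assumption standing in for `IntegrationException(exception_to_status())`.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an integration error; not the realm-core type.
struct ToyIntegrationError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Calls `apply_bootstrap` and routes any exception to `on_failure` instead of
// letting it propagate, so the caller can continue with the rest of its
// resume or activate sequence.
// Returns true if `apply_bootstrap` completed without throwing.
bool apply_guarded(const std::function<void()>& apply_bootstrap,
                   const std::function<void(const ToyIntegrationError&)>& on_failure)
{
    try {
        apply_bootstrap(); // may throw
        return true;
    }
    catch (const ToyIntegrationError& e) {
        // Already the expected error type: report it unchanged.
        on_failure(e);
    }
    catch (const std::exception& e) {
        // Any other standard exception is wrapped, keeping its message.
        on_failure(ToyIntegrationError(e.what()));
    }
    catch (...) {
        // Non-standard exceptions carry no message to preserve.
        on_failure(ToyIntegrationError("unknown error"));
    }
    return false;
}
```

As in the answer given in the thread, a failure here does not stop the caller: the handler is notified and the session continues without the bootstrap having been applied.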
21 changes: 10 additions & 11 deletions src/realm/sync/noinst/client_impl_base.hpp
@@ -1550,17 +1550,16 @@ inline void ClientImpl::Session::initiate_rebind()
inline void ClientImpl::Session::reset_protocol_state() noexcept
{
// clang-format off
- m_enlisted_to_send = false;
- m_bind_message_sent = false;
- m_error_to_send = false;
- m_ident_message_sent = false;
- m_unbind_message_sent = false;
- m_unbind_message_send_complete = false;
- m_error_message_received = false;
- m_unbound_message_received = false;
- m_client_error = util::none;
-
- m_upload_progress = m_progress.upload;
+ m_enlisted_to_send = false;
Contributor

why did this all change?

Contributor Author

Reverted - I had reformatted the function when I was playing around with moving the bootstrap apply to this function.

+ m_bind_message_sent = false;
+ m_error_to_send = false;
+ m_ident_message_sent = false;
+ m_unbind_message_sent = false;
+ m_unbind_message_send_complete = false;
+ m_error_message_received = false;
+ m_unbound_message_received = false;
+ m_client_error = util::none;
+ m_upload_progress = m_progress.upload;
m_last_version_selected_for_upload = m_upload_progress.client_version;
m_last_download_mark_sent = m_last_download_mark_received;
// clang-format on
8 changes: 4 additions & 4 deletions test/object-store/sync/flx_role_change.cpp
@@ -776,9 +776,8 @@ TEST_CASE("flx: role changes during bootstrap complete successfully", "[sync][fl
}
SECTION("Role change during subscription bootstrap") {
auto realm_1 = Realm::get_shared_realm(config);
- /// TODO: update to:
- /// bool initial_subscription = GENERATE(false, true);
- bool initial_subscription = true;
+ bool initial_subscription = GENERATE(false, true);

if (initial_subscription) {
auto table = realm_1->read_group().get_table("class_Person");
auto role_col = table->get_column_key("role");
@@ -800,9 +799,10 @@ TEST_CASE("flx: role changes during bootstrap complete successfully", "[sync][fl
// The test will update the rule to change access from all records to only the employee
// records while a new subscription for all Person entries is being bootstrapped.
update_role(default_rule, {{"role", "employee"}});

- // Set up a new bootstrap while offline
realm_1->sync_session()->shutdown_and_wait();
{
+ // Set up a new bootstrap while offline
auto table = realm_1->read_group().get_table("class_Person");
auto new_subs = realm_1->get_latest_subscription_set().make_mutable_copy();
new_subs.clear();