msglist: Fetch more messages despite previous muted batch by sm-sayedi · Pull Request #1989 · zulip/zulip-flutter

sm-sayedi · 2025-11-15T07:57:18Z

Fixes: #1256

Rebased atop #2162.

Screen recordings

First batch - Empty

Before

Empty.first.batch.-.Before.mov

After

Empty.first.batch.-.After.mov

First batch - Less than a screenful

Before

First.batch.with.few.messages.-.Before.mov

After

First.batch.with.few.messages.-.After.mov

chrisbobbe

Thanks! Here's a review of the first three commits:

542996e msglist [nfc]: Mention fetchNewer in the MessageListView dartdoc
ce56456 msglist [nfc]: Add a new point to _MessageSequence.messages dartdoc
3c10bed msglist [nfc]: Remove one nested try block in _fetchMore

and a partial review of the fourth:

03d61ca msglist: Fetch newer messages despite previous muted batch

For that fourth commit, can you say briefly in the commit message why it's not a complete fix for the issue, to help orient the reader to what comes next?

chrisbobbe · 2025-11-21T00:08:05Z


      processResult(result);
+    } catch (e) {
+      hasFetchError = true;


msglist [nfc]: Remove one nested try block in `_fetchMore` The nested `try` block doesn't seem to be making any difference, so good to remove.

This isn't NFC: now, hasFetchError is about what happens when data is fetched and processed (with processResult), not just about what happens when data is fetched. That doesn't seem desirable, just from looking at its name.

Indeed, thanks for the catch. Dropped the commit.

chrisbobbe · 2025-11-21T01:09:37Z

+  /// The ID of the oldest known message so far in this narrow.
+  ///
+  /// This will be `null` if no messages of this narrow are fetched yet.
+  /// Having a non-null value for this doesn't always mean [haveOldest] is `true`.
+  ///
+  /// The related message may not appear in [messages] because it
+  /// is muted in one way or another.
+  int? get oldMessageId => _oldMessageId;


The ID of the oldest known message so far in this narrow.

There isn't necessarily a message with this ID in the narrow, though, right? In a quick skim, I don't see event-handling code to update _oldMessageId when the corresponding message is deleted or moved out of the narrow. We should avoid saying the message is in the narrow when it might not be.

We should avoid saying the message is in the narrow when it might not be.

Some nits in this revision:

The dartdoc summary line still says the inaccurate thing.

"A non-null value for this doesn't always mean [haveOldest] is true." → This is written to prevent a misunderstanding that I think isn't very likely; does that seem right? Maybe instead of this line, we can clarify more directly by making this getter's name more explicit, like oldestFetchedMessageId (which might be helpful in any case).

I'd like to remove "that's fine for what this is used for" because it can't be verified except by reading the code that consumes it. (An alternative would be to restrict its allowed usages by adding some words to the dartdoc.)

A few wording nits for clarity; see below.

Suggested change

/// The ID of the oldest known message so far in this narrow.

///

/// This will be `null` if no messages of this narrow are fetched yet.

/// Having a non-null value for this doesn't always mean [haveOldest] is `true`.

///

/// The related message may not appear in [messages] because it

/// is muted in one way or another.

int? get oldMessageId => _oldMessageId;

/// The ID of the oldest message fetched so far in this narrow.

///

/// This is used as the anchor for fetching the next batch of older messages

/// and will be `null` if no messages of this narrow have been fetched yet.

///

/// A message with this ID might not appear in [messages]:

/// - The message may be in a muted conversation.

/// - The message may have been moved or deleted after it was fetched.

int? get oldestFetchedMessageId => _oldestFetchedMessageId;

(Similarly for the corresponding newet-fetched-message code.)

gnprice

Thanks!

Similar to #1951 (review), here's some high-level comments before I go on vacation.

gnprice · 2025-11-26T00:39:14Z

        // TODO perhaps offer mark-as-read even when not done fetching?
-        MarkAsReadWidget(narrow: widget.narrow),
+        if (model.messages.isNotEmpty)
+          MarkAsReadWidget(narrow: widget.narrow),


What's the relationship of this change to the main changes happening in this commit? What's the user-visible change in behavior it causes?

So with the new changes in the commit, if the initial newest batch is all muted (model.messages.isEmpty), and then when older messages are being fetched, the "Mark as read" widget will be shown along with a progress indicator and no messages, so I thought it may be a good idea to hide "Mark as read" until there are visible messages populated.

Before (this one-line change) After (this one-line change)

Hmm, I see.

What does that situation look like before this branch / before even the first batch is fetched? I believe we show a different progress indicator (centered), and don't show the mark-read button.

Can we have that behavior continue when we've had some fetch requests finish, but haven't yet actually found any messages? I think I'd like to think of that state as equivalent to when the initial fetch is still going.

IOW I'd like to think of it as "still working on fetching messages", like in the doc comment on MessageListView.fetched (in main):

/// Whether [messages] and [items] represent the results of a fetch. /// /// This allows the UI to distinguish "still working on fetching messages" /// from "there are in fact no messages here". bool get fetched => switch (_status) { FetchingStatus.unstarted || FetchingStatus.fetchInitial => false, _ => true, };

Yeah, I think that will be an improvement. It will also avoid the abrupt change from a centered progress indicator to a bottom-aligned indicator.

gnprice · 2025-11-26T00:44:02Z

+  FetchingInitialStatus _fetchInitialStatus = .unstarted;
+  FetchingMoreStatus _fetchOlderStatus = .unstarted;
+  FetchingMoreStatus _fetchNewerStatus = .unstarted;


Another benefit of this change is that if there's a backoff in one direction, it will not affect the other one.

Hmm — that sounds like a problem, not a benefit. 🙂 The fetches are all going to the same server, so if one type of fetch has trouble that suggests we should hold off on our next requests, then the same need applies to requests at the other end of the list.

(In fact ideally we'd be sharing backoff information across all requests on a given account. See also #946. But these are potentially some of the more frequent requests, so it's good that we at least share it among these.)

More broadly:

But imagine if the number of messages in the initial batch occupies less than a screenful, and then `fetchOlder` returns no messages or a few messages that combined with the initial batch messages are still less than a screenful; in that case, there will be no change in the scroll metrics to call `fetchNewer`.

This change feels like the wrong layer for solving this problem.

When the fetchOlder returns, that will notify the view-model's listeners, right? What if we have the _MessageListState react to that by looking to see if more fetching is needed, and calling fetchOlder/fetchNewer if so?

IOW, we could have its _modelChanged method call the same logic that's at the end of _handleScrollMetrics. Probably that means pulling that part out as its own smaller helper method.

gnprice · 2025-11-26T00:54:05Z

+    } while (visibleMessageCount < kMessageListFetchBatchSize / 2
+             && this.generation == generation);


If we have the widget state react in the way I suggested in my last comment just now, does that also let us skip these loops? We'd have the state retain responsibility for taking the initiative to call fetchOlder and fetchNewer when potentially needed.

Yeah, we can do that; removed them in the new revision.

sm-sayedi · 2026-01-09T21:29:45Z

Thanks @chrisbobbe and @gnprice for the previous reviews. Pushed a new revision, PTAL.

sm-sayedi · 2026-01-09T21:33:56Z

+      check(messageListItemCount(tester)).equals(501);
    });

    testWidgets('observe double-fetch glitch', (tester) async {


This test case is now very similar to the test case above ("basic") because of the new double-fetch glitch in _modelChanged. Should we remove this one then?

sm-sayedi · 2026-01-09T21:41:37Z

+      check(messageListItemCount(tester)).equals(2 + 2 + 98);
+    });
+
+    testWidgets('mid-history and fetch-older with too few messages, fetch-newer request is made', (tester) async {


This test is for handling an edge case, the handling of which required a lot of changes in the first revision of this PR, before Greg suggested an alternative, straightforward method of solving that problem. I explained that edge case in the description of a commit message, which is not included now.

This change is necessary when there is a need to fetch more messages in both directions, older and newer, and when fetching in one direction avoids fetching in another direction at the same time, because of the `if (busyFetchingMore) return` line in both `fetchOlder` and `fetchNewer`. This scenario happens when a conversation is opened in its first unread, such that the number of messages in the initial batch is so low (because they're muted in one way or another) that it's already past the certain point where the scroll metrics listener in the widget code triggers both `fetchOlder` and `fetchNewer`. In 2025-11, that code first calls `fetchOlder` then `fetchNewer`, and for the reason mentioned above, `fetchNewer` will not fetch any new messages. But that's fine, because as soon as older messages from `fetchOlder` arrives, there will be a change in the scroll metrics, so `fetchNewer` will be called again, fetching new messages. But imagine if the number of messages in the initial batch occupies less than a screenful, and then `fetchOlder` returns no messages or a few messages that combined with the initial batch messages are still less than a screenful; in that case, there will be no change in the scroll metrics to call `fetchNewer`. With the three fetch request types being separated, especially the two request types for older and newer messages, each direction can fetch more messages independently without interfering with one another.

chrisbobbe

Thanks! Comments below, from reading everything except the tests.

chrisbobbe · 2026-01-24T00:18:47Z

-      // TODO: This ends up firing a second time shortly after we fetch a batch.
-      //   The result is that each time we decide to fetch a batch, we end up
-      //   fetching two batches in quick succession.  This is basically harmless
-      //   but makes things a bit more complicated to reason about.
-      //   The cause seems to be that this gets called again with maxScrollExtent
-      //   still not yet updated to account for the newly-added messages.
+      // This ends up firing a second time shortly after we fetch a batch while
+      // there is a fling going on.
+      // The result is that each time we decide to fetch a batch, we end up
+      // fetching two batches in quick succession. This is basically harmless
+      // but makes things a bit more complicated to reason about.
+      // The cause is that this gets called again with minScrollExtent
+      // still not yet updated to account for the newly-added messages.
+      // This relates to how [SchedulerBinding] executes different tasks when
+      // producing a new frame, like first executing transient callbacks
+      // (typically ticking animations) followed by persistent callbacks
+      // (typically the build/layout/paint pipeline), and so on.
+      // So when there is a new message batch received, the related widgets are
+      // marked dirty for the next frame. With the ongoing fling, the underlying
+      // animation registers transient callback(s) for [ScrollPosition.setPixels]
+      // to be executed in the transient callbacks phase at the start of
+      // the frame. It will then notify its listeners, eventually calling
+      // `_scrollChanged` and in turn current method with old minScrollExtent,
+      // causing the second batch fetch. Then in the persistent callbacks phase,
+      // minScrollExtent will be updated, effective in the next frame.


msglist [nfc]: Explain the reasoning behind the double-fetch glitch

Ideally we'd fix the glitch, right? (The old comment is a TODO, saying "This is basically harmless but makes things a bit more complicated to reason about.")

I haven't digested the new comment yet (I'm kind of getting lost in it), but can we use the details you've learned to fix the glitch? In that case I think a new issue is a better place to put them, and we can just point the existing TODO to that issue (and fix anything that's wrong; I see you've done s/maxScrollExtent/minScrollExtent/ which might have been intentional).

Filed #2104.

chrisbobbe · 2026-01-24T00:49:12Z

    return findScrollView(tester).controller;
  }

+  int? messageListItemCount(WidgetTester tester) =>


msglist test [nfc]: Move a few helpers to the top, for reusability These are: itemCount, findPlaceholder, and findLoadingIndicator. Also, rename itemCount to messageListItemCount, to make it easier what it is about in the upcoming tests.

Commit-message nit: I think are some missing words in that last sentence?

I’m not sure about the missing words — the phrasing may have been unclear. Reworded it a little; hope that’s better now 🙂

Also, rename itemCount to messageListItemCount, to make it clearer what it refers to in the upcoming tests.

chrisbobbe · 2026-01-24T01:31:19Z

+  /// The ID of the oldest known message so far in this narrow.
+  ///
+  /// This will be `null` if no messages of this narrow are fetched yet.
+  /// Having a non-null value for this doesn't always mean [haveOldest] is `true`.
+  ///
+  /// The related message may not appear in [messages] because it
+  /// is muted in one way or another.
+  int? get oldMessageId => _oldMessageId;


We should avoid saying the message is in the narrow when it might not be.

Some nits in this revision:

The dartdoc summary line still says the inaccurate thing.

"A non-null value for this doesn't always mean [haveOldest] is true." → This is written to prevent a misunderstanding that I think isn't very likely; does that seem right? Maybe instead of this line, we can clarify more directly by making this getter's name more explicit, like oldestFetchedMessageId (which might be helpful in any case).

I'd like to remove "that's fine for what this is used for" because it can't be verified except by reading the code that consumes it. (An alternative would be to restrict its allowed usages by adding some words to the dartdoc.)

A few wording nits for clarity; see below.

Suggested change

/// The ID of the oldest known message so far in this narrow.

///

/// This will be `null` if no messages of this narrow are fetched yet.

/// Having a non-null value for this doesn't always mean [haveOldest] is `true`.

///

/// The related message may not appear in [messages] because it

/// is muted in one way or another.

int? get oldMessageId => _oldMessageId;

/// The ID of the oldest message fetched so far in this narrow.

///

/// This is used as the anchor for fetching the next batch of older messages

/// and will be `null` if no messages of this narrow have been fetched yet.

///

/// A message with this ID might not appear in [messages]:

/// - The message may be in a muted conversation.

/// - The message may have been moved or deleted after it was fetched.

int? get oldestFetchedMessageId => _oldestFetchedMessageId;

(Similarly for the corresponding newet-fetched-message code.)

chrisbobbe · 2026-01-24T01:41:40Z

+  /// This makes sure there is at least one non-muted message fetched; if any.
+  /// It may do so my repeatedly calling [fetchOlder] and [fetchNewer].


Suggested change

/// This makes sure there is at least one non-muted message fetched; if any.

/// It may do so my repeatedly calling [fetchOlder] and [fetchNewer].

/// If the results don't include at least one non-muted message,

/// this will call [fetchOlder] and/or [fetchNewer]

/// until one is found or the narrow's oldest and newest messages are reached.

chrisbobbe · 2026-01-24T01:52:07Z

+  /// This also may or may not represent all the message history that
+  /// conceptually belongs in this narrow because some messages might be
+  /// muted in one way or another and they may not appear in the message list.


How about:

Suggested change

/// This also may or may not represent all the message history that

/// conceptually belongs in this narrow because some messages might be

/// muted in one way or another and they may not appear in the message list.

/// Also, messages may be excluded if they are in muted conversations.

chrisbobbe · 2026-01-24T01:57:28Z

+    while (messages.isEmpty && !(haveOldest && haveNewest)) {
+      await fetchOlder(partOfInitialFetch: true);
+      if (messages.isNotEmpty) break;
+      await fetchNewer(partOfInitialFetch: true);
+    }


We probably also want to exit the loop if fetchNewer finds some non-muted messages.

I think that’s already covered by the loop condition — it breaks as soon as messages aren’t empty 🙂 But if that’s good for explicitness, I’ll be happy to add it.

Ah yeah you're right, that makes sense.

chrisbobbe · 2026-01-24T02:10:05Z

-      if (this.generation == generation) {
+      if (this.generation == generation && !partOfInitialFetch) {


In the initial-fetch case, if one of these fetch-older/fetch-newer requests fails, I think we still want to use BackoffMachine to delay the next attempt, right?

When writing this, I was thinking about using backoff, but then looking at #945 (PR #1050) where the backoff logic was introduced, the motivation there was to avoid the retry storm caused by a failed fetchOlder (or fetchNewer) request in the widgets code, specifically by the listener for the scroll state. In this (initial-fetch) case though, there is no code path (at least as of now) to cause a retry storm. The while loop will break promptly whenever one of the fetch-older or fetch-newer requests fails. So I think handling of this is similar to the case when the initial-fetch request fails. How about adding a TODO pointing to #2085?

chrisbobbe · 2026-01-24T02:26:07Z

+      // This relates to `_modelChanged` (and in turn the current method) being
+      // called right away as a listener of the model, before the next frame
+      // being drawn with the newly-added messages, updating minScrollExtent.


Interesting. If we want this condition check to run after a frame is drawn with the newly-added messages, how about putting this model.fetchOlder call in a SchedulerBinding.instance.addPostFrameCallback? I wonder if that might actually be the fix for both double-fetch glitches?

Yeah, addPostFrameCallback can avoid these glitches. Fix included in the new revision.

sm-sayedi · 2026-01-27T21:24:42Z

Thanks Chris for the review. Pushed new changes, PTAL.

chrisbobbe

Thanks! A few comments below, including one that should help reduce complexity. 🙂

chrisbobbe · 2026-02-06T21:41:41Z

-        if (itemCount(tester)! > 101) break; // The loading indicator appeared.
+        if (connection.takeRequests().singleOrNull != null) break;


Would .isNotEmpty be equivalent to .singleOrNull != null here?

chrisbobbe · 2026-02-06T21:45:21Z

    });

-    testWidgets('observe double-fetch glitch', (tester) async {
+    testWidgets('do not observe double-fetch glitch', (tester) async {


Let's add a comment like

// Regression test for: https://github.com/zulip/zulip-flutter/issues/2104

at the top of the test body.

chrisbobbe · 2026-02-06T21:51:58Z

+    while (messages.isEmpty && !(haveOldest && haveNewest)) {
+      await fetchOlder(partOfInitialFetch: true);
+      if (messages.isNotEmpty) break;
+      await fetchNewer(partOfInitialFetch: true);
+    }


Ah yeah you're right, that makes sense.

chrisbobbe · 2026-02-06T22:26:27Z

+    if (model.messages.isEmpty && model.haveOldest && model.haveNewest) {
      // If the fetch came up empty, there's nothing to read,
      // so opening the keyboard won't be bothersome and could be helpful.
      // It's definitely helpful if we got here from the new-DM page.
      MessageListPage.ancestorOf(context)
        .composeBoxState?.controller.requestFocusIfUnfocused();
    }
-    _prevFetched = model.fetched;


Interesting; I agree that it does seem better to not "request-focus-if-unfocused" except when we know there aren't any messages to show in the narrow's entire history.

It looks like the condition on _prevFetched was to make sure this "autofocus" behavior was only triggered once per message list. With that removed, I think that means we're allowing an "autofocus" multiple times: for example, each time addOutboxMessage is called during a time when the whole history is fetched and there are no visible messages. I think we probably don't want that? If you'd like to make a change here, can you do it in separate commit(s)?

Possibly the stateful variable should have different semantics, for example with a name like _hasAutofocused rather than trying to make it about the state of message fetches.

Actually, this change was necessary in one of the first revisions, where there was no loop in the fetchInitial method for finding at least one non-muted message. Instead, at that time, we relied on fetchOlder/Newer to find one. In that case, when the initial batch came up empty, the compose box would autofocus even though technically there were more messages to fetch. But right now where we have the loop in fetchInitial, model.fetched would become true if either there is at least one message or there are no messages in the history at all. So we'll not autofocus unnecessarily.

But if the change still seems safer, I'll be happy to add it.

I think something like this could be helpful, but done in a separate commit (before or after):

if (model.messages.isEmpty && model.haveNewest && model.haveOldest && !_hasAutofocused) { // If there are no messages to show in the whole history, // opening the keyboard won't be bothersome and could be helpful. // It's definitely helpful if we got here from the new-DM page. MessageListPage.ancestorOf(context) .composeBoxState?.controller.requestFocusIfUnfocused(); _hasAutofocused = true; }

chrisbobbe · 2026-02-07T00:52:14Z

+    if (model.fetched) {
+      SchedulerBinding.instance.addPostFrameCallback((_) {
+        if (scrollController.hasClients) {
+          // This is needed here in order to fetch more messages if the
+          // visible (non-muted) messages of the previous batch combined
+          // with the already fetched messages are less than a screenful.
+          // In such case, there will be no change in the scroll metrics to
+          // fire a notification that will fetch more messages.
+          // Also, this should be called post frame, otherwise it will end up
+          // fetching an additional batch shortly after we fetch a batch,
+          // even if we don't need the additinal batch at the moment.
+          // The cause is that this will be called again with `ScrollMetrics`
+          // still not yet updated to account for the newly-added messages.
+          // This relates to `_modelChanged` called right away as a listener of
+          // the model, before the next frame being drawn with the newly-added
+          // messages, updating `ScrollMetrics`.
+          _fetchMoreIfNeeded(scrollController.position);
+        }
+      });
+    }


Since _fetchMoreIfNeeded causes an assert(fetched) (in fetchOlder/fetchNewer), it should run synchronously after the model.fetched check, without an async gap in between. The fetched flag can change from true to false, e.g. when MessageListView.renarrowAndFetch is called. I'm not sure if the post-frame callback could actually realistically race with that happening, but better to write the code so that's more clearly not a class of possible bugs.

Also here's a draft of a comment that highlights the fetch-in-loop behavior caused by this code, and I think reads a little more smoothly:

SchedulerBinding.instance.addPostFrameCallback((_) { if (!model.fetched || !scrollController.hasClients) { return; } // If fetchInitial or fetchOlder/fetchNewer // haven't filled model.messages with any visible (i.e. unmuted) messages, // or anyway not enough to fill the screen, fetch again. // If we're in a long run of muted messages, this has the effect of // fetching in a loop until we've either fetched the narrow's whole history // or we've filled the screen with visible messages, // without needing user scroll input between iterations. // // The right time for the "if-needed" check is // after the current model change has been laid out // and the scroll metrics have been updated, // so that when we receive and lay out a screenful of messages, // we don't fetch again unnecessarily. // That's why we do it in a post-frame callback. _fetchMoreIfNeeded(scrollController.position); });

Thanks, I like this version. It’s safer to have model.fetched check inside addPostFrameCallback.

chrisbobbe · 2026-02-07T01:02:10Z

  /// Fetch messages, starting from scratch.
+  ///
+  /// If the results don't include at least one non-muted message,
+  /// this will call [fetchOlder] and/or [fetchNewer]
+  /// until one is found or the narrow's oldest and newest messages are reached.
  Future<void> fetchInitial() async {


Thinking more about this, does fetchInitial actually need to be changed at all in order to fix the issue? I think it doesn't, and we can do without the added complexity here. In the case where the first (i.e. "initial") fetch comes back with no visible messages, we'll still fetch and refetch appropriately because of the _fetchMoreIfNeeded call in the widget code's _modelChanged callback.

It's true that fetchOlder and fetchNewer used to have asserts that messages was not empty, and I guess the loop in fetchInitial would've helped satisfy those. But I think those asserts were only needed in a world where we chose the anchor for fetchOlder/fetchNewer by looking at the messages list itself. Now that we use these new oldestFetchedMessageId/newestFetchedMessageId fields to choose the anchor, we can cheerfully assert on those instead, as it looks like you've already done, making fetchOlder and fetchNewer cheerfully accept the case where messages is empty.

Yeah, removing the loop inside fetchInitial will certainly simplify the code, but it will have some effects on the UI, for example, if the initial batch has no visible messages, the loading indicator in the middle will suddenly move to the bottom. This change was suggested in #1989 (comment) with the initial motivation for it mentioned in there.

I see; yeah, I think the simplification is worth it and that effect on the UI is OK.

chrisbobbe · 2026-02-07T01:14:48Z

+        // `foundOldest: true` just to avoid having to prepare a fetch-older
+        // response when the model notifies its listeners.


I think "when the model notifies its listeners" sounds pretty opaque. It's about an implementation detail of the fetch loop that's intended to fill the screen with messages, right? How about:

// `foundOldest: true` just to avoid having to prepare a fetch-older // response; the message list would otherwise refetch when it notices // it doesn't have a screenful of messages.

sm-sayedi · 2026-02-07T05:27:47Z

Thanks for the review. New revision pushed.

chrisbobbe · 2026-02-09T21:56:43Z

Thanks! As mentioned above, please go ahead with the fetchInitial simplification discussed at #1989 (comment) , otherwise this LGTM (with my comment at #1989 (comment) as an optional independent improvement, I think).

sm-sayedi · 2026-02-10T04:01:34Z

Thanks! Changes pushed.

(The CI failure seems unrelated and doesn't reproduce locally.)

chrisbobbe · 2026-02-13T00:55:02Z

Great, thanks! LGTM; marking for Greg's review.

gnprice

Thanks @sm-sayedi for fixing these issues, and @chrisbobbe for the previous reviews!

Comments below up through fixing the double-fetch glitch #2104:
9f0f456 msglist [nfc]: Mention fetchNewer in the MessageListView dartdoc
25886d9 msglist [nfc]: Add a new point to _MessageSequence.messages dartdoc
ef20b6d msglist [nfc]: Point the TODO about the double-fetch glitch to an issue
bb83ec0 msglist test: Reproduce the double-fetch glitch
3c4cd60 msglist [nfc]: Extract the "fetch more messages" code in a new method
13678fe msglist: Fix the double-fetch glitch

Let's split out the #2104 part to its own separate PR. It feels like several different things are happening in this PR, and splitting it up will help with managing review. I believe that means these commits:
ef20b6d msglist [nfc]: Point the TODO about the double-fetch glitch to an issue
bb83ec0 msglist test: Reproduce the double-fetch glitch
3c4cd60 msglist [nfc]: Extract the "fetch more messages" code in a new method
13678fe msglist: Fix the double-fetch glitch

gnprice · 2026-02-13T22:57:23Z

  /// That information is expressed in [fetched], [haveOldest], [haveNewest].
  ///
+  /// Also, messages may be excluded if they are in muted conversations.
+  ///


This added line seems out of place. If messages are excluded on account of muting, then that means they don't belong in this list, even conceptually.

gnprice · 2026-02-13T23:01:09Z

-    }, skip: true); // TODO this still reproduces manually, still needs debugging,
-                    // but has become harder to reproduce in a test.


Cool — happy to see this glitch diagnosed and reproduced.

gnprice · 2026-02-13T23:03:37Z

+      // On the next frame, we promptly fetch *a second* batch.
      // This is a glitch and it'd be nicer if we didn't.
      connection.prepare(json: eg.olderGetMessagesResult(anchor: 850, foundOldest: false,
        messages: List.generate(100, (i) => eg.streamMessage(id: 750 + i, sender: eg.selfUser))).toJson());
-      await tester.pump(const Duration(milliseconds: 1));
+      // Allow a delayed frame for the response of the first batch to arrive.
+      await tester.pump(Duration(milliseconds: 1));
+      check(itemCount(tester)).equals(401);


Let's also insert a check that verifies at what step here the second fetch request gets made.

Also, the story here seems a bit ambiguous:

The comment at the start of the range I've highlighted says the second fetch is "on the next frame" after the first fetch starts.

But then the logic at the end of this range seems to be that we wait all the way until the first fetch finishes. This happens to also be the first frame after it starts.

If there are several frames between the first fetch starting and ending (which is going to be the typical case — most of these fetches take much longer than a millisecond), then those two descriptions give different predictions for when the second fetch happens. Which one is the behavior that occurs?

By “on the next frame” I meant the next frame in the test code (which is the delayed pump after which the first batch arrives). And in real scenarios, the second fetch starts just after the first one finishes, not after the first frame post first fetch start. Changed that comment to reflect this fact.

gnprice · 2026-02-13T23:10:43Z

+    if (SchedulerBinding.instance.schedulerPhase == .transientCallbacks) {
+      SchedulerBinding.instance.addPostFrameCallback((_) {


Can you explain why this solves the problem (the double-fetch glitch)? And why .transientCallbacks; are there potentially any other phases where we should do the same thing?

Yeah, so as explained in #2104 description, double-fetch was used to happen whenever the response for a fetch-older request arrived while there was a fling going on in the message list. The fling (which is driven by an animation) causes the scroll position to change, but that change happens in .transientCallbacks phase (the phase where the animations tick, in this case, the animation for the fling). The change in the scroll position eventually calls _handleScrollMetrics (in the transientCallbacks phase) with old scroll metrics (specifically minScrollExtent), which causes a double fetch glitch. Then in the .persistentCallbacks phase, where the message list view is laid out with the newly arrived messages, minScrollExtent will be updated to account for the new messages. The updated minScrollExtent value will then be available in .postFrameCallbacks phase, so that's why we wrap _fetchMoreIfNeeded in addPostFrameCallback. And the check for .transientCallbacks is there because in this specific case, the double-fetch glitch is caused by _handleScrollMetrics being called in .transientCallbacks phase.

are there potentially any other phases where we should do the same thing?

Right now, _handleScrollMetrics is called by _scrollChanged and _handleScrollMetricsNotification.

_scrollChanged is called through ScrollPosition.setPixels. Its dartdoc says:
"This should only be called by the current [ScrollActivity], either during the transient callback phase or in response to user input." So it's called either during .transientCallbacks or .idle (user input handlers are executed here) phase.

_handleScrollMetricsNotification is called through NotificationListener.onNotification. Its dartdoc says: "Notifications vary in terms of when they are dispatched. There are two main possibilities: dispatch between frames, and dispatch during layout." So it's called either during .idle or .persistentCallbacks phase. _handleScrollMetricsNotification is specifically called on ScrollMetricsNotification, which in turn is only dispatched in .idle phase — see ScrollPosition.didUpdateScrollMetrics and its call site.

So besides .transientCallbacks, .idle is the only other phase where _handleScrollMetrics could be called. Should we also check for the .idle phase, given that we still haven't experienced any double-fetch glitch caused by it?

gnprice · 2026-02-13T23:13:07Z

-      model.fetchNewer();
+    if (SchedulerBinding.instance.schedulerPhase == .transientCallbacks) {
+      SchedulerBinding.instance.addPostFrameCallback((_) {
+        if (scrollController.hasClients) {


What's the reasoning for this condition?

In this case I think it doesn't really make any difference, but it's a safety check. Maybe there is a possibility that scrollController can lose clients post frame; I'm not sure. The same check is necessary for another addPostFrameCallback in one of the next commits, though, so I thought it would be a good idea to add it here too.

gnprice · 2026-02-13T23:14:15Z

    });

-    testWidgets('observe double-fetch glitch', (tester) async {
+    testWidgets('do not observe double-fetch glitch', (tester) async {


nit: tighten

Suggested change

testWidgets('do not observe double-fetch glitch', (tester) async {

testWidgets('no double-fetch glitch', (tester) async {

gnprice · 2026-02-13T23:15:55Z

+
+      // Check there is no double-fetch glitch to fetch another batch.
+      check(connection.takeRequests().lastOrNull).isNull();
+      check(messageListItemCount(tester)).equals(401);


This check is redundant with the one a few lines above; nothing's changed in between them.

gnprice · 2026-02-13T23:19:30Z

+      // Check there is no double-fetch glitch to fetch another batch.
+      check(connection.takeRequests().lastOrNull).isNull();


This doesn't quite reassure me that there isn't a double fetch; it feels like with this check there still might be a second fetch coming shortly after the end of the test body.

A fix:

Suggested change

// Check there is no double-fetch glitch to fetch another batch.

check(connection.takeRequests().lastOrNull).isNull();

// Check there is no double-fetch glitch to fetch another batch.

await tester.pumpAndSettle();

check(connection.takeRequests().lastOrNull).isNull();

Generally we avoid pumpAndSettle, because it's imprecise about the behavior. But in a negative context like this, imprecise is fine and even good: we want to be sure there's no fetch now, no fetch next frame, and no fetch however far into the future things continue to happen.

sm-sayedi · 2026-02-19T20:29:45Z

Thanks @gnprice for the review. Revision pushed, PTAL.

Opened a new PR (#2162) for those four commits about fixing the double-fetch glitch and rebased this PR atop that.

gnprice · 2026-03-03T01:54:38Z

Sorry this has lingered a bit. My attention has been pretty occupied the past couple of weeks with getting end-to-end-encrypted notifications (#1764) into shape, since that's a big project and also has an upcoming deadline — we want to get it out for the next server release, #release management > 12.0 release @ 💬.

Hopefully that'll be more settled down within the next week or two. Because the logic involved in this PR is somewhat tricky, I think the next time I'll find the attention again to properly review it is after that.

sm-sayedi · 2026-03-03T05:20:26Z

Thanks for the update, and no worries at all! Looking forward to #1764 being completed.

I am also working on the upstream PR for #1100, and it's taking a little more time than expected, maybe because it's my first upstream PR 😀 So I'll take this opportunity and have complete focus on that.

Fixes: zulip#2104

These are: itemCount, findPlaceholder, and findLoadingIndicator. Also, rename itemCount to messageListItemCount, to make it clearer what it refers to in the upcoming tests.

Fixes: zulip#1256

Previously, if the initial batch came up empty, then the history was considered empty too. But now, the initial batch could be empty because of the messages in muted conversations while there may still be other messages in history.

sm-sayedi force-pushed the 1256-fetch-older-fails branch 5 times, most recently from f920082 to a53cbfa Compare November 20, 2025 19:20

sm-sayedi marked this pull request as ready for review November 20, 2025 19:23

sm-sayedi force-pushed the 1256-fetch-older-fails branch from a53cbfa to ef1fcc3 Compare November 20, 2025 19:50

sm-sayedi added the maintainer review PR ready for review by Zulip maintainers label Nov 20, 2025

sm-sayedi requested a review from chrisbobbe November 20, 2025 19:51

sm-sayedi assigned chrisbobbe Nov 20, 2025

sm-sayedi force-pushed the 1256-fetch-older-fails branch from ef1fcc3 to 127f690 Compare November 20, 2025 20:02

chrisbobbe reviewed Nov 21, 2025

View reviewed changes

gnprice reviewed Nov 26, 2025

View reviewed changes

sm-sayedi force-pushed the 1256-fetch-older-fails branch from 127f690 to c43eaad Compare January 9, 2026 20:28

sm-sayedi changed the title ~~msglist: Fetch newer messages despite previous muted batch~~ msglist: Fetch more messages despite previous muted batch Jan 9, 2026

sm-sayedi force-pushed the 1256-fetch-older-fails branch 2 times, most recently from 958fe11 to 8c76491 Compare January 9, 2026 21:24

sm-sayedi requested a review from chrisbobbe January 9, 2026 21:29

sm-sayedi commented Jan 9, 2026

View reviewed changes

sm-sayedi force-pushed the 1256-fetch-older-fails branch from 8c76491 to d0f5c23 Compare January 23, 2026 04:08

chrisbobbe reviewed Jan 24, 2026

View reviewed changes

sm-sayedi mentioned this pull request Jan 27, 2026

msglist: Fix the double-fetch glitch #2104

Open

sm-sayedi force-pushed the 1256-fetch-older-fails branch 2 times, most recently from 64c0a6d to fb2b549 Compare January 27, 2026 21:17

sm-sayedi requested a review from chrisbobbe January 27, 2026 21:24

chrisbobbe reviewed Feb 7, 2026

View reviewed changes

sm-sayedi force-pushed the 1256-fetch-older-fails branch from fb2b549 to 631245d Compare February 7, 2026 05:26

sm-sayedi requested a review from chrisbobbe February 7, 2026 05:28

sm-sayedi force-pushed the 1256-fetch-older-fails branch from 631245d to bc56fea Compare February 10, 2026 03:59

sm-sayedi force-pushed the 1256-fetch-older-fails branch from bc56fea to a6f8c27 Compare February 10, 2026 04:47

chrisbobbe requested a review from gnprice February 13, 2026 00:55

chrisbobbe assigned gnprice and unassigned chrisbobbe Feb 13, 2026

chrisbobbe added integration review Added by maintainers when PR may be ready for integration and removed maintainer review PR ready for review by Zulip maintainers labels Feb 13, 2026

gnprice reviewed Feb 13, 2026

View reviewed changes

sm-sayedi force-pushed the 1256-fetch-older-fails branch from a6f8c27 to ac16e96 Compare February 19, 2026 20:27

sm-sayedi requested a review from gnprice February 19, 2026 20:29

gnprice mentioned this pull request Mar 3, 2026

msglist: Fix the double-fetch glitch #2162

Open

sm-sayedi force-pushed the 1256-fetch-older-fails branch from ac16e96 to 66a57b6 Compare March 12, 2026 16:08

sm-sayedi force-pushed the 1256-fetch-older-fails branch from 66a57b6 to 8f66810 Compare April 8, 2026 05:27

sm-sayedi added 8 commits May 8, 2026 11:29

msglist [nfc]: Point the TODO about the double-fetch glitch to an issue

87b5a2d

msglist test: Reproduce the double-fetch glitch

a55e7ba

msglist [nfc]: Extract the "fetch more messages" code in a new method

104e7f3

msglist: Fix the double-fetch glitch

458d216

Fixes: zulip#2104

msglist [nfc]: Mention fetchNewer in the MessageListView dartdoc

d3650dd

msglist test [nfc]: Move a few helpers up to the top, for reusability

296c089

These are: itemCount, findPlaceholder, and findLoadingIndicator. Also, rename itemCount to messageListItemCount, to make it clearer what it refers to in the upcoming tests.

msglist: Fetch more messages despite previous muted batch

e2f03c2

Fixes: zulip#1256

sm-sayedi force-pushed the 1256-fetch-older-fails branch from 8f66810 to 2ad98f2 Compare May 8, 2026 07:00

-  /// The ID of the oldest known message so far in this narrow.
-  ///
-  /// This will be `null` if no messages of this narrow are fetched yet.
-  /// Having a non-null value for this doesn't always mean [haveOldest] is `true`.
-  ///
-  /// The related message may not appear in [messages] because it
-  /// is muted in one way or another.
-  int? get oldMessageId => _oldMessageId;
+  /// The ID of the oldest message fetched so far in this narrow.
+  ///
+  /// This is used as the anchor for fetching the next batch of older messages
+  /// and will be `null` if no messages of this narrow have been fetched yet.
+  ///
+  /// A message with this ID might not appear in [messages]:
+  /// - The message may be in a muted conversation.
+  /// - The message may have been moved or deleted after it was fetched.
+  int? get oldestFetchedMessageId => _oldestFetchedMessageId;

		} while (visibleMessageCount < kMessageListFetchBatchSize / 2
		&& this.generation == generation);

		/// This makes sure there is at least one non-muted message fetched; if any.
		/// It may do so my repeatedly calling [fetchOlder] and [fetchNewer].

		if (this.generation == generation) {
		if (this.generation == generation && !partOfInitialFetch) {

		if (itemCount(tester)! > 101) break; // The loading indicator appeared.
		if (connection.takeRequests().singleOrNull != null) break;

		// `foundOldest: true` just to avoid having to prepare a fetch-older
		// response when the model notifies its listeners.

		}, skip: true); // TODO this still reproduces manually, still needs debugging,
		// but has become harder to reproduce in a test.

		if (SchedulerBinding.instance.schedulerPhase == .transientCallbacks) {
		SchedulerBinding.instance.addPostFrameCallback((_) {

	testWidgets('do not observe double-fetch glitch', (tester) async {
	testWidgets('no double-fetch glitch', (tester) async {

		// Check there is no double-fetch glitch to fetch another batch.
		check(connection.takeRequests().lastOrNull).isNull();

Conversation

sm-sayedi commented Nov 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

First batch - Empty

Before

After

First batch - Less than a screenful

Before

After

Uh oh!

chrisbobbe left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gnprice left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sm-sayedi Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sm-sayedi commented Jan 9, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sm-sayedi Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chrisbobbe left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sm-sayedi Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sm-sayedi commented Nov 15, 2025 •

edited

Loading

chrisbobbe left a comment •

edited

Loading

sm-sayedi Nov 26, 2025 •

edited

Loading

sm-sayedi Jan 9, 2026 •

edited

Loading

sm-sayedi Jan 27, 2026 •

edited

Loading

chrisbobbe Feb 9, 2026 •

edited

Loading