Skip to content

msglist: Fetch more messages despite previous muted batch#1989

Open
sm-sayedi wants to merge 8 commits into
zulip:mainfrom
sm-sayedi:1256-fetch-older-fails
Open

msglist: Fetch more messages despite previous muted batch#1989
sm-sayedi wants to merge 8 commits into
zulip:mainfrom
sm-sayedi:1256-fetch-older-fails

Conversation

@sm-sayedi
Copy link
Copy Markdown
Collaborator

@sm-sayedi sm-sayedi commented Nov 15, 2025

Fixes: #1256

Rebased atop #2162.

Screen recordings

First batch - Empty

Before

Empty.first.batch.-.Before.mov

After

Empty.first.batch.-.After.mov

First batch - Less than a screenful

Before

First.batch.with.few.messages.-.Before.mov

After

First.batch.with.few.messages.-.After.mov

@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch 5 times, most recently from f920082 to a53cbfa Compare November 20, 2025 19:20
@sm-sayedi sm-sayedi marked this pull request as ready for review November 20, 2025 19:23
@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from a53cbfa to ef1fcc3 Compare November 20, 2025 19:50
@sm-sayedi sm-sayedi added the maintainer review PR ready for review by Zulip maintainers label Nov 20, 2025
@sm-sayedi sm-sayedi requested a review from chrisbobbe November 20, 2025 19:51
@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from ef1fcc3 to 127f690 Compare November 20, 2025 20:02
Copy link
Copy Markdown
Collaborator

@chrisbobbe chrisbobbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Here's a review of the first three commits:

542996e msglist [nfc]: Mention fetchNewer in the MessageListView dartdoc
ce56456 msglist [nfc]: Add a new point to _MessageSequence.messages dartdoc
3c10bed msglist [nfc]: Remove one nested try block in _fetchMore

and a partial review of the fourth:

03d61ca msglist: Fetch newer messages despite previous muted batch

For that fourth commit, can you say briefly in the commit message why it's not a complete fix for the issue, to help orient the reader to what comes next?

Comment thread lib/model/message_list.dart Outdated

processResult(result);
} catch (e) {
hasFetchError = true;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

msglist [nfc]: Remove one nested try block in `_fetchMore`

The nested `try` block doesn't seem to be making any difference,
so good to remove.

This isn't NFC: now, hasFetchError is about what happens when data is fetched and processed (with processResult), not just about what happens when data is fetched. That doesn't seem desirable, just from looking at its name.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, thanks for the catch. Dropped the commit.

Comment thread lib/model/message_list.dart Outdated
Comment on lines +148 to +155
/// The ID of the oldest known message so far in this narrow.
///
/// This will be `null` if no messages of this narrow are fetched yet.
/// Having a non-null value for this doesn't always mean [haveOldest] is `true`.
///
/// The related message may not appear in [messages] because it
/// is muted in one way or another.
int? get oldMessageId => _oldMessageId;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ID of the oldest known message so far in this narrow.

There isn't necessarily a message with this ID in the narrow, though, right? In a quick skim, I don't see event-handling code to update _oldMessageId when the corresponding message is deleted or moved out of the narrow. We should avoid saying the message is in the narrow when it might not be.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid saying the message is in the narrow when it might not be.

Some nits in this revision:

  • The dartdoc summary line still says the inaccurate thing.
  • "A non-null value for this doesn't always mean [haveOldest] is true." → This is written to prevent a misunderstanding that I think isn't very likely; does that seem right? Maybe instead of this line, we can clarify more directly by making this getter's name more explicit, like oldestFetchedMessageId (which might be helpful in any case).
  • I'd like to remove "that's fine for what this is used for" because it can't be verified except by reading the code that consumes it. (An alternative would be to restrict its allowed usages by adding some words to the dartdoc.)
  • A few wording nits for clarity; see below.
Suggested change
/// The ID of the oldest known message so far in this narrow.
///
/// This will be `null` if no messages of this narrow are fetched yet.
/// Having a non-null value for this doesn't always mean [haveOldest] is `true`.
///
/// The related message may not appear in [messages] because it
/// is muted in one way or another.
int? get oldMessageId => _oldMessageId;
/// The ID of the oldest message fetched so far in this narrow.
///
/// This is used as the anchor for fetching the next batch of older messages
/// and will be `null` if no messages of this narrow have been fetched yet.
///
/// A message with this ID might not appear in [messages]:
/// - The message may be in a muted conversation.
/// - The message may have been moved or deleted after it was fetched.
int? get oldestFetchedMessageId => _oldestFetchedMessageId;

(Similarly for the corresponding newet-fetched-message code.)

Copy link
Copy Markdown
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Similar to #1951 (review), here's some high-level comments before I go on vacation.

Comment thread lib/widgets/message_list.dart Outdated
Comment on lines +1219 to +1221
// TODO perhaps offer mark-as-read even when not done fetching?
MarkAsReadWidget(narrow: widget.narrow),
if (model.messages.isNotEmpty)
MarkAsReadWidget(narrow: widget.narrow),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the relationship of this change to the main changes happening in this commit? What's the user-visible change in behavior it causes?

Copy link
Copy Markdown
Collaborator Author

@sm-sayedi sm-sayedi Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with the new changes in the commit, if the initial newest batch is all muted (model.messages.isEmpty), and then when older messages are being fetched, the "Mark as read" widget will be shown along with a progress indicator and no messages, so I thought it may be a good idea to hide "Mark as read" until there are visible messages populated.

Before (this one-line change) After (this one-line change)
Screenshot 2025-11-26 at 7 30 45 AM Screenshot 2025-11-26 at 7 32 13 AM

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see.

What does that situation look like before this branch / before even the first batch is fetched? I believe we show a different progress indicator (centered), and don't show the mark-read button.

Can we have that behavior continue when we've had some fetch requests finish, but haven't yet actually found any messages? I think I'd like to think of that state as equivalent to when the initial fetch is still going.

IOW I'd like to think of it as "still working on fetching messages", like in the doc comment on MessageListView.fetched (in main):

  /// Whether [messages] and [items] represent the results of a fetch.
  ///
  /// This allows the UI to distinguish "still working on fetching messages"
  /// from "there are in fact no messages here".
  bool get fetched => switch (_status) {
    FetchingStatus.unstarted || FetchingStatus.fetchInitial => false,
    _ => true,
  };

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that will be an improvement. It will also avoid the abrupt change from a centered progress indicator to a bottom-aligned indicator.

Comment thread lib/model/message_list.dart Outdated
Comment on lines +229 to +231
FetchingInitialStatus _fetchInitialStatus = .unstarted;
FetchingMoreStatus _fetchOlderStatus = .unstarted;
FetchingMoreStatus _fetchNewerStatus = .unstarted;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another benefit of this change is that if there's a backoff in one
direction, it will not affect the other one.

Hmm — that sounds like a problem, not a benefit. 🙂 The fetches are all going to the same server, so if one type of fetch has trouble that suggests we should hold off on our next requests, then the same need applies to requests at the other end of the list.

(In fact ideally we'd be sharing backoff information across all requests on a given account. See also #946. But these are potentially some of the more frequent requests, so it's good that we at least share it among these.)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More broadly:

But imagine if the number of messages in the initial batch occupies less
than a screenful, and then `fetchOlder` returns no messages or a few
messages that combined with the initial batch messages are still less
than a screenful; in that case, there will be no change in the scroll
metrics to call `fetchNewer`.

This change feels like the wrong layer for solving this problem.

When the fetchOlder returns, that will notify the view-model's listeners, right? What if we have the _MessageListState react to that by looking to see if more fetching is needed, and calling fetchOlder/fetchNewer if so?

IOW, we could have its _modelChanged method call the same logic that's at the end of _handleScrollMetrics. Probably that means pulling that part out as its own smaller helper method.

Comment thread lib/model/message_list.dart Outdated
Comment on lines +967 to +968
} while (visibleMessageCount < kMessageListFetchBatchSize / 2
&& this.generation == generation);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have the widget state react in the way I suggested in my last comment just now, does that also let us skip these loops? We'd have the state retain responsibility for taking the initiative to call fetchOlder and fetchNewer when potentially needed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can do that; removed them in the new revision.

@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from 127f690 to c43eaad Compare January 9, 2026 20:28
@sm-sayedi sm-sayedi changed the title msglist: Fetch newer messages despite previous muted batch msglist: Fetch more messages despite previous muted batch Jan 9, 2026
@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch 2 times, most recently from 958fe11 to 8c76491 Compare January 9, 2026 21:24
@sm-sayedi
Copy link
Copy Markdown
Collaborator Author

Thanks @chrisbobbe and @gnprice for the previous reviews. Pushed a new revision, PTAL.

@sm-sayedi sm-sayedi requested a review from chrisbobbe January 9, 2026 21:29
Comment thread test/widgets/message_list_test.dart Outdated
check(messageListItemCount(tester)).equals(501);
});

testWidgets('observe double-fetch glitch', (tester) async {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is now very similar to the test case above ("basic") because of the new double-fetch glitch in _modelChanged. Should we remove this one then?

Comment thread test/widgets/message_list_test.dart Outdated
check(messageListItemCount(tester)).equals(2 + 2 + 98);
});

testWidgets('mid-history and fetch-older with too few messages, fetch-newer request is made', (tester) async {
Copy link
Copy Markdown
Collaborator Author

@sm-sayedi sm-sayedi Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is for handling an edge case, the handling of which required a lot of changes in the first revision of this PR, before Greg suggested an alternative, straightforward method of solving that problem. I explained that edge case in the description of a commit message, which is not included now.

This change is necessary when there is a need to fetch more messages
in both directions, older and newer, and when fetching in
one direction avoids fetching in another direction at the same time,
because of the `if (busyFetchingMore) return` line in
both `fetchOlder` and `fetchNewer`.

This scenario happens when a conversation is opened in its first unread,
such that the number of messages in the initial batch is so low (because
they're muted in one way or another) that it's already past the certain
point where the scroll metrics listener in the widget code triggers both
`fetchOlder` and `fetchNewer`. In 2025-11, that code first calls
`fetchOlder` then `fetchNewer`, and for the reason mentioned above,
`fetchNewer` will not fetch any new messages. But that's fine, because
as soon as older messages from `fetchOlder` arrives, there will be
a change in the scroll metrics, so `fetchNewer` will be called again,
fetching new messages.

But imagine if the number of messages in the initial batch occupies less
than a screenful, and then `fetchOlder` returns no messages or a few
messages that combined with the initial batch messages are still less
than a screenful; in that case, there will be no change in the scroll
metrics to call `fetchNewer`.

With the three fetch request types being separated, especially the two
request types for older and newer messages, each direction can fetch
more messages independently without interfering with one another.

@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from 8c76491 to d0f5c23 Compare January 23, 2026 04:08
Copy link
Copy Markdown
Collaborator

@chrisbobbe chrisbobbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Comments below, from reading everything except the tests.

Comment thread lib/widgets/message_list.dart Outdated
Comment on lines +1036 to +1054
// TODO: This ends up firing a second time shortly after we fetch a batch.
// The result is that each time we decide to fetch a batch, we end up
// fetching two batches in quick succession. This is basically harmless
// but makes things a bit more complicated to reason about.
// The cause seems to be that this gets called again with maxScrollExtent
// still not yet updated to account for the newly-added messages.
// This ends up firing a second time shortly after we fetch a batch while
// there is a fling going on.
// The result is that each time we decide to fetch a batch, we end up
// fetching two batches in quick succession. This is basically harmless
// but makes things a bit more complicated to reason about.
// The cause is that this gets called again with minScrollExtent
// still not yet updated to account for the newly-added messages.
// This relates to how [SchedulerBinding] executes different tasks when
// producing a new frame, like first executing transient callbacks
// (typically ticking animations) followed by persistent callbacks
// (typically the build/layout/paint pipeline), and so on.
// So when there is a new message batch received, the related widgets are
// marked dirty for the next frame. With the ongoing fling, the underlying
// animation registers transient callback(s) for [ScrollPosition.setPixels]
// to be executed in the transient callbacks phase at the start of
// the frame. It will then notify its listeners, eventually calling
// `_scrollChanged` and in turn current method with old minScrollExtent,
// causing the second batch fetch. Then in the persistent callbacks phase,
// minScrollExtent will be updated, effective in the next frame.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

msglist [nfc]: Explain the reasoning behind the double-fetch glitch

Ideally we'd fix the glitch, right? (The old comment is a TODO, saying "This is basically harmless but makes things a bit more complicated to reason about.")

I haven't digested the new comment yet (I'm kind of getting lost in it), but can we use the details you've learned to fix the glitch? In that case I think a new issue is a better place to put them, and we can just point the existing TODO to that issue (and fix anything that's wrong; I see you've done s/maxScrollExtent/minScrollExtent/ which might have been intentional).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #2104.

return findScrollView(tester).controller;
}

int? messageListItemCount(WidgetTester tester) =>
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

msglist test [nfc]: Move a few helpers to the top, for reusability

These are: itemCount, findPlaceholder, and findLoadingIndicator.

Also, rename itemCount to messageListItemCount, to make it easier what
it is about in the upcoming tests.

Commit-message nit: I think are some missing words in that last sentence?

Copy link
Copy Markdown
Collaborator Author

@sm-sayedi sm-sayedi Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure about the missing words — the phrasing may have been unclear. Reworded it a little; hope that’s better now 🙂

Also, rename itemCount to messageListItemCount, to make it clearer
what it refers to in the upcoming tests.

Comment thread lib/model/message_list.dart Outdated
Comment on lines +148 to +155
/// The ID of the oldest known message so far in this narrow.
///
/// This will be `null` if no messages of this narrow are fetched yet.
/// Having a non-null value for this doesn't always mean [haveOldest] is `true`.
///
/// The related message may not appear in [messages] because it
/// is muted in one way or another.
int? get oldMessageId => _oldMessageId;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid saying the message is in the narrow when it might not be.

Some nits in this revision:

  • The dartdoc summary line still says the inaccurate thing.
  • "A non-null value for this doesn't always mean [haveOldest] is true." → This is written to prevent a misunderstanding that I think isn't very likely; does that seem right? Maybe instead of this line, we can clarify more directly by making this getter's name more explicit, like oldestFetchedMessageId (which might be helpful in any case).
  • I'd like to remove "that's fine for what this is used for" because it can't be verified except by reading the code that consumes it. (An alternative would be to restrict its allowed usages by adding some words to the dartdoc.)
  • A few wording nits for clarity; see below.
Suggested change
/// The ID of the oldest known message so far in this narrow.
///
/// This will be `null` if no messages of this narrow are fetched yet.
/// Having a non-null value for this doesn't always mean [haveOldest] is `true`.
///
/// The related message may not appear in [messages] because it
/// is muted in one way or another.
int? get oldMessageId => _oldMessageId;
/// The ID of the oldest message fetched so far in this narrow.
///
/// This is used as the anchor for fetching the next batch of older messages
/// and will be `null` if no messages of this narrow have been fetched yet.
///
/// A message with this ID might not appear in [messages]:
/// - The message may be in a muted conversation.
/// - The message may have been moved or deleted after it was fetched.
int? get oldestFetchedMessageId => _oldestFetchedMessageId;

(Similarly for the corresponding newet-fetched-message code.)

Comment thread lib/model/message_list.dart Outdated
Comment on lines +847 to +848
/// This makes sure there is at least one non-muted message fetched; if any.
/// It may do so my repeatedly calling [fetchOlder] and [fetchNewer].
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// This makes sure there is at least one non-muted message fetched; if any.
/// It may do so my repeatedly calling [fetchOlder] and [fetchNewer].
/// If the results don't include at least one non-muted message,
/// this will call [fetchOlder] and/or [fetchNewer]
/// until one is found or the narrow's oldest and newest messages are reached.

Comment thread lib/model/message_list.dart Outdated
Comment on lines +130 to +132
/// This also may or may not represent all the message history that
/// conceptually belongs in this narrow because some messages might be
/// muted in one way or another and they may not appear in the message list.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

Suggested change
/// This also may or may not represent all the message history that
/// conceptually belongs in this narrow because some messages might be
/// muted in one way or another and they may not appear in the message list.
/// Also, messages may be excluded if they are in muted conversations.

Comment thread lib/model/message_list.dart Outdated
Comment on lines +906 to +910
while (messages.isEmpty && !(haveOldest && haveNewest)) {
await fetchOlder(partOfInitialFetch: true);
if (messages.isNotEmpty) break;
await fetchNewer(partOfInitialFetch: true);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably also want to exit the loop if fetchNewer finds some non-muted messages.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that’s already covered by the loop condition — it breaks as soon as messages aren’t empty 🙂 But if that’s good for explicitness, I’ll be happy to add it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah you're right, that makes sense.

Comment thread lib/model/message_list.dart Outdated
Comment on lines +1001 to +1049
if (this.generation == generation) {
if (this.generation == generation && !partOfInitialFetch) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the initial-fetch case, if one of these fetch-older/fetch-newer requests fails, I think we still want to use BackoffMachine to delay the next attempt, right?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When writing this, I was thinking about using backoff, but then looking at #945 (PR #1050) where the backoff logic was introduced, the motivation there was to avoid the retry storm caused by a failed fetchOlder (or fetchNewer) request in the widgets code, specifically by the listener for the scroll state. In this (initial-fetch) case though, there is no code path (at least as of now) to cause a retry storm. The while loop will break promptly whenever one of the fetch-older or fetch-newer requests fails. So I think handling of this is similar to the case when the initial-fetch request fails. How about adding a TODO pointing to #2085?

Comment thread lib/widgets/message_list.dart Outdated
Comment on lines +1055 to +1057
// This relates to `_modelChanged` (and in turn the current method) being
// called right away as a listener of the model, before the next frame
// being drawn with the newly-added messages, updating minScrollExtent.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. If we want this condition check to run after a frame is drawn with the newly-added messages, how about putting this model.fetchOlder call in a SchedulerBinding.instance.addPostFrameCallback? I wonder if that might actually be the fix for both double-fetch glitches?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, addPostFrameCallback can avoid these glitches. Fix included in the new revision.

@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch 2 times, most recently from 64c0a6d to fb2b549 Compare January 27, 2026 21:17
@sm-sayedi sm-sayedi requested a review from chrisbobbe January 27, 2026 21:24
@sm-sayedi
Copy link
Copy Markdown
Collaborator Author

Thanks Chris for the review. Pushed new changes, PTAL.

Copy link
Copy Markdown
Collaborator

@chrisbobbe chrisbobbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! A few comments below, including one that should help reduce complexity. 🙂

Comment thread test/widgets/message_list_test.dart Outdated
Comment on lines +828 to +830
if (itemCount(tester)! > 101) break; // The loading indicator appeared.
if (connection.takeRequests().singleOrNull != null) break;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would .isNotEmpty be equivalent to .singleOrNull != null here?

Comment thread test/widgets/message_list_test.dart Outdated
});

testWidgets('observe double-fetch glitch', (tester) async {
testWidgets('do not observe double-fetch glitch', (tester) async {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment like

// Regression test for: https://github.com/zulip/zulip-flutter/issues/2104

at the top of the test body.

Comment thread lib/model/message_list.dart Outdated
Comment on lines +906 to +910
while (messages.isEmpty && !(haveOldest && haveNewest)) {
await fetchOlder(partOfInitialFetch: true);
if (messages.isNotEmpty) break;
await fetchNewer(partOfInitialFetch: true);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah you're right, that makes sense.

Comment thread lib/widgets/message_list.dart Outdated
Comment on lines -924 to -912
if (model.messages.isEmpty && model.haveOldest && model.haveNewest) {
// If the fetch came up empty, there's nothing to read,
// so opening the keyboard won't be bothersome and could be helpful.
// It's definitely helpful if we got here from the new-DM page.
MessageListPage.ancestorOf(context)
.composeBoxState?.controller.requestFocusIfUnfocused();
}
_prevFetched = model.fetched;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting; I agree that it does seem better to not "request-focus-if-unfocused" except when we know there aren't any messages to show in the narrow's entire history.

It looks like the condition on _prevFetched was to make sure this "autofocus" behavior was only triggered once per message list. With that removed, I think that means we're allowing an "autofocus" multiple times: for example, each time addOutboxMessage is called during a time when the whole history is fetched and there are no visible messages. I think we probably don't want that? If you'd like to make a change here, can you do it in separate commit(s)?

Possibly the stateful variable should have different semantics, for example with a name like _hasAutofocused rather than trying to make it about the state of message fetches.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this change was necessary in one of the first revisions, where there was no loop in the fetchInitial method for finding at least one non-muted message. Instead, at that time, we relied on fetchOlder/Newer to find one. In that case, when the initial batch came up empty, the compose box would autofocus even though technically there were more messages to fetch. But right now where we have the loop in fetchInitial, model.fetched would become true if either there is at least one message or there are no messages in the history at all. So we'll not autofocus unnecessarily.

But if the change still seems safer, I'll be happy to add it.

Copy link
Copy Markdown
Collaborator

@chrisbobbe chrisbobbe Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something like this could be helpful, but done in a separate commit (before or after):

    if (model.messages.isEmpty && model.haveNewest && model.haveOldest && !_hasAutofocused) {
      // If there are no messages to show in the whole history,
      // opening the keyboard won't be bothersome and could be helpful.
      // It's definitely helpful if we got here from the new-DM page.
      MessageListPage.ancestorOf(context)
        .composeBoxState?.controller.requestFocusIfUnfocused();
      _hasAutofocused = true;
    }

Comment thread lib/widgets/message_list.dart Outdated
Comment on lines +903 to +922
if (model.fetched) {
SchedulerBinding.instance.addPostFrameCallback((_) {
if (scrollController.hasClients) {
// This is needed here in order to fetch more messages if the
// visible (non-muted) messages of the previous batch combined
// with the already fetched messages are less than a screenful.
// In such case, there will be no change in the scroll metrics to
// fire a notification that will fetch more messages.
// Also, this should be called post frame, otherwise it will end up
// fetching an additional batch shortly after we fetch a batch,
// even if we don't need the additinal batch at the moment.
// The cause is that this will be called again with `ScrollMetrics`
// still not yet updated to account for the newly-added messages.
// This relates to `_modelChanged` called right away as a listener of
// the model, before the next frame being drawn with the newly-added
// messages, updating `ScrollMetrics`.
_fetchMoreIfNeeded(scrollController.position);
}
});
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since _fetchMoreIfNeeded causes an assert(fetched) (in fetchOlder/fetchNewer), it should run synchronously after the model.fetched check, without an async gap in between. The fetched flag can change from true to false, e.g. when MessageListView.renarrowAndFetch is called. I'm not sure if the post-frame callback could actually realistically race with that happening, but better to write the code so that's more clearly not a class of possible bugs.

Also here's a draft of a comment that highlights the fetch-in-loop behavior caused by this code, and I think reads a little more smoothly:

    SchedulerBinding.instance.addPostFrameCallback((_) {
      if (!model.fetched || !scrollController.hasClients) {
        return;
      }

      // If fetchInitial or fetchOlder/fetchNewer
      // haven't filled model.messages with any visible (i.e. unmuted) messages,
      // or anyway not enough to fill the screen, fetch again.
      // If we're in a long run of muted messages, this has the effect of 
      // fetching in a loop until we've either fetched the narrow's whole history
      // or we've filled the screen with visible messages,
      // without needing user scroll input between iterations.
      //
      // The right time for the "if-needed" check is
      // after the current model change has been laid out
      // and the scroll metrics have been updated,
      // so that when we receive and lay out a screenful of messages,
      // we don't fetch again unnecessarily.
      // That's why we do it in a post-frame callback.
      _fetchMoreIfNeeded(scrollController.position);
    });

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I like this version. It’s safer to have model.fetched check inside addPostFrameCallback.

Comment on lines 843 to 848
/// Fetch messages, starting from scratch.
///
/// If the results don't include at least one non-muted message,
/// this will call [fetchOlder] and/or [fetchNewer]
/// until one is found or the narrow's oldest and newest messages are reached.
Future<void> fetchInitial() async {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about this, does fetchInitial actually need to be changed at all in order to fix the issue? I think it doesn't, and we can do without the added complexity here. In the case where the first (i.e. "initial") fetch comes back with no visible messages, we'll still fetch and refetch appropriately because of the _fetchMoreIfNeeded call in the widget code's _modelChanged callback.

It's true that fetchOlder and fetchNewer used to have asserts that messages was not empty, and I guess the loop in fetchInitial would've helped satisfy those. But I think those asserts were only needed in a world where we chose the anchor for fetchOlder/fetchNewer by looking at the messages list itself. Now that we use these new oldestFetchedMessageId/newestFetchedMessageId fields to choose the anchor, we can cheerfully assert on those instead, as it looks like you've already done, making fetchOlder and fetchNewer cheerfully accept the case where messages is empty.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, removing the loop inside fetchInitial will certainly simplify the code, but it will have some effects on the UI, for example, if the initial batch has no visible messages, the loading indicator in the middle will suddenly move to the bottom. This change was suggested in #1989 (comment) with the initial motivation for it mentioned in there.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see; yeah, I think the simplification is worth it and that effect on the UI is OK.

Comment thread test/widgets/message_list_test.dart Outdated
Comment on lines +1782 to +1783
// `foundOldest: true` just to avoid having to prepare a fetch-older
// response when the model notifies its listeners.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "when the model notifies its listeners" sounds pretty opaque. It's about an implementation detail of the fetch loop that's intended to fill the screen with messages, right? How about:

        // `foundOldest: true` just to avoid having to prepare a fetch-older
        // response; the message list would otherwise refetch when it notices
        // it doesn't have a screenful of messages.

@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from fb2b549 to 631245d Compare February 7, 2026 05:26
@sm-sayedi
Copy link
Copy Markdown
Collaborator Author

sm-sayedi commented Feb 7, 2026

Thanks for the review. New revision pushed.

@sm-sayedi sm-sayedi requested a review from chrisbobbe February 7, 2026 05:28
@chrisbobbe
Copy link
Copy Markdown
Collaborator

chrisbobbe commented Feb 9, 2026

Thanks! As mentioned above, please go ahead with the fetchInitial simplification discussed at #1989 (comment) , otherwise this LGTM (with my comment at #1989 (comment) as an optional independent improvement, I think).

@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from 631245d to bc56fea Compare February 10, 2026 03:59
@sm-sayedi
Copy link
Copy Markdown
Collaborator Author

sm-sayedi commented Feb 10, 2026

Thanks! Changes pushed.

(The CI failure seems unrelated and doesn't reproduce locally.)

@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from bc56fea to a6f8c27 Compare February 10, 2026 04:47
@chrisbobbe
Copy link
Copy Markdown
Collaborator

Great, thanks! LGTM; marking for Greg's review.

@chrisbobbe chrisbobbe requested a review from gnprice February 13, 2026 00:55
@chrisbobbe chrisbobbe assigned gnprice and unassigned chrisbobbe Feb 13, 2026
@chrisbobbe chrisbobbe added integration review Added by maintainers when PR may be ready for integration and removed maintainer review PR ready for review by Zulip maintainers labels Feb 13, 2026
Copy link
Copy Markdown
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sm-sayedi for fixing these issues, and @chrisbobbe for the previous reviews!

Comments below up through fixing the double-fetch glitch #2104:
9f0f456 msglist [nfc]: Mention fetchNewer in the MessageListView dartdoc
25886d9 msglist [nfc]: Add a new point to _MessageSequence.messages dartdoc
ef20b6d msglist [nfc]: Point the TODO about the double-fetch glitch to an issue
bb83ec0 msglist test: Reproduce the double-fetch glitch
3c4cd60 msglist [nfc]: Extract the "fetch more messages" code in a new method
13678fe msglist: Fix the double-fetch glitch

Let's split out the #2104 part to its own separate PR. It feels like several different things are happening in this PR, and splitting it up will help with managing review. I believe that means these commits:
ef20b6d msglist [nfc]: Point the TODO about the double-fetch glitch to an issue
bb83ec0 msglist test: Reproduce the double-fetch glitch
3c4cd60 msglist [nfc]: Extract the "fetch more messages" code in a new method
13678fe msglist: Fix the double-fetch glitch

Comment thread lib/model/message_list.dart Outdated
Comment on lines +124 to +131
/// That information is expressed in [fetched], [haveOldest], [haveNewest].
///
/// Also, messages may be excluded if they are in muted conversations.
///
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added line seems out of place. If messages are excluded on account of muting, then that means they don't belong in this list, even conceptually.

Comment on lines -840 to -841
}, skip: true); // TODO this still reproduces manually, still needs debugging,
// but has become harder to reproduce in a test.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool — happy to see this glitch diagnosed and reproduced.

Comment thread test/widgets/message_list_test.dart Outdated
Comment on lines +833 to +839
// On the next frame, we promptly fetch *a second* batch.
// This is a glitch and it'd be nicer if we didn't.
connection.prepare(json: eg.olderGetMessagesResult(anchor: 850, foundOldest: false,
messages: List.generate(100, (i) => eg.streamMessage(id: 750 + i, sender: eg.selfUser))).toJson());
await tester.pump(const Duration(milliseconds: 1));
// Allow a delayed frame for the response of the first batch to arrive.
await tester.pump(Duration(milliseconds: 1));
check(itemCount(tester)).equals(401);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also insert a check that verifies at what step here the second fetch request gets made.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the story here seems a bit ambiguous:

  • The comment at the start of the range I've highlighted says the second fetch is "on the next frame" after the first fetch starts.
  • But then the logic at the end of this range seems to be that we wait all the way until the first fetch finishes. This happens to also be the first frame after it starts.

If there are several frames between the first fetch starting and ending (which is going to be the typical case — most of these fetches take much longer than a millisecond), then those two descriptions give different predictions for when the second fetch happens. Which one is the behavior that occurs?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By “on the next frame” I meant the next frame in the test code (which is the delayed pump after which the first batch arrives). And in real scenarios, the second fetch starts just after the first one finishes, not after the first frame post first fetch start. Changed that comment to reflect this fact.

Comment on lines +1066 to +1067
if (SchedulerBinding.instance.schedulerPhase == .transientCallbacks) {
SchedulerBinding.instance.addPostFrameCallback((_) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this solves the problem (the double-fetch glitch)? And why .transientCallbacks; are there potentially any other phases where we should do the same thing?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so as explained in #2104 description, double-fetch was used to happen whenever the response for a fetch-older request arrived while there was a fling going on in the message list. The fling (which is driven by an animation) causes the scroll position to change, but that change happens in .transientCallbacks phase (the phase where the animations tick, in this case, the animation for the fling). The change in the scroll position eventually calls _handleScrollMetrics (in the transientCallbacks phase) with old scroll metrics (specifically minScrollExtent), which causes a double fetch glitch. Then in the .persistentCallbacks phase, where the message list view is laid out with the newly arrived messages, minScrollExtent will be updated to account for the new messages. The updated minScrollExtent value will then be available in .postFrameCallbacks phase, so that's why we wrap _fetchMoreIfNeeded in addPostFrameCallback. And the check for .transientCallbacks is there because in this specific case, the double-fetch glitch is caused by _handleScrollMetrics being called in .transientCallbacks phase.

are there potentially any other phases where we should do the same thing?

Right now, _handleScrollMetrics is called by _scrollChanged and _handleScrollMetricsNotification.

  • _scrollChanged is called through ScrollPosition.setPixels. Its dartdoc says:
    "This should only be called by the current [ScrollActivity], either during the transient callback phase or in response to user input." So it's called either during .transientCallbacks or .idle (user input handlers are executed here) phase.
  • _handleScrollMetricsNotification is called through NotificationListener.onNotification. Its dartdoc says: "Notifications vary in terms of when they are dispatched. There are two main possibilities: dispatch between frames, and dispatch during layout." So it's called either during .idle or .persistentCallbacks phase. _handleScrollMetricsNotification is specifically called on ScrollMetricsNotification, which in turn is only dispatched in .idle phase — see ScrollPosition.didUpdateScrollMetrics and its call site.

So besides .transientCallbacks, .idle is the only other phase where _handleScrollMetrics could be called. Should we also check for the .idle phase, given that we still haven't experienced any double-fetch glitch caused by it?

model.fetchNewer();
if (SchedulerBinding.instance.schedulerPhase == .transientCallbacks) {
SchedulerBinding.instance.addPostFrameCallback((_) {
if (scrollController.hasClients) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning for this condition?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I think it doesn't really make any difference, but it's a safety check. Maybe there is a possibility that scrollController can lose clients post frame; I'm not sure. The same check is necessary for another addPostFrameCallback in one of the next commits, though, so I thought it would be a good idea to add it here too.

Comment thread test/widgets/message_list_test.dart Outdated
});

testWidgets('observe double-fetch glitch', (tester) async {
testWidgets('do not observe double-fetch glitch', (tester) async {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: tighten

Suggested change
testWidgets('do not observe double-fetch glitch', (tester) async {
testWidgets('no double-fetch glitch', (tester) async {

Comment thread test/widgets/message_list_test.dart Outdated

// Check there is no double-fetch glitch to fetch another batch.
check(connection.takeRequests().lastOrNull).isNull();
check(messageListItemCount(tester)).equals(401);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is redundant with the one a few lines above; nothing's changed in between them.

Comment thread test/widgets/message_list_test.dart Outdated
Comment on lines +839 to +840
// Check there is no double-fetch glitch to fetch another batch.
check(connection.takeRequests().lastOrNull).isNull();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't quite reassure me that there isn't a double fetch; it feels like with this check there still might be a second fetch coming shortly after the end of the test body.

A fix:

Suggested change
// Check there is no double-fetch glitch to fetch another batch.
check(connection.takeRequests().lastOrNull).isNull();
// Check there is no double-fetch glitch to fetch another batch.
await tester.pumpAndSettle();
check(connection.takeRequests().lastOrNull).isNull();

Generally we avoid pumpAndSettle, because it's imprecise about the behavior. But in a negative context like this, imprecise is fine and even good: we want to be sure there's no fetch now, no fetch next frame, and no fetch however far into the future things continue to happen.

@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from a6f8c27 to ac16e96 Compare February 19, 2026 20:27
@sm-sayedi
Copy link
Copy Markdown
Collaborator Author

Thanks @gnprice for the review. Revision pushed, PTAL.

Opened a new PR (#2162) for those four commits about fixing the double-fetch glitch and rebased this PR atop that.

@sm-sayedi sm-sayedi requested a review from gnprice February 19, 2026 20:29
@gnprice
Copy link
Copy Markdown
Member

gnprice commented Mar 3, 2026

Sorry this has lingered a bit. My attention has been pretty occupied the past couple of weeks with getting end-to-end-encrypted notifications (#1764) into shape, since that's a big project and also has an upcoming deadline — we want to get it out for the next server release, #release management > 12.0 release @ 💬.

Hopefully that'll be more settled down within the next week or two. Because the logic involved in this PR is somewhat tricky, I think the next time I'll find the attention again to properly review it is after that.

@sm-sayedi
Copy link
Copy Markdown
Collaborator Author

Thanks for the update, and no worries at all! Looking forward to #1764 being completed.

I am also working on the upstream PR for #1100, and it's taking a little more time than expected, maybe because it's my first upstream PR 😀 So I'll take this opportunity and have complete focus on that.

@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from ac16e96 to 66a57b6 Compare March 12, 2026 16:08
@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from 66a57b6 to 8f66810 Compare April 8, 2026 05:27
sm-sayedi added 8 commits May 8, 2026 11:29
These are: itemCount, findPlaceholder, and findLoadingIndicator.

Also, rename itemCount to messageListItemCount, to make it clearer
what it refers to in the upcoming tests.
Previously, if the initial batch came up empty, then the history was
considered empty too. But now, the initial batch could be empty
because of the messages in muted conversations while there may still be
other messages in history.
@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from 8f66810 to 2ad98f2 Compare May 8, 2026 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

integration review Added by maintainers when PR may be ready for integration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

msglist: Fetch-older is defeated by a run of 100+ muted messages

3 participants