msglist: Fetch more messages despite previous muted batch#1989
msglist: Fetch more messages despite previous muted batch#1989sm-sayedi wants to merge 8 commits into
Conversation
f920082 to
a53cbfa
Compare
a53cbfa to
ef1fcc3
Compare
ef1fcc3 to
127f690
Compare
There was a problem hiding this comment.
Thanks! Here's a review of the first three commits:
542996e msglist [nfc]: Mention fetchNewer in the MessageListView dartdoc
ce56456 msglist [nfc]: Add a new point to _MessageSequence.messages dartdoc
3c10bed msglist [nfc]: Remove one nested try block in _fetchMore
and a partial review of the fourth:
03d61ca msglist: Fetch newer messages despite previous muted batch
For that fourth commit, can you say briefly in the commit message why it's not a complete fix for the issue, to help orient the reader to what comes next?
|
|
||
| processResult(result); | ||
| } catch (e) { | ||
| hasFetchError = true; |
There was a problem hiding this comment.
msglist [nfc]: Remove one nested try block in `_fetchMore`
The nested `try` block doesn't seem to be making any difference,
so good to remove.
This isn't NFC: now, hasFetchError is about what happens when data is fetched and processed (with processResult), not just about what happens when data is fetched. That doesn't seem desirable, just from looking at its name.
There was a problem hiding this comment.
Indeed, thanks for the catch. Dropped the commit.
| /// The ID of the oldest known message so far in this narrow. | ||
| /// | ||
| /// This will be `null` if no messages of this narrow are fetched yet. | ||
| /// Having a non-null value for this doesn't always mean [haveOldest] is `true`. | ||
| /// | ||
| /// The related message may not appear in [messages] because it | ||
| /// is muted in one way or another. | ||
| int? get oldMessageId => _oldMessageId; |
There was a problem hiding this comment.
The ID of the oldest known message so far in this narrow.
There isn't necessarily a message with this ID in the narrow, though, right? In a quick skim, I don't see event-handling code to update _oldMessageId when the corresponding message is deleted or moved out of the narrow. We should avoid saying the message is in the narrow when it might not be.
There was a problem hiding this comment.
We should avoid saying the message is in the narrow when it might not be.
Some nits in this revision:
- The dartdoc summary line still says the inaccurate thing.
- "A non-null value for this doesn't always mean [haveOldest] is
true." → This is written to prevent a misunderstanding that I think isn't very likely; does that seem right? Maybe instead of this line, we can clarify more directly by making this getter's name more explicit, likeoldestFetchedMessageId(which might be helpful in any case). - I'd like to remove "that's fine for what this is used for" because it can't be verified except by reading the code that consumes it. (An alternative would be to restrict its allowed usages by adding some words to the dartdoc.)
- A few wording nits for clarity; see below.
| /// The ID of the oldest known message so far in this narrow. | |
| /// | |
| /// This will be `null` if no messages of this narrow are fetched yet. | |
| /// Having a non-null value for this doesn't always mean [haveOldest] is `true`. | |
| /// | |
| /// The related message may not appear in [messages] because it | |
| /// is muted in one way or another. | |
| int? get oldMessageId => _oldMessageId; | |
| /// The ID of the oldest message fetched so far in this narrow. | |
| /// | |
| /// This is used as the anchor for fetching the next batch of older messages | |
| /// and will be `null` if no messages of this narrow have been fetched yet. | |
| /// | |
| /// A message with this ID might not appear in [messages]: | |
| /// - The message may be in a muted conversation. | |
| /// - The message may have been moved or deleted after it was fetched. | |
| int? get oldestFetchedMessageId => _oldestFetchedMessageId; |
(Similarly for the corresponding newet-fetched-message code.)
gnprice
left a comment
There was a problem hiding this comment.
Thanks!
Similar to #1951 (review), here's some high-level comments before I go on vacation.
| // TODO perhaps offer mark-as-read even when not done fetching? | ||
| MarkAsReadWidget(narrow: widget.narrow), | ||
| if (model.messages.isNotEmpty) | ||
| MarkAsReadWidget(narrow: widget.narrow), |
There was a problem hiding this comment.
What's the relationship of this change to the main changes happening in this commit? What's the user-visible change in behavior it causes?
There was a problem hiding this comment.
So with the new changes in the commit, if the initial newest batch is all muted (model.messages.isEmpty), and then when older messages are being fetched, the "Mark as read" widget will be shown along with a progress indicator and no messages, so I thought it may be a good idea to hide "Mark as read" until there are visible messages populated.
| Before (this one-line change) | After (this one-line change) |
|---|---|
![]() |
![]() |
There was a problem hiding this comment.
Hmm, I see.
What does that situation look like before this branch / before even the first batch is fetched? I believe we show a different progress indicator (centered), and don't show the mark-read button.
Can we have that behavior continue when we've had some fetch requests finish, but haven't yet actually found any messages? I think I'd like to think of that state as equivalent to when the initial fetch is still going.
IOW I'd like to think of it as "still working on fetching messages", like in the doc comment on MessageListView.fetched (in main):
/// Whether [messages] and [items] represent the results of a fetch.
///
/// This allows the UI to distinguish "still working on fetching messages"
/// from "there are in fact no messages here".
bool get fetched => switch (_status) {
FetchingStatus.unstarted || FetchingStatus.fetchInitial => false,
_ => true,
};There was a problem hiding this comment.
Yeah, I think that will be an improvement. It will also avoid the abrupt change from a centered progress indicator to a bottom-aligned indicator.
| FetchingInitialStatus _fetchInitialStatus = .unstarted; | ||
| FetchingMoreStatus _fetchOlderStatus = .unstarted; | ||
| FetchingMoreStatus _fetchNewerStatus = .unstarted; |
There was a problem hiding this comment.
Another benefit of this change is that if there's a backoff in one
direction, it will not affect the other one.
Hmm — that sounds like a problem, not a benefit. 🙂 The fetches are all going to the same server, so if one type of fetch has trouble that suggests we should hold off on our next requests, then the same need applies to requests at the other end of the list.
(In fact ideally we'd be sharing backoff information across all requests on a given account. See also #946. But these are potentially some of the more frequent requests, so it's good that we at least share it among these.)
There was a problem hiding this comment.
More broadly:
But imagine if the number of messages in the initial batch occupies less
than a screenful, and then `fetchOlder` returns no messages or a few
messages that combined with the initial batch messages are still less
than a screenful; in that case, there will be no change in the scroll
metrics to call `fetchNewer`.
This change feels like the wrong layer for solving this problem.
When the fetchOlder returns, that will notify the view-model's listeners, right? What if we have the _MessageListState react to that by looking to see if more fetching is needed, and calling fetchOlder/fetchNewer if so?
IOW, we could have its _modelChanged method call the same logic that's at the end of _handleScrollMetrics. Probably that means pulling that part out as its own smaller helper method.
| } while (visibleMessageCount < kMessageListFetchBatchSize / 2 | ||
| && this.generation == generation); |
There was a problem hiding this comment.
If we have the widget state react in the way I suggested in my last comment just now, does that also let us skip these loops? We'd have the state retain responsibility for taking the initiative to call fetchOlder and fetchNewer when potentially needed.
There was a problem hiding this comment.
Yeah, we can do that; removed them in the new revision.
127f690 to
c43eaad
Compare
958fe11 to
8c76491
Compare
|
Thanks @chrisbobbe and @gnprice for the previous reviews. Pushed a new revision, PTAL. |
| check(messageListItemCount(tester)).equals(501); | ||
| }); | ||
|
|
||
| testWidgets('observe double-fetch glitch', (tester) async { |
There was a problem hiding this comment.
This test case is now very similar to the test case above ("basic") because of the new double-fetch glitch in _modelChanged. Should we remove this one then?
| check(messageListItemCount(tester)).equals(2 + 2 + 98); | ||
| }); | ||
|
|
||
| testWidgets('mid-history and fetch-older with too few messages, fetch-newer request is made', (tester) async { |
There was a problem hiding this comment.
This test is for handling an edge case, the handling of which required a lot of changes in the first revision of this PR, before Greg suggested an alternative, straightforward method of solving that problem. I explained that edge case in the description of a commit message, which is not included now.
This change is necessary when there is a need to fetch more messages
in both directions, older and newer, and when fetching in
one direction avoids fetching in another direction at the same time,
because of the `if (busyFetchingMore) return` line in
both `fetchOlder` and `fetchNewer`.
This scenario happens when a conversation is opened in its first unread,
such that the number of messages in the initial batch is so low (because
they're muted in one way or another) that it's already past the certain
point where the scroll metrics listener in the widget code triggers both
`fetchOlder` and `fetchNewer`. In 2025-11, that code first calls
`fetchOlder` then `fetchNewer`, and for the reason mentioned above,
`fetchNewer` will not fetch any new messages. But that's fine, because
as soon as older messages from `fetchOlder` arrives, there will be
a change in the scroll metrics, so `fetchNewer` will be called again,
fetching new messages.
But imagine if the number of messages in the initial batch occupies less
than a screenful, and then `fetchOlder` returns no messages or a few
messages that combined with the initial batch messages are still less
than a screenful; in that case, there will be no change in the scroll
metrics to call `fetchNewer`.
With the three fetch request types being separated, especially the two
request types for older and newer messages, each direction can fetch
more messages independently without interfering with one another.
8c76491 to
d0f5c23
Compare
chrisbobbe
left a comment
There was a problem hiding this comment.
Thanks! Comments below, from reading everything except the tests.
| // TODO: This ends up firing a second time shortly after we fetch a batch. | ||
| // The result is that each time we decide to fetch a batch, we end up | ||
| // fetching two batches in quick succession. This is basically harmless | ||
| // but makes things a bit more complicated to reason about. | ||
| // The cause seems to be that this gets called again with maxScrollExtent | ||
| // still not yet updated to account for the newly-added messages. | ||
| // This ends up firing a second time shortly after we fetch a batch while | ||
| // there is a fling going on. | ||
| // The result is that each time we decide to fetch a batch, we end up | ||
| // fetching two batches in quick succession. This is basically harmless | ||
| // but makes things a bit more complicated to reason about. | ||
| // The cause is that this gets called again with minScrollExtent | ||
| // still not yet updated to account for the newly-added messages. | ||
| // This relates to how [SchedulerBinding] executes different tasks when | ||
| // producing a new frame, like first executing transient callbacks | ||
| // (typically ticking animations) followed by persistent callbacks | ||
| // (typically the build/layout/paint pipeline), and so on. | ||
| // So when there is a new message batch received, the related widgets are | ||
| // marked dirty for the next frame. With the ongoing fling, the underlying | ||
| // animation registers transient callback(s) for [ScrollPosition.setPixels] | ||
| // to be executed in the transient callbacks phase at the start of | ||
| // the frame. It will then notify its listeners, eventually calling | ||
| // `_scrollChanged` and in turn current method with old minScrollExtent, | ||
| // causing the second batch fetch. Then in the persistent callbacks phase, | ||
| // minScrollExtent will be updated, effective in the next frame. |
There was a problem hiding this comment.
msglist [nfc]: Explain the reasoning behind the double-fetch glitch
Ideally we'd fix the glitch, right? (The old comment is a TODO, saying "This is basically harmless but makes things a bit more complicated to reason about.")
I haven't digested the new comment yet (I'm kind of getting lost in it), but can we use the details you've learned to fix the glitch? In that case I think a new issue is a better place to put them, and we can just point the existing TODO to that issue (and fix anything that's wrong; I see you've done s/maxScrollExtent/minScrollExtent/ which might have been intentional).
| return findScrollView(tester).controller; | ||
| } | ||
|
|
||
| int? messageListItemCount(WidgetTester tester) => |
There was a problem hiding this comment.
msglist test [nfc]: Move a few helpers to the top, for reusability
These are: itemCount, findPlaceholder, and findLoadingIndicator.
Also, rename itemCount to messageListItemCount, to make it easier what
it is about in the upcoming tests.
Commit-message nit: I think are some missing words in that last sentence?
There was a problem hiding this comment.
I’m not sure about the missing words — the phrasing may have been unclear. Reworded it a little; hope that’s better now 🙂
Also, rename itemCount to messageListItemCount, to make it clearer
what it refers to in the upcoming tests.
| /// The ID of the oldest known message so far in this narrow. | ||
| /// | ||
| /// This will be `null` if no messages of this narrow are fetched yet. | ||
| /// Having a non-null value for this doesn't always mean [haveOldest] is `true`. | ||
| /// | ||
| /// The related message may not appear in [messages] because it | ||
| /// is muted in one way or another. | ||
| int? get oldMessageId => _oldMessageId; |
There was a problem hiding this comment.
We should avoid saying the message is in the narrow when it might not be.
Some nits in this revision:
- The dartdoc summary line still says the inaccurate thing.
- "A non-null value for this doesn't always mean [haveOldest] is
true." → This is written to prevent a misunderstanding that I think isn't very likely; does that seem right? Maybe instead of this line, we can clarify more directly by making this getter's name more explicit, likeoldestFetchedMessageId(which might be helpful in any case). - I'd like to remove "that's fine for what this is used for" because it can't be verified except by reading the code that consumes it. (An alternative would be to restrict its allowed usages by adding some words to the dartdoc.)
- A few wording nits for clarity; see below.
| /// The ID of the oldest known message so far in this narrow. | |
| /// | |
| /// This will be `null` if no messages of this narrow are fetched yet. | |
| /// Having a non-null value for this doesn't always mean [haveOldest] is `true`. | |
| /// | |
| /// The related message may not appear in [messages] because it | |
| /// is muted in one way or another. | |
| int? get oldMessageId => _oldMessageId; | |
| /// The ID of the oldest message fetched so far in this narrow. | |
| /// | |
| /// This is used as the anchor for fetching the next batch of older messages | |
| /// and will be `null` if no messages of this narrow have been fetched yet. | |
| /// | |
| /// A message with this ID might not appear in [messages]: | |
| /// - The message may be in a muted conversation. | |
| /// - The message may have been moved or deleted after it was fetched. | |
| int? get oldestFetchedMessageId => _oldestFetchedMessageId; |
(Similarly for the corresponding newet-fetched-message code.)
| /// This makes sure there is at least one non-muted message fetched; if any. | ||
| /// It may do so my repeatedly calling [fetchOlder] and [fetchNewer]. |
There was a problem hiding this comment.
| /// This makes sure there is at least one non-muted message fetched; if any. | |
| /// It may do so my repeatedly calling [fetchOlder] and [fetchNewer]. | |
| /// If the results don't include at least one non-muted message, | |
| /// this will call [fetchOlder] and/or [fetchNewer] | |
| /// until one is found or the narrow's oldest and newest messages are reached. |
| /// This also may or may not represent all the message history that | ||
| /// conceptually belongs in this narrow because some messages might be | ||
| /// muted in one way or another and they may not appear in the message list. |
There was a problem hiding this comment.
How about:
| /// This also may or may not represent all the message history that | |
| /// conceptually belongs in this narrow because some messages might be | |
| /// muted in one way or another and they may not appear in the message list. | |
| /// Also, messages may be excluded if they are in muted conversations. |
| while (messages.isEmpty && !(haveOldest && haveNewest)) { | ||
| await fetchOlder(partOfInitialFetch: true); | ||
| if (messages.isNotEmpty) break; | ||
| await fetchNewer(partOfInitialFetch: true); | ||
| } |
There was a problem hiding this comment.
We probably also want to exit the loop if fetchNewer finds some non-muted messages.
There was a problem hiding this comment.
I think that’s already covered by the loop condition — it breaks as soon as messages aren’t empty 🙂 But if that’s good for explicitness, I’ll be happy to add it.
There was a problem hiding this comment.
Ah yeah you're right, that makes sense.
| if (this.generation == generation) { | ||
| if (this.generation == generation && !partOfInitialFetch) { |
There was a problem hiding this comment.
In the initial-fetch case, if one of these fetch-older/fetch-newer requests fails, I think we still want to use BackoffMachine to delay the next attempt, right?
There was a problem hiding this comment.
When writing this, I was thinking about using backoff, but then looking at #945 (PR #1050) where the backoff logic was introduced, the motivation there was to avoid the retry storm caused by a failed fetchOlder (or fetchNewer) request in the widgets code, specifically by the listener for the scroll state. In this (initial-fetch) case though, there is no code path (at least as of now) to cause a retry storm. The while loop will break promptly whenever one of the fetch-older or fetch-newer requests fails. So I think handling of this is similar to the case when the initial-fetch request fails. How about adding a TODO pointing to #2085?
| // This relates to `_modelChanged` (and in turn the current method) being | ||
| // called right away as a listener of the model, before the next frame | ||
| // being drawn with the newly-added messages, updating minScrollExtent. |
There was a problem hiding this comment.
Interesting. If we want this condition check to run after a frame is drawn with the newly-added messages, how about putting this model.fetchOlder call in a SchedulerBinding.instance.addPostFrameCallback? I wonder if that might actually be the fix for both double-fetch glitches?
There was a problem hiding this comment.
Yeah, addPostFrameCallback can avoid these glitches. Fix included in the new revision.
64c0a6d to
fb2b549
Compare
|
Thanks Chris for the review. Pushed new changes, PTAL. |
chrisbobbe
left a comment
There was a problem hiding this comment.
Thanks! A few comments below, including one that should help reduce complexity. 🙂
| if (itemCount(tester)! > 101) break; // The loading indicator appeared. | ||
| if (connection.takeRequests().singleOrNull != null) break; |
There was a problem hiding this comment.
Would .isNotEmpty be equivalent to .singleOrNull != null here?
| }); | ||
|
|
||
| testWidgets('observe double-fetch glitch', (tester) async { | ||
| testWidgets('do not observe double-fetch glitch', (tester) async { |
There was a problem hiding this comment.
Let's add a comment like
// Regression test for: https://github.com/zulip/zulip-flutter/issues/2104at the top of the test body.
| while (messages.isEmpty && !(haveOldest && haveNewest)) { | ||
| await fetchOlder(partOfInitialFetch: true); | ||
| if (messages.isNotEmpty) break; | ||
| await fetchNewer(partOfInitialFetch: true); | ||
| } |
There was a problem hiding this comment.
Ah yeah you're right, that makes sense.
| if (model.messages.isEmpty && model.haveOldest && model.haveNewest) { | ||
| // If the fetch came up empty, there's nothing to read, | ||
| // so opening the keyboard won't be bothersome and could be helpful. | ||
| // It's definitely helpful if we got here from the new-DM page. | ||
| MessageListPage.ancestorOf(context) | ||
| .composeBoxState?.controller.requestFocusIfUnfocused(); | ||
| } | ||
| _prevFetched = model.fetched; |
There was a problem hiding this comment.
Interesting; I agree that it does seem better to not "request-focus-if-unfocused" except when we know there aren't any messages to show in the narrow's entire history.
It looks like the condition on _prevFetched was to make sure this "autofocus" behavior was only triggered once per message list. With that removed, I think that means we're allowing an "autofocus" multiple times: for example, each time addOutboxMessage is called during a time when the whole history is fetched and there are no visible messages. I think we probably don't want that? If you'd like to make a change here, can you do it in separate commit(s)?
Possibly the stateful variable should have different semantics, for example with a name like _hasAutofocused rather than trying to make it about the state of message fetches.
There was a problem hiding this comment.
Actually, this change was necessary in one of the first revisions, where there was no loop in the fetchInitial method for finding at least one non-muted message. Instead, at that time, we relied on fetchOlder/Newer to find one. In that case, when the initial batch came up empty, the compose box would autofocus even though technically there were more messages to fetch. But right now where we have the loop in fetchInitial, model.fetched would become true if either there is at least one message or there are no messages in the history at all. So we'll not autofocus unnecessarily.
But if the change still seems safer, I'll be happy to add it.
There was a problem hiding this comment.
I think something like this could be helpful, but done in a separate commit (before or after):
if (model.messages.isEmpty && model.haveNewest && model.haveOldest && !_hasAutofocused) {
// If there are no messages to show in the whole history,
// opening the keyboard won't be bothersome and could be helpful.
// It's definitely helpful if we got here from the new-DM page.
MessageListPage.ancestorOf(context)
.composeBoxState?.controller.requestFocusIfUnfocused();
_hasAutofocused = true;
}| if (model.fetched) { | ||
| SchedulerBinding.instance.addPostFrameCallback((_) { | ||
| if (scrollController.hasClients) { | ||
| // This is needed here in order to fetch more messages if the | ||
| // visible (non-muted) messages of the previous batch combined | ||
| // with the already fetched messages are less than a screenful. | ||
| // In such case, there will be no change in the scroll metrics to | ||
| // fire a notification that will fetch more messages. | ||
| // Also, this should be called post frame, otherwise it will end up | ||
| // fetching an additional batch shortly after we fetch a batch, | ||
| // even if we don't need the additinal batch at the moment. | ||
| // The cause is that this will be called again with `ScrollMetrics` | ||
| // still not yet updated to account for the newly-added messages. | ||
| // This relates to `_modelChanged` called right away as a listener of | ||
| // the model, before the next frame being drawn with the newly-added | ||
| // messages, updating `ScrollMetrics`. | ||
| _fetchMoreIfNeeded(scrollController.position); | ||
| } | ||
| }); | ||
| } |
There was a problem hiding this comment.
Since _fetchMoreIfNeeded causes an assert(fetched) (in fetchOlder/fetchNewer), it should run synchronously after the model.fetched check, without an async gap in between. The fetched flag can change from true to false, e.g. when MessageListView.renarrowAndFetch is called. I'm not sure if the post-frame callback could actually realistically race with that happening, but better to write the code so that's more clearly not a class of possible bugs.
Also here's a draft of a comment that highlights the fetch-in-loop behavior caused by this code, and I think reads a little more smoothly:
SchedulerBinding.instance.addPostFrameCallback((_) {
if (!model.fetched || !scrollController.hasClients) {
return;
}
// If fetchInitial or fetchOlder/fetchNewer
// haven't filled model.messages with any visible (i.e. unmuted) messages,
// or anyway not enough to fill the screen, fetch again.
// If we're in a long run of muted messages, this has the effect of
// fetching in a loop until we've either fetched the narrow's whole history
// or we've filled the screen with visible messages,
// without needing user scroll input between iterations.
//
// The right time for the "if-needed" check is
// after the current model change has been laid out
// and the scroll metrics have been updated,
// so that when we receive and lay out a screenful of messages,
// we don't fetch again unnecessarily.
// That's why we do it in a post-frame callback.
_fetchMoreIfNeeded(scrollController.position);
});There was a problem hiding this comment.
Thanks, I like this version. It’s safer to have model.fetched check inside addPostFrameCallback.
| /// Fetch messages, starting from scratch. | ||
| /// | ||
| /// If the results don't include at least one non-muted message, | ||
| /// this will call [fetchOlder] and/or [fetchNewer] | ||
| /// until one is found or the narrow's oldest and newest messages are reached. | ||
| Future<void> fetchInitial() async { |
There was a problem hiding this comment.
Thinking more about this, does fetchInitial actually need to be changed at all in order to fix the issue? I think it doesn't, and we can do without the added complexity here. In the case where the first (i.e. "initial") fetch comes back with no visible messages, we'll still fetch and refetch appropriately because of the _fetchMoreIfNeeded call in the widget code's _modelChanged callback.
It's true that fetchOlder and fetchNewer used to have asserts that messages was not empty, and I guess the loop in fetchInitial would've helped satisfy those. But I think those asserts were only needed in a world where we chose the anchor for fetchOlder/fetchNewer by looking at the messages list itself. Now that we use these new oldestFetchedMessageId/newestFetchedMessageId fields to choose the anchor, we can cheerfully assert on those instead, as it looks like you've already done, making fetchOlder and fetchNewer cheerfully accept the case where messages is empty.
There was a problem hiding this comment.
Yeah, removing the loop inside fetchInitial will certainly simplify the code, but it will have some effects on the UI, for example, if the initial batch has no visible messages, the loading indicator in the middle will suddenly move to the bottom. This change was suggested in #1989 (comment) with the initial motivation for it mentioned in there.
There was a problem hiding this comment.
I see; yeah, I think the simplification is worth it and that effect on the UI is OK.
| // `foundOldest: true` just to avoid having to prepare a fetch-older | ||
| // response when the model notifies its listeners. |
There was a problem hiding this comment.
I think "when the model notifies its listeners" sounds pretty opaque. It's about an implementation detail of the fetch loop that's intended to fill the screen with messages, right? How about:
// `foundOldest: true` just to avoid having to prepare a fetch-older
// response; the message list would otherwise refetch when it notices
// it doesn't have a screenful of messages.fb2b549 to
631245d
Compare
|
Thanks for the review. New revision pushed. |
|
Thanks! As mentioned above, please go ahead with the |
631245d to
bc56fea
Compare
|
Thanks! Changes pushed. (The CI failure seems unrelated and doesn't reproduce locally.) |
bc56fea to
a6f8c27
Compare
|
Great, thanks! LGTM; marking for Greg's review. |
gnprice
left a comment
There was a problem hiding this comment.
Thanks @sm-sayedi for fixing these issues, and @chrisbobbe for the previous reviews!
Comments below up through fixing the double-fetch glitch #2104:
9f0f456 msglist [nfc]: Mention fetchNewer in the MessageListView dartdoc
25886d9 msglist [nfc]: Add a new point to _MessageSequence.messages dartdoc
ef20b6d msglist [nfc]: Point the TODO about the double-fetch glitch to an issue
bb83ec0 msglist test: Reproduce the double-fetch glitch
3c4cd60 msglist [nfc]: Extract the "fetch more messages" code in a new method
13678fe msglist: Fix the double-fetch glitch
Let's split out the #2104 part to its own separate PR. It feels like several different things are happening in this PR, and splitting it up will help with managing review. I believe that means these commits:
ef20b6d msglist [nfc]: Point the TODO about the double-fetch glitch to an issue
bb83ec0 msglist test: Reproduce the double-fetch glitch
3c4cd60 msglist [nfc]: Extract the "fetch more messages" code in a new method
13678fe msglist: Fix the double-fetch glitch
| /// That information is expressed in [fetched], [haveOldest], [haveNewest]. | ||
| /// | ||
| /// Also, messages may be excluded if they are in muted conversations. | ||
| /// |
There was a problem hiding this comment.
This added line seems out of place. If messages are excluded on account of muting, then that means they don't belong in this list, even conceptually.
| }, skip: true); // TODO this still reproduces manually, still needs debugging, | ||
| // but has become harder to reproduce in a test. |
There was a problem hiding this comment.
Cool — happy to see this glitch diagnosed and reproduced.
| // On the next frame, we promptly fetch *a second* batch. | ||
| // This is a glitch and it'd be nicer if we didn't. | ||
| connection.prepare(json: eg.olderGetMessagesResult(anchor: 850, foundOldest: false, | ||
| messages: List.generate(100, (i) => eg.streamMessage(id: 750 + i, sender: eg.selfUser))).toJson()); | ||
| await tester.pump(const Duration(milliseconds: 1)); | ||
| // Allow a delayed frame for the response of the first batch to arrive. | ||
| await tester.pump(Duration(milliseconds: 1)); | ||
| check(itemCount(tester)).equals(401); |
There was a problem hiding this comment.
Let's also insert a check that verifies at what step here the second fetch request gets made.
There was a problem hiding this comment.
Also, the story here seems a bit ambiguous:
- The comment at the start of the range I've highlighted says the second fetch is "on the next frame" after the first fetch starts.
- But then the logic at the end of this range seems to be that we wait all the way until the first fetch finishes. This happens to also be the first frame after it starts.
If there are several frames between the first fetch starting and ending (which is going to be the typical case — most of these fetches take much longer than a millisecond), then those two descriptions give different predictions for when the second fetch happens. Which one is the behavior that occurs?
There was a problem hiding this comment.
By “on the next frame” I meant the next frame in the test code (which is the delayed pump after which the first batch arrives). And in real scenarios, the second fetch starts just after the first one finishes, not after the first frame post first fetch start. Changed that comment to reflect this fact.
| if (SchedulerBinding.instance.schedulerPhase == .transientCallbacks) { | ||
| SchedulerBinding.instance.addPostFrameCallback((_) { |
There was a problem hiding this comment.
Can you explain why this solves the problem (the double-fetch glitch)? And why .transientCallbacks; are there potentially any other phases where we should do the same thing?
There was a problem hiding this comment.
Yeah, so as explained in #2104 description, double-fetch was used to happen whenever the response for a fetch-older request arrived while there was a fling going on in the message list. The fling (which is driven by an animation) causes the scroll position to change, but that change happens in .transientCallbacks phase (the phase where the animations tick, in this case, the animation for the fling). The change in the scroll position eventually calls _handleScrollMetrics (in the transientCallbacks phase) with old scroll metrics (specifically minScrollExtent), which causes a double fetch glitch. Then in the .persistentCallbacks phase, where the message list view is laid out with the newly arrived messages, minScrollExtent will be updated to account for the new messages. The updated minScrollExtent value will then be available in .postFrameCallbacks phase, so that's why we wrap _fetchMoreIfNeeded in addPostFrameCallback. And the check for .transientCallbacks is there because in this specific case, the double-fetch glitch is caused by _handleScrollMetrics being called in .transientCallbacks phase.
are there potentially any other phases where we should do the same thing?
Right now, _handleScrollMetrics is called by _scrollChanged and _handleScrollMetricsNotification.
_scrollChangedis called throughScrollPosition.setPixels. Its dartdoc says:
"This should only be called by the current [ScrollActivity], either during the transient callback phase or in response to user input." So it's called either during.transientCallbacksor.idle(user input handlers are executed here) phase._handleScrollMetricsNotificationis called throughNotificationListener.onNotification. Its dartdoc says: "Notifications vary in terms of when they are dispatched. There are two main possibilities: dispatch between frames, and dispatch during layout." So it's called either during.idleor.persistentCallbacksphase._handleScrollMetricsNotificationis specifically called onScrollMetricsNotification, which in turn is only dispatched in.idlephase — seeScrollPosition.didUpdateScrollMetricsand its call site.
So besides .transientCallbacks, .idle is the only other phase where _handleScrollMetrics could be called. Should we also check for the .idle phase, given that we still haven't experienced any double-fetch glitch caused by it?
| model.fetchNewer(); | ||
| if (SchedulerBinding.instance.schedulerPhase == .transientCallbacks) { | ||
| SchedulerBinding.instance.addPostFrameCallback((_) { | ||
| if (scrollController.hasClients) { |
There was a problem hiding this comment.
What's the reasoning for this condition?
There was a problem hiding this comment.
In this case I think it doesn't really make any difference, but it's a safety check. Maybe there is a possibility that scrollController can lose clients post frame; I'm not sure. The same check is necessary for another addPostFrameCallback in one of the next commits, though, so I thought it would be a good idea to add it here too.
| }); | ||
|
|
||
| testWidgets('observe double-fetch glitch', (tester) async { | ||
| testWidgets('do not observe double-fetch glitch', (tester) async { |
There was a problem hiding this comment.
nit: tighten
| testWidgets('do not observe double-fetch glitch', (tester) async { | |
| testWidgets('no double-fetch glitch', (tester) async { |
|
|
||
| // Check there is no double-fetch glitch to fetch another batch. | ||
| check(connection.takeRequests().lastOrNull).isNull(); | ||
| check(messageListItemCount(tester)).equals(401); |
There was a problem hiding this comment.
This check is redundant with the one a few lines above; nothing's changed in between them.
| // Check there is no double-fetch glitch to fetch another batch. | ||
| check(connection.takeRequests().lastOrNull).isNull(); |
There was a problem hiding this comment.
This doesn't quite reassure me that there isn't a double fetch; it feels like with this check there still might be a second fetch coming shortly after the end of the test body.
A fix:
| // Check there is no double-fetch glitch to fetch another batch. | |
| check(connection.takeRequests().lastOrNull).isNull(); | |
| // Check there is no double-fetch glitch to fetch another batch. | |
| await tester.pumpAndSettle(); | |
| check(connection.takeRequests().lastOrNull).isNull(); |
Generally we avoid pumpAndSettle, because it's imprecise about the behavior. But in a negative context like this, imprecise is fine and even good: we want to be sure there's no fetch now, no fetch next frame, and no fetch however far into the future things continue to happen.
a6f8c27 to
ac16e96
Compare
|
Sorry this has lingered a bit. My attention has been pretty occupied the past couple of weeks with getting end-to-end-encrypted notifications (#1764) into shape, since that's a big project and also has an upcoming deadline — we want to get it out for the next server release, #release management > 12.0 release @ 💬. Hopefully that'll be more settled down within the next week or two. Because the logic involved in this PR is somewhat tricky, I think the next time I'll find the attention again to properly review it is after that. |
ac16e96 to
66a57b6
Compare
66a57b6 to
8f66810
Compare
These are: itemCount, findPlaceholder, and findLoadingIndicator. Also, rename itemCount to messageListItemCount, to make it clearer what it refers to in the upcoming tests.
Previously, if the initial batch came up empty, then the history was considered empty too. But now, the initial batch could be empty because of the messages in muted conversations while there may still be other messages in history.
8f66810 to
2ad98f2
Compare


Fixes: #1256
Rebased atop #2162.
Screen recordings
First batch - Empty
Before
Empty.first.batch.-.Before.mov
After
Empty.first.batch.-.After.mov
First batch - Less than a screenful
Before
First.batch.with.few.messages.-.Before.mov
After
First.batch.with.few.messages.-.After.mov