Edit: display warning for large @-mentions #3767

Merged: 4 commits into main from bee/edit-large on Apr 12, 2024

Conversation

@abeatrix (Contributor) commented on Apr 10, 2024

This pull request addresses an issue where the total size of the context items selected via @-mentions could exceed the character budget for the active model.

The Edit Command Menu has been updated to:

  1. Track the total size of the selected context items and compare it against the maximum character limit for the active model.
  2. Display a "Large File Warning" label in the quick pick items if the size of a selected context item would cause the total to exceed the character budget (see the sketch after this list).
  3. Automatically remove any selected context items that are no longer included in the current user input.
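
For illustration, here is a minimal sketch of what this size tracking could look like. The names below (`ContextItem`, `charLimit`, `LARGE_FILE_WARNING_LABEL`) are assumptions made for the example, not the exact implementation in this PR:

```ts
import * as vscode from 'vscode'

// Hypothetical shape of a selectable context item; the real type lives elsewhere.
interface ContextItem {
    uri: vscode.Uri
    size?: number // size in characters, filled in when the item is resolved
}

const LARGE_FILE_WARNING_LABEL = 'Large File Warning'

function toQuickPickItems(
    candidates: ContextItem[],
    selected: ContextItem[],
    charLimit: number // character budget of the active model
): vscode.QuickPickItem[] {
    // 1. Track the total size of everything already selected.
    const used = selected.reduce((sum, item) => sum + (item.size ?? 0), 0)

    return candidates.map(item => {
        // 2. Flag any item that would push the total past the budget.
        const wouldExceed = used + (item.size ?? 0) > charLimit
        return {
            label: vscode.workspace.asRelativePath(item.uri),
            description: wouldExceed ? LARGE_FILE_WARNING_LABEL : undefined,
        }
    })
}
```

Step 3 (dropping selections that no longer appear in the input) would then just be a filter over the selected items against the @-mentions still present in the user's text.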

Additionally, the filterLargeFiles function in editor-context.ts has been updated to store the file size in the ContextItemFile object, which is used in the getInput function to determine if a file is too large.
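
Again purely as a hedged sketch (the actual ContextItemFile and getInput signatures in editor-context.ts may differ), storing the size on the item lets the input step do a cheap check without re-reading the file:

```ts
import type * as vscode from 'vscode'

// Assumed minimal shape; the real ContextItemFile carries more fields.
interface ContextItemFile {
    type: 'file'
    uri: vscode.Uri
    size?: number // populated by the large-file filter so later steps can reuse it
}

// Cheap check used when building the input: no file system access needed here.
function exceedsBudget(item: ContextItemFile, remainingChars: number): boolean {
    // If the size was never recorded, let the item through; the final prompt
    // builder still enforces the hard limit.
    return item.size !== undefined && item.size > remainingChars
}
```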

These changes ensure the user is not presented with context items that would push the input past the character limit, which improves the experience and avoids problems further along in the edit flow.

Test plan

(Two screenshots, not reproduced here.)

Follow-up

  • Support @-mentions with range for Edit

@abeatrix requested a review from umpox on April 10, 2024 at 19:11

@dominiccooney (Contributor) left a comment

I'm concerned about the scalability of this approach, because stat is cheap but tokenization is not. Once we move to token counting, this may need to be very different.

So let's stop investing more in synchronous size checks until we work out how to scale to (more) accurate token counting. Some ideas:

  • Have a fast path for files which are definitely small enough, or definitely too large (for example, work out the smallest/largest byte -> token count expansion ahead of time and do a quick file size check to exclude tokenizing the largest files)
  • Cache the token counts
  • Do the token counts lazily, prioritizing items the user has already selected (which means we need to show errors after the user has chosen something) or that are visible in the quickpick (can we get what's visible from the VSCode QuickPick?)
  • Get a smarter tokenizer, for example if we can stream, we could stop tokenizing as soon as we know something is too large
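
As a rough sketch of the first idea above, combined with a simple cache, and assuming measured lower/upper bounds on bytes per token for the tokenizer in use (the constants and function names below are illustrative, not an existing API):

```ts
import { promises as fs } from 'fs'

// Assumed expansion bounds: a token is never shorter/longer than this many bytes.
// These would have to be measured for the actual tokenizer.
const MIN_BYTES_PER_TOKEN = 2
const MAX_BYTES_PER_TOKEN = 6

// Naive cache keyed by path; a real one should also key on mtime or a content hash.
const tokenCountCache = new Map<string, number>()

async function fitsTokenBudget(
    path: string,
    tokenBudget: number,
    countTokens: (text: string) => Promise<number> // the expensive tokenizer
): Promise<boolean> {
    const { size } = await fs.stat(path) // cheap: no file read, no tokenization

    // Upper bound on tokens is size / MIN_BYTES_PER_TOKEN: definitely fits.
    if (size / MIN_BYTES_PER_TOKEN <= tokenBudget) {
        return true
    }
    // Lower bound on tokens is size / MAX_BYTES_PER_TOKEN: definitely too large.
    if (size / MAX_BYTES_PER_TOKEN > tokenBudget) {
        return false
    }

    // Ambiguous middle range: fall back to real (memoized) token counting.
    let tokens = tokenCountCache.get(path)
    if (tokens === undefined) {
        tokens = await countTokens(await fs.readFile(path, 'utf8'))
        tokenCountCache.set(path, tokens)
    }
    return tokens <= tokenBudget
}
```

The lazy and streaming ideas would slot in at the fallback step: only tokenize items the user actually selects, and stop early once the running count passes the budget.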

@abeatrix (Contributor, Author)

> I'm concerned about the scalability of this approach, because stat is cheap but tokenization is not. Once we move to token counting, this may need to be very different.
>
> So let's stop investing more in synchronous size checks until we work out how to scale to (more) accurate token counting. Some ideas:
>
> • Have a fast path for files which are definitely small enough, or definitely too large (for example, work out the smallest/largest byte -> token count expansion ahead of time and do a quick file size check to exclude tokenizing the largest files)
> • Cache the token counts
> • Do the token counts lazily, prioritizing items the user has already selected (which means we need to show errors after the user has chosen something) or that are visible in the quickpick (can we get what's visible from the VSCode QuickPick?)
> • Get a smarter tokenizer, for example if we can stream, we could stop tokenizing as soon as we know something is too large

This is a follow-up on #3619 (comment)
I wanted to update this here first so that I can standardize it in my WIP PR for expanding the context window and using tokens as the size unit.
But I just realized that comment was asking to add support for entering a range, not for displaying the large-file warning 🤣

@dominiccooney (Contributor)

@abeatrix

> This is a follow-up on #3619 (comment)

Makes sense. It threw me for a loop in the context of this token counting change.

@umpox (Contributor) left a comment

Nice!

@abeatrix merged commit 214ce05 into main on Apr 12, 2024
19 checks passed
@abeatrix deleted the bee/edit-large branch on April 12, 2024 at 14:57