method to propagate backpressure from parser handler #3218

Closed
jerch opened this issue Jan 15, 2021 · 12 comments · Fixed by #3222
Labels: area/parser, type/enhancement (Features or improvements to existing features)

Comments
@jerch
Member

jerch commented Jan 15, 2021

First some background on this idea (coming from the image PR):

Currently SIXELs are decoded synchronously, and the image decoding can correctly slow down the whole input processing chain, which shows up as a lower callback rate in WriteBuffer. Ok - sync processing works.

Ofc image decoding creates quite a lot of load on the main thread and prolly should be done in a worker (for PNG/JPEG it has to be done async anyway). And that's where the data pressure hell begins - the SIXEL decoder has a throughput of ~50 MB/s while the DCS parser path operates at ~250 MB/s, meaning the terminal can flood the decoder with up to 200 MB more data per second than it can process (worst case). Currently there is no way to inform the WriteBuffer asynchronously about that data pressure condition from within the terminal.

Repro:
Running cat *.sixel in a folder full of SIXEL images with my async decoder triggers the issue. The same happens with mpv and its new sixel video output above a certain output size.

In general:
This is currently a non-issue for our normal sync handler actions, as sync code always "blocks" and indicates data pressure through a lower callback rate in WriteBuffer. But any async handler code with lower throughput than the parser itself will run into this. It is definitely not only a problem of image processing, it just happens to be the first complicated async handler.

Idea:
It would be good to have some way to pause/resume the data chunk processing on WriteBuffer from a parser handler. Given that flow control is properly set up for the terminal itself, this enables correct backpressure signalling from async parser handler code as well.

Up for discussion.

Edit: I don't need this fine-grained down to a single input buffer position (although that would solve many issues around async handlers in general, the stack unwinding it needs is not worth it), I think our already established chunk border stops are good enough.
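
Just to make the idea more concrete, a minimal sketch of what such a pause/resume hook could look like - the class and the pause/resume names are purely hypothetical, not existing API:

// Illustrative sketch only - a WriteBuffer-like class with hypothetical
// pause()/resume() methods; this is not existing xterm.js API.
class WriteBufferSketch {
  private _queue: string[] = [];
  private _paused = false;

  public write(data: string): void {
    this._queue.push(data);
    if (!this._paused) {
      this._drain();
    }
  }

  // called from a parser handler that hit backpressure (e.g. a slow worker)
  public pause(): void {
    this._paused = true;
  }

  // called once the slow consumer caught up, continues at the next chunk border
  public resume(): void {
    if (!this._paused) return;
    this._paused = false;
    this._drain();
  }

  private _drain(): void {
    while (this._queue.length && !this._paused) {
      this._parse(this._queue.shift()!);  // a handler may call pause() in here
    }
  }

  private _parse(chunk: string): void {
    // placeholder for the real sync parse path (InputHandler.parse)
  }
}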

@jerch jerch added type/enhancement Features or improvements to existing features area/parser labels Jan 15, 2021
@Tyriar
Member

Tyriar commented Jan 15, 2021

To clarify, this isn't about async/sync writes getting out of order, it's about locking up the main thread when writing a lot of data where a handler (csi handler?) is slow to parse.

@jerch
Member Author

jerch commented Jan 15, 2021

Yes, the latter point would be a reason to offload the hard work to a worker (as with the image processing example). But of course this applies to any handler that takes really long.

Execution order or state synchronisation is a different issue that would arise from worker usage, if the worker wants to place data back into the terminal state. That is a serious issue on its own, but not what I am talking about here.

Lemme illustrate this further - say you have some CSI handler that invokes an async worker interface with portions of the terminal data. This sounds easy to do:

const fancyStuff = new Worker('process_fancy_stuff.js');

term.registerCsiHandler({some_symbols}, params => {
  // offload hard work
  fancyStuff.postMessage(someLongishTermData); // <-- Where does the data end up if the worker is slow?
  return true;  // claim the sequence as handled
});

This works reliably, as long as it is guaranteed that the incoming data rate is lower than the processing rate of fancyStuff - the message queue will be drained in time. Not so if the worker processes data more slowly (most likely the case for any async part in handlers) - the queue will grow without a chance to stop input processing ahead of the CSI handler, thus more and more data keeps arriving.

That's the point where I think pausing the WriteBuffer from within the terminal helps to keep high input under control. By the way - it is the logical continuation of the flow control matter to async handlers (much the same as with node-pty --> xterm.js, but now xterm.js --> async handler).
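
To sketch how that could look from the embedder side - pause()/resume() stand in for the proposed hook and are not existing API, writeBuffer stands in for whatever exposes it, and the high-water counting is just illustration:

// Hypothetical sketch - pause()/resume() are the proposed backpressure hook,
// not existing API; term and someLongishTermData as in the example above.
const fancyStuff = new Worker('process_fancy_stuff.js');
const HIGH_WATER = 8;   // stop feeding the worker beyond this many pending messages
let inFlight = 0;

term.registerCsiHandler({some_symbols}, params => {
  if (++inFlight >= HIGH_WATER) {
    writeBuffer.pause();               // stop chunk processing, worker is behind
  }
  fancyStuff.postMessage(someLongishTermData);
  return true;
});

fancyStuff.onmessage = () => {
  if (--inFlight < HIGH_WATER) {
    writeBuffer.resume();              // worker caught up, continue parsing input
  }
};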

@Tyriar
Member

Tyriar commented Jan 15, 2021

Oh, the CSI handlers are synchronous atm, I think the bit I wasn't expecting was that the API is this:

registerCsiHandler(id: IFunctionIdentifier, callback: (params: (number | number[])[]) => boolean): IDisposable;

Not this:

registerCsiHandler(id: IFunctionIdentifier, callback: (params: (number | number[])[]) => boolean | Promise<boolean>): IDisposable;

Or:

registerCsiHandler(
  id: IFunctionIdentifier,
  callback: (
    params: (number | number[])[],
    resultCallback: (result: boolean) => void) => void
  ): IDisposable;

The right fix is to block writing completely as above, isn't it? That seems more reliable/easier to use than adding a pause/resume mechanism.
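
For illustration, usage under the boolean | Promise<boolean> variant would look roughly like this - the parser would have to withhold further input until the promise resolves, which is what "block writing completely" means here (doSomethingAsync and the handler id are made-up placeholders):

// Sketch assuming the boolean | Promise<boolean> signature.
term.registerCsiHandler({ final: 'm' }, async params => {
  await doSomethingAsync(params);   // e.g. hand the work off to a worker and wait
  return true;                      // sequence handled, parser may continue
});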

@jerch
Member Author

jerch commented Jan 16, 2021

Do you have a use case for a handler being promise / callback result based? If so, then we prolly should approach the issue with real in-band blocking, otherwise we cannot guarantee handler execution order anymore.

My use case here is more relaxed in this regard, it does not depend on correct execution order down to a single handler + its result, it only needs the blocking semantics on the data pressure part. Promises would also solve this, but they come at a really high price - we would have to reshape the whole parser chain to async support with promises. Following my speed tests here, promises at least triple the runtime compared to pure sync code (actually it is more like 3-5 times slower depending on the promise "granularity").

But there is hope for a middle ground - those tests also show that async code based on direct microtask callbacks is less painful, only doubling the runtime. So if we need true in-order async execution with blocking semantics, we might be able to get the best of both worlds (sync + async) with that.
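
A toy illustration of the two continuation styles in question (not parser code):

// Promise style: every step allocates a promise plus a .then() reaction.
function continueWithPromise(work: () => void): Promise<void> {
  return Promise.resolve().then(work);
}

// Direct microtask style: same scheduling point in the event loop,
// but without any promise objects being created per step.
function continueWithMicrotask(work: () => void): void {
  queueMicrotask(work);
}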

@jerch
Member Author

jerch commented Jan 16, 2021

Just had another look at the input chain. The sync part down to individual handlers roughly looks like this:

-->  InputHandler.parse
  for subchunk in chunk do:
    --> UTF32.decode
    --> EscapeSequenceParser.parse
      for codepoint in chunk do:
        action, state = TRANSITION[codepoint]
        # some more state collecting here
        collected = ...
        if action do:
          for handler in handlers[action, collected] do:
            --> handler                   // <---- now this might be async eventually :O
      collect on exit
  collect on exit

That's not a very deep callstack yet, so yeah it prolly can be reshaped into async support with blocking semantics with acceptable effort. The next question would be whether to go with promises or with manual restarts. Both have pros and cons:

Promises:

  • genuine JS way to solve those issues (pro for easy coding / comprehension)
  • viral - introducing promises deeper in a sync callstack forces a rewrite of the callers higher up to promises as well if execution order shall be preserved (which is the case here)
  • high runtime costs on promise creation and handling (3-5 times slower)

manual restarting:

  • pretty much like stack unwinding (which is not directly possible in JS)
  • unintuitive, not a typical JS paradigm (very C-ish), needs a custom interface
  • also viral - needs a rewrite of every call on the callstack to preserve state and exit at arbitrary points (here handler --> EscapeSequenceParser.parse --> InputHandler.parse); re-entering continues from the preserved state
  • very fast and optimizable (a restart will be treated as a sync code block again)

At the current state I am biased towards the manual restart, I really fear a bad drop in throughput with a purely promise-based variant. Ofc this needs testing, still I'd expect promises to be 3-5 times slower and the manual restart somewhere around 1.5-2 times slower. There is another point in favour of the manual restart - EscapeSequenceParser.parse is already streamlined, thus resuming from arbitrary positions is already there. It would just need some additional handler return value evaluation plus resuming in InputHandler.parse.

Edit:
Note that I am just talking about the inner workings of the callstack above. Even with manual resuming there, the handler callbacks could still get promise interfaces later on.

Edit2:
There is another way to solve this - generators. Sadly I don't have much experience with them in JS, neither coding/debugging-wise nor perf-wise. Last time I looked into them (2 years ago) they were still a nightmare on the debugging side and not speedy either.
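
To make the manual-restart option a bit more tangible, a rough sketch - not the actual EscapeSequenceParser code:

// parse() walks the codepoints; when a handler asks to pause it returns the index
// it stopped at, so the caller can preserve it and re-enter from there later.
type HandlerResult = 'continue' | 'pause';

function parse(data: Uint32Array, start: number, handler: (cp: number) => HandlerResult): number {
  for (let i = start; i < data.length; i++) {
    if (handler(data[i]) === 'pause') {
      return i + 1;             // preserved resume position
    }
  }
  return -1;                    // chunk fully processed
}

// caller side: keep the resume offset around while blocked
let resumeAt = 0;
function processChunk(data: Uint32Array, handler: (cp: number) => HandlerResult): boolean {
  const pos = parse(data, resumeAt, handler);
  if (pos === -1) {
    resumeAt = 0;
    return true;                // done, the next chunk may be processed
  }
  resumeAt = pos;
  return false;                 // blocked - call processChunk again on unblock
}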

@jerch
Member Author

jerch commented Jan 16, 2021

Did a quick rewrite of both parse methods to promises and awaiting handler invocations. These are the results:

   Context "out-test/benchmark/Terminal.benchmark.js"
      Context "Terminal: ls -lR /usr/lib"
         Context "write"
            Case "#1" : 5 runs - average throughput: 5.97 MB/s
         Context "writeUtf8"
            Case "#1" : 5 runs - average throughput: 6.81 MB/s

Compared to purely sync execution:

  Context "out-test/benchmark/Terminal.benchmark.js"
      Context "Terminal: ls -lR /usr/lib"
         Context "write"
            Case "#1" : 5 runs - average throughput: 24.49 MB/s
         Context "writeUtf8"
            Case "#1" : 5 runs - average throughput: 26.23 MB/s

Quite underwhelming - promises on InputHandler.parse --> EscapeSequenceParser.parse make input processing ~4 times slower. (And that's only for those two parse methods, as I did not promisify any of the handlers.) 😿

@jerch
Member Author

jerch commented Jan 17, 2021

I think the following would be possible as optional async interface:

registerCsiHandler(
  id: IFunctionIdentifier,
  callback: (
    params: (number | number[])[],
    block: () => ((result:boolean) => void)) => void
  ): IDisposable;

and used as something like this:

const asyncHandler = term.registerCsiHandler({...}, (params, block) => {
  // block() creates the needed state preservation in the background
  // and returns a function to continue later from
  const unblock = block();
  // do something async, unblock when done
  new Promise(res => { /* async stuff */ }).then(() => unblock(true));
});

Something like that should be possible with manual resuming of the parser parts without polluting everything with promises. Only the few handlers that need the async capabilities and call block would trigger a full stop of the chain, pretty much like async "promise islands" between stretches of sync input processing.
Ofc I have not tested that yet, a rewrite with manual resuming is much more involved.

@jerch
Member Author

jerch commented Jan 18, 2021

A (not so) quick rewrite with manual resuming shows these results:

   Context "out-test/benchmark/Terminal.benchmark.js"
      Context "Terminal: ls -lR /usr/lib"
         Context "write"
            Case "#1" : 5 runs - average throughput: 16.21 MB/s
         Context "writeUtf8"
            Case "#1" : 5 runs - average throughput: 17.41 MB/s

(Note: As nodejs is missing a native queueMicrotask implementation, this was done with a process.nextTick hack, thus it is less reliable than in-browser results, which tend to be faster than the nodejs hack.)

To get an impact on throughput I had to mark SGR as async, which occurs quite often in my ls -lR output. Without any async handler the whole input processing stays in sync code. That's imho an advantage, since we only have sync handlers currently.

Also after some refactoring the handler interface looks pretty easy:

registerCsiHandler(
  id: IFunctionIdentifier,
  callback: (params: (number | number[])[]) => boolean | Promise<boolean>
): IDisposable;

Imho those results with manual resuming are quite promising, given that almost every line in the test output contains at least one SGR command. Ofc this was just a hack on the parser and the InputHandler, before going anywhere this needs proper testing.
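
For illustration, an embedder could wire the earlier worker example to this interface roughly like so - decodeInWorker is a made-up helper and the handler id is arbitrary; term and fancyStuff as in the earlier example:

// Sketch only - decodeInWorker resolves once the worker has acknowledged the payload.
function decodeInWorker(worker: Worker, payload: unknown): Promise<void> {
  return new Promise<void>(resolve => {
    const onDone = () => {
      worker.removeEventListener('message', onDone);
      resolve();
    };
    worker.addEventListener('message', onDone);
    worker.postMessage(payload);
  });
}

term.registerCsiHandler({ final: 'p', intermediates: '$' }, async params => {
  await decodeInWorker(fancyStuff, params);  // parser stays blocked until resolved
  return true;
});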

@Tyriar
Member

Tyriar commented Jan 19, 2021

So using boolean | Promise<boolean> is a good option then, as this is only expected to be used in embedder handlers?

I know we've had the promise discussion before and I'm fine leaving them out because of that, I only mentioned them because the T | Promise<T> pattern works great in VS Code's API. It's much simpler from an embedder perspective compared to an optional async callback; additionally I would expect these async handlers to be used sparingly.

@jerch
Member Author

jerch commented Jan 19, 2021

T | Promise<T> should be possible. And yes, async handlers should be used with great care only, they will slow down the input processing a lot, no matter how we deal with the state internally in the end. This mainly results from the needed "code slicing" and re-scheduling of the continuation (+ promise creation in the case of async parse methods) - the higher the async load, the higher this "bookkeeping" overhead gets.

Need a few more tests with the conceptual ideas before getting down to a PR.

@mofux
Contributor

mofux commented Jan 19, 2021

I can't provide much technical input as I'm not very familiar with the parser, but I enjoyed watching this issue on its journey over the last couple of days. Thanks for sharing all the updates, it really helped me get an understanding of the technical solution you are striving for. I think your latest iteration looks pretty good.

@jerch
Member Author

jerch commented Jan 20, 2021

Have a prototype up for ESC and CSI async handlers with these results (again with SGR marked as async):

   Context "out-test/benchmark/Terminal.benchmark.js"
      Context "Terminal: ls -lR /usr/lib"
         Context "write"
            Case "#1" : 30 runs - average throughput: 20.47 MB/s
         Context "writeUtf8"
            Case "#1" : 30 runs - average throughput: 21.29 MB/s

Note the higher run count - it seems the v8 optimizer has much more trouble optimizing the parse slices, as it converges to stable numbers more slowly than for sync code (at the normal 5 runs the numbers are around 16.5 MB/s resp. 19 MB/s).

There are several things complicating straightforward async support across handler types:

  • setHandler vs. addHandler on the parser with different return semantics, kinda a result of the parser's evolution while a proper rewrite of the InputHandler methods never took place. We should simplify that to just one interface upfront to avoid unnecessary complexity in the async handling.
  • DCS and OSC support is more involved (needs changes on the subparsers, not done yet).
  • WriteBuffer.writeSync - this simply does not work anymore with possibly async handlers in between. It is used a lot by test code, so refactoring would touch many of the core test cases. Imho we should fix that soon; for now it can stay as it is under the assumption that a given test only uses sync handlers (which is the case currently).
  • Renderer has no FPS tracking on its own. Not really urgent, but it creates nonsense screen updates for long-running async handlers that schedule their work with proper setTimeout slices. Here the code in WriteBuffer does not know about a previous screen update and would schedule another one (a rough sketch of such throttling follows below).
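
A rough sketch of the kind of refresh throttling meant in the last point - a hypothetical helper, not existing renderer code:

// Collapse refresh requests so that at most one real screen update gets issued
// per frame budget, no matter how often long-running async handlers ask for one.
class RefreshThrottler {
  private _scheduled = false;

  constructor(private _refresh: () => void, private _minFrameTime: number = 1000 / 60) {}

  public request(): void {
    if (this._scheduled) {
      return;                          // an update is already pending
    }
    this._scheduled = true;
    setTimeout(() => {
      this._scheduled = false;
      this._refresh();                 // single real update for all requests in the window
    }, this._minFrameTime);
  }
}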

@jerch jerch mentioned this issue Jan 20, 2021
@jerch jerch self-assigned this Feb 1, 2021
@Tyriar Tyriar added this to the 4.12.0 milestone Mar 15, 2021