method to propagate backpressure from parser handler #3218

Closed
jerch opened this issue Jan 15, 2021 · 12 comments · Fixed by #3222
Labels: area/parser, type/enhancement (Features or improvements to existing features)

Comments
@jerch
Member

jerch commented Jan 15, 2021

First some background on this idea (coming from the image PR):

Currently SIXELs are decoded synchronously, and the image decoding can correctly slow down the whole input processing chain, which shows up as a lower callback rate in WriteBuffer. Ok - sync processing works.

Ofc image decoding creates quite a lot of load on the main thread and prolly should be done in a worker (for PNG/JPEG it has to be done async anyway). And that's where the data pressure hell begins - the SIXEL decoder has a throughput of ~50 MB/s while the DCS parser path operates at ~250 MB/s, meaning the terminal can flood the decoder with up to 200 MB more data per second than it can process (worst case). Currently there is no way to inform the WriteBuffer asynchronously about that data pressure condition from within the terminal.

Repro:
Running cat *.sixel in a folder full of SIXEL images with my async decoder triggers the issue. The same happens with mpv and its new sixel video output above a certain output size.

In general:
This is currently a non-issue for our normal sync handler actions, as sync code always "blocks" and indicates data pressure through a lower callback rate in WriteBuffer. But any async handler code with lower throughput than the parser itself will run into this. It is definitely not only a problem of image processing, it just happens to be the first complicated async handler.

Idea:
It would be good to have some way to pause/resume the data chunk processing on WriteBuffer from a parser handler. Given that flow control is properly set up for the terminal itself, this enables correct backpressure signalling from async parser handler code as well.

Up for discussion.

Edit: I don't need this fine-grained down to a single input buffer position (although that would solve many issues around async handlers in general, the stack unwinding it needs is not worth it), I think our already established chunk border stops are good enough.
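
Just to make the idea more concrete, a minimal sketch of what such a pause/resume hook could look like - the class and the pause/resume names are purely hypothetical, not existing API:

// Illustrative sketch only - a WriteBuffer-like class with hypothetical
// pause()/resume() methods; this is not existing xterm.js API.
class WriteBufferSketch {
  private _queue: string[] = [];
  private _paused = false;

  public write(data: string): void {
    this._queue.push(data);
    if (!this._paused) {
      this._drain();
    }
  }

  // called from a parser handler that hit backpressure (e.g. a slow worker)
  public pause(): void {
    this._paused = true;
  }

  // called once the slow consumer caught up, continues at the next chunk border
  public resume(): void {
    if (!this._paused) return;
    this._paused = false;
    this._drain();
  }

  private _drain(): void {
    while (this._queue.length && !this._paused) {
      this._parse(this._queue.shift()!);  // a handler may call pause() in here
    }
  }

  private _parse(chunk: string): void {
    // placeholder for the real sync parse path (InputHandler.parse)
  }
}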

@jerch jerch added type/enhancement Features or improvements to existing features area/parser labels Jan 15, 2021
@Tyriar
Member

Tyriar commented Jan 15, 2021

To clarify, this isn't about async/sync writes getting out of order, it's about locking up the main thread when writing a lot of data where a handler (csi handler?) is slow to parse.

@jerch
Member Author

jerch commented Jan 15, 2021

Yes, the latter point would be a reason to offload the hard work to a worker (as with the image processing example). But of course this applies to any handler that takes really long.

Execution order or state synchronisation is a different issue that would arise from worker usage, if the worker wants to place data back into the terminal state. That is a serious issue on its own, but not what I am talking about here.

Lemme illustrate this further - say you have some CSI handler that invokes an async worker interface with portions of the terminal data. This sounds easy to do:

const fancyStuff = new Worker('process_fancy_stuff.js');

term.registerCsiHandler({some_symbols}, params => {
  // offload hard work
  fancyStuff.postMessage(someLongishTermData); // <-- Where does the data end up if the worker is slow?
  return true;  // claim the sequence as handled
});

This works reliably, as long as it is guaranteed that the incoming data rate is lower than the processing rate of fancyStuff - the message queue will be drained in time. Not so if the worker processes data more slowly (most likely the case for any async part in handlers) - the queue will grow without a chance to stop input processing ahead of the CSI handler, thus more and more data keeps arriving.

That's the point where I think pausing the WriteBuffer from within the terminal helps to keep high input under control. By the way - it is the logical continuation of the flow control matter to async handlers (much the same as with node-pty --> xterm.js, but now xterm.js --> async handler).
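
To sketch how that could look from the embedder side - pause()/resume() stand in for the proposed hook and are not existing API, writeBuffer stands in for whatever exposes it, and the high-water counting is just illustration:

// Hypothetical sketch - pause()/resume() are the proposed backpressure hook,
// not existing API; term and someLongishTermData as in the example above.
const fancyStuff = new Worker('process_fancy_stuff.js');
const HIGH_WATER = 8;   // stop feeding the worker beyond this many pending messages
let inFlight = 0;

term.registerCsiHandler({some_symbols}, params => {
  if (++inFlight >= HIGH_WATER) {
    writeBuffer.pause();               // stop chunk processing, worker is behind
  }
  fancyStuff.postMessage(someLongishTermData);
  return true;
});

fancyStuff.onmessage = () => {
  if (--inFlight < HIGH_WATER) {
    writeBuffer.resume();              // worker caught up, continue parsing input
  }
};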

@Tyriar
Member

Tyriar commented Jan 15, 2021

Oh, the CSI handlers are synchronous atm, I think the bit I wasn't expecting was that the API is this:

registerCsiHandler(id: IFunctionIdentifier, callback: (params: (number | number[])[]) => boolean): IDisposable;

Not this:

registerCsiHandler(id: IFunctionIdentifier, callback: (params: (number | number[])[]) => boolean | Promise<boolean>): IDisposable;

Or:

registerCsiHandler(
  id: IFunctionIdentifier,
  callback: (
    params: (number | number[])[],
    resultCallback: (result: boolean) => void) => void
  ): IDisposable;

The right fix is to block writing completely as above, isn't it? That seems more reliable/easier to use than adding a pause/resume mechanism.
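
For illustration, usage under the boolean | Promise<boolean> variant would look roughly like this - the parser would have to withhold further input until the promise resolves, which is what "block writing completely" means here (doSomethingAsync and the handler id are made-up placeholders):

// Sketch assuming the boolean | Promise<boolean> signature.
term.registerCsiHandler({ final: 'm' }, async params => {
  await doSomethingAsync(params);   // e.g. hand the work off to a worker and wait
  return true;                      // sequence handled, parser may continue
});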

@jerch
Member Author

jerch commented Jan 16, 2021

Do you have a use case for a handler being promise / callback result based? If so, then we prolly should approach the issue with real in-band blocking, otherwise we cannot guarantee handler execution order anymore.

My use case here is more relaxed in this regard, it does not depend on correct execution order down to a single handler + its result, it only needs the blocking semantics on the data pressure part. Promises would also solve this, but they come at a really high price - we would have to reshape the whole parser chain to async support with promises. Following my speed tests here, promises at least triple the runtime compared to pure sync code (actually it is more like 3-5 times slower depending on the promise "granularity").

But there is hope for a middle ground - those tests also show that async code based on direct microtask callbacks is less painful, only doubling the runtime. So if we need true in-order async execution with blocking semantics, we might be able to get the best of both worlds (sync + async) with that.
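
A toy illustration of the two continuation styles in question (not parser code):

// Promise style: every step allocates a promise plus a .then() reaction.
function continueWithPromise(work: () => void): Promise<void> {
  return Promise.resolve().then(work);
}

// Direct microtask style: same scheduling point in the event loop,
// but without any promise objects being created per step.
function continueWithMicrotask(work: () => void): void {
  queueMicrotask(work);
}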

@jerch
Member Author

jerch commented Jan 16, 2021

Just had another look at the input chain. The sync part down to individual handlers roughly looks like this:

-->  InputHandler.parse
  for subchunk in chunk do:
    --> UTF32.decode
    --> EscapeSequenceParser.parse
      for codepoint in chunk do:
        action, state = TRANSITION[codepoint]
        # some more state collecting here
        collected = ...
        if action do:
          for handler in handlers[action, collected] do:
            --> handler                   // <---- now this might be async eventually :O
      collect on exit
  collect on exit

That's not a very deep callstack yet, so yeah it prolly can be reshaped into async support with blocking semantics with acceptable effort. The next question would be whether to go with promises or with manual restarts. Both have pros and cons:

Promises:

  • genuine JS way to solve those issues (pro for easy coding / comprehension)
  • viral - introducing promises deeper in a sync callstack forces a rewrite of the callers higher up to promises as well if execution order shall be preserved (which is the case here)
  • high runtime costs on promise creation and handling (3-5 times slower)

manual restarting:

  • pretty much like stack unwinding (which is not directly possible in JS)
  • unintuitive, not a typical JS paradigm (very C-ish), needs a custom interface
  • also viral - needs a rewrite of every call on the callstack to preserve state and exit at arbitrary points (here handler --> EscapeSequenceParser.parse --> InputHandler.parse); re-entering continues from the preserved state
  • very fast and optimizable (a restart will be treated as a sync code block again)

At the current state I am biased towards the manual restart, I really fear a bad drop in throughput with a purely promise-based variant. Ofc this needs testing, still I'd expect promises to be 3-5 times slower and the manual restart somewhere around 1.5-2 times slower. There is another point in favour of the manual restart - EscapeSequenceParser.parse is already streamlined, thus resuming from arbitrary positions is already there. It would just need some additional handler return value evaluation plus resuming in InputHandler.parse.

Edit:
Note that I am just talking about the inner workings of the callstack above. Even with manual resuming there, the handler callbacks could still get promise interfaces later on.

Edit2:
There is another way to solve this - generators. Sadly I don't have much experience with them in JS, neither coding/debugging-wise nor perf-wise. Last time I looked into them (2 years ago) they were still a nightmare on the debugging side and not speedy either.
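
To make the manual-restart option a bit more tangible, a rough sketch - not the actual EscapeSequenceParser code:

// parse() walks the codepoints; when a handler asks to pause it returns the index
// it stopped at, so the caller can preserve it and re-enter from there later.
type HandlerResult = 'continue' | 'pause';

function parse(data: Uint32Array, start: number, handler: (cp: number) => HandlerResult): number {
  for (let i = start; i < data.length; i++) {
    if (handler(data[i]) === 'pause') {
      return i + 1;             // preserved resume position
    }
  }
  return -1;                    // chunk fully processed
}

// caller side: keep the resume offset around while blocked
let resumeAt = 0;
function processChunk(data: Uint32Array, handler: (cp: number) => HandlerResult): boolean {
  const pos = parse(data, resumeAt, handler);
  if (pos === -1) {
    resumeAt = 0;
    return true;                // done, the next chunk may be processed
  }
  resumeAt = pos;
  return false;                 // blocked - call processChunk again on unblock
}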

@jerch
Member Author

jerch commented Jan 16, 2021

Did a quick rewrite of both parse methods to promises and awaiting handler invocations. These are the results:

   Context "out-test/benchmark/Terminal.benchmark.js"
      Context "Terminal: ls -lR /usr/lib"
         Context "write"
            Case "#1" : 5 runs - average throughput: 5.97 MB/s
         Context "writeUtf8"
            Case "#1" : 5 runs - average throughput: 6.81 MB/s

Compared to purely sync execution:

  Context "out-test/benchmark/Terminal.benchmark.js"
      Context "Terminal: ls -lR /usr/lib"
         Context "write"
            Case "#1" : 5 runs - average throughput: 24.49 MB/s
         Context "writeUtf8"
            Case "#1" : 5 runs - average throughput: 26.23 MB/s

Quite underwhelming - promises on InputHandler.parse --> EscapeSequenceParser.parse make input processing ~4 times slower. (And that's only for those two parse methods, as I did not promisify any of the handlers.) 😿

@jerch
Member Author

jerch commented Jan 17, 2021

I think the following would be possible as optional async interface:

registerCsiHandler(
  id: IFunctionIdentifier,
  callback: (
    params: (number | number[])[],
    block: () => ((result:boolean) => void)) => void
  ): IDisposable;

and used as something like this:

const asyncHandler = term.registerCsiHandler({...}, (params, block) => {
  // block() creates the needed state preservation in the background
  // and returns a function to continue later from
  const unblock = block();
  // do something async, unblock when done
  new Promise(res => { /* async stuff */ }).then(() => unblock(true));
});

Something like that should be possible with manual resuming of the parser parts without polluting everything with promises. Only the few handlers that need the async capabilities and call block would trigger a full stop of the chain, pretty much like async "promise islands" between stretches of sync input processing.
Ofc I have not tested that yet, a rewrite with manual resuming is much more involved.

@jerch
Member Author

jerch commented Jan 18, 2021

A (not so) quick rewrite with manual resuming shows these results:

   Context "out-test/benchmark/Terminal.benchmark.js"
      Context "Terminal: ls -lR /usr/lib"
         Context "write"
            Case "#1" : 5 runs - average throughput: 16.21 MB/s
         Context "writeUtf8"
            Case "#1" : 5 runs - average throughput: 17.41 MB/s

(Note: As nodejs is missing a native queueMicrotask implementation, this was done with a process.nextTick hack, thus it is less reliable than in-browser results, which tend to be faster than the nodejs hack.)

To get an impact on throughput I had to mark SGR as async, which occurs quite often in my ls -lR output. Without any async handler the whole input processing stays in sync code. That's imho an advantage, since we only have sync handlers currently.

Also after some refactoring the handler interface looks pretty easy:

registerCsiHandler(
  id: IFunctionIdentifier,
  callback: (params: (number | number[])[]) => boolean | Promise<boolean>
): IDisposable;

Imho those results with manual resuming are quite promising, given that almost every line in the test output contains at least one SGR command. Ofc this was just a hack on the parser and the InputHandler, before going anywhere this needs proper testing.
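
For illustration, an embedder could wire the earlier worker example to this interface roughly like so - decodeInWorker is a made-up helper and the handler id is arbitrary; term and fancyStuff as in the earlier example:

// Sketch only - decodeInWorker resolves once the worker has acknowledged the payload.
function decodeInWorker(worker: Worker, payload: unknown): Promise<void> {
  return new Promise<void>(resolve => {
    const onDone = () => {
      worker.removeEventListener('message', onDone);
      resolve();
    };
    worker.addEventListener('message', onDone);
    worker.postMessage(payload);
  });
}

term.registerCsiHandler({ final: 'p', intermediates: '$' }, async params => {
  await decodeInWorker(fancyStuff, params);  // parser stays blocked until resolved
  return true;
});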

@Tyriar
Member

Tyriar commented Jan 19, 2021

So using boolean | Promise<boolean> is a good option then, as this is only expected to be used in embedder handlers?

I know we've had the promise discussion before and I'm fine leaving them out because of that, I only mentioned them because the T | Promise<T> pattern works great in VS Code's API. It's much simpler from an embedder perspective compared to an optional async callback; additionally I would expect these async handlers to be used sparingly.

@jerch
Member Author

jerch commented Jan 19, 2021

T | Promise<T> should be possible. And yes, async handlers should be used with great care only, they will slow down the input processing a lot, no matter how we deal with the state internally in the end. This mainly results from the needed "code slicing" and re-scheduling of the continuation (+ promise creation in the case of async parse methods) - the higher the async load, the higher this "bookkeeping" overhead gets.

Need a few more tests with the conceptual ideas before getting down to a PR.

@mofux
Contributor

mofux commented Jan 19, 2021

I can't provide much technical input as I'm not very familiar with the parser, but I enjoyed watching this issue on its journey over the last couple of days. Thanks for sharing all the updates, it really helped me get an understanding of the technical solution you are striving for. I think your latest iteration looks pretty good.

@jerch
Member Author

jerch commented Jan 20, 2021

Have a prototype up for ESC and CSI async handlers with these results (again with SGR marked as async):

   Context "out-test/benchmark/Terminal.benchmark.js"
      Context "Terminal: ls -lR /usr/lib"
         Context "write"
            Case "#1" : 30 runs - average throughput: 20.47 MB/s
         Context "writeUtf8"
            Case "#1" : 30 runs - average throughput: 21.29 MB/s

Note the higher run count - it seems the v8 optimizer has much more trouble optimizing the parse slices, as it converges to stable numbers more slowly than for sync code (at the normal 5 runs the numbers are around 16.5 MB/s resp. 19 MB/s).

There are several things complicating straightforward async support across handler types:

  • setHandler vs. addHandler on the parser with different return semantics, kinda a result of the parser's evolution while a proper rewrite of the InputHandler methods never took place. We should simplify that to just one interface upfront to avoid unnecessary complexity in the async handling.
  • DCS and OSC support is more involved (needs changes on the subparsers, not done yet).
  • WriteBuffer.writeSync - this simply does not work anymore with possibly async handlers in between. It is used a lot by test code, so refactoring would touch many of the core test cases. Imho we should fix that soon; for now it can stay as it is under the assumption that a given test only uses sync handlers (which is the case currently).
  • Renderer has no FPS tracking on its own. Not really urgent, but it creates nonsense screen updates for long-running async handlers that schedule their work with proper setTimeout slices. Here the code in WriteBuffer does not know about a previous screen update and would schedule another one (a rough sketch of such throttling follows below).
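
A rough sketch of the kind of refresh throttling meant in the last point - a hypothetical helper, not existing renderer code:

// Collapse refresh requests so that at most one real screen update gets issued
// per frame budget, no matter how often long-running async handlers ask for one.
class RefreshThrottler {
  private _scheduled = false;

  constructor(private _refresh: () => void, private _minFrameTime: number = 1000 / 60) {}

  public request(): void {
    if (this._scheduled) {
      return;                          // an update is already pending
    }
    this._scheduled = true;
    setTimeout(() => {
      this._scheduled = false;
      this._refresh();                 // single real update for all requests in the window
    }, this._minFrameTime);
  }
}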

@jerch jerch mentioned this issue Jan 20, 2021
@jerch jerch self-assigned this Feb 1, 2021
@Tyriar Tyriar added this to the 4.12.0 milestone Mar 15, 2021