Possible to provide case-level setup/teardown fns? #8

Open · lynaghk opened this issue Nov 7, 2013 · 4 comments

Comments

@lynaghk commented Nov 7, 2013

I'm trying to performance-test FUSE bindings to Clojure, and have a few questions about perforate:

  1. I already have a with-mounted-fs macro, and would prefer to keep using it; is there any way I can do so without including the mounting time in the benchmark?
     For example, I can write a plain criterium test (with qb being criterium's quick-bench macro):

     (with-mounted-fs (context (atom {"foo" (mmapped-file big-data-file)})) mountpoint
       (qb "reading 100MB through FUSE via mmapped file"
         (Files/readAllBytes (.toPath (File. mountpoint "foo")))))
  2. Is it possible to have case-level setup/teardown fns?

Maybe both of these issues could be solved by introducing a symbol that specifies which subform you actually want benchmarked?
That way consumers could continue to use their own contextual macros (e.g. with-open).
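
Something like this, purely as a hypothetical sketch; the benched marker (and this defcase shape) are invented for illustration and aren't part of perforate today:

;; Hypothetical sketch: a case macro could walk its body, find the form tagged
;; with `benched`, and time only that subform, leaving surrounding contextual
;; macros (with-mounted-fs, with-open, ...) outside the measurement.
(defcase FUSE-bench :mmapped-read
  (with-mounted-fs (context (atom {"foo" (mmapped-file big-data-file)})) mountpoint
    (benched
      (Files/readAllBytes (.toPath (File. mountpoint "foo"))))))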

@davidsantiago (Owner)

I think you'll have problems with this sort of thing given the way perforate is written. I made it as a thin layer on Criterium that manages and displays the data that comes out for you. So it just takes a function and hands that function (closure, really) to Criterium, as Criterium wants. I've tried to make it handle as many situations as possible, but I've come to think that it isn't sufficient for everything, and trying to make everything fit into that framework has added a lot of complexity to the interface. I know exactly how to fix it for a version 2.0, but so far I haven't had the time to make that happen. Sorry!

@lynaghk (Author) commented Nov 7, 2013

If you're open to collaborating on a version 2.0, I'd love to hear your design and take a crack at it.

@davidsantiago (Owner)

Oh, absolutely. Here's a sketch:

The main idea is to give finer control over what gets benchmarked by providing a timer object (call it Timer) that you can stop and restart as your function runs. This should provide the full generality to time only the things you are interested in. By separating out the timing from the execution/testing harness and giving explicit control, you no longer have to twist code into pretzels to fit it into the framework of a single function whose entire runtime is what is measured. Do all the startup/teardown you want, whenever, as long as you're stopping and restarting the timer to clip it away.
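
A minimal sketch of what such a timer could look like; the protocol, names, and accumulate-across-start/stop semantics here are assumptions for illustration, not an existing API:

;; Sketch only: elapsed time accumulates across start/stop pairs, so any
;; setup/teardown done while the timer is stopped is never counted.
(defprotocol Timer
  (start [t] "Begin (or resume) accumulating elapsed time.")
  (stop [t] "Pause accumulation.")
  (elapsed-ns [t] "Total accumulated nanoseconds."))

(defn make-timer []
  (let [acc (atom 0)        ;; nanoseconds accumulated so far
        started (atom nil)] ;; start timestamp while running, else nil
    (reify Timer
      (start [_] (reset! started (System/nanoTime)))
      (stop [_]
        (when-let [t0 @started]
          (swap! acc + (- (System/nanoTime) t0))
          (reset! started nil)))
      (elapsed-ns [_] @acc))))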

Another idea that I think is important is building the notion of a repetition count into the benchmark itself. Most small functions are too fast to time reliably on a JVM, whose timer resolution is no better than a microsecond in any case I've ever seen, so they need to be run multiple times to get a reliable measurement. As perforate currently stands, I always set up a loop manually in the benchmark; that loop is what gets timed and reported, and I have to remember to reverse it back out into a "time per op." It would be better to have the framework tell you how many times it wants you to do something, have your benchmark do it that many times, and then have everything calculated "per op" for you.

This has a secondary benefit: it allows adaptive sampling, where the framework runs your function until it gets stable per-op timings and then stops. As things work now, if you've written your benchmark to repeat an operation, say, 100 times to overcome the low-resolution timer, Criterium doesn't know about that factor of 100; it doesn't realize each sample carries 100x the information it appears to, so it insists on long runs until it is satisfied with its sample sizes. Instead, perforate could start at, say, N=100, then try N=200, and if there is little timing variance between the two, stop and report the answer, rather than spending many seconds per test collecting several hundred samples of something that already repeats the operation 100 times. I think Criterium's very long runtimes are one of its biggest practical problems currently.
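
Roughly, that adaptive loop might look like this; run-case is a hypothetical harness function that runs a case n times and returns the total elapsed nanoseconds:

;; Sketch of the adaptive-sampling idea, not perforate code. Doubles n until
;; two successive per-op estimates agree within a relative tolerance.
(defn adaptive-ns-per-op
  [run-case tol]
  (loop [n 100
         prev nil]
    (let [per-op (/ (double (run-case n)) n)]
      (if (and prev (< (Math/abs (- per-op prev)) (* tol prev)))
        per-op
        (recur (* 2 n) per-op)))))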

So this could look something like this (warning, text-field coding):

(defgoal FUSE-bench "A simple benchmark for FUSE bindings to Clojure.")

(defcase FUSE-bench :with-startup
  "reading 100MB through FUSE via mmapped file, including one-time startup/teardown"
  [n] ;; <- Perforate requests you do your thing n times.
  ;; Lack of a timer argument means the entire runtime will be timed, including the startup and teardown!
  ;; Still, the ability to leave off the timer arg should be nice for simple cases that don't do startup/teardown.
  (with-mounted-fs (context (atom {"foo" (mmapped-file big-data-file)})) mountpoint
    (dotimes [_ n]
      (Files/readAllBytes (.toPath (File. mountpoint "foo"))))))

(defcase FUSE-bench :just-core-ops
  "reading 100MB through FUSE via mmapped file, without one-time startup/teardown"
  [n t] ;; <- t is the Timer object; it must be started to time anything.
  (with-mounted-fs (context (atom {"foo" (mmapped-file big-data-file)})) mountpoint
    (start t) ;; stop/reset are also available for more complex situations, but the timer
              ;; is stopped automatically when a case returns. Start/stop as much as you want.
    (dotimes [_ n]
      (Files/readAllBytes (.toPath (File. mountpoint "foo"))))))

As you can see, a great deal of granularity is now possible, and the cases read more like regular functions that just happen to contain timing directives to chisel out what you're interested in. No closure contortions or special setup/teardown functions needed. The one rub is that perforate now has to keep track of the timings itself and report them to Criterium via its benchmark-stats function, instead of having Criterium run the function and track those stats. I think this is a worthwhile tradeoff: it still retains Criterium for the statistical heavy lifting, and just pulls out the job of running the benchmarks and timing them.
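
To sketch that harness side (again all assumptions, building on the timer sketch above; case-fn stands for a case body compiled down to a function of [n timer]):

;; Sketch only: collect per-op samples for one case, to be handed off to
;; Criterium's statistics afterwards, as described above.
(defn sample-case
  [case-fn n samples]
  (vec (repeatedly samples
                   (fn []
                     (let [t (make-timer)]
                       (case-fn n t)
                       (stop t) ;; in the design above, returning from a case stops the timer
                       (/ (double (elapsed-ns t)) n))))))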

I think this design is much simpler, both in the macro logic and for the user, but I'm sure there'll be wrinkles here and there. It's quite a bit of work, in any case. I'm interested in your thoughts!

@saulshanabrook

@davidsantiago This seems a lot like how Go benchmarks work: the testing harness hands your benchmark a count b.N, you loop that many times, and the harness grows N until the timings stabilize.
