Possible to provide case-level setup/teardown fns? #8

Open · lynaghk opened this issue Nov 7, 2013 · 4 comments

Comments

@lynaghk commented Nov 7, 2013

I'm trying to performance-test FUSE bindings to Clojure, and have a few questions about perforate:

  1. I already have a with-mounted-fs macro, and would prefer to keep using it; is there any way I can do so without including the mounting time in the benchmark?
     For example, I can write a plain criterium test (with qb being criterium's quick-bench macro):

     (with-mounted-fs (context (atom {"foo" (mmapped-file big-data-file)})) mountpoint
       (qb "reading 100MB through FUSE via mmapped file"
         (Files/readAllBytes (.toPath (File. mountpoint "foo")))))
  2. Is it possible to have case-level setup/teardown fns?

Maybe both of these issues could be solved by introducing a symbol that specifies which subform you actually want benchmarked?
That way consumers could continue to use their own contextual macros (e.g. with-open).
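
Something like this, purely as a hypothetical sketch; the benched marker (and this defcase shape) are invented for illustration and aren't part of perforate today:

;; Hypothetical sketch: a case macro could walk its body, find the form tagged
;; with `benched`, and time only that subform, leaving surrounding contextual
;; macros (with-mounted-fs, with-open, ...) outside the measurement.
(defcase FUSE-bench :mmapped-read
  (with-mounted-fs (context (atom {"foo" (mmapped-file big-data-file)})) mountpoint
    (benched
      (Files/readAllBytes (.toPath (File. mountpoint "foo"))))))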

@davidsantiago (Owner)

I think you'll have problems with this sort of thing given the way perforate is written. I made it as a thin layer on Criterium that manages and displays the data that comes out for you. So it just takes a function and hands that function (closure, really) to Criterium, as Criterium wants. I've tried to make it handle as many situations as possible, but I've come to think that it isn't sufficient for everything, and trying to make everything fit into that framework has added a lot of complexity to the interface. I know exactly how to fix it for a version 2.0, but so far I haven't had the time to make that happen. Sorry!

@lynaghk (Author) commented Nov 7, 2013

If you're open to collaborating on a version 2.0, I'd love to hear your design and take a crack at it.

@davidsantiago (Owner)

Oh, absolutely. Here's a sketch:

The main idea is to give finer control over what gets benchmarked by providing a timer object (call it Timer) that you can stop and restart as your function runs. This should provide the full generality to time only the things you are interested in. By separating out the timing from the execution/testing harness and giving explicit control, you no longer have to twist code into pretzels to fit it into the framework of a single function whose entire runtime is what is measured. Do all the startup/teardown you want, whenever, as long as you're stopping and restarting the timer to clip it away.
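
A minimal sketch of what such a timer could look like; the protocol, names, and accumulate-across-start/stop semantics here are assumptions for illustration, not an existing API:

;; Sketch only: elapsed time accumulates across start/stop pairs, so any
;; setup/teardown done while the timer is stopped is never counted.
(defprotocol Timer
  (start [t] "Begin (or resume) accumulating elapsed time.")
  (stop [t] "Pause accumulation.")
  (elapsed-ns [t] "Total accumulated nanoseconds."))

(defn make-timer []
  (let [acc (atom 0)        ;; nanoseconds accumulated so far
        started (atom nil)] ;; start timestamp while running, else nil
    (reify Timer
      (start [_] (reset! started (System/nanoTime)))
      (stop [_]
        (when-let [t0 @started]
          (swap! acc + (- (System/nanoTime) t0))
          (reset! started nil)))
      (elapsed-ns [_] @acc))))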

Another idea that I think is important is building the notion of a repetition count into the benchmark itself. Most small functions are too fast to time reliably on a JVM, whose timer resolution is no better than a microsecond in any case I've ever seen, so they need to be run multiple times to get a reliable measurement. As perforate currently stands, I always set up a loop manually in the benchmark; that loop is what gets timed and reported, and I have to remember to reverse it back out into a "time per op." It would be better to have the framework tell you how many times it wants you to do something, have your benchmark do it that many times, and then have everything calculated "per op" for you.

This has a secondary benefit: it allows adaptive sampling, where the framework runs your function until it gets stable per-op timings and then stops. As things work now, if you've written your benchmark to repeat an operation, say, 100 times to overcome the low-resolution timer, Criterium doesn't know about that factor of 100; it doesn't realize each sample carries 100x the information it appears to, so it insists on long runs until it is satisfied with its sample sizes. Instead, perforate could start at, say, N=100, then try N=200, and if there is little timing variance between the two, stop and report the answer, rather than spending many seconds per test collecting several hundred samples of something that already repeats the operation 100 times. I think Criterium's very long runtimes are one of its biggest practical problems currently.
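
Roughly, that adaptive loop might look like this; run-case is a hypothetical harness function that runs a case n times and returns the total elapsed nanoseconds:

;; Sketch of the adaptive-sampling idea, not perforate code. Doubles n until
;; two successive per-op estimates agree within a relative tolerance.
(defn adaptive-ns-per-op
  [run-case tol]
  (loop [n 100
         prev nil]
    (let [per-op (/ (double (run-case n)) n)]
      (if (and prev (< (Math/abs (- per-op prev)) (* tol prev)))
        per-op
        (recur (* 2 n) per-op)))))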

So this could look something like this (warning, text-field coding):

(defgoal FUSE-bench "A simple benchmark for FUSE bindings to Clojure.")

(defcase FUSE-bench :with-startup
  "reading 100MB through FUSE via mmapped file, including one-time startup/teardown"
  [n] ;; <- Perforate requests you do your thing n times.
  ;; Lack of a timer argument means the entire runtime will be timed, including the startup and teardown!
  ;; Still, the ability to leave off the timer arg should be nice for simple cases that don't do startup/teardown.
  (with-mounted-fs (context (atom {"foo" (mmapped-file big-data-file)})) mountpoint
    (dotimes [_ n]
      (Files/readAllBytes (.toPath (File. mountpoint "foo"))))))

(defcase FUSE-bench :just-core-ops
  "reading 100MB through FUSE via mmapped file, without one-time startup/teardown"
  [n t] ;; <- t is the Timer object; it must be started to time anything.
  (with-mounted-fs (context (atom {"foo" (mmapped-file big-data-file)})) mountpoint
    (start t) ;; stop/reset are also available for more complex situations, but the timer
              ;; is stopped automatically when a case returns. Start/stop as much as you want.
    (dotimes [_ n]
      (Files/readAllBytes (.toPath (File. mountpoint "foo"))))))

As you can see, a great deal of granularity is now possible, and the cases read more like regular functions that just happen to contain timing directives to chisel out what you're interested in. No closure contortions or special setup/teardown functions needed. The one rub is that perforate now has to keep track of the timings itself and report them to Criterium via its benchmark-stats function, instead of having Criterium run the function and track those stats. I think this is a worthwhile tradeoff: it still retains Criterium for the statistical heavy lifting, and just pulls out the job of running the benchmarks and timing them.
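
To sketch that harness side (again all assumptions, building on the timer sketch above; case-fn stands for a case body compiled down to a function of [n timer]):

;; Sketch only: collect per-op samples for one case, to be handed off to
;; Criterium's statistics afterwards, as described above.
(defn sample-case
  [case-fn n samples]
  (vec (repeatedly samples
                   (fn []
                     (let [t (make-timer)]
                       (case-fn n t)
                       (stop t) ;; in the design above, returning from a case stops the timer
                       (/ (double (elapsed-ns t)) n))))))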

I think this design is much simpler, both in the macro logic and for the user, but I'm sure there'll be wrinkles here and there. It's quite a bit of work, in any case. I'm interested in your thoughts!

@saulshanabrook

@davidsantiago This seems a lot like how Go benchmarks work: the testing harness hands your benchmark a count b.N, you loop that many times, and the harness grows N until the timings stabilize.
