Commit 007b5f6

more notes on benchmarks
1 parent 66ccacd commit 007b5f6

2 files changed: +17 −17 lines


_posts/2014-07-15-parallel-distance-matrix.md

Lines changed: 9 additions & 9 deletions
@@ -60,14 +60,15 @@ js_distance <- function(mat) {
 
 Here is a re-implementation of `js_distance` using Rcpp. Note that this
 doesn't yet take advantage of parallel processing, but still yields an
-approximately 35x speedup over the original R version.
+approximately 50x speedup over the original R version on a 2.6GHz Haswell
+MacBook Pro.
 
 Abstractly, a Distance function will take two vectors in R<sup>J</sup> and
 return a value in R<sup>+</sup>. In this implementation, we don't support
 arbitrary distance metrics, i.e., the JSD code computes the values from
 within the parallel kernel.
 
-Our distance function `kl_divergence` is defined below and takes three
+Our distance function `kl_divergence` is defined below and takes three
 parameters: iterators to the beginning and end of vector 1 and an iterator to
 the beginning of vector 2 (the end position of vector2 is implied by the end
 position of vector1).
@@ -168,9 +169,9 @@ parallel code is almost identical to the serial code. The main difference is
 that the outer loop starts with the `begin` index passed to the worker
 function rather than 0.
 
-Parallelizing in this case has big payoff: we observe performance of about 6x
-the serial version on a machine with 4 cores (8 with hyperthreading). Here is
-the definition of the `JsDistance` function object:
+Parallelizing in this case has a big payoff: we observe performance of about
+5.5x the serial version on a 2.6GHz Haswell MacBook Pro with 4 cores (8 with
+hyperthreading). Here is the definition of the `JsDistance` function object:
 
 {% highlight cpp %}
 // [[Rcpp::depends(RcppParallel)]]
@@ -274,10 +275,9 @@ res[,1:4]
 1 js_distance(m) 3 35.560 323.273
 </pre>
 
-The serial Rcpp version yields a more than 50x speedup over straight R code.
-On a machine with 4 cores (8 with hyperthreading) the parallel Rcpp version
-provides another 5.5x speedup, amounting to a total gain of over 300x
-compared to the original R version.
+The serial Rcpp version yields a more than 50x speedup over straight R code.
+The parallel Rcpp version provides another 5.5x speedup, amounting to a total
+gain of over 300x compared to the original R version.
 
 Note that performance gains will typically be 30-50% less on Windows systems
 as a result of less sophisticated thread scheduling (RcppParallel does not
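The changed prose describes `kl_divergence` as taking three parameters: iterators to the beginning and end of vector 1 and an iterator to the beginning of vector 2. As a minimal standalone sketch of that interface (the template, the zero-probability guard, and the use of the natural log are assumptions here, not the post's exact code):

```cpp
#include <cmath>
#include <vector>

// Kullback-Leibler divergence over two probability vectors, expressed with
// iterators as described in the post: begin/end of vector 1 plus the begin
// of vector 2 (vector 2's end is implied by vector 1's length).
template <typename InputIterator1, typename InputIterator2>
double kl_divergence(InputIterator1 begin1, InputIterator1 end1,
                     InputIterator2 begin2) {
   double rval = 0.0;
   InputIterator2 it2 = begin2;
   for (InputIterator1 it1 = begin1; it1 != end1; ++it1, ++it2) {
      double d1 = *it1;
      double d2 = *it2;
      if (d1 > 0.0 && d2 > 0.0)   // skip zero cells to avoid log(0)
         rval += d1 * std::log(d1 / d2);
   }
   return rval;
}
```

Because the iterators are template parameters, the same helper works over `std::vector` iterators, raw pointers, or (in the post's setting) `RMatrix<double>::Row` iterators.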

src/2014-07-15-parallel-distance-matrix.cpp

Lines changed: 8 additions & 8 deletions
@@ -63,14 +63,15 @@ js_distance <- function(mat) {
  *
  * Here is a re-implementation of `js_distance` using Rcpp. Note that this
  * doesn't yet take advantage of parallel processing, but still yields an
- * approximately 35x speedup over the original R version.
+ * approximately 50x speedup over the original R version on a 2.6GHz Haswell
+ * MacBook Pro.
  *
  * Abstractly, a Distance function will take two vectors in R<sup>J</sup> and
  * return a value in R<sup>+</sup>. In this implementation, we don't support
  * arbitrary distance metrics, i.e., the JSD code computes the values from
  * within the parallel kernel.
  *
- * Our distance function `kl_divergence` is defined below and takes three
+ * Our distance function `kl_divergence` is defined below and takes three
  * parameters: iterators to the beginning and end of vector 1 and an iterator to
  * the beginning of vector 2 (the end position of vector2 is implied by the end
  * position of vector1).
@@ -173,9 +174,9 @@ NumericMatrix rcpp_js_distance(NumericMatrix mat) {
  * that the outer loop starts with the `begin` index passed to the worker
  * function rather than 0.
  *
- * Parallelizing in this case has big payoff: we observe performance of about 6x
- * the serial version on a machine with 4 cores (8 with hyperthreading). Here is
- * the definition of the `JsDistance` function object:
+ * Parallelizing in this case has a big payoff: we observe performance of about
+ * 5.5x the serial version on a 2.6GHz Haswell MacBook Pro with 4 cores (8 with
+ * hyperthreading). Here is the definition of the `JsDistance` function object:
  */
 
 // [[Rcpp::depends(RcppParallel)]]
@@ -277,9 +278,8 @@ res[,1:4]
 
 /**
  * The serial Rcpp version yields a more than 50x speedup over straight R code.
- * On a machine with 4 cores (8 with hyperthreading) the parallel Rcpp version
- * provides another 5.5x speedup, amounting to a total gain of over 300x
- * compared to the original R version.
+ * The parallel Rcpp version provides another 5.5x speedup, amounting to a total
+ * gain of over 300x compared to the original R version.
  *
  * Note that performance gains will typically be 30-50% less on Windows systems
  * as a result of less sophisticated thread scheduling (RcppParallel does not
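Both diffs reference the key parallelization idea: the worker's outer loop runs over `[begin, end)` rather than starting at 0, and a scheduler hands each worker a slice of rows. The post's actual `JsDistance` is an RcppParallel `Worker`, which needs an R session; the sketch below illustrates the same row-splitting pattern with plain `std::thread`. The names `js_worker` and `js_distance_parallel`, and the even chunking across hardware threads, are illustrative assumptions (RcppParallel's `parallelFor` schedules chunks dynamically).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Fill rows [begin, end) of the symmetric distance matrix. As in the post,
// the only change from a serial version is that the outer loop starts at
// `begin` instead of 0; writes to disjoint rows, so no locking is needed.
static void js_worker(const std::vector<std::vector<double>>& rows,
                      std::vector<std::vector<double>>& out,
                      std::size_t begin, std::size_t end) {
   for (std::size_t i = begin; i < end; ++i) {
      for (std::size_t j = 0; j < i; ++j) {
         // Jensen-Shannon divergence: average the KL divergences of each
         // distribution against their midpoint, then take the square root.
         double d = 0.0;
         for (std::size_t k = 0; k < rows[i].size(); ++k) {
            double a = rows[i][k], b = rows[j][k];
            double m = 0.5 * (a + b);
            if (a > 0.0) d += 0.5 * a * std::log(a / m);
            if (b > 0.0) d += 0.5 * b * std::log(b / m);
         }
         out[i][j] = out[j][i] = std::sqrt(d);
      }
   }
}

// Split the row range across hardware threads and join.
static void js_distance_parallel(const std::vector<std::vector<double>>& rows,
                                 std::vector<std::vector<double>>& out) {
   std::size_t n = rows.size();
   unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
   std::size_t chunk = (n + nthreads - 1) / nthreads;
   std::vector<std::thread> pool;
   for (unsigned t = 0; t < nthreads; ++t) {
      std::size_t b = t * chunk, e = std::min(n, b + chunk);
      if (b >= e) break;
      pool.emplace_back(js_worker, std::cref(rows), std::ref(out), b, e);
   }
   for (auto& th : pool) th.join();
}
```

Note the static even split assigns more work to threads owning later rows (row `i` costs `i` inner iterations), which is one reason the observed speedup (5.5x on 4 cores with hyperthreading) falls short of ideal scaling.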
