@@ -13,10 +13,10 @@ src: 2018-04-18-armadillo-sparse-matrix-performance.Rmd
- The Armadillo library provides a great way to manipulate sparse matrices in C++. However, the
- performance characteristics of dealing with sparse matrices may be surprising if one is only
- familiar with dense matrices. This is a collection of observations on getting best performance with
- sparse matrices in Armadillo.
+ Besides outstanding support for dense matrices, the Armadillo library also provides a great way to
+ manipulate sparse matrices in C++. However, the performance characteristics of dealing with sparse
+ matrices may be surprising if one is only familiar with dense matrices. This is a collection of
+ observations on getting best performance with sparse matrices in Armadillo.

All the timings in this article were generated using Armadillo version 8.500. This version adds a
number of substantial optimisations for sparse matrix operations, in some cases speeding things up
@@ -105,7 +105,7 @@ system.time(m0 <- a %*% b)

<pre class="output">
   user  system elapsed
-  0.221   0.089   0.310
+  0.230   0.091   0.322
</pre>

@@ -118,7 +118,7 @@ system.time(m1 <- mult_sp_sp_to_sp(a, b))

<pre class="output">
   user  system elapsed
-  0.368   0.063   0.431
+  0.407   0.036   0.442
</pre>

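This hunk only updates the recorded timing; the definition of `mult_sp_sp_to_sp` lies outside the changed lines. Going by its name, it is presumably a thin RcppArmadillo wrapper along the lines of this sketch (an assumption, not code taken from the post):

```cpp
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Sparse * sparse, returning a sparse result.
// [[Rcpp::export]]
arma::sp_mat mult_sp_sp_to_sp(const arma::sp_mat& a, const arma::sp_mat& b) {
    arma::sp_mat result(a * b);
    return result;
}
```

Compiled with `Rcpp::sourceCpp()`, such a function accepts `dgCMatrix` objects from R's Matrix package, which RcppArmadillo converts to `arma::sp_mat` on the way in and back again on the way out.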
@@ -131,7 +131,7 @@ system.time(m2 <- mult_sp_den_to_sp(a, b_den))

<pre class="output">
   user  system elapsed
- 15.511   0.089  15.600
+  1.081   0.100   1.181
</pre>

@@ -144,7 +144,7 @@ system.time(m3 <- mult_den_sp_to_sp(a_den, b))

<pre class="output">
   user  system elapsed
-  0.910   0.092   1.002
+  0.826   0.087   0.913
</pre>

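The two mixed-type products timed in this and the previous hunk, `mult_sp_den_to_sp` and `mult_den_sp_to_sp`, are also defined outside the diff. Judging by their names, they take one sparse and one dense operand and keep the result sparse, roughly as in the following sketch (again an assumption based on the names). The sharp drop in the sparse-times-dense timing, from about 15.6 to 1.2 seconds elapsed, is consistent with the sparse-matrix optimisations in Armadillo 8.500 mentioned in the introduction.

```cpp
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Sparse * dense, converting the (dense) product back to sparse.
// [[Rcpp::export]]
arma::sp_mat mult_sp_den_to_sp(const arma::sp_mat& a, const arma::mat& b) {
    arma::sp_mat result(a * b);
    return result;
}

// Dense * sparse, converting the (dense) product back to sparse.
// [[Rcpp::export]]
arma::sp_mat mult_den_sp_to_sp(const arma::mat& a, const arma::sp_mat& b) {
    arma::sp_mat result(a * b);
    return result;
}
```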
@@ -195,7 +195,7 @@ system.time(m4 <- mult_sp_den_to_sp2(a, b_den))

<pre class="output">
   user  system elapsed
-  0.442   0.040   0.483
+  0.401   0.047   0.448
</pre>

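`mult_sp_den_to_sp2` is likewise not shown in the diff. The `2` suffix suggests an alternative formulation of the sparse-times-dense product; one plausible reading (purely an assumption) is that the dense operand is converted to sparse up front, so the multiplication itself runs entirely in sparse arithmetic:

```cpp
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Sparse * dense, but convert the dense operand to sparse first so that the
// multiplication is carried out as sparse * sparse.
// [[Rcpp::export]]
arma::sp_mat mult_sp_den_to_sp2(const arma::sp_mat& a, const arma::mat& b) {
    arma::sp_mat b_sp(b);            // explicit dense -> sparse conversion
    arma::sp_mat result(a * b_sp);
    return result;
}
```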
@@ -256,7 +256,7 @@ system.time({

<pre class="output">
   user  system elapsed
-  1.873   0.004   1.878
+  1.708   0.000   1.707
</pre>

For a large matrix, this takes a not-insignificant amount of time, even on a fast machine. To speed
@@ -304,7 +304,7 @@ system.time({

<pre class="output">
   user  system elapsed
-  0.818   0.000   0.817
+  0.766   0.000   0.766
</pre>

The time taken has come down by quite a substantial margin. This reflects the ease of obtaining
@@ -353,7 +353,7 @@ system.time(print(sum_by_row(a)))

<pre class="output">
   user  system elapsed
-  0.926   0.000   0.926
+  0.933   0.000   0.935
</pre>

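`sum_by_row` is not part of the changed lines either. By its name it sums the matrix one row slice at a time, presumably something like the sketch below (the exact body is an assumption). Row slices cut across Armadillo's compressed sparse column layout, which sets up the comparison with column slicing that follows.

```cpp
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Sum all elements of a sparse matrix by extracting one row at a time.
// Each row slice has to gather elements scattered across the CSC storage.
// [[Rcpp::export]]
double sum_by_row(const arma::sp_mat& x) {
    double result = 0.0;
    for (arma::uword i = 0; i < x.n_rows; i++) {
        arma::sp_mat row(x.row(i));
        result += arma::accu(row);
    }
    return result;
}
```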
This is again a large improvement. But what if we do the same with column slicing?
@@ -391,7 +391,7 @@ system.time(print(sum_by_col(a_t)))

<pre class="output">
   user  system elapsed
-  0.006   0.000   0.006
+  0.005   0.000   0.006
</pre>

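The column-slicing counterpart, called here on the pre-transposed matrix `a_t` so that it visits the same elements, would presumably be the mirror image of the row version (again a sketch):

```cpp
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Sum all elements by extracting one column at a time. Column slices map
// directly onto the compressed-sparse-column storage, so extraction is cheap.
// [[Rcpp::export]]
double sum_by_col(const arma::sp_mat& x) {
    double result = 0.0;
    for (arma::uword i = 0; i < x.n_cols; i++) {
        arma::sp_mat col(x.col(i));
        result += arma::accu(col);
    }
    return result;
}
```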
Now the time is less than a tenth of a second, which is faster than the original code by roughly
@@ -445,7 +445,7 @@ system.time(print(sum_by_element(a)))

<pre class="output">
   user  system elapsed
-  0.388   0.000   0.388
+  0.176   0.000   0.176
</pre>

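`sum_by_element`, going by the name, touches every `(row, column)` position individually; on a sparse matrix each `x(i, j)` access has to search the compressed storage, including for positions that hold a zero. A sketch of what such a function could look like (an assumption):

```cpp
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Sum all elements via individual element access. Every x(i, j) lookup on a
// sparse matrix searches the compressed storage, even when the value is zero.
// [[Rcpp::export]]
double sum_by_element(const arma::sp_mat& x) {
    double result = 0.0;
    for (arma::uword j = 0; j < x.n_cols; j++) {
        for (arma::uword i = 0; i < x.n_rows; i++) {
            result += x(i, j);
        }
    }
    return result;
}
```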
However, we can still do better. In Armadillo, the iterators for sparse matrix classes iterate only
@@ -502,9 +502,9 @@ microbenchmark(col=sum_by_col(a_t),
<pre class="output">
Unit: milliseconds
  expr       min        lq      mean    median        uq       max neval
-  col   5.02286   5.17710   5.34210   5.33575   5.39444   6.02304    20
- elem 389.33550 393.98013 402.67589 403.13064 411.42497 420.33654    20
- iter   1.01472   1.07613   1.19713   1.18075   1.22931   1.74711    20
+  col   4.78921   4.88444   5.05229   4.99184   5.18450   5.50579    20
+ elem 172.84830 177.20431 179.87007 179.06447 182.08075 188.11256    20
+ iter   1.02268   1.05447   1.12611   1.12627   1.16482   1.30800    20
</pre>

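The `iter` entry in the benchmark corresponds to the iterator-based approach described just before the benchmark: sparse-matrix iterators step over the stored non-zero values only, skipping the zeros entirely. The function name below is a guess, but the shape of such a function would be:

```cpp
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Sum all elements using sp_mat's iterators, which visit only the stored
// non-zero values and never touch the (implicit) zeros.
// [[Rcpp::export]]
double sum_by_iterator(const arma::sp_mat& x) {
    double result = 0.0;
    for (arma::sp_mat::const_iterator it = x.begin(); it != x.end(); ++it) {
        result += *it;
    }
    return result;
}
```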
Thus, using iterators represents a greater than three-order-of-magnitude speedup over the original