title | author | license | tags | summary | layout | src |
---|---|---|---|---|---|---|
Coercion of matrix to sparse matrix (dgCMatrix) and maintaining dimnames. |
Søren Højsgaard |
GPL (>= 2) |
eigen matrix sparse |
We illustrate 1) a fast way of coercing a dense matrix to a sparse matrix and 2) how to copy the dimnames from the dense to the sparse matrix. |
post |
2013-01-20-sparse-matrix-coercion.Rmd |
Consider the following matrix
{% highlight r %} nr <- nc <- 6 set.seed <- 123 m <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc) sum(m)/length(m) {% endhighlight %}
[1] 0.1667
{% highlight r %} dimnames(m) <- list(letters[1:nr], letters[1:nc]) m {% endhighlight %}
a b c d e f a 0 0 0 0 0 1 b 0 0 0 1 0 1 c 0 0 0 0 0 0 d 0 0 0 0 0 0 e 1 1 0 0 0 0 f 0 0 0 1 0 0
This matrix can be coerced to a sparse matrix with
{% highlight r %} library("Matrix") {% endhighlight %}
Loading required package: methods
{% highlight r %} M1 <- as(m, "dgCMatrix") M1 {% endhighlight %}
6 x 6 sparse Matrix of class "dgCMatrix" a b c d e f a . . . . . 1 b . . . 1 . 1 c . . . . . . d . . . . . . e 1 1 . . . . f . . . 1 . .
{% highlight r %} str(M1) {% endhighlight %}
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots ..@ i : int [1:6] 4 4 1 5 0 1 ..@ p : int [1:7] 0 1 2 2 4 4 6 ..@ Dim : int [1:2] 6 6 ..@ Dimnames:List of 2 .. ..$ : chr [1:6] "a" "b" "c" "d" ... .. ..$ : chr [1:6] "a" "b" "c" "d" ... ..@ x : num [1:6] 1 1 1 1 1 1 ..@ factors : list()
Using Eigen via RcppEigen we can obtain the coercion as:
{% highlight cpp %} // [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h> #include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
SEXP asdgCMatrix_( SEXP XX_ ){
typedef Eigen::SparseMatrix SpMat;
typedef Eigen::MapEigen::MatrixXd MapMatd; // Input: must be double
MapMatd X(Rcpp::as(XX_));
SpMat Xsparse = X.sparseView(); // Output: sparse matrix
S4 Xout(wrap(Xsparse)); // Output: as S4 object
NumericMatrix Xin(XX_); // Copy dimnames
Xout.slot("Dimnames") = clone(List(Xin.attr("dimnames")));
return(Xout);
}
{% endhighlight %}
{% highlight r %} (M2 <- asdgCMatrix_(m * 1.0)) {% endhighlight %}
6 x 6 sparse Matrix of class "dgCMatrix" a b c d e f a . . . . . 1 b . . . 1 . 1 c . . . . . . d . . . . . . e 1 1 . . . . f . . . 1 . .
{% highlight r %} str(M2) {% endhighlight %}
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots ..@ i : int [1:6] 4 4 1 5 0 1 ..@ p : int [1:7] 0 1 2 2 4 4 6 ..@ Dim : int [1:2] 6 6 ..@ Dimnames:List of 2 .. ..$ : chr [1:6] "a" "b" "c" "d" ... .. ..$ : chr [1:6] "a" "b" "c" "d" ... ..@ x : num [1:6] 1 1 1 1 1 1 ..@ factors : list()
{% highlight r %} identical(M1, M2) {% endhighlight %}
[1] TRUE
Compare the performance:
{% highlight r %} cols <- c("test", "replications", "elapsed", "relative", "user.self", "sys.self") rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"), columns=cols, order="relative", replications=1000) {% endhighlight %}
test replications elapsed relative user.self sys.self 1 asdgCMatrix_(m * 1) 1000 0.028 1.00 0.028 0.000 2 as(m, "dgCMatrix") 1000 0.287 10.25 0.284 0.004
For larger matrices the difference in performance gain is smaller:
{% highlight r %}
nr <- nc <- 100 set.seed <- 123 m <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc) rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"), columns=cols, order="relative", replications=1000) {% endhighlight %}
test replications elapsed relative user.self sys.self 1 asdgCMatrix_(m * 1) 1000 0.133 1.000 0.132 0.000 2 as(m, "dgCMatrix") 1000 0.359 2.699 0.356 0.004
{% highlight r %}
nr <- nc <- 1000 set.seed <- 123 m <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc) rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"), columns=cols, order="relative", replications=100) {% endhighlight %}
test replications elapsed relative user.self sys.self 1 asdgCMatrix_(m * 1) 100 1.193 1.00 1.184 0.004 2 as(m, "dgCMatrix") 100 2.303 1.93 2.092 0.204
{% highlight r %}
nr <- nc <- 3000 set.seed <- 123 m <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc) rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"), columns=cols, order="relative", replications=100) {% endhighlight %}
test replications elapsed relative user.self sys.self 1 asdgCMatrix_(m * 1) 100 8.868 1.000 5.82 3.004 2 as(m, "dgCMatrix") 100 23.441 2.643 18.70 4.636
Thanks to Doug Bates for illustrating to me how set the dimnames attribute.