---
title: "Comments on issue #123"
author: "Douglas Bates"
date: "2018-10-02"
---
[The issue](https://github.com/dmbates/MixedModels.jl/issues/123) concerns missing linear algebra methods for some of the matrix types in the package.
Sadly, I did not address the issue earlier and it is now quite old. Things have changed with the release of
```{julia; term=true}
using BlockArrays, DataFrames, InteractiveUtils, LinearAlgebra, MixedModels, Random
versioninfo()
```
and the most recent `MixedModels` release (v1.1.0 is the latest as I write this).
A function to generate an example showing the problem is given in the issue but it uses simulated data without setting the random number seed.
I modified this function to take an `AbstractRNG` as the first argument, as is common now in simulation functions.
```{julia; term=true}
function simulatemodel(rng::AbstractRNG, n::Int, ng::Int)
    df = DataFrame(Y = randn(rng, n), X = rand(rng, n),
                   G = rand(rng, 1:ng, n), H = rand(rng, 1:ng, n))
    f = @formula(Y ~ 1 + X + (1 + X|H) + (1|G)) # simple model with random slopes and intercepts
    LinearMixedModel(f, df)
end
```
## Internal representation of a LinearMixedModel
A couple of things about the internal representation of the model in the `LinearMixedModel` type are worth noting.
The `trms` field is a vector of matrix-like objects constructed from the terms in the model formula.
The random-effects terms come first, ordered by the number of random effects they generate.
These are followed by the model matrix for the fixed effects and by the response, represented as a matrix with a single column.
```{julia;term=true}
m1 = simulatemodel(MersenneTwister(1234321), 10_000, 100);
show(typeof.(m1.trms))
show(size.(m1.trms))
```
### The `A` and `Λ` fields
The `A` field in the `LinearMixedModel` type is a blocked matrix corresponding to `M'M`, where `M` is the horizontal concatenation of these terms.
The size of `M` would be `(10000,303)` so `A` is of size `(303,303)` but divided into blocks corresponding to the terms.
```{julia;term=true}
nblocks(m1.A)
blocksizes(m1.A)
```
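As a quick check (just arithmetic on the sizes shown above, not part of the package API), the 303 columns of `M` can be accounted for: 2 random effects for each of the 100 levels of `H`, 1 for each of the 100 levels of `G`, 2 fixed-effects columns, and 1 response column.
```{julia; term=true}
2 * 100 + 1 * 100 + 2 + 1      # expected total number of columns in M
sum(last.(size.(m1.trms)))     # the column counts of the terms should sum to 303
```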
`A` is symmetric and sparse. Only the lower triangle is stored explicitly.
```{julia; term=true}
Symmetric(m1.A, :L)
```
There is another matrix, `Λ`, stored implicitly in the random-effects terms and depending on the parameter vector `θ`.
It is block-diagonal with the same block sizes as `A`.
All the off-diagonal blocks are zero.
The rightmost two diagonal blocks, corresponding to the fixed-effects and the response, are always the identity.
The diagonal blocks for the random-effects terms are either a multiple of the identity, for a scalar random-effects term like `(1|G)`, or repetitions of a small lower-triangular matrix, `λ`, for vector-valued random-effects terms like `(1+X | H)`.
The size of `λ` for a vector-valued random effects term is the dimension of the random effects for each level of the grouping factor.
In this case `(1+X | H)` generates a two-dimensional random effect for each of the 100 levels of `H`.
```{julia; term=true}
show(m1.θ)
m1.λ[1]
m1.λ[2]
```
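To make the structure concrete, here is a minimal sketch, not package code, of how such a `Λ` could be assembled explicitly as a dense matrix. The values of `λH` and `θG` below are illustrative placeholders, not values taken from `m1`, and the blocks follow the term ordering shown above: `(1 + X | H)`, `(1 | G)`, the fixed effects, and the response.
```{julia; eval=false}
# Illustrative sketch of the block-diagonal structure of Λ (values are made up)
λH = LowerTriangular([1.0 0.0; 0.5 1.0])    # 2×2 template for (1 + X | H)
θG = 0.75                                    # scalar multiplier for (1 | G)
Λ = cat(kron(Matrix(1.0I, 100, 100), λH),    # 100 diagonal repetitions of λH
        θG * Matrix(1.0I, 100, 100),         # multiple of the identity for (1 | G)
        Matrix(1.0I, 2, 2),                  # fixed-effects block is the identity
        Matrix(1.0I, 1, 1);                  # response block is the identity
        dims = (1, 2))
size(Λ)                                      # (303, 303)
```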
Optimization of the log-likelihood is with respect to the `θ` parameters, as can be seen in the verbose output.
```{julia; term=true}
fit!(m1, true)
show(m1.θ)
m1.λ[1]
m1.λ[2]
```
### The `L` field
The `L` field in a `LinearMixedModel` is the lower-triangular Cholesky factor of
\begin{equation}
\begin{bmatrix}
\Lambda^\prime\mathbf{Z}^\prime\mathbf{Z}\Lambda + \mathbf{I} & \Lambda^\prime\mathbf{Z}^\prime\mathbf{X} & \Lambda^\prime\mathbf{Z}^\prime\mathbf{y} \\
\mathbf{X}^\prime\mathbf{Z}\Lambda & \mathbf{X}^\prime\mathbf{X} & \mathbf{X}^\prime\mathbf{y} \\
\mathbf{y}^\prime\mathbf{Z}\Lambda & \mathbf{y}^\prime\mathbf{X} & \mathbf{y}^\prime\mathbf{y}
\end{bmatrix}
\end{equation}
in the same block pattern as `A`.
```{julia; term=true}
m1.L
```
An inelegant but fast summary of the block sizes and types of the `A` and `L` fields is available as
```{julia; term=true}
describeblocks(m1)
```
For each of the blocks in the lower triangle, the type and size of the block in `A` is listed, followed by the type of the corresponding block in `L`.
The `(1,1)` block is always the biggest block and hence the most important in preserving sparsity.
It will be `Diagonal` if the term with the most random effects is a scalar random-effects term.
For a vector-valued random-effects term, as in this example, it is `UniformBlockDiagonal` which means that it is block diagonal with $k$ diagonal blocks of size $\ell\times\ell$ where $k$ is the number of levels of the grouping factor and $\ell$ is the dimension of the random effects associated with each level.
In this example $k=100$ and $\ell=2$.
Although the `(1,1)` block of `L` always has the same structure as that of `A`, the `(2,2)` block can be subject to "fill-in".
Here the `(2,2)` block of `A` is `Diagonal` but the `(2,2)` block of `L` has non-zero off-diagonal elements and is stored as a full, dense matrix.
Furthermore, the `(2,1)` block is created as a sparse matrix but may not have a high proportion of zeros, in which case it is converted to a dense matrix as has been done here.
The "break-even" point where the complexity of sparse matrix operations is offset by storing and computing with only the non-zeros, as opposed to all the elements of a dense matrix even when most of them as zero, is surprisingly high, especially when using multi-threaded accelerated BLAS (Basic Linear Algebra Subroutines) such as [MKL](https://software.intel.com/en-us/mkl).
### Updating `L`
Once the model representation has been created, evaluation of the profiled log-likelihood requires only installation of a new value of `θ` and updating of `L`.
The `updateL!` function is now defined (in `src/pls.jl`) as
```{julia;eval=false}
function updateL!(m::LinearMixedModel{T}) where T
    trms = m.trms
    A = m.A
    Ldat = m.L.data
    nblk = nblocks(A, 2)
    for j in 1:nblk
        # copy A[j,j] into L[j,j], scaling by λ and inflating the diagonal for random-effects terms
        Ljj = scaleInflate!(Ldat[Block(j, j)], A[Block(j, j)], trms[j])
        LjjH = isa(Ljj, Diagonal) ? Ljj : Hermitian(Ljj, :L)
        # downdate: subtract L[j,jj]*L[j,jj]' for the blocks to the left
        for jj in 1:(j - 1)
            rankUpdate!(-one(T), Ldat[Block(j, jj)], LjjH)
        end
        cholUnblocked!(Ljj, Val{:L})   # overwrite with the lower Cholesky factor
        for i in (j + 1):nblk
            # copy A[i,j] into L[i,j] and scale by Λ on both sides
            Lij = Λc_mul_B!(trms[i],
                A_mul_Λ!(copyto!(Ldat[Block(i, j)], A[Block(i, j)]), trms[j]))
            # subtract L[i,jj]*L[j,jj]' for the blocks to the left
            for jj in 1:(j - 1)
                αβA_mul_Bc!(-one(T), Ldat[Block(i, jj)], Ldat[Block(j, jj)], one(T), Lij)
            end
            # right-divide by L[j,j]' to complete the (i,j) block
            rdiv!(Lij, isa(Ljj, Diagonal) ? Ljj : LowerTriangular(Ljj)')
        end
    end
    m
end
```
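As a hedged usage sketch, not a prescription of the package internals, evaluating the objective at a value of `θ` might look like the following; here I assume `setθ!` and `objective` behave as the parameter setter and objective evaluator in this release of `MixedModels`, and the value installed is just the model's current `θ`.
```{julia; eval=false}
setθ!(m1, m1.θ)    # install a (here unchanged) value of the θ parameters
updateL!(m1)       # update the blocked Cholesky factor for that θ
objective(m1)      # value of the objective being minimized at this θ
```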
`updateL!` relies on having appropriate methods for `scaleInflate!`, `rankUpdate!`, `cholUnblocked!`, `Λc_mul_B!`, `A_mul_Λ!`, `αβA_mul_Bc!` and `rdiv!`.
All of these methods act in place, as indicated by the `!` at the end of the name.
Making sure that all of the necessary methods are available for every combination of block types is a bit tricky.
I keep accumulating examples, but I don't have an exhaustive collection by any means, so there may be situations that I have missed.
I appreciate it when users call attention to an example (reproducible, if possible) that fails, because then I know what I need to add.