Skip to content

Commit 2a13fb9

Browse files
committed
Added method description from Soleymani et al. (2017) to the rmarkdown
1 parent d17a678 commit 2a13fb9

6 files changed

+11051
-372
lines changed

DWT-based segmentation.Rmd

+392
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,392 @@
1+
---
2+
title: 'DWT-based segmentation on trajectories of turkey vultures'
3+
author: "Merlin Unterfinger"
4+
date: "`r Sys.Date()`"
5+
output:
6+
html_document:
7+
code_folding: hide
8+
fig_caption: yes
9+
highlight: tango
10+
number_sections: no
11+
theme: journal
12+
toc: yes
13+
toc_depth: 2
14+
toc_float: yes
15+
biblio-style: "apalike"
16+
link-citations: true
17+
filter: pandoc-eqnos
18+
bibliography: References.bib
19+
abstract: "This script implements the procedure from Soleymani et al. (2017) for applying DWT-based segmentation on trajectories of turkey vultures in R. While the original code is written in MatLab, this is a direct portation of the MatLab code to R. Some features of MatLab are not available in R, therefore workarounds and implementations have been done, which might lead to slightly different results."
20+
---
21+
22+
<style>
23+
body {
24+
text-align: justify}
25+
</style>
26+
27+
28+
```{r setup, include=FALSE}
29+
knitr::opts_chunk$set(echo = TRUE, fig.show = 'hold')
30+
rm(list = ls())
31+
cat("\014")
32+
```
33+
34+
# Background
35+
Method description and flowchart by @Soleymani:
36+
37+
"In Step 1, the movement parameter profile (i.e. time series of the chosen movement parameter; speed in our example) is calculated from the raw movement data, acting as the input for the DWT transform. In Step 2, the movement signal is decomposed by the DWT. This transformation yields two outputs: low frequency approximation sub- bands (3a) and high frequency detail sub-bands (3b). These two resulting sets of sub-bands will be used for different purposes in the next stage and therefore the procedure bifurcates into two different branches, along which different analyses will be performed. Peak analysis will be performed on approximation sub-bands (Step 4a) in order to obtain behavioural segments (Step 5a), while thresholding of detail sub-bands (Step 4b) will be used to detect behavioural change points (Step 5b)."
38+
39+
```{r, out.width = '50%', fig.align='center', echo = FALSE}
40+
knitr::include_graphics("figs/flowchart.png")
41+
```
42+
43+
# Data set
44+
"We use four GPS tracks of turkey vultures (C. aura) from the interior North America population to illustrate the segmentation method on real data. As shown in Fig. 3, the migration path extends from Canada to South America across central regions of North America. These birds show several states during their annual migrations: (i) breeding areas in North America, (ii) outbound migration in the Fall from breeding areas to wintering grounds, (iii) tropical wintering grounds in South America, and (iv) return migration to breeding grounds in the Spring. The data were manually classified into the above-mentioned behavioural states (four segments) by domain experts as discussed in @Dodge" [@Soleymani].
45+
[Download link data](http://dx.doi.org/10.5441/001/1.46ft1k05)
46+
47+
```{r, warning = FALSE, message = FALSE}
48+
# Reading the input file (speed and annotations)
49+
rawData <- readr::read_csv("data/Turkey-Vulture-Steamhouse 2.csv")
50+
nanIdx <- which(is.na(rawData$computedSpeed))
51+
speed <- rawData$computedSpeed
52+
states <- rawData$http...vocabulary.movebank.org.extended.raptor.workshop.attributes.migration.state
53+
speed <- speed[-nanIdx]
54+
states <- states[-nanIdx]
55+
statesCh <- numeric(length(speed)) + 1
56+
57+
# Check for empty strings
58+
for (i in 2:length(states))
59+
{
60+
if (states[i] == '') {
61+
states[i]=states[i-1]
62+
}
63+
statesCh[i] = states[i] == states[i-1]
64+
}
65+
statesChange=which(statesCh==0)
66+
statesLen = c(statesChange[1]-1, diff(statesChange),
67+
length(states)-tail(statesChange,1))
68+
tt = seq(1, length(speed))
69+
70+
# Plotting step length profile versus annotations
71+
par(mfrow = c(2, 1), mar=c(0.1,4,0.8,2)+0.1,
72+
oma=c(5,0,3,0)+0.1, xaxt='n')
73+
plot(speed,
74+
type = "l",
75+
xlim=c(1, length(speed)+2000),
76+
col = "blue", ylab = 'Speed(km/h)',
77+
main = "Profile of speed over time")
78+
plot(speed,
79+
type = "p", pch=16,
80+
xlim=c(1, length(speed)+2000),
81+
col = as.factor(states),
82+
ylab = 'Speed(km/h)',
83+
main = "Annotation data")
84+
legend("topright", legend=levels(as.factor(states)),
85+
pch=16, pt.cex=1, cex = 0.5, col=unique(as.factor(states)))
86+
```
87+
88+
# Discrete wavlet transform
89+
"From the raw trajectory a movement parameter profile (i.e. speed in this case) is computed, serving as the input to the DWT analysis ... . The DWT is flexible regarding the input movement signal and different movement metrics can be used for the analysis (e.g. acceleration, turning angles). Based on biological domain knowledge, movement speed seems a reasonable choice to distinguish between migratory versus non-migratory phases and therefore was selected here as the movement signal. In the next step, the movement signal is decomposed through the discrete wavelet transform. To do so, a mother wavelet function needs to be chosen. The family of Daubechies are a well-known class of wavelet function and the order 4 (db4) was chosen according to similar application of DWT in the literature. Other mother wavelet functions are available (e.g. Haar, Morlet, Mexican hat) and their application depends on the shape of the response one is interested in [@boggess]. For instance, certain wavelet functions are able to resolve small, abrupt discontinuities, whereas others may better capture linear changes in the movement. In general, we recommended to use db4 for the application of our method. The outputs of the DWT decomposition are two sets of approximation and detail sub-bands ... ." [@Soleymani].
90+
91+
"For both the approximation and the detail sub-bands, the selection of
92+
appropriate decomposition levels is done by visual inspection of the sub-bands ... ." [@Soleymani].
93+
```{r}
94+
# Specify levels of DWT
95+
L <- 10
96+
97+
# Discrete wavelet transform
98+
sig <- speed
99+
DWT <- wavelets::modwt(sig, filter="d4", n.levels = L)
100+
a <- DWT@V # approximation sub-bands
101+
d <- DWT@W # detail sub-bands
102+
103+
# Plot approximation and detail sub-bands
104+
par(mfrow=c(L+1,1), mar=c(0,4,0,2)+0.1,
105+
oma=c(5,0,3,0)+0.1, xaxt='n')
106+
plot(sig, type = "l", col = "red",ylab = "s")
107+
for (i in 1:L) {
108+
plot(a[[i]], type = "l",
109+
col = "blue",
110+
ylab = paste("a", i, sep=""))
111+
}
112+
title("Signal and approximation sub-bands", outer=TRUE)
113+
plot(sig, type = "l", col = "red",ylab = "s")
114+
for (i in 1:L) {
115+
plot(d[[i]], type = "l",
116+
col = rgb(0.2, 0.6, 0.2),
117+
ylab = paste("d", i, sep=""))
118+
}
119+
title("Signal and detail sub-bands", outer=TRUE)
120+
```
121+
122+
# Behavioural segmentation using the approximation sub-bands
123+
## Peak analysis
124+
Visual inspection of the different approximation sub-bands shows that Level 8 can be an appropriate level in this case because peaks are adequately distinguished from the rest of the signal indicating periodic patterns. (...) at the initial levels of decomposition, the localization effects are not strong enough yet to suggest differences between migration phases, which also make it impossible to apply any peak analysis. However, the localization effect is becoming stronger with increasing levels, which can be clearly seen in the resulting peaks versus flat regions at the Levels 7, 8 and 9. Therefore, the goal is to reach a level where segments related to the same behaviour are expressed in peaks with similar height and widths. Accordingly, level 8 was finally chosen because distinct phases became apparent as the level where peak analysis can potentially distinguish between different segments. (...) The user is recommended to carefully inspect the variation of the resulting peaks and choose the level, as well as peak height, where he/she thinks that it would work best to distinguish among different phases in the data" [@Soleymani].
125+
126+
For this step the function `findpeaks()` has to be loaded. The function is a modification a function from the `pracma` package to get the same outputs as the original `findpeaks()` function from MatLab returns.
127+
128+
```{r}
129+
# Load function 'findpeaks()'
130+
source("R/findpeaks.R")
131+
132+
# Finding the extremum values: in this case it is maxima
133+
minHeight <- 3.5
134+
minDist <- 1000
135+
aBandNumber <- 8
136+
par(mfrow=c(1,1), mar=c(0,4,0,2)+0.1, oma=c(5,0,3,0)+0.1, xaxt='s')
137+
pks <- findpeaks(c(a[[aBandNumber]]), minpeakheight = minHeight, minpeakdistance = minDist, plot = TRUE)
138+
Pks <- pks[,1]
139+
PksLoc <- pks[,2]
140+
PksWid <- pks[,3]
141+
PksPro <- pks[,4]
142+
```
143+
144+
## Segmentation based on the extents of peaks
145+
" ... the peaks related to migratory seasons are well distinguished from the rest of the signal and the combination of the height and the width of the peaks will help to identify the segments." [@Soleymani].
146+
147+
```{r}
148+
PksPoints <- matrix(NA, nrow(pks), 2)
149+
for (i in 1:nrow(pks)) {
150+
widR = round(PksWid[i])
151+
slot = (PksLoc[i]-widR):(PksLoc[i]+widR)
152+
if (tail(slot,1) > length(a[[aBandNumber]])){
153+
fend = which(slot == length(a[[aBandNumber]]))
154+
slot = slot[1:fend]
155+
}
156+
val = Pks[i] - PksPro[i]/2 # the value of half of the height of the peak
157+
z = a[[aBandNumber]][slot]
158+
dist = sort(abs(a[[aBandNumber]][slot] - val))
159+
l1 = which(abs(z-val) == dist[1])
160+
l2 = which(abs(z-val) == dist[2])
161+
for (j in 2:length(dist)) {
162+
l22 = which(abs(z-val) == dist[j])
163+
if (sign(slot[l1] - PksLoc[i]) != sign(slot[l22] - PksLoc[i])) {
164+
l2 = l22
165+
break
166+
}
167+
}
168+
PksPoints[i,] = sort(c(slot[l1], slot[l2]))
169+
}
170+
seg = PksPoints[1,]
171+
for (i in 2:nrow(PksPoints)) {
172+
seg = c(seg, PksPoints[i,])
173+
}
174+
fseg = c(1, seg, length(a[[aBandNumber]]))
175+
segments <- numeric(length(a[[aBandNumber]]))
176+
for (i in 1:length(a[[aBandNumber]])) {
177+
for (j in 2:length(fseg)) {
178+
segments[fseg[j-1]:fseg[j]] = j-1;
179+
}
180+
}
181+
```
182+
183+
## Results
184+
```{r}
185+
par(mfrow = c(4, 1),mar = c(1, 4, 1, 2) + 0.1,
186+
oma = c(5, 0, 3, 0) + 0.1, xaxt = 'n')
187+
188+
# Ploting the original signal
189+
plot(sig,
190+
type = "l", col = "blue",
191+
xlim = c(1, length(speed) + 2000),
192+
ylab = 'Speed(km/h)',
193+
main = "a) Speed profile of Steamhouse 2")
194+
195+
# Approximation sub-band and the detected peaks
196+
plot(a[[aBandNumber]],
197+
type = "l", col = "blue",
198+
xlim = c(1, length(speed) + 2000),
199+
ylab = 'Approximation sub-band',
200+
main = paste("b) Detected peaks at approximation sub-band of level",
201+
aBandNumber))
202+
points(pks[, 2], pks[, 1] + 0.25, pch = 25, bg = "blue")
203+
204+
205+
# Results of segmentation based on the extents (widths) of peaks
206+
palette(c("red", "green3", "blue", "cyan"))
207+
plot(speed,
208+
type = "p", pch = 16,
209+
xlim = c(1, length(speed) + 2000),
210+
col = as.factor(segments),
211+
ylab = 'Speed(km/h)',
212+
main = "c) Segmentation results based on DWT")
213+
legend("topright",
214+
legend = levels(as.factor(segments)),
215+
pch = 16, pt.cex = 1, cex = 0.5,
216+
col = unique(as.factor(segments)))
217+
218+
# Annotations
219+
plot(speed,
220+
type = "p", pch = 16,
221+
xlim = c(1, length(speed) + 2000),
222+
col = as.factor(states),
223+
xlab = 'Years', ylab = 'Speed(km/h)',
224+
main = "d) Annotated data")
225+
legend("topright", legend = levels(as.factor(states)),
226+
pch = 16, pt.cex = 1, cex = 0.5,
227+
col = unique(as.factor(states)))
228+
```
229+
230+
# Change point analysis using the detail sub-bands
231+
## Choose detail sub-band
232+
"Since the detail sub-bands contain the high-frequency information, the analysis based on these sub-bands focuses on the detection of fine-scale changes in movement. Same as with the approximation, first the relevant level for the detail sub-bands needs to be selected. This is also done by visual inspection of the detail sub-bands and further informed by previous knowledge regarding at which level biologically-driven changes may occur. (...) at Level 5 one can start to observe the different regimes of values for migratory and non-migratory phases, as well as some finer-scale changes within these phases. Therefore level 5 is selected for further analysis" [@Soleymani].
233+
```{r}
234+
dBandNumber <- 5
235+
det <- d[[dBandNumber]]
236+
par(mfrow=c(1,1), mar=c(1,4,1,2)+0.1,
237+
oma=c(5,0,3,0)+0.1, xaxt='s')
238+
plot(det,
239+
type = "l",
240+
col = rgb(0.2, 0.6, 0.2),
241+
ylab = 'Approximation sub-band',
242+
main = paste("Detail sub-band", dBandNumber))
243+
```
244+
245+
## Initial segments based on thresholding
246+
"The next step is to identify change points by setting a threshold for the difference between the wavelet coefficients ... . The value of this threshold also depends on the data used. For example in this case, the detail coefficients in the non-migratory season are around zero at the selected Level 5, whereas the values in the migratory seasons are much higher. Therefore a value of 1.0 was selected to identify the change points, that is, every point whose coefficient value is 1.0 units higher than the value of its preceding point" [@Soleymani].
247+
248+
```{r}
249+
# Labeling based on thresholding neighboring fixes passing the threshold
250+
segL <- c(det[1,1], 1)
251+
id <- 1
252+
f <- as.numeric(diff(abs(det[,1])) > 1)
253+
f <- c(0, f)
254+
for (i in (2:length(det))) {
255+
if (f[i] == 0) {
256+
segLb <- cbind(det[i,1], id)
257+
} else if (f[i] == 1 && f[i] != f[i-1]) {
258+
segLb <- cbind(det[i,1], id+1)
259+
id <- id+1
260+
} else {
261+
segLb <- cbind(det[i,1], id)
262+
}
263+
segL <- rbind(segL, segLb)
264+
}
265+
```
266+
267+
## Segmentation based on stitching the detail sub-segments
268+
"An optional concatenation step to obtain fine scale behavioural segments can be applied. This makes it possible to better match the results to the expected duration of the patterns that are sought (i.e. migratory patterns). In our case, all the segments with a length shorter than 500 fixes (i.e. corresponding to ~20 days) were concatenated to their preceding segment" [@Soleymani].
269+
270+
```{r}
271+
# Detecting frequency breaks
272+
frqdif <- which(diff(abs(det))>1)
273+
frqdif <- c(frqdif, length(det))
274+
275+
# Detecting the initial segments based on thresholding
276+
seg <- list()
277+
seg[[1]] <- det[1:frqdif[1]]
278+
for (j in 2:length(frqdif)) {
279+
if (frqdif[j]-frqdif[j-1]==1) {
280+
seg[[j]] <- det[frqdif[j]]
281+
} else {
282+
seg[[j]] <- det[(frqdif[j-1]+1):frqdif[j]]
283+
}
284+
}
285+
286+
# Removing short segments (i.e. <500 fixes)
287+
seglen <- sapply(seg, length)
288+
segLongInd <- which(seglen>500)
289+
290+
# Concatenating the segments
291+
segs1 <- list()
292+
segs2 <- list()
293+
for (p in 1:(length(segLongInd)-1)) {
294+
sumLen1 <- seglen[segLongInd[p]]
295+
ids1 <- p*(numeric(sumLen1)+1)
296+
segs1[[p]] <- cbind(seg[[segLongInd[p]]], ids1)
297+
sumLen2 <- seglen[segLongInd[p]+1]
298+
segsBTW <- seg[[(segLongInd[p]+1)]]
299+
from <- (segLongInd[p]+2)
300+
to <- (segLongInd[p+1]-1)
301+
if (from <= to) {
302+
for (q in from:to) {
303+
sumLen2 <- sumLen2+seglen[q]
304+
segsBTW <- c(segsBTW, seg[[q]])
305+
}
306+
}
307+
ids2 <- (p+50)*(numeric(sumLen2)+1)
308+
segs2[[p]] <- cbind(segsBTW, ids2)
309+
if (length(segs2[[p]])<100) {
310+
ids2=p*((numeric(sumLen2))+1)
311+
segs1[[p]] <- rbind(segs1[[p]], cbind(segsBTW, ids2))
312+
segs2[[p]] <- numeric(0)
313+
}
314+
}
315+
segsEnd <- seg[[segLongInd[p+1]]]
316+
sumLenEnd <- seglen[segLongInd[p+1]]
317+
from <- (segLongInd[p+1]+1)
318+
to <- (length(seg))
319+
if (from <= to) {
320+
for (k in from:to) {
321+
sumLenEnd <- sumLenEnd+seglen[k]
322+
segsEnd <- c(segsEnd, seg[[k]])
323+
}
324+
}
325+
idsEnd <- (p+1)*(numeric(sumLenEnd)+1)
326+
segs1[[p+1]] <- cbind(segsEnd, idsEnd)
327+
segs2[[p+1]] <- numeric(0)
328+
segments <- rbind(segs1[[1]], segs2[[1]])
329+
for (m in 2:length(segs1)) {
330+
segNext <- rbind(segs1[[m]], segs2[[m]])
331+
segments <- rbind(segments, segNext)
332+
}
333+
segments <- cbind(segments, numeric(nrow(segments)))
334+
segments[1,3] <- 1
335+
for (i in 2:nrow(segments)) {
336+
segID <- segments[i-1,3]
337+
if (segments[i,2] == segments[i-1,2]) {
338+
segments[i,3] <- segID
339+
} else {
340+
segments[i,3] <- segments[i-1,3] + 1
341+
segID <- segID + 1
342+
}
343+
}
344+
```
345+
346+
## Results
347+
Same as for the approximation sub-bands, the user is recommended to carefully inspect the variation of the resulting detail sub-bands and choose the level where he/she hypothesizes that different phases in the data are consistent with biological knowledge of the system. Setting the threshold for identifying change points is also dependent on the extent to which finer-scale changes are to be recovered. The concatenation step is totally optional, with the length of segments for concatenation be specified by considering a minimum duration for the patterns of interest. That is in our case, a migration period cannot be shorter than 20 days. Therefore 500 fixes (indicating 20 days) were considered as the minimum length where all the migratory patterns as well as some additional segments were identified" [@Soleymani].
348+
349+
```{r}
350+
# Figure 6
351+
par(mfrow=c(4,1), mar=c(1,4,1,2)+0.1,
352+
oma=c(5,0,3,0)+0.1, xaxt='n')
353+
354+
# Overlaying detail sub-band over the states
355+
plot(tt, det,
356+
type = "p", pch=16,
357+
xlim=c(1, length(speed)+2000),
358+
col = as.factor(states),
359+
ylab = 'Detail sub-band',
360+
main = "a) Detail sub-bands over the annotated states")
361+
legend("topright", legend=levels(as.factor(states)),
362+
pch=16, pt.cex=1, cex = 0.5, col=unique(as.factor(states)))
363+
lines(tt, det)
364+
365+
# Plotting initial segments based on thresholding
366+
plot(tt, speed,
367+
type = "p", pch=16,
368+
xlim=c(1, length(speed)+2000),
369+
col = as.factor(segL[,2]),
370+
ylab = 'Speed(km/h)', xlab = 'Years',
371+
main = "b) Initial segments based on thresholding")
372+
373+
# Results of segmentation based stitching the detail sub-segments
374+
plot(tt, speed,
375+
type = "p", pch=16,
376+
xlim=c(1, length(speed)+2000),
377+
col = as.factor(segments[,3]),
378+
ylab = 'Speed(km/h)',
379+
main = "c) Segmentation based on stitching the detail sub-segments")
380+
381+
# Annotations
382+
plot(tt, speed,
383+
type = "p", pch=16,
384+
xlim=c(1, length(speed)+2000),
385+
col = as.factor(states),
386+
xlab = 'Years', ylab = 'Speed(km/h)',
387+
main = "d) Annotated data")
388+
legend("topright", legend=levels(as.factor(states)),
389+
pch=16, pt.cex=1, cex = 0.5, col=unique(as.factor(states)))
390+
```
391+
392+
# References

0 commit comments

Comments
 (0)