```{r, include=FALSE}
source("custom_functions.R")
library(flextable)
library(officer)
```
---
title: "Exercise sheet 8: Data Driven Life Sciences"
---
---------------------------------
# Exercise 1
### 1a)
::: {.question data-latex=""}
Arrange the following terms into their correct order in the Illumina sequencing method and describe each of them briefly:
- bridge amplification
- deblocking
- library preparation
- annealing of template strands to flow cell
- fluorescence detection
:::
#### {.tabset}
##### Hide
##### Solution
::: {.answer data-latex=""}
**1. Library preparation:**
A sequencing *library* gets *prepared* from a sample by fragmenting the original DNA and adding Illumina-specific adapter sequences to both ends of the fragments. The *library* is what gets read during sequencing.
**2. Template strand annealing**
The single-stranded library fragments are used as *template strands* in the sequencing and are *annealed* to primer sequences, which are bound to the *flow cell* and are complementary to the adapter sequences of the fragments.
**3. Bridge amplification**
After complementary strands have been synthesized and the templates have been washed off, the now flow cell-bound fragments are *amplified* in several cycles of so-called *bridge amplification* to form fragment colonies, or *clusters*, on the flow cell. The clusters ensure a detectable fluorescence signal during sequencing.
**4. Fluorescence detection**
Illumina sequencing is a form of *sequencing-by-synthesis* in which the nucleotides incorporated into the growing strand are detected via attached *fluorophores*. After the first $3$ steps, the following steps are iterated to sequence the entire read:
In each cycle, modified nucleotides carrying a fluorescent group and a blocking group at their 3'-OH are used to extend the strand by a single base, and the fluorescence emitted by each cluster is recorded to identify that base.
**5. Deblocking**
*Deblocking* is the removal of the blocking group, together with the fluorophore, from the nucleotide that was just incorporated. It is necessary before a new round of elongation by one nucleotide can begin.
More information about this topic can be found on the [Illumina Webpage](https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology.html).
:::
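The cycle described in steps 4 and 5 can be pictured with a toy loop. This is only an illustrative sketch, not a model of the real chemistry: the template sequence and all variable names are made up for this example, and each loop iteration stands for one round of incorporation, fluorescence detection and deblocking.

```{r}
# Toy illustration of sequencing-by-synthesis (hypothetical example data).
template <- c("A", "C", "G", "T", "T", "A")          # made-up template strand
complement <- c(A = "T", C = "G", G = "C", T = "A")  # base-pairing rules

read <- character(0)
for (cycle in seq_along(template)) {
  # Incorporation: exactly one blocked, labelled nucleotide is added per cycle.
  incorporated <- complement[[template[cycle]]]
  # Fluorescence detection: the signal identifies the incorporated base.
  read <- c(read, incorporated)
  # Deblocking: the block is removed so the next cycle can extend the strand.
}

read_sequence <- paste(read, collapse = "")
read_sequence
```

Because only one nucleotide can be added per cycle, the number of cycles equals the length of the resulting read.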
#### {-}
# Exercise 2
```{r, echo=FALSE, out.width="75%", fig.align='center'}
knitr::include_graphics("figures/sheet-9/crossword.png")
```
### 2a)
::: {.question data-latex=""}
**Solve the crossword puzzle!**
Horizontal:
- 3. Added to DNA fragments during library preparation.
- 8. Illumina way of determining the order of nucleotides in a DNA strand. (3 words)
- 9. ChIP-Seq can be used for sequencing DNA regions that are bound by these.
- 11. The alphabet of life.
- 12. Formed by bridge-amplification on Illumina flow-cells.
- 13. The flow cell surface is coated with these 2 different DNA molecules.
- 15. Measure to assess the quality of the identification of nucleobases generated by automated DNA sequencing. (3 words)
Vertical:
- 1. Dideoxynucleosidetriphosphates (abbrev.)
- 2. Process of determining positions of reads on the reference genome.
- 4. Gene expression can be measured using this. (abbrev. hyph.)
- 5. The process of making many copies of a piece of DNA.
- 6. Found in pairs in DNA.
- 7. Chemical group attached to nucleotides to monitor incorporation into DNA.
- 10. File format used to store sequence information.
- 14. Breakthrough sequencing method (abbrev.)
:::
#### {.tabset}
##### Hide
##### Solution
::: {.answer data-latex=""}
```{r, echo=FALSE, out.width="75%", fig.align='center'}
knitr::include_graphics("figures/sheet-9/crossword_solved.png")
```
:::
#### {-}
# Exercise 3
#### {.tabset}
### 3a)
::: {.question data-latex=""}
You want to determine how many reads $N$ are needed to achieve a coverage depth $C$ of 20X when sequencing *Escherichia coli*.
The read length $L$ is 30 nt and the *E. coli* genome $G$ is approximately 4.6 million bases long.
:::
#### {.tabset}
##### Hide
##### Formula
::: {.answer data-latex=""}
$$
N = \frac{C\times G}{L}
$$
:::
##### Solution
::: {.answer data-latex=""}
$$
N = \frac{20\times 4600000}{30} \approx 3066667 \text{ reads}
$$
:::
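The same calculation can be checked in R. This is a minimal sketch using the values from the question; the variable names are chosen here for readability, and rounding up with `ceiling()` is an assumption made because a fraction of a read cannot be sequenced.

```{r}
# Values taken from the exercise text
coverage_depth <- 20      # C, desired coverage (20X)
genome_length  <- 4.6e6   # G, approximate E. coli genome size in bases
read_length    <- 30      # L, read length in nt

# N = C * G / L, rounded up to a whole number of reads
n_reads <- ceiling(coverage_depth * genome_length / read_length)
n_reads
```

Changing `read_length` shows how strongly the required number of reads depends on $L$: doubling the read length halves $N$.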