Commit a775e0d

Updating example with intro
1 parent 4c2baff commit a775e0d

2 files changed

+149
-7
lines changed

extended_example.Rmd

+71-3
@@ -7,17 +7,83 @@ output:
77
toc: TRUE
88
---
99

10+
# Introduction
11+
12+
In this extended example I go through every step to produce a relative
13+
abundance barplot that represents bacterial communities living in
14+
individual hosts.
15+
16+
Bacterial communities are everywhere, and when we characterize them
17+
it is important to describe their overall taxonomic structure as that
18+
gives us clues as to what types of functions might be performed by the
19+
community. At the same time, it is important to show the variability
20+
of these communities and thus it is useful to plot them at the lowest
21+
aggregation level possible.
22+
23+
In this example, I utilize data from a big experiment that
24+
was published [here](https://www.nature.com/articles/nature11237). In
25+
that experiment, we planted individual *Arabidopsis thaliana* plants in
26+
individual pots. The pots each had one of two types of natural soil,
27+
each from two different seasons. We planted eight different accessions in
28+
each of those soils, and plants were harvested at two developmental stages.
29+
Additionally, we had unplanted soil-only pots; these are the "soil" samples
30+
in the example. For each individual plant, we harvested two fractions
31+
(i.e. E & R), one which we called the Endophytic Compartment (E) and
32+
corresponds to the interior of the root after removing the outer cell wall,
33+
and another which we called Rhizosphere (R) which is the soil within 1mm of
34+
the plant root. So **E** samples contain bacteria inside the root, and **R**
35+
samples contain bacteria immediately surrounding the root.
36+
37+
Plant-bacteria interactions in the root are incredibly important because
38+
the root is both the gut and the brain of the plant. Microbes there
39+
can benefit from the products of plant photosynthesis as sources
40+
of nutrition, and can also provide chemistry that the plant couldn't perform
41+
by itself. However, the story is more complicated because microbial competition
42+
and the plant immune system also provide a fertile evolutionary environment
43+
for antagonistic interactions.
44+
45+
Ultimately understanding and being able to manipulate plant-bacteria
46+
interactions has a lot of implications as hunger is one of the most pressing
47+
problems of humanity with 800 million people living in hunger. Agriculture
48+
is our only sustainable tool against hunger, and even though it currently
49+
employs a quarter of the world's population, it has not been enough to tackle
50+
this challenge.
51+
52+
# Getting ready
53+
54+
First you need to get the data. If you haven't, check the
55+
[README](https://github.com/surh/scip_barplot/blob/master/README.md) file
56+
of the GitHub repository of the workshop. It is also recommended that
57+
you watch the YouTube [video](https://www.youtube.com/watch?v=siIoupAnILk)
58+
that runs through the example. Finally, you will need to install the
59+
`tidyverse` package.
60+
61+
Once you have everything you need, start an R session and load the tidyverse
62+
package:
63+
1064
```{r}
1165
library(tidyverse)
1266
```
1367

1468
# Read data
1569

70+
First read the OTU table. You may need to change the file path
71+
to wherever you downloaded the files on your machine.
72+
1673
```{r}
1774
Tab <- read_tsv("data/rhizo/otu_table.tsv")
1875
Tab
1976
```
2077

78+
The code above reads the file into a `tibble`, which is a type
79+
of `data.frame` that has some neat additional properties. You
80+
don't need to concern yourself too much with the differences.
81+
82+
The code above also prints a message indicating that `read_tsv`
83+
tried to guess the types of data in each column of the table. It
84+
guessed correctly, but you should always specify the expected column types
85+
with the option `col_types` (use `?read_tsv` for additional details).
86+
2187

2288
```{r}
2389
Tab <- read_tsv("data/rhizo/otu_table.tsv",
@@ -26,17 +92,20 @@ Tab <- read_tsv("data/rhizo/otu_table.tsv",
2692
Tab
2793
```
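
As a minimal sketch of how `col_types` works, the chunk below reads a tiny table typed inline instead of the workshop file. The table contents, the sample names `S1` and `S2`, and the choice of `.default = col_number()` for the remaining columns are assumptions made for illustration; they are not taken from the original data or code. Passing literal text through `I()` assumes readr 2.0 or later.

```r
library(tidyverse)

# Hypothetical two-sample OTU table, written inline so the example is
# self-contained (the real file has many more columns).
tsv_text <- "otu_id\tS1\tS2\nOTU_1\t10\t0\nOTU_2\t3\t7\n"

# Declare the column types instead of letting read_tsv guess them:
# otu_id is text, and every other column is parsed as a number.
toy <- read_tsv(I(tsv_text),
                col_types = cols(otu_id = col_character(),
                                 .default = col_number()))
toy
```

Because the types are declared up front, `read_tsv` no longer prints the column specification, and a column that does not match the declared type is reported as a parsing problem rather than silently changing type.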
2894

29-
3095
# Basic barplot
3196

3297
We need to think back to the original figure and reformat our data to have one
33-
column for the x-axis and another for the y-axis
98+
column for the x-axis and another for the y-axis. This is a requirement for
99+
`ggplot2`. We can do that with `pivot_longer`, a function of the `tidyverse`.
34100

35101
```{r}
36102
Tab %>%
37103
pivot_longer(-otu_id, names_to = "sample_id", values_to = "count")
38104
```
39105

106+
In the code above the options `names_to` and `values_to` indicate the
107+
names of the new columns in the new tibble.
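
To see the reshaping on something small enough to check by eye, here is a sketch with a made-up wide table. The object names `wide` and `long`, the sample names and the counts are all hypothetical; only the `pivot_longer` call mirrors the one used on `Tab`.

```r
library(tidyverse)

# Hypothetical wide table: one row per OTU, one column per sample.
wide <- tibble(otu_id = c("OTU_1", "OTU_2"),
               S1 = c(10, 3),
               S2 = c(0, 7))

# Every column except otu_id is stacked: the old column names go into
# `sample_id` (names_to) and the cell values go into `count` (values_to).
long <- wide %>%
  pivot_longer(-otu_id, names_to = "sample_id", values_to = "count")
long
```

The two-row, two-sample table becomes four rows, one per OTU and sample combination, which is the shape `ggplot2` expects when `sample_id` goes on the x-axis and `count` on the y-axis.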
108+
40109
Let's create a smaller subset of the data to make some basic plots
41110

42111
```{r}
@@ -110,7 +179,6 @@ p1
110179

111180
We repeat the plot with some beautification
112181

113-
114182
```{r}
115183
p1 <- dat %>%
116184
ggplot(aes(x = sample_id, y = count)) +

extended_example.md

+78-4
@@ -1,8 +1,10 @@
11
Plotting the distribution of taxa
22
================
33
Sur Herrera Paredes
4-
2022-05-24
4+
2022-06-06
55

6+
- [Introduction](#introduction)
7+
- [Getting ready](#getting-ready)
68
- [Read data](#read-data)
79
- [Basic barplot](#basic-barplot)
810
- [Adding sample metadata](#adding-sample-metadata)
@@ -14,6 +16,61 @@ Sur Herrera Paredes
1416
- [Extra excercises](#extra-excercises)
1517
- [Session info](#session-info)
1618

19+
# Introduction
20+
21+
In this extended example I go through every step to produce a relative
22+
abundance barplot that represents bacterial communities living in
23+
individual hosts.
24+
25+
Bacterial communities are everywhere, and when we characterize them it
26+
is important to describe their overall taxonomic structure as that gives
27+
us clues as to what types of functions might be performed by the
28+
community. At the same time, it is important to show the variability of
29+
these communities and thus it is useful to plot them at the lowest
30+
aggregation level possible.
31+
32+
In this example, I utilize data from a big experiment that was published
33+
[here](https://www.nature.com/articles/nature11237). In that experiment,
34+
we planted individual *Arabidopsis thaliana* plants in individual pots.
35+
The pots each had one of two types of natural soil, each from two
36+
different seasons. We planted eight different accessions in each of
37+
those soils, and plants were harvested at two developmental stages.
38+
Additionally, we had unplanted soil-only pots; these are the “soil”
39+
samples in the example. For each individual plant, we harvested two
40+
fractions (i.e. E & R), one which we called the Endophytic Compartment
41+
(E) and corresponds to the interior of the root after removing the outer
42+
cell wall, and another which we called Rhizosphere (R) which is the soil
43+
within 1mm of the plant root. So **E** samples contain bacteria inside
44+
the root, and **R** samples contain bacteria immediately surrounding the
45+
root.
46+
47+
Plant-bacteria interactions in the root are incredibly important because
48+
the root is both the gut and the brain of the plant. Microbes there can
49+
benefit from the products of plant photosynthesis as sources of
50+
nutrition, and can also provide chemistry that the plant couldn’t
51+
perform by itself. However, the story is more complicated because
52+
microbial competition and the plant immune system also provide a fertile
53+
evolutionary environment for antagonistic interactions.
54+
55+
Ultimately understanding and being able to manipulate plant-bacteria
56+
interactions has a lot of implications as hunger is one of the most
57+
pressing problems of humanity with 800 million people living in hunger.
58+
Agriculture is our only sustainable tool against hunger, and even though
59+
it currently employs a quarter of the world’s population, it has not been
60+
enough to tackle this challenge.
61+
62+
# Getting ready
63+
64+
First you need to get the data. If you haven’t, check the
65+
[README](https://github.com/surh/scip_barplot/blob/master/README.md)
66+
file of the GitHub repository of the workshop. It is also recommended
67+
that you watch the YouTube
68+
[video](https://www.youtube.com/watch?v=siIoupAnILk) that runs through
69+
the example. Finally, you will need to install the `tidyverse` package.
70+
71+
Once you have everything you need, start an R session and load the
72+
tidyverse package:
73+
1774
``` r
1875
library(tidyverse)
1976
```
@@ -31,6 +88,9 @@ library(tidyverse)
3188

3289
# Read data
3390

91+
First read the OTU table. You may need to change the file path to
92+
wherever you downloaded the files on your machine.
93+
3494
``` r
3595
Tab <- read_tsv("data/rhizo/otu_table.tsv")
3696
```
@@ -69,6 +129,15 @@ Tab
69129
## # D416 <dbl>, D417 <dbl>, D418 <dbl>, D419 <dbl>, D420 <dbl>, D421 <dbl>,
70130
## # D422 <dbl>, D423 <dbl>, D424 <dbl>, D425 <dbl>, D426 <dbl>, D427 <dbl>, …
71131

132+
The code above reads the file into a `tibble`, which is a type of
133+
`data.frame` that has some neat additional properties. You don’t need to
134+
concern yourself too much with the differences.
135+
136+
The code above also prints a message indicating that `read_tsv` tried
137+
to guess the types of data in each column of the table. It guessed
138+
correctly, but you should always specify the expected column types with the
139+
option `col_types` (use `?read_tsv` for additional details).
140+
72141
``` r
73142
Tab <- read_tsv("data/rhizo/otu_table.tsv",
74143
col_types = cols(otu_id = col_character(),
@@ -100,7 +169,9 @@ Tab
100169
# Basic barplot
101170

102171
We need to think back to the original figure and reformat our data to
103-
have one column for the x-axis and another for the y-axis
172+
have one column for the x-axis and another for the y-axis. This is a
173+
requirement for `ggplot2`. We can do that with `pivot_longer`, a
174+
function of the `tidyverse`.
104175

105176
``` r
106177
Tab %>%
@@ -122,6 +193,9 @@ Tab %>%
122193
## 10 OTU_14834 D196 1
123194
## # … with 8,891 more rows
124195

196+
In the code above the options `names_to` and `values_to` indicate the
197+
names of the new columns in the new tibble.
198+
125199
Let's create a smaller subset of the data to make some basic plots
126200

127201
``` r
@@ -492,7 +566,7 @@ p1
492566
```
493567

494568
![](extended_example_files/figure-gfm/unnamed-chunk-22-1.png)<!-- -->
495-
\#\# Excercise
569+
\## Excercise
496570

497571
Use `scale_color_manual` to manually select a good set of colors for
498572
this plot
@@ -574,7 +648,7 @@ ggsave("rhizo_phylo_distribution.png", p1, width = 8, height = 4)
574648

575649
# Extra excercises
576650

577-
Look at the files at [data/hmp\_v13](data/hmp_v13) which contain much
651+
Look at the files at [data/hmp_v13](data/hmp_v13) which contain much
578652
bigger data tables generated from the Human Microbiome Project (HMP).
579653

580654
Can you make similar plots illustrating the bacterial taxonomic
