Skip to content

Latest commit

 

History

History
90 lines (67 loc) · 3.32 KB

hclust-ord-plot.md

File metadata and controls

90 lines (67 loc) · 3.32 KB

HClust and Ordination

Michael D. Lee 5/10/2019

Libraries used here

library("DESeq2")
library("dendextend")
library("phyloseq")
library("ggplot2")

Reading in data

count_tab <- read.table("example_data/sample_raw_counts.tsv", sep="\t", header=T, row.names=1)
sample_info_tab <- read.table("example_data/sample_info_tab.tsv", sep="\t", header=T, row.names=1)

Transformation

We first want to transform our table. We are going to use DESeq to do that.

deseq_counts <- DESeqDataSetFromMatrix(count_tab, colData = sample_info_tab, design = ~treatment) 
deseq_counts_vst <- varianceStabilizingTransformation(deseq_counts)

NOTE: If you get this error here with your dataset: "Error in estimateSizeFactorsForMatrix(counts(object), locfunc =locfunc, : every gene contains at least one zero, cannot compute log geometric means", that can be because the count table is too sparse with many zeroes, which is common with marker-gene surveys. In that case you'd need to use a specific function first that is equipped to deal with that. You could run: deseq_counts <- estimateSizeFactors(deseq_counts, type = "poscounts"), followed by the transformation function deseq_counts_vst <- varianceStabilizingTransformation(deseq_counts).

Pulling out transformed table.

vst_trans_count_tab <- assay(deseq_counts_vst)

Hierarchical Clustering

# and calculating our Euclidean distance matrix
euc_dist <- dist(t(vst_trans_count_tab))

euc_clust <- hclust(euc_dist, method="ward.D2")

# hclust objects like this can be plotted with the generic plot() function
# plot(euc_clust) 
# but i like to change them to dendrograms for two reasons:
# 1) it's easier to color the dendrogram plot by groups
# 2) if wanted you can rotate clusters with the rotate() 
#    function of the dendextend package

euc_dend <- as.dendrogram(euc_clust, hang=0.1)
dend_cols <- as.character(sample_info_tab$color[order.dendrogram(euc_dend)])
labels_colors(euc_dend) <- dend_cols

plot(euc_dend, ylab="VST Euc. dist.")

Ordination

Here we're going to make a PCoA (Principle Coordinates Analysis). PCoA utilizes distances, so we are going to make a phyloseq object of our transformed table which is suitable for euclidean distance like we used above.

# making our phyloseq object with transformed table
vst_count_phy <- otu_table(vst_trans_count_tab, taxa_are_rows=T)
sample_info_tab_phy <- sample_data(sample_info_tab)
vst_physeq <- phyloseq(vst_count_phy, sample_info_tab_phy)

# generating and visualizing the PCoA with phyloseq
vst_pcoa <- ordinate(vst_physeq, method="MDS", distance="euclidean")
eigen_vals <- vst_pcoa$values$Eigenvalues # allows us to scale the axes according to their magnitude of separating apart the samples

plot_ordination(vst_physeq, vst_pcoa, color="treatment") + 
  geom_point(size=1) + labs(col="treatment") + 
  geom_text(aes(label=rownames(sample_info_tab), hjust=0.5, vjust=-0.4)) + 
  coord_fixed(sqrt(eigen_vals[2]/eigen_vals[1])) + ggtitle("PCoA") + 
  scale_color_manual(values=unique(as.character(sample_info_tab$color)[order(sample_info_tab$treatment)])) + 
  theme(legend.position="none")