R
-
extra steps in phyloseq
workflow incomplete 🚧
Following from work assembling the
metagenomic data, and importing the
microbial community into an R
-friendly format, we might
want to use the handy phyloseq
package for R
to look at things.
It’s hard to overstate how powerful phyloseq
can be - it
standardises your work, makes sure things are sensible, has great
documentation, and shows you how to start ecological analysis of your
data.
It was linked in the last page, but the teams behind
DADA2
and phyloseq
put together an incredible
showcase paper of what can be done: do check it out.
These commands are simply ripped straight from the excellent phyloseq tutorial section, as this package covers many of the important first steps.
library("phyloseq"); packageVersion("phyloseq")
theme_set(theme_bw())
# basic visuals
plot_bar(physeq, fill = "Family")
plot_heatmap(physeq1)
plot_heatmap(physeq1, taxa.label="Phylum")
plot_richness(myData, x="BODY_SITE", color="Description")
OTUnames10 = names(sort(taxa_sums(GP), TRUE)[1:10])
GP10 = prune_taxa(OTUnames10, GP)
mGP10 = prune_taxa(OTUnames10, mergedGP)
ocean_samples = sample_names(subset(sample_data(GP), SampleType=="Ocean"))
print(ocean_samples)
rowSums(otu_table(GP10)[, ocean_samples])
plot_richness(GP, "human", "SampleType", title="unmerged")
# basic subsets
rank_names(GlobalPatterns)
sample_variables(GlobalPatterns)
otu_table(GlobalPatterns)[1:5, 1:5]
tax_table(GlobalPatterns)[1:5, 1:4]
taxa_names(GlobalPatterns)[1:10]
# name the ten largest
myTaxa = names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE)[1:10])
# choose jsut those ten
ex1 = prune_taxa(myTaxa, GlobalPatterns)
# make %
GPr = transform_sample_counts(GlobalPatterns, function(x) x / sum(x) )
# filter out those below 10-5 (0.00001) %
GPfr = filter_taxa(GPr, function(x) mean(x) > 1e-5, TRUE)
# keep only the family Chlamydiae
GP.chl = subset_taxa(GlobalPatterns, Phylum=="Chlamydiae")
# from those samples, only keep samples with at least 20 counts
GP.chl = prune_samples(sample_sums(GP.chl)>=20, GP.chl)
# combine aka merge the first 5 Chlamydiae taxa
GP.chl.merged = merge_taxa(GP.chl, taxa_names(GP.chl)[1:5])
# summarise all taxa to the family level
gpsfbg = tax_glom(gpsfb, "Family")