workflow incomplete 🚧


Introduction

Following from work assembling the metagenomic data, and importing the microbial community into an R-friendly format, we might want to use the handy phyloseq package for R to look at things.

It’s hard to overstate how powerful phyloseq can be - it standardises your work, makes sure things are sensible, has great documentation, and shows you how to start ecological analysis of your data.

It was linked in the last page, but the teams behind DADA2 and phyloseq put together an incredible showcase paper of what can be done: do check it out.


Phyloseq Steps

These commands are simply ripped straight from the excellent phyloseq tutorial section, as this package covers many of the important first steps.

library("phyloseq"); packageVersion("phyloseq")

theme_set(theme_bw())

# basic visuals

plot_bar(physeq, fill = "Family")

plot_heatmap(physeq1)

plot_heatmap(physeq1, taxa.label="Phylum")

plot_richness(myData, x="BODY_SITE", color="Description")

OTUnames10 = names(sort(taxa_sums(GP), TRUE)[1:10])
GP10  = prune_taxa(OTUnames10,  GP)
mGP10 = prune_taxa(OTUnames10, mergedGP)
ocean_samples = sample_names(subset(sample_data(GP), SampleType=="Ocean"))
print(ocean_samples)

rowSums(otu_table(GP10)[, ocean_samples])

plot_richness(GP, "human", "SampleType", title="unmerged")


# basic subsets 

rank_names(GlobalPatterns)

sample_variables(GlobalPatterns)

otu_table(GlobalPatterns)[1:5, 1:5]

tax_table(GlobalPatterns)[1:5, 1:4]

taxa_names(GlobalPatterns)[1:10]

# name the ten largest
myTaxa = names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE)[1:10])

# choose jsut those ten
ex1 = prune_taxa(myTaxa, GlobalPatterns)

# make %
GPr  = transform_sample_counts(GlobalPatterns, function(x) x / sum(x) )

# filter out those below 10-5 (0.00001) %
GPfr = filter_taxa(GPr, function(x) mean(x) > 1e-5, TRUE)

# keep only the family Chlamydiae
GP.chl = subset_taxa(GlobalPatterns, Phylum=="Chlamydiae")
# from those samples, only keep samples with at least 20 counts
GP.chl = prune_samples(sample_sums(GP.chl)>=20, GP.chl)

# combine aka merge the first 5 Chlamydiae taxa
GP.chl.merged = merge_taxa(GP.chl, taxa_names(GP.chl)[1:5])

# summarise all taxa to the family level
gpsfbg = tax_glom(gpsfb, "Family")