climber

CLI, Microbial Ecology, and R

Some basic steps in microbial ecology: processing second-generation (2ndGen) Illumina FASTQ data into either amplicon (e.g. 16S) or metagenomic (e.g. shotgun) datasets, followed by ecology-based analysis of the communities and patterns we find in that data.

Metagenomic data (i.e. shotgun)

The tutorial covers the following steps:

  1. Setting up your analysis - bash and friends
  2. Checking your sequence data - FastQC & MultiQC
  3. Sequencing QC - filtering and trimming your sequences - Trimmomatic
  4. Sequencing QC - purifying your sequences - Bowtie2
  5. Metagenomic Community profiling - Kraken2 & Bracken (or Kaiju if you like)
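To illustrate what the trimming step (3) does, here is a minimal Python sketch of sliding-window quality trimming, loosely modelled on Trimmomatic's SLIDINGWINDOW and MINLEN steps. It is not Trimmomatic's exact algorithm, so use the real tool on real data. The sketch assumes Phred+33 quality encoding, and the function names and defaults (a 4-base window, mean quality 20, minimum length 50) are illustrative assumptions. Python is used only as a neutral, self-contained illustration; the tutorial itself works in bash and R.

```python
# Simplified sliding-window quality trimming, loosely modelled on
# Trimmomatic's SLIDINGWINDOW + MINLEN steps (NOT its exact algorithm).
# Assumes Phred+33 quality encoding (Illumina 1.8 and later).
# Function names and default values are illustrative assumptions.
from typing import List, Optional, Tuple


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer Phred scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        threshold: float = 20.0) -> Tuple[str, str]:
    """Scan from the 5' end; cut the read at the start of the first
    window whose mean quality drops below `threshold`."""
    scores = phred33(qual)
    # If the read is shorter than one window, no window is tested.
    for start in range(max(len(scores) - window + 1, 0)):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < threshold:
            return seq[:start], qual[:start]
    return seq, qual


def trim_read(seq: str, qual: str, window: int = 4, threshold: float = 20.0,
              min_len: int = 50) -> Optional[Tuple[str, str]]:
    """Trim a read, then drop it (return None) if it is shorter than min_len."""
    seq, qual = sliding_window_trim(seq, qual, window, threshold)
    return (seq, qual) if len(seq) >= min_len else None


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))
```

For example, sliding_window_trim("ACGTACGTAC", "IIIIII####") returns ("ACGTA", "IIIII"): the window starting at position 5 has mean quality (40 + 2 + 2 + 2) / 4 = 11.5, below 20, so the read is cut there. With the default min_len of 50, trim_read would then discard that short read entirely.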

We also walk through (bare-bones) importing output from Kaiju or Kraken2+Bracken into R.
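The core of that import is turning a Kraken2 report into a sample-by-taxon count table. As a language-neutral sketch (in Python rather than the tutorial's R), assuming the default six-column, tab-separated Kraken2 report (percentage of reads in the clade, reads in the clade, reads assigned directly, rank code, NCBI taxid, indented name), the hypothetical helpers below keep one rank (species, rank code S, by default) and align several samples on a shared taxon list. Bracken's own output tables use different columns and would need a different parser.

```python
# Sketch: Kraken2 report(s) -> sample x taxon count table.
# Assumes the default six-column, tab-separated Kraken2 report layout.
# Helper names (parse_kraken2_report, merge_samples) are hypothetical.
import csv
import io
from typing import Dict, List, TextIO, Tuple


def parse_kraken2_report(handle: TextIO, rank: str = "S") -> Dict[str, int]:
    """Return {taxon name: clade read count} for rows at the given rank code."""
    counts: Dict[str, int] = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 6:
            continue  # skip blank or malformed lines
        _pct, clade_reads, _direct_reads, rank_code, _taxid, name = row[:6]
        if rank_code.strip() == rank:
            # Names are indented with spaces to show tree depth; strip them.
            counts[name.strip()] = int(clade_reads)
    return counts


def merge_samples(
    reports: Dict[str, Dict[str, int]]
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Align per-sample counts on a shared, sorted taxon list (0 if absent)."""
    taxa = sorted({taxon for counts in reports.values() for taxon in counts})
    table = {sample: [counts.get(t, 0) for t in taxa]
             for sample, counts in reports.items()}
    return taxa, table


if __name__ == "__main__":
    # A tiny made-up report, for illustration only.
    example = (
        "100.00\t1000\t0\tR\t1\troot\n"
        " 60.00\t600\t10\tG\t561\t    Escherichia\n"
        " 50.00\t500\t500\tS\t562\t      Escherichia coli\n"
    )
    print(parse_kraken2_report(io.StringIO(example)))
```

Using clade counts (column 2) rather than direct assignments (column 3) means each species row includes reads assigned to any strains below it. The same filtering is straightforward in R, e.g. by reading the report with read.delim(..., header = FALSE) and subsetting on the rank column.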

This metagenomic workflow is also available as simple, no-nonsense raw code (note that there may be differences from the complete workflow above).

Amplicon data (e.g. 16S)

Forthcoming. The initial steps (setup, getting the data, QC) are very similar to the metagenomic workflow in most cases (remember to cut off your primers!), but are followed by a denoising step (DADA2) and, optionally, an attempt to predict the metabolic capabilities of the communities at hand (PICRUSt2).
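On the "cut off your primers" point: in many amplicon library preps each read begins with the PCR primer, and that primer is often degenerate (it contains IUPAC ambiguity codes such as Y = C or T). Below is a minimal Python sketch of 5'-anchored primer removal, assuming single-end reads and a small mismatch allowance. The function names, the discard-if-absent policy and the example primer fragment are hypothetical; dedicated tools (e.g. cutadapt) handle this far more thoroughly, including indels, reverse primers and read pairs.

```python
# Sketch: remove a 5'-anchored, possibly degenerate primer from a read.
# Assumes single-end reads that start with the primer.
# Function names and the discard policy are hypothetical.
from typing import Optional

# IUPAC nucleotide ambiguity codes -> the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(read_prefix: str, primer: str) -> int:
    """Count positions where the read base is not allowed by the primer code."""
    return sum(1 for base, code in zip(read_prefix, primer)
               if base not in IUPAC.get(code, ""))


def trim_primer(seq: str, primer: str, max_mismatches: int = 1) -> Optional[str]:
    """Return the read with the primer removed, or None if the read does not
    start with the primer (within max_mismatches)."""
    seq, primer = seq.upper(), primer.upper()
    if len(seq) < len(primer):
        return None
    if count_mismatches(seq[:len(primer)], primer) <= max_mismatches:
        return seq[len(primer):]
    return None


if __name__ == "__main__":
    # "GTGYCAG" is a made-up degenerate primer fragment, for illustration only.
    print(trim_primer("GTGCCAGTTTT", "GTGYCAG"))  # Y matches C, prints TTTT
```

Returning None discards reads without a recognisable primer, a conservative choice; keeping untrimmed reads instead is also defensible, depending on the library and downstream denoising.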

Microbial Ecology (and R)

Still to be done. Although it’s a simply enormous topic, it is also where the real magic happens, and we get to make pictures. Until this section is properly fleshed out, consider instead this comprehensive methods (F1000) paper from DADA2’s Callahan et al., this guide from AstroBioMike - Bioinformatics for beginners, and the steady pace of phyloseq, which is an excellent on-ramp.


This guide to metagenomic analysis continues to be updated (April 3023; April 5th, 3,024!). All feedback (+/-) is welcome: simply throw objects/comments directly at me, or drop us a line at the related repo.

all the best!

Jamie