Alicia Oshlack, Mark D Robinson, Matthew D Young.
Abstract
Many methods and tools are available for preprocessing high-throughput RNA sequencing data and detecting differential expression.
Year: 2010 PMID: 21176179 PMCID: PMC3046478 DOI: 10.1186/gb-2010-11-12-220
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Overview of the RNA-seq analysis pipeline for detecting differential expression. The steps in the pipeline are in red boxes; the methodological components of the pipeline are shown in blue boxes and bold text; software examples and methods for each step (a non-exhaustive list) are shown by regular text in blue boxes. References for the tools and methods shown are listed in Table 1. First, reads are mapped to the reference genome or transcriptome (using junction libraries to map reads that cross exon boundaries); mapped reads are assembled into expression summaries (tables of counts, showing how many reads are in each coding region, exon, gene or junction); the data are normalized; statistical testing of differential expression (DE) is performed, producing a list of genes with associated P-values and fold changes. Systems biology approaches can then be used to gain biological insights from these lists.
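As a rough illustration of the later steps described in the caption (normalizing a table of counts and producing a per-gene list of fold changes), the sketch below scales a toy table by library size and computes log2 fold changes. The gene names, counts and the pseudocount of 0.5 are invented for illustration; statistical testing is omitted, and the tools listed in Table 1 use more careful models than this.

```python
import math

# Toy table of counts: gene -> (reads in sample A, reads in sample B).
# All names and numbers here are invented for illustration.
counts = {
    "geneX": (100, 400),
    "geneY": (300, 300),
    "geneZ": (600, 300),
}


def log2_fold_changes(counts, pseudocount=0.5):
    """Scale each sample by its library size, then return log2(B/A) per gene."""
    lib_a = sum(a for a, _ in counts.values())
    lib_b = sum(b for _, b in counts.values())
    result = {}
    for gene, (a, b) in counts.items():
        # The pseudocount (an assumption) avoids division by zero
        # for genes with no reads in one sample.
        rate_a = (a + pseudocount) / lib_a
        rate_b = (b + pseudocount) / lib_b
        result[gene] = math.log2(rate_b / rate_a)
    return result


fold_changes = log2_fold_changes(counts)

# Print genes ordered by the size of the change, largest first.
for gene, lfc in sorted(fold_changes.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{gene}\t{lfc:+.2f}")
```

Library-size scaling is the simplest of the normalization options; it assumes that total read output is the only systematic difference between samples.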
Table 1. Software methods and tools for differential expression analysis of RNA-seq
| Analysis step | Method | Implementation | References |
|---|---|---|---|
| Mapping | General aligner | GMAP/GSNAP | [ |
| | | BFAST | [ |
| | | BOWTIE | [ |
| | | CloudBurst | [ |
| | | GNUmap | [ |
| | | MAQ/BWA | [ |
| | | PerM | [ |
| | | RazerS | [ |
| | | Mrfast/mrsfast | [ |
| | | SOAP/SOAP2 | [ |
| | | SHRiMP | [ |
| | | QPALMA/GenomeMapper/PALMapper | [ |
| | | SpliceMap | [ |
| | | SOAPals | [ |
| | | G-Mo.R-Se | [ |
| | | TopHat | [ |
| | | SplitSeek | [ |
| | | Oases | [ |
| | | MIRA | [ |
| Summarization | Isoform-based | Cufflinks | [ |
| | | ALEXA-seq | [ |
| | Gene-based | Count exons only | For example, [ |
| | | Exon junction libraries | [ |
| Normalization | Library size | | For example, [ |
| | RPKM | ERANGE | [ |
| | TMM | edgeR | [ |
| | Upper quartile | Myrna | [ |
| Differential expression | Poisson GLM | DEGseq | [ |
| | | Myrna | [ |
| | Negative binomial | edgeR | [ |
| | | DESeq | [ |
| | | baySeq | [ |
| Systems biology | Gene Ontology analysis | GOseq | [ |
Abbreviations: GLM, generalized linear model; RPKM, reads per kilobase of exon model per million mapped reads; TMM, trimmed mean of M-values.
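The RPKM abbreviation spells out its own formula: a read count divided by the exon model length in kilobases and by the number of mapped reads in millions. A minimal sketch of that arithmetic follows; the function name, argument names and example numbers are assumptions made for illustration and are not taken from ERANGE or any other tool in the table.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and library size must be positive")
    kilobases = exon_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


# Hypothetical gene: 500 reads, 2 kb of exon sequence, 10 million mapped reads.
example = rpkm(read_count=500, exon_length_bp=2_000, total_mapped_reads=10_000_000)
print(example)  # 25.0
```

Because the value is divided by gene length, two genes with the same read count but different lengths receive different RPKM values.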
Figure 2. Summarizing mapped reads into a gene-level count. (a) Mapped reads from a small region of the RNA-binding protein 39 (RBM39) gene are shown for LNCaP prostate cancer cells [90], human liver and human testis from the UCSC track. The three rows of RNA-seq data (blue and black graphs) are shown as a 'pileup track', where the y-axis at each location measures the number of mapped reads that overlap that location. Also shown are the genomic coordinates, gene model (labeled RBM39; blue boxes indicate exons) and conservation score across vertebrates. It is clear that many reads originate from regions with no known exons. (b) A schematic of a genomic region and reads that might arise from it. Reads are color-coded by the genomic feature from which they originate. Different summarization strategies will result in the inclusion or exclusion of different sets of reads in the table of counts. For example, including only reads coming from known exons will exclude the intronic reads (green) from contributing to the results. Splice junctions are listed as a separate class to emphasize both the potential ambiguity in their assignment (such as which exon a junction read should be assigned to) and the possibility that many of these reads may not be mapped because they are harder to map than continuous reads. CDS, coding sequence.
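The effect of the summarization strategy described in panel (b) can be shown with a small sketch that counts the same reads in two ways: against known exons only, and against the whole gene span. The coordinates, the half-open interval convention and the overlap rule are all assumptions chosen for illustration; junction-spanning reads are not modeled here.

```python
# Hypothetical gene model: exons as half-open [start, end) intervals.
exons = [(100, 200), (300, 400)]
gene_span = (100, 400)

# Hypothetical mapped reads as (start, end) intervals.
reads = [
    (110, 146),  # inside exon 1
    (190, 226),  # overlaps exon 1 and runs into the intron
    (230, 266),  # entirely intronic
    (350, 386),  # inside exon 2
    (450, 486),  # outside the gene
]


def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def count_reads(reads, features):
    """Count reads that overlap any of the given features (each read counted once)."""
    return sum(any(overlaps(read, f) for f in features) for read in reads)


exon_only = count_reads(reads, exons)          # excludes the purely intronic read
whole_gene = count_reads(reads, [gene_span])   # includes intronic reads
print(exon_only, whole_gene)  # 3 4
```

The purely intronic read is the only one that changes the gene's count here, which mirrors the caption's point that exon-only counting drops intronic reads.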