Avi Srivastava, Hirak Sarkar, Nitish Gupta, Rob Patro.
Abstract
MOTIVATION: The alignment of sequencing reads to a transcriptome is a common and important step in many RNA-seq analysis tasks. When aligning RNA-seq reads directly to a transcriptome (as is common in the de novo setting or when a trusted reference annotation is available), care must be taken to report the potentially large number of multi-mapping locations per read. This can pose a substantial computational burden for existing aligners, and can considerably slow downstream analysis.
Year: 2016 PMID: 27307617 PMCID: PMC4908361 DOI: 10.1093/bioinformatics/btw277
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The transcriptome (consisting of a set of transcripts) is converted into a separator-delimited string, T, on which a suffix array, SA[T], and a hash table, h, are constructed. The mapping operation begins with a k-mer (here, k = 3) mapping to an interval in SA[T]. Given this interval and the read, MMP and NIP(MMP) are calculated as described in section 2. The search for the next hashable k-mer begins k bases before NIP(MMP).
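The index construction described in the Fig. 1 caption can be illustrated with a minimal Python sketch. It is not RapMap's implementation: the separator character `$`, the function names `build_index` and `lookup`, and the naive sort-based suffix array are all assumptions made for illustration, and the MMP and NIP steps are left out because the caption defers them to section 2.

```python
def build_index(transcripts, k=3, sep="$"):
    """Build a toy suffix array and k-mer hash table (illustrative only)."""
    # Concatenate the transcripts into one separator-delimited string T.
    # The "$" separator is an assumption; the caption does not preserve it.
    text = sep.join(transcripts) + sep

    # Naive suffix array: suffix start positions in lexicographic order.
    # Real tools use far more efficient construction algorithms.
    sa = sorted(range(len(text)), key=lambda i: text[i:])

    # Hash table h: k-mer -> half-open interval [lo, hi) of ranks in SA.
    # Suffixes sharing a k-mer prefix are contiguous in a suffix array,
    # so one interval per k-mer is enough.
    table = {}
    for rank, pos in enumerate(sa):
        kmer = text[pos:pos + k]
        if len(kmer) < k or sep in kmer:
            continue  # skip k-mers that run off the end or span a separator
        lo, _ = table.get(kmer, (rank, rank))
        table[kmer] = (lo, rank + 1)
    return text, sa, table


def lookup(kmer, sa, table):
    """Return the sorted positions in T where the k-mer occurs."""
    lo, hi = table.get(kmer, (0, 0))
    return sorted(sa[lo:hi])


text, sa, table = build_index(["ACGT", "CGTA"], k=3)
print(text)                       # ACGT$CGTA$
print(table["CGT"])               # (4, 6)
print(lookup("CGT", sa, table))   # [1, 5]
```

In this toy example the k-mer `CGT` occurs once in each transcript, so its suffix-array interval has two entries. A mapper would then extend from this interval along the read instead of reporting the k-mer hits directly.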
Fig. 2. The time taken by Bowtie 2, STAR and RapMap to process the synthetic data using varying numbers of threads. RapMap processes the data substantially faster than the other tools, while providing results of comparable or better accuracy.
Accuracy of aligners/mappers under different metrics (recall, precision, F1-score and FDR are percentages)
| Metric | Bowtie 2 | Kallisto | RapMap | STAR |
|---|---|---|---|---|
| Reads aligned | 47 579 567 | 44 804 857 | 47 613 536 | 44 711 604 |
| Recall | 97.41 | 91.60 | 97.49 | 91.35 |
| Precision | 98.31 | 97.72 | 98.48 | 97.02 |
| F1-score | 97.86 | 94.56 | 97.98 | 94.10 |
| FDR | 1.69 | 2.28 | 1.52 | 2.98 |
| Hits per read | 5.98 | 5.30 | 4.30 | 3.80 |
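The F1-score and FDR rows in the table above appear to follow from the precision and recall rows under the standard definitions: F1 as the harmonic mean of precision and recall, and FDR as 100 minus precision. This page does not state the formulas, so treat them as an assumption. The short Python check below, with hypothetical helper names `f1_score` and `fdr`, recomputes both rows from the tabulated values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


def fdr(precision):
    """False discovery rate in percent, assuming FDR = 100 - precision."""
    return 100.0 - precision


# (precision, recall) pairs copied from the table above.
reported = {
    "Bowtie 2": (98.31, 97.41),
    "Kallisto": (97.72, 91.60),
    "RapMap": (98.48, 97.49),
    "STAR": (97.02, 91.35),
}

for tool, (precision, recall) in reported.items():
    f1 = f1_score(precision, recall)
    print(f"{tool}: F1 = {f1:.2f}, FDR = {fdr(precision):.2f}")
```

Under these assumed definitions the recomputed values agree with the tabulated F1-score and FDR rows to two decimal places.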
Fig. 3. Mapping agreement between subsets of Bowtie 2, STAR, Kallisto and RapMap.
Performance of quasi-enabled Sailfish (q-Sailfish) and other tools on synthetic data generated by the Flux simulator and the RSEM simulator
| Metric | Kallisto (Flux) | RSEM (Flux) | q-Sailfish (Flux) | Tigar 2 (Flux) | Kallisto (RSEM-sim) | RSEM (RSEM-sim) | q-Sailfish (RSEM-sim) | Tigar 2 (RSEM-sim) |
|---|---|---|---|---|---|---|---|---|
| Proportionality corr. | 0.74 | 0.78 | 0.75 | 0.77 | 0.91 | 0.93 | 0.91 | 0.93 |
| Spearman corr. | 0.69 | 0.73 | 0.70 | 0.72 | 0.91 | 0.93 | 0.91 | 0.93 |
| TPEF | 0.77 | 0.96 | 0.60 | 0.59 | 0.53 | 0.49 | 0.53 | 0.50 |
| TPME | −0.24 | −0.37 | −0.10 | −0.09 | 0.00 | −0.01 | 0.00 | 0.00 |
| MARD | 0.36 | 0.29 | 0.31 | 0.26 | 0.29 | 0.25 | 0.29 | 0.23 |
| wMARD | 4.68 | 5.23 | 4.45 | 4.35 | 1.00 | 0.88 | 1.01 | 0.94 |
Performance of CORSET, CD-HIT and RapMap-enabled clustering (R-CL) on human and yeast data
| Metric | CORSET (human) | CD-HIT (human) | R-CL (human) | CORSET (yeast) | CD-HIT (yeast) | R-CL (yeast) |
|---|---|---|---|---|---|---|
| precision | 0.96 | 0.96 | 0.95 | 0.36 | 0.41 | 0.36 |
| recall | 0.56 | 0.37 | 0.60 | 0.63 | 0.36 | 0.71 |
| time (min) | 957 | 268 | 8 | 23 | 5 | 2 |