Li Song, Sarven Sabunciyan, Guangyu Yang, Liliana Florea.
Abstract
Transcript assembly from RNA-seq reads is a critical step in gene expression and subsequent functional analyses. Here we present PsiCLASS, an accurate and efficient transcript assembler based on an approach that simultaneously analyzes multiple RNA-seq samples. PsiCLASS combines mixture statistical models for exonic feature selection across multiple samples with splice graph based dynamic programming algorithms and a weighted voting scheme for transcript selection. PsiCLASS achieves a significantly better sensitivity-precision tradeoff, and renders precision up to 2- to 3-fold higher than the StringTie system and Scallop plus TACO, the two best current approaches. PsiCLASS is efficient and scalable, assembling 667 GEUVADIS samples in 9 h, and has robust accuracy with large numbers of samples.
Year: 2019 PMID: 31676772 PMCID: PMC6825223 DOI: 10.1038/s41467-019-12990-0
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Performance evaluation of methods on simulated and real data: a 25 simulated RNA-seq sets, all genes; b 25 simulated sets, genes grouped by abundance; c 25 GEUVADIS samples (polyadenylated RNA); d 73 liver RNA-seq samples (rRNA-depleted total RNA); e 44 hippocampus samples from healthy and epileptic mice; and f subsets of 1, 2, 3, 5, 10, 20, 40, 80, 160, and 320 samples, and the full set of 667 GEUVADIS samples. In a–e, sensitivity (recall) and precision values for PsiCLASS, StringTie, and Scallop at the level of individual samples are shown as box plots, and meta-annotations resulting from aggregation (with PsiCLASS voting, ST-merge, and TACO) are shown with colored shapes. Boxplots were generated with R using default options: the box extends between the lower and upper quartiles, horizontal lines mark the median, and whiskers extend to 1.5 × the interquartile range above the upper quartile and below the lower quartile. Lastly, additional symbols mark sensitivity and precision values for PsiCLASS when tuned to match or approach the sensitivity of its competitors, along the range of cutoff values 0–16, shown with dotted red lines. Source data are provided as a Source Data file
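The box plot convention described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the code used to draw the figure: the function name `boxplot_stats` and the input values are hypothetical, and the quartiles are computed by linear interpolation, which may differ slightly from the hinges that R's `boxplot` uses on small samples.

```python
import statistics


def boxplot_stats(values):
    """Return quartiles, whisker ends, and outliers for a list of numbers.

    Assumption: quartiles use linear interpolation (the 'inclusive' method),
    which approximates but is not identical to R's default hinges.
    """
    data = sorted(float(v) for v in values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences sit 1.5 x IQR beyond the box on either side.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers stop at the most extreme data points inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical per-sample values, not data from the paper.
example = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(example)
```

In this made-up input, the value 100 lies beyond the upper fence, so it would be drawn as a separate point rather than stretching the whisker.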
Performance of methods on experiments with small numbers of samples on simulated data
| Number of samples | Recall PsiCLASS | Recall StringTie | Recall Scallop | Precision PsiCLASS | Precision StringTie | Precision Scallop |
|---|---|---|---|---|---|---|
| 1 | 45.6 | 41.7 | 46.2 | 66.9 | 70.8 | 62.9 |
| 2 | 46.7 | 43.8 | 43.3 | 68.6 | 69.1 | 57.0 |
| 3 | 49.9 | 45.2 | 45.0 | 67.3 | 68.5 | 56.7 |
| 5 | 52.1 | 46.6 | 46.2 | 69.7 | 66.7 | 56.7 |
| 10 | 52.7 | 48.0 | 48.1 | 70.0 | 64.0 | 56.5 |
Performance of methods on experiments with small numbers of samples on real data
| Sample | Recall PsiCLASS (a) | Recall PsiCLASS (b) | Recall StringTie | Recall Scallop | Precision PsiCLASS (a) | Precision PsiCLASS (b) | Precision StringTie | Precision Scallop |
|---|---|---|---|---|---|---|---|---|
| SRR534319 | 44.2 | 56.7 | 42.0 | 65.7 | 29.5 | 30.1 | 30.5 | 24.5 |
| SRR545695 | 50.4 | 60.3 | 49.4 | 72.7 | 32.5 | 32.4 | 35.7 | 26.6 |
| SRR534291 | 63.3 | 66.1 | 65.5 | 81.6 | 30.8 | 35.0 | 37.1 | 35.1 |
(a) Single samples in one-sample experiments
(b) Individual samples in multi-sample experiments
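As a reading aid for the recall and precision columns, here is a minimal Python sketch of how the two measures are commonly computed for assembled transcripts. The matching criterion is an assumption: this page does not state how predicted and reference transcripts were matched, so the sketch treats a transcript as a hypothetical `(chromosome, strand, intron chain)` tuple and counts only exact matches. The function name `recall_precision` and all coordinates are illustrative.

```python
def recall_precision(reference, predicted):
    """Return (recall, precision) as percentages.

    Assumption: a predicted transcript counts as correct only if it is
    identical to a reference transcript under the chosen representation.
    """
    reference = set(reference)
    predicted = set(predicted)
    if not reference or not predicted:
        raise ValueError("both transcript sets must be non-empty")

    matched = reference & predicted
    recall = 100.0 * len(matched) / len(reference)     # share of reference recovered
    precision = 100.0 * len(matched) / len(predicted)  # share of predictions that are correct
    return recall, precision


# Hypothetical transcripts: (chromosome, strand, tuple of intron (start, end) pairs).
reference = [
    ("chr1", "+", ((100, 200), (300, 400))),
    ("chr1", "+", ((100, 200), (350, 400))),
    ("chr2", "-", ((500, 600),)),
    ("chr3", "+", ((50, 80), (120, 160))),
]
predicted = [
    ("chr1", "+", ((100, 200), (300, 400))),  # matches a reference transcript
    ("chr2", "-", ((500, 600),)),             # matches a reference transcript
    ("chr2", "-", ((500, 650),)),             # no match: counts against precision
]

recall, precision = recall_precision(reference, predicted)
print(f"recall={recall:.1f} precision={precision:.1f}")
```

With these made-up inputs, two of the four reference transcripts are recovered and two of the three predictions are correct, which shows how an assembler can trade one measure against the other.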