Karin Zimmermann, Marcel Jentsch, Axel Rasche, Michael Hummel, Ulf Leser.
Abstract
BACKGROUND: The analysis of differential splicing (DS) is crucial for understanding physiological processes in cells and organs. In particular, aberrant transcripts are known to be involved in various diseases, including cancer. Exon arrays are a widely used technique for studying DS. Over the last decade, a variety of algorithms for detecting DS events from exon arrays have been developed. However, no comprehensive comparative evaluation that includes sensitivity to the most important data features has been conducted so far. To this end, we created multiple data sets based on simulated data to assess the strengths and weaknesses of seven published methods as well as a newly developed method, KLAS. Additionally, we evaluated all methods on two cancer data sets that comprised RT-PCR validated results.
Keywords: Alternative splicing; Differential splicing; Exon arrays; Method comparison; Parameter influence
Year: 2015 PMID: 27391904 PMCID: PMC4391533 DOI: 10.1186/s12864-015-1322-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Differential exon expression. The second exon from the left in tissue A is differentially spliced. A comparison on exon level only would lead to the opposite of the desired result, as the only differentially spliced exon would gain the lowest evidence for DS.
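The effect described in Figure 1 can be illustrated with a small worked example. The sketch below uses made-up intensities (not data from the paper) and a gene-normalized log ratio in the spirit of the Splicing Index; using the median exon intensity as the gene-level estimate is an assumption chosen for simplicity.

```python
import math
from statistics import median

# Hypothetical intensities for one gene with five exons (illustrative only).
# The gene is expressed twice as strongly in tissue A, but the second exon
# is partly skipped in tissue A, so its signal stays at the tissue B level.
tissue_a = [200.0, 100.0, 200.0, 200.0, 200.0]
tissue_b = [100.0, 100.0, 100.0, 100.0, 100.0]


def exon_log_fold_changes(a, b):
    """Plain exon-level log2 fold change, ignoring gene expression."""
    return [math.log2(x / y) for x, y in zip(a, b)]


def gene_normalized_log_ratios(a, b):
    """log2 ratio of exon/gene proportions between the two tissues.

    The gene level is approximated by the median exon intensity
    (an assumption made for this sketch).
    """
    gene_a = median(a)
    gene_b = median(b)
    return [math.log2((x / gene_a) / (y / gene_b)) for x, y in zip(a, b)]


exon_lfc = exon_log_fold_changes(tissue_a, tissue_b)
normalized = gene_normalized_log_ratios(tissue_a, tissue_b)

print(exon_lfc)    # the spliced exon shows the smallest exon-level change
print(normalized)  # after gene normalization, only the spliced exon deviates
```

On exon level alone, every exon except the spliced one appears changed; once each exon is related to the expression of its gene, the spliced exon is the only one that stands out.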
Parameters: Values used for the different parameters tested
| Parameters | Values | | ANOVA Results | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Samples per group | 15 vs. 5 | 15 vs. 15 | + | + | + | - | + | + | - | - |
| Exons per gene | 10 | 30 enum | - | - | - | - | - | + | - | + |
| Expression intensity | high | low expr | + | + | + | + | + | + | - | + |
| Percent DS samples per group | 60% | 100% pcnt | + | + | + | + | + | + | - | + |
The combination of four parameters with two possible values each leads to 2^4 = 16 scenarios. ANOVA Results: Analysis of variance reveals the influence of parameters on accuracy. ‘+’ indicates a significant influence of the parameter on accuracy, ‘-’ means no significant influence. For computational aspects see Additional file 1: Section “Significance of parameter influence”.
Figure 2. P-value based accuracy, i.e. binary AUC for all scenarios. Asterisks indicate highest values per scenario, multiple maxima are possible. Column names encode scenarios in the order expression.exons.percent.samples, thus H.10.100.5 describes the scenario with high expression, 10 exons per gene, 100 percent spliced samples in the respective group and 5 versus 15 samples per group.
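The 16 scenarios and their column names can be enumerated mechanically. The following is a minimal sketch, not the authors' code: the parameter values come from the parameter table, while the code `L` for low expression and the code `15` for the 15 vs. 15 design are assumptions extrapolated from the single example name H.10.100.5.

```python
from itertools import product

# Parameter values as listed in the parameter table.
# "L" for low expression is an assumed code (only "H" appears in the caption).
EXPRESSION = {"H": "high", "L": "low"}
EXONS_PER_GENE = [10, 30]
PERCENT_DS_SAMPLES = [60, 100]
# Key = code used in the scenario name; "15" is an assumed code.
SAMPLES_PER_GROUP = {5: "15 vs. 5", 15: "15 vs. 15"}


def scenario_names():
    """Return all names in the order expression.exons.percent.samples."""
    return [
        f"{expr}.{exons}.{percent}.{samples}"
        for expr, exons, percent, samples in product(
            EXPRESSION, EXONS_PER_GENE, PERCENT_DS_SAMPLES, SAMPLES_PER_GROUP
        )
    ]


names = scenario_names()
print(len(names))
print(names)
```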
Figure 3. Sensitivity and specificity averaged over all scenarios.
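For reference, sensitivity and specificity can be computed from per-gene p-values and the known simulation truth, then averaged across scenarios. This is a hedged sketch using the standard definitions; the significance threshold `alpha = 0.05`, the function names, and the example values are assumptions, not taken from the paper.

```python
from statistics import mean


def sensitivity_specificity(p_values, is_ds, alpha=0.05):
    """Standard sensitivity and specificity for calls made at p <= alpha.

    p_values: one p-value per gene (or exon) reported by a method
    is_ds:    ground truth, True if the item is differentially spliced
    alpha:    assumed significance threshold (not stated in the source)
    """
    tp = fp = tn = fn = 0
    for p, truth in zip(p_values, is_ds):
        called = p <= alpha
        if called and truth:
            tp += 1
        elif called and not truth:
            fp += 1
        elif not called and truth:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def average_over_scenarios(per_scenario):
    """Average (sensitivity, specificity) pairs over all scenarios."""
    return (
        mean(s for s, _ in per_scenario),
        mean(c for _, c in per_scenario),
    )


# Made-up example: three truly spliced items, three unspliced ones.
p_values = [0.001, 0.2, 0.03, 0.6, 0.04, 0.9]
truth = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(p_values, truth)
avg_sens, avg_spec = average_over_scenarios([(sens, spec), (1.0, 1.0)])
```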
Figure 4. Accuracy computed on the RT-PCR validated results for the colon cancer data set (left) and the lung cancer data set (right).
Result summary and comparison
| Method | | | | | |
|---|---|---|---|---|---|
| ANOSVA | 7 | 5 | 5 | 5 | 4 |
| ARH | 2 | 1 | 4 | 2 | 1 |
| FIRMA | n.a. | 1 | 2 | 2 | 5 |
| KLAS | 3 | 3 | 4 | 3 | n.a. |
| MADS’ | 1 | 4 | 1 | 1 | 6 |
| MIDAS | 6 | 2 | 7 | 6 | 7 |
| Splicing Index | 4 | 8 | 8 | 8 | 2 |
| SplicingCompass | 5 | 7 | 6 | 6 | n.a. |
| PAC | 8 | 6 | 9 | 8 | 3 |
Per data set D and method M, we show the rank that M achieves on D when all methods are sorted by accuracy, i.e., the number of correctly recognized splicing events. For comparison, we also add ranks from Rasche et al. [11], which used a different data set and ranked by AUC.
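The ranking step described above can be sketched as follows. The method names and counts are placeholders rather than results from the paper, and the tie handling (tied methods share the best position, the next rank is skipped) is an assumption, since the source does not state how ties were resolved.

```python
def rank_by_accuracy(accuracy):
    """Rank methods by accuracy, highest first.

    accuracy: dict mapping method name to the number of correctly
              recognized splicing events on one data set
    Ties share the same rank (competition ranking, an assumed convention).
    """
    ordered = sorted(accuracy.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_value = None
    current_rank = 0
    for position, (method, value) in enumerate(ordered, start=1):
        if value != previous_value:
            current_rank = position
            previous_value = value
        ranks[method] = current_rank
    return ranks


# Placeholder counts for four hypothetical methods on one data set.
example = {"method_a": 12, "method_b": 9, "method_c": 12, "method_d": 4}
ranks = rank_by_accuracy(example)
print(ranks)
```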