Su Yeon Kim, Laurent Jacob, Terence P Speed.
Abstract
BACKGROUND: Accurate somatic mutation-calling is essential for insightful mutation analyses in cancer studies. Several mutation-callers are publicly available and more are likely to appear. Nonetheless, mutation-calling is still challenging and there is unlikely to be one established caller that systematically outperforms all others. Therefore, fully utilizing multiple callers can be a powerful way to construct a list of final calls for one's research.
Year: 2014 PMID: 24885750 PMCID: PMC4035752 DOI: 10.1186/1471-2105-15-154
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Venn diagram of the point mutations detected by three callers on 20 TCGA endometrial tumor-normal exome-seq pairs.
Validation results of the seven disjoint mutation sets shown in Figure 1
| Mutation set | Val. rate (%) | FP | TP | cFP (%) | cTP (%) |
|---|---|---|---|---|---|
| All callers | 99.4 | 12 | 1,914 | 1.2 | 55.3 |
| Caller A and C only | 96.4 | 11 | 294 | 2.4 | 63.8 |
| Caller A and B only | 96.3 | 7 | 184 | 3.1 | 69.1 |
| Caller B and C only | 94.4 | 2 | 34 | 3.3 | 70.1 |
| Caller C only | 79.6 | 11 | 43 | 4.4 | 71.3 |
| Caller A only | 59.7 | 632 | 935 | 69.1 | 98.4 |
| Caller B only | 15.9 | 302 | 57 | 100 | 100 |
For each mutation set (row), the validation rate (Val. rate), the false positive (FP) and true positive (TP) counts, and the cumulative false positive (cFP) and cumulative true positive (cTP) rates are presented; all rates are percentages. Mutation sets are ordered by decreasing validation rate.
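The rate columns can be recomputed from the FP and TP counts alone. The sketch below assumes that the validation rate is TP / (TP + FP) within a set, and that the cumulative rates are running sums divided by the total FP or TP count over all seven sets. These definitions are inferred from the numbers in the table rather than stated in this excerpt, and the names `MUTATION_SETS` and `summarize` are illustrative only.

```python
# (label, false positives, true positives) copied from the validation table,
# kept in the table's order of decreasing validation rate.
MUTATION_SETS = [
    ("All callers", 12, 1914),
    ("Caller A and C only", 11, 294),
    ("Caller A and B only", 7, 184),
    ("Caller B and C only", 2, 34),
    ("Caller C only", 11, 43),
    ("Caller A only", 632, 935),
    ("Caller B only", 302, 57),
]


def summarize(sets):
    """Return (label, val_rate, cFP, cTP) per set, all rates in percent."""
    total_fp = sum(fp for _, fp, _ in sets)
    total_tp = sum(tp for _, _, tp in sets)
    rows = []
    cum_fp = cum_tp = 0
    for label, fp, tp in sets:
        cum_fp += fp
        cum_tp += tp
        rows.append((
            label,
            round(100 * tp / (tp + fp), 1),      # validation rate within the set
            round(100 * cum_fp / total_fp, 1),   # cumulative FP rate (assumed definition)
            round(100 * cum_tp / total_tp, 1),   # cumulative TP rate (assumed definition)
        ))
    return rows


TABLE_ROWS = summarize(MUTATION_SETS)
for row in TABLE_ROWS:
    print(row)
```

Under these assumptions the recomputed values agree with the published columns, which suggests that each (cFP, cTP) pair is one point on the curve obtained by adding mutation sets in order of validation rate.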
Figure 2. Performance of individual and combined callers. Model fitting was done using a random 50% of the point mutations detected in the selected 20 patients, and evaluation was done on the remaining half. The main panel shows the true positive and false positive rates of various callers: (1) the three individual callers, Caller A, Caller B, and Caller C (red filled triangles); (2) the caller that cumulatively adds mutation sets, based on the combination call status, in order of validation rate (connected blue dots); and (3) the combined caller built by fitting a logistic model (for details, see text) (green line). The area near the point showing Caller C’s performance is enlarged in a small sub-panel at the lower right of the main figure. This sub-panel further shows the performance of callers that take unions or intersections of calls from the three callers (brown diamonds): all callers (ABC), intersections of two callers (AB, AC, or BC), and calls made by two or more callers (‘2orMore’).
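The set-based combined callers named in this caption can be sketched with plain set operations. The code is a minimal illustration, not the authors' implementation: it assumes each caller's output is a set of mutation identifiers, treats "ABC" as the intersection of all three callers (the caption does not spell this out), and uses a hypothetical helper `combine_calls` with made-up identifiers `m1` to `m6`.

```python
from itertools import combinations


def combine_calls(calls):
    """Build simple combined call sets from per-caller call sets.

    calls: dict mapping a caller name (e.g. "A") to a set of mutation IDs.
    """
    names = sorted(calls)
    combined = {}
    # Pairwise intersections, e.g. "AB" = called by both A and B.
    for first, second in combinations(names, 2):
        combined[first + second] = calls[first] & calls[second]
    # Assumption: "ABC" means called by every caller.
    combined["".join(names)] = set.intersection(*calls.values())
    combined["union"] = set.union(*calls.values())
    # "2orMore": called by at least two of the callers.
    combined["2orMore"] = {
        mutation
        for mutation in combined["union"]
        if sum(mutation in called for called in calls.values()) >= 2
    }
    return combined


# Hypothetical toy input; real identifiers would encode position and allele.
toy_calls = {
    "A": {"m1", "m2", "m3", "m4"},
    "B": {"m1", "m2", "m5"},
    "C": {"m1", "m3", "m6"},
}
toy_combined = combine_calls(toy_calls)
for name in sorted(toy_combined):
    print(name, sorted(toy_combined[name]))
```

Each of these rules yields a single operating point rather than a curve, which is consistent with the caption plotting them as isolated diamonds.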
Figure 3. ROC curve of an improved Caller B built by fitting a logistic model using the mutation quality score and individual filters of Caller B. Model fitting was done using the point mutations in 243 genes of interest from 174 patients, excluding the 20 selected patients, and evaluation was done on the point mutations in the 20 selected patients. The performances of the three individual callers (red filled triangles), the combined caller that cumulatively adds mutation sets (connected blue dots), and the combined caller obtained by fitting the logistic model (green line) are shown for comparison. ROC curves of two updated versions of Caller B are shown. One version is obtained by ranking the mutations detected by Caller B using the mutation quality score of Caller B (violet line), and the other by fitting a logistic model using the mutation quality score and the individual filters of Caller B on an extended set of mutations that were detected by at least one of the three callers (orange line).
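A ROC curve obtained by ranking calls on a score, as described for the violet line, can be traced by lowering a threshold through the sorted scores. The following is a generic sketch of that standard construction, not the authors' code: `roc_points` is a hypothetical helper, the scores and validation labels are invented, and the rates are computed relative to the labelled calls passed in.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from ranking calls by decreasing score.

    scores: numeric quality score per call (higher = more confident).
    labels: True if the call validated as a real mutation, else False.
    """
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one validated and one non-validated call")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Accept all calls tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


# Invented example: six calls with a quality score and a validation outcome.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [True, True, False, True, False, False]
toy_roc = roc_points(toy_scores, toy_labels)
for fpr, tpr in toy_roc:
    print(f"FPR={fpr:.2f} TPR={tpr:.2f}")
```

The same routine would apply to the fitted probabilities of a logistic model, since only the ranking induced by the score matters for the curve.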