Rocco Piazza, Daniele Ramazzotti, Roberta Spinelli, Alessandra Pirola, Luca De Sano, Pierangelo Ferrari, Vera Magistroni, Nicoletta Cordani, Nitesh Sharma, Carlo Gambacorti-Passerini.
Abstract
The complicated, evolving landscape of cancer mutations makes it a formidable challenge to identify cancer genes among the large lists of mutations typically generated in NGS experiments. The ability to prioritize these variants is therefore of paramount importance. To address this issue, we developed OncoScore, a text-mining tool that ranks genes according to their association with cancer, based on the available biomedical literature. Receiver operating characteristic (ROC) curve analysis and area under the curve (AUC) metrics on manually curated datasets confirmed the excellent discriminating capability of OncoScore (OncoScore cut-off threshold = 21.09; AUC = 90.3%, 95% CI: 88.1-92.5%), indicating that OncoScore provides useful results in cases where an efficient prioritization of cancer-associated genes is needed.
Year: 2017 PMID: 28387367 PMCID: PMC5384236 DOI: 10.1038/srep46290
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. OncoScore distribution of ‘Cancer’ and ‘Non-Cancer’ gene sets.
Box plot (a) and frequency histogram (b) of the OncoScore distributions for non-cancer and cancer genes. (a) Each box plot is drawn between the lower and upper quartiles of the distribution, with the bold black line showing the median value. The OncoScore distributions of ‘Cancer’ and ‘Non-Cancer’ genes are significantly different (Mann-Whitney-Wilcoxon test: p-value = 2.2e-16). (b) OncoScore frequency distribution plotted with equispaced breaks. (c) OncoScore and (d) Gene Ranker ranking plots of a mixed panel comprising ‘Cancer’ (*) and ‘Non-Cancer’ genes. The horizontal red lines mark the best cut-off threshold for each classifier model.
Figure 2. (a) OncoScore prediction accuracy. ROC curve depicting the relationship between true positive rate (Sensitivity) and true negative rate (Specificity) and AUC metric on CGC and nCan genes. (b) OncoScore density score distribution of true positives and true negatives. The blue line represents the CGC and the grey one the nCan genes. The dashed red line shows the optimal Youden’s cut-off threshold.
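The ROC, AUC and Youden's cut-off quantities named in this caption can be illustrated with a minimal Python sketch. It is not the OncoScore implementation: the function names (`roc_points`, `auc`, `youden_cutoff`) and the toy scores and labels are hypothetical, chosen only to show how a threshold maximizing J = sensitivity + specificity - 1 is selected.

```python
from typing import List, Tuple

# Each ROC point is (threshold, true positive rate, false positive rate).
RocPoint = Tuple[float, float, float]


def roc_points(scores: List[float], labels: List[int]) -> List[RocPoint]:
    """Evaluate every distinct score as a cut-off (score >= cut-off => 'cancer')."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((thr, tp / pos, fp / neg))
    return points


def auc(points: List[RocPoint]) -> float:
    """Trapezoidal area under the ROC curve, starting from the origin (0, 0)."""
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for _, tpr, fpr in points:
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area


def youden_cutoff(points: List[RocPoint]) -> float:
    """Threshold maximizing Youden's J = TPR - FPR (= sensitivity + specificity - 1)."""
    return max(points, key=lambda p: p[1] - p[2])[0]


# Hypothetical toy data: 1 = known cancer gene, 0 = non-cancer gene.
toy_scores = [80, 65, 40, 30, 25, 15, 10, 5]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]

points = roc_points(toy_scores, toy_labels)
print("AUC:", auc(points))
print("Youden cut-off:", youden_cutoff(points))
```

On real data the same logic would be applied to the OncoScore values of the CGC and nCan genes; a library routine such as scikit-learn's `roc_curve` would normally replace the hand-written loops.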
Figure 3. Box plot reporting the OncoScore values of all the genes carrying somatic mutations in chronic phase or blast crisis chronic myeloid leukemia samples.
P-value = 0.0007 (Two-tailed Mann-Whitney test).
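For readers unfamiliar with the Mann-Whitney test cited in these captions, the sketch below shows how the U statistic and an approximate two-tailed p-value can be computed. It is illustrative only: the function names and the two small score groups are hypothetical, and the p-value uses a plain normal approximation with no tie or continuity correction, which may differ from the procedure the authors used.

```python
import math
from typing import List, Tuple


def mann_whitney_u(x: List[float], y: List[float]) -> float:
    """U statistic for x: count of (x_i, y_j) pairs with x_i > y_j; ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def mann_whitney_test(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (U, two-tailed p) using the normal approximation.

    Assumption: no tie correction and no continuity correction, so the
    p-value is only a rough guide for small or heavily tied samples.
    """
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed tail probability
    return u, p


# Hypothetical score values for two groups of genes.
group_a = [12, 30, 45, 50]
group_b = [3, 8, 15, 20]

u_stat, p_value = mann_whitney_test(group_a, group_b)
print("U =", u_stat, "p =", round(p_value, 4))
```

In practice a tested routine such as `scipy.stats.mannwhitneyu` is preferable, since it handles ties and offers exact p-values for small samples.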
Figure 4. Time-series OncoScore plot spanning from 1975 to 2016.
(a) Time-series plot of a set of manually defined cancer genes (TP53, KRAS, NRAS, HRAS, ASXL1, IDH1, IDH2, TET2 and SETBP1) and housekeeping genes (GAPDH and GUSB). The grey boxes highlight two major scientific breakthroughs that occurred during this time span. (b) Time-series plot of 10 genes randomly selected from the CGC (ARID1A, HMGA2, KIF5B, NUP214, RBM15; dashed lines) and nCan (ALMS1, DCAF17, GPD1L, WFS1, RBM10; continuous lines) datasets.