Ka-Chun Wong, Junyi Chen, Jiao Zhang, Jiecong Lin, Shankai Yan, Shixiong Zhang, Xiangtao Li, Cheng Liang, Chengbin Peng, Qiuzhen Lin, Sam Kwong, Jun Yu.
Abstract
The early detection of cancers has the potential to save many lives. A recent attempt has been shown to be successful. However, we note several critical limitations. Given the central importance and broad impact of early cancer detection, we aspire to address those limitations. We explore different supervised learning approaches for detecting multiple cancer types and observe significant improvements; for instance, one of our approaches, CancerA1DE, doubles the existing sensitivity from 38% to 77% for the earliest cancer stage (Stage I) at the 99% specificity level. For Stage II, it reaches about 90% across multiple cancer types. In addition, CancerA1DE more than doubles the existing sensitivity for detecting breast cancers, from 30% to 70%, at the 99% specificity level. Data and model analyses are conducted to reveal the underlying reasons. A website is available at http://cancer.cs.cityu.edu.hk/.
Keywords: Algorithms; Bioinformatics; Biological Sciences; Cancer; Cancer Systems Biology
Year: 2019 PMID: 31103852 PMCID: PMC6548890 DOI: 10.1016/j.isci.2019.04.035
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Feature List for Cancer Type Localization ranked by Information Gain (InfoG)
| InfoG | Input Features | Feature Description |
|---|---|---|
| 1.0389 | TGFa (pg/mL) | Circulating Transforming Growth Factor α Concentration in pg/mL |
| 0.8301 | HE4 (pg/mL) | Circulating Human Epididymis Protein 4 Concentration in pg/mL |
| 0.6135 | sFas (pg/mL) | Circulating soluble Fas Cell Surface Death Receptor Concentration in pg/mL |
| 0.5372 | Thrombospondin-2 (pg/mL) | Circulating Thrombospondin-2 Concentration in pg/mL |
| 0.5073 | AFP (pg/mL) | Circulating Alpha Fetoprotein Precursor Concentration in pg/mL |
| 0.3759 | G-CSF (pg/mL) | Circulating Granulocyte-Colony Stimulating Factor Concentration in pg/mL |
| 0.3633 | IL-6 (pg/mL) | Circulating Interleukin-6 Concentration in pg/mL |
| 0.3597 | CA-125 (U/mL) | Circulating Cancer Antigen 125 Concentration in U/mL |
| 0.2568 | Sex | Patient Gender Information (Male or Female) |
| 0.2352 | sHER2/sEGFR2/sErbB2 (pg/mL) | Circulating sHER2/sEGFR2/sErbB2 Concentration in pg/mL |
| 0.2259 | TIMP-2 (pg/mL) | Circulating Tissue Inhibitor of Metalloproteinases 2 Concentration in pg/mL |
| 0.2231 | CD44 (ng/mL) | Circulating CD44 Concentration in ng/mL |
| 0.183 | CA19-9 (U/mL) | Circulating Cancer Antigen 19-9 Concentration in U/mL |
| 0.1805 | IL-8 (pg/mL) | Circulating Interleukin-8 Concentration in pg/mL |
| 0.164 | CA 15-3 (U/mL) | Circulating Cancer Antigen 15-3 Concentration in U/mL |
| 0.1448 | HGF (pg/mL) | Circulating Hepatocyte Growth Factor Concentration in pg/mL |
| 0.1431 | OPG (ng/mL) | Circulating Osteoprotegerin Concentration in ng/mL |
| 0.1414 | GDF15 (ng/mL) | Circulating Growth Differentiation Factor 15 Concentration in ng/mL |
| 0.1384 | Leptin (pg/mL) | Circulating Leptin Concentration in pg/mL |
| 0.1271 | Myeloperoxidase (ng/mL) | Circulating Myeloperoxidase Concentration in ng/mL |
| 0.125 | Kallikrein-6 (pg/mL) | Circulating Kallikrein-6 Concentration in pg/mL |
| 0.1173 | TIMP-1 (pg/mL) | Circulating Tissue Inhibitor of Metalloproteinases 1 Concentration in pg/mL |
| 0.1122 | Midkine (pg/mL) | Circulating Midkine Concentration in pg/mL |
| 0.1095 | Prolactin (pg/mL) | Circulating Prolactin Concentration in pg/mL |
| 0.1032 | Mesothelin (ng/mL) | Circulating Mesothelin Concentration in ng/mL |
| 0.103 | Galectin-3 (ng/mL) | Circulating Galectin-3 Concentration in ng/mL |
| 0.096 | OPN (pg/mL) | Circulating Osteopontin Concentration in pg/mL |
| 0.0956 | NSE (ng/mL) | Circulating Neuron-Specific Enolase Concentration in ng/mL |
| 0.0901 | sEGFR (pg/mL) | Circulating Soluble Epidermal Growth Factor Receptor Concentration in pg/mL |
| 0.0901 | CEA (pg/mL) | Circulating Carcinoembryonic Antigen Concentration in pg/mL |
| 0.085 | AXL (pg/mL) | Circulating AXL Receptor Tyrosine Kinase Concentration in pg/mL |
| 0.0771 | sPECAM-1 (pg/mL) | Circulating Soluble Platelet and Endothelial Cell Adhesion Molecule 1 Concentration in pg/mL |
| 0.0637 | SHBG (nM) | Circulating Sex Hormone-Binding Globulin Concentration in nM |
| 0.0635 | OmegaScore | Omega Score for Mutations in Circulating Cell-Free DNA |
| 0 | Angiopoietin-2 (pg/mL) | Circulating Angiopoietin-2 Concentration in pg/mL |
| 0 | DKK1 (ng/mL) | Circulating Dickkopf WNT Signaling Pathway Inhibitor 1 Concentration in ng/mL |
| 0 | CYFRA 21-1 (pg/mL) | Circulating Cytokeratin-19 Fragment Concentration in pg/mL |
| 0 | PAR (pg/mL) | Circulating Protease-Activated Receptor Concentration in pg/mL |
| 0 | Endoglin (pg/mL) | Circulating Endoglin Concentration in pg/mL |
| 0 | FGF2 (pg/mL) | Circulating Fibroblast Growth Factor 2 Concentration in pg/mL |
| 0 | Follistatin (pg/mL) | Circulating Follistatin Concentration in pg/mL |
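
To make the InfoG column concrete, the following is a minimal sketch of how information gain can be computed for one numeric feature against the cancer-type labels. It is not the authors' pipeline: the equal-frequency binning, the `n_bins` parameter, and the function names are assumptions made for illustration, and the discretization actually used for the table may differ. Because the target has several classes, values above 1 bit are possible, and a feature whose bins carry no class information scores 0.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, n_bins=4):
    """InfoG = H(class) - H(class | binned feature).

    Assumption: the numeric feature is discretized by rank into
    `n_bins` equal-frequency bins (ties are split by sort order).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n

    # Group class labels by the bin their sample fell into.
    groups = {}
    for b, y in zip(bins, labels):
        groups.setdefault(b, []).append(y)

    # Entropy of the class within each bin, weighted by bin size.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

Ranking the features then amounts to sorting them by this score in descending order.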
Figure 1. Receiver Operating Characteristic (ROC) Curves for Cancer Detection
Different methods are shown in different colors and line styles. The curves are generated under 10-fold cross-validation. The vertical black line on the right panel is drawn at the 99% specificity level.
(A) Full Scale ROC Curves.
(B) ROC Curves Zoomed to FPR ≤ 0.1.
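
The operating point marked by the vertical line can be sketched as follows: pick the score threshold that keeps false positives among controls within the allowed budget, then measure the fraction of cancer samples above it. This is an illustrative sketch only; the function name, the label encoding (1 for cancer, 0 for control), and the strict `>` decision rule are assumptions, not details taken from the paper.

```python
def sensitivity_at_specificity(scores, labels, specificity=0.99):
    """Sensitivity at a fixed specificity level.

    scores: higher means more likely cancer.
    labels: 1 = cancer, 0 = control (assumed encoding).
    """
    if not 0 < specificity <= 1:
        raise ValueError("specificity must be in (0, 1]")
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    if not neg or not pos:
        raise ValueError("need at least one cancer and one control sample")

    # Largest number of controls that may be called positive.
    # The small epsilon guards against floating-point round-down.
    allowed_fp = int((1 - specificity) * len(neg) + 1e-9)

    # A sample is called positive when its score is strictly above the
    # threshold, so at most `allowed_fp` controls exceed it.
    threshold = neg[allowed_fp]
    return sum(s > threshold for s in pos) / len(pos)
```

With 100 controls and `specificity=0.99`, at most one control may score above the threshold.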
Figure 2. Proportion of Detected Cancers with Different Stages at the 99% Specificity Level
Each color represents a method, and the horizontal axis has been ordered by cancer stages. Each bar represents the median sensitivity of each method on each cancer stage with standard errors.
Figure 3. Detected Proportions of Different Cancer Types at the 99% Specificity Level
Different colors represent different methods. The horizontal axis is ordered by cancer types. Each bar represents the sensitivity of each method on each cancer type with 95% confidence intervals.
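
The caption does not say how the 95% confidence intervals were computed. As one common choice for a proportion such as sensitivity, the Wilson score interval can be sketched as below; treating it as the method behind the figure is an assumption, and the function name is hypothetical.

```python
import math


def wilson_interval(detected, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%).

    detected: number of cancers correctly detected.
    total: number of cancers of that type.
    """
    if total <= 0 or not 0 <= detected <= total:
        raise ValueError("require 0 <= detected <= total and total > 0")
    p = detected / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

Unlike the simple normal approximation, this interval stays inside [0, 1] and behaves sensibly for small cancer-type groups.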
Figure 4. Localized Proportions of Different Cancer Types using the Top One Prediction Approach
Different colors represent different methods. The horizontal axis is ordered by cancer types. Each bar represents the sensitivity of each method on each cancer type with 95% confidence intervals.
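
A top-one localization score of this kind can be sketched as follows: for each patient, the cancer type with the highest predicted probability is taken as the prediction, and the per-type proportion is the fraction of patients of that type whose top prediction matches. The data layout (one dict of type-to-probability per patient) and the function name are assumptions for illustration.

```python
from collections import defaultdict


def top_one_localization(true_types, predicted_probs):
    """Per-type fraction of samples whose top prediction is the true type.

    true_types: list of true cancer-type labels.
    predicted_probs: list of dicts mapping cancer type -> probability.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, probs in zip(true_types, predicted_probs):
        top = max(probs, key=probs.get)  # highest-probability type
        totals[truth] += 1
        hits[truth] += int(top == truth)
    return {t: hits[t] / totals[t] for t in totals}
```

For example, if two of three liver samples have "Liver" as their highest-probability type, the localized proportion for liver is 2/3.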
Figure 5. Feature Importance Heatmap for Cancer Type Localization under One-Class-versus-Others Setting
The feature rankings are based on a Learning Vector Quantization (LVQ) model built with the R caret package (Bischl et al., 2016). Ten-fold cross-validation is run to compute the feature importance values. The R function “heatmap.2” is then applied with its default settings to cluster and visualize the feature importance values. Further details can be found in Figure S14.