Nathan Wan1, David Weinberg1, Tzu-Yu Liu1, Katherine Niehaus1, Eric A Ariazi1, Daniel Delubac1, Ajay Kannan1, Brandon White1, Mitch Bailey1, Marvin Bertin1, Nathan Boley1, Derek Bowen1, James Cregg1, Adam M Drake1, Riley Ennis1, Signe Fransen1, Erik Gafni1, Loren Hansen1, Yaping Liu1, Gabriel L Otte1, Jennifer Pecson1, Brandon Rice1, Gabriel E Sanderson1, Aarushi Sharma1, John St John1, Catherina Tang1, Abraham Tzou1, Leilani Young1, Girish Putcha2, Imran S Haque1.
Abstract
BACKGROUND: Blood-based methods using cell-free DNA (cfDNA) are under development as an alternative to existing screening tests. However, early-stage detection of cancer using tumor-derived cfDNA has proven challenging because of the small proportion of cfDNA derived from tumor tissue in early-stage disease. A machine learning approach to discover signatures in cfDNA, potentially reflective of both tumor and non-tumor contributions, may represent a promising direction for the early detection of cancer.
Keywords: Cell-free DNA; Colorectal cancer; Early-stage cancer; Screening; Whole-genome sequencing
Year: 2019 PMID: 31443703 PMCID: PMC6708173 DOI: 10.1186/s12885-019-6003-8
Source DB: PubMed Journal: BMC Cancer ISSN: 1471-2407 Impact factor: 4.430
Clinical characteristics and demographics of CRC patients and non-cancer controls
| Characteristic | Category | CRC (N = 546) | Control (N = 271) | Total Samples (N = 817) |
|---|---|---|---|---|
| Gender | Female, N (%) | 264 (48%) | 182 (67%) | 446 (55%) |
| | Male, N (%) | 282 (52%) | 84 (31%) | 366 (45%) |
| | Unknown, N (%) | 0 | 5 (2%) | 5 (< 1%) |
| Stage | I, N (%) | 172 (32%) | N/A | N/A |
| | II, N (%) | 266 (49%) | N/A | N/A |
| | III, N (%) | 98 (18%) | N/A | N/A |
| | IV, N (%) | 6 (1%) | N/A | N/A |
| | Unknown, N (%) | 4 (< 1%) | N/A | N/A |
| Age (yrs) | Median (IQR) | 71 (63–80) | 60 (53–67) | 68 (59–77) |
Fig. 1 Model training overview and CV procedures. a All methods were trained on k-fold, and the best-performing method was chosen to train models for the other cross-validation procedures. The diagram describes the individual steps common to all methods. Models are trained on a given dataset and set of methods (i.e., dimension reduction and classification) and then evaluated, resulting in a performance estimate. b Illustration of CV procedures for k-fold, k-batch, ordered k-batch, and balanced k-batch. Each square represents a single sample, with the fill color indicating class label, the border color representing a confounding factor such as institution, and the number indicating processing batch. Each column represents a possible fold constructed for the given CV procedure. The dashed line separates the test set of held-out samples from the training set.
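The fold constructions in panel b can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `k_batch_folds` and `ordered_k_batch_split` are hypothetical, and the sketch assumes that k-batch means "no processing batch is split across folds" and that ordered k-batch means "test only on the latest batches, train on earlier ones". The caption does not spell out either rule in this detail, and balanced k-batch (which also balances the confounder across classes) is not covered.

```python
from collections import defaultdict


def k_batch_folds(batches, k):
    """Group sample indices into k folds so that no batch spans two folds.

    `batches[i]` is the processing-batch ID of sample i (assumed sortable).
    """
    by_batch = defaultdict(list)
    for idx, batch_id in enumerate(batches):
        by_batch[batch_id].append(idx)

    folds = [[] for _ in range(k)]
    # Greedy balancing: place the largest batches first, each into the
    # fold that currently holds the fewest samples.
    for batch_id in sorted(by_batch, key=lambda b: (-len(by_batch[b]), b)):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_batch[batch_id])
    return folds


def ordered_k_batch_split(batches, n_test_batches):
    """Hold out the last `n_test_batches` batches (by batch order) as test."""
    ordered = sorted(set(batches))
    if not 0 < n_test_batches < len(ordered):
        raise ValueError("n_test_batches must leave at least one training batch")
    test_batches = set(ordered[-n_test_batches:])
    train = [i for i, b in enumerate(batches) if b not in test_batches]
    test = [i for i, b in enumerate(batches) if b in test_batches]
    return train, test


# Toy example: eight samples processed in four batches of two.
batches = [1, 1, 2, 2, 3, 3, 4, 4]
folds = k_batch_folds(batches, k=2)
train_idx, test_idx = ordered_k_batch_split(batches, n_test_batches=1)
```

Keeping whole batches together means a model cannot score well simply by recognizing batch-specific artifacts that it also saw during training, which is the leakage that plain k-fold permits.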
Performance Evaluation of Known Confounders
| Confounder | k-fold CV AUC (95% CI) | k-fold CV Sensitivity at 85% Specificity (95% CI) | Confounder CV method | Confounder CV AUC (95% CI) |
|---|---|---|---|---|
| Age | 0.71 (0.64–0.77) | 44% (29–57%) | Binned-age | 0.50 (0.50–0.50) |
| Batch | 0.72 (0.69–0.75) | 43% (31–53%) | k-batch | 0.50 (0.50–0.50) |
| Processing Date | 0.69 (0.64–0.74) | 38% (25–49%) | Ordered k-batch | 0.48 (0.43–0.52) |
| Institution | 0.87 (0.84–0.90) | 74% (72–77%) | Balanced k-batch | 0.51 (0.28–0.74) |
Performance evaluation of known confounders alone to predict cancer with either k-fold or the CV procedure designed to control for the confounder. Confidence intervals are calculated from bootstrapped distributions of the metric across folds
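A bootstrap confidence interval over per-fold metrics, as mentioned in the caption above, might be computed along the following lines. This is a sketch under assumptions: the helper name `bootstrap_ci`, the number of resamples, and the use of a percentile interval on the mean are illustrative choices, since the caption does not state the exact resampling scheme.

```python
import random


def bootstrap_ci(fold_metrics, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-fold metric values.

    Returns (mean, lower, upper). Folds are resampled with replacement.
    """
    rng = random.Random(seed)
    n = len(fold_metrics)
    boot_means = []
    for _ in range(n_boot):
        resample = [fold_metrics[rng.randrange(n)] for _ in range(n)]
        boot_means.append(sum(resample) / n)
    boot_means.sort()
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return sum(fold_metrics) / n, lower, upper


# Hypothetical per-fold AUCs, not values from the study.
mean_auc, lo, hi = bootstrap_ci([0.90, 0.92, 0.91, 0.93, 0.94])
```

With this procedure, an interval collapses to a single point, as in an entry of the form 0.50 (0.50–0.50), whenever every fold yields the same metric value.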
CRC Performance by Validation Procedure
| Validation | Mean AUC (95% CI) | Mean Sensitivity at 85% Specificity (95% CI) |
|---|---|---|
| k-fold | 0.92 (0.91–0.93) | 85% (83–86%) |
| Binned-age | 0.91 (0.89–0.94) | 79% (73–87%) |
| k-batch | 0.91 (0.88–0.94) | 85% (80–89%) |
| Ordered k-batch | 0.90 (0.83–0.94) | 73% (53–88%) |
| Balanced k-batch | 0.83 (0.79–0.86) | 71% (63–76%) |
CRC performance by cross-validation procedure in 50–84 year-old patients. Confidence intervals are calculated from bootstrapped distributions of the metric across folds
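"Sensitivity at 85% specificity" can be read as: pick the score threshold at which at least 85% of controls are called negative, then report the fraction of cancers called positive at that threshold. The sketch below follows that reading. The function name `sensitivity_at_specificity` and the tie-handling rule (scores equal to the threshold count as negative) are assumptions, not details taken from the study.

```python
import math


def sensitivity_at_specificity(scores, labels, target_specificity=0.85):
    """Sensitivity at the lowest threshold reaching the target specificity.

    labels: 1 = cancer, 0 = control. Higher scores mean more cancer-like.
    Returns (sensitivity, achieved_specificity, threshold).
    """
    if not 0 < target_specificity <= 1:
        raise ValueError("target_specificity must be in (0, 1]")
    controls = sorted(s for s, y in zip(scores, labels) if y == 0)
    cases = [s for s, y in zip(scores, labels) if y == 1]
    if not controls or not cases:
        raise ValueError("need at least one case and one control")

    # Smallest control score such that at least the target fraction of
    # controls lie at or below it; samples are positive if score > threshold.
    k = math.ceil(target_specificity * len(controls))
    threshold = controls[k - 1]
    specificity = sum(s <= threshold for s in controls) / len(controls)
    sensitivity = sum(s > threshold for s in cases) / len(cases)
    return sensitivity, specificity, threshold


# Toy data: ten controls scored 1..10 and four cancers.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 5, 9.5, 11, 12]
labels = [0] * 10 + [1] * 4
sens, spec, thr = sensitivity_at_specificity(scores, labels)
```

With a small control set the achieved specificity can exceed the target (here 9 of 10 controls), which is one reason a reported specificity may be described as nominal.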
Fig. 2 Colorectal cancer classification performance (ROC curves) by each cross-validation method. The average of all folds is drawn in solid blue; random chance is represented as dashed red; ROCs for each fold are drawn behind. a k-fold, b binned age, c k-batch, d ordered k-batch, and e balanced k-batch.
Fig. 3 Classification performance for colorectal cancer within the intended-use (IU) age range across all validation methods. N is the number of samples, given as [cancer, controls]. The average of all folds is represented by the colored bars; the 95% bootstrap confidence intervals are represented by the solid black lines. a Sensitivity at 85% nominal specificity by CRC stage across all CV procedures. b AUC by age bins across all CV procedures. c AUC by gender across all CV procedures. d AUC by an IchorCNA-based estimated tumor fraction (TF) across all CV procedures.