Bradley J Nartowt, Gregory R Hart, David A Roffman, Xavier Llor, Issa Ali, Wazir Muhammad, Ying Liang, Jun Deng.
Abstract
Colorectal cancer (CRC) is third in prevalence and mortality among all cancers in the US. Currently, the United States Preventive Services Task Force (USPSTF) recommends that anyone aged 50-75 and/or with a family history be screened for CRC. To improve screening specificity and sensitivity, we have built an artificial neural network (ANN) trained on 12 to 14 categories of personal health data from the National Health Interview Survey (NHIS). Years 1997-2016 of the NHIS contain 583,770 respondents who had never received a diagnosis of any cancer and 1,409 who had received a diagnosis of CRC within 4 years of taking the survey. The trained ANN has a sensitivity of 0.57 ± 0.03, a specificity of 0.89 ± 0.02, a positive predictive value of 0.0075 ± 0.0003, a negative predictive value of 0.999 ± 0.001, and a concordance of 0.80 ± 0.05 per the guidelines of the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) level 2a, comparable to current risk-scoring methods. To demonstrate clinical applicability, both USPSTF guidelines and the trained ANN are used to stratify respondents to the 2017 NHIS into low-, medium- and high-risk categories (TRIPOD levels 4 and 2b, respectively). The proportion of CRC respondents misclassified as low risk decreases from 35% under the screening guidelines to 5% with the ANN (in 60 cases). The proportion of non-CRC respondents misclassified as high risk decreases from 53% under the screening guidelines to 6% with the ANN (in 25,457 cases). Our results demonstrate a robustly tested method of stratifying CRC risk that is non-invasive, cost-effective, and easy to implement publicly.
Year: 2019 PMID: 31437221 PMCID: PMC6705772 DOI: 10.1371/journal.pone.0221421
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Schematic of example ANN.
A schematic of an ANN with four layers and a logistic activation function. The ANN in this paper has one input neuron for each of the 12 to 14 factors, the same number of neurons in each hidden layer, and a single output neuron for the prediction. The upper arrow indicates forward-propagation and the lower arrow indicates back-propagation.
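The forward- and back-propagation arrows in the schematic can be illustrated with a minimal sketch in plain Python. It assumes the layout described in the caption (input width equal to the number of factors, hidden layers of the same width, one logistic output neuron). The squared-error loss, the learning rate, the weight initialisation, and every function name are assumptions made for illustration; the source does not state the authors' training details.

```python
import math
import random

def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-z))

def init_network(n_inputs, n_hidden_layers, seed=0):
    """Hidden layers of width n_inputs plus one output neuron.
    Each neuron is a list of input weights with its bias stored last."""
    rng = random.Random(seed)
    sizes = [n_inputs] * (n_hidden_layers + 1) + [1]
    return [[[rng.uniform(-0.5, 0.5) for _ in range(sizes[i] + 1)]
             for _ in range(sizes[i + 1])]
            for i in range(len(sizes) - 1)]

def forward(network, x):
    """Forward-propagation: return the activations of every layer, input first."""
    activations = [list(x)]
    for layer in network:
        prev = activations[-1]
        # zip stops at len(prev), so the trailing bias is added separately.
        activations.append([
            logistic(sum(w * a for w, a in zip(neuron, prev)) + neuron[-1])
            for neuron in layer
        ])
    return activations

def backward(network, activations, target, learning_rate=0.5):
    """Back-propagation: one gradient step on the assumed loss 0.5 * (y - t)**2."""
    # Output-layer error signal: dE/dz = (y - t) * y * (1 - y).
    deltas = [(a - target) * a * (1.0 - a) for a in activations[-1]]
    for i in reversed(range(len(network))):
        layer, prev = network[i], activations[i]
        # Error signal for the layer below, using the weights before they change.
        prev_deltas = [
            sum(deltas[j] * layer[j][k] for j in range(len(layer)))
            * prev[k] * (1.0 - prev[k])
            for k in range(len(prev))
        ]
        for j, neuron in enumerate(layer):
            for k in range(len(prev)):
                neuron[k] -= learning_rate * deltas[j] * prev[k]
            neuron[-1] -= learning_rate * deltas[j]
        deltas = prev_deltas

# Four layers in total: 12 inputs, two hidden layers of 12 neurons, 1 output.
net = init_network(n_inputs=12, n_hidden_layers=2)
x = [0.1] * 12                      # made-up, already-normalised factor values
before = forward(net, x)[-1][0]
for _ in range(200):
    backward(net, forward(net, x), target=1.0)
after = forward(net, x)[-1][0]
```

Repeated steps on a single made-up example push the output toward its target, which is all this sketch is meant to show; a real model would be trained on many respondents at once.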
Fig 2. ROC curves of the ANN for ten-fold cross-testing dataset (TRIPOD 2a).
The ANN was trained with the factors marked “default model” in Table 2, with (blue line) and without (purple line) hypertension. The ANN was also trained on a reduced dataset that included family history, with (green line) and without (red line) hypertension. Error bars denote the standard deviation of the TPR and FPR across ten folds of stratified cross-testing (TRIPOD level 2a).
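As a rough illustration of how such error bars can be obtained, the sketch below splits a labelled set into ten stratified folds and reports the mean and standard deviation of the FPR and TPR at each threshold. The scores are synthetic and taken as given; in an actual cross-test the model would be refit on the other nine folds before scoring each held-out fold. All names, thresholds, and data here are assumptions.

```python
import random
import statistics

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign indices to folds so each fold keeps roughly the same CRC fraction."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds

def roc_point(scores, labels, threshold):
    """Return (FPR, TPR) when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return fp / neg, tp / pos

def roc_with_error_bars(scores, labels, thresholds, n_folds=10):
    """Per threshold: (threshold, mean FPR, sd FPR, mean TPR, sd TPR) across folds."""
    folds = stratified_folds(labels, n_folds)
    curve = []
    for t in thresholds:
        points = [roc_point([scores[i] for i in fold],
                            [labels[i] for i in fold], t)
                  for fold in folds]
        fprs, tprs = zip(*points)
        curve.append((t,
                      statistics.mean(fprs), statistics.stdev(fprs),
                      statistics.mean(tprs), statistics.stdev(tprs)))
    return curve

# Synthetic example: 100 positives and 900 negatives with overlapping scores.
rng = random.Random(1)
labels = [1] * 100 + [0] * 900
scores = [rng.gauss(0.6, 0.15) if y else rng.gauss(0.4, 0.15) for y in labels]
curve = roc_with_error_bars(scores, labels, thresholds=[0.3, 0.5, 0.7])
```

Stratifying by label matters when positives are rare, as they are here: an unstratified split could leave a fold with very few CRC cases and an unstable TPR.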
Number of NHIS respondents in final dataset when certain factors are chosen.
| Ever screened? | Family history | Age (years) | Hypertension | Training | Training & CRC | Testing | Testing & CRC |
|---|---|---|---|---|---|---|---|
| Unused | Unused | 18–85 | Used | 525,394◊ | 1,269◊ | 58,376◊ | 140◊ |
| Unused | Used | 18–85 | Used | 105,950 | 245 | 11,772 | 27 |
| Unused | Used | 18–85 | Unused | 105,760 | 245 | 11,723 | 27 |
| Unused | Unused | 18–85 | Unused | 525,394 | 1,269 | 58,376 | 140 |
| Unused | Unused | 18–49 | Used | 298,085 | 162 | 33,120 | 18 |
| Unused | Unused | 50–75 | Used | 227,310 | 1,107 | 25,255 | 122 |
| Used | Unused | 18–85 | Used | 10,261 | 72 | 217,774 | 446 |
| Used | Used | 18–85 | Used | 66,938 | 300 | 161,098 | 218 |
| Used | Used | 18–85 | Unused | 9,002 | 59 | 219,176 | 459 |
| Used | Unused | 18–85 | Unused | 13,755 | 71 | 212,995 | 443 |
† Data appear only in NHIS years 2000, 2005, 2010, and 2015, when a set of supplementary questions was asked.
◊ Data in the default model.
All factors in the NHIS datasets used to train the ANN in scoring CRC risk, in descending order of correlation magnitude.
| Name of Factor | Correlation with Recent CRC, ×10⁻² | Type of Factor | # of Unique Values of Factor | Time of Incidence, Frequency, or Duration |
|---|---|---|---|---|
| Current or Cancer Age | +4.907 | Continuous | 68 | Permanent |
| Hypertension | +3.045 | Ordinal | 2 | Ever |
| Number of first-degree relatives with CRC (NHIS years 2000, 2005, 2010, and 2015 only) | +2.906 | Ordinal | 4 | Permanent |
| Coronary heart disease | +2.349 | Ordinal | 2 | Ever |
| Pooled heart conditions | +2.063 | Ordinal | 2 | Ever |
| Myocardial infarction | +2.060 | Ordinal | 2 | Ever |
| Diabetes (non-gestational) | +2.056 | Ordinal | 3 | Ever |
| Heart condition/disease | +1.972 | Ordinal | 2 | Ever |
| Vigorous exercise frequency | -1.971 | Continuous | 33 | Per week |
| Angina pectoris | +1.769 | Ordinal | 2 | Ever |
| Ulcer (stomach, duodenal, peptic) | +1.540 | Ordinal | 2 | Ever |
| Hispanic ethnicity | -1.269 | Categorical | 2 | Permanent |
| Stroke | +1.218 | Ordinal | 2 | Ever |
| Emphysema | +1.220 | Ordinal | 2 | Ever |
| American Indian, African American, other, or multiple race | -0.494 | Categorical | 2 | Permanent |
| Sex (male) | -0.350 | Categorical | 2 | Permanent |
| Body-mass index | +0.234 | Continuous | 4223 | Current |
| Smoking frequency | +0.0461 | Ordinal | 4 | Current |
† Denotes factors that are part of the model referred to as “default” throughout this paper.
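A ranking like the one in this table can be produced by correlating each factor with the CRC label and sorting by magnitude. The sketch below assumes a Pearson correlation, which the source does not specify, and uses a handful of made-up respondents; the factor names and values are illustrative only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_factors(factors, crc):
    """Return (name, correlation) pairs in descending order of |correlation|."""
    corr = {name: pearson(values, crc) for name, values in factors.items()}
    return sorted(corr.items(), key=lambda item: abs(item[1]), reverse=True)

# Made-up data for eight hypothetical respondents (1 = recent CRC diagnosis).
crc = [0, 0, 0, 0, 1, 1, 0, 1]
age = [30, 42, 55, 38, 71, 66, 49, 80]
factors = {
    "age": age,
    "vigorous_exercise": [5, 3, 2, 4, 0, 1, 3, 0],
    "sex_male": [1, 0, 1, 0, 1, 0, 0, 1],
}
ranking = rank_factors(factors, crc)
```

Sorting on the absolute value keeps protective factors (negative correlation, such as exercise frequency in the table) alongside risk factors of similar strength.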
Fig 3. Cross-testing ROC curves of ANN for data non-randomly split between screened and non-screened NHIS respondents (TRIPOD 2b).
Using the reduced dataset, the ANN was cross-tested between the group of survey respondents (NHIS years 2000, 2005, 2010, and 2015 only) screened for CRC by colonoscopy/sigmoidoscopy and the remaining, non-screened group.
Fig 4. Cross-testing ROC curves of ANN for age groups formed by USPSTF screening guidelines.
The ANN is trained and tested on three subsets of the full dataset: ages 18–49, ages 50–75, and all ages. Error bars denote the standard deviation of the TPR and FPR across ten-fold stratified cross-testing (TRIPOD level 2a).
Fig 5. Diagnostic performance of the ANN for the random testing dataset (TRIPOD level 2a).
Positive predictive value (PPV) and false omission rate are plotted parametrically as the decision threshold varies, in analogy to Fig 2.
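A parametric curve of this kind can be traced by sweeping the decision threshold and computing both quantities at each value. The sketch below uses eight made-up scores; the function name and data are assumptions. The false omission rate is FN / (FN + TN), which equals one minus the negative predictive value.

```python
def ppv_and_false_omission(scores, labels, threshold):
    """Return (PPV, false omission rate) when scores >= threshold are called positive.
    Either value is None if its denominator is zero."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        if s >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        else:
            if y == 1:
                fn += 1
            else:
                tn += 1
    ppv = tp / (tp + fp) if (tp + fp) else None
    false_omission = fn / (fn + tn) if (fn + tn) else None
    return ppv, false_omission

# Made-up scores and labels (1 = CRC) for illustration only.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

# One (PPV, FOR) pair per threshold gives the points of the parametric curve.
sweep = [(t, *ppv_and_false_omission(scores, labels, t))
         for t in (0.25, 0.5, 0.75)]
```

At a threshold of 0.5 this toy data gives three true positives and one false positive above the cut, and one missed case among four below it.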
Fig 6. Risk stratification into three categories.
The 2017 NHIS respondents are stratified by the ANN into three categories for CRC risk: green (low risk), yellow (medium risk), and red (high risk).
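Three-category stratification of this kind amounts to applying two cutoffs to the ANN output. The sketch below is a minimal illustration; the cutoff values 0.2 and 0.8, the function name, and the example scores are hypothetical placeholders, since the source text here does not give the actual thresholds.

```python
from collections import Counter

def risk_category(score, low_cut=0.2, high_cut=0.8):
    """Map an ANN output in [0, 1] to a traffic-light category.
    low_cut and high_cut are hypothetical, not the paper's cutoffs."""
    if score < low_cut:
        return "green (low risk)"
    if score < high_cut:
        return "yellow (medium risk)"
    return "red (high risk)"

# Made-up ANN outputs for seven hypothetical respondents.
example_scores = [0.05, 0.15, 0.35, 0.50, 0.79, 0.80, 0.95]
tally = Counter(risk_category(s) for s in example_scores)
```

Moving either cutoff trades off the size of the medium-risk band against the number of respondents placed in the low- or high-risk groups.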
Comparison of ANN risk-scoring with USPSTF screening guidelines on 2017 NHIS dataset for 3-category risk-score stratification.
| Method, respondent group | # Respondents | # Low Score | % Low Score | # Medium Score | % Medium Score | # High Score | % High Score |
|---|---|---|---|---|---|---|---|
| ANN, CRC | 60 | 3 | 5% | 52 | 87% | 5 | 8% |
| ANN, non-CRC | 25,457 | 2,932 | 12% | 20,998 | 82% | 1,527 | 6% |
| USPSTF guidelines, CRC | 60 | 21 | 35% | n/a | n/a | 39 | 65% |
| USPSTF guidelines, non-CRC | 25,457 | 11,845 | 47% | n/a | n/a | 13,612 | 53% |
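The percentages in this table follow directly from the counts, as the short check below shows. The assignment of rows to the ANN versus the USPSTF guidelines, and to CRC versus non-CRC respondents, is inferred from the figures quoted in the abstract (35% to 5% in 60 cases, 53% to 6% in 25,457 cases) and is an assumption of this sketch. `None` stands for the "n/a" medium category.

```python
# (low, medium, high) counts per row; row labels are inferred, see above.
counts = {
    ("ANN", "CRC"): (3, 52, 5),
    ("ANN", "non-CRC"): (2932, 20998, 1527),
    ("USPSTF", "CRC"): (21, None, 39),
    ("USPSTF", "non-CRC"): (11845, None, 13612),
}

def percentages(row):
    """Share of respondents in each category, rounded to whole percent."""
    total = sum(c for c in row if c is not None)
    return [None if c is None else round(100 * c / total) for c in row]

table = {key: percentages(row) for key, row in counts.items()}

# CRC respondents scored low risk, and non-CRC respondents scored high risk.
crc_low = (table[("USPSTF", "CRC")][0], table[("ANN", "CRC")][0])
non_crc_high = (table[("USPSTF", "non-CRC")][2], table[("ANN", "non-CRC")][2])
```

With these counts, `crc_low` evaluates to `(35, 5)` and `non_crc_high` to `(53, 6)`, matching the reductions stated in the abstract.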
Comparison of ANN to conventional screening methods.
| Screening method | Sensitivity, Specificity and/or PPV | Advantages | Disadvantages |
|---|---|---|---|
| Artificial neural network (ANN) trained with NHIS data years 1997–2016, tested on ten random splits | ● Sensitivity of 0.57 ± 0.03 | ● Better performance with more training data | ● Low PPV |
| Guaiac or immunoassay fecal occult blood test (gFOBT or iFOBT) | ● Sensitivity ~ 0.9 | ● No pre-test colon-cleansing | ● Low PPV |
| ● Fecal immunochemical test (FIT) | (1) For FIT: | ● No pre-test colon-cleansing | ● Adenoma insensitivity |
| Methylated SEPT9 gene test | ● Sensitivity ~ 0.6 at Stage I. | ● No pre-test colon-cleansing | ● Moderately expensive |
| Flexible sigmoidoscopy | ● Sensitivity ~ 0.6 | ● Able to perform biopsy/polypectomy | ● Only rectum, lower-colon |
| Virtual colonoscopy | ● Sensitivity ~ 0.6 | ● Noninvasive | ● Colon-cleansing |