David Roffman, Gregory Hart, Michael Girardi, Christine J Ko, Jun Deng.
Abstract
Ultraviolet radiation (UVR) exposure and family history are major associated risk factors for the development of non-melanoma skin cancer (NMSC). The objective of this study was to develop and validate a multi-parameterized artificial neural network based on available personal health information for early detection of NMSC with high sensitivity and specificity, even in the absence of known UVR exposure and family history. The 1997-2015 NHIS adult survey data used to train and validate our neural network (NN) comprised 2,056 NMSC and 460,574 non-cancer cases. We extracted 13 parameters for our NN: gender, age, BMI, diabetic status, smoking status, emphysema, asthma, race, Hispanic ethnicity, hypertension, heart diseases, vigorous exercise habits, and history of stroke. This study yielded an area under the ROC curve of 0.81 for both training and validation. Our results (training sensitivity 88.5% and specificity 62.2%, validation sensitivity 86.2% and specificity 62.7%) were comparable to a previous study of basal and squamous cell carcinoma prediction that also included UVR exposure and family history information. These results indicate that our NN is robust enough to make predictions, suggesting that we have identified novel associations and potential predictive parameters of NMSC.
Year: 2018 PMID: 29374196 PMCID: PMC5786038 DOI: 10.1038/s41598-018-19907-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
The demographics of the NHIS dataset that was used in our ANN.
| Demographics of the Data | NMSC Cancer | Non-Cancer |
|---|---|---|
| Average Age | 62.37 [61.81, 62.93] | 46.12 [46.02, 46.22] |
| Average BMI | 27.05 [26.80, 27.30] | 27.30 [27.28, 27.32] |
| Male/Female | 47.89%/52.11% | 45.11%/54.89% |
| Ever Smoked | 49.64% [47.32%, 52.00%] | 41.95% [41.78%, 42.12%] |
| Have Emphysema | 3.710% [2.82%, 4.59%] | 1.527% [1.48%, 1.57%] |
| Have Asthma | 12.21% [10.67%, 13.73%] | 11.04% [10.93%, 11.15%] |
| Have Diabetes Mellitus | 11.49% [10.02%, 13.01%] | 7.826% [7.73%, 7.92%] |
| Have Ever Had a Stroke | 4.828% [3.84%, 5.85%] | 2.507% [2.45%, 2.56%] |
| Have Hypertension | 47.45% [45.10%, 49.77%] | 27.38% [27.23%, 27.53%] |
| Average Heart Disease Score | 0.097 [0.0877, 0.106] | 0.040 [0.0395, 0.0405] |
| White | 97.89% [97.22%, 98.56%] | 77.23% [77.09%, 77.37%] |
| African-American | 0.479% [0.14%, 0.77%] | 15.48% [15.36%, 15.60%] |
| Native American/Alaska Native | 0.040% [0%, 0.17%] | 0.851% [0.82%, 0.88%] |
| Asian | 0.599% [0.26%, 1.00%] | 4.925% [4.85%, 5.00%] |
| Multiracial | 0.998% [0.26%, 1.00%] | 1.514% [1.47%, 1.56%] |
| Hispanic Ethnicity | 1.756% [1.15%, 2.38%] | 16.96% [16.83%, 17.09%] |
| Average Number of Times Vigorous Exercise is done at Least Once per Week | 1.511 [1.388, 1.634] | 1.597 [1.587, 1.607] |
95% confidence intervals are shown in brackets. Intervals for percentages are based on the Wald statistic[24], and intervals for raw numbers are based on the Z statistic.
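The interval types named in the table note can be sketched as follows. This is a minimal illustration that assumes simple random sampling and ignores survey weights; the function names and the example inputs are hypothetical and are not taken from the study.

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """Wald confidence interval for a proportion p_hat observed in n respondents."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def mean_interval(mean, sd, n, z=1.96):
    """Z-based confidence interval for a sample mean with standard deviation sd."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical inputs (not values from the table): 50% of 400 respondents.
low, high = wald_interval(0.50, 400)
print(f"[{low:.3f}, {high:.3f}]")
```

With z = 1.96 both functions give 95% intervals; a survey-weighted analysis would need a design-based standard error instead.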
Figure 1. A schematic of the ANN. Each line is a weight connecting one layer to the next, and each circle represents an input, neuron, or output. The bias terms are analogous to intercepts and improve the model’s performance.
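The caption's description of weights, neurons, and bias terms corresponds to a standard feedforward pass, sketched below. The single hidden layer, its size, and the logistic activation are assumptions made for illustration; the architecture actually used is not stated in this excerpt.

```python
import math

def sigmoid(x):
    """Logistic activation (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: weights[j][i] links input i to neuron j,
    and biases[j] plays the role of an intercept for neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Inputs -> hidden neurons -> single output interpreted as a risk score."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, [out_w], [out_b])[0]

# 13 inputs as in the study; 4 hidden neurons is a hypothetical size.
inputs = [0.5] * 13
score = forward(inputs, [[0.0] * 13] * 4, [0.0] * 4, [0.0] * 4, 0.0)
print(score)
```

With all weights and biases at zero, every neuron outputs 0.5; training would adjust the weights so that the output separates NMSC from non-cancer respondents.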
A description of the personal health parameters used in the ANN.
| Parameter | Input Type | Input Range | Details |
|---|---|---|---|
| Age | Continuous | 0.2118–1 | Age range is 18–85, with 85+ being treated as 85. |
| BMI | Continuous | 0–1 | BMI above 99.95 is treated as 99.95. |
| Ever Smoker | Binary | 0 or 1 | Never-smokers are 0 and current and former smokers are 1. |
| Emphysema | Binary | 0 or 1 | No COPD is 0 and COPD is 1. |
| Asthma | Binary | 0 or 1 | No asthma is 0 and asthma is 1. |
| Diabetic Status | Binary | 0 or 1 | Non-diabetics and pre-diabetics are 0, with diabetics being 1. |
| Strokes | Binary | 0 or 1 | No strokes is 0 and having a prior stroke is 1. |
| Hypertension | Binary | 0 or 1 | No recording of hypertension is 0, and having a single measurement of it is 1. |
| Heart Disease Score | Continuous | 0–1 | Coronary heart disease, angina, heart attacks, and other heart complications each contribute 0.25 to the score. |
| Race | Continuous | 0.0083–1 | Each race is assigned a value equal to its fractional percentage in the sample plus the fractional percentage of each less common race being added to the race of interest. |
| Hispanic Ethnicity | Binary | 0 or 1 | No Hispanic ethnicity is 0 and having Hispanic ethnicity is 1. |
| Vigorous Exercise | Continuous | 0–1 | Number of times per week vigorous exercise is performed, with 28+ being treated as 28. The duration criterion was 20 minutes or more in all years except 2015, when it was 10 minutes. |
| Gender | Binary | 0 or 1 | 0 is a man and 1 is a woman. |
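The continuous inputs in the table can be scaled to the listed ranges roughly as follows. The divisors are inferred rather than stated: 18/85 ≈ 0.2118 matches the lower bound given for age, and dividing BMI by 99.95 and exercise by 28 is an assumption based on the caps. The function names are hypothetical.

```python
def encode_continuous(age, bmi, exercise_per_week, heart_conditions):
    """Scale continuous inputs to [0, 1] using the caps listed in the table."""
    return {
        "age": min(age, 85) / 85.0,                      # 85+ treated as 85
        "bmi": min(bmi, 99.95) / 99.95,                  # assumed divisor
        "exercise": min(exercise_per_week, 28) / 28.0,   # assumed divisor
        "heart": 0.25 * min(heart_conditions, 4),        # four conditions, 0.25 each
    }

def encode_race(sample_fractions):
    """Cumulative-share encoding: each race gets its own sample fraction
    plus the fractions of every less common race."""
    running = 0.0
    codes = {}
    for race, share in sorted(sample_fractions.items(), key=lambda kv: kv[1]):
        running += share
        codes[race] = running
    return codes

# Hypothetical respondent and hypothetical race shares (not study data).
print(encode_continuous(age=18, bmi=27.0, exercise_per_week=3, heart_conditions=1))
print(encode_race({"A": 0.1, "B": 0.7, "C": 0.2}))
```

Under this encoding the most common race maps to a value near 1 and the rarest race maps to its own sample fraction, which is consistent with the 0.0083–1 range in the table.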
Figure 2. The sensitivity and specificity for the training and validation datasets as functions of the cutoff values.
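How sensitivity and specificity depend on the cutoff can be sketched with a small example. The scores and labels below are toy values, and the convention that a score at or above the cutoff counts as a positive prediction is an assumption.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """labels: 1 = NMSC, 0 = non-cancer. A score >= cutoff is a positive prediction."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= cutoff
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Toy data, not study data.
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
for cutoff in (0.3, 0.5):
    print(cutoff, sensitivity_specificity(toy_scores, toy_labels, cutoff))
```

Raising the cutoff from 0.3 to 0.5 lowers sensitivity and raises specificity on this toy data, which is the trade-off the figure plots.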
Figure 3. An ROC plot for our ANN’s training and validation datasets.
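The area under an ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that pairwise definition, using toy values rather than study data:

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties count as half. Quadratic in the number of cases, so only for small data."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data, not study data.
auc = roc_auc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
print(auc)
```

For a dataset of the size used here, a rank-based or library implementation would be preferable to this quadratic loop.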
Figure 4. The non-cancerous (blue and white stripe/dash) and cancerous (solid orange) people in each risk bin (histograms) and the cumulative distribution functions above a certain risk level (lines).
Figure 5. Cancerous (orange) and non-cancerous (blue) people have very different high (solid) and low (dashed) risk trends. Assuming a 1% misclassification rate in the low and high risk categories (black line), we can divide individual cancer risk into 3 categories shown by the shading: high (red), medium (yellow), and low (green).
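The three-way split described in the caption amounts to applying two cutoffs to the risk score. The sketch below uses placeholder cutoff values; the cutoffs used in the study are not given in this excerpt, and the function names are hypothetical.

```python
from collections import Counter

LOW_CUTOFF = 0.2   # hypothetical placeholder
HIGH_CUTOFF = 0.7  # hypothetical placeholder

def risk_category(score, low_cutoff=LOW_CUTOFF, high_cutoff=HIGH_CUTOFF):
    """Map a risk score in [0, 1] to 'low', 'medium', or 'high'."""
    if score < low_cutoff:
        return "low"
    if score >= high_cutoff:
        return "high"
    return "medium"

def category_counts(scores):
    """Count how many scores fall into each risk category."""
    return Counter(risk_category(s) for s in scores)

# Toy scores, not study data.
print(category_counts([0.05, 0.30, 0.50, 0.90]))
```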
Comparison of risk stratification results between NHIS 1997–2015 data and NHIS 2016 data.
| Dataset | Group | # Respondents | # Low Risk | % Low Risk | # Medium Risk | % Medium Risk | # High Risk | % High Risk |
|---|---|---|---|---|---|---|---|---|
| Training 1997–2015 | Cancer | 1,754 | 24 | 1.37% | 1,625 | 92.7% | 105 | 6.00% |
| | Non-Cancer | 322,402 | 83,279 | 25.8% | 235,542 | 73.1% | 3,581 | 1.11% |
| Validation 1997–2015 | Cancer | 752 | 8 | 1.06% | 709 | 94.3% | 35 | 4.65% |
| | Non-Cancer | 138,172 | 35,976 | 26.0% | 100,813 | 73.0% | 1,381 | 1.00% |
| 2016 | Cancer | 214 | 3 | 1.40% | 203 | 94.9% | 8 | 3.74% |
| | Non-Cancer | 27,844 | 5,643 | 20.3% | 21,753 | 78.1% | 448 | 1.61% |
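The percentage columns are each category's count divided by the number of respondents in that row. A short sketch using the 2016 counts from the table (the function name is hypothetical):

```python
def risk_shares(low, medium, high):
    """Percentage of respondents in each risk category, rounded to two decimals."""
    total = low + medium + high
    return tuple(round(100.0 * count / total, 2) for count in (low, medium, high))

# Counts from the 2016 rows of the table.
cancer_2016 = risk_shares(3, 203, 8)
non_cancer_2016 = risk_shares(5643, 21753, 448)
print(cancer_2016)
print(non_cancer_2016)
```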