Kirsten Voorhies (1), Ruofan Bie (2), John E Hokanson (3), Scott T Weiss (4), Ann Chen Wu (1, 5), Julian Hecker (4), Georg Hahn (2), Dawn L Demeo (4), Edwin Silverman (4), Michael H Cho (4), Christoph Lange (2), Sharon M Lutz (1, 2).
Abstract
To increase power and minimize bias in statistical analyses, quantitative outcomes are often adjusted for precision and confounding variables using standard regression approaches. The outcome is modeled as a linear function of the precision variables and confounders; however, for many complex phenotypes, the assumptions of linear regression models are not always met. As an alternative, we used neural networks to model complex phenotypes and adjust for covariates. We compared the prediction accuracy of the neural network models with that of classical approaches based on linear regression. Using data from the UK Biobank, the COPDGene study, and the Childhood Asthma Management Program (CAMP), we examined the features of neural networks in this context and compared them with traditional regression approaches for the prediction of three outcomes: forced expiratory volume in one second (FEV1), age at smoking cessation, and log-transformed age at smoking cessation (because age at smoking cessation is right-skewed). We used mean squared error (MSE) to compare the neural network and regression models and found that the models performed similarly unless the observed distribution of the phenotype was skewed, in which case the neural network had a smaller MSE. Our results suggest that neural network models have an advantage over standard regression approaches when the phenotypic distribution is skewed; when the distribution is not skewed, the two approaches perform similarly. Our findings are relevant to studies that analyze phenotypes that are skewed by nature or where the phenotype of interest is skewed as a result of the ascertainment condition.
Year: 2022; PMID: 35544468; PMCID: PMC9094505; DOI: 10.1371/journal.pone.0266752
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.752
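The linear covariate adjustment described in the abstract, in which the outcome is modeled as a linear function of precision variables and confounders, can be sketched in plain Python. This is a minimal illustration rather than the authors' pipeline: it assumes ordinary least squares solved through the normal equations, treats the residuals as the covariate-adjusted phenotype, and uses hypothetical helper names (`solve`, `ols_fit`, `adjust`) and toy data.

```python
# Minimal sketch of linear covariate adjustment by ordinary least squares (OLS).
# Assumption: the "adjusted" phenotype is the residual y - y_hat; the helper
# names and toy data below are hypothetical, not taken from the paper.


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_fit(X, y):
    """OLS coefficients [intercept, b1, ..., bp] for y ~ 1 + X."""
    Z = [[1.0] + list(row) for row in X]  # add an intercept column
    p = len(Z[0])
    # Normal equations: (Z^T Z) beta = Z^T y
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(ztz, zty)


def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]


def adjust(X, y):
    """Covariate-adjusted outcome: residuals y - y_hat from the linear fit."""
    beta = ols_fit(X, y)
    return [yi - fi for yi, fi in zip(y, predict(beta, X))]


# Toy example: two covariates and an outcome that is exactly linear in them.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 7]]
y = [1 + 0.5 * a - 0.2 * h for a, h in X]
beta = ols_fit(X, y)
residuals = adjust(X, y)
print([round(b, 6) for b in beta])  # → [1.0, 0.5, -0.2]
```

Because the toy outcome is linear in the covariates by construction, the fit recovers the generating coefficients and the residuals vanish up to rounding error. For a complex phenotype whose relationship to the covariates is nonlinear, this linearity assumption is the one that may not hold.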
Characteristics of subjects from the UK Biobank, COPDGene, and CAMP data.
For continuous variables, we give the mean and standard deviation, i.e., mean (sd); for categorical variables, we give the count and percentage, i.e., n (%). Sample 1 uses FEV1 as the outcome. Sample 2 uses age at smoking cessation as the outcome and includes former smokers. Sample 3 uses FEV1 as the outcome and is restricted to the subjects with the lowest 20% of FEV1.
| Characteristic | UK Biobank | COPDGene: non-Hispanic white | COPDGene: African American | CAMP |
|---|---|---|---|---|
| Sample 1, n | 151,879 | 6,764 | 3,365 | 698 |
| FEV1, L | 2.77 (0.75) | 2.22 (0.95) | 2.29 (0.86) | 1.83 (0.50) |
| Sex (male), n (%) | 88,406 (58.21) | 3,553 (52.53) | 1,856 (55.16) | 408 (58.45) |
| Age, years | 56.25 (7.98) | 62.02 (8.86) | 54.66 (7.21) | 8.85 (2.13) |
| BMI, kg/m² | 27.52 (4.86) | 28.68 (6.05) | 29.07 (6.66) | 17.78 (3.05) |
| Height, cm | 167.84 (9.08) | 169.74 (9.46) | 171.01 (9.67) | 132.84 (13.84) |
| Sample 2, n | 21,142 | 4,104 | 673 | - |
| Age at smoking cessation, years | 37.03 (10.33) | 50.92 (11.03) | 51.51 (9.66) | - |
| Education (college or university), n (%) | 9,201 (43.52) | 3,039 (74.05) | 341 (50.67) | - |
| Pack-years | 18.09 (14.46) | 46.71 (26.96) | 38.51 (22.29) | - |
| Smoker in household, n (%) | 2,338 (11.06) | 3,268 (79.63) | 521 (77.41) | - |
| Age started smoking, years | 17.43 (3.18) | 16.95 (3.85) | 17.13 (4.97) | - |
| Sample 3, n | 29,805 | - | - | - |
| FEV1, L | 1.81 (0.28) | - | - | - |
| Sex (male), n (%) | 26,078 (87.50) | - | - | - |
| Age, years | 61.00 (6.27) | - | - | - |
| BMI, kg/m² | 28.26 (5.53) | - | - | - |
| Height, cm | 160.77 (7.06) | - | - | - |
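Sample 3 is an example of ascertainment: keeping only the subjects with the lowest 20% of FEV1 truncates the distribution from above. The sketch below illustrates the effect on synthetic data. It assumes a normal stand-in for FEV1 (the mean and SD are illustrative, not study values), a simple empirical-quantile cutoff, and the moment-based skewness g1; the helper names are hypothetical.

```python
# Minimal sketch: selecting the lowest 20% of an outcome (as in Sample 3)
# and measuring how that selection skews its distribution.
# Assumption: a synthetic normal variable stands in for FEV1; the mean/SD
# and the helper names are illustrative, not study values.
import random


def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def lowest_fraction(values, fraction=0.2):
    """Keep the values at or below the empirical `fraction` quantile."""
    k = int(fraction * len(values))  # number of subjects to keep
    cutoff = sorted(values)[k - 1]
    return [v for v in values if v <= cutoff]


rng = random.Random(2022)  # fixed seed for reproducibility
full = [rng.gauss(2.8, 0.75) for _ in range(50_000)]
low = lowest_fraction(full, 0.2)
print(len(low))  # → 10000
print(round(sample_skewness(full), 2), round(sample_skewness(low), 2))
```

The full synthetic sample has skewness near zero, whereas the selected subset is clearly negatively (left) skewed: truncating from above leaves a long lower tail.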
Best-performing neural network architecture for each outcome, selected by testing different combinations of activation functions, number of layers, and number of neurons per layer in each data set.
| Outcome | First hidden layer: activation function | First hidden layer: neurons | Second hidden layer: activation function | Second hidden layer: neurons |
|---|---|---|---|---|
| FEV1 | Sigmoid | 64 | Sigmoid | 16 |
| Age at smoking cessation | Hard sigmoid | 64 | ReLU | 32 |
| Log age at smoking cessation | Sigmoid | 64 | Sigmoid | 32 |
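The architectures in the table can be written as a short forward pass. This sketch is not the authors' implementation and rests on assumptions the table does not state: fully connected layers, a single linear output neuron for the continuous outcome, random untrained weights, four input covariates for FEV1 (sex, age, BMI, and height, as listed in the characteristics table), and the piecewise-linear hard sigmoid clip(0.2x + 0.5, 0, 1) used by tf.keras in TensorFlow 2.x. Other libraries and newer Keras releases define the hard sigmoid with a different slope. All helper names are hypothetical.

```python
# Minimal sketch of the two-hidden-layer networks listed in the table.
# Assumptions (not stated in the source): dense layers, one linear output
# neuron, random untrained weights, and the tf.keras (TensorFlow 2.x) hard
# sigmoid clip(0.2 * x + 0.5, 0, 1). Helper names are hypothetical.
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hard_sigmoid(x):
    return min(1.0, max(0.0, 0.2 * x + 0.5))


def relu(x):
    return max(0.0, x)


ACTIVATIONS = {"sigmoid": sigmoid, "hard_sigmoid": hard_sigmoid, "relu": relu}

# (activation, neurons) for the first and second hidden layers, from the table.
ARCHITECTURES = {
    "FEV1": [("sigmoid", 64), ("sigmoid", 16)],
    "Smoking cessation": [("hard_sigmoid", 64), ("relu", 32)],
    "Log smoking cessation": [("sigmoid", 64), ("sigmoid", 32)],
}


def init_network(n_inputs, hidden, rng):
    """Random dense layers as (W, b, activation) tuples; W is width x fan_in."""
    layers, fan_in = [], n_inputs
    for act, width in hidden + [("linear", 1)]:  # assumed linear output neuron
        W = [[rng.gauss(0.0, 1.0 / math.sqrt(fan_in)) for _ in range(fan_in)]
             for _ in range(width)]
        layers.append((W, [0.0] * width, act))
        fan_in = width
    return layers


def forward(layers, x):
    """Propagate one covariate vector x through the network; returns a float."""
    h = [float(v) for v in x]
    for W, b, act in layers:
        f = ACTIVATIONS.get(act, lambda v: v)  # "linear" falls back to identity
        h = [f(sum(w * hi for w, hi in zip(row, h)) + bj) for row, bj in zip(W, b)]
    return h[0]


def n_parameters(layers):
    """Count weights plus biases over all layers."""
    return sum(len(W) * len(W[0]) + len(b) for W, b, _ in layers)


# Hypothetical input: 4 standardized covariates (sex, age, BMI, height).
net = init_network(4, ARCHITECTURES["FEV1"], random.Random(0))
print(n_parameters(net))  # → 1377
print(forward(net, [0.5, -1.0, 0.2, 1.3]))
```

Under these assumptions, the parameter count gives a sense of the FEV1 network's size: 4 × 64 + 64 weights and biases in the first hidden layer, 64 × 16 + 16 in the second, and 16 + 1 in the output, for 1,377 in total.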
Fig 1. Density plots of age at smoking cessation (top left), log-transformed age at smoking cessation (top right), and FEV1 (bottom left).
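The log transformation of age at smoking cessation is motivated by its right skew. The sketch below compares the moment-based sample skewness before and after taking logs. It is a minimal illustration on synthetic log-normal data rather than study data; the distribution parameters and the helper name `sample_skewness` are assumptions made for the example.

```python
# Minimal sketch: log-transforming a right-skewed outcome.
# Assumption: a synthetic log-normal variable stands in for age at smoking
# cessation; the median (40 years) and log-scale SD (0.3) are illustrative.
import math
import random


def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(7)
ages = [rng.lognormvariate(math.log(40), 0.3) for _ in range(20_000)]
log_ages = [math.log(a) for a in ages]

print(round(sample_skewness(ages), 2))      # positive: long right tail
print(round(sample_skewness(log_ages), 2))  # close to 0 after the log transform
```

Because the log of a log-normal variable is exactly normal, this toy case shows the transform at its best; a real phenotype need not be log-normal, so some skew may remain after the transformation.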
Fig 2. Box plots of the mean squared error (MSE) for the different studies and outcomes when 50% of the data were used to train the models.
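The comparison in Fig 2 rests on held-out mean squared error, the average of (y_i − ŷ_i)² over the test subjects. The sketch below shows that evaluation step under stated assumptions: a simple random 50/50 split, toy data, two stand-in models (a training-mean baseline and a least-squares line, not the paper's neural network or regression specifications), and hypothetical helper names.

```python
# Minimal sketch of a 50/50 train/test split and held-out MSE.
# Assumptions: simple random split, toy data, and two stand-in models
# (training-mean baseline and least-squares line); not the paper's models.
import random


def train_test_split(n, train_fraction, rng):
    """Return (train_idx, test_idx) from a random permutation of range(n)."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]


def mse(y_true, y_pred):
    """Mean squared error: average of (y - y_hat) ** 2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]
train, test = train_test_split(len(x), 0.5, rng)

# Both models are fitted on the training half only.
mean_y = sum(y[i] for i in train) / len(train)
mean_x = sum(x[i] for i in train) / len(train)
slope = (sum((x[i] - mean_x) * (y[i] - mean_y) for i in train)
         / sum((x[i] - mean_x) ** 2 for i in train))
intercept = mean_y - slope * mean_x

# Both models are evaluated on the held-out half.
y_test = [y[i] for i in test]
mse_baseline = mse(y_test, [mean_y] * len(test))
mse_line = mse(y_test, [intercept + slope * x[i] for i in test])
print(len(train), len(test))  # → 100 100
print(round(mse_baseline, 2), round(mse_line, 2))
```

Repeating the split with different seeds and collecting the resulting MSE values is one way to obtain a distribution of errors like those summarized in box plots; whether the paper used this exact resampling scheme is not stated here.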