Yiran Guo, Zhi Wei, Brendan J Keating, Hakon Hakonarson.
Abstract
BACKGROUND: Anorexia nervosa (AN) is a complex psychiatric disease with a moderate to strong genetic contribution. In addition to conventional genome-wide association (GWA) studies, researchers have been using machine learning methods in conjunction with genomic data to predict risk of diseases in which genetics plays an important role.
Year: 2016 PMID: 26792494 PMCID: PMC4721143 DOI: 10.1186/s12920-016-0165-x
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Sample sizes of participating studies
| Country | Cases | Controls |
|---|---|---|
| Canada | 54 | – |
| Czech Republic | 72 | – |
| Finland | 131 | 404 |
| France | 293 | – |
| Germany | 475 | – |
| Greece | 70 | – |
| Italy-North | 203 | – |
| Italy-South | 75 | – |
| Netherlands | 348 | – |
| Norway | 82 | – |
| Poland | 175 | – |
| Spain | 186 | – |
| Sweden | 39 | – |
| UK | 213 | – |
| USA | 491 | – |
| USA-CHOP | 1033 | 8862 |
| Total | 3940 | 9266 |
Fig. 1 Logistic regression model with ten-fold cross-validation. Using the L1 penalty (the lasso), we further removed irrelevant SNPs in fold2 after the preselection step in fold1. Smaller values of lambda (the penalty parameter) correspond to fewer SNPs removed; the numbers along the top of the plot indicate how many SNPs survived at each lambda, which is shown on the X-axis (natural logarithm scale). We estimated the mean and standard error (SE) of the AUC at each of 100 different lambda values and reported the largest lambda whose AUC is within 1 SE of the optimum (the left vertical dashed line marks the lambda with the maximum AUC; the right vertical dashed line marks the largest lambda whose AUC is within 1 SE of that maximum). The optimal 10-fold cross-validated AUC on the fold2 data was 0.673, with a regularization parameter lambda of 0.00954.
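The one-standard-error selection described in the Fig. 1 caption can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function name `one_se_lambda`, the input layout (one list of per-fold AUCs per lambda), and the toy numbers are all assumptions made for the example.

```python
import math


def one_se_lambda(lambdas, fold_aucs):
    """Select lambda by the one-standard-error rule (hypothetical helper).

    lambdas   : list of penalty values, one per candidate model
    fold_aucs : list of lists; fold_aucs[i] holds the per-fold AUCs for lambdas[i]
    Returns (lambda with maximum mean AUC, largest lambda within 1 SE of it).
    """
    means, ses = [], []
    for aucs in fold_aucs:
        k = len(aucs)
        mean = sum(aucs) / k
        var = sum((a - mean) ** 2 for a in aucs) / (k - 1)  # sample variance
        means.append(mean)
        ses.append(math.sqrt(var / k))  # standard error of the mean AUC

    best = max(range(len(lambdas)), key=lambda i: means[i])
    threshold = means[best] - ses[best]  # 1 SE below the maximum mean AUC

    # A larger lambda means a stronger penalty and fewer surviving SNPs,
    # so the rule keeps the sparsest model that is still within 1 SE.
    eligible = [lam for lam, m in zip(lambdas, means) if m >= threshold]
    return lambdas[best], max(eligible)


# Toy values for illustration only (three folds, three lambdas).
toy_lambdas = [0.001, 0.01, 0.1]
toy_fold_aucs = [
    [0.64, 0.70, 0.67],
    [0.63, 0.69, 0.66],
    [0.55, 0.56, 0.54],
]
lam_max, lam_1se = one_se_lambda(toy_lambdas, toy_fold_aucs)
```

In this toy input the middle lambda has a slightly lower mean AUC than the smallest one but still falls within 1 SE of the maximum, so it is the value the rule returns.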
Fig. 2 Relationship between reduced fold2 sample size (from 10 % to 90 % of the original) and AUC in fold3. 10 % of fold2 corresponds to 129 cases and 312 controls. Error bars show one standard deviation across 10 reruns. The blue horizontal dashed line indicates the AUC in fold3 when 100 % of the fold2 data were used to train the LR model.
Fig. 3 Relationship between a larger training dataset and AUC in the model testing dataset, when samples are moved from fold3 to fold2. 10 % of fold2 corresponds to 129 cases and 312 controls. Error bars show one standard deviation across 10 reruns. The blue horizontal dashed line indicates the AUC in fold3 when 100 % of the fold2 data were used to train the LR model.
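The resampling behind Figs. 2 and 3 (draw a fraction of the training fold, repeat 10 times, report mean and standard deviation of the AUC) might look roughly like the sketch below. It is an assumption-laden illustration: the helper names are hypothetical, the fold2 totals of 1290 cases and 3120 controls are chosen only so that 10 % matches the 129 cases and 312 controls stated in the captions, the draw is assumed to be stratified by case status, and the sample (n-1) standard deviation is assumed.

```python
import random
import statistics


def stratified_subsample(labels, fraction, rng):
    """Return indices of a random subsample that keeps the case/control ratio.

    labels   : list of 0/1 values (1 = case, 0 = control)
    fraction : proportion of each class to keep, e.g. 0.10
    rng      : a random.Random instance, so that reruns are reproducible
    """
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    n_cases = round(fraction * len(cases))
    n_controls = round(fraction * len(controls))
    return rng.sample(cases, n_cases) + rng.sample(controls, n_controls)


def summarize_reruns(aucs):
    """Mean and sample standard deviation of AUCs from repeated reruns."""
    return statistics.mean(aucs), statistics.stdev(aucs)


# Hypothetical fold2 composition (assumed so that 10 % gives 129 / 312).
fold2_labels = [1] * 1290 + [0] * 3120

# Ten reruns at 10 % of fold2, each with its own seed.
subsamples = [
    stratified_subsample(fold2_labels, 0.10, random.Random(seed))
    for seed in range(10)
]
n_cases_first = sum(fold2_labels[i] for i in subsamples[0])
n_controls_first = len(subsamples[0]) - n_cases_first
```

Each index list in `subsamples` would then be used to train one model, and the ten resulting test AUCs passed to `summarize_reruns` to obtain the point and error bar for that training fraction.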