Bin Han, Huashuai Chen, Yao Yao, Xiaomin Liu, Chao Nie, Junxia Min, Yi Zeng, Michael W Lutz.
Abstract
In this study, we split 2156 individuals from the Chinese Longitudinal Healthy Longevity Survey (CLHLS) into two groups, defining a phenotype of exceptional longevity with normal cognition versus cognitive impairment. We conducted a genome-wide association study (GWAS) to identify genetic variants and biological pathways significantly associated with cognitive impairment, and used these results to construct polygenic risk scores. Using several machine learning models, we identified the most important and robust predictors of the phenotype, both genetic and non-genetic. The GWAS identified 28 significant SNPs at the p-value [Formula: see text] significance level, and we pinpointed four genes (ESR1, PHB, RYR3, GRIK2) that are associated with the phenotype through immunological systems, brain function, metabolic pathways, inflammation and diet in the CLHLS cohort. Using both genetic and non-genetic factors, four machine learning models achieved similar predictive performance for the phenotype, measured by the Area Under the Curve (AUC): random forest (0.782), XGBoost (0.781), support vector machine with a linear kernel (0.780), and [Formula: see text] penalized logistic regression (0.780). The four most important features shared by all four models are the polygenic risk score, sex, age, and education.
Year: 2020 PMID: 33154391 PMCID: PMC7645680 DOI: 10.1038/s41598-020-75446-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary statistics of non-genetic factors.
| Feature | Male (N = 533) | Female (N = 1623) | Total (N = 2156) |
|---|---|---|---|
| Age (years), mean (SD) | 100.0 (3.55) | 101.7 (3.54) | 101.3 (3.62) |
| Education (years), mean (SD) | 2.9 (3.9) | 0.3 (1.3) | 0.9 (2.49) |
| Occupation: white-collar | 64 (0.12) | 16 (0.01) | 80 (0.04) |
| Occupation: other | 469 (0.88) | 1607 (0.99) | 2076 (0.96) |
| Marital status: single | 450 (0.84) | 1598 (0.98) | 2048 (0.95) |
| Marital status: partnered | 83 (0.16) | 25 (0.02) | 108 (0.05) |
| Staple food: corn | 13 (0.02) | 54 (0.03) | 67 (0.03) |
| Staple food: rice | 295 (0.55) | 859 (0.53) | 1154 (0.53) |
| Staple food: wheat | 129 (0.24) | 467 (0.28) | 596 (0.28) |
| Staple food: other | 96 (0.18) | 243 (0.15) | 339 (0.16) |
| Co-residence | 449 (0.85) | 1370 (0.84) | 1819 (0.84) |
| Fruit intake | 73 (0.14) | 188 (0.12) | 261 (0.12) |
| Vegetables intake | 433 (0.81) | 1324 (0.82) | 1757 (0.81) |
| Current smoker | 93 (0.17) | 78 (0.05) | 171 (0.08) |
| Former smoker | 221 (0.41) | 159 (0.10) | 380 (0.18) |
| Current drinker | 103 (0.19) | 133 (0.08) | 236 (0.11) |
| Former drinker | 205 (0.38) | 211 (0.13) | 416 (0.19) |
| Exercise currently | 141 (0.26) | 185 (0.11) | 326 (0.15) |
| Hypertension | 88 (0.17) | 286 (0.18) | 374 (0.17) |
| Diabetes | 6 (0.01) | 12 (0.01) | 18 (0.01) |
| Heart | 53 (0.1) | 131 (0.08) | 184 (0.09) |
| Cardiovascular disease | 29 (0.05) | 79 (0.05) | 108 (0.05) |
| Respiratory | 77 (0.14) | 140 (0.09) | 217 (0.10) |
Data are provided as count (proportion) unless otherwise specified in the Feature column. The features from “Co-residence” to “Respiratory” are binary (Yes/No); their counts give the number of individuals who answered Yes.
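A minimal sketch of how one row of the summary table could be produced. It assumes, as a hypothetical convention not stated in the source, that missing responses (`None`) are dropped before computing the mean, SD, or fraction; `mean_sd` and `count_prop` are illustrative helper names.

```python
from statistics import mean, stdev

def mean_sd(values):
    """Format 'mean (SD)' over non-missing values, using the sample standard deviation."""
    present = [v for v in values if v is not None]  # assumption: missing values are excluded
    return f"{mean(present):.1f} ({stdev(present):.2f})"

def count_prop(flags):
    """Format 'count (fraction)' of Yes answers among non-missing Yes/No responses."""
    present = [f for f in flags if f is not None]  # assumption: missing values are excluded
    yes = sum(1 for f in present if f)
    return f"{yes} ({yes / len(present):.2f})"

# Reproduces the male "Marital status: single" cell: 450 of 533.
print(count_prop([True] * 450 + [False] * 83))  # 450 (0.84)
```

Note that the source does not say which denominator was used, so a few cells computed this way may differ from the published rounding.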
Study sample stratified by age groups and sex, conditioned on cognitive status.
| Age group (years) | Cognitively impaired, female | Cognitively impaired, male | Cognitively normal, female | Cognitively normal, male |
|---|---|---|---|---|
| 90–95 | 9 | 6 | 12 | 21 |
| 95–100 | 191 | 78 | 156 | 122 |
| 100–105 | 615 | 115 | 316 | 142 |
| 105–110 | 189 | 26 | 95 | 18 |
| 110+ | 26 | 2 | 14 | 3 |
Age groups are open on the left and closed on the right; for example, 90–95 includes ages above 90 up to and including 95.
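The left-open, right-closed convention can be made concrete with a small binning function. This is an illustrative sketch: `age_group` and `EDGES` are hypothetical names, and ages at or below 90 are treated as out of range because the table has no such group. The second half sums the stratified counts and confirms they match the sex totals in the summary table (533 men, 1623 women).

```python
from bisect import bisect_left

# Bin edges from the stratification table: (90, 95], (95, 100], (100, 105], (105, 110], then 110+.
EDGES = [90, 95, 100, 105, 110]

def age_group(age):
    """Label an age with its left-open, right-closed age group."""
    i = bisect_left(EDGES, age)  # first edge >= age, so an age exactly on an edge goes to the lower bin
    if i == 0:
        raise ValueError(f"age {age} is not above the lowest bound {EDGES[0]}")
    if i == len(EDGES):
        return f"{EDGES[-1]}+"
    return f"{EDGES[i - 1]}–{EDGES[i]}"

# Counts per group: (impaired female, impaired male, normal female, normal male).
COUNTS = {
    "90–95": (9, 6, 12, 21),
    "95–100": (191, 78, 156, 122),
    "100–105": (615, 115, 316, 142),
    "105–110": (189, 26, 95, 18),
    "110+": (26, 2, 14, 3),
}
female = sum(c[0] + c[2] for c in COUNTS.values())
male = sum(c[1] + c[3] for c in COUNTS.values())
print(female, male, female + male)  # 1623 533 2156
```

Using `bisect_left` rather than `bisect_right` is what makes each bin closed on the right: an age of exactly 95 lands in 90–95, not 95–100.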
Information on the significant SNPs from the GWAS using [Formula: see text] as the p-value threshold.
| SNP | Chr. | Position | Nearest gene | A1 | A2 | MAF | Odds ratio | Lower 95% CI | Upper 95% CI |
|---|---|---|---|---|---|---|---|---|---|
| rs13198061 | 6 | 152,306,894 | ESR1* | T | C | 0.051 | 0.49 | 0.37 | 0.66 |
| rs939432 | 15 | 33,986,294 | RYR3* | C | A | 0.274 | 0.71 | 0.61 | 0.82 |
| rs954303 | 16 | 59,581,776 | RNU4-58P (7606) | A | G | 0.155 | 0.66 | 0.55 | 0.78 |
| rs56368572 | 5 | 11,300,912 | CTNND2* | T | C | 0.094 | 0.62 | 0.50 | 0.77 |
| rs4816332 | 21 | 30,201,706 | N6AMT1 (42807) | C | T | 0.400 | 0.74 | 0.65 | 0.85 |
| rs1030695 | 4 | 130,318,150 | RP11-419L4.1 (91973) | T | A | 0.299 | 0.73 | 0.64 | 0.84 |
| rs1293144 | 20 | 52,917,208 | PFDN4 (72617) | T | G | 0.371 | 0.75 | 0.66 | 0.85 |
| rs62001981 | 15 | 25,279,909 | RP11-701H24.10* & PWAR6* | T | C | 0.196 | 0.70 | 0.60 | 0.82 |
| rs9404070 | 6 | 101,463,320 | GRIK2 (383344) | G | A | 0.415 | 0.76 | 0.67 | 0.86 |
| rs76299633 | 13 | 40,727,639 | LINC00332 (28307) | G | A | 0.119 | 0.66 | 0.55 | 0.80 |
| rs9676032 | 18 | 48,297,450 | MRO (27124) | T | A | 0.131 | 0.67 | 0.56 | 0.81 |
| rs28673399 | 4 | 71,371,765 | AMTN (12492) | G | A | 0.448 | 0.76 | 0.67 | 0.86 |
| rs10500293 | 19 | 46,431,638 | NOVA2 (5354) | G | A | 0.440 | 0.77 | 0.68 | 0.87 |
| rs72627042 | 3 | 23,906,287 | UBE2E1* | T | C | 0.058 | 0.55 | 0.41 | 0.72 |
| rs13028996 | 2 | 234,246,225 | SAG* | C | T | 0.466 | 1.38 | 1.21 | 1.57 |
| rs6726046 | 2 | 234,287,221 | DGKD* & AC019221.4* | A | G | 0.375 | 1.37 | 1.21 | 1.56 |
| rs935129 | 17 | 47,486,016 | RP11-81K2.1* & PHB* | A | G | 0.387 | 1.35 | 1.18 | 1.53 |
| rs2792251 | 1 | 164,541,977 | PBX1* | G | A | 0.136 | 1.53 | 1.27 | 1.85 |
| rs6547617 | 2 | 85,655,402 | SH2D6* | A | T | 0.406 | 1.35 | 1.18 | 1.55 |
| rs10037430 | 5 | 180,569,007 | OR2V2 (12936) | C | T | 0.069 | 1.93 | 1.44 | 2.59 |
| rs7710849 | 5 | 82,220,225 | RP11-78C3.1 (3287) | T | A | 0.052 | 2.04 | 1.48 | 2.80 |
| rs79669991 | 22 | 43,936,861 | - | A | G | 0.253 | 1.39 | 1.20 | 1.61 |
| rs2418761 | 10 | 107,295,345 | RNU6-463P (822) | C | T | 0.110 | 1.57 | 1.28 | 1.93 |
| rs7927292 | 11 | 44,730,158 | RP11-45A12.2 (10784) | A | C | 0.082 | 1.71 | 1.34 | 2.18 |
| rs741171 | 16 | 6,652,854 | RP11-420N3.2* & RBFOX1* | G | A | 0.206 | 1.41 | 1.21 | 1.66 |
| rs57164734 | 11 | 44,773,258 | TSPAN18* | G | C | 0.089 | 1.66 | 1.31 | 2.11 |
| rs2528812 | 7 | 22,446,110 | STEAP1B (12953) | C | T | 0.420 | 1.32 | 1.16 | 1.50 |
| rs4934715 | 10 | 35,364,992 | CUL2* | T | G | 0.306 | 1.35 | 1.17 | 1.55 |
The nearest gene is either the gene that contains the variant (overlapping, marked with *) or the nearest upstream or downstream gene, in which case the distance to that gene in base pairs (bp) is given in parentheses.
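The nearest-gene rule in the note above can be sketched as follows. The gene coordinates here are hypothetical, and measuring distance to the nearer gene boundary (rather than, say, to the transcription start site) is an assumption; the source does not state which reference annotation or distance definition was used. Overlapping genes are joined with " & " to mirror entries such as "DGKD* & AC019221.4*".

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # gene start in bp (assumed inclusive)
    end: int    # gene end in bp (assumed inclusive)

def annotate(chrom: str, pos: int, genes: List[Gene]) -> Optional[str]:
    """Label a variant: overlapping genes get '*', otherwise 'NAME (distance in bp)'."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    overlapping = [g.name for g in same_chrom if g.start <= pos <= g.end]
    if overlapping:
        return " & ".join(f"{name}*" for name in overlapping)
    if not same_chrom:
        return None  # no gene on this chromosome in the supplied annotation

    def distance(g: Gene) -> int:
        # bp from the variant to the nearer boundary of a non-overlapping gene
        return g.start - pos if pos < g.start else pos - g.end

    nearest = min(same_chrom, key=distance)
    return f"{nearest.name} ({distance(nearest)})"

# Hypothetical genes for illustration only.
GENES = [
    Gene("GENE_A", "6", 1_000, 2_000),
    Gene("GENE_B", "6", 5_000, 6_000),
    Gene("GENE_C", "6", 1_800, 2_500),
]
print(annotate("6", 1_900, GENES))  # GENE_A* & GENE_C*
print(annotate("6", 4_500, GENES))  # GENE_B (500)
```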
Figure 1. Average AUC from five-fold cross-validation using the PRS to predict cognitive impairment. The p-value threshold increases from panel (a) to panel (b) to panel (c).
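A hedged sketch of the quantities behind Figure 1. It assumes a standard thresholded PRS (effect-allele dosage weighted by log odds ratio, summed over SNPs whose GWAS p-value falls below the threshold) and a rank-based AUC averaged over stratified folds. The source does not spell out its exact PRS construction or fold scheme, so every function here is illustrative, and the p-values in the usage lines are hypothetical.

```python
import math
import random

def polygenic_risk_score(dosages, log_odds, p_values, threshold):
    """Sum of effect-allele dosages (0, 1 or 2) weighted by log(OR), over SNPs with p < threshold."""
    return sum(d * b for d, b, p in zip(dosages, log_odds, p_values) if p < threshold)

def auc(scores, labels):
    """Rank-based AUC: chance a random case (label 1) outscores a random control (label 0); ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > ctl else 0.5 if c == ctl else 0.0 for c in cases for ctl in controls)
    return wins / (len(cases) * len(controls))

def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, dealing cases and controls round-robin so each fold gets both."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def mean_cv_auc(scores, labels, k=5, seed=0):
    """Average held-out AUC over k folds; assumes each class has at least k members."""
    fold_aucs = [auc([scores[i] for i in f], [labels[i] for i in f])
                 for f in stratified_folds(labels, k, seed)]
    return sum(fold_aucs) / k

# ORs of rs13198061 and rs2792251 from the SNP table; the p-values are hypothetical.
weights = [math.log(0.49), math.log(1.53)]
p_vals = [1e-6, 1e-6]
score = polygenic_risk_score([0, 2], weights, p_vals, threshold=1e-5)  # equals 2 * log(1.53)
```

One design point worth noting: in a leakage-free evaluation, the GWAS that supplies the weights and p-values would be re-run on the training folds only, so that held-out individuals never influence their own score.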
Performance and most important features of the four best predictive models.
| Model | Performance (AUC) | Top six features (in descending order of importance) |
|---|---|---|
| Penalized logistic regression | 0.780 | PRS, education, age, sex, vegetables intake, former smoker |
| SVM (linear kernel) | 0.780 | PRS, education, age, sex, co-residence, vegetables intake |
| Random forest | 0.782 | PRS, age, education, staple food, sex, exercise currently |
| XGBoost | 0.781 | PRS, education, staple food, sex, age, exercise currently |
Model performance is nearly identical across the four models, and four important features are shared by all of them: PRS, education, age, and sex.
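The shared features can be checked mechanically from the table by intersecting the top-six lists. The first key is the penalized logistic regression model, per the abstract. Ordering the shared set by mean rank is an illustrative aggregation of my own, not the authors' method.

```python
TOP_SIX = {
    "Penalized logistic regression": ["PRS", "education", "age", "sex", "vegetables intake", "former smoker"],
    "SVM (linear kernel)": ["PRS", "education", "age", "sex", "co-residence", "vegetables intake"],
    "Random forest": ["PRS", "age", "education", "staple food", "sex", "exercise currently"],
    "XGBoost": ["PRS", "education", "staple food", "sex", "age", "exercise currently"],
}

def shared_features(rankings):
    """Features that appear in every model's top list."""
    return set.intersection(*(set(r) for r in rankings.values()))

def by_mean_rank(features, rankings):
    """Order features by their average 1-based position across models (lower = more important)."""
    def mean_rank(f):
        return sum(r.index(f) + 1 for r in rankings.values()) / len(rankings)
    return sorted(features, key=mean_rank)

shared = shared_features(TOP_SIX)
print(by_mean_rank(shared, TOP_SIX))  # ['PRS', 'education', 'age', 'sex']
```

The mean ranks are 1, 2.25, 3.25 and 4.25, so the order is unambiguous. Note that a plain intersection ignores how far apart the ranks are, for example age ranks second in random forest but fifth in XGBoost.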
Results of the gene-to-function pathway analysis.
| Category | GeneSet | N Genes | N Overlap | P | adjP | Genes | Link |
|---|---|---|---|---|---|---|---|
| Immunologic signatures | GSE12392_WT_VS_IFNB_KO_CD8A_POS_SPLEEN_DC_DN | 200 | 4 | | 0.022 | RYR3:PHB:GRIK2:ESR1 | Link 1 |
| Transcription factor targets | AACTTT_UNKNOWN | 1928 | 7 | | 0.037 | PBX1:RYR3:RBFOX1:DGKD:UBE2E1:CTNND2:ESR1 | Link 2 |
Link 1: http://www.gsea-msigdb.org/gsea/msigdb/cards/GSE12392_WT_VS_IFNB_KO_CD8A_POS_SPLEEN_DC_DN
Link 2: http://www.gsea-msigdb.org/gsea/msigdb/cards/AACTTT_UNKNOWN
“adjP” denotes the p-value adjusted for multiple comparisons.
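Gene-set P and adjP values like those in the table are commonly computed with a one-sided hypergeometric test followed by Benjamini–Hochberg adjustment. The source does not state which test or correction was used, so this pairing is an assumption. The background size (20,000 genes) and input-list size (14 genes) in the usage line are hypothetical, not values from the study.

```python
from math import comb

def hypergeom_sf(overlap, set_size, n_input, n_background):
    """P(X >= overlap): chance that n_input genes drawn from n_background hit the set at least `overlap` times."""
    total = comb(n_background, n_input)
    upper = min(set_size, n_input)
    hits = sum(comb(set_size, x) * comb(n_background - set_size, n_input - x)
               for x in range(overlap, upper + 1))
    return hits / total

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down, keeping adjusted values monotone
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical sizes: a 200-gene set, 4 overlapping genes, a 14-gene input list, 20,000 background genes.
p_raw = hypergeom_sf(overlap=4, set_size=200, n_input=14, n_background=20_000)
```

In practice the adjustment is applied across all gene sets tested, which is why adjP can be much larger than P for the same gene set.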
Figure 2. Composition of the study sample. The numbers in parentheses are the sample sizes of the individual survey cohorts.
Counts of missing values for non-genetic variables.
| Variable | Missing values, n (%) |
|---|---|
| Education | 10 (0.44%) |
| Occupation | 126 (5.61%) |
| Marital status | 11 (0.49%) |
| Co-residence | 9 (0.40%) |
| Staple food | 1 (0.04%) |
| Fruit intake | 4 (0.18%) |
| Vegetables intake | 5 (0.22%) |
| Current smoker | 4 (0.18%) |
| Former smoker | 1 (0.04%) |
| Current drinker | 10 (0.45%) |
| Former drinker | 2 (0.09%) |
| Exercise currently | 19 (0.85%) |
| Hypertension | 76 (3.39%) |
| Diabetes | 87 (3.88%) |
| Heart | 71 (3.17%) |
| Cardiovascular disease | 75 (3.34%) |
| Respiratory | 63 (2.81%) |
Variables that do not have missing values are not listed.
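A small sketch of how such a missing-value table can be tabulated, assuming records are dictionaries where a missing answer is either `None` or an absent key. `missing_report` and the toy records are hypothetical. The percentage here uses the number of records passed in as its denominator. The source does not state the denominator behind its own percentages.

```python
def missing_report(records, variables):
    """Return {variable: 'n (pct%)'} for variables with at least one missing value."""
    n = len(records)
    report = {}
    for var in variables:
        k = sum(1 for r in records if r.get(var) is None)  # absent key also counts as missing
        if k:  # variables with no missing values are omitted, as in the table
            report[var] = f"{k} ({100 * k / n:.2f}%)"
    return report

# Hypothetical toy records for illustration.
records = [
    {"Age": 101, "Education": 3, "Diabetes": None},
    {"Age": 99, "Education": None, "Diabetes": True},
    {"Age": 104, "Education": 0, "Diabetes": False},
    {"Age": 100, "Education": 1, "Diabetes": False},
]
print(missing_report(records, ["Age", "Education", "Diabetes"]))
# {'Education': '1 (25.00%)', 'Diabetes': '1 (25.00%)'}
```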