Sebastian Okser, Terho Lehtimäki, Laura L Elo, Nina Mononen, Nina Peltonen, Mika Kähönen, Markus Juonala, Yue-Mei Fan, Jussi A Hernesniemi, Tomi Laitinen, Leo-Pekka Lyytikäinen, Riikka Rontu, Carita Eklund, Nina Hutri-Kähönen, Leena Taittonen, Mikko Hurme, Jorma S A Viikari, Olli T Raitakari, Tero Aittokallio.
Abstract
The relative contribution of genetic risk factors to the progression of subclinical atherosclerosis is poorly understood. It is likely that multiple variants are implicated in the development of atherosclerosis, but the subtle genotypic and phenotypic differences are beyond the reach of the conventional case-control designs and the statistical significance testing procedures used in most association studies. Our objective was to investigate whether an alternative approach, in which common disorders are treated as quantitative phenotypes that are continuously distributed over a population, can reveal predictive insights into early atherosclerosis, as assessed using ultrasound imaging-based quantitative measurement of carotid artery intima-media thickness (IMT). Using our population-based follow-up study of atherosclerosis precursors as a basis for sampling subjects with gradually increasing IMT levels, we searched for the subsets of genetic variants and their interactions that are most predictive of the various risk classes, rather than using exclusively those variants meeting a stringent level of statistical significance. The area under the receiver operating characteristic curve (AUC) was used to evaluate the predictive value of the variants, and cross-validation was used to assess how well the predictive models generalize to other subsets of subjects. By means of our predictive modeling framework with machine learning-based SNP selection, we could improve the prediction of the extreme classes of atherosclerosis risk and progression over a 6-year period (average AUC 0.844 and 0.761), compared with using conventional cardiovascular risk factors alone (average AUC 0.741 and 0.629) or combined with the statistically significant variants (average AUC 0.762 and 0.651). The predictive accuracy remained relatively high in an independent validation set of subjects (average decrease of 0.043).
These results demonstrate that the modeling framework can utilize the "gray zone" of genetic variation in the classification of subjects with different degrees of risk of developing atherosclerosis.
Year: 2010 PMID: 20941391 PMCID: PMC2947986 DOI: 10.1371/journal.pgen.1001146
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
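The abstract scores models by AUC and uses cross-validation to estimate how well they generalize. The sketch below illustrates both ideas in plain Python. It is not the authors' implementation: the function names, the interleaved fold assignment, and the `fit_predict` callback (any routine that trains on one subset and returns scores for another) are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random high-risk subject (label 1) receives a higher
    score than a random low-risk subject (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_splits(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists; fold i holds out every k-th subject."""
    splits = []
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        splits.append((train, test))
    return splits


def cross_validated_auc(
    features: Sequence,
    labels: Sequence[int],
    fit_predict: Callable[[list, list, list], Sequence[float]],
    k: int = 5,
) -> float:
    """Average AUC over the held-out folds.

    fit_predict(train_x, train_y, test_x) is a hypothetical callback that
    trains a model and returns one score per held-out subject.
    """
    fold_aucs = []
    for train, test in k_fold_splits(len(labels), k):
        scores = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in test],
        )
        fold_aucs.append(auc(scores, [labels[i] for i in test]))
    return sum(fold_aucs) / len(fold_aucs)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two risk classes. Each held-out fold must contain subjects from both classes, otherwise `auc` raises an error.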
Figure 1. Distributions of intima-media thickness (IMT) of the study subjects.
(A) IMT levels in the baseline and follow-up studies in 2001 and 2007, respectively. (B) IMT changes from 2001 to 2007. The age-stratified distributions depict the baseline age groups of 24–30 and 33–39 years (Younger and Older subjects), as well as their combined distribution (All subjects). The vertical lines indicate the representative 15% and 85% quantile points (q) that divide the subjects into two risk groups: the low-risk class (subjects with the lowest q% of IMT levels or changes) and the high-risk class (subjects with the highest q% of IMT levels or changes).
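The quantile-based split described in this caption can be sketched as follows. This is a minimal illustration, assuming a linear-interpolation quantile and treating subjects between the two cut-offs as unlabeled; the names `quantile` and `risk_classes` are hypothetical, and the source does not state which quantile definition was used.

```python
from typing import List, Optional, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile for q in [0, 1] using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def risk_classes(imt: Sequence[float], q: float = 0.15) -> List[Optional[int]]:
    """Label the lowest q fraction as low risk (0), the highest q fraction as
    high risk (1), and everything in between as None (excluded)."""
    low_cut = quantile(imt, q)
    high_cut = quantile(imt, 1.0 - q)
    labels: List[Optional[int]] = []
    for value in imt:
        if value <= low_cut:
            labels.append(0)
        elif value >= high_cut:
            labels.append(1)
        else:
            labels.append(None)
    return labels
```

Raising `q` from 0.05 toward 0.25 widens both classes and brings the low-risk and high-risk subjects closer together in IMT, which is the axis varied in Figures 2 and 4.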
The baseline characteristics in 2001 along with their correlations with the 2007 level and progression of intima-media thickness (IMT).
| Conventional risk factor* | Mean (SD) | IMT 2001, r† | IMT 2001, p‡ | IMT 2007, r | IMT 2007, p | IMT progression, r | IMT progression, p |
| Sex (% women) | 55.3 | 0.132 | <0.001 | 0.195 | <0.001 | 0.086 | NS |
| Age in 2001 (years) | 31.7 (4.92) | 0.290 | <0.001 | 0.301 | <0.001 | 0.041 | NS |
| BMI (kg/m²) | 25.2 (4.38) | 0.152 | <0.001 | 0.188 | <0.001 | 0.094 | NS |
| Waist circumference (cm) | 84.0 (12.0) | 0.189 | <0.001 | 0.260 | <0.001 | 0.133 | 0.006 |
| Systolic blood pressure (mmHg) | 117 (13.2) | 0.180 | <0.001 | 0.158 | <0.001 | 0.044 | NS |
| Diastolic blood pressure (mmHg) | 70.6 (10.5) | 0.220 | <0.001 | 0.160 | <0.001 | −0.020 | NS |
| Total cholesterol (mmol/L) | 5.17 (0.99) | 0.113 | 0.011 | 0.155 | <0.001 | 0.082 | NS |
| LDL cholesterol (mmol/L) | 3.28 (0.86) | 0.126 | 0.002 | 0.166 | <0.001 | 0.087 | NS |
| HDL cholesterol (mmol/L) | 1.29 (0.32) | −0.037 | NS | −0.107 | NS | −0.089 | NS |
| Triglycerides (mmol/L) | 1.35 (0.86) | 0.047 | NS | 0.131 | 0.007 | 0.099 | NS |
| ApoA1 (g/L) | 1.49 (0.26) | −0.052 | NS | −0.085 | NS | −0.039 | NS |
| ApoB (g/L) | 1.06 (0.27) | 0.110 | 0.016 | 0.195 | <0.001 | 0.138 | 0.003 |
| Smoking (% subjects) | 22.8 | 0.049 | NS | 0.007 | NS | −0.011 | NS |
*The characteristics in 2001 were used as potential confounding risk factors in predictive models.
†Pearson correlation coefficient (r-value) was calculated using the risk factors collected in 2001.
‡Statistical significance (Bonferroni corrected p-value) is from the t-distribution with n-2 df (n = 1,027 in 2001 and n = 813 in 2007); NS, non-significant.
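The footnotes describe a Pearson correlation tested against a t-distribution with n-2 degrees of freedom, followed by Bonferroni correction. A rough sketch of that arithmetic is given below. Two assumptions are made: the p-value uses a normal approximation to the t-distribution, which is reasonable only because n is large here (813 to 1,027), and the number of tests passed to `bonferroni` is left as a parameter because the source does not state it.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r: float, n: int) -> float:
    """Test statistic for H0: r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def two_sided_p_normal_approx(t: float) -> float:
    """Two-sided p-value from the standard normal distribution.
    ASSUMPTION: used in place of the exact t-distribution; adequate for large n only."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value, truncated to 1."""
    return min(1.0, p * n_tests)
```

For example, the age correlation reported for IMT 2001 (r = 0.290, n = 1,027) gives a t-statistic of roughly 9.7, which remains far below 0.001 after correction for any plausible number of tests.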
Figure 2. Prediction accuracy as a function of increasing risk classes.
The accuracy was defined using the area under the receiver operating characteristic curve (AUC), and the risk classes using the quantile points (5–25%). (A) Prediction of the baseline IMT risk classes in 2001 when using the conventional risk factors either alone, or when combined with the panel of 17 SNPs associated in previous studies with cardiovascular morbidity (Established SNPs), with those SNPs that are significantly associated with the low- and high-risk classes (Significant SNPs), or with the most predictive SNPs identified using the machine learning-based approach (Predictive SNPs). (B) Prediction of the follow-up IMT risk classes in 2007 using the baseline conventional and genetic risk factors measured in 2001. (C) Prediction of the IMT progression risk classes when using the baseline conventional and genetic risk factors measured in 2001 (the same as in (A,B)).
The single nucleotide polymorphisms (SNPs) predictive of the subjects with 15% lowest and highest IMT levels in 2001.
| SNP ID | Gene symbol (HGNC name) | SNP location (Chr region) | Significance | Predictive power |
| rs2073658 | USF1 | 1q23.3 | 0.70 | 11.8 |
| rs1205 | CRP | 1q23.2 | 0.02 | 10.6 |
| rs805305 | DDAH2 | 6p21.33 | 0.38 | 9.68 |
| | ABCA1 | 9q31.1 | 0.81 | 7.53 |
| rs6929137 | C6orf97 | 6q25.1 | 0.10 | 7.53 |
| rs4073307 | IGSF1 | Xq26.1 | 0.71 | 6.45 |
| | APOB | 2p24.1 | 0.53 | 6.45 |
| rs3130340 | INTERGENIC | 6p21.32 | 0.11 | 6.45 |
| rs599839 | PSRC1 | 1p13.3 | 0.10 | 6.45 |
| | INTERGENIC | 2p24.1 | 1.00 | 5.38 |
| rs1143634 | IL1B | 2q13 | 0.51 | 5.38 |
| rs4404254 | ICOS | 2q33.2 | 0.16 | 4.30 |
| rs2548861 | WWOX | 16q23.1 | 0.14 | 4.30 |
| rs2553268 | WRN | 8p12 | 0.15 | 3.23 |
| rs4937100 | IL18 | 11q23.1 | 0.22 | 2.15 |
| rs2516839 | USF1 | 1q23.3 | 0.13 | 2.15 |
*The SNPs identified also in the previous case-control association studies [9]–[21] are boldfaced.
†The corrected p-values larger than one were truncated to unity.
‡The SNPs are arranged according to their contribution to the overall prediction accuracy (AUC).
The single nucleotide polymorphisms (SNPs) predictive of the subjects with 15% lowest and highest IMT levels in 2007.
| SNP ID | Gene symbol (HGNC name) | SNP location (Chr region) | Significance | Predictive power |
| rs17672135 | FMN2 | 1q43 | 0.41 | 17.5 |
| rs9941339 | CDH13 | 16q24.2-q24.3 | 0.75 | 8.75 |
| rs2548861 | WWOX | 16q23.1 | 0.14 | 8.75 |
| rs9939609 | FTO | 16q12.2 | 0.69 | 7.50 |
| | APOB | 2p24.1 | 0.53 | 7.50 |
| rs17222814 | ALOX5AP | 13q12.3 | 0.89 | 7.50 |
| rs1041981 | LTA | 6p21.33 | 1.00 | 7.50 |
| rs9551963 | ALOX5AP | 13q12.3 | 0.64 | 6.25 |
| rs7524102 | INTERGENIC | 1p36.12 | 0.77 | 5.00 |
| rs2516839 | USF1 | 1q23.3 | 0.13 | 5.00 |
| rs2301880 | WNK1 | 12p13.33 | 1.00 | 5.00 |
| rs7759938 | INTERGENIC | 6q21 | 0.12 | 3.75 |
| rs9479055 | C6orf97 | 6q25.1 | 0.40 | 3.75 |
| rs3130340 | INTERGENIC | 6p21.32 | 0.11 | 3.75 |
| rs2553268 | WRN | 8p12 | 0.15 | 2.50 |
*The SNPs identified also in the previous case-control association studies [9]–[21] are boldfaced.
†The corrected p-values larger than one were truncated to unity.
‡The SNPs are arranged according to their contribution to the overall prediction accuracy (AUC).
The single nucleotide polymorphisms (SNPs) predictive of the subjects with 15% lowest and highest IMT changes from 2001 to 2007.
| SNP ID | Gene symbol (HGNC name) | SNP location (Chr region) | Significance | Predictive power |
| rs2073658 | USF1 | 1q23.3 | 0.70 | 9.40 |
| rs9479055 | C6orf97 | 6q25.1 | 0.40 | 8.55 |
| rs17672135 | FMN2 | 1q43 | 0.41 | 8.55 |
| rs9687339 | MAST4 | 5q12.3 | 0.93 | 7.69 |
| rs1042713 | ADRB2 | 5q33.1 | 0.48 | 7.69 |
| rs2301880 | WNK1 | 12p13.33 | 1.00 | 6.84 |
| rs3130340 | INTERGENIC | 6p21.32 | 0.11 | 6.84 |
| rs2476601 | PTPN22 | 1p13.2 | 0.44 | 5.13 |
| rs11898505 | SPTBN1 | 2p16.2 | 0.27 | 5.13 |
| | LPA | 6q25.3 | 1.00 | 5.13 |
| rs10172036 | ICOS | 2q33.2 | 0.52 | 5.13 |
| rs2820037 | INTERGENIC | 1q43 | 0.66 | 4.27 |
| rs2234693 | ESR1 | 6q25.1 | 0.74 | 3.42 |
| rs1800896 | IL10 | 1q32.1 | 0.71 | 3.42 |
| rs17222814 | ALOX5AP | 13q12.3 | 0.89 | 3.42 |
| rs1801274 | FCGR2A | 1q23.3 | 0.75 | 2.56 |
| rs854560 | PON1 | 7q21.3 | 0.81 | 1.71 |
| rs10246939 | TAS2R38 | 7q34 | 0.80 | 1.71 |
| rs9594738 | INTERGENIC | 13q14.11 | 0.58 | 1.71 |
| rs1799983 | NOS3 | 7q36.1 | 0.06 | 0.855 |
| rs1256049 | ESR2 | 14q23.2 | 0.46 | 0.855 |
*The SNPs identified also in the previous case-control association studies [9]–[21] are boldfaced.
†The corrected p-values larger than one were truncated to unity.
‡The SNPs are arranged according to their contribution to the overall prediction accuracy (AUC).
Figure 3. Candidate interaction partners of a variant in USF1 (rs2516839).
The candidate SNP-SNP interactions were searched among the variants predictive of the extreme IMT progression (see Table S4). The interaction score for a SNP pair (x, y) depicts the combined contribution of the pair to the predictive power, relative to the individual contributions of SNPs x and y. The predictive power was assessed in terms of how much the AUC value changed when the particular SNP or SNP pair was deleted from the subset of variants. The Gene ID was used as a SNP identifier, where available; otherwise, the rs ID was used instead.
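The deletion-based idea in this caption can be illustrated with a short sketch. The exact formula for the interaction score is not recoverable from this record, so the difference form used below (pair contribution minus the sum of the two single contributions) is an assumption, as are the function names and the `evaluate_auc` callback, which stands for any routine that returns an AUC for a given subset of SNPs.

```python
from typing import Callable, FrozenSet, Iterable


def predictive_power(
    evaluate_auc: Callable[[FrozenSet[str]], float],
    snps: Iterable[str],
    removed: Iterable[str],
) -> float:
    """Drop in AUC when the SNPs in `removed` are deleted from the full subset."""
    full = frozenset(snps)
    return evaluate_auc(full) - evaluate_auc(full - frozenset(removed))


def interaction_score(
    evaluate_auc: Callable[[FrozenSet[str]], float],
    snps: Iterable[str],
    x: str,
    y: str,
) -> float:
    """ASSUMED form: contribution of the pair minus the two single contributions.
    A value of zero means the two SNPs contribute additively."""
    snps = frozenset(snps)
    pair = predictive_power(evaluate_auc, snps, {x, y})
    single_x = predictive_power(evaluate_auc, snps, {x})
    single_y = predictive_power(evaluate_auc, snps, {y})
    return pair - (single_x + single_y)
```

Under this assumed form, a score far from zero flags a pair whose joint deletion changes the AUC by more or less than the two single deletions would suggest, which is the kind of candidate interaction the figure depicts.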
Figure 4. Prediction accuracies on independent and randomized subject sets.
The accuracy was defined using the area under the receiver operating characteristic curve (AUC), and the risk classes using the quantile points (5%–25%). The prediction accuracies were evaluated for the baseline IMT risk classes in the independent dataset, in comparison with the cross-validated accuracies obtained in the original dataset using the same IMT thresholds, conventional risk factors and the most predictive SNPs identified with the machine learning-based procedure in the original subject set. The dotted trace shows the effect of deleting those subjects whose IMT level was the same as or close to the quantile cut-off value (<0.02 difference in IMT). The randomized datasets were generated by first dividing the original set of subjects into the low- and high-risk classes at random, independent of their IMT levels, and then repeating the same randomization process 100 times for each of the risk classes. The average AUC level for the various risk classes is reported. None of the 500 randomized datasets produced a prediction accuracy higher than that obtained using the most predictive SNPs identified in the original set of subjects.
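The randomization control described in this caption can be approximated with the sketch below. It is a simplification under stated assumptions: subjects are assigned to the two classes at random and a fixed set of scores is re-evaluated against those random labels, whereas the study's procedure may have rebuilt the models on each randomized dataset. The function names and the fixed seed are illustrative only.

```python
import random
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a label-1 subject outscores a label-0 subject; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def randomized_aucs(
    scores: Sequence[float], n_high: int, n_repeats: int = 100, seed: int = 0
) -> List[float]:
    """AUCs obtained when class membership is assigned at random.
    n_high subjects are drawn as 'high risk' in each repeat, the rest are 'low risk'."""
    rng = random.Random(seed)
    n = len(scores)
    results = []
    for _ in range(n_repeats):
        high = set(rng.sample(range(n), n_high))
        labels = [1 if i in high else 0 for i in range(n)]
        results.append(auc(scores, labels))
    return results


def count_exceeding(null_aucs: Sequence[float], observed: float) -> int:
    """Number of randomized datasets whose AUC is strictly higher than the observed AUC."""
    return sum(1 for a in null_aucs if a > observed)
```

With 100 repeats for each of the five quantile cut-offs (5% to 25%), this yields the 500 randomized datasets mentioned in the caption; a count of zero from `count_exceeding` corresponds to the reported outcome.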