Ana I Vazquez, Gustavo de los Campos, Yann C Klimentidis, Guilherme J M Rosa, Daniel Gianola, Nengjun Yi, David B Allison.
Abstract
Prediction of genetic risk for disease is needed for preventive and personalized medicine. Genome-wide association studies have found unprecedented numbers of variants associated with complex human traits and diseases. However, these variants explain only a small proportion of genetic risk. Mounting evidence suggests that many traits relevant to public health are affected by large numbers of small-effect genes and that prediction of genetic risk for those traits and diseases could be improved by incorporating large numbers of markers into whole-genome prediction (WGP) models. We developed a WGP model incorporating thousands of markers for prediction of skin cancer risk in humans. We also considered other ways of incorporating genetic information into prediction models, such as family history or ancestry (using principal components, PCs, of informative markers). Prediction accuracy was evaluated using the area under the receiver operating characteristic curve (AUC) estimated in a cross-validation. Incorporation of genetic information (i.e., familial relationships, PCs, or WGP) yielded a significant increase in prediction accuracy: from an AUC of 0.53 for a baseline model that accounted for nongenetic covariates to AUCs of 0.58 (pedigree), 0.62 (PCs), and 0.64 (WGP). In summary, prediction of skin cancer risk could be improved by considering genetic information and using a large number of single-nucleotide polymorphisms (SNPs) in a WGP model, which allows for the detection of patterns of genetic risk that are above and beyond those that can be captured using family history. We discuss avenues for improving prediction accuracy and speculate on the possible use of WGP to prospectively identify individuals at high risk.
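The abstract reports accuracy as an AUC estimated in a cross-validation. The following is a minimal sketch of that evaluation idea, not the authors' pipeline: the helper names (`auc`, `kfold_indices`, `cross_validated_auc`, `fit_group_incidence`), the toy group-incidence model, and the simulated data are all assumptions made for illustration.

```python
import random


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_auc(x, y, fit, k=5, seed=0):
    """Mean test-fold AUC; fit(train_x, train_y) must return a scoring function."""
    fold_aucs = []
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        score = fit([x[i] for i in train], [y[i] for i in train])
        fold_aucs.append(auc([score(x[i]) for i in test], [y[i] for i in test]))
    return sum(fold_aucs) / len(fold_aucs)


def fit_group_incidence(train_x, train_y):
    """Toy model (assumption): score a subject by its group's training incidence."""
    totals, cases = {}, {}
    for g, label in zip(train_x, train_y):
        totals[g] = totals.get(g, 0) + 1
        cases[g] = cases.get(g, 0) + label
    overall = sum(train_y) / len(train_y)
    return lambda g: cases[g] / totals[g] if g in totals else overall


# Simulated data (not from the study): two groups with different disease rates.
rng = random.Random(1)
groups = [rng.choice(["A", "B"]) for _ in range(1000)]
outcomes = [1 if rng.random() < (0.30 if g == "A" else 0.10) else 0 for g in groups]
mean_auc = cross_validated_auc(groups, outcomes, fit_group_incidence, k=5)
print(f"mean cross-validated AUC: {mean_auc:.3f}")
```

An AUC of 0.5 corresponds to ranking cases and controls no better than chance, which is why the reported move from 0.53 to 0.64 is read as a gain in discrimination. Scoring only held-out subjects keeps the estimate from being inflated by subjects the model was fitted on.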
Year: 2012 PMID: 23051645 PMCID: PMC3512154 DOI: 10.1534/genetics.112.141705
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562
Figure 1 First 20 eigenvalues derived from ethnicity-informative panel of 1000 SNPs.
Estimated probabilities and 95% credibility region (CR) of developing skin cancer for different levels of the predictor variables, derived from a model including sex, cohort, and the first two principal components of 1000 ethnicity-informative SNPs
| Cohort | Sex | Probability of developing skin cancer (estimate) | 95% CR |
|---|---|---|---|
| Original | Male | 0.242 | [0.213, 0.273] |
| Offspring | Male | 0.153 | [0.138, 0.171] |
| Original | Female | 0.190 | [0.167, 0.215] |
| Offspring | Female | 0.115 | [0.103, 0.129] |
Figure 2 (A) First (x-axis) and second (y-axis) principal component eigenvectors derived from 1000 ethnicity-informative SNPs (red dots correspond to subjects who developed skin cancer, and gray dots correspond to healthy subjects). (B) Empirical distribution of the first principal component, shown separately for subjects with skin cancer and healthy subjects.
Incidence of skin cancer by levels defined using the first and second eigenvectors of the ethnicity SNP-derived principal components
| Group | ||||
|---|---|---|---|---|
| First principal component | 0.178 | 0.150 | 0.126 | 0.098 |
| Second principal component | 0.100 | 0.150 | 0.152 | 0.163 |
Groups are defined by comparing the value of the first or second principal component in subject i with the corresponding quartiles.
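The grouping behind the incidence table above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes the four groups are bounded by the three quartile cut points of a component, the function name `incidence_by_quartile` is hypothetical, and the cut-point convention is simply the default of Python's `statistics.quantiles`.

```python
import bisect
import statistics


def incidence_by_quartile(values, labels):
    """Fraction of cases (label 1) within each quartile group of `values`.

    Group 0 holds values at or below the first quartile, group 3 holds values
    above the third quartile. An empty group is reported as None.
    """
    cuts = statistics.quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    totals = [0, 0, 0, 0]
    cases = [0, 0, 0, 0]
    for v, y in zip(values, labels):
        g = bisect.bisect_left(cuts, v)  # number of cut points strictly below v
        totals[g] += 1
        cases[g] += y
    return [c / t if t else None for c, t in zip(cases, totals)]


# Toy values (not study data): eight subjects, four of them cases.
pc_values = [1, 2, 3, 4, 5, 6, 7, 8]
skin_cancer = [1, 1, 1, 0, 0, 1, 0, 0]
print(incidence_by_quartile(pc_values, skin_cancer))
```

Read this way, a steady change in incidence across the four groups, as in the first principal component row of the table, indicates that the component tracks risk.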
Figure 3 Scatter plot of the pedigree-based and SNP-based predicted genetic risks for skin cancer, together with histograms of their distributions.
Figure 4 Mean area under the curve in 20-fold cross-validation for (A) a model without any genetic information and two models with genetic information (a pedigree model and a WGP model), and (B) WGP models with increasing numbers of SNPs.
Area under the curve estimated in the subjects that have no relatives in the training set and in the subjects that do, for all the models
| | Covariates | Pedigree | PC-SNP | 41K-SNP |
|---|---|---|---|---|
| No relatives in training set | 0.540 | 0.549 | 0.635 | 0.629 |
| At least one relative in training set | 0.531 | 0.583 | 0.619 | 0.637 |
Figure 5 AUC in 500 random training–testing sets of genetically informed models (pedigree model and PC-SNP and 41K-SNP models) vs. the baseline model (covariates) and average AUC for the 500 training–testing sets in the 41K-SNP model vs. the pedigree model.