Ana I Vazquez, Yogasudha Veturi, Michael Behring, Sadeep Shrestha, Matias Kirst, Marcio F R Resende, Gustavo de Los Campos.
Abstract
Whole-genome multiomic profiles hold valuable information for the analysis and prediction of disease risk and progression. However, integrating high-dimensional multilayer omic data into risk-assessment models is statistically and computationally challenging. We describe a statistical framework, the Bayesian generalized additive model (BGAM), and present software for integrating multilayer high-dimensional inputs into risk-assessment models. We used BGAM and data from The Cancer Genome Atlas for the analysis and prediction of survival after diagnosis of breast cancer. We developed a sequence of studies to (1) compare predictions based on single omics with those based on clinical covariates commonly used for the assessment of breast cancer patients (COV), (2) evaluate the benefits of combining COV and omics, (3) compare models based on (a) COV and gene expression profiles from oncogenes with (b) COV and whole-genome gene expression (WGGE) profiles, and (4) evaluate the impacts of combining multiple omics and their interactions. We report that (1) WGGE profiles and whole-genome methylation (METH) profiles offer more predictive power than any of the COV commonly used in clinical practice (e.g., subtype and stage), (2) adding WGGE or METH profiles to COV increases prediction accuracy, (3) the predictive power of WGGE profiles is considerably higher than that based on expression from large-effect oncogenes, and (4) the gain in prediction accuracy when combining multiple omics is consistent. Our results show the feasibility of omic integration and highlight the importance of WGGE and METH profiles in breast cancer, achieving gains of up to 7 points in area under the curve (AUC) over the COV in some cases.
Keywords: GenPred; Shared data resource; diseases risk; genomic selection; omics integration; prediction of complex traits
Year: 2016 PMID: 27129736 PMCID: PMC4937492 DOI: 10.1534/genetics.115.185181
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562
Parameter estimates, model goodness of fit, model complexity, and predictive accuracy (case study I)
Columns Age at diagnosis through Gene expression mark the predictors included in each model (X). Log likelihood, pD, and DIC come from the whole data analysis. Average CV-AUC and the columns M2 through M8 come from 200 CVs; columns M2 through M8 give the proportion of times (of 200 CVs) the model in the column had AUC > the model in the row.

| Model | Age at diagnosis | Race | Lobular (Y/N) | Tumor subtype | Pathological stage | Gene expression | Log likelihood | Effective number of parameters (pD) | Deviance information criterion (DIC) | Average CV-AUC | M2 | M3 | M4 | M5 | M6 | M7 (COV) | M8 (COV + WGGE) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| M1 | X | | | | | | −146.1 | 2.1 | 294.3 | 0.557 | 0.14 | <0.01 | >0.99 | >0.99 | >0.99 | >0.99 | >0.99 |
| M2 | | X | | | | | −147.5 | 2.0 | 296.9 | 0.525 | | 0.59 | >0.99 | >0.99 | >0.99 | >0.99 | >0.99 |
| M3 | | | X | | | | −144.3 | 2.0 | 290.6 | 0.526 | | | >0.99 | >0.99 | >0.99 | >0.99 | >0.99 |
| M4 | | | | X | | | −138.6 | 4.1 | 281.3 | 0.618 | | | | 0.14 | >0.99 | >0.99 | >0.99 |
| M5 | | | | | X | | −142.4 | 2.0 | 286.9 | 0.596 | | | | | >0.99 | >0.99 | >0.99 |
| M6 | | | | | | X | −132.4 | 15.5 | 280.3 | 0.659 | | | | | | >0.99 | >0.99 |
| M7: COV | X | X | X | X | X | | −146.3 | 3.2 | 295.8 | 0.704 | | | | | | | >0.99 |
| M8: COV + WGGE | X | X | X | X | X | X | −131.3 | 17.6 | 280.3 | 0.721 | | | | | | | |
African American, Y/N.
Estimated posterior mean of the log likelihood.
Average over 200 tenfold CVs.
The same letter indicates that the models are no different (empirical P < 0.05).
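The log likelihood, pD, and DIC columns above appear to be linked by the standard identity DIC = D̄ + pD, where D̄ is −2 times the posterior mean of the log likelihood. The short Python check below is a sketch under that assumption; the helper `dic` and the `rows` list are illustrative names, and the numbers are copied from the case study I table. It is not a statement of how the authors' software computes DIC.

```python
# Assumption: DIC = Dbar + pD, with Dbar = -2 * (posterior mean log likelihood).
# This is the usual textbook definition, checked here against the table values.

def dic(mean_log_lik, p_d):
    """Deviance information criterion from mean log likelihood and pD."""
    return -2.0 * mean_log_lik + p_d

# (model, log likelihood, pD, reported DIC) copied from the case study I table.
rows = [
    ("M1", -146.1, 2.1, 294.3),
    ("M2", -147.5, 2.0, 296.9),
    ("M3", -144.3, 2.0, 290.6),
    ("M4", -138.6, 4.1, 281.3),
    ("M5", -142.4, 2.0, 286.9),
    ("M6", -132.4, 15.5, 280.3),
    ("M7", -146.3, 3.2, 295.8),
    ("M8", -131.3, 17.6, 280.3),
]

# Largest gap between the recomputed and the reported DIC.
max_gap = max(abs(dic(ll, pd) - reported) for _, ll, pd, reported in rows)

for name, ll, pd, reported in rows:
    print(f"{name}: recomputed {dic(ll, pd):.1f}, reported {reported:.1f}")
```

Because the table reports one decimal place, small gaps between recomputed and reported values are expected from rounding alone.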
Parameter estimates, model goodness of fit, model complexity, and prediction accuracy (case study II)
Columns Covariates, Oncotype DX, and Whole-genome gene expression (WGGE) mark the components included in each model (X). Estimated variances, log likelihood, pD, and DIC come from the whole data analysis; the remaining columns come from 200 CVs.

| Model | Covariates | Oncotype DX | Whole-genome gene expression (WGGE) | Estimated variance (90% posterior confidence region): Oncotype DX | Estimated variance (90% posterior confidence region): Whole-genome gene expression | Log likelihood | Effective number of parameters (pD) | Deviance information criterion (DIC) | Average CV-AUC, | AUC model in column > AUC model in row, M10 | AUC model in column > AUC model in row, M11 | Average CV-AUC, | AUC model in column > AUC model in row, M10 | AUC model in column > AUC model in row, M11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| M9 (COV) | X | | | — | — | −59.7 | 4.4 | 123.7 | 0.703 | 0.96 | >0.99 | 0.689 | 0.43 | 0.99 |
| M10 (COV + ONCO) | X | X | | 0.027 (0.003; 0.056) | — | −45.3 | 4.2 | 94.9 | 0.725 | — | 0.99 | 0.685 | — | >0.99 |
| M11 (COV + WGGE) | X | | X | — | 0.439 (0.083; 0.931) | −37.7 | 9.2 | 84.6 | 0.774 | — | — | 0.755 | — | — |
Age and race (African American, Y/N).
Estimated posterior mean of the log likelihood.
Average over 200 tenfold CVs.
Proportion of times that the model in column had AUC > the model in row (in 200 tenfold CVs).
The same letter indicates that the models are no different (empirical P < 0.05).
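The pairwise columns in these tables report how often one model's AUC exceeded another's across 200 tenfold CVs. The sketch below shows one way such a summary could be computed from per-replicate AUCs. The function names `auc` and `prop_column_beats_row` are hypothetical, the AUC is computed with the rank-based (Mann-Whitney) definition, and the toy inputs are made up for illustration; none of this is taken from the authors' software.

```python
import numpy as np

def auc(y_true, score):
    """AUC as the share of (case, control) pairs ranked correctly.

    Ties between a case and a control count as one half.
    """
    y_true = np.asarray(y_true)
    score = np.asarray(score, dtype=float)
    pos = score[y_true == 1]
    neg = score[y_true == 0]
    # All pairwise differences between case scores and control scores.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

def prop_column_beats_row(auc_col, auc_row):
    """Proportion of CV replicates in which the column model's AUC is higher."""
    auc_col = np.asarray(auc_col, dtype=float)
    auc_row = np.asarray(auc_row, dtype=float)
    return float(np.mean(auc_col > auc_row))

# Toy example: four CV replicates for two hypothetical models.
toy_col = [0.70, 0.60, 0.80, 0.50]
toy_row = [0.65, 0.62, 0.70, 0.50]
toy_prop = prop_column_beats_row(toy_col, toy_row)
```

Both models must be evaluated on the same CV partitions for the replicate-by-replicate comparison to be meaningful, which is why the function pairs the two AUC vectors elementwise.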
Figure 1. Venn diagram with the number of patients who had information by omic layer (CNV, copy number variant; miRNA, micro-RNA; RNA, RNA abundance measured with RNA-seq).
Parameter estimates, model goodness of fit, model complexity, and prediction accuracy (case study III)
Columns Covariates, CNV, METH, and miRNA mark the factors included in each model (X). Variance, log likelihood, pD, and DIC come from the whole data analysis. Average CV-AUC and the last two columns come from 200 CVs; the last two columns give the proportion of times the model in the column had AUC > the model in the row.

| Set | Model | Covariates | CNV | METH | miRNA | Variance (90% posterior confidence region) | Log likelihood | Effective number of parameters (pD) | Deviance information criterion (DIC) | Average CV-AUC | M13, M16, M19: omic only | M14, M17, M20: COV + omic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Set 1 ( | M12: COV | X | | | | — | −125.5 | 8.1 | 259.0 | 0.699 | <0.01 | >0.99 |
| | M13: CNV | | X | | | 0.637 (0.155; 1.124) | −112.1 | 26.8 | 250.9 | 0.653 | — | >0.99 |
| | M14: COV + CNV | X | X | | | 0.398 (0.070; 0.736) | −110.5 | 24.5 | 245.6 | 0.714 | — | — |
| Set 2 ( | M15: COV | X | | | | — | −88.7 | 8.4 | 185.7 | 0.667 | 0.60 | >0.99 |
| | M16: METH | | | X | | 0.652 (0.086; 1.261) | −76.6 | 18.7 | 171.8 | 0.672 | — | 0.76 |
| | M17: COV + METH | X | | X | | 0.402 (0.032; 0.739) | −78.9 | 18.5 | 176.3 | 0.684 | — | — |
| Set 3 ( | M18: COV | X | | | | — | −71.2 | 8.2 | 150.6 | 0.747 | <0.01 | 0.29 |
| | M19: miRNA | | | | X | 0.338 (0.072; 0.615) | −75.2 | 13.5 | 163.8 | 0.623 | — | >0.99 |
| | M20: COV + miRNA | X | | | X | 0.179 (0.029; 0.324) | −67.3 | 13.8 | 148.5 | 0.744 | — | — |
Age: African American, Y/N; lobular (Y/N); cancer subtype and stage.
Copy-number variants.
Methylation.
Whole-genome RNA-seq.
Estimated posterior mean of the log likelihood.
Average over 200 tenfold CVs.
The same letter indicates that the models are no different (empirical P < 0.05).
Parameter estimates, model goodness of fit, model complexity, and prediction accuracy (case study IV)
Columns COV, METH, WGGE, and METH × WGGE mark the components included in each model (X). Estimated variances, log likelihood, pD, and DIC come from the whole data analysis. Average CV-AUC and the last two columns come from 200 CVs; the last two columns give the proportion of times the model in the column had AUC > the model in the row.

| Model | COV | METH | WGGE | METH × WGGE | Estimated variance (90% posterior confidence region): METH | Estimated variance (90% posterior confidence region): WGGE | Estimated variance (90% posterior confidence region): METH × WGGE | Log likelihood | Effective number of parameters (pD) | Deviance information criterion (DIC) | Average CV-AUC | COV + METH + WGGE | COV + METH × WGGE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| COV | X | | | | — | — | — | −85.7 | 6.4 | 177.9 | 0.724 | >0.99 | >0.99 |
| COV + METH + WGGE | X | X | X | | 0.162 (0.075; 0.440) | 0.220 (0.090; 0.690) | — | −73.9 | 17.6 | 165.4 | 0.754 | — | 0.40 |
| COV + METH × WGGE | X | X | X | X | 0.101 (0.046; 0.272) | 0.138 (0.055; 0.474) | 0.101 (0.044; 0.329) | −69.9 | 20.2 | 159.9 | 0.753 | — | — |
Age: African American Y/N; lobular (Y/N); and tumor subtype.
Methylation.
Whole-genome RNA-seq.
Methylation-by-WGGE.
Estimated posterior mean of the log likelihood.
Average over 200 tenfold CVs.
The same letter indicates that the models are no different (empirical P < 0.05).
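The case study IV table includes a methylation-by-WGGE interaction term with its own variance. The text here does not say how that term is built. One common construction in kernel-type models, shown below purely as an illustrative assumption, forms a similarity matrix per omic layer and takes their elementwise (Hadamard) product as the interaction kernel. The function names `similarity` and `interaction_kernel` and the simulated inputs are hypothetical.

```python
import numpy as np

def similarity(X):
    """Linear similarity matrix from an n x p omic matrix.

    Columns are centered and scaled, then G = Z Z' / p.
    Assumes every column has nonzero variance.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return Z @ Z.T / Z.shape[1]

def interaction_kernel(G1, G2):
    """Elementwise (Hadamard) product of two similarity matrices."""
    return G1 * G2

# Simulated stand-ins for two omic layers measured on the same 20 patients.
rng = np.random.default_rng(0)
meth = rng.normal(size=(20, 50))   # hypothetical methylation matrix
wgge = rng.normal(size=(20, 80))   # hypothetical expression matrix

G_meth = similarity(meth)
G_wgge = similarity(wgge)
G_int = interaction_kernel(G_meth, G_wgge)
```

The Hadamard product of two positive semidefinite matrices is itself positive semidefinite, so `G_int` remains a valid covariance structure for a random effect under this construction.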