Jie Liu, Gangning Liang, Kimberly D Siegmund, Juan Pablo Lewinger.
Abstract
BACKGROUND: To integrate molecular features from multiple high-throughput platforms in prediction, a regression model that penalizes features from all platforms equally is commonly used. However, data from different platforms are likely to differ in effect sizes, the proportion of predictive features, and correlation structures. Subtle but important features may be missed by shrinking all features equally.
Keywords: Classification; Data integration; Elastic net
Year: 2018 PMID: 30305021 PMCID: PMC6180486 DOI: 10.1186/s12859-018-2401-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Table 1. Summary of simulation scenarios

Independent features:

| Scenario # | β1 | β2 | q1 | q2 | Optimal penalty ratio (κ) |
|---|---|---|---|---|---|
| 1 | 0.6 | 0.8 | 5 | 20 | 0.55 |
| 2 | 0.6 | 0.6 | 5 | 20 | 0.70 |
| 3 | 0.8 | 0.6 | 5 | 20 | 0.75 |
| 4 | 0.8 | 0.8 | 5 | 5 | 1 |
| 5 | 0.8 | 0.6 | 5 | 5 | 1 |

Correlated features:

| Scenario # | ρ1 | ρ2 | ρ12 | Optimal penalty ratio (κ) |
|---|---|---|---|---|
| 6 | 0.4 | 0.2 | 0 | 0.85 |
| 7 | 0.4 | 0.2 | 0.4 | 0.9 |
Two hundred samples per data set, 250 features per omic type, 2 omic types. The performance of the multi-tuning parameter elastic net (MTP EN) is evaluated by varying effect sizes, numbers of informative features, and correlation structures between omic types. Specifically, ρ1 is the correlation between informative features in platform 1, ρ2 is the correlation between informative features in platform 2, ρ12 is the correlation between informative features from the different platforms, β1 and β2 are the effect sizes of informative features in platforms 1 and 2, respectively, and q1 and q2 are the numbers of informative features in platforms 1 and 2.
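A minimal sketch of how one data set for the independent-feature scenarios could be simulated. The sample size, feature counts, effect sizes, and numbers of informative features follow the table; the standard normal features, the logistic outcome model, and the function name `simulate_two_platform` are assumptions made for illustration and are not taken from the source.

```python
import numpy as np

def simulate_two_platform(n=200, p=250, q1=5, q2=20, beta1=0.6, beta2=0.8, seed=0):
    """Simulate two independent omic platforms and a binary outcome.

    ASSUMPTIONS (not stated in the source): features are i.i.d. standard
    normal and the outcome follows a logistic model with no intercept.
    """
    rng = np.random.default_rng(seed)
    X1 = rng.standard_normal((n, p))  # platform 1 features
    X2 = rng.standard_normal((n, p))  # platform 2 features

    # The first q1 (q2) features of each platform are the informative ones.
    coef1 = np.zeros(p)
    coef1[:q1] = beta1
    coef2 = np.zeros(p)
    coef2[:q2] = beta2

    eta = X1 @ coef1 + X2 @ coef2      # linear predictor
    prob = 1.0 / (1.0 + np.exp(-eta))  # logistic link
    y = rng.binomial(1, prob)          # binary class labels
    return X1, X2, y, coef1, coef2
```

The defaults correspond to Scenario 1; passing other values of `q1`, `q2`, `beta1`, and `beta2` gives Scenarios 2 to 5.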
Fig. 1 Mean testing AUC as a function of the penalty ratio parameter κ for different simulation settings. The effect sizes and numbers of informative features are given in Table 1, Scenarios 1–5. Dots indicate the κ that maximizes the mean testing AUC. Each analysis includes 200 samples per data set with 250 features per data type (N = 200 simulation replicates). When the number of informative features differs among the platforms (Scenarios 1–3), the multi-tuning parameter EN yields more predictive models than the standard EN, where κ = 1. Differential penalization increases AUC the most when the effect sizes are smaller in the omic type with fewer informative features (Scenario 1).
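To make the role of κ concrete, the following is a rough sketch of an elastic net logistic regression with per-feature penalty factors, fitted by proximal gradient descent. It is not the authors' implementation. In particular, the convention that platform 1 features receive relative penalty κ while platform 2 features receive 1 is an assumption (the text here does not say which platform κ scales), and the names `penalty_factors` and `fit_weighted_enet_logistic`, as well as the default `lam` and `alpha`, are hypothetical.

```python
import numpy as np

def penalty_factors(kappa, p1, p2):
    # ASSUMPTION: platform 1 features get relative penalty kappa and
    # platform 2 features get 1, so kappa = 1 penalizes all features equally.
    return np.concatenate([np.full(p1, float(kappa)), np.ones(p2)])

def fit_weighted_enet_logistic(X, y, w, lam=0.05, alpha=0.5, n_iter=500):
    """Minimize mean logistic loss
    + lam * sum_j w[j] * (alpha * |b_j| + (1 - alpha) / 2 * b_j**2)
    by proximal gradient descent. The intercept is not penalized."""
    n, p = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])
    # Upper bound on the Lipschitz constant of the smooth part's gradient.
    lipschitz = np.linalg.norm(Xa, 2) ** 2 / (4 * n) + lam * (1 - alpha) * w.max()
    step = 1.0 / lipschitz
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        eta = np.clip(b0 + X @ beta, -30, 30)    # guard against overflow
        resid = 1.0 / (1.0 + np.exp(-eta)) - y   # predicted prob minus label
        grad = X.T @ resid / n + lam * (1 - alpha) * w * beta
        z = beta - step * grad
        # Soft-thresholding with a feature-specific threshold (the L1 part).
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam * alpha * w, 0.0)
        b0 -= step * resid.mean()
    return b0, beta
```

With `kappa = 1` all penalty factors equal 1 and the fit reduces to a standard elastic net with a single penalty; values below 1 shrink platform 1 features less than platform 2 features under the assumed convention.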
Fig. 2 Factors associated with the change of the optimal penalty ratio parameter κ. a For the scenario with different numbers of informative features, q1 < q2, κ increases monotonically to 1 (i.e., less differential penalization) as the effect size in the first omic type increases. b For fixed effect sizes β1 and β2, κ becomes smaller (more differential penalization) as the number of informative features in the second omic type increases. c As the overall proportion of informative features of both types decreases relative to the total number of features, κ approaches 1, i.e., less differential penalization is required to maximize the AUC. Dots represent the optimal weights and caps represent the standard error of the mean; N = 200 simulated data sets.
Fig. 3 The AUC as a function of the penalty ratio parameter κ in cancer data sets. Solid line: AML data set; dashed line: prostate cancer data set. The dark dots represent the κ that resulted in the maximum mean test set AUC. For the AML data, a better classifier is obtained by placing relatively less penalty on the methylation features. For the prostate cancer data, there was very little difference in prediction performance between using data from a single platform and combining platforms.
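The figure captions describe choosing the κ that maximizes the mean test-set AUC across replicates and reporting standard errors of the mean. A small sketch of that bookkeeping is given below; the function names `auc` and `select_kappa` and the layout of the AUC table (one row per replicate, one column per candidate κ) are assumptions for illustration.

```python
import numpy as np

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

def select_kappa(kappas, auc_table):
    """auc_table[r, k] holds the test AUC of replicate r at kappas[k].
    Returns the kappa with the highest mean AUC, the mean AUC per kappa,
    and the standard error of the mean per kappa."""
    auc_table = np.asarray(auc_table, dtype=float)
    mean_auc = auc_table.mean(axis=0)
    sem = auc_table.std(axis=0, ddof=1) / np.sqrt(auc_table.shape[0])
    best = kappas[int(np.argmax(mean_auc))]
    return best, mean_auc, sem
```

In a full analysis each entry of `auc_table` would come from fitting the model on a training set at a given κ and scoring a held-out test set with `auc`.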