Kukatharmini Tharmaratnam, Matthew Sperrin, Thomas Jaki, Sjur Reppe, Arnoldo Frigessi.
Abstract
BACKGROUND: It is useful to incorporate biological knowledge on the role of genetic determinants when predicting an outcome. It is, however, not always feasible to elicit this information fully when the number of determinants is large. We present an approach to overcome this difficulty. First, using half of the available data, a shortlist of potentially interesting determinants is generated. Second, binary indications of biological importance are elicited for this much smaller number of determinants. Third, an analysis is carried out on this shortlist using the second half of the data.
Keywords: Bone mineral density; Elicitation; Lasso
Year: 2016 PMID: 27590269 PMCID: PMC5010709 DOI: 10.1186/s12859-016-1210-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
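The three-stage scheme in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the lasso is fitted with a small hand-written coordinate-descent solver minimising (1/(2n))‖y − Xb‖² + λ‖b‖₁, the expert's binary judgement is represented by a hypothetical callback `elicit(j)`, and the stage-three analysis is a placeholder ordinary least-squares fit. The names `lasso_cd`, `three_stage` and `elicit` are illustrative only.

```python
import numpy as np


def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate-descent lasso without intercept (X and y assumed centred).

    Minimises (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)  # residual y - X b, with b = 0 initially
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the partial residual that excludes j.
            z = X[:, j] @ resid / n + col_sq[j] * b[j]
            new = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]  # soft-threshold
            resid += X[:, j] * (b[j] - new)  # keep resid = y - X b up to date
            b[j] = new
    return b


def three_stage(X, y, elicit, lam, rng):
    """Hypothetical sketch of the split / shortlist / elicit / analyse scheme.

    elicit(j) -> bool is a hypothetical stand-in for the expert's binary judgement.
    """
    n = X.shape[0]
    idx = rng.permutation(n)
    first, second = idx[: n // 2], idx[n // 2:]

    # Stage 1: the lasso on the first half of the data produces the shortlist.
    Xa = X[first] - X[first].mean(axis=0)
    ya = y[first] - y[first].mean()
    shortlist = [int(j) for j in np.flatnonzero(lasso_cd(Xa, ya, lam))]

    # Stage 2: binary biological importance is elicited for shortlisted variables only.
    relevant = {j: bool(elicit(j)) for j in shortlist}

    # Stage 3: analysis on the second half, restricted to the shortlist.
    # Assumption: OLS stands in for the authors' stage-three analysis.
    Xb = X[second][:, shortlist]
    Xb = Xb - Xb.mean(axis=0)
    yb = y[second] - y[second].mean()
    coef, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
    return shortlist, relevant, coef
```

Splitting the sample keeps the shortlist selection (first half) separate from the final analysis (second half), and the expert only has to judge the shortlisted variables rather than every determinant.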
Correlation structure for simulation study 1
| Covariates in true model | Biologically relevant variables | Correlation |
|---|---|---|
Average number and percentage of biologically relevant variables in the model with SNR = 0.5 and β_j = 0.1 (j = 1, 2, …, 20)
| Over 100 runs | Adaptive lasso | B1 | B2 | B3 |
|---|---|---|---|---|
| Average number of selected variables | 41 | 41 | 41 | 41 |
| | | | | |
| Average number of biologically relevant variables | 12 | 20 | 15 | 14 |
| Average percentage of biologically relevant variables | 29.3% | 48.8% | 36.6% | 34.1% |
| Standard deviation | 9.1 | 0.86 | 1.24 | 3.11 |
| | | | | |
| Average number of biologically relevant variables | 12 | 38 | 34 | 29 |
| Average percentage of biologically relevant variables | 29.3% | 92.7% | 82.9% | 70.7% |
| Standard deviation | 9.1 | 0.85 | 1.22 | 3.08 |
| | | | | |
| Average number of biologically relevant variables | 12 | 39 | 35 | 30 |
| Average percentage of biologically relevant variables | 29.3% | 95.1% | 85.4% | 73.2% |
| Standard deviation | 9.1 | 0.85 | 1.23 | 3.09 |
| | | | | |
| Average number of biologically relevant variables | 12 | 39 | 36 | 31 |
| Average percentage of biologically relevant variables | 29.3% | 95.1% | 87.8% | 75.6% |
| Standard deviation | 9.1 | 0.84 | 1.22 | 3.07 |

| Over 100 runs | Lasso | B1 | B2 | B3 |
|---|---|---|---|---|
| Average number of selected variables | 53 | 53 | 53 | 53 |
| | | | | |
| Average number of biologically relevant variables | 12 | 24 | 18 | 16 |
| Average percentage of biologically relevant variables | 22.6% | 45.3% | 34.0% | 30.2% |
| Standard deviation | 10.2 | 0.99 | 1.55 | 3.94 |
| | | | | |
| Average number of biologically relevant variables | 12 | 46 | 39 | 34 |
| Average percentage of biologically relevant variables | 22.6% | 86.8% | 72.1% | 64.2% |
| Standard deviation | 10.2 | 0.97 | 1.53 | 3.89 |
| | | | | |
| Average number of biologically relevant variables | 12 | 47 | 39 | 36 |
| Average percentage of biologically relevant variables | 22.6% | 88.7% | 73.6% | 67.9% |
| Standard deviation | 10.2 | 0.97 | 1.53 | 3.89 |
| | | | | |
| Average number of biologically relevant variables | 12 | 48 | 40 | 37 |
| Average percentage of biologically relevant variables | 22.6% | 90.6% | 75.5% | 69.8% |
| Standard deviation | 10.2 | 0.96 | 1.51 | 3.86 |
Percentages and standard deviations are over 100 runs from data generated with the correlation structure for simulation study 1, for different bag sizes q and correlation thresholds ρ. Results are shown for the adaptive lasso and bag types B1, B2 and B3 based on adaptive lasso selection, and for the lasso and bag types B1, B2 and B3 based on lasso selection.
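The tables refer to bags of size q formed with a correlation threshold ρ, and to substituting biologically relevant variables for selected ones. The exact construction of bag types B1, B2 and B3 is not given in this section, so the following is only a hypothetical sketch of one such scheme. It assumes that a bag holds up to q other variables whose absolute correlation with a selected variable is at least ρ, and that a selected variable not flagged as relevant is replaced by the most strongly correlated relevant member of its bag. `build_bag` and `substitute` are illustrative names, not the authors' code.

```python
import numpy as np


def build_bag(corr, j, q, rho):
    """Hypothetical bag for variable j: up to q other variables with |corr| >= rho,
    ordered from most to least correlated."""
    c = np.abs(corr[j])
    order = np.argsort(-c, kind="stable")
    bag = [int(k) for k in order if k != j and c[k] >= rho]
    return bag[:q]


def substitute(X, selected, relevant, q, rho):
    """Replace each selected variable that is not flagged relevant by the most
    correlated relevant member of its bag, if one exists; otherwise keep it.

    Assumption: this single rule is illustrative and does not reproduce B1, B2 or B3.
    """
    corr = np.corrcoef(X, rowvar=False)  # columns are variables
    chosen = set(selected)
    out = []
    for j in selected:
        if j in relevant:
            out.append(j)
            continue
        # Skip candidates that are already selected or already swapped in.
        candidates = [k for k in build_bag(corr, j, q, rho)
                      if k in relevant and k not in chosen and k not in out]
        out.append(candidates[0] if candidates else j)
    return out
```

In this sketch a threshold ρ close to 1 makes substitution rare, while a larger q widens the pool of candidate replacements.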
Average number and percentage of biologically relevant variables in the model with SNR = 0.5 and β_j = 0.1 (j = 1, 2, …, 20)
| Over 100 runs | Adaptive lasso | B1 | B2 | B3 |
|---|---|---|---|---|
| Average number of selected variables | 41 | 41 | 41 | 41 |
| Average number of biologically relevant variables | 12 | 38 | 34 | 29 |
| Average percentage of biologically relevant variables | 29.3% | 92.7% | 82.9% | 70.7% |
| Standard deviation | 9.1 | 0.85 | 1.22 | 3.08 |
| PMSE (absolute) | 1.148 | 1.145 | 1.143 | 1.154 |
| PRPMSE | 100% | 99.7% | 99.6% | 100.5% |
| (St.dev) | | (0.89) | (1.53) | (5.98) |
| Favorable substitution | | 91% | 78% | 69% |
| (St.dev) | | (0.99) | (1.07) | (6.93) |
| MISE | 1.572 | 1.598 | 1.605 | 1.643 |

| Over 100 runs | Lasso | B1 | B2 | B3 |
|---|---|---|---|---|
| Average number of selected variables | 53 | 53 | 53 | 53 |
| Average number of biologically relevant variables | 12 | 46 | 39 | 34 |
| Average percentage of biologically relevant variables | 22.6% | 86.8% | 72.1% | 64.2% |
| Standard deviation | 10.2 | 0.97 | 1.53 | 3.89 |
| PMSE (absolute) | 1.576 | 1.570 | 1.567 | 1.596 |
| PRPMSE | 100% | 99.6% | 99.4% | 101.3% |
| (St.dev) | | (0.97) | (1.81) | (6.71) |
| Favorable substitution | | 88% | 71% | 63% |
| (St.dev) | | (1.02) | (1.52) | (7.68) |
| MISE | 1.876 | 1.914 | 1.927 | 1.941 |
Percentages and standard deviations are over 100 runs from data generated with the correlation structure for simulation study 1, with SNR = 0.5 and β_j = 0.1 (j = 1, 2, …, 20). The table reports the average PMSE and PRPMSE over the 100 runs, the percentage of runs for which the bootstrap 95% CI includes 1 or lies below 1, and the mean integrated squared error (MISE).
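The PRPMSE entries appear to be the PMSE of each bag model expressed as a percentage of the reference model's PMSE: for example, 100 × 1.145 / 1.148 ≈ 99.7% for B1 under the adaptive lasso. The sketch below computes that ratio and a per-run bootstrap check, assuming a percentile bootstrap over test cases; `pmse`, `prpmse` and `favourable` are hypothetical helper names. "The CI includes 1 or lies below 1" is equivalent to the lower 2.5% limit being at most 1, which is what the check tests.

```python
import numpy as np


def pmse(y, yhat):
    """Prediction mean squared error on a test set."""
    return float(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2))


def prpmse(y, yhat_new, yhat_ref):
    """PMSE of the substituted model as a percentage of the reference model's PMSE."""
    return 100.0 * pmse(y, yhat_new) / pmse(y, yhat_ref)


def favourable(y, yhat_new, yhat_ref, n_boot=1000, rng=None):
    """True if the bootstrap 95% percentile CI of the PMSE ratio contains 1 or lies
    entirely below 1, i.e. if its lower limit is <= 1 (assumed percentile bootstrap)."""
    rng = np.random.default_rng() if rng is None else rng
    y, yhat_new, yhat_ref = map(np.asarray, (y, yhat_new, yhat_ref))
    n = len(y)
    ratios = np.empty(n_boot)
    for b in range(n_boot):
        i = rng.integers(0, n, size=n)  # resample test cases with replacement
        ratios[b] = pmse(y[i], yhat_new[i]) / pmse(y[i], yhat_ref[i])
    return bool(np.percentile(ratios, 2.5) <= 1.0)
```

Because the tables average over runs, a tabulated PRPMSE is presumably an average of per-run ratios, which need not equal the ratio of the averaged PMSEs exactly; the favorable substitution percentage would then be the share of runs for which a check like `favourable` returns `True`.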
Average number and percentage of biologically relevant variables in the model with SNR = 2 and β_j = 0.1 (j = 1, 2, …, 20)
| Over 100 runs | Adaptive lasso | B1 | B2 | B3 |
|---|---|---|---|---|
| Average number of selected variables | 44 | 44 | 44 | 44 |
| Average number of biologically relevant variables | 16 | 42 | 41 | 37 |
| Average percentage of biologically relevant variables | 36.4% | 95.5% | 93.2% | 84.1% |
| Standard deviation | 9.01 | 0.92 | 1.11 | 3.91 |
| PMSE (absolute) | 1.895 | 1.878 | 1.880 | 1.901 |
| PRPMSE | 100% | 99.1% | 99.2% | 100.3% |
| (St.dev) | | (1.01) | (2.14) | (6.16) |
| Favorable substitution | | 92% | 78% | 70% |
| (St.dev) | | (0.99) | (1.37) | (8.92) |
| MISE | 1.986 | 2.003 | 2.052 | 2.097 |

| Over 100 runs | Lasso | B1 | B2 | B3 |
|---|---|---|---|---|
| Average number of selected variables | 55 | 55 | 55 | 55 |
| Average number of biologically relevant variables | 16 | 48 | 45 | 40 |
| Average percentage of biologically relevant variables | 22.8% | 87.3% | 81.8% | 72.7% |
| Standard deviation | 10.2 | 0.99 | 1.53 | 4.12 |
| PMSE (absolute) | 2.132 | 2.104 | 2.117 | 2.158 |
| PRPMSE | 100% | 98.7% | 99.3% | 101.2% |
| (St.dev) | | (1.11) | (2.31) | (6.89) |
| Favorable substitution | | 89% | 70% | 68% |
| (St.dev) | | (1.01) | (1.69) | (9.53) |
| MISE | 2.234 | 2.298 | 2.306 | 2.342 |
Percentages and standard deviations are over 100 runs from data generated with the correlation structure for simulation study 1, with SNR = 2 and β_j = 0.1 (j = 1, 2, …, 20). The table reports the average PMSE and PRPMSE over the 100 runs, the percentage of runs for which the bootstrap 95% CI includes 1 or lies below 1, and the mean integrated squared error (MISE).
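Both simulation settings fix β_j = 0.1 for j = 1, …, 20 and differ only in the SNR. A common way to reach a target SNR, assumed here rather than taken from the paper, is to set the noise variance to the signal variance divided by the SNR, σ² = βᵀΣβ / SNR, where Σ is the covariate correlation matrix. Because the correlation structure of simulation study 1 is not recoverable from this section, any Σ passed to the hypothetical `simulate` function below is the user's own assumption.

```python
import numpy as np


def simulate(n, beta, corr, snr, rng):
    """Draw X ~ N(0, corr) and y = X @ beta + eps with eps ~ N(0, sigma^2).

    Assumption: SNR is defined as population signal variance over noise variance,
    so sigma^2 = beta' corr beta / snr.
    """
    beta = np.asarray(beta, dtype=float)
    X = rng.multivariate_normal(np.zeros(len(beta)), corr, size=n)
    sigma = float(np.sqrt(beta @ corr @ beta / snr))
    y = X @ beta + rng.normal(0.0, sigma, size=n)
    return X, y, sigma
```

For instance, with an identity Σ (chosen only for illustration) and 20 non-zero coefficients of 0.1, βᵀΣβ = 20 × 0.1² = 0.2, so σ² = 0.4 at SNR = 0.5 and σ² = 0.1 at SNR = 2.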
Selected genes from the lasso and from each bag type (B1, B2, B3) based on biological reasons, with the PMSE and the PMSE averaged over 100 bootstrap samples (B-PMSE) in the test data
| | Lasso | Bag type B1 | Bag type B2 | Bag type B3 |
|---|---|---|---|---|
| | AK3L1 | ADAMTS2 | ADAMTS2 | RARA |
| | CCHCR1 | CCHCR1 | CCHCR1 | CCHCR1 |
| | CRYGS | PPARA | PPARA | ESR1 |
| | CSRNP3 | RUNX2 | RUNX2 | CSRNP3 |
| | FAF1 | FAF1 | FAF1 | BMPR2 |
| | FKBP14 | SPTBN1 | SPTBN1 | VDR |
| | FLRT2 | PDGFA | PDGFA | FLRT2 |
| | KDM4A | SLC44A1 | KDM4A7 | KDM4A |
| | LOC642852 | OSTM1 | OSTM1 | LOC642852 |
| | MAPK8 | BMP7 | BMP7 | WHAMML1 /// WHAMML2 |
| | NF1 | NF1 | NF1 | CSNK1G3 |
| | PIAS4 | ESR1 | ESR1 | PIAS4 |
| | PLIN5 | PLIN5 | HGF | RHO |
| | PPIL2 | BMP5 | BMP5 | SRGAP3 |
| | RNF31 | RNF31 | RNF31 | GLP1R G |
| | SRR | SMAD3 | SRR | SRR |
| | TRPS1 | SFRP1 | SFRP1 | TRPS1 |
| | ZMAT3 | ZMAT3 | ZMAT3 | ZMAT3 |
| PMSE | 1.900 | 1.009 | 1.086 | 5.810 |
| B-PMSE | 1.871 | 1.001 | 1.024 | 5.053 |
Selected genes from the adaptive lasso and from each bag type (B1, B2, B3) based on biological reasons, with the PMSE and the PMSE averaged over 100 bootstrap samples (B-PMSE) in the test data
| | Adaptive lasso | Bag type B1 | Bag type B2 | Bag type B3 |
|---|---|---|---|---|
| | AK3L1 | ADAMTS2 | ADAMTS2 | RARA |
| | CSRNP3 | RUNX2 | RUNX2 | CSRNP3 |
| | FKBP14 | SPTBN1 | SPTBN1 | VDR |
| | NF1 | NF1 | NF1 | CSNK1G3 |
| | PIAS4 | ESR1 | ESR1 | PIAS4 |
| | PLIN5 | PLIN5 | HGF | RHO |
| | PPIL2 | BMP5 | BMP5 | SRGAP3 |
| | RNF31 | RNF31 | RNF31 | GLP1R G |
| | SRR | SMAD3 | SRR | SRR |
| | TRPS1 | SFRP1 | SFRP1 | TRPS1 |
| | ZMAT3 | ZMAT3 | ZMAT3 | ZMAT3 |
| PMSE | 1.292 | 0.251 | 0.343 | 1.306 |
| B-PMSE | 1.074 | 0.242 | 0.298 | 1.286 |
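B-PMSE is described as the PMSE averaged over 100 bootstrap samples of the test data. One plausible reading, assumed here, is to resample test cases with replacement and average the resulting PMSE values for fixed predictions; the authors' procedure may differ (for example, by refitting models on each bootstrap sample). `b_pmse` is a hypothetical helper name.

```python
import numpy as np


def b_pmse(y, yhat, n_boot=100, rng=None):
    """Average test-set PMSE over n_boot bootstrap resamples of the test cases.

    Assumption: predictions yhat are held fixed; only test cases are resampled.
    """
    rng = np.random.default_rng() if rng is None else rng
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    n = len(y)
    vals = np.empty(n_boot)
    for b in range(n_boot):
        i = rng.integers(0, n, size=n)  # bootstrap sample of test-case indices
        vals[b] = np.mean((y[i] - yhat[i]) ** 2)
    return float(vals.mean())
```

Under this reading the bootstrap average stays close to the plain test-set PMSE for large test sets, so larger gaps between PMSE and B-PMSE would point to a different resampling scheme.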