Burcu F Darst, Kristen C Malecki, Corinne D Engelman.
Abstract
BACKGROUND: Random forest (RF) is a machine-learning method that generally works well with high-dimensional problems and allows for nonlinear relationships between predictors; however, the presence of correlated predictors has been shown to impact its ability to identify strong predictors. The Random Forest-Recursive Feature Elimination algorithm (RF-RFE) mitigates this problem in smaller data sets, but this approach has not been tested in high-dimensional omics data sets.
Keywords: Correlation; Epigenomics; Genetics; Genomics; High-dimensional data; Integration; Machine-learning; Methylation; Omics; Random forest; Recursive feature elimination
Year: 2018; PMID: 30255764; PMCID: PMC6157185; DOI: 10.1186/s12863-018-0633-8
Source DB: PubMed; Journal: BMC Genet; ISSN: 1471-2156; Impact factor: 2.797
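The recursive elimination idea named in the abstract can be illustrated with a minimal sketch: fit a forest, drop the least important features, refit on the survivors, and rank each feature by when it was eliminated. This is not the authors' implementation. The function name `rf_rfe`, the `drop_frac` and `min_features` parameters, the use of scikit-learn, and the impurity-based importance measure are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_rfe(X, y, drop_frac=0.2, min_features=1, n_trees=200, seed=0):
    """Rank features by recursive elimination with random forests.

    Hypothetical helper: drop_frac and min_features are illustrative
    choices, not values taken from the paper.
    Returns (ranks, history). ranks[j] is the rank of feature j (1 = best);
    history holds (n_features, oob_r2, oob_mse) for every forest fitted.
    """
    n_features = X.shape[1]
    remaining = np.arange(n_features)      # column indices still in play
    ranks = np.zeros(n_features, dtype=int)
    next_worst = n_features                # worst rank not yet assigned
    history = []

    while True:
        rf = RandomForestRegressor(n_estimators=n_trees, oob_score=True,
                                   random_state=seed)
        rf.fit(X[:, remaining], y)
        # Out-of-bag error and r^2 for this forest.
        oob_mse = float(np.mean((y - rf.oob_prediction_) ** 2))
        history.append((remaining.size, rf.oob_score_, oob_mse))

        # Impurity-based importance (an assumption), least important first.
        order = np.argsort(rf.feature_importances_)

        if remaining.size <= min_features:
            # Final forest: rank the surviving features and stop.
            for idx in order:
                ranks[remaining[idx]] = next_worst
                next_worst -= 1
            break

        # Eliminate the bottom fraction, but never go below min_features.
        n_drop = max(1, int(drop_frac * remaining.size))
        n_drop = min(n_drop, remaining.size - min_features)
        for idx in order[:n_drop]:
            ranks[remaining[idx]] = next_worst
            next_worst -= 1
        remaining = np.delete(remaining, order[:n_drop])

    return ranks, history
```

Because a feature's rank is fixed at the step where it is removed, importance is re-estimated among progressively fewer competitors, which is how the procedure is meant to reduce the dilution of importance across correlated predictors. Averaging the second and third entries of `history` gives per-run r² and out-of-bag MSE summaries of the kind reported for the RF-RFE column.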
Summary statistics and variable importance rankings of simulated causal effects on TG
| Statistic | RF* | RF-RFE |
|---|---|---|
| r² | −0.00203 | 0.19217ᵃ |
| MSE_OOB | 0.07378 | 0.05948ᵃ |

| Chr | Causal SNP/CpG | MAF or Mean (SD) | Simulated h²† | Main effect, β (SE) | Main effect, p | Interaction effect, β (SE) | Interaction effect, p | RF* rank (percentile rank) | RF-RFE rank (percentile rank) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | rs9661059 | 0.12 | 0.125 | 0.14 (0.02) | | −0.19 (0.07) | | 1 (100.0) | 20 (100.0) |
| 1 | cg00000363 | 0.49 (0.33) | – | −0.05 (0.03) | 0.0873 | | | 8,680 (97.6) | 239,755 (32.7) |
| 6 | rs736004 | 0.09 | 0.075 | 0.09 (0.03) | | −0.30 (0.08) | | 13,480 (96.2) | 766 (99.8) |
| 6 | cg10480950 | 0.54 (0.33) | – | −0.04 (0.03) | 0.1711 | | | 5,332 (98.5) | 232,579 (34.7) |
| 8 | rs1012116 | 0.20 | 0.100 | 0.08 (0.02) | | −0.21 (0.06) | | 50,218 (85.9) | 333,504 (6.4) |
| 8 | cg18772399 | 0.56 (0.33) | – | −0.002 (0.03) | 0.9466 | | | 339,475 (4.7) | 301,855 (15.3) |
| 10 | rs10828412 | 0.14 | 0.025 | 0.07 (0.02) | | −0.07 (0.07) | 0.2770 | 2,984 (99.2) | 330,516 (7.3) |
| 10 | cg00045910 | 0.49 (0.34) | – | 0.01 (0.03) | 0.7176 | | | 263,465 (26.1) | 231,315 (35.1) |
| 17 | rs4399565 | 0.41 | 0.050 | 0.04 (0.02) | | −0.13 (0.05) | | 11,078 (96.9) | 196,276 (44.9) |
| 17 | cg01242676 | 0.46 (0.32) | – | 0.01 (0.03) | 0.8159 | | | 350,420 (1.9) | 350,420 (1.7) |
Bolded values are significant at p < 0.05
*RF is the first RF in RF-RFE
ᵃ r² and MSE_OOB are averaged over all 324 RFs in the RF-RFE column
†Simulated h² was provided by the GAW20 organizers and is based on the full 200 simulations; main and interaction effects are calculated within the data set used for this study, which uses the 84th simulation replicate. Effects are calculated with linear regression models using the residual of the change in TG after adjusting for baseline TG as the outcome. Interaction models also include the main effects of the terms in the interaction being tested
Abbreviations: β, effect size; Chr, chromosome; h², heritability
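The two-step effect calculation described in the † footnote can be sketched as follows: first residualize the change in TG on baseline TG, then regress that residual on the SNP, the CpG, and their product. This is a minimal illustration under assumptions. The function names `ols` and `interaction_effect`, the argument names, and the additive 0/1/2 SNP coding are hypothetical, and p-values are omitted.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares; returns coefficients and standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]          # residual degrees of freedom
    sigma2 = resid @ resid / dof           # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se


def interaction_effect(delta_tg, baseline_tg, snp, cpg):
    """Hypothetical helper: beta (SE) for SNP, CpG, and SNP x CpG terms.

    snp is assumed to be coded additively (0/1/2 minor alleles) and cpg
    is a methylation proportion; both codings are assumptions.
    """
    ones = np.ones(len(delta_tg))

    # Step 1: outcome = residual of TG change after adjusting for baseline.
    B = np.column_stack([ones, baseline_tg])
    b, _ = ols(B, delta_tg)
    outcome = delta_tg - B @ b

    # Step 2: interaction model that also carries both main effects.
    X = np.column_stack([ones, snp, cpg, snp * cpg])
    beta, se = ols(X, outcome)
    return {
        "snp": (beta[1], se[1]),
        "cpg": (beta[2], se[2]),
        "snp_x_cpg": (beta[3], se[3]),
    }
```

A main-effect-only estimate follows the same pattern with the product column removed from `X`.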
Fig. 1 Regional association plots showing RF and RF-RFE importance rankings of causal and correlated SNPs (r² > 0.10). The causal SNP in each plot is shown by the purple diamond, with the reference SNP number indicated above. A higher value on the y-axis indicates a higher importance score and better rank. a. RF importance rankings for chromosome 1. b. RF-RFE importance rankings for chromosome 1. c. RF importance rankings for chromosome 6. d. RF-RFE importance rankings for chromosome 6. e. RF importance rankings for chromosome 8. f. RF-RFE importance rankings for chromosome 8. g. RF importance rankings for chromosome 10. h. RF-RFE importance rankings for chromosome 10. i. RF importance rankings for chromosome 17. j. RF-RFE importance rankings for chromosome 17