Gang Wang, Guifang Fu, Christopher Corcoran.
Abstract
BACKGROUND: Genome-wide association studies (GWAS) interrogate the whole genome at large scale to characterize the complex genetic architecture of biomedical traits. When the number of SNPs increases dramatically to half a million while the sample size remains limited to thousands, traditional p-value-based statistical approaches suffer from unprecedented limitations. Feature screening has proved to be an effective and powerful approach for handling ultrahigh-dimensional data statistically, yet it has not received much attention in GWAS. Feature screening reduces the feature space from millions to hundreds by removing non-informative noise. However, the univariate measures used to rank features are mainly based on individual effects and do not consider mutual interactions with other features. In this article, we explore the performance of a random forest (RF) based feature screening procedure that emphasizes SNPs with complex effects on a continuous phenotype.
Year: 2015 PMID: 26698561 PMCID: PMC4690313 DOI: 10.1186/s12863-015-0294-9
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
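The limitation of univariate ranking noted in the abstract can be illustrated with a minimal sketch. This is not the authors' RF procedure: it uses the absolute Pearson correlation as a stand-in marginal score, and the helper names `pearson` and `screen` as well as the toy data coded as -1/+1 are hypothetical. The response depends on the third feature additively and on the first two only through their interaction, so a marginal score cannot see the first two.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen(features, y, d):
    """Rank features by |marginal correlation| and keep the top d indices."""
    scores = [abs(pearson(col, y)) for col in features]
    order = sorted(range(len(features)), key=lambda j: -scores[j])
    return order[:d], scores


# Hypothetical toy data (an assumption, not from the paper): a full factorial
# design over three features coded -1/+1, so the features are uncorrelated.
rows = list(product([-1, 1], repeat=3))
features = [[r[j] for r in rows] for j in range(3)]

# Feature 2 has an additive effect; features 0 and 1 act only via interaction.
y = [2 * x3 + x1 * x2 for x1, x2, x3 in rows]

kept, scores = screen(features, y, d=1)
print(kept)    # index of the single retained feature
print(scores)  # marginal scores of all three features
```

In this toy design the two interacting features receive a marginal score of exactly zero even though they influence the response, which is the kind of effect a screening measure that accounts for interactions is meant to recover.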
The average rank of each causative feature, R1 to R5, for Simulation 1 & 2

| METHOD | Sim1 R1 | Sim1 R2 | Sim1 R3 | Sim1 R4 | Sim1 R5 | Sim2 R1 | Sim2 R2 | Sim2 R3 | Sim2 R4 | Sim2 R5 |
|---|---|---|---|---|---|---|---|---|---|---|
| SIS | 12.21 | 1.56 | 1.51 | 143.14 | 322.16 | 359.17 | 360.41 | 398.89 | 340.45 | 428.30 |
| ISIS | 39.29 | 1.56 | 1.51 | 250.98 | 412.43 | 432.97 | 456.97 | 481.98 | 426.94 | 502.13 |
| CC-SIS | 12.81 | 1.59 | 1.48 | 60.31 | 179.77 | 168.57 | 242.27 | 242.85 | 258.39 | 369.68 |
| ICC-SIS | 43.75 | 1.59 | 1.48 | 129.80 | 259.34 | 237.70 | 362.12 | 382.58 | 368.86 | 400.27 |
| DC-SIS | 5.95 | 1.59 | 1.48 | 7.93 | 19.58 | 3.51 | 21.07 | 32.86 | 7.44 | 14.79 |
| RF | 8.63 | 1.91 | 1.67 | 3.72 | 4.06 | 2.80 | 8.59 | 10.70 | 4.66 | 7.85 |
The quantiles of M for Simulation 1 & 2

| METHOD | Sim1 5 % | Sim1 25 % | Sim1 50 % | Sim1 75 % | Sim1 95 % | Sim2 5 % | Sim2 25 % | Sim2 50 % | Sim2 75 % | Sim2 95 % |
|---|---|---|---|---|---|---|---|---|---|---|
| SIS | 15.60 | 72.25 | 339.50 | 646.00 | 887.75 | 257.25 | 681.00 | 817.50 | 888.00 | 970.20 |
| ISIS | 14.65 | 331.75 | 597.50 | 756.25 | 958.00 | 555.85 | 766.75 | 875.00 | 954.75 | 986.15 |
| CC-SIS | 7.90 | 34.75 | 107.00 | 288.50 | 703.80 | 131.85 | 357.25 | 605.50 | 812.75 | 957.30 |
| ICC-SIS | 7.90 | 150.50 | 357.50 | 530.25 | 838.60 | 387.35 | 614.50 | 784.00 | 865.75 | 951.25 |
| DC-SIS | 5.00 | 6.00 | 8.00 | 16.25 | 55.20 | 7.00 | 16.50 | 31.00 | 66.50 | 152.60 |
| RF | 5.00 | 5.00 | 5.00 | 8.00 | 17.05 | 5.00 | 7.75 | 11.00 | 22.00 | 67.15 |
The individual power of each causative feature (p1 to p5) and the overall power, for Simulation 1 & 2

| d | METHOD | Sim1 p1 | Sim1 p2 | Sim1 p3 | Sim1 p4 | Sim1 p5 | Sim1 overall | Sim2 p1 | Sim2 p2 | Sim2 p3 | Sim2 p4 | Sim2 p5 | Sim2 overall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | SIS | 0.97 | 0.97 | 0.97 | 0.48 | 0.09 | 0.08 | 0.09 | 0.04 | 0.01 | 0.01 | 0.01 | 0.00 |
| 16 | ISIS | 0.95 | 0.95 | 0.95 | 0.41 | 0.11 | 0.06 | 0.09 | 0.02 | 0.01 | 0.03 | 0.01 | 0.00 |
| 16 | CC-SIS | 0.95 | 0.95 | 0.95 | 0.61 | 0.17 | 0.15 | 0.34 | 0.13 | 0.09 | 0.07 | 0.01 | 0.00 |
| 16 | ICC-SIS | 0.91 | 0.91 | 0.91 | 0.52 | 0.16 | 0.11 | 0.32 | 0.09 | 0.09 | 0.04 | 0.01 | 0.00 |
| 16 | DC-SIS | 0.99 | 0.99 | 0.99 | 0.92 | 0.79 | 0.77 | 0.95 | 0.67 | 0.56 | 0.81 | 0.72 | 0.30 |
| 16 | RF | 0.93 | 0.93 | 0.93 | 0.93 | 0.93 | 0.93 | 0.99 | 0.88 | 0.84 | 0.95 | 0.89 | 0.67 |
| 32 | SIS | 0.97 | 0.97 | 0.97 | 0.55 | 0.18 | 0.14 | 0.09 | 0.05 | 0.04 | 0.03 | 0.02 | 0.00 |
| 32 | ISIS | 0.95 | 0.95 | 0.95 | 0.42 | 0.12 | 0.07 | 0.09 | 0.02 | 0.01 | 0.03 | 0.01 | 0.00 |
| 32 | CC-SIS | 0.95 | 0.95 | 0.95 | 0.67 | 0.29 | 0.22 | 0.34 | 0.17 | 0.15 | 0.08 | 0.04 | 0.00 |
| 32 | ICC-SIS | 0.91 | 0.91 | 0.91 | 0.55 | 0.22 | 0.14 | 0.32 | 0.09 | 0.10 | 0.05 | 0.01 | 0.00 |
| 32 | DC-SIS | 0.99 | 0.99 | 0.99 | 0.94 | 0.90 | 0.86 | 0.95 | 0.78 | 0.67 | 0.91 | 0.84 | 0.48 |
| 32 | RF | 0.93 | 0.93 | 0.93 | 0.93 | 0.93 | 0.93 | 0.99 | 0.94 | 0.92 | 0.97 | 0.95 | 0.82 |
| 48 | SIS | 0.97 | 0.97 | 0.97 | 0.57 | 0.20 | 0.16 | 0.09 | 0.05 | 0.04 | 0.03 | 0.02 | 0.00 |
| 48 | ISIS | 0.95 | 0.95 | 0.95 | 0.42 | 0.13 | 0.07 | 0.09 | 0.02 | 0.02 | 0.03 | 0.01 | 0.00 |
| 48 | CC-SIS | 0.95 | 0.95 | 0.95 | 0.74 | 0.35 | 0.30 | 0.34 | 0.19 | 0.17 | 0.08 | 0.05 | 0.00 |
| 48 | ICC-SIS | 0.91 | 0.91 | 0.91 | 0.60 | 0.24 | 0.16 | 0.32 | 0.10 | 0.11 | 0.07 | 0.03 | 0.00 |
| 48 | DC-SIS | 0.99 | 0.99 | 0.99 | 0.97 | 0.96 | 0.94 | 0.95 | 0.85 | 0.75 | 0.94 | 0.92 | 0.64 |
| 48 | RF | 0.93 | 0.93 | 0.93 | 0.93 | 0.93 | 0.93 | 0.99 | 0.96 | 0.95 | 0.99 | 0.95 | 0.88 |
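The three kinds of summaries reported in these tables can be sketched in a few lines. The definitions used here are assumptions rather than statements taken from the text above: the rank of a causative feature is averaged over simulation replicates, M is taken to be the smallest model size that contains all causative features in a replicate, and power at model size d is the share of replicates in which a feature (or, for overall power, every causative feature) ranks within the top d. The function names and the toy rank matrix are hypothetical.

```python
from statistics import mean

# Hypothetical toy input (not data from the paper): ranks[r][j] is the rank
# assigned to causative feature j in replicate r, where 1 = most important.
ranks = [
    [1, 2, 3, 4, 5],
    [2, 1, 3, 10, 40],
    [1, 3, 2, 5, 4],
    [3, 1, 2, 20, 6],
]


def average_rank(ranks):
    """Average rank of each causative feature across replicates."""
    n_features = len(ranks[0])
    return [mean(row[j] for row in ranks) for j in range(n_features)]


def min_model_sizes(ranks):
    """Assumed definition of M: the worst rank among the causative features,
    i.e. the smallest top-k list that contains all of them."""
    return [max(row) for row in ranks]


def quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def power(ranks, d):
    """Individual and overall power when the top d features are retained."""
    n = len(ranks)
    n_features = len(ranks[0])
    individual = [sum(row[j] <= d for row in ranks) / n for j in range(n_features)]
    overall = sum(max(row) <= d for row in ranks) / n
    return individual, overall


print(average_rank(ranks))
print([quantile(min_model_sizes(ranks), q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)])
print(power(ranks, d=16))
```

Under these assumed definitions the overall power can never exceed the smallest individual power at the same d, because retaining all causative features requires retaining each one.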
The average rank of each causative feature, R1 to R5, for Simulation 3 & 4

| METHOD | Sim3 R1 | Sim3 R2 | Sim3 R3 | Sim3 R4 | Sim3 R5 | Sim4 R1 | Sim4 R2 | Sim4 R3 | Sim4 R4 | Sim4 R5 |
|---|---|---|---|---|---|---|---|---|---|---|
| SIS | 1.00 | 2.00 | 3.00 | 160.15 | 379.58 | 262.87 | 369.88 | 392.07 | 363.33 | 494.10 |
| ISIS | 1.00 | 2.00 | 3.00 | 353.95 | 518.05 | 311.39 | 416.71 | 485.63 | 428.34 | 461.79 |
| CC-SIS | 1.00 | 2.00 | 3.00 | 140.46 | 376.82 | 26.58 | 155.47 | 199.26 | 269.65 | 409.99 |
| ICC-SIS | 1.00 | 2.00 | 3.00 | 285.10 | 429.93 | 44.73 | 305.71 | 316.91 | 344.90 | 417.34 |
| DC-SIS | 1.00 | 2.01 | 2.99 | 111.32 | 228.75 | 1.35 | 16.93 | 27.88 | 30.67 | 57.52 |
| RF | 1.00 | 2.01 | 3.14 | 59.87 | 107.06 | 1.25 | 6.58 | 13.66 | 14.98 | 26.18 |
The quantiles of M for Simulation 3 & 4

| METHOD | Sim3 5 % | Sim3 25 % | Sim3 50 % | Sim3 75 % | Sim3 95 % | Sim4 5 % | Sim4 25 % | Sim4 50 % | Sim4 75 % | Sim4 95 % |
|---|---|---|---|---|---|---|---|---|---|---|
| SIS | 24.95 | 176.00 | 380.50 | 711.25 | 960.75 | 384.20 | 599.75 | 787.00 | 926.75 | 992.05 |
| ISIS | 225.30 | 425.00 | 624.00 | 827.00 | 956.05 | 361.70 | 688.25 | 796.50 | 917.00 | 983.05 |
| CC-SIS | 31.90 | 165.50 | 393.00 | 662.75 | 883.60 | 43.70 | 330.00 | 623.00 | 811.50 | 959.55 |
| ICC-SIS | 95.45 | 321.00 | 538.00 | 754.50 | 936.90 | 209.75 | 479.00 | 721.00 | 867.25 | 961.35 |
| DC-SIS | 15.00 | 57.50 | 205.50 | 445.25 | 692.85 | 10.00 | 29.00 | 61.50 | 115.00 | 228.30 |
| RF | 7.00 | 14.50 | 65.00 | 189.25 | 603.05 | 6.00 | 12.00 | 19.50 | 42.75 | 149.75 |
The individual power of each causative feature (p1 to p5) and the overall power, for Simulation 3 & 4

| d | METHOD | Sim3 p1 | Sim3 p2 | Sim3 p3 | Sim3 p4 | Sim3 p5 | Sim3 overall | Sim4 p1 | Sim4 p2 | Sim4 p3 | Sim4 p4 | Sim4 p5 | Sim4 overall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | SIS | 1.00 | 1.00 | 1.00 | 0.41 | 0.03 | 0.02 | 0.22 | 0.08 | 0.02 | 0.03 | 0.03 | 0.00 |
| 16 | ISIS | 1.00 | 1.00 | 1.00 | 0.31 | 0.01 | 0.00 | 0.16 | 0.06 | 0.03 | 0.03 | 0.02 | 0.00 |
| 16 | CC-SIS | 1.00 | 1.00 | 1.00 | 0.37 | 0.04 | 0.03 | 0.86 | 0.30 | 0.22 | 0.06 | 0.07 | 0.00 |
| 16 | ICC-SIS | 1.00 | 1.00 | 1.00 | 0.27 | 0.02 | 0.02 | 0.83 | 0.24 | 0.16 | 0.06 | 0.05 | 0.00 |
| 16 | DC-SIS | 1.00 | 1.00 | 1.00 | 0.42 | 0.09 | 0.09 | 1.00 | 0.73 | 0.66 | 0.58 | 0.35 | 0.11 |
| 16 | RF | 1.00 | 1.00 | 1.00 | 0.60 | 0.37 | 0.32 | 1.00 | 0.94 | 0.83 | 0.79 | 0.69 | 0.49 |
| 32 | SIS | 1.00 | 1.00 | 1.00 | 0.48 | 0.08 | 0.06 | 0.22 | 0.08 | 0.04 | 0.04 | 0.03 | 0.00 |
| 32 | ISIS | 1.00 | 1.00 | 1.00 | 0.32 | 0.02 | 0.00 | 0.16 | 0.06 | 0.03 | 0.03 | 0.02 | 0.00 |
| 32 | CC-SIS | 1.00 | 1.00 | 1.00 | 0.50 | 0.08 | 0.06 | 0.86 | 0.40 | 0.31 | 0.14 | 0.11 | 0.02 |
| 32 | ICC-SIS | 1.00 | 1.00 | 1.00 | 0.30 | 0.05 | 0.02 | 0.83 | 0.27 | 0.18 | 0.12 | 0.08 | 0.00 |
| 32 | DC-SIS | 1.00 | 1.00 | 1.00 | 0.54 | 0.22 | 0.16 | 1.00 | 0.87 | 0.73 | 0.76 | 0.55 | 0.28 |
| 32 | RF | 1.00 | 1.00 | 1.00 | 0.66 | 0.47 | 0.37 | 1.00 | 0.97 | 0.93 | 0.88 | 0.80 | 0.66 |
| 48 | SIS | 1.00 | 1.00 | 1.00 | 0.54 | 0.12 | 0.08 | 0.22 | 0.08 | 0.04 | 0.04 | 0.04 | 0.00 |
| 48 | ISIS | 1.00 | 1.00 | 1.00 | 0.32 | 0.03 | 0.00 | 0.16 | 0.06 | 0.04 | 0.03 | 0.02 | 0.00 |
| 48 | CC-SIS | 1.00 | 1.00 | 1.00 | 0.54 | 0.13 | 0.08 | 0.86 | 0.46 | 0.33 | 0.20 | 0.12 | 0.04 |
| 48 | ICC-SIS | 1.00 | 1.00 | 1.00 | 0.33 | 0.08 | 0.02 | 0.83 | 0.28 | 0.19 | 0.14 | 0.10 | 0.01 |
| 48 | DC-SIS | 1.00 | 1.00 | 1.00 | 0.60 | 0.28 | 0.22 | 1.00 | 0.91 | 0.81 | 0.81 | 0.65 | 0.40 |
| 48 | RF | 1.00 | 1.00 | 1.00 | 0.71 | 0.58 | 0.41 | 1.00 | 0.99 | 0.94 | 0.94 | 0.88 | 0.77 |
Fig. 1 PVIM-based Manhattan plot. Variable importance measure of SNPs obtained from RF for the NMRI mice HDL cholesterol GWA study. Each color corresponds to one chromosome.