Sinan Abo Alchamlat, Frédéric Farnir.
Abstract
BACKGROUND: Finding epistatic interactions in large association studies such as genome-wide association studies (GWAS), given the large volume of genomic data now available, is a challenging and largely unsolved problem. Few previous studies could handle genome-wide data, owing to the intractable difficulty of searching a combinatorially explosive space and of statistically evaluating epistatic interactions with a limited number of samples. Our work is a contribution to this field. We propose a novel approach combining the K-Nearest Neighbors (KNN) and Multi Dimensional Reduction (MDR) methods for detecting gene-gene interactions, as a possible alternative to existing algorithms, especially in situations where the number of involved determinants is high. After describing the approach, we compare our method (KNN-MDR) with a set of the best-performing existing methods (i.e., MDR, BOOST, BHIT, MegaSNPHunter and AntEpiSeeker) for detecting interactions, using simulated data as well as real genome-wide data.
Keywords: Epistasis; Gene-gene interaction; Genome-wide association study; K-nearest neighbors; Multi dimensional reduction; Single nucleotide polymorphism
Year: 2017 PMID: 28327091 PMCID: PMC5361736 DOI: 10.1186/s12859-017-1599-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
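The abstract names MDR as one ingredient of KNN-MDR without detailing it. In the usual MDR formulation, each multilocus genotype combination (a "cell") is labeled high or low risk by comparing its case/control ratio with a threshold, commonly the overall case/control ratio, and the resulting binary rule is then used as a classifier. The sketch below shows only that labeling step for a pair of SNPs coded 0/1/2. It is a generic illustration under these assumptions, not the authors' KNN-MDR implementation; the function names are hypothetical, and genotype cells never seen in training are arbitrarily treated as low risk.

```python
from collections import defaultdict


def mdr_high_risk_cells(genotypes, status, threshold=None):
    """Label each two-locus genotype cell as high risk (1) or low risk (0).

    genotypes: list of (g1, g2) tuples, each genotype coded 0, 1 or 2
    status: list of 1 (case) / 0 (control), same length as genotypes
    threshold: case/control ratio above which a cell is high risk;
               defaults to the overall case/control ratio (an assumption)
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    if threshold is None:
        threshold = n_cases / n_controls
    cases = defaultdict(int)
    controls = defaultdict(int)
    for cell, y in zip(genotypes, status):
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    labels = {}
    for cell in set(cases) | set(controls):
        if controls[cell] == 0:
            # cases but no controls: infinite ratio, so high risk
            labels[cell] = 1
        else:
            labels[cell] = int(cases[cell] / controls[cell] > threshold)
    return labels


def mdr_predict(genotypes, labels):
    """Predict case (1) / control (0) from the cell labels; unseen cells -> 0."""
    return [labels.get(cell, 0) for cell in genotypes]


genotypes = [(0, 0), (0, 0), (1, 2), (1, 2), (1, 2), (2, 1), (2, 1), (0, 0)]
status = [0, 0, 1, 1, 0, 1, 0, 1]
labels = mdr_high_risk_cells(genotypes, status)
print(mdr_predict(genotypes, labels))  # [0, 0, 1, 1, 1, 0, 0, 0]
```

Here the threshold is 4/4 = 1, so only cell (1, 2), with 2 cases and 1 control, exceeds it and is labeled high risk.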
Simulation results when G = 2, with 500 cases and 500 controls
| Method | MDR | AntEpiSeeker | BOOST | MegaSNPHunter | KNN-MDR |
|---|---|---|---|---|---|
| Power | 0.68 | 0.88 | 0.76 | 0.84 | 0.81 |
| Corrected power | 0.56 | 0.39 | 0.48 | 0.20 | 0.71 |
Simulation results when G = 3, with 500 cases and 500 controls
| Method | MDR | AntEpiSeeker | BOOST | MegaSNPHunter | KNN-MDR |
|---|---|---|---|---|---|
| Power | N/A | 0.65 | 0.67 | 0.80 | 0.74 |
| Corrected power | N/A | 0.15 | 0.28 | 0.12 | 0.63 |
Tables 1 and 2 show the results of 100 simulations. For KNN-MDR, the number of neighbors is set to 10, and the 1000 markers are split into 100 windows of 10 consecutive markers. All possible sets of up to 2 windows (5050 sets, Table 1) and of up to 3 windows (166750 sets, Table 2) were tested. The other methods were run with their default parameters. Because of the very large number of tests required when 3 markers are involved, MDR results were not obtained for Table 2. The data sets used to generate these two tables are provided as Additional files 2 and 3.
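The set counts quoted above follow from simple combinatorics: with 100 windows, the non-empty sets of at most two distinct windows number C(100,1) + C(100,2) = 100 + 4950 = 5050, and adding C(100,3) = 161700 gives 166750 sets of at most three windows. That single windows are included in the count is inferred from these totals rather than stated in the text. The sketch below reproduces the windowing and the counts; the constant and function names are hypothetical.

```python
from itertools import combinations
from math import comb

N_MARKERS = 1000  # markers in each simulated data set
WINDOW = 10       # consecutive markers per window

# Split the marker indices into consecutive, non-overlapping windows.
windows = [range(start, start + WINDOW) for start in range(0, N_MARKERS, WINDOW)]


def n_window_sets(n_windows, max_size):
    """Number of non-empty sets of at most max_size distinct windows."""
    return sum(comb(n_windows, k) for k in range(1, max_size + 1))


def window_sets(windows, max_size):
    """Yield every set of 1..max_size distinct windows as a tuple."""
    for k in range(1, max_size + 1):
        yield from combinations(windows, k)


print(len(windows))                              # 100
print(n_window_sets(len(windows), 2))            # 5050
print(n_window_sets(len(windows), 3))            # 166750
print(sum(1 for _ in window_sets(windows, 2)))   # 5050
```

Enumerating the sets explicitly (rather than only counting them) is what an exhaustive search would iterate over, which is why the 3-window case is about 33 times more expensive than the 2-window case.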
Simulation results when G = 0, with 500 cases and 500 controls
| Method | MDR | AntEpiSeeker | BOOST | MegaSNPHunter | KNN-MDR |
|---|---|---|---|---|---|
| Power ( | 0.18 | 0.45 | 0.19 | 0.38 | 0.07 |
The detection threshold α is set to 0.05. The data set used to generate this table is provided as Additional file 5.
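As a rough illustration of how such rates are computed, the sketch below assumes (the table does not say so explicitly) that each entry is the proportion of the 100 simulated replicates in which a method reports a significant interaction at α = 0.05. The helper name `detection_rate` is hypothetical. The second part uses uniformly distributed P-values as a stand-in for data with no true signal, for which a calibrated test should reject in roughly a fraction α of replicates.

```python
import random


def detection_rate(p_values, alpha=0.05):
    """Proportion of replicates whose P-value is strictly below alpha."""
    if not p_values:
        raise ValueError("need at least one replicate")
    return sum(p < alpha for p in p_values) / len(p_values)


# Four replicates, two of them significant at alpha = 0.05.
print(detection_rate([0.01, 0.20, 0.04, 0.50]))  # 0.5

# Under a true null hypothesis, P-values are uniform on [0, 1], so a
# calibrated test should reject in about alpha of the replicates.
rng = random.Random(1)
null_p = [rng.random() for _ in range(100_000)]
print(round(detection_rate(null_p), 3))  # close to 0.05
```

With only 100 replicates per method, as in the table, each rate moves in steps of 0.01.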
Power and corrected power (%) when the parameters K (number of markers) and W (window size) are varied, in 100 simulations with 500 cases and 500 controls and G = 2
| K | Measure | W = 5 | W = 10 | W = 15 | W = 20 |
|---|---|---|---|---|---|
| 5 | Power | 71 | 68 | 62 | 52 |
| 5 | Corrected power | 65 | 62 | 51 | 38 |
| 10 | Power | 70 | 66 | 64 | 56 |
| 10 | Corrected power | 60 | 53 | 51 | 43 |
| 15 | Power | 71 | 65 | 59 | 58 |
| 15 | Corrected power | 59 | 49 | 47 | 44 |
| 20 | Power | 69 | 60 | 56 | 53 |
| 20 | Corrected power | 67 | 55 | 52 | 45 |
The data set used to generate this table is provided as Additional file 6.
The 10 most significant results of the analysis of the rheumatoid arthritis (RA) dataset from the Wellcome Trust Case Control Consortium (WTCCC)
| SNPs | Positions (chromosome:bp) | Testing balanced accuracy | P-value |
|---|---|---|---|
| rs10979420, rs778980 | 9:108634242, 19:5863725 | 0.89 | 2.51 × 10⁻⁶ |
| rs10979420, rs778982 | 9:108634242, 19:5866574 | 0.89 | 2.51 × 10⁻⁶ |
| rs6781338, rs778982 | 3:180060018, 19:5866574 | 0.89 | 2.51 × 10⁻⁶ |
| rs778980, rs17325560 | 19:5863725, 20:2614933 | 0.89 | 2.51 × 10⁻⁶ |
| rs4979291, rs10979420 | 9:107732763, 9:108634242 | 0.89 | 2.51 × 10⁻⁶ |
| rs561259, rs10979420 | 2:79014325, 9:108634242 | 0.89 | 2.51 × 10⁻⁶ |
| rs1862333, rs17325560 | 5:181066946, 20:2614933 | 0.89 | 2.51 × 10⁻⁶ |
| rs1862333, rs485409 | 5:181066946, 18:28918712 | 0.89 | 2.51 × 10⁻⁶ |
| rs571307, rs578044 | 13:29942173, 18:28918696 | 0.89 | 2.51 × 10⁻⁶ |
| rs1169565, rs571307 | 2:71196518, 13:29942173 | 0.88 | 2.51 × 10⁻⁶ |
The first two columns give the names and chromosomal positions of the SNP pairs found to be associated with the phenotype. Positions are given as the chromosome and the SNP's physical position on that chromosome (NCBI human build 35). The third column contains the corresponding balanced accuracies, and the last column reports the P-values computed with an adaptive permutation scheme. The complete table is provided as Additional file 7.
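Balanced accuracy is the mean of sensitivity (accuracy on cases) and specificity (accuracy on controls), which keeps the score meaningful when cases and controls are unbalanced. The caption does not describe the adaptive permutation scheme, so the sketch below assumes a common sequential stopping rule in the style of Besag and Clifford: phenotype labels are permuted until a fixed number `min_exceed` of permuted scores reach the observed score, giving P = min_exceed / (permutations run); if that never happens within `max_perm` permutations, P = (exceedances + 1) / (max_perm + 1). The function names and defaults are hypothetical, and for brevity the statistic scores fixed predictions, whereas a real analysis would refit the model on every permutation.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for 0/1 labels (1 = case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def adaptive_permutation_p(statistic, y, observed, min_exceed=10,
                           max_perm=100_000, seed=0):
    """Sequential permutation P-value with early stopping (assumed scheme).

    statistic: maps a label vector to a score (higher = stronger signal)
    y: observed phenotype labels; observed: statistic(y)
    Weak signals stop after few permutations; strong ones use the full budget.
    """
    rng = random.Random(seed)
    labels = list(y)
    exceed = 0
    for n in range(1, max_perm + 1):
        rng.shuffle(labels)  # a fresh random permutation of the phenotypes
        if statistic(labels) >= observed:
            exceed += 1
            if exceed == min_exceed:
                return exceed / n
    return (exceed + 1) / (max_perm + 1)


y = [1] * 20 + [0] * 20  # 20 cases, 20 controls
pred = list(y)           # a perfect classifier, for illustration only


def score(labels):
    return balanced_accuracy(labels, pred)


print(score(y))  # 1.0
print(adaptive_permutation_p(score, y, score(y), max_perm=2000))
```

Under this rule, the smallest reportable P-value is 1 / (max_perm + 1), so the permutation budget bounds how significant a result can appear.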
Fig. 1 Comparison of the inter-chromosomal interactions detected on the RA dataset by KNN-MDR and by other interaction methods applied to the same dataset (Shchetynsky et al. [47]; Zhang et al. [46])