Lingtao Su, Guixia Liu, Han Wang, Yuan Tian, Zhihui Zhou, Liang Han, Lun Yan.
Abstract
Single nucleotide polymorphisms (SNPs) found in genome-wide association studies (GWAS) mainly influence susceptibility to complex diseases, but they still cannot comprehensively explain the relationships between mutations and diseases. Interactions between SNPs are considered so important for a deeper understanding of those relationships that several strategies have been proposed to explore them. However, some of those methods perform poorly when the marginal effects of disease loci are weak or absent, others do not consider high-order SNP interactions, and few meet the requirements for both performance and accuracy. For these reasons, detection methods should take into account not only low-order but also high-order SNP interactions, as well as main-effect SNPs, at an acceptable computational complexity. In this paper, a new pairwise (or low-order) interaction detection method, IG (Interaction Gain), is introduced; it requires no disease model and uses parallel computing. Furthermore, high-order SNP interactions are detected by finding closely connected functional modules in the network constructed from the IG detection results. Tested on a wide range of simulated data sets and four real WTCCC data sets, the proposed methods accurately detected both low-order and high-order SNP interactions as well as disease-associated main-effect SNPs, and surpassed all competitors in performance. The research will advance complex disease research by providing more reliable SNP interactions.
Year: 2015 PMID: 25763929 PMCID: PMC4357495 DOI: 10.1371/journal.pone.0119146
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Details of the real data sets obtained from WTCCC.
| Data Sets | SNP | Case/Control | Chromosome |
|---|---|---|---|
| BD (bipolar disorder) | 6865 | 1998/3000 | 21 |
| HT (hypertension) | 6207 | 2000/3000 | 22 |
| CAD (coronary artery disease) | 6114 | 1988/3000 | 19 |
| T2D (type 2 diabetes) | 4847 | 1999/3000 | 14 |
Fig 1. Relationship between interaction gain value and SNP base pair distance.
(a) Interaction gain values calculated from SNP pairs in which both SNPs are located in regions between genes. (b) Interaction gain values calculated from SNP pairs in which both SNPs are located on the same gene. (c) Interaction gain values calculated from SNP pairs in which one SNP is located on a gene and the other in a region between genes. (d) Interaction gain values calculated from SNP pairs in which both SNPs are located on genes, but on two different genes. α = 0.1 was used throughout the computation.
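The captions above refer to an interaction gain value, but this record does not state its formula. Purely as an illustration, the sketch below uses the generic information-theoretic interaction gain, I(A,B;C) − I(A;C) − I(B;C), estimated from genotype and phenotype counts. That choice is an assumption and may differ from the IG defined in the paper; the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete observations."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired observations."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_gain(snp_a, snp_b, phenotype):
    """Generic interaction gain: I(A,B;C) - I(A;C) - I(B;C).

    ASSUMPTION: this is the textbook definition, not necessarily the
    IG statistic used in the paper.
    """
    joint = list(zip(snp_a, snp_b))
    return (mutual_information(joint, phenotype)
            - mutual_information(snp_a, phenotype)
            - mutual_information(snp_b, phenotype))


# Toy XOR example: neither SNP alone predicts the phenotype, the pair does.
toy_a = [0, 0, 1, 1]
toy_b = [0, 1, 0, 1]
toy_phenotype = [0, 1, 1, 0]
toy_ig = interaction_gain(toy_a, toy_b, toy_phenotype)
```

In the toy example each SNP carries no marginal information about the phenotype, so the whole association shows up as a positive interaction gain, which is the situation the abstract describes as weak or absent marginal effects.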
Four models used for simulated data generation.
| Model 1 | bb | Bb | BB | Model 2 | bb | Bb | BB |
|---|---|---|---|---|---|---|---|
| aa | δ | δ | δ | aa | δ | δ (1+t) | δ (1+t) |
| Aa | δ | δ (1+t) | δ (1+t)² | Aa | δ (1+t) | δ | δ |
| AA | δ | δ (1+t)² | δ (1+t)⁴ | AA | δ (1+t) | δ | δ |
| Model 3 | bb | Bb | BB | Model 4 | bb | Bb | BB |
| aa | δ | δ | δ (1+t) | aa | δ | δ (1+t) | δ |
| Aa | δ | δ (1+t) | δ | Aa | δ (1+t) | δ | δ (1+t) |
| AA | δ (1+t) | δ | δ | AA | δ | δ (1+t) | δ |
δ represents the impact value of the genotype at SNP location when there is no epistasis between SNPs; t represents the change of impact value when there are interactions between SNPs.
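The four tables can be generated from δ and t in a few lines. The sketch below is a minimal illustration, assuming the Model 1 entries for the Aa/BB, AA/Bb and AA/BB cells are (1+t) raised to the powers 2, 2 and 4 (a reading of the flattened notation). The function name `model_tables` is hypothetical.

```python
def model_tables(delta, t):
    """Return the four 3x3 tables as nested lists.

    Rows are the genotypes aa, Aa, AA; columns are bb, Bb, BB.
    ASSUMPTION: in Model 1 the trailing 2 and 4 are exponents of (1 + t).
    """
    d = delta
    u = delta * (1 + t)
    model1 = [[d, d, d],
              [d, u, d * (1 + t) ** 2],
              [d, d * (1 + t) ** 2, d * (1 + t) ** 4]]
    model2 = [[d, u, u],
              [u, d, d],
              [u, d, d]]
    model3 = [[d, d, u],
              [d, u, d],
              [u, d, d]]
    model4 = [[d, u, d],
              [u, d, u],
              [d, u, d]]
    return {1: model1, 2: model2, 3: model3, 4: model4}


# Illustrative values only; the record does not list the delta and t used.
example_tables = model_tables(delta=1.0, t=1.0)
```

Setting t = 0 makes every cell equal to δ in all four models, which matches the caption's description of δ as the value when there is no epistasis.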
Mapping results of all the SNPs studied.
| Data sets | Chromosome | All | On Gene | Between Gene | Gene |
|---|---|---|---|---|---|
| HT | 22 | 6207 | 2869 | 3338 | 330 |
| BD | 21 | 6865 | 3089 | 3776 | 280 |
| CAD | 19 | 6114 | 3507 | 2607 | 1028 |
| T2D | 14 | 4847 | 1804 | 3043 | 192 |
All: the number of SNPs considered; On Gene: the number of SNPs mapped on genes; Between Gene: the number of SNPs mapped between genes; Gene: the number of genes mapped by SNPs
Fig 2. Node degree distribution of the SNP-SNP interaction network constructed with α = 0.1.
Degree is calculated by counting the edges of a SNP in the network.
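Counting edges per SNP and then tallying how many SNPs share each degree can be sketched as follows. The edge list and SNP labels are toy placeholders, not data from the paper, and the network is assumed to be undirected with each edge listed once.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree value to the number of SNPs that have it.

    `edges` is an iterable of (snp_a, snp_b) pairs, one per undirected edge.
    """
    degree = Counter()
    for snp_a, snp_b in edges:
        degree[snp_a] += 1
        degree[snp_b] += 1
    return Counter(degree.values())


# Toy network with hypothetical SNP labels.
toy_edges = [("s1", "s2"), ("s1", "s3"), ("s2", "s3"), ("s3", "s4")]
toy_degree_distribution = degree_distribution(toy_edges)
```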
Fig 3. Shortest path length distribution of the SNP-SNP interaction network constructed with α = 0.1.
Shortest path means the shortest path between any two SNPs among all the possible paths. Frequency means how many SNP pairs have a certain shortest path length.
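A frequency table of this kind can be computed with a breadth-first search from every node. The sketch below assumes an unweighted, undirected network, counts each unordered pair once, and ignores pairs with no connecting path; breadth-first search is chosen here for illustration and is not stated in the source.

```python
from collections import Counter, deque


def shortest_path_length_distribution(edges):
    """Map each shortest path length to the number of node pairs having it."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    frequency = Counter()
    for source in adjacency:
        # Breadth-first search gives hop distances from `source`.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        for target, hops in distance.items():
            if source < target:  # count each unordered pair once
                frequency[hops] += 1
    return frequency


# Toy path graph a - b - c - d with hypothetical labels.
toy_path_frequency = shortest_path_length_distribution(
    [("a", "b"), ("b", "c"), ("c", "d")]
)
```

The same function applies unchanged to a protein-protein interaction edge list, since only node labels and edges are used.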
Fig 4. Shortest path length distribution of the PPI network.
Shortest path means the shortest path between any two proteins among all the possible paths. Frequency means how many protein pairs have a certain shortest path length.
Relationship between α, p-value, and FDR on the T2D (type 2 diabetes) data set.
| α | SNP Interaction Pair Number | p-value | Significant threshold (Bonferroni correction) | FDR |
|---|---|---|---|---|
| 0.001 | 895654 | 0.0682 | 2.37e-9 | 1.00 |
| 0.002 | 90160 | 0.0038 | 2.37e-9 | 0.89 |
| 0.003 | 34584 | 1.7e-4 | 2.37e-9 | 0.11 |
| 0.004 | 26764 | 6.87e-6 | 2.37e-9 | 0.0054 |
| 0.005 | 24264 | 2.37e-7 | 2.37e-9 | 2.06e-4 |
| 0.01 | 16911 | <2.37e-8 | 2.37e-9 | <0.1e-6 |
| 0.05 | 6456 | <2.37e-9 | 2.37e-9 | <0.1e-9 |
| 0.1 | 2789 | ≤2.37e-10 | 2.37e-9 | <0.1e-10 |
All SNP interaction pairs in LD were deleted. There are 4595 SNPs in all, so 4595 × 4595 = 21114025 pairs of SNPs are tested under the null hypothesis. The significance threshold after Bonferroni correction is 0.05/21114025 = 2.37e-9.
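The threshold arithmetic in the note above can be reproduced directly. The function name and the `significance_level` parameter are hypothetical labels; the parameter is named that way to keep it distinct from the IG threshold α in the table.

```python
def bonferroni_threshold(n_snps, significance_level=0.05):
    """Return (number of tests, per-test threshold) for all n_snps x n_snps pairs."""
    n_tests = n_snps * n_snps
    return n_tests, significance_level / n_tests


n_tests, threshold = bonferroni_threshold(4595)
threshold_text = f"{threshold:.2e}"
```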
Fig 5. Power comparison between BOOST, PLINK and our method (IG) based on interaction gain.
The power is calculated as the proportion of the 100 data sets in which the interactions of the disease-associated SNPs are detected. The absence of bars indicates no power. MAF means minor allele frequency.
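As a minimal sketch of this definition, power is the fraction of simulated data sets in which the interaction was detected. The detection flags below are made up for illustration and are not results from the paper.

```python
def detection_power(detected_flags):
    """Proportion of data sets in which the disease-associated interaction was found."""
    return sum(detected_flags) / len(detected_flags)


# Hypothetical outcome: detected in 37 of 100 simulated data sets.
toy_flags = [True] * 37 + [False] * 63
toy_power = detection_power(toy_flags)
```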
Fig 6. Comparison of the performance of IG, BOOST and SNPsyn in identifying SNPs associated with the corresponding disease.
For the comparison, the top 150 SNPs predicted by RandomForest in each WTCCC data set were used as the reference set (details are shown in S2 File). For each of the other three methods, the top 150 most significant SNPs were likewise taken.
Fig 7. Power comparison of BOOST, IG and SNPsyn in detecting disease-associated SNP-SNP interaction pairs using the T2D and CAD data sets.
The power is defined as the ratio of disease-related genes to all mapped genes.
Five SNP functional modules detected by MCODE in the HT data set.
| SNP functional module ID | SNP members | Genes Involved | SNPs In PubMed | Functional module |
|---|---|---|---|---|
| 1 | rs2017874,rs738536,rs738387,rs8138930,rs4820483,rs738378,rs13433641,rs4822208,rs2273142 | PACSIN2, ARFGAP3 | rs738536(ID: 20018033) | See |
| 2 | rs9605422,rs2165971,rs2075453,rs4819644,rs2075444,rs1057721,rs2016042,rs11917,rs2075455,rs873387 | MICAL3 | no | See |
| 3 | rs7289941,rs1034589,rs2413035,rs5753659,rs2106294,rs5998067,rs6518752,rs9609297,rs7287267 | RNF185, LIMK2,SFI1,EIF4ENIF1 | rs2106294(ID: 21150874) | See |
| 4 | rs1569492,rs5757187,rs1946990,rs6519120,rs5757203,rs1056610,rs10135,rs5757133,rs138702,rs6001173,rs4820335,rs138703 | JOSD1, SUN2, TOMM22, LOC646851, DMC1 | rs5757133(ID: 20084279) | See |
| 5 | rs5770112,rs916251,rs2688171,rs2253004,rs2688155,rs5770111,rs2688148,rs5769491 | NULL | rs5770111(ID: 17357082) | See |
SNP members are the SNPs that fall into the same SNP complex. SNPs In PubMed lists the SNPs in the SNP functional module that have been studied by other researchers.
Fig 8. Relationship between α and the numbers of nodes and edges in the constructed SNP interaction network.
α is the threshold value.
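How the network size depends on α can be sketched by filtering scored SNP pairs and counting what remains. The sketch assumes a pair becomes an edge when its interaction gain is at least α, which is consistent with pair counts shrinking as α grows but is not confirmed by the source. The scores and labels are toy values.

```python
def network_size(scored_pairs, alpha):
    """Return (node count, edge count) of the network kept at threshold alpha.

    `scored_pairs` holds (snp_a, snp_b, interaction_gain) triples.
    ASSUMPTION: an edge is kept when interaction_gain >= alpha.
    """
    edges = {
        tuple(sorted((snp_a, snp_b)))
        for snp_a, snp_b, gain in scored_pairs
        if gain >= alpha
    }
    nodes = {snp for edge in edges for snp in edge}
    return len(nodes), len(edges)


# Toy scores with hypothetical SNP labels.
toy_pairs = [
    ("s1", "s2", 0.20),
    ("s1", "s3", 0.05),
    ("s3", "s4", 0.12),
    ("s2", "s4", 0.002),
]
toy_sizes = {alpha: network_size(toy_pairs, alpha) for alpha in (0.01, 0.1, 0.15)}
```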