Yao-Hwei Fang, Yen-Feng Chiu.
Abstract
Advances in next-generation sequencing technologies have enabled the identification of multiple rare single nucleotide polymorphisms involved in diseases or traits. Several strategies for identifying rare variants that contribute to disease susceptibility have recently been proposed. An important feature of many of these statistical methods is the pooling or collapsing of multiple rare single nucleotide variants to achieve a reasonably high frequency and effect. However, if the pooled rare variants are associated with the trait in different directions, the pooling may weaken the signal and thereby reduce statistical power. In the present paper, we propose a backward support vector machine (BSVM)-based variant selection procedure to identify informative disease-associated rare variants. In the selection procedure, the rare variants are weighted and collapsed according to their positive or negative associations with the disease, which may be associated with both common variants and rare variants that have protective, deleterious, or neutral effects. This nonparametric variant selection procedure can account for confounding factors and can also be adopted in other regression frameworks. The results of a simulation study and a data example show that the proposed BSVM approach is more powerful than four other approaches under the considered scenarios, while maintaining valid type I errors.
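The abstract's point about pooling variants with opposite effects can be illustrated with a toy example. The sketch below is not the BSVM procedure; it is a minimal sign-aware burden score under assumed data, and all names (`variant_signs`, `burden`, `case_control_gap`) and the genotype values are hypothetical. It shows that an unsigned collapse of one risk and one protective variant gives cases and controls the same mean score, while a direction-aware collapse separates them.

```python
# Illustrative sketch only: a simple sign-aware burden score, NOT the BSVM method.
# Genotypes are 0/1 carrier indicators; status is 1 for cases and 0 for controls.

def mean(values):
    return sum(values) / len(values)

def variant_signs(genotypes, status):
    """Return +1 (more frequent in cases), -1 (more frequent in controls), or 0 per variant."""
    cases = [g for g, y in zip(genotypes, status) if y == 1]
    controls = [g for g, y in zip(genotypes, status) if y == 0]
    signs = []
    for j in range(len(genotypes[0])):
        diff = mean([g[j] for g in cases]) - mean([g[j] for g in controls])
        signs.append((diff > 0) - (diff < 0))
    return signs

def burden(genotypes, signs):
    """Collapse each individual's variants into one signed score."""
    return [sum(s * x for s, x in zip(signs, g)) for g in genotypes]

def case_control_gap(scores, status):
    """Difference in mean collapsed score between cases and controls."""
    case_scores = [s for s, y in zip(scores, status) if y == 1]
    control_scores = [s for s, y in zip(scores, status) if y == 0]
    return mean(case_scores) - mean(control_scores)

# Hypothetical data: variant 0 is carried only by cases, variant 1 only by controls.
genotypes = [[1, 0], [1, 0], [0, 0], [0, 0],   # cases
             [0, 1], [0, 1], [0, 0], [0, 0]]   # controls
status = [1, 1, 1, 1, 0, 0, 0, 0]

signs = variant_signs(genotypes, status)
naive_gap = case_control_gap(burden(genotypes, [1, 1]), status)   # unsigned pooling
signed_gap = case_control_gap(burden(genotypes, signs), status)   # direction-aware pooling
print(signs, naive_gap, signed_gap)
```

Because the directions here are estimated from the same data that are then compared, a real analysis would need a step such as permutation to keep the type I error valid; that step is omitted from this sketch.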
Year: 2013 PMID: 23940698 PMCID: PMC3737136 DOI: 10.1371/journal.pone.0071114
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Type I error and power of the proposed BSVM approach, based on five weighting schemes at a nominal level of 0.05.
| | | Without protective variants | | | With protective variants | | |
| Weighting scheme | PAR | No. of variants (risk/protective/neutral) | Type I error | Power | No. of variants (risk/protective/neutral) | Type I error | Power |
| RM | 0 | 0/0/20 | 0.05 | – | 0/0/30 | 0.05 | – |
| | 0.03 | 10/0/10 | – | 0.76 | 10/10/10 | – | 0.66 |
| | 0.05 | 10/0/10 | – | 0.99 | 10/10/10 | – | 1 |
| RBt | 0 | 0/0/20 | 0.08 | – | 0/0/30 | 0.09 | – |
| | 0.03 | 10/0/10 | – | 0.72 | 10/10/10 | – | 0.63 |
| | 0.05 | 10/0/10 | – | 0.97 | 10/10/10 | – | 0.99 |
| WSt | 0 | 0/0/20 | 0.07 | – | 0/0/30 | 0.07 | – |
| | 0.03 | 10/0/10 | – | 0.69 | 10/10/10 | – | 0.53 |
| | 0.05 | 10/0/10 | – | 0.93 | 10/10/10 | – | 0.93 |
| Fp | 0 | 0/0/20 | 0.02 | – | 0/0/30 | 0.03 | – |
| | 0.03 | 10/0/10 | – | 0.68 | 10/10/10 | – | 0.55 |
| | 0.05 | 10/0/10 | – | 0.88 | 10/10/10 | – | 0.88 |
| EREC | 0 | 0/0/20 | 0.10 | – | 0/0/30 | 0.1 | – |
| | 0.03 | 10/0/10 | – | 0.85 | 10/10/10 | – | 0.68 |
| | 0.05 | 10/0/10 | – | 1 | 10/10/10 | – | 1 |
Weighting schemes: RM, risk measure (present study); RBt, replication-based test; WSt, weighted-sum test; Fp, score test with the weight function based on frequency estimates in the pooled sample; EREC, score test with the weight function based on the EREC proposed by Lin and Tang. PAR, population-attributable risk.
Type I error for the five approaches.
| N | Nominal level | Fp | EREC | WSt | RBt | BSVM |
| 1000 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.06 |
| | 0.025 | 0.02 | 0.02 | 0.03 | 0.02 | 0.02 |
| | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| 2000 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.06 |
| | 0.025 | 0.02 | 0.02 | 0.03 | 0.03 | 0.03 |
| | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
Figure 1. Power of the five approaches with equal PARs in the presence of different numbers of risk variants.
A. PAR = 0.03 with a sample size of 1000; B. PAR = 0.05 with a sample size of 1000; C. PAR = 0.03 with a sample size of 2000; D. PAR = 0.05 with a sample size of 2000. The nominal level is 0.05.
Figure 2. Power of the five approaches with unequal PARs in the presence of different numbers of risk variants.
A. PAR = 0.03 with a sample size of 1000; B. PAR = 0.05 with a sample size of 1000; C. PAR = 0.03 with a sample size of 2000; D. PAR = 0.05 with a sample size of 2000. The nominal level is 0.05.
Figure 3. Power of the five approaches with equal PARs in the presence of a mixture of risk, neutral, and protective variants.
The sample size is 1000. The nominal level is 0.05. A. PAR = 0.03; B. PAR = 0.05.
Identification of ten significant individual risk variants (P≤0.05) out of 183 rare variants using the SVM method.
| PAR = 0.03 | | PAR = 0.05 | |
| Variant ID | Avg. | Variant ID | Avg. |
| Variant 1 | 0.02799 | Variant 1 | 0.00841 |
| Variant 2 | 0.00510 | Variant 2 | 0.00351 |
| Variant 3 | 0.03591 | Variant 3 | 0.00108 |
| Variant 4 | 0.02532 | Variant 4 | 0.00071 |
| Variant 5 | 0.0259 | Variant 5 | 0.00108 |
| Variant 6 | 0.03815 | Variant 6 | 0.01925 |
| Variant 7 | 0.04197 | Variant 7 | 0.02854 |
| Variant 8 | 0.0151 | Variant 8 | 0.0083 |
| Variant 9 | 0.02398 | Variant 9 | 0.02105 |
| Variant 10 | 0.00471 | Variant 10 | 0.02978 |
Significance of the T1DM genes from the five methods.
| | | Average P-value | | | | |
| Gene | #SNVs | Fp | EREC | WSt | R | BSVM |
| | 29 | 0.00494 | 0.00001 | 0.00028 | 0.00013 | 0.000006 |
| | 45 | 0.24275 | 0.01101 | 0.02347 | 0.01548 | 0.008107 |
SNVs: single nucleotide variants.
Association analysis of four significant variants in the IFIH1 gene from T1DM patients and controls.
| rs# or ss# (for new SNPs) | Location | Major allele | Minor allele | T1DM ChMA | Controls ChMA | | |
| Rare | | | | | | SVM | Exact test |
| rs35667974 | exon 14, I923V | A | g | 7/960 | 23/960 | 0.007 | 0.0049 |
| rs35337543 | intron 8, +1 splice | G | c | 3/960 | 24/960 | <0.0001 | 0.000044 |
| ss107794690 | exon 11, T702I | C | t | 1/960 | 4/960 | 0.0716 | 0.37 |
| ss119336617 | exon 2, N160D | A | g | 0/960 | 2/960 | 0.2495 | 0.5 |
| Common | | | | | | SVM | |
| rs1990760 | exon 15, T946A | A | g | 298/960 | 367/960 | 0.0025 | 0.00086 |
| rs3747517 | exon 13, A843H | G | a | 241/960 | 252/960 | 0.8069 | 0.58 |
ChMA, estimated fraction of chromosomes with minor alleles.
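As a rough check on how the rare-variant counts in the table relate to the "Exact test" column, the sketch below computes a two-sided Fisher exact test from minor-allele chromosome counts. This rests on two assumptions not confirmed by the text: that the column reports a two-sided Fisher test, and that an entry such as 7/960 means 7 minor-allele chromosomes out of 960. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, n1, b, n2):
    """Two-sided Fisher exact P-value for a 2x2 table of chromosome counts.

    a of n1 chromosomes carry the minor allele in group 1 (e.g. T1DM),
    b of n2 chromosomes carry it in group 2 (e.g. controls).
    """
    total_minor = a + b
    denom = comb(n1 + n2, total_minor)

    def pmf(k):
        # Hypergeometric probability that group 1 holds k of the minor alleles.
        return comb(n1, k) * comb(n2, total_minor - k) / denom

    lo = max(0, total_minor - n2)
    hi = min(n1, total_minor)
    p_obs = pmf(a)
    # Sum all tables that are no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p = sum(pmf(k) for k in range(lo, hi + 1) if pmf(k) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Counts taken from the rare-variant rows of the table above.
p_rs35667974 = fisher_exact_two_sided(7, 960, 23, 960)
p_ss107794690 = fisher_exact_two_sided(1, 960, 4, 960)
p_ss119336617 = fisher_exact_two_sided(0, 960, 2, 960)
print(p_rs35667974, p_ss107794690, p_ss119336617)
```

Under these assumptions the computed values can be compared directly with the tabulated exact-test column; any discrepancy would suggest the authors used a different test or a different reading of the counts.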