Alexander Luedtke, Scott Powers, Ashley Petersen, Alexandra Sitarik, Airat Bekmetjev, Nathan L Tintle.
Abstract
A number of rare variant statistical methods have been proposed for analyzing the impending wave of next-generation sequencing data. To date, there are few direct comparisons of these methods on real sequence data. Furthermore, there is a strong need for practical advice on proper analytic strategies for rare variant analysis. We compare four recently proposed rare variant methods (combined multivariate and collapsing, weighted sum, proportion regression, and cumulative minor allele test) on simulated phenotype and next-generation sequencing data as part of Genetic Analysis Workshop 17. Overall, we find that all analyzed methods have serious practical limitations in identifying causal genes. Specifically, no method has a true discovery rate (percentage of truly causal genes among all those identified as significantly associated with the phenotype) above 5%. Further exploration shows that all methods suffer from inflated false-positive error rates (chance that a noncausal gene will be identified as associated with the phenotype) because of population stratification and gametic phase disequilibrium between noncausal SNPs and causal SNPs. In addition, the observed true-positive rates (chance that a truly causal gene will be identified as significantly associated with the phenotype) were very low (<19%) for all four methods. The combination of larger than anticipated false-positive rates, low true-positive rates, and only about 1% of all genes being causal yields poor discriminatory ability for all four methods. Gametic phase disequilibrium and population stratification are important areas for further research in the analysis of rare variant data.
Year: 2011 PMID: 22373354 PMCID: PMC3287843 DOI: 10.1186/1753-6561-5-S9-S119
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
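The three error rates defined in the abstract can be computed for each simulation replicate and then averaged, as the table footnotes indicate (all values are averaged over 200 replicates). The sketch below assumes each replicate yields the set of genes called significant; the `ReplicateRates` container, the function names, and the toy gene labels are hypothetical and serve only to illustrate the definitions, not to reproduce the authors' code.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class ReplicateRates:
    """Error rates for one simulation replicate, in percent (hypothetical container)."""
    false_positive_rate: float
    true_positive_rate: float
    true_discovery_rate: float


def replicate_rates(all_genes: set[str], causal: set[str], significant: set[str]) -> ReplicateRates:
    noncausal = all_genes - causal
    true_hits = len(significant & causal)
    false_hits = len(significant & noncausal)
    # False-positive rate: chance that a noncausal gene is called significant
    fpr = 100 * false_hits / len(noncausal)
    # True-positive rate: chance that a truly causal gene is called significant
    tpr = 100 * true_hits / len(causal)
    # True discovery rate: share of significant genes that are truly causal.
    # Assumption: report 0 when no gene is called significant in a replicate.
    tdr = 100 * true_hits / len(significant) if significant else 0.0
    return ReplicateRates(fpr, tpr, tdr)


def average_rates(per_replicate: list[ReplicateRates]) -> ReplicateRates:
    """Average each rate across replicates (e.g. the 200 simulated replicates)."""
    return ReplicateRates(
        mean(r.false_positive_rate for r in per_replicate),
        mean(r.true_positive_rate for r in per_replicate),
        mean(r.true_discovery_rate for r in per_replicate),
    )


# Toy replicate: 100 genes, 2 causal, 4 called significant (1 causal, 3 noncausal)
genes = {f"G{i}" for i in range(100)}
rates = replicate_rates(genes, causal={"G0", "G1"}, significant={"G0", "G10", "G11", "G12"})
print(rates)
```

In this toy replicate half of the causal genes are found (TPR 50%), yet only one of the four significant genes is causal (TDR 25%), because the noncausal genes vastly outnumber the causal ones.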
Overall ability of the four rare variant methods to identify genes as significantly associated with the phenotype
| Method | Nominal | | | Nominal | | |
|---|---|---|---|---|---|---|
| | Total number of significant associations | Number of significant associations that are actually causal | True discoveries (%) | Total number of significant associations | Number of significant associations that are actually causal | True discoveries (%) |
| Only SNPs with MAF < 5% | ||||||
| WS | 281.0 | 5.69 | 2.03 | 52.5 | 1.56 | 2.97 |
| CMAT | 201.4 | 4.42 | 2.19 | 38.4 | 1.31 | 3.44 |
| CMC | 256.5 | 3.80 | 1.48 | 38.2 | 0.92 | 2.41 |
| PR | 184.6 | 4.46 | 2.41 | 27.9 | 1.17 | 4.20 |
| All SNPs | ||||||
| WS | 348.7 | 6.81 | 1.95 | 76.1 | 2.05 | 2.69 |
| CMAT | 294.9 | 4.63 | 1.57 | 63.8 | 1.46 | 2.28 |
| CMC | 361.1 | 5.16 | 1.43 | 64.6 | 1.23 | 1.90 |
| PR | 285.6 | 4.74 | 1.66 | 53.7 | 1.39 | 2.59 |
| Nonsynonymous SNPs only | ||||||
| WS | 206.1 | 5.25 | 2.54 | 42.3 | 1.78 | 4.20 |
| CMAT | 173.3 | 4.19 | 2.42 | 35.7 | 1.44 | 4.03 |
| CMC | 223.3 | 4.99 | 2.23 | 38.5 | 1.33 | 3.46 |
| PR | 168.7 | 3.88 | 2.30 | 29.4 | 1.38 | 4.69 |
All values are averaged over 200 replicates. WS, weighted sum; CMAT, cumulative minor allele test; CMC, combined multivariate and collapsing; PR, proportion regression.
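The true-discoveries column is, up to rounding, the number of causal significant associations divided by the total number of significant associations. The sketch below recomputes it from the averaged counts for the "Only SNPs with MAF < 5%" rows of the first "Nominal" group. Small gaps of a few hundredths of a percentage point remain; a plausible (but assumed) explanation is that the published percentages were computed per replicate and then averaged, whereas this sketch divides the averaged counts. The function name `ratio_of_averages` is hypothetical.

```python
# (method, average total significant, average truly causal, reported true discoveries %)
# Values copied from the "Only SNPs with MAF < 5%" rows, first "Nominal" group.
rows = [
    ("WS", 281.0, 5.69, 2.03),
    ("CMAT", 201.4, 4.42, 2.19),
    ("CMC", 256.5, 3.80, 1.48),
    ("PR", 184.6, 4.46, 2.41),
]


def ratio_of_averages(total: float, causal: float) -> float:
    """Percentage of significant associations that are causal, from averaged counts."""
    return 100 * causal / total


for method, total, causal, reported in rows:
    approx = ratio_of_averages(total, causal)
    print(f"{method}: {approx:.2f}% recomputed vs {reported:.2f}% reported")
```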
Overall false- and true-positive rates for the four rare variant methods (significance level 5%)
| Rare variant method | False-positive rate (%) | | | True-positive rate (%) | | |
|---|---|---|---|---|---|---|
| | Only SNPs with MAF < 5% | All SNPs | Only nonsynonymous SNPs | Only SNPs with MAF < 5% | All SNPs | Only nonsynonymous SNPs |
| WS | 9.7 | 10.8 | 9.3 | 15.8 | 18.9 | 14.6 |
| CMAT | 6.9 | 9.2 | 7.8 | 12.3 | 12.8 | 11.6 |
| CMC | 8.9 | 11.2 | 10.1 | 10.6 | 14.3 | 13.9 |
| PR | 6.3 | 8.9 | 7.6 | 12.4 | 13.2 | 10.8 |
All values are averaged over 200 replicates. WS, weighted sum; CMAT, cumulative minor allele test; CMC, combined multivariate and collapsing; PR, proportion regression.
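The abstract attributes the poor discriminatory ability to the combination of inflated false-positive rates, low true-positive rates, and only about 1% of genes being causal. A back-of-the-envelope check treats the rates as expected proportions across genes: with causal proportion π, the expected true discovery rate is approximately π·TPR / (π·TPR + (1 − π)·FPR). The sketch below plugs in the WS "All SNPs" rates from the table above and assumes π = 0.01 exactly; it ignores replicate-to-replicate variation and is an illustration, not the authors' calculation. The function name `expected_tdr` is hypothetical.

```python
def expected_tdr(prop_causal: float, tpr: float, fpr: float) -> float:
    """Approximate expected share of significant genes that are truly causal.

    All inputs are fractions: prop_causal is the share of genes that are causal,
    tpr and fpr are the true- and false-positive rates.
    """
    true_hits = prop_causal * tpr          # expected causal genes called significant
    false_hits = (1 - prop_causal) * fpr   # expected noncausal genes called significant
    return true_hits / (true_hits + false_hits)


# WS, all SNPs: FPR 10.8%, TPR 18.9%; assumed 1% of genes causal
observed = expected_tdr(prop_causal=0.01, tpr=0.189, fpr=0.108)
# Same TPR, but with the false-positive rate held at the nominal 5% level
nominal = expected_tdr(prop_causal=0.01, tpr=0.189, fpr=0.05)
print(f"observed FPR: {100 * observed:.2f}%  nominal FPR: {100 * nominal:.2f}%")
```

The first value is about 1.7%, the same order as the true-discovery percentages in the first table. Even with the false-positive rate at its nominal 5%, the expected true discovery rate stays below 4%, which is why low true-positive rates and a small fraction of causal genes limit discrimination on their own.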