Zhi Wei, Wei Wang, Pingzhao Hu, Gholson J Lyon, Hakon Hakonarson.
Abstract
We develop a statistical tool, SNVer, for calling common and rare variants in the analysis of pooled or individual next-generation sequencing (NGS) data. We formulate variant calling as a hypothesis testing problem and employ a binomial-binomial model to test the significance of the observed allele frequency against sequencing error. SNVer reports a single overall P-value for the significance of a candidate locus being a variant, on which multiplicity control can be based. This is particularly desirable because tens of thousands of loci are examined simultaneously in typical NGS experiments. Each user can choose the false-positive error rate threshold he or she considers appropriate, instead of receiving only the dichotomous 'accept or reject the candidates' decisions provided by most existing methods. We use both simulated and real data to demonstrate the superior performance of our program in comparison with existing methods. SNVer runs very fast and can complete testing of 300 K loci within an hour. This scalability makes it feasible to analyze whole-exome sequencing data, or even whole-genome sequencing data using a high-performance computing cluster. SNVer is freely available at http://snver.sourceforge.net/.
Year: 2011 PMID: 21813454 PMCID: PMC3201884 DOI: 10.1093/nar/gkr599
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
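The abstract's idea of reporting one P-value per locus and leaving the error-rate threshold to the user can be illustrated with a minimal sketch. This is not SNVer's binomial-binomial model: it assumes a plain one-sided binomial test of the alternate-allele read count against a fixed per-base error rate, followed by Benjamini-Hochberg control of the false discovery rate. The function names, the example loci and the error rate of 0.01 are all hypothetical.

```python
from math import comb

def binom_sf(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return one boolean per P-value: True if the locus is called at FDR alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank r with p_(r) <= r * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    called = [False] * m
    for idx in order[:cutoff]:
        called[idx] = True
    return called

# Hypothetical loci: (alternate-allele reads, total depth).
loci = [(0, 30), (1, 30), (3, 30), (12, 30)]
error_rate = 0.01  # assumed per-base sequencing error rate, not from the paper

pvals = [binom_sf(alt, depth, error_rate) for alt, depth in loci]
calls = benjamini_hochberg(pvals, alpha=0.05)

for (alt, depth), p, call in zip(loci, pvals, calls):
    print(f"alt={alt:2d} depth={depth} P={p:.3g} called={call}")
```

Because every locus carries a P-value rather than a yes/no call, changing `alpha` re-thresholds the same results without rerunning the tests.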
Summary of T1D and Autism pooled sequencing and ADHD individual sequencing data sets
| Disease | Platform | Total reads | Read length | #Pools (case) | #Pools (ctrl) | #Individuals per pool | Region | Coverage per individual |
|---|---|---|---|---|---|---|---|---|
| Autism | SOLiD | ~402 M | 50 bp | 11 | 12 | 6 | ~503 kb | ~90× |
| T1D | 454 | ~9.4 M | ~250 bp | 10 | 10 | 48 | ~31 kb | ~80× |
| ADHD | Illumina | ~57 M | 76 bp × 2 | three individuals | | | ~38 Mb | ~20× |
Figure 1. Power (PW) and Type I error rate (Err) of SNVer using single-pool data at low (10×) and high (30×) coverage.
Figure 2. Power (PW) and Type I error rate (Err) of SNVer using multiple-pool data at low (10×) and high (30×) coverage.
Figure 3. Ranking efficiency of the binomial models employed by SNVer versus the Fisher's exact test employed by CRISP.
Figure 4. Correlation between the minor allele frequencies and their estimates in pooled sequencing.
Comparison of SNP calling by CRISP, SAMtools, GATK and SNVer
| Data | No. of SNPs: All | No. of SNPs: Known | No. of SNPs: Novel | dbSNP% | Ti/Tv (a): All | Ti/Tv (a): Known | Ti/Tv (a): Novel | Concordance (b): TP/P (%) |
|---|---|---|---|---|---|---|---|---|
| Autism (pooled) | ||||||||
| Case | ||||||||
| CRISP | 2182 | 1791 | 391 | 82.1 | 1.68 | 1.79 | 1.26 | 101/101 (100) |
| SNVer | 2182 | 1795 | 387 | 82.3 | 1.71 | 1.81 | 1.35 | 102/102 (100) |
| SAMtools | 261 | 260 | 1 | 99.6 | 2.26 | 2.29 | 0/1 | 16/16 (100) |
| Control | ||||||||
| CRISP | 2063 | 1610 | 453 | 78.0 | 1.68 | 1.83 | 1.27 | 96/96 (100) |
| SNVer | 2063 | 1617 | 446 | 78.4 | 1.78 | 1.89 | 1.45 | 95/95 (100) |
| SAMtools | 239 | 238 | 1 | 99.6 | 2.06 | 2.05 | 1/0 | 16/16 (100) |
| T1D (pooled) | ||||||||
| Case | ||||||||
| CRISP | 306 | 93 | 213 | 30.3 | 0.95 | 2.58 | 0.63 | N/A |
| SNVer | 306 | 126 | 180 | 41.2 | 1.71 | 2.15 | 1.47 | |
| SAMtools | 14 | 9 | 5 | 64.3 | 10/4 | 8/1 | 2/3 | |
| Control | ||||||||
| CRISP | 167 | 110 | 57 | 65.9 | 1.49 | 2.93 | 0.46 | |
| SNVer | 167 | 120 | 47 | 71.9 | 2.34 | 3.00 | 1.35 | |
| SAMtools | 18 | 12 | 6 | 66.7 | 14/4 | 11/1 | 3/3 | |
| ADHD (Individual) | ||||||||
| 84 060 | ||||||||
| SNVer | 18 001 | 17 535 | 466 | 97.4 | 2.89 | 2.89 | 2.73 | 4158/4183 (99.4) |
| SAMtools | 48 988 | 47 513 | 1475 | 97.0 | 2.66 | 2.68 | 2.16 | 4437/8116 (54.7) |
| SAMtools (20×) | 15 038 | 14 538 | 500 | 96.7 | 2.70 | 2.72 | 2.11 | 2034/3158 (64.4) |
| GATK | 19 655 | 19 713 | 482 | 97.6 | 2.91 | 2.94 | 2.15 | 4649/4657 (99.8) |
| 84 615 | ||||||||
| SNVer | 17 436 | 16 914 | 522 | 97.0 | 2.85 | 2.87 | 2.22 | 4032/4063 (99.2) |
| SAMtools | 46 037 | 44 489 | 1548 | 96.6 | 2.64 | 2.67 | 1.94 | 4173/7643 (54.4) |
| SAMtools (20×) | 15 510 | 14 942 | 568 | 96.3 | 2.74 | 2.77 | 2.02 | 2062/3247 (63.5) |
| GATK | 18 892 | 18 419 | 473 | 97.5 | 2.89 | 2.92 | 2.03 | 4537/4566 (99.4) |
| 92 157 | ||||||||
| SNVer | 18 676 | 18 208 | 468 | 97.5 | 2.90 | 2.92 | 2.37 | 4192/4224 (99.2) |
| SAMtools | 49 729 | 47 693 | 2036 | 95.9 | 2.69 | 2.73 | 2.03 | 4251/7996 (53.2) |
| SAMtools (20×) | 15 881 | 15 370 | 511 | 96.8 | 2.80 | 2.83 | 1.99 | 2028/3259 (62.2) |
| GATK | 20 100 | 19 631 | 469 | 97.7 | 2.98 | 3.00 | 2.35 | 4700/4710 (99.8) |
(a) Transition to transversion ratio for the identified variants. When the number of variants is small, we report the counts rather than the ratio; e.g. 10/4 for all variants in T1D case by SAMtools means 10 transitions and 4 transversions.
(b) Genotype concordance. P is the number of variants called by each program that were also genotyped. TP is the number of variant calls concordant between the sequencing data and the individual genotyping data.
20×: Additional filtering requiring sequencing depth ≥20 is applied.
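The summary columns in the comparison table follow from simple counts, as the footnotes describe. The sketch below shows one way they could be computed; the helper names and the toy variant list are hypothetical, and only the numeric inputs in the usage lines are taken from the table.

```python
# Purine<->purine and pyrimidine<->pyrimidine substitutions are transitions;
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def ti_tv_counts(variants):
    """Count (transitions, transversions) in an iterable of (ref, alt) bases."""
    ti = tv = 0
    for ref, alt in variants:
        if (ref.upper(), alt.upper()) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti, tv

def percent(part, whole):
    """Percentage rounded to one decimal place, as in the table."""
    return round(100 * part / whole, 1)

# Toy variant list (hypothetical): two transitions and one transversion.
toy = [("A", "G"), ("C", "T"), ("A", "C")]
print(ti_tv_counts(toy))        # (2, 1)

# dbSNP% = Known / All, using the CRISP Autism case row.
print(percent(1791, 2182))      # 82.1

# Concordance = TP / P, using the SNVer row for individual 84 060.
print(percent(4158, 4183))      # 99.4
```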
Figure 5. Correlation between alternate allele frequencies in individually genotyped DNA samples and their estimates in the sequenced DNA pools for the Autism data set. Different symbols represent different depth-of-coverage ranges, as shown in the legend.
Figure 6. (a and b) Comparison of the running time of SNVer and CRISP for testing (a) the T1D 31 kb region and (b) the Autism 503 kb region. The running time of SNVer is mainly determined by the region size (the number of tests), while larger pool numbers and greater sequencing depth take additional time for CRISP.
Informative rankings of four rare variants with the null hypothesis θ ≤ θ0 = 0.01
| SNP | T1D case: Estimated MAF (%) | T1D case: SNVer ranking | T1D case: CRISP call | T1D control: Estimated MAF (%) | T1D control: SNVer ranking | T1D control: CRISP call |
|---|---|---|---|---|---|---|
| rs35337543 | 0.36 | 17 557 | Y | 2.51 | 45 | Y |
| rs35667974 | 0.72 | 17 557 | N | 2.42 | 59 | Y |
| ss107794688 | 0.50 | 17 557 | Y | 1.79 | 56 | Y |
| ss107794687 | 1.07 | 145 | Y | 2.45 | 51 | Y |