Andre Altmann, Peter Weber, Carina Quast, Monika Rex-Haffner, Elisabeth B Binder, Bertram Müller-Myhsok.
Abstract
MOTIVATION: High-throughput sequencing (HTS) technologies are the method of choice for screening the human genome for rare sequence variants causing susceptibility to complex diseases. Unfortunately, preparation of samples for a large number of individuals is still very cost- and labor-intensive. Thus, screens for rare sequence variants have recently been carried out in samples of pooled DNA, in which equimolar amounts of DNA from multiple individuals are mixed prior to sequencing with HTS. The resulting sequence data, however, pose a bioinformatics challenge: the discrimination of sequencing errors from real sequence variants present at a low frequency in the DNA pool.
Year: 2011 PMID: 21685105 PMCID: PMC3117388 DOI: 10.1093/bioinformatics/btr205
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Number of HTS reads along the analysis pipeline
| Pool name | Raw | After QC | For analysis |
|---|---|---|---|
| Cases 1 | 82.2 | 48.9 | 45.1 |
| Cases 2 | 77.5 | 48.6 | 45.4 |
| Controls 1 | 86.5 | 53.3 | 50.1 |
| Controls 2 | 81.7 | 49.2 | 45.8 |
Numbers are given in millions. QC, quality control.
Fig. 1. Statistical power of the Skellam and the Poisson distribution. (a) Statistical power of both models depending on the coverage, with varying allele frequency and a fixed error rate of 2.7 × 10⁻³. (b) Statistical power of both models depending on the coverage, with varying error rate (noise) and a fixed allele frequency. (c) Statistical power on real data for the Skellam model (black solid line) using one controls pool and one cases pool, and for the Poisson model separately on one cases pool (blue solid line) and one controls pool (orange dashed line).
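The dependence of power on coverage, allele frequency and error rate described in this caption can be illustrated with a small calculation. The following is a minimal sketch and not the authors' implementation: it assumes that error reads at a position follow a Poisson distribution with mean `coverage * error_rate`, that a variant is called when the observed non-reference read count reaches a critical value chosen at a hypothetical significance level `alpha = 0.05`, and that a true variant adds `coverage * allele_freq` expected reads. The function names are illustrative, and the Skellam model is not covered.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), with lam > 0.

    Each pmf term is evaluated in log space to avoid overflow.
    """
    if k <= 0:
        return 1.0
    cdf = sum(
        math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        for i in range(k)
    )
    return max(0.0, 1.0 - cdf)


def critical_count(lam_null: float, alpha: float) -> int:
    """Smallest read count t with P(X >= t | errors only) <= alpha."""
    t = 0
    while poisson_sf(t, lam_null) > alpha:
        t += 1
    return t


def poisson_power(coverage: int, allele_freq: float,
                  error_rate: float = 2.7e-3, alpha: float = 0.05) -> float:
    """Probability of calling a variant of the given frequency (assumed model)."""
    lam_null = coverage * error_rate                 # expected error reads only
    lam_alt = coverage * (allele_freq + error_rate)  # errors plus true variant reads
    threshold = critical_count(lam_null, alpha)
    return poisson_sf(threshold, lam_alt)


# Power at a few coverages for an illustrative allele frequency of 1%.
for cov in (100, 1000, 10000):
    print(cov, round(poisson_power(cov, 0.01), 3))
```

Under these assumptions the power rises with coverage because the expected number of true variant reads grows linearly, while the critical count grows more slowly.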
Fig. 2. Scatter plot of MAFs obtained by HTS and MALDI-TOF. (a) SNPs from different validation sets are represented by different symbols, and allele frequencies in the different DNA pools are color coded. (b) Like (a) but zoomed in on allele frequencies below 0.05.
Number of variant positions found in each pool
| Method | Cases 1 | Cases 2 | Controls 1 | Controls 2 | Total |
|---|---|---|---|---|---|
| vipR | – | – | – | – | 371 |
| CRISP | – | – | – | – | 9425 |
| Poisson | 656 | 644 | 701 | 606 | 1223 |
| VarScan | 6711 | 6993 | 7582 | 6715 | 9856 |
| vipR | – | – | – | – | 56 |
| CRISP | – | – | – | – | 29 |
| Poisson | 31 | 31 | 33 | 28 | 42 |
| VarScan | 63 | 54 | 75 | 64 | 100 |
The upper part of the table lists the number of SNPs found in the resequenced region. The lower part lists the number of small deletions identified in the same region.
Fig. 3. Runtime of variant calling algorithms on the TMEM132D dataset as a function of the number of pools. Time was measured in seconds on a single Intel core at 2.67 GHz (with 6 GB memory).
Performance on 82 validated variant positions of set2
| Method | TP | FP | TN | FN | Accuracy | Sensitivity | Specificity | Precision |
|---|---|---|---|---|---|---|---|---|
| vipR | 39 | 11 | 24 | 8 | 0.77 | 0.83 | 0.69 | 0.78 |
| CRISP | 40 | 15 | 20 | 7 | 0.73 | 0.85 | 0.57 | 0.73 |
| Poisson | 30 | 14 | 21 | 17 | 0.62 | 0.64 | 0.60 | 0.68 |
| VarScan | 43 | 28 | 7 | 4 | 0.61 | 0.91 | 0.20 | 0.61 |
P, positives; N, negatives; TP, true positives; FP, false positives; TN, true negatives; FN, false negatives. Accuracy, (TP + TN)/(P + N); sensitivity, TP/P; specificity, TN/N; precision, TP/(TP + FP).
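The performance measures in the table above can be recomputed from the four counts. The sketch below assumes the standard definitions (accuracy = (TP + TN)/(P + N), sensitivity = TP/P, specificity = TN/N, precision = TP/(TP + FP), with P = TP + FN and N = TN + FP); the function name is illustrative and does not come from the source.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, sensitivity, specificity and precision from counts."""
    positives = tp + fn  # P: validated true variant positions
    negatives = tn + fp  # N: validated non-variant positions
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
        "precision": tp / (tp + fp),
    }


# Counts for vipR taken from the table above.
vipr = classification_metrics(tp=39, fp=11, tn=24, fn=8)
print({name: round(value, 2) for name, value in vipr.items()})
# → {'accuracy': 0.77, 'sensitivity': 0.83, 'specificity': 0.69, 'precision': 0.78}
```

Applying the same function to the CRISP, Poisson and VarScan rows reproduces their tabulated values to two decimal places.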
DNA pool-wise performance on validated variant positions of set2
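| Pool | Sensitivity (vipR) | Sensitivity (CRISP) | Sensitivity (Poisson) | Sensitivity (VarScan) | Specificity (vipR) | Specificity (CRISP) | Specificity (Poisson) | Specificity (VarScan) | Precision (vipR) | Precision (CRISP) | Precision (Poisson) | Precision (VarScan) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|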
| Cases 1 | 0.77 | 0.77 | 0.72 | 0.81 | 0.89 | 0.79 | 0.89 | 0.59 | 0.77 | 0.63 | 0.75 | 0.48 |
| Cases 2 | 0.76 | 0.76 | 0.52 | 0.72 | 0.91 | 0.75 | 0.79 | 0.53 | 0.81 | 0.63 | 0.58 | 0.46 |
| Controls 1 | 0.81 | 0.81 | 0.56 | 0.85 | 0.95 | 0.75 | 0.89 | 0.55 | 0.88 | 0.61 | 0.71 | 0.48 |
| Controls 2 | 0.88 | 0.88 | 0.72 | 0.92 | 0.95 | 0.72 | 0.88 | 0.54 | 0.88 | 0.58 | 0.72 | 0.47 |
| Total | 0.80 | 0.80 | 0.60 | 0.82 | 0.92 | 0.75 | 0.86 | 0.55 | 0.83 | 0.61 | 0.68 | 0.47 |
See description of Table 3.