Mehdi Pirooznia, Melissa Kramer, Jennifer Parla, Fernando S Goes, James B Potash, W Richard McCombie, Peter P Zandi.
Abstract
BACKGROUND: Processing and analyzing the large-scale data generated by next-generation sequencing (NGS) experiments is challenging and is an active area of methods development. Several new bioinformatics tools have been developed for calling sequence variants from NGS data. Here, we validate the variant calls made by these tools and compare their relative accuracy to determine which data processing pipeline is optimal.
Year: 2014 PMID: 25078893 PMCID: PMC4129436 DOI: 10.1186/1479-7364-8-14
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Figure 1. Modular structure of the pipeline for processing next-generation sequencing data. The pipeline contains four modules: (1) mapping, (2) filtering, (3) realignment/recalibration, and (4) variant calling. A detailed description is available at http://metamoodics.org/wes.
Figure 2. Comparison of SNV calling using SAMtools with and without realignment/recalibration on a sample of 30 subjects. Sanger sequencing was performed to evaluate the accuracy of these calls.
Figure 3. Comparison of SNV calls from GATK versus SAMtools using data from 30 subjects. For these comparisons, we used the UnifiedGenotyper algorithm in GATK and mpileup in SAMtools. Sanger sequencing was performed to evaluate the accuracy of these calls.
UnifiedGenotyper Variant Quality Score Recalibration (UGVR) versus Hard Filter (UGHF)
| | | UGHF AA | UGHF AB | UGHF BB |
|---|---|---|---|---|
| UGVR | AA | 513,601 | 31 [5, 0, 0, 26] | 49 [0, 0, 0, 49] |
| | AB | 0 | 296,714 | 0 |
| | BB | 0 | 1,235 [0, 6, 1,222, 7] | 170,818 |
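A genotype comparison table like the one above can be summarized as the fraction of calls that fall on the diagonal. The following is a minimal sketch, not the authors' analysis: it assumes rows are UGVR genotypes and columns are UGHF genotypes, both ordered AA, AB, BB, and it ignores the bracketed breakdowns. The function name `concordance` is hypothetical.

```python
# Cell counts copied from the UGVR versus UGHF table (bracketed breakdowns omitted).
# Assumption: rows are UGVR genotypes, columns are UGHF genotypes, ordered AA, AB, BB.
TABLE = [
    [513_601, 31, 49],
    [0, 296_714, 0],
    [0, 1_235, 170_818],
]


def concordance(table):
    """Return (concordant, total, rate) for a square genotype comparison table."""
    concordant = sum(table[i][i] for i in range(len(table)))  # diagonal cells agree
    total = sum(sum(row) for row in table)                    # all compared calls
    return concordant, total, concordant / total


concordant, total, rate = concordance(TABLE)
print(f"{concordant} of {total} genotype calls agree ({rate:.4%})")
```

Because the rate depends only on the diagonal and the grand total, it is unaffected by which method is assigned to rows and which to columns.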
UnifiedGenotyper Variant Quality Score Recalibration (UGVR) versus HaplotypeCaller Variant Quality Score Recalibration (HCVR)
| | | UGVR AA | UGVR AB | UGVR BB |
|---|---|---|---|---|
| HCVR | AA | 510,296 | 194 [176, 17, 0, 1] | [0, 0, 0, 0] |
| | AB | 196 [60, 133, 2, 1] | 294,595 | 210 [0, 5, 204, 1] |
| | BB | 5 [0, 0, 5, 0] | 230 [0, 10, 219, 1] | 171,086 |
Figure 4. Evaluation of the effect of sequencing parameters on calling accuracy: read depth (A), allele balance (B), and mapping quality (C). We compared the accuracy and missing data rates of the sequencing calls after systematically varying these parameters, using data from 100 subjects with valid genotype data from GWAS on 7,370 SNPs.
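The kind of evaluation described in this caption can be sketched as follows: calls that fail a threshold are treated as missing, and accuracy is measured among the remaining calls against the GWAS genotype. This is an illustrative sketch only. The `Call` record, the function names, the threshold values, the toy data, and the definition of allele balance as the alternate-allele read fraction at heterozygous calls are all assumptions, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Call:
    genotype: str           # sequencing genotype: "AA", "AB", or "BB"
    depth: int              # read depth at the site
    allele_balance: float   # assumed: fraction of reads supporting the alternate allele
    mapping_quality: float  # mapping quality of reads at the site
    truth: str              # genotype from the GWAS array, used as the reference


def apply_filters(call, min_depth, min_mapq, het_ab_range):
    """Return the genotype if the call passes every filter, else None (missing)."""
    if call.depth < min_depth or call.mapping_quality < min_mapq:
        return None
    low, high = het_ab_range
    # Allele balance is only checked for heterozygous calls in this sketch.
    if call.genotype == "AB" and not (low <= call.allele_balance <= high):
        return None
    return call.genotype


def evaluate(calls, min_depth, min_mapq, het_ab_range):
    """Return (accuracy among retained calls, missing data rate)."""
    called = []
    for c in calls:
        genotype = apply_filters(c, min_depth, min_mapq, het_ab_range)
        if genotype is not None:
            called.append((genotype, c.truth))
    missing_rate = 1 - len(called) / len(calls)
    accuracy = sum(g == t for g, t in called) / len(called) if called else float("nan")
    return accuracy, missing_rate


# Hypothetical toy data, for illustration only.
calls = [
    Call("AA", 30, 0.00, 60, "AA"),
    Call("AB", 25, 0.48, 58, "AB"),
    Call("AB", 4, 0.75, 55, "AA"),   # low depth, discordant with the array
    Call("BB", 12, 1.00, 20, "BB"),  # low mapping quality
]

# Vary one parameter at a time while holding the others fixed.
for min_depth in (0, 5, 10, 20):
    accuracy, missing = evaluate(calls, min_depth, min_mapq=0, het_ab_range=(0.0, 1.0))
    print(f"min_depth={min_depth}: accuracy={accuracy:.2f}, missing={missing:.2f}")
```

Reporting accuracy and missing rate together makes the trade-off visible: stricter thresholds may remove discordant calls but also discard more data.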