Bernard J Pope, Tú Nguyen-Dumont, Fabrice Odefrey, Fleur Hammet, Russell Bell, Kayoko Tao, Sean V Tavtigian, David E Goldgar, Andrew Lonie, Melissa C Southey, Daniel J Park.
Abstract
BACKGROUND: Characterising genetic diversity through the analysis of massively parallel sequencing (MPS) data has enormous potential to improve our understanding of the genetic basis of observed phenotypes, including predisposition to and progression of complex human disease. Great challenges remain, however, in distinguishing genuine genetic variants from the millions of artefactual signals.
Year: 2013 PMID: 23441864 PMCID: PMC3599469 DOI: 10.1186/1471-2105-14-65
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Standard bioinformatics pipeline for the analysis of rare single nucleotide variants (SNVs). FAVR methods can assist in shortlisting SNVs through the PE Bias Detector Tool, the Rare and True Filter and the Family Annotate Tool. FAVR integrates readily into standard pipelines because it uses widely adopted file formats (BAM and VCF files).
Figure 2. Number of rare SNVs remaining after different filtering stages applied to SOLiD and TruSeq sequencing data. Mean number (across first-cousin pairs) of rare SNVs remaining: (A) with no further filtering, with the PE Bias Detector Tool only, with the Rare and True Filter only, or with both tools, in five families sequenced using SOLiD chemistry; and (B) with no FAVR filtering or with the Rare and True Filter, in five families sequenced using TruSeq chemistry (see Results and discussion). Data were processed as described in Pre-FAVR bioinformatic processing, and further FAVR filtering was applied as described in FAVR bioinformatic processing (see Methods).
Figure 3. Observed/expected (O/E) number of shared SNVs after different filtering stages applied to SOLiD and TruSeq sequencing data. Mean (across families) O/E number of shared SNVs, assuming that first cousins share on average 12.5% of their DNA: (A) with no further filtering, with the PE Bias Detector Tool only, with the Rare and True Filter only, or with both tools, in five families sequenced using SOLiD chemistry; and (B) with no further filtering or with the Rare and True Filter, in five families sequenced using TruSeq chemistry (see Results and discussion). Error bars indicate 95% confidence intervals. Data were processed as described in Pre-FAVR bioinformatic processing, and further FAVR filtering was then applied as described in FAVR bioinformatic processing (see Methods).
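The O/E comparison in Figure 3 can be illustrated with a short calculation. The 12.5% figure matches the probability that a first cousin carries a specific rare heterozygous allele identical by descent (IBD): first cousins share one allele IBD at a given locus with probability 1/4, and that shared allele is the particular one in question with probability 1/2, giving 1/8. The Python sketch below is hypothetical. The function names `observed_over_expected` and `mean_oe`, the variant-ID strings and, in particular, the use of the mean per-cousin rare-SNV count as the basis for the expected value are illustrative assumptions, not the authors' implementation. Because artefactual calls that recur across individuals are "shared" regardless of relatedness, they inflate the observed count, so an O/E well above 1 is one plausible sign of residual artefacts.

```python
from statistics import mean

# Probability that a first cousin carries a given rare heterozygous allele IBD:
# P(sharing one allele IBD at a locus) = 1/4, times 1/2 for that specific
# allele being the shared one, giving 1/8 = 12.5%.
FIRST_COUSIN_SHARING = 0.125


def observed_over_expected(snvs_a: set[str], snvs_b: set[str],
                           sharing: float = FIRST_COUSIN_SHARING) -> float:
    """O/E ratio of rare SNVs shared by one first-cousin pair.

    Assumption (not stated in the caption): the expected count is the
    sharing fraction times the mean number of rare SNVs called per cousin.
    """
    observed = len(snvs_a & snvs_b)  # SNVs called in both cousins
    expected = sharing * (len(snvs_a) + len(snvs_b)) / 2
    if expected == 0:
        raise ValueError("no rare SNVs called in this pair")
    return observed / expected


def mean_oe(pairs: list[tuple[set[str], set[str]]]) -> float:
    """Mean O/E ratio across families (one cousin pair per family)."""
    return mean(observed_over_expected(a, b) for a, b in pairs)


if __name__ == "__main__":
    # Hypothetical variant IDs: 8 rare SNVs per cousin, 1 of them shared.
    cousin_a = {f"v{i}" for i in range(8)}
    cousin_b = {"v0"} | {f"w{i}" for i in range(7)}
    print(observed_over_expected(cousin_a, cousin_b))  # 1 / (0.125 * 8) = 1.0
```

Averaging per-family ratios (rather than pooling counts) mirrors the caption's "mean (across families)"; the 95% confidence intervals shown in the figure would need the per-family values and are not reproduced here.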
Figure 4. Distribution of VQSLOD scores from the GATK Variant Quality Score Recalibrator. VQSLOD scores are shown for SNV signals without FAVR processing and for the SNV signals kept and removed by FAVR, after processing of SOLiD (A) and TruSeq (B) sequencing data as described in Methods.