Toward better understanding of artifacts in variant calling from high-coverage samples.

Heng Li

Abstract

MOTIVATION: Whole-genome high-coverage sequencing has been widely used for personal and cancer genomics as well as in various research areas. However, in the absence of an unbiased whole-genome truth set, the global error rate of variant calls and the leading causal artifacts remain unclear, despite great efforts in evaluating variant calling methods.
RESULTS: We made 10 single nucleotide polymorphism (SNP) and INDEL call sets with two read mappers and five variant callers, on both a haploid human genome and a diploid genome at a similar coverage. By investigating false heterozygous calls in the haploid genome, we identified erroneous realignment in low-complexity regions and the incompleteness of the reference genome with respect to the sample as the two major sources of errors, both of which call for continued improvement. We estimated that the error rate of raw genotype calls is as high as 1 in 10-15 kb, but that the error rate of post-filtered calls is reduced to 1 in 100-200 kb without significant loss of sensitivity.
AVAILABILITY AND IMPLEMENTATION: BWA-MEM alignment and raw variant calls are available at http://bit.ly/1g8XqRt; scripts and miscellaneous data are available at https://github.com/lh3/varcmp.
CONTACT: hengli@broadinstitute.org
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
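
The error rates quoted in RESULTS can be put on a per-genome scale with simple arithmetic. The sketch below is only an illustration and is not taken from the author's scripts: the 3 Gb genome size is an assumed round figure (the region actually surveyed in the paper may be smaller), and the function names `expected_errors` and `kb_between_errors` are hypothetical.

```python
# Illustrative arithmetic only; not part of the paper's varcmp scripts.
ASSUMED_GENOME_BP = 3_000_000_000  # assumption: round 3 Gb human genome


def expected_errors(kb_per_error, genome_bp=ASSUMED_GENOME_BP):
    """Expected number of erroneous calls if one error occurs every `kb_per_error` kb."""
    return genome_bp / (kb_per_error * 1000)


def kb_between_errors(false_calls, surveyed_bp):
    """Inverse view: kb of sequence per error, given a count of false calls
    (for example, heterozygous calls made in a haploid sample)."""
    if false_calls <= 0:
        raise ValueError("false_calls must be positive")
    return surveyed_bp / false_calls / 1000


if __name__ == "__main__":
    # Ranges are the ones stated in the abstract (kb per error).
    for label, low_kb, high_kb in [("raw", 10, 15), ("post-filtered", 100, 200)]:
        most = expected_errors(low_kb)
        fewest = expected_errors(high_kb)
        print(f"{label}: roughly {fewest:,.0f} to {most:,.0f} errors per assumed genome")
```

Under the 3 Gb assumption, a rate of 1 error per 10 kb corresponds to about 300,000 erroneous calls, while 1 per 100 kb corresponds to about 30,000, which gives a sense of the tenfold reduction that filtering provides.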

Year:  2014        PMID: 24974202      PMCID: PMC4271055          DOI: 10.1093/bioinformatics/btu356

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  43 in total

1.  Mapping short DNA sequencing reads and calling variants using mapping quality scores.

Authors:  Heng Li; Jue Ruan; Richard Durbin
Journal:  Genome Res       Date:  2008-08-19       Impact factor: 9.043

2.  A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.

Authors:  Heng Li
Journal:  Bioinformatics       Date:  2011-09-08       Impact factor: 6.937

3.  The role of replicates for error mitigation in next-generation sequencing. (Review)

Authors:  Kimberly Robasky; Nathan E Lewis; George M Church
Journal:  Nat Rev Genet       Date:  2013-12-10       Impact factor: 53.242

4.  Inference of human population history from individual whole-genome sequences.

Authors:  Heng Li; Richard Durbin
Journal:  Nature       Date:  2011-07-13       Impact factor: 49.962

5.  The diploid genome sequence of an individual human.

Authors:  Samuel Levy; Granger Sutton; Pauline C Ng; Lars Feuk; Aaron L Halpern; Brian P Walenz; Nelson Axelrod; Jiaqi Huang; Ewen F Kirkness; Gennady Denisov; Yuan Lin; Jeffrey R MacDonald; Andy Wing Chun Pang; Mary Shago; Timothy B Stockwell; Alexia Tsiamouri; Vineet Bafna; Vikas Bansal; Saul A Kravitz; Dana A Busam; Karen Y Beeson; Tina C McIntosh; Karin A Remington; Josep F Abril; John Gill; Jon Borman; Yu-Hui Rogers; Marvin E Frazier; Stephen W Scherer; Robert L Strausberg; J Craig Venter
Journal:  PLoS Biol       Date:  2007-09-04       Impact factor: 8.029

6.  A simple consensus approach improves somatic mutation prediction accuracy.

Authors:  David L Goode; Sally M Hunter; Maria A Doyle; Tao Ma; Simone M Rowley; David Choong; Georgina L Ryland; Ian G Campbell
Journal:  Genome Med       Date:  2013-09-30       Impact factor: 11.117

7.  An integrated map of genetic variation from 1,092 human genomes.

Authors:  Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal:  Nature       Date:  2012-11-01       Impact factor: 49.962

8.  A comparative analysis of algorithms for somatic SNV detection in cancer.

Authors:  Nicola D Roberts; R Daniel Kortschak; Wendy T Parker; Andreas W Schreiber; Susan Branford; Hamish S Scott; Garique Glonek; David L Adelson
Journal:  Bioinformatics       Date:  2013-07-09       Impact factor: 6.937

9.  Evaluation of next generation sequencing platforms for population targeted sequencing studies.

Authors:  Olivier Harismendy; Pauline C Ng; Robert L Strausberg; Xiaoyun Wang; Timothy B Stockwell; Karen Y Beeson; Nicholas J Schork; Sarah S Murray; Eric J Topol; Samuel Levy; Kelly A Frazer
Journal:  Genome Biol       Date:  2009-03-27       Impact factor: 13.583

10.  Variant callers for next-generation sequencing data: a comparison study.

Authors:  Xiangtao Liu; Shizhong Han; Zuoheng Wang; Joel Gelernter; Bao-Zhu Yang
Journal:  PLoS One       Date:  2013-09-27       Impact factor: 3.240

  338 in total

1.  FermiKit: assembly-based variant calling for Illumina resequencing data.

Authors:  Heng Li
Journal:  Bioinformatics       Date:  2015-07-27       Impact factor: 6.937

2.  Germline PARP4 mutations in patients with primary thyroid and breast cancers.

Authors:  Yuji Ikeda; Kazuma Kiyotani; Poh Yin Yew; Taigo Kato; Kenji Tamura; Kai Lee Yap; Sarah M Nielsen; Jessica L Mester; Charis Eng; Yusuke Nakamura; Raymon H Grogan
Journal:  Endocr Relat Cancer       Date:  2015-12-23       Impact factor: 5.678

3.  Insertions and Deletions Target Lineage-Defining Genes in Human Cancers.

Authors:  Marcin Imielinski; Guangwu Guo; Matthew Meyerson
Journal:  Cell       Date:  2017-01-12       Impact factor: 41.582

4.  Unique bioinformatic approach and comprehensive reanalysis improve diagnostic yield of clinical exomes.

Authors:  Klaus Schmitz-Abe; Qifei Li; Samantha M Rosen; Neeharika Nori; Jill A Madden; Casie A Genetti; Monica H Wojcik; Sadhana Ponnaluri; Cynthia S Gubbels; Jonathan D Picker; Anne H O'Donnell-Luria; Timothy W Yu; Olaf Bodamer; Catherine A Brownstein; Alan H Beggs; Pankaj B Agrawal
Journal:  Eur J Hum Genet       Date:  2019-04-12       Impact factor: 4.246

5.  Distinct error rates for reference and nonreference genotypes estimated by pedigree analysis.

Authors:  Richard J Wang; Predrag Radivojac; Matthew W Hahn
Journal:  Genetics       Date:  2021-03-03       Impact factor: 4.562

6.  The chromosome-scale assembly of the willow genome provides insight into Salicaceae genome evolution.

Authors:  Suyun Wei; Yonghua Yang; Tongming Yin
Journal:  Hortic Res       Date:  2020-04-01       Impact factor: 6.793

7.  CAGI4 SickKids clinical genomes challenge: A pipeline for identifying pathogenic variants.

Authors:  Lipika R Pal; Kunal Kundu; Yizhou Yin; John Moult
Journal:  Hum Mutat       Date:  2017-06-27       Impact factor: 4.878

8.  Complexity and diversity of F8 genetic variations in the 1000 genomes.

Authors:  J N Li; I G Carrero; J F Dong; F L Yu
Journal:  J Thromb Haemost       Date:  2015-10-20       Impact factor: 5.824

9.  Reproductive Longevity Predicts Mutation Rates in Primates.

Authors:  Gregg W C Thomas; Richard J Wang; Arthi Puri; R Alan Harris; Muthuswamy Raveendran; Daniel S T Hughes; Shwetha C Murali; Lawrence E Williams; Harsha Doddapaneni; Donna M Muzny; Richard A Gibbs; Christian R Abee; Mary R Galinski; Kim C Worley; Jeffrey Rogers; Predrag Radivojac; Matthew W Hahn
Journal:  Curr Biol       Date:  2018-09-27       Impact factor: 10.834

10.  A high-quality sponge gourd (Luffa cylindrica) genome.

Authors:  Haibin Wu; Gangjun Zhao; Hao Gong; Junxing Li; Caixia Luo; Xiaoli He; Shaobo Luo; Xiaoming Zheng; Xiaoxi Liu; Jinju Guo; Junqiu Chen; Jianning Luo
Journal:  Hortic Res       Date:  2020-08-01       Impact factor: 6.793
