Heng Li. Medical Population Genetics Program, Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA. hengli@broadinstitute.org
Abstract
MOTIVATION: Most existing methods for DNA sequence analysis rely on accurate sequences or genotypes. However, in applications of next-generation sequencing (NGS), accurate genotypes may not be easily obtained (e.g. in multi-sample low-coverage sequencing or somatic mutation discovery). These applications call for new methods that analyze sequence data with uncertainty.
RESULTS: We present a statistical framework for calling SNPs, discovering somatic mutations, inferring population genetic parameters and performing association tests directly from sequencing data, without explicit genotyping or linkage-based imputation. On real data, we demonstrate that our method achieves accuracy comparable to alternative methods for estimating the site allele count, for inferring the allele frequency spectrum and for association mapping. We also highlight the necessity of using symmetric datasets for finding somatic mutations, and confirm that, for discovering rare events, mismapping is frequently the leading source of errors.
AVAILABILITY: http://samtools.sourceforge.net.
CONTACT: hengli@broadinstitute.org.