A probabilistic approach for SNP discovery in high-throughput human resequencing data.

Rose Hoberman, Joana Dias, Bing Ge, Eef Harmsen, Michael Mayhew, Dominique J Verlaan, Tony Kwan, Ken Dewar, Mathieu Blanchette, Tomi Pastinen.

Abstract

New high-throughput sequencing technologies are generating large amounts of sequence data, allowing the development of targeted large-scale resequencing studies. For these studies, accurate identification of polymorphic sites is crucial. Heterozygous sites are particularly difficult to identify, especially in regions of low coverage. We present a new strategy for identifying heterozygous sites in a single individual by using a machine learning approach that generates a heterozygosity score for each chromosomal position. Our approach also facilitates the identification of regions with unequal representation of the two alleles and of other poorly sequenced regions. The availability of confidence scores allows for a principled combination of sequencing results from multiple samples. We evaluate our method on a gold-standard genotype data set from HapMap. We are able to classify sites in this data set as heterozygous or homozygous with 98.5% accuracy. In de novo data, our probabilistic heterozygote detection ("ProbHD") is able to identify 93% of heterozygous sites at a <5% false call rate (FCR), as estimated from independent genotyping results. In a direct comparison of ProbHD with high-coverage 1000 Genomes sequencing available for a subset of our data, we observe >99.9% overall agreement for genotype calls and close to 90% agreement for heterozygote calls. Overall, our data indicate that high-throughput resequencing of human genomic regions requires careful attention to systematic biases in sample preparation as well as in sequence contexts, and that the impact of these biases can be alleviated by machine learning-based sequence analyses, allowing more accurate extraction of true DNA variants.
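To make the idea of a per-position heterozygosity score concrete, here is a minimal sketch. It is not the ProbHD method, which the abstract describes only as a machine learning approach. It swaps in a simple three-genotype binomial model over reference and alternate read counts. The function names, the `error_rate` and `prior_het` values, the equal split of the homozygous prior, and the definition of false call rate used here (the fraction of heterozygote calls that disagree with an independent genotype) are all illustrative assumptions.

```python
from math import comb


def het_posterior(ref_count, alt_count, error_rate=0.01, prior_het=0.001):
    """Posterior probability that a site is heterozygous, given allele counts.

    Simplified binomial model over three genotypes (hom-ref, het, hom-alt).
    error_rate and prior_het are illustrative assumptions, not paper values.
    """
    n = ref_count + alt_count
    if n == 0:
        return prior_het  # no reads at this position: fall back to the prior

    # Probability that a single read shows the alternate allele per genotype.
    p_alt = {"hom_ref": error_rate, "het": 0.5, "hom_alt": 1.0 - error_rate}

    # Assumption: the non-heterozygous prior mass is split equally.
    prior_hom = (1.0 - prior_het) / 2.0
    priors = {"hom_ref": prior_hom, "het": prior_het, "hom_alt": prior_hom}

    # Prior times binomial likelihood of the observed counts.
    weighted = {
        g: priors[g] * comb(n, alt_count) * p ** alt_count * (1.0 - p) ** ref_count
        for g, p in p_alt.items()
    }
    return weighted["het"] / sum(weighted.values())


def sensitivity_and_fcr(scores, truth_is_het, threshold=0.5):
    """Sensitivity and false call rate for heterozygote calls at a threshold.

    truth_is_het holds booleans from an independent genotyping source.
    FCR is taken here as false het calls / all het calls (an assumption).
    """
    called = [s >= threshold for s in scores]
    tp = sum(c and t for c, t in zip(called, truth_is_het))
    fp = sum(c and not t for c, t in zip(called, truth_is_het))
    n_true = sum(truth_is_het)
    n_called = tp + fp
    sensitivity = tp / n_true if n_true else float("nan")
    fcr = fp / n_called if n_called else float("nan")
    return sensitivity, fcr
```

Under this toy model a balanced, well-covered site such as `het_posterior(10, 10)` scores close to 1, while a low-coverage site such as `het_posterior(2, 1)` stays well below 0.5. This mirrors the abstract's point that heterozygous sites are hard to call in regions of low coverage.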

Year:  2009        PMID: 19605794      PMCID: PMC2752119          DOI: 10.1101/gr.092072.109

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


Cited by: 16 in total

1.  Allele-specific chromatin remodeling in the ZPBP2/GSDMB/ORMDL3 locus associated with the risk of asthma and autoimmune disease.

Authors:  Dominique J Verlaan; Soizik Berlivet; Gary M Hunninghake; Anne-Marie Madore; Mathieu Larivière; Sanny Moussette; Elin Grundberg; Tony Kwan; Manon Ouimet; Bing Ge; Rose Hoberman; Marcin Swiatek; Joana Dias; Kevin C L Lam; Vonda Koka; Eef Harmsen; Manuel Soto-Quiros; Lydiana Avila; Juan C Celedón; Scott T Weiss; Ken Dewar; Daniel Sinnett; Catherine Laprise; Benjamin A Raby; Tomi Pastinen; Anna K Naumova
Journal:  Am J Hum Genet       Date:  2009-09       Impact factor: 11.025

2.  Sniper: improved SNP discovery by multiply mapping deep sequenced reads.

Authors:  Daniel F Simola; Junhyong Kim
Journal:  Genome Biol       Date:  2011-06-20       Impact factor: 13.583

3.  Single Nucleotide Polymorphism (SNP) Detection and Genotype Calling from Massively Parallel Sequencing (MPS) Data.

Authors:  Yun Li; Wei Chen; Eric Yi Liu; Yi-Hui Zhou
Journal:  Stat Biosci       Date:  2013-05

4.  Global patterns of cis variation in human cells revealed by high-density allelic expression analysis.

Authors:  Bing Ge; Dmitry K Pokholok; Tony Kwan; Elin Grundberg; Lisanne Morcos; Dominique J Verlaan; Jennie Le; Vonda Koka; Kevin C L Lam; Vincent Gagné; Joana Dias; Rose Hoberman; Alexandre Montpetit; Marie-Michele Joly; Edward J Harvey; Daniel Sinnett; Patrick Beaulieu; Robert Hamon; Alexandru Graziani; Ken Dewar; Eef Harmsen; Jacek Majewski; Harald H H Göring; Anna K Naumova; Mathieu Blanchette; Kevin L Gunderson; Tomi Pastinen
Journal:  Nat Genet       Date:  2009-10-18       Impact factor: 38.330

5.  A framework for variation discovery and genotyping using next-generation DNA sequencing data.

Authors:  Mark A DePristo; Eric Banks; Ryan Poplin; Kiran V Garimella; Jared R Maguire; Christopher Hartl; Anthony A Philippakis; Guillermo del Angel; Manuel A Rivas; Matt Hanna; Aaron McKenna; Tim J Fennell; Andrew M Kernytsky; Andrey Y Sivachenko; Kristian Cibulskis; Stacey B Gabriel; David Altshuler; Mark J Daly
Journal:  Nat Genet       Date:  2011-04-10       Impact factor: 38.330

6.  Genomic DNA sequences from mastodon and woolly mammoth reveal deep speciation of forest and savanna elephants.

Authors:  Nadin Rohland; David Reich; Swapan Mallick; Matthias Meyer; Richard E Green; Nicholas J Georgiadis; Alfred L Roca; Michael Hofreiter
Journal:  PLoS Biol       Date:  2010-12-21       Impact factor: 8.029

7.  Positional information resolves structural variations and uncovers an evolutionarily divergent genetic locus in accessions of Arabidopsis thaliana.

Authors:  Alvina G Lai; Matthew Denton-Giles; Bernd Mueller-Roeber; Jos H M Schippers; Paul P Dijkwel
Journal:  Genome Biol Evol       Date:  2011-05-27       Impact factor: 3.416

8.  Estimation of allele frequency and association mapping using next-generation sequencing data.

Authors:  Su Yeon Kim; Kirk E Lohmueller; Anders Albrechtsen; Yingrui Li; Thorfinn Korneliussen; Geng Tian; Niels Grarup; Tao Jiang; Gitte Andersen; Daniel Witte; Torben Jorgensen; Torben Hansen; Oluf Pedersen; Jun Wang; Rasmus Nielsen
Journal:  BMC Bioinformatics       Date:  2011-06-11       Impact factor: 3.169

9.  Next generation sequence analysis and computational genomics using graphical pipeline workflows.

Authors:  Federica Torri; Ivo D Dinov; Alen Zamanyan; Sam Hobel; Alex Genco; Petros Petrosyan; Andrew P Clark; Zhizhong Liu; Paul Eggert; Jonathan Pierce; James A Knowles; Joseph Ames; Carl Kesselman; Arthur W Toga; Steven G Potkin; Marquis P Vawter; Fabio Macciardi
Journal:  Genes (Basel)       Date:  2012-08-30       Impact factor: 4.096

10.  PyroHMMsnp: an SNP caller for Ion Torrent and 454 sequencing data.

Authors:  Feng Zeng; Rui Jiang; Ting Chen
Journal:  Nucleic Acids Res       Date:  2013-05-21       Impact factor: 16.971
