Biases and errors on allele frequency estimation and disease association tests of next-generation sequencing of pooled samples.

Xiaowei Chen, Jennifer B Listman, Frank J Slack, Joel Gelernter, Hongyu Zhao.

Abstract

Next-generation sequencing is widely used to study complex diseases because of its ability to identify both common and rare variants without prior single nucleotide polymorphism (SNP) information. Pooled sequencing of implicated target regions can lower costs and allow more samples to be analyzed, thus improving statistical power for disease-associated variant detection. Several methods for disease association tests of pooled data and for optimal pooling designs have been developed under certain assumptions of the pooling process, for example, equal/unequal contributions to the pool, sequencing depth variation, and error rate. However, these simplified assumptions may not portray the many factors affecting pooled sequencing data quality, such as PCR amplification during target capture and sequencing, reference allele preferential bias, and others. As a result, the properties of the observed data may differ substantially from those expected under the simplified assumptions. Here, we use real datasets from targeted sequencing of pooled samples, together with microarray SNP genotypes of the same subjects, to identify and quantify factors (biases and errors) affecting the observed sequencing data. Through simulations, we find that these factors have a significant impact on the accuracy of allele frequency estimation and the power of association tests. Furthermore, we develop a workflow protocol to incorporate these factors in data analysis to reduce the potential biases and errors in pooled sequencing data and to gain better estimation of allele frequencies. The workflow, Psafe, is available at http://bioinformatics.med.yale.edu/group/.
© 2012 Wiley Periodicals, Inc.
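
The effect described in the abstract, where reference allele preferential bias and sequencing error shift the observed allele frequency away from the true pool frequency, can be illustrated with a toy model. The sketch below is not the Psafe workflow and does not reproduce the paper's method. It assumes a single biallelic site, a symmetric per-base error rate, and a single relative capture efficiency for alternate-allele reads; the function names and parameters (`error_rate`, `alt_efficiency`) are hypothetical and chosen only for illustration.

```python
import random


def expected_alt_fraction(p, error_rate, alt_efficiency):
    """Expected observed alternate-read fraction under the toy model.

    p              -- true alternate allele frequency in the pool
    error_rate     -- assumed symmetric base-call error rate (ref<->alt)
    alt_efficiency -- assumed relative capture efficiency of alternate
                      reads (values below 1 mimic reference bias)
    """
    # Reference bias: alternate fragments are under-represented among reads.
    q = alt_efficiency * p / (alt_efficiency * p + (1.0 - p))
    # Sequencing error: a read is flipped with probability error_rate.
    return q * (1.0 - error_rate) + (1.0 - q) * error_rate


def corrected_frequency(f_obs, error_rate, alt_efficiency):
    """Invert the toy model to recover an allele frequency estimate."""
    # Undo the error step, clipping to the valid range [0, 1].
    q = (f_obs - error_rate) / (1.0 - 2.0 * error_rate)
    q = min(1.0, max(0.0, q))
    # Undo the reference-bias step.
    return q / (q + alt_efficiency * (1.0 - q))


def simulate_alt_reads(p, depth, error_rate, alt_efficiency, rng):
    """Draw the number of alternate reads at one site (binomial sampling)."""
    f = expected_alt_fraction(p, error_rate, alt_efficiency)
    return sum(1 for _ in range(depth) if rng.random() < f)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    true_p, depth = 0.05, 20000
    alt = simulate_alt_reads(true_p, depth, 0.01, 0.9, rng)
    naive = alt / depth
    print("naive estimate:    ", naive)
    print("corrected estimate:", corrected_frequency(naive, 0.01, 0.9))
```

In this sketch the naive estimate (alternate reads divided by depth) absorbs both distortions, while the corrected estimate removes them only to the extent that the assumed error rate and efficiency match the data. In practice these quantities would have to be estimated, for example from sites with known genotypes such as the microarray SNP genotypes the abstract mentions.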

Year: 2012; PMID: 22674656; PMCID: PMC3477622; DOI: 10.1002/gepi.21648

Source DB: PubMed; Journal: Genet Epidemiol; ISSN: 0741-0395; Impact factor: 2.135


