
Bias from removing read duplication in ultra-deep sequencing experiments.

Wanding Zhou, Tenghui Chen, Hao Zhao, Agda Karina Eterovic, Funda Meric-Bernstam, Gordon B Mills, Ken Chen.

Abstract

MOTIVATION: Identifying subclonal mutations and their implications requires accurate estimation of mutant allele fractions from possibly duplicated sequencing reads. Removing duplicate reads assumes that polymerase chain reaction amplification during library construction is the primary source of duplication. The alternative, sampling coincidence from DNA fragmentation, has not been systematically investigated.
RESULTS: With sufficiently high sequencing depth, sampling-induced read duplication is non-negligible, and removing duplicate reads can overcorrect read counts, causing systemic biases in variant allele fraction and copy number variation estimations. Overcorrection is minimal when duplicate reads are identified with their mate reads taken into account, insert lengths vary and samples are sequenced in separate batches. We investigate sampling-induced read duplication in deep sequencing data with 500× to 2000× duplicates-removed sequence coverage. We provide a quantitative solution to overcorrection and guidance for effective designs of deep sequencing platforms that facilitate accurate estimation of variant allele fraction and copy number variation.
AVAILABILITY AND IMPLEMENTATION: A Python implementation is freely available at https://bitbucket.org/wanding/duprecover/overview
CONTACT: wzhou1@mdanderson.org, kchen3@mdanderson.org
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
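
The sampling-coincidence effect described in RESULTS can be illustrated with a simple occupancy ("birthday problem") calculation. The sketch below is not the authors' duprecover method. It assumes, purely for illustration, that each fragment covering a site independently takes one of n_configurations equally likely (start position, insert length) combinations, with no PCR duplicates at all. The function name and the example numbers are hypothetical.

```python
def expected_sampling_duplicates(n_fragments: int, n_configurations: int) -> float:
    """Expected number of reads that would be flagged as duplicates by chance alone.

    Assumption (illustrative only): every fragment independently picks one of
    `n_configurations` equally likely (start, insert length) combinations, and
    there is no PCR duplication. Any read sharing a configuration with an
    earlier read is then a sampling-induced duplicate.
    """
    if n_fragments < 0 or n_configurations <= 0:
        raise ValueError("n_fragments must be >= 0 and n_configurations > 0")
    # Probability that one particular configuration is never drawn.
    p_unused = (1.0 - 1.0 / n_configurations) ** n_fragments
    # Expected number of distinct configurations observed at least once.
    expected_distinct = n_configurations * (1.0 - p_unused)
    # Everything beyond the distinct configurations is removed by deduplication.
    return n_fragments - expected_distinct


# Hypothetical numbers: 1000 fragments covering a site.
# 100 configurations mimics single-end style matching on start position only;
# 20000 mimics mate-aware matching with 200 distinct insert lengths.
for label, k in [("start only", 100), ("start + insert length", 100 * 200)]:
    dup = expected_sampling_duplicates(1000, k)
    print(f"{label}: about {dup:.1f} of 1000 reads removed as duplicates")
```

Under these assumptions, enlarging the configuration space (mate-aware duplicate identification, more varied insert lengths) sharply reduces the number of independent fragments that deduplication would wrongly discard, which is consistent with the design guidance stated in the abstract.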


Year:  2014        PMID: 24389657      PMCID: PMC3982159          DOI: 10.1093/bioinformatics/btt771

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


