
Predicting the Number of Bases to Attain Sufficient Coverage in High-Throughput Sequencing Experiments.

Chao Deng, Timothy Daley, Peter Calabrese, Jie Ren, Andrew D Smith.

Abstract

For many types of high-throughput sequencing experiments, success in downstream analysis depends on attaining sufficient coverage for individual positions in the genome. For example, when identifying single-nucleotide variants de novo, the number of reads supporting a particular variant call determines our confidence in that variant call. If sequenced reads are distributed uniformly along the genome, the coverage of a nucleotide position is easily approximated by a Poisson distribution, with rate equal to average sequencing depth. Unfortunately, as has become well known, high-throughput sequencing data are never uniform. The numerous factors contributing to variation in coverage have resisted attempts at direct modeling and change along with minor adjustments in the underlying technology. We propose a new nonparametric method to predict the portion of a genome that will attain some specified minimum coverage, as a function of sequencing effort, using information from a shallow sequencing experiment from the same library. Simulations show our approach performs well under an array of distributional assumptions that deviate from uniformity. We applied this approach to estimate coverage at varying depths in single-cell whole-genome sequencing data from multiple protocols. These resulted in highly accurate predictions, demonstrating the effectiveness of our approach in analyzing complexity of sequencing libraries and optimizing design of sequencing experiments.
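The uniform baseline described in the abstract can be illustrated with a small calculation. The sketch below computes the expected fraction of positions reaching a minimum coverage when per-position coverage is Poisson with rate equal to the mean depth, and compares it with a two-component Poisson mixture at the same mean depth. The function names, mixture weights, and rates are hypothetical choices for illustration only. This is not the authors' nonparametric method, which extrapolates from a shallow sequencing run of the same library.

```python
import math


def poisson_cdf(k, rate):
    """Return P(X <= k) for X ~ Poisson(rate), by direct summation."""
    if k < 0:
        return 0.0
    term = math.exp(-rate)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= rate / i  # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)


def frac_covered_uniform(mean_depth, min_cov):
    """Fraction of positions with coverage >= min_cov under a single Poisson."""
    return 1.0 - poisson_cdf(min_cov - 1, mean_depth)


def frac_covered_mixture(weights, rates, min_cov):
    """Same quantity when coverage follows a finite mixture of Poissons.

    weights must sum to 1; rates are the per-component mean depths.
    """
    return sum(
        w * (1.0 - poisson_cdf(min_cov - 1, r)) for w, r in zip(weights, rates)
    )


if __name__ == "__main__":
    # Illustrative numbers only: mean depth 10x, require at least 5 reads.
    uniform = frac_covered_uniform(10.0, 5)
    # Hypothetical uneven library: half the genome at 2x, half at 18x (mean 10x).
    uneven = frac_covered_mixture([0.5, 0.5], [2.0, 18.0], 5)
    print(f"uniform: {uniform:.4f}, uneven: {uneven:.4f}")
```

With these illustrative inputs the uneven library covers a noticeably smaller fraction of positions at the same average depth, which is why a prediction based only on mean depth tends to be optimistic for non-uniform data.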

Keywords:  Padé approximant; genome coverage; mixture of Poisson distributions; nonparametric; single-cell whole-genome sequencing; species accumulation curve

Year:  2019        PMID: 31725321      PMCID: PMC7398442          DOI: 10.1089/cmb.2019.0264

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479



