Evaluation of the evenness score in next-generation sequencing.

Konrad Oexle

Abstract

The evenness score (E) in next-generation sequencing (NGS) quantifies the homogeneity in coverage of the NGS targets. Here I clarify the mathematical description of E, which is 1 minus the integral from 0 to 1 over the cumulative distribution function F(x) of the normalized coverage x, that is, E = 1 − ∫₀¹ F(x) dx, where normalization means division by the mean. I also derive a computationally more efficient formula: 1 minus the integral from 0 to 1 over the probability density function f(x) times (1 − x), that is, E = 1 − ∫₀¹ f(x)(1 − x) dx. An analogous formula for empirical coverage data is provided, as well as fast R command-line scripts. This new formula allows for a general comparison of E with the coefficient of variation (the standard deviation σ of the normalized data), which is the conventional measure of the relative width of a distribution. For symmetrical distributions, including the Gaussian, E can be predicted closely as 1 − σ²/2 ≥ E ≥ 1 − σ/2, with σ ≤ 1 owing to normalization and symmetry. In the case of the log-normal distribution as a typical representative of positively skewed biological data, the analysis yields E ≈ exp(−σ*/2) with σ*² = ln(σ² + 1) up to large σ (≤ 3), and E ≈ 1 − F(exp(−1)) for very large σ (≥ 2.5). In the latter kind of rather uneven coverage, E can provide direct information on the fraction of well-covered targets that is not immediately delivered by the normalized σ. Otherwise, E does not appear to have major advantages over σ or over a simple score exp(−σ) based on it. In fact, exp(−σ) exploits a much larger part of its range for the evaluation of realistic NGS outputs.
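The R scripts mentioned in the abstract are not reproduced in this record. As a rough illustration only, the formulas above can be sketched in Python. For nonnegative empirical coverage values, the integral of the empirical cumulative distribution function from 0 to 1 equals the mean of max(0, 1 − x_i), so E reduces to the mean of min(x_i, 1) over the normalized coverages x_i. The function names `evenness_score`, `normalized_sigma` and `exp_score` are hypothetical, and the use of the population standard deviation is an assumption; this is not the author's implementation.

```python
from math import exp
from statistics import fmean, pstdev


def evenness_score(coverage):
    """Empirical E = mean(min(x_i, 1)), with x_i = coverage_i / mean coverage.

    Assumes nonnegative per-target coverage values (hypothetical helper).
    """
    mean_cov = fmean(coverage)
    if mean_cov <= 0:
        raise ValueError("mean coverage must be positive")
    # Each target below the mean loses (1 - x_i); targets at or above it lose nothing.
    return fmean([min(c / mean_cov, 1.0) for c in coverage])


def normalized_sigma(coverage):
    """Coefficient of variation: standard deviation of the normalized coverage.

    Uses the population standard deviation (an assumption of this sketch).
    """
    return pstdev(coverage) / fmean(coverage)


def exp_score(coverage):
    """Simple alternative score exp(-sigma) discussed in the abstract."""
    return exp(-normalized_sigma(coverage))


if __name__ == "__main__":
    # Toy, symmetric coverage values chosen for illustration only.
    toy = [10, 20, 30, 40]
    e = evenness_score(toy)
    s = normalized_sigma(toy)
    print("E =", e)
    print("sigma =", s)
    print("bounds for symmetric data:", 1 - s / 2, "<= E <=", 1 - s**2 / 2)
    print("exp(-sigma) =", exp_score(toy))
```

For the toy data the normalized coverages are 0.4, 0.8, 1.2 and 1.6, so E is about 0.8, which lies within the bounds 1 − σ/2 and 1 − σ²/2 stated for symmetrical distributions.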

Year: 2016; PMID: 27074764; DOI: 10.1038/jhg.2016.21

Source DB: PubMed; Journal: J Hum Genet; ISSN: 1434-5161; Impact factor: 3.172

