U50: A New Metric for Measuring Assembly Output Based on Non-Overlapping, Target-Specific Contigs.

Christina J Castro, Terry Fei Fan Ng.

Abstract

Advances in next-generation sequencing technologies enable routine genome sequencing, generating millions of short reads. A crucial step in full-genome analysis is de novo assembly, and the performance of different assembly methods is currently measured by a metric called N50. However, N50 can produce skewed, inaccurate results when complex data are analyzed, especially for viral and microbial datasets. To provide a better assessment of assembly output, we developed a new metric called U50. U50 identifies unique, target-specific contigs by using a reference genome as a baseline, with the aim of circumventing some limitations inherent to the N50 metric. Specifically, the U50 program uses a mask array to remove overlapping sequence among multiple contigs, so that assembly performance is measured only by unique contigs. We compared simulated and real datasets using U50 and N50, and our results demonstrated that U50 has the following advantages over N50: (1) reducing erroneously large N50 values due to a poor assembly, (2) eliminating overinflated N50 values caused by large measurements from overlapping contigs, (3) eliminating diminished N50 values caused by an abundance of small contigs, and (4) allowing comparisons across different platforms or samples based on the new percentage-based metric UG50%. The U50 metric allows a more accurate measure of assembly performance by analyzing only the unique, non-overlapping contigs. In addition, most viral and microbial sequencing datasets have high background noise (i.e., host and other non-target sequences), which contributes to a skewed, misrepresented N50 value; U50 corrects this. UG50% can also be used to compare assembly results from different samples or studies, a cross-comparison that cannot be performed with N50.
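The mask-array idea described in the abstract can be illustrated with a small sketch. This is not the authors' U50 program: it assumes the contigs have already been aligned to the reference and are supplied as half-open `(start, end)` coordinates, and the function names `n50`, `unique_lengths`, and `u50` are hypothetical. Contigs that do not map to the reference (background) are assumed to have been excluded from the input.

```python
def n50(lengths):
    """Return the length L such that contigs of length >= L
    contain at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def unique_lengths(intervals, ref_len):
    """For each contig (longest first), count the reference positions
    it covers that no longer contig has already covered.

    Assumption: intervals are half-open (start, end) reference
    coordinates produced by a separate alignment step.
    """
    mask = [False] * ref_len  # one flag per reference position
    result = []
    by_length = sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True)
    for start, end in by_length:
        new_bases = 0
        for pos in range(max(start, 0), min(end, ref_len)):
            if not mask[pos]:
                mask[pos] = True
                new_bases += 1
        if new_bases:  # contigs that add nothing new are dropped
            result.append(new_bases)
    return result


def u50(intervals, ref_len):
    """N50-style statistic computed over unique, non-overlapping lengths."""
    return n50(unique_lengths(intervals, ref_len))


# Hypothetical example: four contigs on a 100-base reference,
# three of which overlap the longest one.
contigs = [(0, 60), (10, 50), (50, 80), (20, 60)]
raw_n50 = n50([end - start for start, end in contigs])
masked = unique_lengths(contigs, 100)
masked_u50 = u50(contigs, 100)
```

In this toy input the two 40-base contigs lie entirely inside the 60-base contig, so they contribute no unique bases and are left out of the masked statistic, whereas the raw N50 counts them in full.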

Keywords:  N50; U50; genome assembly; next-generation sequencing

Year: 2017    PMID: 28418726    PMCID: PMC5783553    DOI: 10.1089/cmb.2017.0013

Source DB: PubMed    Journal: J Comput Biol    ISSN: 1066-5277    Impact factor: 1.479