ALE: a generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies.

Scott C Clark, Rob Egan, Peter I Frazier, Zhong Wang.

Abstract

MOTIVATION: Researchers need general-purpose methods for objectively evaluating the accuracy of single-genome and metagenome assemblies and for automatically detecting any errors they may contain. Current methods do not fully meet this need because they require a reference, consider only one of the many aspects of assembly quality or lack statistical justification, and none are designed to evaluate metagenome assemblies.
RESULTS: In this article, we present an Assembly Likelihood Evaluation (ALE) framework that overcomes these limitations, systematically evaluating the accuracy of an assembly in a reference-independent manner using rigorous statistical methods. This framework is comprehensive and integrates read quality, mate pair orientation and insert length (for paired-end reads), sequencing coverage, read alignment and k-mer frequency. ALE pinpoints synthetic errors in both single-genome and metagenomic assemblies, including single-base errors, insertions/deletions, genome rearrangements and chimeric assemblies present in metagenomes. At the genome level with real-world data, ALE identifies three large misassemblies from the Spirochaeta smaragdinae finished genome, which were all independently validated by Pacific Biosciences sequencing. At the single-base level with Illumina data, ALE recovers 215 of 222 (97%) single nucleotide variants in a training set from a GC-rich Rhodobacter sphaeroides genome. Using real Pacific Biosciences data, ALE identifies 12 of 12 synthetic errors in a Lambda Phage genome, surpassing even Pacific Biosciences' own variant caller, EviCons. In summary, the ALE framework provides a comprehensive, reference-independent and statistically rigorous measure of single-genome and metagenome assembly accuracy, which can be used to identify misassemblies or to optimize the assembly process.
AVAILABILITY: ALE is released as open source software under the UoI/NCSA license at http://www.alescore.org. It is implemented in C and Python.
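The idea of combining several likelihood components into one per-position score can be illustrated with a toy sketch. This is not ALE's code or its published model: the normal insert-size model, the Poisson coverage model, the independence assumption, the default parameters and every function name below are hypothetical choices made only to show how sub-scores in log space might be summed and how low-scoring positions might be flagged.

```python
import math


def insert_log_likelihood(insert_len, mean, sd):
    """Log density of a normal distribution (assumed stand-in for an insert-size model)."""
    z = (insert_len - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def coverage_log_likelihood(depth, expected_depth):
    """Log probability of the observed depth under a Poisson model (an assumption)."""
    return depth * math.log(expected_depth) - expected_depth - math.lgamma(depth + 1)


def position_score(insert_lens, depth, mean=300.0, sd=30.0, expected_depth=30.0):
    """Sum the sub-scores in log space, treating them as independent (a simplification)."""
    n = max(len(insert_lens), 1)
    # Average the insert term so that deeper coverage does not dominate the score.
    insert_part = sum(insert_log_likelihood(x, mean, sd) for x in insert_lens) / n
    return insert_part + coverage_log_likelihood(depth, expected_depth)


def flag_low_scores(scores, threshold):
    """Return the indices of positions whose score falls below the threshold."""
    return [i for i, s in enumerate(scores) if s < threshold]


if __name__ == "__main__":
    well_supported = position_score([300, 310, 290], depth=30)
    suspicious = position_score([600, 650], depth=5)
    print(well_supported, suspicious)
    print(flag_low_scores([well_supported, suspicious], threshold=-20.0))
```

In this sketch a position covered by reads with typical insert lengths and expected depth scores higher than one with stretched inserts and low depth, which is the kind of signal a likelihood-based evaluator could use to locate candidate misassemblies without a reference.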

Year:  2013        PMID: 23303509     DOI: 10.1093/bioinformatics/bts723

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  65 in total

Review 1.  The Genome 10K Project: a way forward.

Authors:  Klaus-Peter Koepfli; Benedict Paten; Stephen J O'Brien
Journal:  Annu Rev Anim Biosci       Date:  2015       Impact factor: 8.923

Review 2.  Tutorial: assessing metagenomics software with the CAMI benchmarking toolkit.

Authors:  Fernando Meyer; Till-Robin Lesker; David Koslicki; Adrian Fritz; Alexey Gurevich; Aaron E Darling; Alexander Sczyrba; Andreas Bremges; Alice C McHardy
Journal:  Nat Protoc       Date:  2021-03-01       Impact factor: 13.491

3.  Identifying wrong assemblies in de novo short read primary sequence assembly contigs.

Authors:  Vandna Chawla; Rajnish Kumar; Ravi Shankar
Journal:  J Biosci       Date:  2016-09       Impact factor: 1.826

4.  Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes.

Authors:  Nathan D Olson; Todd J Treangen; Christopher M Hill; Victoria Cepeda-Espinoza; Jay Ghurye; Sergey Koren; Mihai Pop
Journal:  Brief Bioinform       Date:  2019-07-19       Impact factor: 11.622

5.  NxRepair: error correction in de novo sequence assembly using Nextera mate pairs.

Authors:  Rebecca R Murphy; Jared O'Connell; Anthony J Cox; Ole Schulz-Trieglaff
Journal:  PeerJ       Date:  2015-06-02       Impact factor: 2.984

Review 6.  Music of metagenomics-a review of its applications, analysis pipeline, and associated tools.

Authors:  Bilal Wajid; Faria Anwar; Imran Wajid; Haseeb Nisar; Sharoze Meraj; Ali Zafar; Mustafa Kamal Al-Shawaqfeh; Ali Riza Ekti; Asia Khatoon; Jan S Suchodolski
Journal:  Funct Integr Genomics       Date:  2021-10-18       Impact factor: 3.410

7.  Antibiotic failure mediated by a resistant subpopulation in Enterobacter cloacae.

Authors:  Victor I Band; Emily K Crispell; Brooke A Napier; Carmen M Herrera; Greg K Tharp; Kranthi Vavikolanu; Jan Pohl; Timothy D Read; Steven E Bosinger; M Stephen Trent; Eileen M Burd; David S Weiss
Journal:  Nat Microbiol       Date:  2016-05-09       Impact factor: 17.745

8.  Re-examination of two diatom reference genomes using long-read sequencing.

Authors:  Gina V Filloramo; Bruce A Curtis; Emma Blanche; John M Archibald
Journal:  BMC Genomics       Date:  2021-05-24       Impact factor: 3.969

9.  Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data.

Authors:  Inanc Birol; Anthony Raymond; Shaun D Jackman; Stephen Pleasance; Robin Coope; Greg A Taylor; Macaire Man Saint Yuen; Christopher I Keeling; Dana Brand; Benjamin P Vandervalk; Heather Kirk; Pawan Pandoh; Richard A Moore; Yongjun Zhao; Andrew J Mungall; Barry Jaquish; Alvin Yanchuk; Carol Ritland; Brian Boyle; Jean Bousquet; Kermit Ritland; John Mackay; Jörg Bohlmann; Steven J M Jones
Journal:  Bioinformatics       Date:  2013-05-22       Impact factor: 6.937

10.  Advantages of Single-Molecule Real-Time Sequencing in High-GC Content Genomes.

Authors:  Seung Chul Shin; Do Hwan Ahn; Su Jin Kim; Hyoungseok Lee; Tae-Jin Oh; Jong Eun Lee; Hyun Park
Journal:  PLoS One       Date:  2013-07-23       Impact factor: 3.240
