
Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm.

Can Firtina, Jeremie S Kim, Mohammed Alser, Damla Senol Cali, A Ercument Cicek, Can Alkan, Onur Mutlu.

Abstract

MOTIVATION: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject's genome), which is then used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates, and a large proportion of the base pairs in these long reads are incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by fixing errors in the assembly using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can polish an assembly only with reads from a particular sequencing technology, or can polish only a small assembly. This technology dependency and assembly-size dependency force researchers to (i) run multiple polishing algorithms in order to use all available read sets and (ii) split a large genome into small chunks in order to polish it.
RESULTS: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward-Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real read sets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts.
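To make the decoding step (iii) concrete, the following is a minimal sketch of Viterbi decoding on a generic discrete hidden Markov model. It is not Apollo's implementation and does not reproduce its pHMM topology: the two states `X` and `Y`, the symbols `a` and `b`, and all probabilities are hypothetical values chosen only for illustration. In Apollo the model parameters would come from Forward-Backward training on read-to-assembly alignments; here they are simply hard-coded.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, state path) of the most likely hidden path."""

    def log(p):
        # Work in log space to avoid underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy model (illustrative values only, not taken from the paper).
STATES = ["X", "Y"]
START_P = {"X": 0.6, "Y": 0.4}
TRANS_P = {"X": {"X": 0.7, "Y": 0.3}, "Y": {"X": 0.4, "Y": 0.6}}
EMIT_P = {"X": {"a": 0.9, "b": 0.1}, "Y": {"a": 0.2, "b": 0.8}}

if __name__ == "__main__":
    log_prob, path = viterbi(["a", "a", "b", "b"], STATES, START_P, TRANS_P, EMIT_P)
    print(path, math.exp(log_prob))
```

The dynamic program keeps, for every position and state, only the best-scoring path that ends there, so its cost grows linearly with sequence length and quadratically with the number of states in this dense formulation.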
AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/CMU-SAFARI/Apollo.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Year: 2020    PMID: 32167530    DOI: 10.1093/bioinformatics/btaa179

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


