ISEA: Iterative Seed-Extension Algorithm for De Novo Assembly Using Paired-End Information and Insert Size Distribution.

Min Li, Zhongxiang Liao, Yiming He, Jianxin Wang, Junwei Luo, Yi Pan.   

Abstract

The goal of de novo assembly is to produce contigs that are more contiguous, more complete, and less error-prone. Thanks to the advent of next-generation sequencing (NGS) technologies, the cost of producing high-depth reads has fallen greatly. However, owing to the limitations of NGS, de novo assembly must cope with repeat regions, sequencing errors, and low sequencing coverage in some regions. Although many de novo assembly algorithms have been proposed to address these problems, de novo assembly remains a challenge. In this article, we develop an iterative seed-extension algorithm for de novo assembly, called ISEA. To limit the negative impact of sequencing errors, ISEA uses read overlaps and paired-end information to correct erroneous reads before assembling. While extending seeds in a De Bruijn graph, ISEA uses a carefully designed score function based on paired-end information and the insert size distribution to resolve repeat regions. By employing the insert size distribution, the score function can also reduce the influence of erroneous reads. In scaffolding, ISEA adopts a relaxed strategy to join contigs whose extension was terminated because of low coverage. The performance of ISEA was compared with that of six popular existing assemblers on four real datasets. The experimental results demonstrate that ISEA can effectively obtain longer and more accurate scaffolds.
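
To make the idea of scoring extensions by insert size more concrete, the following is a minimal illustrative sketch, not the ISEA implementation. The abstract does not give the score function, so a Gaussian insert size model and a simple sum of likelihoods are assumed here; all function names and the candidates dictionary are hypothetical.

```python
import math
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def insert_size_likelihood(distance, mean, std):
    """Gaussian density of an implied insert size (assumed model, not from the paper)."""
    z = (distance - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def score_candidate(implied_inserts, mean, std):
    """Sum the likelihoods of the insert sizes implied by the read pairs
    that support one candidate extension (hypothetical scoring rule)."""
    return sum(insert_size_likelihood(d, mean, std) for d in implied_inserts)


def choose_extension(candidates, mean, std):
    """Return the candidate node whose supporting pairs best fit the
    assumed insert size distribution.

    candidates: dict mapping a candidate node to the list of insert sizes
    its supporting read pairs would imply if that node were chosen.
    """
    return max(candidates, key=lambda node: score_candidate(candidates[node], mean, std))


# Toy example: two branches at a repeat boundary, library mean 300, std 20.
graph = build_de_bruijn(["ACGT"], k=3)
candidates = {"ACG": [300, 310, 295], "ACT": [520, 80]}
best = choose_extension(candidates, mean=300, std=20)
```

In this toy case the branch "ACG" is preferred because its implied insert sizes lie close to the library mean, while the pairs supporting "ACT" imply sizes far outside the assumed distribution and contribute almost nothing to the score.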

Year:  2016        PMID: 27076460     DOI: 10.1109/TCBB.2016.2550433

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  4 in total

1.  Genome sequence assembly algorithms and misassembly identification methods. (Review)

Authors:  Yue Meng; Yu Lei; Jianlong Gao; Yuxuan Liu; Enze Ma; Yunhong Ding; Yixin Bian; Hongquan Zu; Yucui Dong; Xiao Zhu
Journal:  Mol Biol Rep       Date:  2022-09-23       Impact factor: 2.742

2.  RepAHR: an improved approach for de novo repeat identification by assembly of the high-frequency reads.

Authors:  Xingyu Liao; Xin Gao; Xiankai Zhang; Fang-Xiang Wu; Jianxin Wang
Journal:  BMC Bioinformatics       Date:  2020-10-19       Impact factor: 3.169

3.  VAliBS: a visual aligner for bisulfite sequences.

Authors:  Min Li; Ping Huang; Xiaodong Yan; Jianxin Wang; Yi Pan; Fang-Xiang Wu
Journal:  BMC Bioinformatics       Date:  2017-10-16       Impact factor: 3.169

4.  A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads.

Authors:  Wenjing Zhang; Neng Huang; Jiantao Zheng; Xingyu Liao; Jianxin Wang; Hong-Dong Li
Journal:  Genes (Basel)       Date:  2019-01-14       Impact factor: 4.096
