| Literature DB >> 30334805 |
Binbin Wu, Min Li, Xingyu Liao, Junwei Luo, Fangxiang Wu, Yi Pan, Jianxin Wang.
Abstract
The de novo assembly tools aim at reconstructing genomes from next-generation sequencing (NGS) data. However, the assembly tools usually generate a large amount of contigs containing many misassemblies, which are caused by problems of repetitive regions, chimeric reads and sequencing errors. As they can improve the accuracy of assembly results, detecting and correcting the misassemblies in contigs are appealing, yet challenging. In this study, a novel method, called MEC, is proposed to identify and correct misassemblies in contigs. Based on the insert size distribution of paired-end reads and the statistical analysis of GC-contents, MEC can identify more misassemblies accurately. We evaluate our MEC with the metrics (NA50, NGA50) on four datasets, compared it with the most available misassembly correction tools, and carry out experiments to analyze the influence of MEC on scaffolding results, which shows that MEC can reduce misassemblies effectively and result in quantitative improvements in scaffolding quality. MEC is publicly available at https://github.com/bioinfomaticsCSU/MEC.Entities:
Year: 2018 PMID: 30334805 DOI: 10.1109/TCBB.2018.2876855
Source DB: PubMed Journal: IEEE/ACM Trans Comput Biol Bioinform ISSN: 1545-5963 Impact factor: 3.710