
An Efficient Trimming Algorithm based on Multi-Feature Fusion Scoring Model for NGS Data.

Xingyu Liao, Min Li, You Zou, Fang-Xiang Wu, Yi Pan, Jianxin Wang.   

Abstract

Next-generation sequencing (NGS) has driven exponential growth in sequencing data. However, sequence artifacts, including erroneous reads (base-calling errors and small insertions or deletions) and low-quality reads, can significantly affect downstream sequence processing and analysis. Here, we present PE-Trimmer, a sensitive, specialized trimming algorithm for NGS data. First, PE-Trimmer removes technical sequences from paired-end reads based on the characteristics of low-quality reads in NGS data. Second, PE-Trimmer determines the range of reads that need to be trimmed according to a histogram of quality-score statistics for the reads in the library. To improve accuracy, we design a lightweight, easy-to-interpret scoring model to evaluate candidates during the trimming step. Finally, PE-Trimmer selects an appropriate trimming strategy to process low-quality reads based on the location determined by the scoring model. PE-Trimmer can locate and remove adapter residues from paired-end reads. It is easily configurable and offers superior throughput in multi-threaded mode. We test PE-Trimmer on five datasets and compare it with five of the latest methods. The experimental results show that PE-Trimmer produces better results than the other trimmers.
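The abstract does not specify PE-Trimmer's scoring model, thresholds, or adapter-detection rule. As a rough, non-authoritative illustration of the kinds of steps it describes, the sketch below uses well-known stand-ins rather than the paper's method: (1) a library-wide Phred quality histogram with a hypothetical percentile rule for choosing a cutoff, (2) the running-sum 3' quality trimming used by BWA and Cutadapt, and (3) adapter detection by overlapping read 1 with the reverse complement of read 2. Phred+33 encoding, equal mate lengths, and every function name and default value here are assumptions.

```python
from collections import Counter
from typing import Optional, Tuple

# Assumption: qualities use the Phred+33 (Sanger / Illumina 1.8+) encoding.
PHRED_OFFSET = 33
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def phred_scores(qual: str, offset: int = PHRED_OFFSET) -> list:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(c) - offset for c in qual]


def quality_histogram(quality_strings, offset: int = PHRED_OFFSET) -> Counter:
    """Library-wide histogram: Phred score -> number of bases with that score."""
    hist = Counter()
    for qual in quality_strings:
        hist.update(phred_scores(qual, offset))
    return hist


def percentile_threshold(hist: Counter, fraction: float) -> int:
    """Hypothetical cutoff rule (not from the paper): the smallest score q such
    that at least `fraction` of all bases in the library score <= q."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    cumulative = 0
    for q in sorted(hist):
        cumulative += hist[q]
        if cumulative >= fraction * total:
            return q
    return max(hist)


def quality_trim_3prime(scores: list, cutoff: int) -> int:
    """BWA/Cutadapt-style 3' trimming: return `cut` so that read[:cut] is kept.
    Scanning from the 3' end, accumulate (cutoff - q) and cut where this
    running sum is largest; stop as soon as it turns negative."""
    running = 0
    best = 0
    cut = len(scores)
    for i in range(len(scores) - 1, -1, -1):
        running += cutoff - scores[i]
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def insert_length_by_overlap(r1: str, r2: str, min_overlap: int = 10,
                             max_mismatch_rate: float = 0.1) -> Optional[int]:
    """Estimate the insert length n of a read pair (equal mate lengths assumed).
    If the fragment is shorter than the reads, read 1 runs into adapter after
    n bases, and read1[:n] matches the last n bases of revcomp(read2)."""
    rc2 = revcomp(r2)
    longest = min(len(r1), len(rc2))
    for n in range(longest, min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[:n], rc2[len(rc2) - n:]))
        if mismatches <= int(n * max_mismatch_rate):
            return n
    return None


def trim_adapters_pe(r1: str, r2: str, **kwargs) -> Tuple[str, str]:
    """Cut both mates to the detected insert length; return them unchanged
    if no overlap is found or the insert is at least as long as the reads."""
    n = insert_length_by_overlap(r1, r2, **kwargs)
    if n is None or n >= min(len(r1), len(r2)):
        return r1, r2
    return r1[:n], r2[:n]
```

With the running-sum rule, a single good base inside a poor 3' tail does not stop the trim, and a read whose last base is already above the cutoff is left untouched, which is less brittle than cutting at the first base below the cutoff. Scanning overlaps from longest to shortest prefers the largest consistent insert, so a short chance match near `min_overlap` is only accepted when no longer overlap fits.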


Year:  2019        PMID: 30736001     DOI: 10.1109/TCBB.2019.2897558

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  6 in total

1.  msRepDB: a comprehensive repetitive sequence database of over 80 000 species.

Authors:  Xingyu Liao; Kang Hu; Adil Salhi; You Zou; Jianxin Wang; Xin Gao
Journal:  Nucleic Acids Res       Date:  2022-01-07       Impact factor: 16.971

2.  RepAHR: an improved approach for de novo repeat identification by assembly of the high-frequency reads.

Authors:  Xingyu Liao; Xin Gao; Xiankai Zhang; Fang-Xiang Wu; Jianxin Wang
Journal:  BMC Bioinformatics       Date:  2020-10-19       Impact factor: 3.169

3.  CSA: a web service for the complete process of ChIP-Seq analysis.

Authors:  Min Li; Li Tang; Fang-Xiang Wu; Yi Pan; Jianxin Wang
Journal:  BMC Bioinformatics       Date:  2019-12-24       Impact factor: 3.169

4.  A sensitive repeat identification framework based on short and long reads.

Authors:  Xingyu Liao; Min Li; Kang Hu; Fang-Xiang Wu; Xin Gao; Jianxin Wang
Journal:  Nucleic Acids Res       Date:  2021-09-27       Impact factor: 16.971

5.  MAC: Merging Assemblies by Using Adjacency Algebraic Model and Classification.

Authors:  Li Tang; Min Li; Fang-Xiang Wu; Yi Pan; Jianxin Wang
Journal:  Front Genet       Date:  2020-01-31       Impact factor: 4.599

6.  A Large-Scale and Serverless Computational Approach for Improving Quality of NGS Data Supporting Big Multi-Omics Data Analyses.

Authors:  Dariusz Mrozek; Krzysztof Stępień; Piotr Grzesik; Bożena Małysiak-Mrozek
Journal:  Front Genet       Date:  2021-07-13       Impact factor: 4.599
