Sequence clustering in bioinformatics: an empirical study.

Quan Zou, Gang Lin, Xingpeng Jiang, Xiangrong Liu, Xiangxiang Zeng.

Abstract

Sequence clustering is a basic bioinformatics task that is attracting renewed attention with the development of metagenomics and microbiomics. The latest sequencing techniques have decreased costs and, as a result, massive amounts of DNA/RNA sequences are being produced. The challenge is to cluster the sequence data using stable, quick and accurate methods. For microbiome sequencing data, 16S ribosomal RNA operational taxonomic units are typically used. However, there is often a gap between algorithm developers and bioinformatics users. Different software tools can produce diverse results, and users can find them difficult to analyze. Understanding the different clustering mechanisms is crucial to understanding the results that they produce. In this review, we selected several popular clustering tools, briefly explained their key computing principles, analyzed their characteristics and compared them using two independent benchmark datasets. Our aim is to assist bioinformatics users in employing suitable clustering tools effectively to analyze big sequencing data. Related data, code and software tools were made accessible at http://lab.malab.cn/~lg/clustering/.

Year:  2018        PMID: 30239587     DOI: 10.1093/bib/bby090

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  38 in total

1.  A critical analysis of state-of-the-art metagenomics OTU clustering algorithms.

Authors:  Ashaq Hussain Bhat; Puniethaa Prabhu; Kalpana Balakrishnan
Journal:  J Biosci       Date:  2019-12       Impact factor: 1.826

2.  Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction.

Authors:  Meng Zhang; Cangzhi Jia; Fuyi Li; Chen Li; Yan Zhu; Tatsuya Akutsu; Geoffrey I Webb; Quan Zou; Lachlan J M Coin; Jiangning Song
Journal:  Brief Bioinform       Date:  2022-03-10       Impact factor: 11.622

3.  Using sequence clustering to identify clinically relevant subphenotypes in patients with COVID-19 admitted to the intensive care unit.

Authors:  Wonsuk Oh; Pushkala Jayaraman; Ashwin S Sawant; Lili Chan; Matthew A Levin; Alexander W Charney; Patricia Kovatch; Benjamin S Glicksberg; Girish N Nadkarni
Journal:  J Am Med Inform Assoc       Date:  2022-01-29       Impact factor: 4.497

4.  AncestralClust: Clustering of Divergent Nucleotide Sequences by Ancestral Sequence Reconstruction using Phylogenetic Trees.

Authors:  Lenore Pipes; Rasmus Nielsen
Journal:  Bioinformatics       Date:  2021-10-20       Impact factor: 6.931

5.  iDNA-MT: Identification DNA Modification Sites in Multiple Species by Using Multi-Task Learning Based a Neural Network Tool.

Authors:  Xiao Yang; Xiucai Ye; Xuehong Li; Lesong Wei
Journal:  Front Genet       Date:  2021-03-31       Impact factor: 4.599

6.  DNN-m6A: A Cross-Species Method for Identifying RNA N6-Methyladenosine Sites Based on Deep Neural Network with Multi-Information Fusion.

Authors:  Lu Zhang; Xinyi Qin; Min Liu; Ziwei Xu; Guangzhong Liu
Journal:  Genes (Basel)       Date:  2021-02-28       Impact factor: 4.096

7.  Accurate identification of RNA D modification using multiple features.

Authors:  Lijun Dou; Wenyang Zhou; Lichao Zhang; Lei Xu; Ke Han
Journal:  RNA Biol       Date:  2021-03-17       Impact factor: 4.652

8.  i4mC-EL: Identifying DNA N4-Methylcytosine Sites in the Mouse Genome Using Ensemble Learning.

Authors:  Yanjuan Li; Zhengnan Zhao; Zhixia Teng
Journal:  Biomed Res Int       Date:  2021-05-29       Impact factor: 3.411

9.  Stable DNA Sequence Over Close-Ending and Pairing Sequences Constraint.

Authors:  Xue Li; Ziqi Wei; Bin Wang; Tao Song
Journal:  Front Genet       Date:  2021-05-17       Impact factor: 4.599

10.  Identification of Causal Genes of COVID-19 Using the SMR Method.

Authors:  Yan Zong; Xiaofei Li
Journal:  Front Genet       Date:  2021-07-05       Impact factor: 4.599
