
A New Unsupervised Binning Approach for Metagenomic Sequences Based on N-grams and Automatic Feature Weighting.

Ruiqi Liao, Ruichang Zhang, Jihong Guan, Shuigeng Zhou.   

Abstract

The rapid development of high-throughput technologies enables researchers to sequence the whole metagenome of a microbial community sampled directly from the environment. Assigning these sequence reads to species or taxonomic classes, known as binning, is a crucial step in metagenomic analysis. Most traditional binning methods rely on known reference genomes for accurate assignment and therefore cannot classify reads from unknown species that lack close references. To overcome this drawback, unsupervised learning approaches have been proposed that do not require any reference genome. In this paper, we introduce a novel unsupervised method called MCluster for binning metagenomic sequences. The method uses N-grams to extract sequence features and applies automatic feature weighting to improve the performance of the basic K-means clustering algorithm. We evaluate MCluster on a variety of simulated data sets and one real data set, and compare it with three recent binning methods: AbundanceBin, MetaCluster 3.0, and MetaCluster 5.0. Experimental results show that MCluster achieves clearly better overall performance (F-measure) than AbundanceBin and MetaCluster 3.0 on long metagenomic reads (≥800 bp). Compared with MetaCluster 5.0 on short metagenomic reads (<300 bp), MCluster obtains higher sensitivity and a comparable yet more stable F-measure. This suggests that MCluster can serve as a promising tool for effectively binning metagenomic sequences.
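To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch of N-gram feature extraction followed by K-means with automatically learned feature weights. It is not the MCluster implementation: the abstract does not state which weighting scheme is used, so the dispersion-based weight update here (features with smaller within-cluster spread receive larger weights) is an assumption chosen for illustration. The function names, the exponent `beta`, the farthest-point initialization, and the toy reads are likewise hypothetical.

```python
import itertools

ALPHABET = "ACGT"


def ngram_profile(read, n=2):
    """Normalized frequency vector over all 4**n N-grams of a read."""
    index = {"".join(p): i
             for i, p in enumerate(itertools.product(ALPHABET, repeat=n))}
    counts = [0.0] * len(index)
    for i in range(len(read) - n + 1):
        j = index.get(read[i:i + n])
        if j is not None:  # skip N-grams containing ambiguous bases
            counts[j] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def weighted_sq_dist(p, q, weights, beta):
    """Squared Euclidean distance with per-feature weights raised to beta."""
    return sum((w ** beta) * (x - y) ** 2 for w, x, y in zip(weights, p, q))


def weighted_kmeans(points, k, beta=2.0, iters=20):
    """K-means with a dispersion-based feature-weight update.

    Assumption (not from the abstract): weights are proportional to
    (1 / D_j) ** (1 / (beta - 1)), where D_j is the within-cluster
    dispersion of feature j; features with zero dispersion get weight 0.
    Requires beta > 1.
    """
    dim = len(points[0])
    weights = [1.0 / dim] * dim
    # Deterministic farthest-point initialization (illustrative choice).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(
            weighted_sq_dist(p, c, weights, beta) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # 1. Assign each point to its nearest center in weighted space.
        labels = [min(range(k), key=lambda c: weighted_sq_dist(
            p, centers[c], weights, beta)) for p in points]
        # 2. Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # 3. Update feature weights from within-cluster dispersion.
        disp = [0.0] * dim
        for p, lab in zip(points, labels):
            for j in range(dim):
                disp[j] += (p[j] - centers[lab][j]) ** 2
        inv = [(1.0 / d) ** (1.0 / (beta - 1.0)) if d > 1e-12 else 0.0
               for d in disp]
        total = sum(inv)
        if total > 0:  # keep the previous weights if every dispersion is zero
            weights = [v / total for v in inv]
    return labels, weights


# Toy reads (hypothetical): two AT-rich and two GC-rich sequences.
reads = ["ATATATATAT", "AATTAATTAATT", "GCGCGCGCGC", "GGCCGGCCGGCC"]
profiles = [ngram_profile(r, n=2) for r in reads]
labels, weights = weighted_kmeans(profiles, k=2)
print(labels)
```

In this sketch the weights sum to one and are recomputed after every reassignment, so N-grams that vary little within a bin gain influence on the distance while N-grams that never vary drop out entirely. Real metagenomic reads would need a larger `n` and many more sequences than the toy input shown here.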

Year:  2014        PMID: 26355506     DOI: 10.1109/TCBB.2013.137

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710

