Application of Subspace Clustering in DNA Sequence Analysis.

Tim Wallace, Ali Sekmen, Xiaofei Wang.

Abstract

Identification and clustering of orthologous genes play an important role in developing evolutionary models, for example in validating convergent and divergent phylogeny and in predicting functional proteins in newly sequenced species whose nucleotide-protein mappings are unverified. Here, we introduce an application of subspace clustering to orthologous gene sequences and discuss the initial results. The working hypothesis is that genetic changes between protein-coding nucleotide sequences among selected species and groups may lie within a union of subspaces for clusters of the orthologous groups. Estimates of the subspace dimensions were computed for a small population sample. A series of experiments was performed to cluster randomly selected sequences. The experimental design allows for both false positives and false negatives, and estimates of the statistical significance are provided. The clustering results are consistent with the main hypothesis. A simple random-mutation binary tree model is used to simulate speciation events and shows the interdependence of subspace rank, time, and mutation rate. The simple mutation model is found to be largely consistent with the singular values observed in the subspace clustering results. Our study indicates that the subspace clustering method may be applied in orthology analysis.
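
The relationship between mutation rate, tree depth, and subspace rank described above can be illustrated with a small simulation. The sketch below is not the authors' implementation: the one-hot encoding, the substitution-only mutation step, the 95% energy threshold, and all function names (`mutate`, `simulate_tree`, `one_hot`, `effective_rank`) are assumptions made for illustration only.

```python
import numpy as np


def mutate(seq, rate, rng):
    """Copy `seq` (integers 0..3) and substitute each site with probability `rate`.

    Assumption: substitutions only, no insertions or deletions.
    """
    seq = seq.copy()
    hits = rng.random(seq.size) < rate
    # Adding an offset of 1..3 modulo 4 always yields a different base.
    seq[hits] = (seq[hits] + rng.integers(1, 4, size=int(hits.sum()))) % 4
    return seq


def simulate_tree(root, depth, rate, rng):
    """Binary tree model: every sequence splits into two mutated children
    per generation; the 2**depth leaves are returned as an (n, L) array."""
    level = [root]
    for _ in range(depth):
        level = [mutate(s, rate, rng) for s in level for _ in range(2)]
    return np.array(level)


def one_hot(seqs):
    """Encode an (n, L) integer array as an (n, 4L) 0/1 matrix (assumed encoding)."""
    return np.eye(4)[seqs].reshape(seqs.shape[0], -1)


def effective_rank(X, energy=0.95):
    """Number of singular values of the mean-centered data needed to capture
    the fraction `energy` of the total squared singular values."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    total = np.sum(s ** 2)
    if total == 0:
        return 0  # all rows identical: no variation to span
    cum = np.cumsum(s ** 2) / total
    return int(np.searchsorted(cum, energy) + 1)


rng = np.random.default_rng(0)
root = rng.integers(0, 4, size=200)  # hypothetical ancestral sequence
for rate in (0.001, 0.01, 0.05):
    leaves = simulate_tree(root, depth=5, rate=rate, rng=rng)
    print(rate, effective_rank(one_hot(leaves)))
```

Varying `depth` and `rate` in the loop gives a rough feel for how the estimated dimension of a single simulated orthologous group responds to elapsed time and mutation rate. Clustering several such groups would additionally require a subspace clustering algorithm, which this sketch does not include.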

Keywords:  algorithms; statistics.

Year: 2015   PMID: 26162018   PMCID: PMC4589114   DOI: 10.1089/cmb.2015.0084

Source DB: PubMed   Journal: J Comput Biol   ISSN: 1066-5277   Impact factor: 1.479

