
A visual framework for sequence analysis using n-grams and spectral rearrangement.

Stefan R Maetschke, Karin S Kassahn, Jasmyn A Dunn, Siew-Ping Han, Eva Z Curley, Katryn J Stacey, Mark A Ragan.

Abstract

MOTIVATION: Protein sequences are often composed of regions that have distinct evolutionary histories as a consequence of domain shuffling, recombination or gene conversion. New approaches are required to discover, visualize and analyze these sequence regions and thus enable a better understanding of protein evolution.
RESULTS: Here, we have developed an alignment-free and visual approach to analyze sequence relationships. We use the number of shared n-grams between sequences as a measure of sequence similarity and rearrange the resulting affinity matrix applying a spectral technique. Heat maps of the affinity matrix are employed to identify and visualize clusters of related sequences or outliers, while n-gram-based dot plots and conservation profiles allow detailed analysis of similarities among selected sequences. Using this approach, we have identified signatures of domain shuffling in an otherwise poorly characterized family, and homology clusters in another. We conclude that this approach may be generally useful as a framework to analyze related, but highly divergent protein sequences. It is particularly useful as a fast method to study sequence relationships prior to much more time-consuming multiple sequence alignment and phylogenetic analysis.
AVAILABILITY: A software implementation (MOSAIC) of the framework described here can be downloaded from http://bioinformatics.org.au/mosaic/
CONTACT: m.ragan@uq.edu.au
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
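
The idea in RESULTS (count shared n-grams, then reorder the affinity matrix spectrally) can be illustrated with a minimal sketch. This is not the MOSAIC implementation: the abstract does not say which spectral technique is used, so the sketch assumes one common choice, sorting sequences by the Fiedler vector of the unnormalized graph Laplacian. The function names `ngrams`, `affinity_matrix` and `spectral_order`, and the default n = 3, are hypothetical.

```python
import numpy as np


def ngrams(seq, n=3):
    """Return the set of distinct n-grams (overlapping substrings) in seq."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def affinity_matrix(seqs, n=3):
    """Symmetric matrix whose (i, j) entry is the number of distinct
    n-grams shared by sequences i and j."""
    grams = [ngrams(s, n) for s in seqs]
    k = len(seqs)
    A = np.zeros((k, k))
    for i in range(k):
        for j in range(i, k):
            A[i, j] = A[j, i] = len(grams[i] & grams[j])
    return A


def spectral_order(A):
    """Assumed spectral technique: order sequences by the Fiedler vector,
    i.e. the eigenvector of the second-smallest eigenvalue of the
    unnormalized graph Laplacian built from the affinities."""
    W = A.copy()
    np.fill_diagonal(W, 0.0)           # ignore self-similarity
    L = np.diag(W.sum(axis=1)) - W     # Laplacian L = D - W
    _, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    return np.argsort(vecs[:, 1])
```

Reordering rows and columns with `A[np.ix_(order, order)]`, where `order = spectral_order(A)`, tends to place related sequences next to each other, so that clusters appear as blocks along the diagonal when the matrix is drawn as a heat map.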

Year:  2010        PMID: 20130028     DOI: 10.1093/bioinformatics/btq042

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  9 in total

1.  A non-negative matrix factorization framework for identifying modular patterns in metagenomic profile data.

Authors:  Xingpeng Jiang; Joshua S Weitz; Jonathan Dushoff
Journal:  J Math Biol       Date:  2011-06-01       Impact factor: 2.259

2.  The mammalian PYHIN gene family: phylogeny, evolution and expression.

Authors:  Jasmyn A Cridland; Eva Z Curley; Michelle N Wykes; Kate Schroder; Matthew J Sweet; Tara L Roberts; Mark A Ragan; Karin S Kassahn; Katryn J Stacey
Journal:  BMC Evol Biol       Date:  2012-08-07       Impact factor: 3.260

3.  N-gram analysis of 970 microbial organisms reveals presence of biological language models.

Authors:  Hatice Ulku Osmanbeyoglu; Madhavi K Ganapathiraju
Journal:  BMC Bioinformatics       Date:  2011-01-10       Impact factor: 3.169

Review 4.  From Molecular Phylogenetics to Quantum Chemistry: Discovering Enzyme Design Principles through Computation.

Authors:  Troy Wymore; Charles L Brooks
Journal:  Comput Struct Biotechnol J       Date:  2012-11-30       Impact factor: 7.271

Review 5.  Information theory applications for biological sequence analysis.

Authors:  Susana Vinga
Journal:  Brief Bioinform       Date:  2013-09-20       Impact factor: 11.622

6.  Functional biogeography of ocean microbes revealed through non-negative matrix factorization.

Authors:  Xingpeng Jiang; Morgan G I Langille; Russell Y Neches; Marie Elliot; Simon A Levin; Jonathan A Eisen; Joshua S Weitz; Jonathan Dushoff
Journal:  PLoS One       Date:  2012-09-18       Impact factor: 3.240

7.  Mining for class-specific motifs in protein sequence classification.

Authors:  Satish M Srinivasan; Suleyman Vural; Brian R King; Chittibabu Guda
Journal:  BMC Bioinformatics       Date:  2013-03-15       Impact factor: 3.169

Review 8.  Alignment-free sequence comparison: benefits, applications, and tools.

Authors:  Andrzej Zielezinski; Susana Vinga; Jonas Almeida; Wojciech M Karlowski
Journal:  Genome Biol       Date:  2017-10-03       Impact factor: 13.583

Review 9.  Alignment-free inference of hierarchical and reticulate phylogenomic relationships.

Authors:  Guillaume Bernard; Cheong Xin Chan; Yao-Ban Chan; Xin-Yi Chua; Yingnan Cong; James M Hogan; Stefan R Maetschke; Mark A Ragan
Journal:  Brief Bioinform       Date:  2019-03-22       Impact factor: 11.622

