A new family of powerful multivariate statistical sequence analysis techniques.

M van Heel.

Abstract

A novel multivariate statistical approach is presented for extracting and exploiting intrinsic information present in our ever-growing sequence data banks. The information extraction from the sequences avoids the pitfalls of intersequence alignment by analyzing secondary invariant functions derived from the sequences in the data bank rather than the sequences themselves. A typical invariant function of this kind is a 20 x 20 histogram of occurrences of amino acid pairs in a given sequence or fragment thereof. To illustrate the potential of the approach, an analysis of 10,000 protein sequences from the National Biomedical Research Foundation Protein Identification Resource is presented; it already reveals great biological detail. For example, zeta-hemoglobin is found to lie close to amphibian and fish chi-hemoglobin which, in turn, is an important clue to the physiological function of this mammalian early embryonic hemoglobin. The multivariate statistical framework presented unifies such apparently unrelated issues as phylogenetic comparisons between a set of sequences and distance matrices between the constituents of the biological sequences. The Multivariate Statistical Sequence Analysis (MSSA) principles can be used for a wide spectrum of sequence analysis problems, such as assignment of family memberships to new sequences, validation of new incoming sequences to be entered into the database, prediction of structure from sequence, discrimination of coding from non-coding DNA regions, and automatic generation of an atlas of protein or DNA sequences. The MSSA techniques represent a self-contained approach to learning continuously and automatically from the growing stream of new sequences. The MSSA approach is particularly likely to play a significant role in major sequencing efforts such as the human genome project.
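
The 20 x 20 pair histogram mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how pairs are defined or how histograms are compared, so the code below makes two labeled assumptions: pairs are taken to be immediately adjacent residues, and two sequences are compared by the Euclidean distance between their normalized histograms. The function names (pair_histogram, normalized, histogram_distance) are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only; not the MSSA implementation described in the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def pair_histogram(sequence):
    """Count residue pairs in `sequence` as a 20 x 20 table.

    Assumption: a "pair" is two immediately adjacent residues, so entry
    [i][j] counts how often residue i is directly followed by residue j.
    Pairs containing a non-standard symbol are skipped.
    """
    counts = [[0] * 20 for _ in range(20)]
    for first, second in zip(sequence, sequence[1:]):
        if first in INDEX and second in INDEX:
            counts[INDEX[first]][INDEX[second]] += 1
    return counts


def normalized(counts):
    """Flatten a histogram to 400 relative frequencies (sum 1 if non-empty)."""
    flat = [c for row in counts for c in row]
    total = sum(flat)
    return [c / total for c in flat] if total else flat


def histogram_distance(seq_a, seq_b):
    """Euclidean distance between normalized pair histograms.

    Assumption: the Euclidean metric is a stand-in chosen for simplicity;
    the abstract does not name the metric used in the paper.
    """
    a = normalized(pair_histogram(seq_a))
    b = normalized(pair_histogram(seq_b))
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

Because each sequence is reduced to a fixed-length vector of 400 numbers, sequences of different lengths can be compared directly, without first aligning them. This is the property the abstract refers to when it says the approach avoids the pitfalls of intersequence alignment.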

Year:  1991        PMID: 1880802     DOI: 10.1016/0022-2836(91)90360-i

Source DB:  PubMed          Journal:  J Mol Biol        ISSN: 0022-2836            Impact factor:   5.469

