A new family of powerful multivariate statistical sequence analysis techniques.

Abstract

A novel multivariate statistical approach is presented for extracting and exploiting intrinsic information present in our ever-growing sequence data banks. The information extraction from the sequences avoids the pitfalls of intersequence alignment by analyzing secondary invariant functions derived from the sequences in the data bank rather than the sequences themselves. One typical invariant function is a 20 × 20 histogram of occurrences of amino acid pairs in a given sequence or fragment thereof. To illustrate the potential of the approach, an analysis of 10,000 protein sequences from the National Biomedical Research Foundation Protein Identification Resource is presented; it already reveals great biological detail. For example, zeta-hemoglobin is found to lie close to amphibian and fish alpha-hemoglobin, which, in turn, is an important clue to the physiological function of this mammalian early embryonic hemoglobin. The multivariate statistical framework presented unifies such apparently unrelated issues as phylogenetic comparisons between a set of sequences and distance matrices between the constituents of the biological sequences. The Multivariate Statistical Sequence Analysis (MSSA) principles can be used for a wide spectrum of sequence analysis problems, such as assignment of family memberships to new sequences, validation of new incoming sequences to be entered into the database, prediction of structure from sequence, discrimination of coding from non-coding DNA regions, and automatic generation of an atlas of protein or DNA sequences. The MSSA techniques represent a self-contained approach to learning continuously and automatically from the growing stream of new sequences. The MSSA approach is particularly likely to play a significant role in major sequencing efforts such as the human genome project.
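
The 20 × 20 pair histogram mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how pairs are chosen or how histograms are compared, so counting adjacent residue pairs, normalizing to relative frequencies, and using a Euclidean distance are all assumptions made here for illustration. The function names and the toy fragments are hypothetical.

```python
import math

# The 20 standard amino acids in one-letter code; the ordering is arbitrary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def pair_histogram(sequence):
    """Count adjacent amino acid pairs into a 20 x 20 table.

    Assumption: "pairs" means neighbouring residues (i, i+1).
    Pairs containing a non-standard letter are skipped.
    """
    counts = [[0] * 20 for _ in range(20)]
    for first, second in zip(sequence, sequence[1:]):
        if first in INDEX and second in INDEX:
            counts[INDEX[first]][INDEX[second]] += 1
    return counts


def normalize(counts):
    """Flatten the table to 400 relative frequencies (length-independent)."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        return [0.0] * 400
    return [c / total for row in counts for c in row]


def histogram_distance(seq_a, seq_b):
    """Euclidean distance between normalized pair histograms.

    Assumption: Euclidean distance stands in for whatever metric
    the multivariate analysis in the paper actually uses.
    """
    pa = normalize(pair_histogram(seq_a))
    pb = normalize(pair_histogram(seq_b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


if __name__ == "__main__":
    # Made-up toy fragments, not real database entries.
    frag_a = "MVLSPADKTNVKAAW"
    frag_b = "MVLSAADKTNVKAAW"
    frag_c = "GGGSGGGSGGGSGGG"
    print(histogram_distance(frag_a, frag_b))
    print(histogram_distance(frag_a, frag_c))
```

Because each sequence is reduced to a fixed-length vector of 400 numbers, sequences of different lengths can be compared without aligning them, which is the property the abstract emphasizes.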

Substances: Hemoglobins

Year: 1991 PMID: 1880802 DOI: 10.1016/0022-2836(91)90360-i

Source DB: PubMed Journal: J Mol Biol ISSN: 0022-2836 Impact factor: 5.469

Cited

19 in total

1. Use of residue pairs in protein sequence-sequence and sequence-structure alignments.

Authors: J Jung; B Lee
Journal: Protein Sci Date: 2000-08 Impact factor: 6.725

2. Metagenomic Classification Using an Abstraction Augmented Markov Model.

Authors: Xiujun Sylvia Zhu; Monnie McGee
Journal: J Comput Biol Date: 2015-11-30 Impact factor: 1.479

3. Sequence physical properties encode the global organization of protein structure space.

Authors: S Rackovsky
Journal: Proc Natl Acad Sci U S A Date: 2009-08-12 Impact factor: 11.205

4. Topological maps of protein sequences.

Authors: E A Ferrán; P Ferrara
Journal: Biol Cybern Date: 1991 Impact factor: 2.086

5. Self-organizing tree-growing network for the classification of protein sequences.

Authors: H C Wang; J Dopazo; L G de la Fraga; Y P Zhu; J M Carazo
Journal: Protein Sci Date: 1998-12 Impact factor: 6.725

6. Prediction of G protein-coupled receptor encoding sequences from the synganglion transcriptome of the cattle tick, Rhipicephalus microplus.

Authors: Felix D Guerrero; Anastasia Kellogg; Alexandria N Ogrey; Andrew M Heekin; Roberto Barrero; Matthew I Bellgard; Scot E Dowd; Ming-Ying Leung
Journal: Ticks Tick Borne Dis Date: 2016-02-22 Impact factor: 3.744

7. Protein sequence randomness and sequence/structure correlations.

Authors: R S Rahman; S Rackovsky
Journal: Biophys J Date: 1995-04 Impact factor: 4.033

8. Kinome-wide interaction modelling using alignment-based and alignment-independent approaches for kinase description and linear and non-linear data analysis techniques.

Authors: Maris Lapins; Jarl Es Wikberg
Journal: BMC Bioinformatics Date: 2010-06-22 Impact factor: 3.169

9. Spectral diffusion and electron-phonon coupling of the B800 BChl a molecules in LH2 complexes from three different species of purple bacteria.

Authors: J Baier; M Gabrielsen; S Oellerich; H Michel; M van Heel; R J Cogdell; J Köhler
Journal: Biophys J Date: 2009-11-04 Impact factor: 4.033

10. A novel alignment-free method for comparing transcription factor binding site motifs.

Authors: Minli Xu; Zhengchang Su
Journal: PLoS One Date: 2010-01-20 Impact factor: 3.240