Self-organized neural maps of human protein sequences.

E A Ferrán, B Pflugfelder, P Ferrara.

Abstract

We have recently described a method based on artificial neural networks to cluster protein sequences into families. The network was trained with Kohonen's unsupervised learning algorithm using, as inputs, the matrix patterns derived from the dipeptide composition of the proteins. We present here a large-scale application of that method to classify the 1,758 human protein sequences stored in the SwissProt database (release 19.0), whose lengths are greater than 50 amino acids. In the final 2-dimensional topologically ordered map of 15 x 15 neurons, proteins belonging to known families were associated with the same neuron or with neighboring ones. Also, as an attempt to reduce the time-consuming learning procedure, we compared 2 learning protocols: one of 500 epochs (100 SUN CPU-hours [CPU-h]), and another one of 30 epochs (6.7 CPU-h). A further reduction of learning-computing time, by a factor of about 3.3, with similar protein clustering results, was achieved using a matrix of 11 x 11 components to represent the sequences. Although network training is time consuming, the classification of a new protein in the final ordered map is very fast (14.6 CPU-seconds). We also show a comparison between the artificial neural network approach and conventional methods of biosequence analysis.
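The approach summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 20 x 20 dipeptide-frequency matrix normalised to unit length, Euclidean winner selection, a Gaussian neighbourhood, and linearly decaying learning rate and radius. The function names (`dipeptide_matrix`, `train_som`, `best_matching_unit`) and all numeric schedule values are hypothetical choices made for illustration only.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def dipeptide_matrix(seq):
    """Return a 20 x 20 matrix of dipeptide counts, scaled to unit length."""
    counts = np.zeros((20, 20))
    for a, b in zip(seq, seq[1:]):
        if a in INDEX and b in INDEX:  # skip non-standard residues
            counts[INDEX[a], INDEX[b]] += 1
    norm = np.linalg.norm(counts)
    return counts / norm if norm > 0 else counts


def best_matching_unit(weights, x):
    """Return (row, col) of the neuron whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(patterns, side=15, epochs=30, seed=0):
    """Train a side x side Kohonen map on flattened input patterns."""
    rng = np.random.default_rng(seed)
    data = np.asarray([p.ravel() for p in patterns])
    weights = rng.random((side, side, data.shape[1]))
    rows, cols = np.indices((side, side))
    for epoch in range(epochs):
        frac = epoch / max(epochs - 1, 1)
        lr = 0.5 * (1.0 - frac) + 0.01            # assumed linear decay
        radius = (side / 2) * (1.0 - frac) + 0.5  # assumed shrinking radius
        for x in data[rng.permutation(len(data))]:
            r, c = best_matching_unit(weights, x)
            dist2 = (rows - r) ** 2 + (cols - c) ** 2
            h = np.exp(-dist2 / (2 * radius ** 2))  # Gaussian neighbourhood
            # Pull the winner and its neighbours towards the input pattern.
            weights += lr * h[:, :, None] * (x - weights)
    return weights
```

Under these assumptions, placing a new protein on a trained map is a single call, `best_matching_unit(weights, dipeptide_matrix(seq).ravel())`, which is consistent with classification being much cheaper than training. As a side note, 20 x 20 = 400 inputs versus 11 x 11 = 121 inputs is a ratio of about 3.3, which matches the reported reduction in learning time; the abstract does not state how the reduced matrix is constructed, so the sketch does not attempt it.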

Year: 1994    PMID: 8019421    PMCID: PMC2142706    DOI: 10.1002/pro.5560030316

Source DB: PubMed    Journal: Protein Sci    ISSN: 0961-8368    Impact factor: 6.725


Cited by (6 in total)

1.  Exploring the nonlinear geometry of protein homology.

Authors:  Michael A Farnum; Huafeng Xu; Dimitris K Agrafiotis
Journal:  Protein Sci       Date:  2003-08       Impact factor: 6.725

2.  Self-organizing tree-growing network for the classification of protein sequences.

Authors:  H C Wang; J Dopazo; L G de la Fraga; Y P Zhu; J M Carazo
Journal:  Protein Sci       Date:  1998-12       Impact factor: 6.725

3.  Back-propagation and counter-propagation neural networks for phylogenetic classification of ribosomal RNA sequences.

Authors:  C Wu; S Shivakumar
Journal:  Nucleic Acids Res       Date:  1994-10-11       Impact factor: 16.971

4.  The distance-profile representation and its application to detection of distantly related protein families.

Authors:  Chin-Jen Ku; Golan Yona
Journal:  BMC Bioinformatics       Date:  2005-11-29       Impact factor: 3.169

5.  KemaDom: a web server for domain prediction using kernel machine with local context.

Authors:  Lusheng Chen; Wei Wang; Shaoping Ling; Caiyan Jia; Fei Wang
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

6.  A novel bioinformatics strategy for function prediction of poorly-characterized protein genes obtained from metagenome analyses.

Authors:  Takashi Abe; Shigehiko Kanaya; Hiroshi Uehara; Toshimichi Ikemura
Journal:  DNA Res       Date:  2009-10-03       Impact factor: 4.458
