| Literature DB >> 11241220 |
Z P Feng1.
Abstract
A new representation of protein sequence is devoted in this paper, in which each protein can be represented by a 20-dimensional (20D) vector of unit length. Inspired by the principle of superposition of state in quantum mechanics, the squares of the 20 components of the vector correspond to the amino acid composition. Using the new representation of the primary sequence and Bayes Discriminant Algorithm, the subcellular location of prokaryotic proteins was predicted. The overall predictive accuracy in the jackknife test can be 3% higher than the result of using amino acid composition directly for the database of sequence identity is less than 90%, but 5% higher when sequence identity is less than 80%. The higher predictive accuracy indicates that the current measure of extracting the information from the primary sequence is efficient. Since the subcellular location restricting a protein's possible function, the present method should also be a useful measure for the systematic analysis of genome data. The program used in this paper is available on request.Mesh:
Substances:
Year: 2001 PMID: 11241220 DOI: 10.1002/1097-0282(20010415)58:5<491::AID-BIP1024>3.0.CO;2-I
Source DB: PubMed Journal: Biopolymers ISSN: 0006-3525 Impact factor: 2.505