Amrita Roy Choudhury, Nikolay Zhukov, Marjana Novič.
Abstract
Graphical bioinformatics offers a distinctive approach to the mathematical characterization of proteins and proteomic maps. Graphical representations and their corresponding mathematical descriptors have proved useful in the identification, comparison, and analysis of protein sequences and proteomics maps. Because they are based on sequence information alone, these descriptors are independent of the physicochemical properties of amino acids and of evolutionary information. In this work, we present invariants from the amino acid adjacency matrix and the decagonal isometries matrix as potential descriptors of protein sequences. Encoding protein sequences into an amino acid adjacency matrix is already well established; we show its application in the classification of transmembrane and nontransmembrane regions of membrane protein sequences. We also introduce the decagonal isometries matrix, a novel method of encoding protein sequences based on the decagonal isometries group.
Year: 2013 PMID: 23690747 PMCID: PMC3649804 DOI: 10.1155/2013/607830
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1. Amino acid adjacency matrix. The 20 × 20 matrix presents the amino acid adjacency and abundance information for the given sequence. Each nonzero element gives the number of times the corresponding amino acids occur adjacent to each other. The 20-dimensional row sum vector is used as a descriptor to numerically characterize the protein sequence.
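The construction described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes an alphabetical ordering of the 20 standard one-letter codes and counts ordered pairs (row residue immediately followed by column residue); the caption does not state the ordering or whether the counts are directional, so both choices, along with the names `AMINO_ACIDS`, `adjacency_matrix`, and `row_sum_descriptor`, are illustrative assumptions rather than the authors' implementation.

```python
# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def adjacency_matrix(sequence):
    """Count how often each residue is immediately followed by each other residue.

    Assumption: counts are directional (row = left residue, column = right residue).
    Raises KeyError for letters outside the 20 standard codes.
    """
    n = len(AMINO_ACIDS)
    matrix = [[0] * n for _ in range(n)]
    for left, right in zip(sequence, sequence[1:]):
        matrix[INDEX[left]][INDEX[right]] += 1
    return matrix


def row_sum_descriptor(sequence):
    """Return the 20-dimensional vector of row sums of the adjacency matrix."""
    return [sum(row) for row in adjacency_matrix(sequence)]


# Toy sequence, not taken from the paper's data set.
descriptor = row_sum_descriptor("WWNAW")
```

For the toy sequence `WWNAW`, the adjacent pairs are WW, WN, NA, and AW, so the row for W sums to 2 and the rows for N and A each sum to 1.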
Figure 2. Decagonal isometries matrix D10. (a) and (b) show the correspondence between group elements and amino acids and indicate the initial step of coding. The decagons below show the explicit transformation for the given step (WW → WWN) and the resultant DIM (c). The 20-dimensional vector constructed from the matrix is used as a descriptor to numerically characterize the protein sequence (d).
Training and test sets.
| Set | Total segments | Transmembrane segments | Nontransmembrane segments |
|---|---|---|---|
| Training | 4204 | 1867 | 2337 |
| Test | 450 | 200 | 250 |
Figure 3. Principal component analysis. The transmembrane (black) and nontransmembrane (blue) segments form two different clusters.
Figure 4. Top map of the optimized network. The transmembrane (green) and nontransmembrane (brown) segments form two different clusters. Empty neurons are dark blue.
Classification model using amino acid adjacency matrix.
| Set | Total segments | Segments correctly classified | % error |
|---|---|---|---|
| Training | 4204 | 4022 | 4.33 |
| Test | 450 | 411 | 8.67 |
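The error percentages in this table are consistent with the share of misclassified segments among all segments in a set. The short check below assumes that definition, which the table does not state explicitly; `percent_error` is a hypothetical helper name.

```python
def percent_error(total, correct):
    """Percentage of segments that were not classified correctly (assumed definition)."""
    return 100.0 * (total - correct) / total


# Counts taken from the table above.
training_error = percent_error(4204, 4022)
test_error = percent_error(450, 411)

print(f"{training_error:.2f} {test_error:.2f}")  # prints "4.33 8.67"
```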
Classification model using decagonal isometries matrix.
| Set | Correlation coefficient | % error |
|---|---|---|
| Training | 0.61 | 14.3 |
| Test | 0.255 | 27.1 |