Cristian Robert Munteanu, Alexandre L Magalhães, Eugenio Uriarte, Humberto González-Díaz.
Abstract
Cancer diagnosis is a complex process and, sometimes, the specific markers can interfere or produce negative results. Thus, new simple and fast theoretical models are required. One option is complex network graph theory, which permits us to describe any real system, from small molecules to complex genetic, neural or social networks, by transforming real properties into topological indices. This work converts protein primary structure data into specific topological indices of Randic's star networks using the new sequence to star networks (S2SNet) application. A set of 1054 proteins was selected from previous works; it contains proteins related or not related to two types of cancer, human breast cancer (HBC) and human colon cancer (HCC). The general discriminant analysis method generates an input-coded multi-target classification model with training/predicting set accuracies of 90.0% for the forward stepwise model type. In addition, a protein subset was modified by single amino acid mutations with higher log-odds PAM250 values and tested with the new classification model to determine whether the mutants can be related to HBC or HCC. In conclusion, we have shown that, using simple input data such as the primary protein sequence and the simplest linear analysis, it is possible to obtain accurate classification models that can predict whether a new protein is related to these two types of cancer. These results promote the use of S2SNet in clinical proteomics.
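The conversion of a sequence into a star network can be illustrated with a minimal sketch. It is not the S2SNet implementation: it assumes one branch per amino acid type (20 branches), a hypothetical central node numbered 0, residues appended to their branch in sequence order, and, for the embedded variant, extra edges between residues that are adjacent in the sequence. The Randic connectivity index is used only as one example of a topological index; the abstract does not state which indices enter the models. The function names are illustrative.

```python
from collections import defaultdict

# Assumption: one branch per standard amino acid type.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def star_graph_edges(sequence, embedded=False):
    """Return the edge set of a star graph built from a protein sequence.

    Node 0 is the (hypothetical) central node; node i is residue i (1-based).
    """
    edges = set()
    last_on_branch = {}  # residue type -> last node placed on that branch
    for i, aa in enumerate(sequence, start=1):
        if aa not in AMINO_ACIDS:
            raise ValueError(f"unexpected residue {aa!r} at position {i}")
        # Attach to the previous node on the same branch, or to the centre.
        parent = last_on_branch.get(aa, 0)
        edges.add((parent, i))
        last_on_branch[aa] = i
    if embedded:
        # Embedded variant: also connect residues adjacent in the sequence.
        for i in range(1, len(sequence)):
            edges.add((i, i + 1))
    return edges


def node_degrees(edges):
    """Count how many edges touch each node."""
    degrees = defaultdict(int)
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return dict(degrees)


def randic_index(edges):
    """Randic connectivity index: sum over edges of 1/sqrt(deg(u)*deg(v))."""
    deg = node_degrees(edges)
    return sum((deg[u] * deg[v]) ** -0.5 for u, v in edges)
```

For a toy sequence such as `"ACA"`, `star_graph_edges` yields three edges without embedding and five with it, and `randic_index` can be applied to either edge set to obtain one numerical descriptor per protein.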
Year: 2008 PMID: 19111559 PMCID: PMC7094125 DOI: 10.1016/j.jtbi.2008.11.017
Source DB: PubMed Journal: J Theor Biol ISSN: 0022-5193 Impact factor: 2.691
Fig. 1. (A) The non-embedded star graphs for PRPS1 and (B) the embedded star graphs for PRPS1.
Single amino acid mutations and the corresponding log-odds PAM250 values.
| Original AA | Mutated AA | log-odds PAM250 | Notation |
|---|---|---|---|
| D | N | 2 | D→N/2DN |
| E | Q | 2 | E→Q/2EQ |
| F | L | 2 | F→L/2FL |
| H | N | 2 | H→N/2HN |
| H | R | 2 | H→R/2HR |
| L | I | 2 | L→I/2LI |
| M | I | 2 | M→I/2MI |
| Q | D | 2 | Q→D/2QD |
| V | L | 2 | V→L/2VL |
| V | M | 2 | V→M/2VM |
| W | R | 2 | W→R/2WR |
| E | D | 3 | E→D/3ED |
| H | Q | 3 | H→Q/3HQ |
| K | R | 3 | K→R/3KR |
| M | L | 4 | M→L/4ML |
| V | I | 4 | V→I/4VI |
| Y | F | 7 | Y→F/7YF |
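The substitutions in the table can be applied to a sequence programmatically. The sketch below is an illustration only: it assumes that each mutant differs from the original protein at exactly one position, which the abstract does not specify, and the function name `single_point_mutants` is hypothetical. The dictionary values are copied from the table, and the label follows the short form of the table's notation (for example `7YF`).

```python
# (original, mutated) -> PAM250 value, copied from the table above.
MUTATIONS = {
    ("D", "N"): 2, ("E", "Q"): 2, ("F", "L"): 2, ("H", "N"): 2,
    ("H", "R"): 2, ("L", "I"): 2, ("M", "I"): 2, ("Q", "D"): 2,
    ("V", "L"): 2, ("V", "M"): 2, ("W", "R"): 2, ("E", "D"): 3,
    ("H", "Q"): 3, ("K", "R"): 3, ("M", "L"): 4, ("V", "I"): 4,
    ("Y", "F"): 7,
}


def single_point_mutants(sequence):
    """Yield (position, notation, mutated_sequence) for each listed substitution.

    Assumption: one residue is changed per mutant; positions are 1-based.
    """
    for pos, aa in enumerate(sequence):
        for (orig, new), score in MUTATIONS.items():
            if aa == orig:
                notation = f"{score}{orig}{new}"
                mutated = sequence[:pos] + new + sequence[pos + 1:]
                yield pos + 1, notation, mutated
```

Each mutated sequence could then be converted to topological indices and passed to a classification model in the same way as the original protein.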
Training/predicting accuracies of Cancer (C)/non-cancer (nC) models using embedded (E) and non-embedded (nE) star graph TIs, pTIs and dTIs.
| Star graph type | Attributes | Train | | | Cross-validation | | | Total | | | Eq. vars. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | Total (%) | | | Total (%) | | | Total (%) | |
| | | 90.4 | 69.4 | 88.5 | 91.4 | 66.0 | 89.1 | 90.7 | 68.6 | 88.7 | 4 |
| | TI, | 90.4 | 68.1 | 88.3 | 90.8 | 66.0 | 88.5 | 90.5 | 67.5 | 88.4 | 5 |
| | | 86.0 | 79.9 | 85.4 | 87.0 | 74.5 | 85.9 | 86.2 | 78.5 | 85.5 | 2 |
| | TI, | 88.1 | 74.3 | 86.9 | 88.9 | 72.3 | 87.4 | 88.3 | 73.8 | 87.0 | 4 |
| | TI, | 91.1 | 66.0 | 88.8 | 91.8 | 61.7 | 89.1 | 91.3 | 64.9 | 88.9 | 6 |
| | | 92.3 | 70.1 | 90.3 | 93.1 | 70.2 | 91.0 | 92.5 | 70.2 | 90.5 | 6 |
| | TI | 92.7 | 69.4 | 90.6 | 93.3 | 70.2 | 91.2 | 92.8 | 69.6 | 90.7 | 6 |
| | | 88.1 | 78.5 | 87.3 | 88.3 | 76.6 | 87.2 | 88.2 | 78.0 | 87.3 | 4 |
| | TI | 91.4 | 75.7 | 89.9 | 91.8 | 74.5 | 90.3 | 91.5 | 75.4 | 90.0 | 5 |
| | TI | 93.1 | 68.1 | 90.8 | 93.3 | 66.0 | 90.8 | 93.1 | 67.5 | 90.8 | 8 |
| | | 90.2 | 70.1 | 88.4 | 91.2 | 68.1 | 89.1 | 90.5 | 69.6 | 88.6 | 4 |
| | TI, TI | 92.3 | 68.8 | 90.1 | 92.0 | 66.0 | 89.7 | 92.2 | 68.1 | 90.0 | 8 |
| | | 90.3 | 78.5 | 89.2 | 90.6 | 76.6 | 89.3 | 90.4 | 78.0 | 89.2 | 6 |
| | TI, TI | 90.9 | 72.9 | 89.3 | 91.4 | 72.3 | 89.7 | 91.1 | 72.8 | 89.4 | 7 |
| | TI, TI | 92.3 | 68.8 | 90.1 | 92.2 | 70.2 | 90.3 | 92.3 | 69.1 | 90.2 | 8 |
Accuracy of input-coded multi-target and individual HBC and HCC classification models based on the embedded TIs (TIe+dTIe and pTIe).
| Eq. | Cancer | Correct | Incorrect | Accuracy (%) |
|---|---|---|---|---|
| TI | ||||
| 14 | Both | 307 | 1795 | 90.0 |
| 14a | HBC | 168 | 880 | 91.8 |
| 14b | HCC | 139 | 915 | 88.2 |
| 15 | Both | 277 | 1825 | 90.5 |
| 15a | HBC | 170 | 878 | 91.8 |
| 15b | HCC | 107 | 947 | 89.2 |
Fig. 2. Graphical representation of two-way joining cluster analysis of the HBC probability after the mutations.
Fig. 3. Graphical representation of reduced values of the reordered data matrix by tw–JCA method for HBC probability after the mutations.
Fig. 4. Graphical representation of two-way joining cluster analysis of the HCC probability after the mutations.
Fig. 5. Graphical representation of reduced values of the reordered data matrix by tw–JCA method for HCC probability after the mutations.
Fig. 6. Graphical representation of two-way joining cluster analysis of the probability of the mutated HBC-related proteins to turn into non-cancer proteins.
Fig. 7. Graphical representation of two-way joining cluster analysis of the probability of the mutated HCC-related proteins to turn into non-cancer proteins.