Chun-Min Hung, Yueh-Min Huang, Ming-Shi Chang.
Abstract
A hybrid evolutionary model is used to propose a hierarchical homology of protein sequences to identify protein functions systematically. The proposed model offers considerable potential, given the inconsistency of existing methods for predicting novel proteins. Because some novel proteins might align without meaningful conserved domains, maximizing the sequence alignment score is not the best criterion for predicting protein functions. This work presents a decision model that minimizes the cost of deciding on a predicted protein function using the hierarchical homologies. In particular, the model has three characteristics: (i) it is a hybrid evolutionary model with multiple fitness functions that uses genetic programming to predict protein functions on a distantly related protein family, (ii) it incorporates modified robust point matching to accurately compare all feature points using the moment invariant and thin-plate spline theorems, and (iii) the hierarchical homologies supporting a novel protein sequence, expressed as a causal tree, can effectively demonstrate the relationships between proteins. This work uses the model to compare nucleocapsid proteins from the putative polyprotein of the SARS virus with those of other coronaviruses in other hosts.
Keywords: Bioinformatics protein databases; Evolutionary computing and genetic algorithms; Invariants; Moments; Splines
Year: 2005 PMID: 32288048 PMCID: PMC7117053 DOI: 10.1016/j.na.2005.09.048
Source DB: PubMed Journal: Nonlinear Anal Theory Methods Appl ISSN: 0362-546X Impact factor: 2.064
Fig. 1. Causal tree of protein function.
Fig. 2. Representation of protein fragments.
Fig. 3. Correspondence matrix for a causal tree.
Fig. 4. Inner-exchanged strategy of subpopulations in a population.
Fig. 5. AGCT algorithm.
Target sequences
| S. No. | Name | Accession | Definition | Source |
|---|---|---|---|---|
| 1 | STW1 | AAP37015 | Putative polyprotein region: 819..3240 (putative nsp1) | SARS coronavirus TW1 |
| 2 | HcoV | NP_835345 | Coronavirus p195/p210 protein (nsp1) | Human coronavirus 229E |
| 3 | TGEV | NP_840002 | Putative coronavirus nsp1 | Transmissible gastroenteritis virus |
| 4 | BcoV | NP_742169 | Coronavirus nsp1 (PL1-PRO, PL2-PRO, HD) | Bovine coronavirus |
| 5 | MHV | NP_740609 | Coronavirus nsp1 (PL1-PRO, PL2-PRO, HD, HD-1) | Murine hepatitis virus |
| 6 | PEDV | NP_839958 | Putative coronavirus nsp1 | Porcine epidemic diarrhea virus |
| 7 | AIBV | NP_740622 | Coronavirus nsp1 (HD1) hydrophobic domain 1 | Avian infectious bronchitis virus |
Test parameters of AGCT model
| Case | Subpop. 0 population | Subpop. 0 migration | Subpop. 1 population | Subpop. 1 migration | Subpop. 2 population | Subpop. 2 migration | Generation | Tr | |
|---|---|---|---|---|---|---|---|---|---|
| I | 10 | 2 | 10 | 2 | 8 | 1 | 50 | 1..0.1 | 0.97..0.8 |
| II | 100 | 10 | 50 | 10 | 50 | 10 | 100 | | |
| III | 20 | 10 | 20 | 10 | 20 | 10 | 100 | | |
| IV | 100 | 20 | 50 | 30 | 50 | 30 | 200 | | |
| V | 10 | 2 | 10 | 2 | 8 | 1 | 200 | | |
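The last two columns of the parameter table (1..0.1 and 0.97..0.8) read naturally as a temperature range and a range of temperature decrease rates, although this record does not state the schedule itself. The sketch below assumes a geometric schedule, T_next = r * T, which is a common choice and not confirmed by the source; the function name `cooling_steps` is hypothetical. It shows how strongly the rate affects the number of steps needed to cool from 1 to 0.1.

```python
def cooling_steps(t_start: float, t_end: float, rate: float) -> int:
    """Count the multiplicative cooling steps needed to bring t_start down to t_end.

    Assumes a geometric schedule T_next = rate * T (an assumption, not the
    schedule documented for the AGCT model).
    """
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must lie strictly between 0 and 1")
    if not 0.0 < t_end <= t_start:
        raise ValueError("need 0 < t_end <= t_start")
    steps = 0
    temperature = t_start
    while temperature > t_end:
        temperature *= rate  # one cooling step
        steps += 1
    return steps


# Compare the two ends of the listed rate range, plus one value in between.
for rate in (0.97, 0.9, 0.8):
    print(f"rate={rate}: {cooling_steps(1.0, 0.1, rate)} steps")
```

Under this assumed schedule, a rate of 0.8 reaches the final temperature in 11 steps, whereas 0.97 needs 76, so the slow rates only finish cooling within the longer runs in the table.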
Test results of AGCT model
| Case no. and name | Subpop. 0 energy | Subpop. 0 sensitivity (%) | Subpop. 0 specificity (%) | Subpop. 1 energy | Subpop. 2 energy | Convergence generation |
|---|---|---|---|---|---|---|
| Case 1, I | 0.515 | 8.5 | 93.8 | 0.316 | 0.182 | 40 |
| Case 2, II | 0.419 | 16.2 | 94.2 | 0.210 | 0 | 30–50 |
| Case 3, III | 0.403 | 11.3 | 94.6 | 0.210 | 0 | 30–100 |
| Case 4, IV | 0.426 | 11.2 | 94.0 | 0.301 | 0.150 | 100–120 |
| Case 5, V | 0.413 | 10.8 | 94.4 | 0.212 | 0 | 40–80 |
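The sensitivity and specificity columns presumably follow the standard definitions, TP / (TP + FN) and TN / (TN + FP), expressed as percentages; the record does not define them or report the underlying counts. A minimal sketch of those definitions, using purely illustrative counts that are not taken from the study:

```python
def sensitivity_pct(true_pos: int, false_neg: int) -> float:
    """Percentage of actual positives that were predicted positive."""
    return 100.0 * true_pos / (true_pos + false_neg)


def specificity_pct(true_neg: int, false_pos: int) -> float:
    """Percentage of actual negatives that were predicted negative."""
    return 100.0 * true_neg / (true_neg + false_pos)


# Hypothetical confusion-matrix counts, chosen only to illustrate the formulas.
tp, fn, tn, fp = 30, 70, 90, 10
print(f"sensitivity = {sensitivity_pct(tp, fn):.1f}%")
print(f"specificity = {specificity_pct(tn, fp):.1f}%")
```

A classifier that rarely predicts the positive class shows the same pattern as the table: low sensitivity alongside high specificity.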
Fig. 6. Convergence performance at fixed temperature in each subpopulation.
Fig. 7. Comparison of energy convergence for fixed temperature and for temperature decreased at various rates.
Fig. 8. Comparison of different temperature decrease rates in case II.
Fig. 9. Case-by-case comparison of convergence with varied rates of temperature decrease.
Fig. 10. Comparison of convergence in each subpopulation for the AGCT model using multiple fitness functions.