Abstract
BACKGROUND: Identification of coordinately regulated genes according to the level of their expression during the time course of a process allows for discovering functional relationships among genes involved in the process.
Year: 2007 PMID: 17298664 PMCID: PMC1804277 DOI: 10.1186/1471-2164-8-49
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Comparison of classification performance of the parallel and sequential GA with other classification algorithms for template vectors of dimension |x| = 30.
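| Pattern | Sequential GA Se | Sequential GA Sp | Parallel GA (2 islands) Se | Parallel GA (2 islands) Sp | Parallel GA (4 islands) Se | Parallel GA (4 islands) Sp | Binary SVM Se | Binary SVM Sp | Single SVM Se | Single SVM Sp | LogitBoost Se | LogitBoost Sp | LR Se | LR Sp | LDA Se | LDA Sp | LS Se | LS Sp |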
| 1 | 1 | 0.9874 | 1 | 0.988 | 1 | 0.9902 | 1 | 0.987 | 0.95 | 0.8576 | 1 | 0.837 | 1 | 0.8168 | 1 | 0.8664 | 0.9875 | 0.8666 |
| 2 | 1 | 0.9764 | 1 | 0.9772 | 1 | 0.9784 | 1 | 0.9766 | 0.95 | 0.6488 | 0.9875 | 0.8296 | 1 | 0.2624 | 0.9375 | 0.8766 | 0.9 | 0.8768 |
| 3 | 1 | 0.9644 | 1 | 0.966 | 1 | 0.9708 | 1 | 0.9682 | 0.8375 | 0.9378 | 0.9625 | 0.9018 | 0.9625 | 0.8876 | 0.95 | 0.9016 | 0.9375 | 0.9018 |
The sequential GA, the parallel GA (2- and 4-island modes), support vector machines (binary and single SVM), LogitBoost, linear discriminant analysis (LDA), logistic regression (LR), and linear least squares regression (LS) were tested on three sets of template vectors of different dimensions. Se and Sp are defined in Eqs. 7 and 8.
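Eqs. 7 and 8 are not reproduced in this section. Assuming Se and Sp denote the usual sensitivity and specificity, a minimal sketch of how the two columns for each classifier could be computed is given below. The function name and the label encoding (truthy = pattern belongs to the template class) are illustrative assumptions, not taken from the paper.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (Se, Sp) for binary labels.

    Assumption: Se = TP / (TP + FN) and Sp = TN / (TN + FP), the standard
    definitions; the paper's Eqs. 7 and 8 are not shown in this section.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth and predicted:
            tp += 1  # positive pattern correctly recognized
        elif truth and not predicted:
            fn += 1  # positive pattern missed
        elif not truth and predicted:
            fp += 1  # negative pattern wrongly accepted
        else:
            tn += 1  # negative pattern correctly rejected
    se = tp / (tp + fn) if (tp + fn) else float("nan")
    sp = tn / (tn + fp) if (tn + fp) else float("nan")
    return se, sp


# Two positives (one missed) and two negatives (both rejected).
se, sp = sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0])
print(se, sp)
```

Under these definitions a classifier can reach Se = 1 while Sp stays below 1, which is the situation in most rows of the table: every true pattern is found, at the cost of some false positives.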
Comparison of classification performance of the parallel and sequential GA with other classification algorithms for template vectors of dimension |x| = 40.
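| Pattern | Sequential GA Se | Sequential GA Sp | Parallel GA (2 islands) Se | Parallel GA (2 islands) Sp | Parallel GA (4 islands) Se | Parallel GA (4 islands) Sp | Binary SVM Se | Binary SVM Sp | Single SVM Se | Single SVM Sp | LogitBoost Se | LogitBoost Sp | LR Se | LR Sp | LDA Se | LDA Sp | LS Se | LS Sp |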
| 01 | 1 | 0.9786 | 1 | 0.9862 | 1 | 0.9922 | 1 | 0.9884 | 0.925 | 0.959 | 1 | 0.4906 | 1 | 0.6602 | 1 | 0.2076 | 0.9625 | 0.6234 |
| 02 | 1 | 0.9836 | 1 | 0.9858 | 1 | 0.9866 | 1 | 0.9556 | 0.8375 | 0.9918 | 0.9875 | 0.6846 | 1 | 0.326 | 1 | 0.2152 | 0.975 | 0.8174 |
| 03 | 1 | 0.9928 | 1 | 0.995 | 1 | 0.9972 | 1 | 0.9844 | 0.8875 | 0.8452 | - | - | 1 | 0.7042 | 1 | 0.3732 | 0.3375 | 0.3866 |
The sequential GA, the parallel GA (2- and 4-island modes), support vector machines (binary and single SVM), LogitBoost, linear discriminant analysis (LDA), logistic regression (LR), and linear least squares regression (LS) were tested on three sets of template vectors of different dimensions. Se and Sp are defined in Eqs. 7 and 8.
Figure 1. Three 30-dimensional (a) and 40-dimensional (b) patterns used to test the algorithm. A training set of 20 patterns from each of the templates was created by adding 50% random Gaussian noise to these templates.
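The caption does not state how the 50% noise level is measured. The sketch below shows one way such a training set could be generated, assuming the noise standard deviation is 50% of the template's value range. That scaling rule, the function name `make_training_set`, and its parameters are all assumptions made for illustration.

```python
import random


def make_training_set(template, n_patterns=20, noise_fraction=0.5, seed=0):
    """Create noisy copies of a template vector.

    Assumption: Gaussian noise with zero mean and a standard deviation of
    noise_fraction times the template's value range (max - min). The paper
    may define the 50% noise level differently.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    scale = noise_fraction * (max(template) - min(template))
    return [
        [value + rng.gauss(0.0, scale) for value in template]
        for _ in range(n_patterns)
    ]


# A hypothetical 5-point template; the real templates have 30 or 40 points.
template = [0.0, 0.5, 1.0, 0.5, 0.0]
training_set = make_training_set(template)
print(len(training_set), len(training_set[0]))
```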
Figure 2. SVD-corrected gene expression profiles of 11 antibiotic and secondary metabolite gene clusters. The horizontal axis represents time points, whereas the vertical axis represents normalized gene expression level. Expression profiles of the RED and coelicheline chromosomal clusters were used in training of the algorithm.
Figure 3. Scores for identification of expression profiles using (a) coelicheline gene cluster profiles and (b) RED gene cluster profiles as a training set. The procedure was performed 200 times with random initialization of the chromosome. A profile identified in every run received a score of 200; a profile never found received a score of zero. Vertical axis: scores; horizontal axis: genes sorted by decreasing score.
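The scoring described in this caption amounts to counting, for each gene, the number of runs in which its profile was identified. A minimal sketch follows; the function name, the input layout (one list of identified gene names per run), and the tie-breaking by gene name are assumptions for illustration.

```python
from collections import Counter


def score_profiles(run_results, all_genes):
    """Count in how many runs each gene's profile was identified.

    run_results: one iterable of identified gene names per GA run
                 (200 runs in the caption; any number works here).
    all_genes:   every gene considered, so never-found genes score zero.
    Returns (gene, score) pairs sorted by decreasing score, ties by name.
    """
    counts = Counter()
    for identified in run_results:
        counts.update(set(identified))  # a gene counts at most once per run
    scores = {gene: counts.get(gene, 0) for gene in all_genes}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Three hypothetical runs over four hypothetical genes.
runs = [["a", "b"], ["a"], ["a", "c"]]
print(score_profiles(runs, ["a", "b", "c", "d"]))
```

With 200 runs, a gene found in every run would receive the maximum score of 200, matching the scale described in the caption.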
Results of the search for eleven secondary metabolite gene clusters of S. coelicolor using the GA trained with kinetic profiles of the RED antibiotic and coelicheline gene clusters (in bold).
| cluster | SCO beginning | SCO end | n | RED template | % | Coelicheline template | % |
| CAD complex | 3210 | 3249 | 39 | 28 | 72 | 2 | 5 |
| whiE | 5327 | 5350 | 5 | 0 | 0 | 0 | 0 |
| **RED** | 5877 | 5898 | 22 | NA | NA | 2 | 9 |
| desferrioxamines | 2782 | 2785 | 4 | 1 | 25 | 3 | 75 |
| **coelicheline** | 489 | 499 | 11 | 3 | 27 | NA | NA |
| TW95a | 5314 | 5320 | 7 | 1 | 14 | 0 | 0 |
| isorenieratene | 185 | 191 | 7 | 0 | 0 | 0 | 0 |
| eicosapentaenoic acid | 124 | 129 | 6 | 0 | 0 | 0 | 0 |
| NRPS | 6429 | 6438 | 9 | 0 | 0 | 0 | 0 |
| siderophore synthase | 5799 | 5801 | 3 | 0 | 0 | 0 | 0 |
| deoxysugar synthase | 381 | 401 | 21 | 1 | 5 | 12 | 57 |
Gene clusters for act, coelibactine, tetrahydroxynaphthalene, type I polyketide, chalcone synthase, sesquiterpene, and type III fatty acid synthase were not present in the chip data matrix. Geosmin and butyrolactone were each represented by only one gene and were therefore excluded from evaluation. SCO beginning and SCO end give the first and last gene of the cluster on the chromosome, and n is the number of genes in the cluster. RED template is the number of genes of a cluster identified using the RED gene cluster as a training set, and Coelicheline template is the number identified using the coelicheline gene cluster as a training set. Each % column gives the preceding count as a percentage of n. NA marks the cluster that served as the training set.
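The % columns appear to be the identified gene count divided by n and rounded to a whole number. A small sketch of that arithmetic follows, using rows from the table; the function name is hypothetical and `None` stands in for NA.

```python
def percent_identified(n_identified, n_genes):
    """Percentage of a cluster's genes that were identified.

    Assumption: the table rounds to the nearest whole percent.
    None stands in for the table's NA entries.
    """
    if n_identified is None:
        return None
    return round(100 * n_identified / n_genes)


# (cluster, n, identified with RED template) taken from the table rows.
rows = [("CAD complex", 39, 28), ("deoxysugar synthase", 21, 1), ("whiE", 5, 0)]
for cluster, n, found in rows:
    print(cluster, percent_identified(found, n))
```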
Figure 4. Expression profiles of the RED gene cluster (black) and ECR genes identified by Huang et al. [21] (red).
Figure 5. Overall scheme of the algorithm.