Kyu-Baek Hwang, Beom-Yong Ha, Sanghun Ju, Sangsoo Kim.
Abstract
Identifying the genes indispensable for an organism's life, and characterizing them, is one of the central questions in current biological research, so computational approaches for predicting essential genes would be helpful. The performance of such a predictor is usually measured by the area under the receiver operating characteristic curve (AUC). We propose a novel method that uses genetic algorithms to maximize the partial AUC restricted to a specific low false positive rate (FPR) interval, the region relevant to follow-up experimental validation. Our predictor uses various features based on sequence information, protein-protein interaction network topology, and gene expression profiles. A feature selection wrapper was developed to alleviate over-fitting and to weigh each feature's relevance to prediction. We evaluated our method on the proteome of budding yeast. Our genetic algorithms maximizing the partial AUC at FPR below 0.05 or 0.10 outperformed other popular classification methods.
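The partial AUC that serves as the optimization target can be computed directly from a predictor's scores. The Python sketch below is a minimal illustration, not the authors' implementation: it builds ROC vertices by sweeping the score threshold, then integrates the curve with the trapezoidal rule up to the FPR cutoff, linearly interpolating the TPR at the cutoff. Whether the paper normalizes the partial AUC or interpolates in this way is an assumption here; the function names and toy data are hypothetical. In a genetic algorithm such as the one described, a value like this would act as the fitness of a candidate predictor (the GA itself is not shown).

```python
def roc_points(scores, labels):
    """ROC vertices (FPR, TPR), sweeping the threshold from the highest score down.

    Tied scores are consumed together, so a tie between a positive and a
    negative yields one diagonal segment instead of an order-dependent step.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def partial_auc(scores, labels, max_fpr):
    """Raw area under the ROC curve for FPR in [0, max_fpr] (trapezoidal rule).

    The result lies in [0, max_fpr]; divide by max_fpr to rescale it to [0, 1].
    """
    points = roc_points(scores, labels)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the TPR where this segment crosses the FPR cutoff.
            y_cut = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            area += (max_fpr - x0) * (y0 + y_cut) / 2
            break
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [1, 1, 0, 1, 0, 0]  # 1 = essential gene (hypothetical toy data)
    print(partial_auc(scores, labels, 1.0))   # full AUC
    print(partial_auc(scores, labels, 0.10))  # partial AUC for FPR <= 0.10
```

With `max_fpr=1.0` the function returns the ordinary AUC. For comparison, scikit-learn's `roc_auc_score(..., max_fpr=...)` returns a McClish-standardized partial AUC, so its values will differ from this raw area.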
Year: 2013 PMID: 23351383 PMCID: PMC4133830 DOI: 10.5483/bmbrep.2013.46.1.159
Source DB: PubMed Journal: BMB Rep ISSN: 1976-6696 Impact factor: 4.778
The number of times each feature was selected in each foldᵃ
| Feature | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Fold 6 | Fold 7 | Fold 8 | Fold 9 | Fold 10 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cytosol¹ | 17 | 20 | 23 | 3 | 23 | 6 | 7 | 13 | 14 | 5 | 13.1 |
| Extracellular¹ | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.3 |
| Plasma membrane¹ | 2 | 5 | 6 | 1 | 14 | 4 | 21 | 3 | 16 | 12 | 8.4 |
| Mitochondria¹ | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0.1 |
| Nucleus¹ | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 |
| TM_HELIX² | 44 | 44 | 43 | 16 | 47 | 43 | 38 | 45 | 50 | 18 | 37.8 |
| T3s³ | 38 | 21 | 23 | 28 | 38 | 10 | 7 | 19 | 16 | 25 | 22.5 |
| C3s³ | 29 | 14 | 14 | 24 | 23 | 14 | 7 | 13 | 15 | 23 | 17.6 |
| A3s³ | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 49 | 49.9 |
| G3s³ | 19 | 29 | 32 | 22 | 21 | 49 | 43 | 41 | 33 | 49 | 33.8 |
| CAI³ | 2 | 0 | 1 | 6 | 1 | 0 | 3 | 2 | 1 | 2 | 1.8 |
| CBI³ | 34 | 10 | 24 | 27 | 43 | 19 | 38 | 22 | 18 | 30 | 26.5 |
| Fop³ | 12 | 42 | 18 | 22 | 10 | 32 | 11 | 25 | 29 | 19 | 22 |
| Nc³ | 15 | 0 | 5 | 1 | 8 | 2 | 7 | 3 | 1 | 1 | 4.3 |
| GC3s³ | 39 | 24 | 19 | 27 | 36 | 12 | 8 | 16 | 17 | 18 | 21.6 |
| GC³ | 2 | 2 | 17 | 0 | 11 | 2 | 2 | 10 | 7 | 9 | 6.2 |
| L_sym³ | 4 | 6 | 16 | 9 | 2 | 9 | 4 | 9 | 17 | 7 | 8.3 |
| L_aa³ | 4 | 6 | 15 | 9 | 2 | 9 | 4 | 9 | 17 | 6 | 8.1 |
| Gravy³ | 50 | 49 | 49 | 49 | 50 | 50 | 48 | 49 | 48 | 47 | 48.9 |
| Aromo³ | 28 | 3 | 7 | 0 | 6 | 34 | 5 | 16 | 8 | 2 | 10.9 |
| RAREAMINO_ACID⁴ | 9 | 4 | 2 | 0 | 6 | 6 | 0 | 5 | 2 | 0 | 3.4 |
| CLOSE STOP RATIO⁴ | 46 | 44 | 46 | 50 | 50 | 48 | 50 | 45 | 46 | 47 | 47.2 |
| Phyletic retention⁵ | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 |
| DegreeK⁶ | 2 | 5 | 0 | 6 | 3 | 7 | 2 | 29 | 8 | 1 | 6.3 |
| CCo⁶ | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 3 | 2 | 1.7 |
| BC⁶ | 5 | 4 | 3 | 43 | 4 | 3 | 0 | 5 | 0 | 2 | 6.9 |
| CC⁶ | 33 | 26 | 34 | 38 | 25 | 37 | 16 | 50 | 41 | 25 | 32.5 |
| KL⁶ | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 |
| EI⁶ | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 |
| Mean⁷ | 49 | 48 | 50 | 47 | 45 | 41 | 47 | 50 | 49 | 47 | 47.3 |
| Variance⁷ | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 |
ᵃ The number of times a feature was selected out of the 50 trials in each fold. Colored backgrounds highlight the cells in which a feature was selected every time (dark blue), in more than 90% (purple), or in more than 80% (cyan) of the 50 trials.
¹ Subcellular localization probabilities calculated with ConLoc.
² Number of transmembrane helices calculated with TMHMM.
³ Codon usage frequencies calculated with CodonW.
⁴ Ratios derived from the CodonW results.
⁵ Number of organisms having an ortholog.
⁶ Protein-protein interaction network features.
⁷ Gene expression features.
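The Average column is the mean of the ten per-fold counts, and the highlight tiers in the note are fixed fractions of the 50 trials per fold. The Python sketch below recomputes both for a few rows transcribed from the table. The function and variable names are hypothetical, and reading "more than 90%" and "more than 80%" as strict thresholds (more than 45 and more than 40 of 50) is an assumption about how the cells were colored.

```python
from collections import Counter

TRIALS_PER_FOLD = 50  # GA trials per cross-validation fold, as stated in the table note

# Subset of rows transcribed from the table: selection counts for folds 1-10.
SELECTION_COUNTS = {
    "Nucleus": [50, 50, 50, 50, 50, 50, 50, 50, 50, 50],
    "A3s": [50, 50, 50, 50, 50, 50, 50, 50, 50, 49],
    "Gravy": [50, 49, 49, 49, 50, 50, 48, 49, 48, 47],
    "CLOSE STOP RATIO": [46, 44, 46, 50, 50, 48, 50, 45, 46, 47],
    "CC": [33, 26, 34, 38, 25, 37, 16, 50, 41, 25],
}


def highlight_tier(count, trials=TRIALS_PER_FOLD):
    """Map one per-fold count to a highlight tier (None = no highlight).

    Assumes the 90% and 80% thresholds are strict; integer arithmetic avoids
    floating-point edge cases at exactly 45 or 40 of 50.
    """
    if count == trials:
        return "every time"
    if 10 * count > 9 * trials:
        return ">90%"
    if 10 * count > 8 * trials:
        return ">80%"
    return None


def summarize(counts):
    """Return (average count across folds, Counter of highlight tiers)."""
    average = sum(counts) / len(counts)
    tiers = Counter(highlight_tier(c) for c in counts)
    return average, tiers


for name, counts in SELECTION_COUNTS.items():
    avg, tiers = summarize(counts)
    print(f"{name:18s} avg={avg:5.1f} tiers={dict(tiers)}")
```

Averaging the per-fold counts reproduces the table's Average column for these rows (for example, 47.2 for CLOSE STOP RATIO), while the tier counts show how consistently a feature survived the wrapper across folds.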