Krishna Kumar Kandaswamy, Ganesan Pugalenthi, Mehrnaz Khodam Hazrati, Kai-Uwe Kalies, Thomas Martinetz.
Abstract
BACKGROUND: Bioluminescence is the emission of light by a living organism. Most light-emitting organisms are sea creatures, but some insects, plants, and fungi also emit light. Biotechnological applications of bioluminescence have become routine and are considered essential for many medical and general technological advances. Identifying bioluminescent proteins is challenging because they share little sequence similarity. So far, no specific method has been reported that identifies bioluminescent proteins from primary sequence.
Year: 2011 PMID: 21849049 PMCID: PMC3176267 DOI: 10.1186/1471-2105-12-345
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Workflow of BLProt.
Performance of the SVM using different feature subsets selected by ReliefF
| Feature subset | Sensitivity (%) | Specificity (%) | MCC | Test Accuracy (%) | CV Accuracy (%) |
|---|---|---|---|---|---|
| 75 features | 69.50 | 77.13 | 0.4663 | 73.86 | 77.16 |
| 100 features | 74.47 | 84.21 | 0.5904 | 80.06 | 80.00 |
| 200 features | 68.09 | 81.58 | 0.5022 | 75.83 | 78.00 |
| 300 features | 67.38 | 82.11 | 0.5017 | 75.83 | 78.67 |
| 400 features | 64.54 | 86.32 | 0.5260 | 77.04 | 78.00 |
| 500 features | 65.96 | 85.79 | 0.5323 | 77.34 | 78.00 |
| All features | 63.12 | 78.19 | 0.4182 | 71.73 | 75.16 |
MCC - Matthews correlation coefficient; CV - Cross-validation
Performance of the SVM using different feature subsets selected by Info Gain
| Feature subset | Sensitivity (%) | Specificity (%) | MCC | Test Accuracy (%) | CV Accuracy (%) |
|---|---|---|---|---|---|
| 100 features | 69.50 | 74.21 | 0.4351 | 72.21 | 74.83 |
| 200 features | 76.60 | 75.79 | 0.5193 | 76.13 | 78.00 |
| 300 features | 70.92 | 77.37 | 0.4821 | 74.62 | 78.33 |
| 400 features | 68.09 | 77.89 | 0.4611 | 73.72 | 78.17 |
| 500 features | 68.09 | 84.21 | 0.5326 | 77.34 | 78.33 |
| All features | 63.12 | 78.19 | 0.4182 | 71.73 | 75.16 |
MCC - Matthews correlation coefficient; CV - Cross-validation
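For readers unfamiliar with information gain as a ranking criterion, the following is a minimal sketch of the standard definition: the reduction in class-label entropy obtained by splitting on a feature. It assumes the feature has already been discretized into categories; how the authors binned their features is not described in this record, and the toy data and function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data (hypothetical): one feature separates the classes, one does not.
labels = ["BLP", "BLP", "Non-BLP", "Non-BLP"]
features = {
    "separating": ["high", "high", "low", "low"],
    "uninformative": ["high", "low", "high", "low"],
}

# Rank features by information gain, highest first, then keep the top k.
ranked = sorted(features, key=lambda f: information_gain(features[f], labels),
                reverse=True)
print(ranked)
```

Selecting a subset such as "200 features" then amounts to keeping the first 200 names of a ranking like `ranked`.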
Performance of the SVM using different feature subsets selected by mRMR
| Feature subset | Sensitivity (%) | Specificity (%) | MCC | Test Accuracy (%) | CV Accuracy (%) |
|---|---|---|---|---|---|
| 100 features | 65.96 | 84.21 | 0.5134 | 76.44 | 78.33 |
| 200 features | 65.25 | 84.74 | 0.5132 | 76.44 | 78.50 |
| 300 features | 65.96 | 83.68 | 0.5072 | 76.13 | 78.50 |
| 400 features | 65.96 | 83.68 | 0.5072 | 76.13 | 78.33 |
| 500 features | 65.96 | 83.68 | 0.5072 | 76.13 | 78.50 |
| All features | 63.12 | 78.19 | 0.4182 | 71.73 | 75.16 |
MCC - Matthews correlation coefficient; CV - Cross-validation
Figure 2. ROC plot for SVM models using all features and the top 100 features (ReliefF).
Prediction results for 9 potential bioluminescent proteins
| GI | BLProt | PSI-BLAST | HMM | Source of annotation |
|---|---|---|---|---|
| 156529049 | BLP | Non-BLP | BLP | INTERPRO |
| 37528019 | BLP | BLP | Non-BLP | KEGG |
| 37528018 | BLP | BLP | BLP | CDD |
| 45440453 | BLP | Non-BLP | BLP | INTERPRO |
| 45440453 | BLP | Non-BLP | BLP | INTERPRO |
| 153796564 | BLP | Non-BLP | Non-BLP | INTERPRO |
| 49257059 | BLP | BLP | BLP | CDD |
| 159576911 | BLP | BLP | Non-BLP | CDD |
| 49257059 | BLP | Non-BLP | BLP | INTERPRO |
BLP - Bioluminescent protein; Non-BLP - Non-bioluminescent protein
CDD - Conserved Domain Database
Comparison of BLProt with other machine learning methods
| Method | Sensitivity (%) | Specificity (%) | MCC | Accuracy (%) |
|---|---|---|---|---|
| J4.8 | 69.50 | 75.79 | 0.4518 | 73.11 |
| PART | 63.12 | 72.11 | 0.3519 | 68.28 |
| IBK | 76.60 | 69.47 | 0.4556 | 72.51 |
| Random Forest | 75.18 | 73.16 | 0.4787 | 74.02 |
| AdaBoost | 68.79 | 72.63 | 0.4117 | 71.00 |
| BLProt | 74.47 | 84.21 | 0.5904 | 80.06 |
MCC - Matthews correlation coefficient