Sun Chen, Chun-ying Zhang, Kai Song.
Abstract
BACKGROUND: Significant efforts have been made to address the problem of identifying short genes in prokaryotic genomes. However, most known methods are not effective in detecting short genes. Because of the limited information contained in short DNA sequences, it is very difficult to accurately distinguish between protein coding and non-coding sequences in prokaryotic genomes. We have developed a new Iteratively Adaptive Sparse Partial Least Squares (IASPLS) algorithm as the classifier to improve the accuracy of the identification process.
Year: 2013 PMID: 24067167 PMCID: PMC3852556 DOI: 10.1186/1745-6150-8-23
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Figure 1. Accuracies of four programs to detect short genes.
Datasets of the organisms
| [60,100) | 705 | 3403 |
| [100,200) | 1693 | 5657 |
| [200,300) | 2603 | 2728 |
| [300,400] | 2132 | 1372 |
The best recognition results obtained by different methods*
| Orphelia | 90.21 | 22.13 | 60.07 | 58.90 | 83.57 | 83.65 | 81.97 | 90.87 |
| GeneMarkS | 31.91 | 59.48 | 85.17 | 63.51 | 95.34 | 64.84 | 98.08 | 61.44 |
| HA | 16.60 | 79.25 | 76.43 | 74.70 | 94.48 | 74.41 | 96.96 | 71.64 |
| Metagene | # | # | 54.45 | 57.23 | 88.70 | 55.64 | 95.29 | 70.84 |
| IASPLS | 83.44 | 92.80 | 84.57 | 84.92 | 94.91 | 95.32 | 97.82 | 97.50 |
* '#' indicates that no prediction result was available.
Figure 2. Sensitivities of the five prediction programs.
Figure 3. Specificities of the five prediction programs.
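The figures above report accuracy, sensitivity, and specificity. As a minimal sketch of how such figures are commonly derived from a confusion matrix, the snippet below uses the standard textbook definitions. This is an assumption: the record does not state which definitions the authors used, and some gene-finding papers define specificity as TP / (TP + FP) instead. The `Confusion` class, the function names, and the counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a coding vs. non-coding classifier."""
    tp: int  # coding sequences predicted as coding
    fn: int  # coding sequences predicted as non-coding
    tn: int  # non-coding sequences predicted as non-coding
    fp: int  # non-coding sequences predicted as coding


def sensitivity(c: Confusion) -> float:
    # Fraction of true coding sequences that were recovered.
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # Fraction of true non-coding sequences that were rejected
    # (assumed definition; see the note above).
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    # Fraction of all sequences that were classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Hypothetical counts, for illustration only.
example = Confusion(tp=90, fn=10, tn=80, fp=20)

if __name__ == "__main__":
    print(f"sensitivity = {100 * sensitivity(example):.2f}%")
    print(f"specificity = {100 * specificity(example):.2f}%")
    print(f"accuracy    = {100 * accuracy(example):.2f}%")
```

Reporting the three values together matters when the classes are imbalanced, because a high accuracy can hide a low sensitivity or a low specificity.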
Performance comparison of IASPLS with Goli and Nair's method*
| Goli and Nair's method | 91.26 | 89.89 | 90.67 | 0.81 |
| IASPLS | ||||
*The better result of the two algorithms evaluated here is shown in boldface.
The best results obtained by different classifiers*
| | | | | ||
|---|---|---|---|---|---|
| IASPLS | | | | | 151.2 s |
| Logistic | 93.58 | 87.72 | 91.05 | 0.82 | |
| SPLS | 93.14 | 93.48 | 93.29 | 0.86 | 146.2 s |
| KNN(k = 1) | 89.33 | 83.73 | 86.91 | 0.73 | 1584.1 s |
| Random Forest(trees = 500) | 88.65 | 87.71 | 88.24 | 0.76 | 1646.5 s |
*The best results among the algorithms evaluated here are shown in boldface.
**Time is the computational time for one round of the training-testing process.