Li-Yeh Chuang, Jui-Hung Tsai, Cheng-Hong Yang.
Abstract
An operon is a fundamental unit of transcription and contains specific functional genes for the construction and regulation of networks at the entire genome level. The correct prediction of operons is vital for understanding gene regulation and function in newly sequenced genomes. As experimental methods for operon detection tend to be nontrivial and time consuming, various methods for operon prediction have been proposed in the literature. In this study, a binary particle swarm optimization is used for operon prediction in bacterial genomes. The intergenic distance, participation in the same metabolic pathway, the cluster of orthologous groups, the gene length ratio and the operon length are used to design a fitness function. We trained the proper values on the Escherichia coli genome, and used the above five properties to implement feature selection. Finally, our study used the intergenic distance, metabolic pathway and the gene length ratio property to predict operons. Experimental results show that the prediction accuracy of this method reached 92.1%, 93.3% and 95.9% on the Bacillus subtilis genome, the Pseudomonas aeruginosa PA01 genome and the Staphylococcus aureus genome, respectively. This method has enabled us to predict operons with high accuracy for these three genomes, for which only limited data on the properties of the operon structure exists.
Year: 2010 PMID: 20385582 PMCID: PMC2896535 DOI: 10.1093/nar/gkq204
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. BPSO flowchart. The algorithm starts by initializing a population of random particles. The fitness value of each particle is then calculated, and pbest and gbest are searched for at each generation. Next, the position and velocity of the i-th particle are updated using pbest and gbest, and the search continues over successive generations until the stopping criteria are satisfied. Each particle makes use of its own memory and the knowledge gained by the swarm as a whole to find the best solution. The updated positions and velocities of the particles are confined within their respective bounds.
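The loop described in the Figure 1 caption can be sketched in a few lines. This is a minimal, generic binary PSO (sigmoid-based bit update), not the authors' implementation: the function name `bpso`, the parameter values (`w`, `c1`, `c2`, `v_max`), and the toy fitness function are all assumptions made for illustration. The paper's actual fitness function, built from intergenic distance, metabolic pathway and gene length ratio, is not reproduced here.

```python
import math
import random


def bpso(fitness, n_bits, n_particles=20, n_generations=50,
         w=1.0, c1=2.0, c2=2.0, v_max=6.0, seed=0):
    """Maximise `fitness` over bit strings with a basic binary PSO.

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Initialise random positions (bits) and velocities (reals).
    x = [[rng.randint(0, 1) for _ in range(n_bits)]
         for _ in range(n_particles)]
    v = [[rng.uniform(-v_max, v_max) for _ in range(n_bits)]
         for _ in range(n_particles)]
    # pbest: best position seen by each particle; gbest: best seen by the swarm.
    pbest = [p[:] for p in x]
    pbest_fit = [fitness(p) for p in x]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_generations):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                vel = (w * v[i][d]
                       + c1 * r1 * (pbest[i][d] - x[i][d])
                       + c2 * r2 * (gbest[d] - x[i][d]))
                # Confine the velocity to [-v_max, v_max].
                v[i][d] = max(-v_max, min(v_max, vel))
                # The sigmoid of the velocity is the probability of a 1 bit.
                prob = 1.0 / (1.0 + math.exp(-v[i][d]))
                x[i][d] = 1 if rng.random() < prob else 0
            f = fitness(x[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = x[i][:], f
    return gbest, gbest_fit


# Toy stand-in fitness (hypothetical): number of bits matching a target.
target = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(bits):
    return sum(b == t for b, t in zip(bits, target))


best, best_fit = bpso(toy_fitness, n_bits=len(target))
print(best, best_fit)
```

Because positions are bits, only the velocity needs explicit clamping in this sketch; the position is confined to {0, 1} by construction.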
Figure 2. ROC curves of operon prediction. This study estimates the predictive ability under the circumstances of leaving a single property out on the B. subtilis, P. aeruginosa PA01 and S. aureus data, respectively.
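As a rough illustration of how an ROC curve like those in Figure 2 is obtained, the sketch below sweeps a decision threshold over prediction scores and records the false-positive and true-positive rates. The function names, scores and labels are hypothetical; in a leave-one-property-out analysis, one such curve would be computed for each reduced feature set.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, sweeping the threshold from high to low.

    `labels` are 1 for operon pairs and 0 for non-operon pairs.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative")
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        s = ranked[i][0]
        # Items with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == s:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores for four adjacent gene pairs.
pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts, auc(pts))
```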
Prediction features used by each computational method on the data set of B. subtilis
| Methodology | Features used | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ID | PA | GLR | HG | COG | PD | MI | MO | GO | GOC | PP | CGA | CF | PF | CAI | GCC | PR | TE | |
| BPSO | √ | √ | √ | |||||||||||||||
| UNIPOP ( | √ | |||||||||||||||||
| GA ( | √ | √ | √ | √ | ||||||||||||||
| Using both genome-specific and general genomic information ( | √ | √ | √ | √ | √ | √ | ||||||||||||
| SVM ( | √ | √ | √ | √ | ||||||||||||||
| ODB ( | √ | √ | √ | √ | ||||||||||||||
| DVDA ( | √ | |||||||||||||||||
| OFS ( | √ | √ | √ | |||||||||||||||
| VIMSS ( | √ | √ | √ | √ | ||||||||||||||
| FGA (40 | √ | √ | √ | √ | ||||||||||||||
| JPOP ( | √ | √ | √ | |||||||||||||||
| OPERON ( | √ | |||||||||||||||||
| FGENESB ( | √ | √ | √ | √ | ||||||||||||||
ID, intergenic distance; PA, pathway; GLR, gene length ratio; HG, homologous genes; COG, cluster of orthologous groups; PD, phylogenetic distance; MI, microarray; MO, motif; GO, gene ontology; GOC, gene order conservation; PP, phylogenetic profile; CGA, common gene annotation; CF, comparative features; PF, protein functions; CAI, codon adaptation index; GCC, gene cluster conservation; PR, promoter; TE, terminator.
Accuracy, sensitivity, and specificity of operon prediction on three genomes
| Genome | Methodology | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| BPSO | 0.899 | |||
| BPSO (initiation threshold = 300 bp) | 0.905 | 0.887 | 0.945 | |
| UNIPOP ( | 0.792 | 0.782 | 0.821 | |
| GA ( | 0.883 | 0.873 | 0.897 | |
| Using both genome-specific and general genomic information ( | 0.902 | N/A | N/A | |
| SVM ( | 0.889 | 0.900 | 0.860 | |
| ODB ( | 0.632 | 0.499 | ||
| DVDA ( | 0.485 | 0.319 | 0.932 | |
| OFS ( | 0.683 | 0.765 | 0.439 | |
| VIMSS ( | 0.780 | 0.764 | 0.871 | |
| FGA ( | 0.882 | N/A | N/A | |
| JPOP ( | 0.746 | 0.720 | 0.900 | |
| OPERON ( | 0.629 | 0.531 | 0.892 | |
| FGENESB ( | 0.771 | 0.721 | 0.904 | |
| BPSO | 0.939 | |||
| BPSO (initiation threshold = 300 bp) | 0.910 | 0.885 | ||
| GA ( | 0.813 | 0.870 | 0.763 | |
| BPSO | ||||
| BPSO (initiation threshold = 300 bp) | 0.936 | 0.924 | ||
| Genome-wide operon prediction in | 0.920 | N/A | N/A |
N/A: Data not available.
Highest values in bold type.
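For readers who want to recompute the three measures in the table, the sketch below assumes the standard confusion-matrix definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and accuracy = (TP + TN) / total. The function name and the example labels are hypothetical; a positive label stands for an adjacent gene pair belonging to the same operon.

```python
def prediction_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels (1 = operon pair)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels for five adjacent gene pairs.
print(prediction_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```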