Salim Charaniya, Sarika Mehra, Wei Lian, Karthik P Jayapal, George Karypis, Wei-Shou Hu.
Abstract
Streptomyces spp. produce a variety of valuable secondary metabolites, which are regulated in a spatio-temporal manner by a complex network of inter-connected gene products. Using a compilation of genome-scale temporal transcriptome data for the model organism, Streptomyces coelicolor, under different environmental and genetic perturbations, we have developed a supervised machine-learning method for operon prediction in this microorganism. We demonstrate that, using features dependent on transcriptome dynamics and genome sequence, a support vector machine (SVM)-based classification algorithm can accurately classify >90% of gene pairs in a set of known operons. Based on model predictions for the entire genome, we verified the co-transcription of more than 250 gene pairs by RT-PCR. These results vastly increase the database of known operons in S. coelicolor and provide valuable information for exploring gene function and regulation to harness the potential of this differentiating microorganism for synthesis of natural products.
Year: 2007 PMID: 17959654 PMCID: PMC2175336 DOI: 10.1093/nar/gkm501
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Definition of known operon pairs (KOPs), non-operon pairs (NOPs), same-strand pairs and opposite-strand pairs. Closed-block arrows indicate genes in a known operon. Open-block arrows represent genes with unknown operon status.
Figure 2. Density distribution of intergenic distance in KOPs and NOPs. (continuous line) KOPs; (dashed line) NOPs.
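The intergenic distance feature can be illustrated with a minimal sketch. It assumes genes are given as inclusive coordinates on one replicon and that distance is the number of bases between neighbours, so overlapping genes yield negative values. The `Gene` type, the gene names and the coordinates are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str    # hypothetical identifier
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"

def adjacent_pairs(genes):
    """Yield (left, right, same_strand, distance) for neighbouring genes."""
    ordered = sorted(genes, key=lambda g: g.start)
    for left, right in zip(ordered, ordered[1:]):
        # Bases strictly between the two genes; negative when they overlap.
        distance = right.start - left.end - 1
        yield left.name, right.name, left.strand == right.strand, distance

# Toy coordinates, invented for illustration only.
genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 393, 900, "+"),    # overlaps geneA by 8 bp
    Gene("geneC", 1032, 1500, "+"),
    Gene("geneD", 1700, 2100, "-"),  # opposite strand to geneC
]
pairs = list(adjacent_pairs(genes))
for pair in pairs:
    print(pair)
```

Only the same-strand pairs would be candidates for operon classification; opposite-strand pairs cannot be co-transcribed.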
Figure 3. Comparison of Pearson correlation between transcript levels of adjacent genes in KOPs and NOPs. The microarray experiments were divided into nine sets and the correlation between transcript levels of adjacent genes in every pair was calculated for each set. The histogram of the number of sets in which correlation exceeds 0.7 in (a) known operon pairs (KOPs); (b) non-operon pairs (NOPs); (c) randomly selected pairs.
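The per-set correlation count described in this caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the profiles, the three toy sets (the paper used nine) and the handling of flat profiles are all assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a flat profile counts as uncorrelated
    return cov / math.sqrt(vx * vy)

def sets_above_threshold(profile_a, profile_b, set_slices, threshold=0.7):
    """Count the experiment sets in which the two genes correlate above threshold."""
    return sum(
        1 for sl in set_slices
        if pearson(profile_a[sl], profile_b[sl]) > threshold
    )

# Toy transcript levels for two adjacent genes across three sets of four arrays.
gene_a = [1, 2, 3, 4, 4, 3, 2, 1, 1, 3, 2, 4]
gene_b = [2, 4, 6, 8, 1, 2, 3, 4, 1, 3, 2, 4]
slices = [slice(0, 4), slice(4, 8), slice(8, 12)]
n_sets = sets_above_threshold(gene_a, gene_b, slices)
print(n_sets)
```

Counting sets rather than pooling all arrays keeps one strongly correlated condition from dominating the feature.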
Comparison of different classifiers using leave-one-out cross-validation
| Classifier | Kernel function | Feature(s) | Recall (%) | Precision (%) | Total error rate (%) | False positive rate (%) | F-factor |
|---|---|---|---|---|---|---|---|
| I | Radial (γ = 0.01) | Distance | 82 | 86 | 17 | 16 | 0.838 |
| II | Linear | Distance (discretized) | 78 | 86 | 19 | 16 | 0.817 |
| III | Radial (γ = 0.0025) | Distance (discretized) | 78 | 86 | 19 | 16 | 0.817 |
| IV | Linear | Transcriptome | 78 | 82 | 22 | 21 | 0.798 |
| V | Radial (γ = 0.02) | Transcriptome | 80 | 82 | 21 | 21 | 0.810 |
| VI | Linear | Terminator prediction | 100 | 58 | 39 | 87 | 0.734 |
| VII | Linear | Distance and transcriptome | 90 | 88 | 12 | 15 | 0.890 |
| VIII | Radial (γ = 0.25) | Distance and transcriptome | 90 | 88 | 12 | 15 | 0.890 |
| IX | Linear | Distance, transcriptome and terminator prediction | 90 | 88 | 12 | 15 | 0.887 |
| X | Radial (γ = 0.25) | Distance, transcriptome and terminator prediction | 92 | 89 | 11 | 14 | 0.904 |
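The table does not define the F-factor. Assuming it is the harmonic mean of precision and recall, the short check below reproduces the values for classifiers VI and VII from the table's own recall and precision columns. Other rows can differ in the third decimal, presumably because recall and precision are reported as rounded percentages.

```python
def f_measure(recall_pct, precision_pct):
    """Harmonic mean of precision and recall, both given in percent."""
    r, p = recall_pct / 100, precision_pct / 100
    return 2 * p * r / (p + r)

# Recall and precision taken from rows VI and VII of the table above.
f_vi = f_measure(100, 58)
f_vii = f_measure(90, 88)
print(round(f_vi, 3), round(f_vii, 3))
```

Classifier VI shows why a combined score is useful: perfect recall is offset by low precision.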
Figure 4. Comparison of different classifiers by ROC curve. False positive rate is the percentage of non-operon pairs (NOPs) misclassified as operon pairs and recall is the percentage of known operon pairs (KOPs) correctly classified as operon pairs. The ROC curves were generated for each classifier by a 5-fold cross-validation as described in the text. (Open circle) classifier I; (Inverted triangle) classifier V; (Open triangle) classifier VIII; (Open square) classifier X.
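Using the definitions of false positive rate and recall given in this caption, an ROC curve and its area can be sketched by sweeping a threshold over classifier scores. The scores and labels below are invented (1 marks a KOP, 0 a NOP), and the trapezoidal area is a common convention rather than a detail confirmed by the source.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, recall) points, threshold swept high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume every pair sharing this score so ties give one diagonal step.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical classifier scores for six gene pairs.
scores = [2.1, 1.4, 0.3, -0.2, -0.9, -1.5]
labels = [1, 1, 0, 1, 0, 0]
area = auc(roc_points(scores, labels))
print(area)
```

The area equals the fraction of (KOP, NOP) pairs in which the KOP receives the higher score, here 8 of 9.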
Comparison of different classifiers by 5-fold cross-validation. The null hypothesis was tested by comparing the AUC of the 25 ROC graphs for each classifier by Wilcoxon signed rank test
| Classifier | Feature(s) | Average AUC | P-value | Null hypothesis |
|---|---|---|---|---|
| I | Distance | 0.81 | – | – |
| V | Transcriptome | 0.81 | 6.5 × 10⁻¹ | AUC_V − AUC_I = 0 |
| VIII | Distance and transcriptome | 0.89 | 1.1 × 10⁻⁴ | AUC_VIII − AUC_I = 0 |
| X | Distance, transcriptome and terminator prediction | 0.91 | 1.2 × 10⁻⁵ | AUC_X − AUC_I = 0 |
| X | Distance, transcriptome and terminator prediction | 0.91 | 6.7 × 10⁻³ | AUC_X − AUC_VIII = 0 |
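The paired comparison of AUC values rests on the Wilcoxon signed-rank statistic, which can be sketched as below. The AUC values are invented, and only the rank sums are computed. A p-value would normally come from a statistics library routine such as `scipy.stats.wilcoxon`; the source does not say which software the authors used.

```python
def signed_rank_statistic(a, b):
    """Return (W_plus, W_minus) for paired samples a and b."""
    # Zero differences carry no sign information and are dropped.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # tied magnitudes share the mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical AUCs from five matched cross-validation runs of two classifiers.
auc_x = [0.92, 0.90, 0.93, 0.88, 0.91]
auc_i = [0.80, 0.83, 0.79, 0.90, 0.82]
w_plus, w_minus = signed_rank_statistic(auc_x, auc_i)
print(w_plus, w_minus)
```

A large imbalance between the two rank sums is evidence against the null hypothesis that the AUC difference is zero.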
Distribution of scores of same-strand gene pairs with unknown operon status
| Score | Number of gene pairs | Number of pairs with | Number of pairs with short intergenic distance ( |
|---|---|---|---|
| | 1452 | 161 (11%) | 3 (<1%) |
| −1 ⩽ | 1352 | 597 (44%) | 123 (9%) |
| 0 ⩽ | 1369 | 658 (48%) | 1074 (78%) |
| 1 ⩽ | 301 | 230 (76%) | 264 (88%) |
| | 342 | 329 (96%) | 307 (90%) |
| Total | 4816 | 1975 | 1771 |
ᵃ r is the correlation between the transcript profiles of the adjacent genes in a same-strand pair.
Functional analysis of same-strand gene pairs
| Score | Number of gene pairs | Number of annotated gene pairs | Number of pairs in same functional class |
|---|---|---|---|
| | 1452 | 605 | 106 (18%) |
| −1 ⩽ | 1352 | 521 | 117 (22%) |
| 0 ⩽ | 1369 | 667 | 317 (48%) |
| 1 ⩽ | 301 | 169 | 100 (59%) |
| | 342 | 206 | 137 (67%) |
Figure 5. Experimental verification of co-transcription of adjacent genes by RT-PCR. RNA isolation and RT-PCR were performed as described in the Materials and Methods section. Primers were used to amplify across adjacent genes and the products were analyzed by gel electrophoresis. The expected amplicon sizes in bp are: SCO1866-1867, 1035; SCO1920-1921, 1637; SCO1922-1923, 885; SCO1935-1936, 2298; SCO1936-1937, 1633; SCO1945-1946, 1659; SCO1946-1947, 1353; SCO1949-1950, 964; SCO1968-1969, 1426; SCO2049-2050, 980; SCO5737-5738, 1615. For every gene pair, a negative control (No RT), in which the RT enzyme was omitted, was also performed. The negative control is shown next to each RT reaction.
RT-PCR based verification of co-transcription of gene pairs
| Score | Number of gene pairs tested | Number of gene pairs verified | Range of intergenic distance (bp) |
|---|---|---|---|
| | 163 | 122 (75%) | −32 to 131 |
| 1 ⩽ | 91 | 61 (67%) | −8 to 178 |
| 0 ⩽ | 114 | 67 (59%) | −13 to 178 |
Extension of cistron boundary of known operons
| No. | Known operon | Known size | Predicted operon | Gene pairs verified by RT-PCR | Reference |
|---|---|---|---|---|---|
| 1 | | 2 | | | ( |
| 2 | | 5 | | | ( |
| 3 | | 3 | | | ( |
Size of the predicted transcription units
| Cistron size | Number of cistrons | Number of cistrons with | Number of cistrons with |
|---|---|---|---|
| 1 | 4386 | – | – |
| 2 | 839 | 203 | 85 |
| 3 | 235 | 33 | 13 |
| 4 | 111 | 17 | 6 |
| 5 | 46 | 5 | 3 |
| >5 | 47 | 2 | 0 |
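A size distribution like the one above can be derived from pairwise predictions by chaining consecutive co-transcribed pairs into units. The sketch below assumes that grouping rule and uses an invented list of link predictions. It is not the authors' procedure, and it expects at least one gene.

```python
from collections import Counter

def transcription_unit_sizes(same_unit):
    """Count unit sizes from per-pair links between consecutive genes.

    same_unit[i] is True when gene i and gene i+1 are predicted to be
    co-transcribed; opposite-strand neighbours are always False.
    """
    sizes = []
    current = 1  # the first gene opens a unit of size one
    for linked in same_unit:
        if linked:
            current += 1           # extend the current unit
        else:
            sizes.append(current)  # close it and start a new one
            current = 1
    sizes.append(current)
    return Counter(sizes)

# Hypothetical predictions for seven consecutive genes (six adjacent pairs).
links = [True, False, False, True, True, False]
size_counts = transcription_unit_sizes(links)
print(sorted(size_counts.items()))
```

Units of size 1 correspond to genes predicted to be transcribed alone.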