Myco Umemura, Hideaki Koike, Nozomi Nagano, Tomoko Ishii, Jin Kawano, Noriko Yamane, Ikuko Kozone, Katsuhisa Horimoto, Kazuo Shin-ya, Kiyoshi Asai, Jiujiang Yu, Joan W Bennett, Masayuki Machida.
Abstract
Many bioactive natural products are produced as "secondary metabolites" by plants, bacteria, and fungi. In the mid-20th century, several fungal secondary metabolites, such as penicillin, lovastatin, and cyclosporine, revolutionized the pharmaceutical industry. Secondary metabolites are generally biosynthesized by enzymes encoded by clusters of coordinately regulated genes, and several motif-based methods have been developed to detect secondary metabolite biosynthetic (SMB) gene clusters using the sequence information of typical SMB core genes such as polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS). However, no method currently exists for detecting functional SMB gene clusters that lack core SMB genes. To advance the exploration of SMB gene clusters, especially those without known core genes, we developed MIDDAS-M, a motif-independent de novo detection algorithm for SMB gene clusters. We integrated virtual gene cluster generation in an annotated genome sequence with highly sensitive scoring of the cooperative transcriptional regulation of cluster member genes. MIDDAS-M accurately predicted 38 SMB gene clusters that have been experimentally confirmed and/or predicted by other motif-based methods in 3 fungal strains. MIDDAS-M further identified a new SMB gene cluster for ustiloxin B, which was experimentally validated. Sequence analysis of the cluster genes indicated a novel mechanism for peptide biosynthesis independent of NRPS. Because it is fully computational and independent of empirical knowledge about SMB core genes, MIDDAS-M allows a large-scale, comprehensive analysis of SMB gene clusters, including those with novel biosynthetic mechanisms that do not contain any functionally characterized genes.
Year: 2013 PMID: 24391870 PMCID: PMC3877130 DOI: 10.1371/journal.pone.0084028
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Principle of the MIDDAS-M algorithm.
(A) Virtual cluster (VC) generation for SMB gene cluster detection. Gene clusters on a genome are evaluated comprehensively by a moving window with a specific cluster size; the cluster size can be changed from 3 to 30 or another appropriate size. (B) Schematic representation of MIDDAS-M. Candidate SMB gene clusters show large deviations from the standard deviation after summing the induction ratios of member genes and statistical enhancement. (C) Flow chart of the MIDDAS-M algorithm.
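The moving-window idea in panel A can be illustrated with a minimal sketch. It is not the published MIDDAS-M scoring: the z-scoring of induction ratios, the division by the square root of the window size, and the function names `window_scores` and `best_window` are all assumptions made for illustration, and the "statistical enhancement" step of panel B is not reproduced.

```python
import numpy as np

def window_scores(induction_ratios, cluster_size):
    """Sum z-scored induction ratios over every window of adjacent genes.

    Assumption: z-scoring stands in for the paper's normalisation,
    which is not described in this record.
    """
    x = np.asarray(induction_ratios, dtype=float)
    z = (x - x.mean()) / x.std()
    # A convolution with a vector of ones is a sliding-window sum.
    return np.convolve(z, np.ones(cluster_size), mode="valid")

def best_window(induction_ratios, sizes=range(3, 31)):
    """Return (start_index, cluster_size, score) of the strongest window."""
    best = None
    for n in sizes:
        if n > len(induction_ratios):
            break
        # Assumption: divide by sqrt(n) so sums are comparable across sizes.
        scores = window_scores(induction_ratios, n) / np.sqrt(n)
        i = int(np.argmax(np.abs(scores)))
        if best is None or abs(scores[i]) > abs(best[2]):
            best = (i, n, float(scores[i]))
    return best

# Toy genome of 20 genes in which genes 10-12 are co-induced.
ratios = [0.0] * 20
ratios[10:13] = [5.0, 5.0, 5.0]
start, size, score = best_window(ratios)
print(start, size)
```

On this toy input the strongest window starts at gene 10 and spans 3 genes, which mirrors how a small co-induced cluster stands out once member-gene ratios are summed.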
Figure 2. Behavior and performance of MIDDAS-M in A. oryzae.
(A) Histograms of M scores at ncl = 1, 3, 5, 7, and 10 in the transcriptomes at 7 vs. 4 days of cultivation in kojic acid (KA)-production medium. The symmetry broke at a cluster size of 3 because of the emergence of large M scores due to the induction of the KA cluster genes. Arrows at the termini of the x-axis indicate the smallest and the largest values. (B) Emergence of an ωmax peak by MIDDAS-M from the raw induction ratio. The x-axis designates the relative position of the genes on the A. oryzae RIB40 genome when the eight chromosomes are concatenated into one. The y-axis scales are the same for all three datasets in the same row. The ωmax peak indicated by the red arrow corresponds exactly to the three genes responsible for KA production.
Figure 3. Clear detection of known SMB gene clusters in F. verticillioides by MIDDAS-M.
(A) Expression levels of each gene on the F. verticillioides genome in 4 samples of a transcriptome time series at 24, 48, 72, and 96 h in liquid fumonisin-inducing media. The highest of the 4 expression levels was plotted for each gene. (B) Absolute maximum cluster scores (|ωmax|) from the comprehensive pairwise calculation (4C2 = 6 pairs) for each gene, detected from the same transcriptome data as in A. The step line plot in gray denotes the individual chromosomes. The peaks designated a through e correspond to the 5 experimentally validated SMB clusters: a, fumonisin; b, perithecium pigment; c, fusaric acid; d, bikaverin; e, fusarin. Two peaks that do not correspond to any known gene cluster were designated y1 and y2.
Experimentally-validated or SMURF-annotated SMB gene clusters detected by MIDDAS-M.
| Fungus | Compound/SMURF (a) | ωmax | Gene ID (b) | Cluster size, MIDDAS-M (c) | Cluster size, other (d) | Source |
|---|---|---|---|---|---|---|
| A. oryzae | Kojic acid | 9544 | AO090113000136 – AO090113000138 | 3 | 3 | |
| F. verticillioides | Bikaverin (Cluster 7) | 11708 | FVEG_03379 – FVEG_03383 | 4 | 6 | |
| F. verticillioides | Fumonisin (Cluster 3) | 9780 | FVEG_00316 – FVEG_00329 | 14 | 15 | |
| F. verticillioides | Fusaric acid (Cluster 27) | 6398 | FVEG_12519 – FVEG_12535 | 17 | 5 | |
| F. verticillioides | Fusarin | 840 | FVEG_11078 – FVEG_11086 | 9 | 9 | |
| F. verticillioides | Perithecium pigment (Cluster 9) | 12533 | FVEG_03696 – FVEG_03699 | 6 | 4 | |
| F. verticillioides | Cluster 10 | 1700 | FVEG_05526 – FVEG_05530 | 5 | 10 | SMURF |
| F. verticillioides | Cluster 24 | 866 | FVEG_11927 – FVEG_11931 | 5 | 7 | SMURF |
| A. flavus | Aflatoxin (Cluster 54) | 99087 | AFLA_139150 – AFLA_139320 | 18+5 | 29 | |
| | | 24302 | AFLA_139370 – AFLA_139410 | | | |
| A. flavus | Aflatrem | 3670 | AFLA_096380 – AFLA_096400 (ATM1) | 3 | 3 | |
| | (Cluster 14) | 8984 | AFLA_045490 – AFLA_045540 (ATM2) | 6 | 5 | |
| A. flavus | Cyclopiazonic acid | 36281 | AFLA_139460 – AFLA_139490 | 4 | 3 | |
| A. flavus | Gliotoxin-like (Cluster 22) | 32872 | AFLA_064380 – AFLA_064590 | 22 | 26 | Annotation, SMURF |
| A. flavus | Kojic acid | 8273 | AFLA_096030 – AFLA_096060 | 4 | 3 | |
| A. flavus | Ustiloxin B | 21857 | AFLA_094940 – AFLA_095110 | 18 | ? | This study |
| A. flavus | Cluster 3 | 7369 | AFLA_005320 – AFLA_005350 | 4 | 8 | SMURF |
| A. flavus | Cluster 5 | 1960 | AFLA_006170 – AFLA_006190 | 3 | 7 | SMURF |
| A. flavus | Cluster 7 | 5193 | AFLA_009980 – AFLA_010030 | 6 | 8 | SMURF |
| A. flavus | Cluster 8 | 9341 | AFLA_010600 – AFLA_010630 | 4 | 10 | SMURF |
| A. flavus | Cluster 10 | 18356 | AFLA_023000 – AFLA_023040 | 5 | 15 | SMURF |
| A. flavus | Cluster 17 | 1423 | AFLA_054370 – AFLA_054390 | 3 | 25 | SMURF |
| A. flavus | Cluster 18 | 1072 | AFLA_060030 – AFLA_060050 | 3 | 15 | SMURF |
| A. flavus | Cluster 19 | 26351 | AFLA_060660 – AFLA_060700 | 5 | 9 | SMURF |
| A. flavus | Cluster 20 | 2079 | AFLA_062820 – AFLA_062900 | 9 | 18 | SMURF |
| A. flavus | Cluster 21 | 8227 | AFLA_064260 – AFLA_064330 | 8 | 21 | SMURF |
| A. flavus | Cluster 23 | 5702 | AFLA_066690 – AFLA_066720 | 4+6 | 33 | SMURF |
| | | 2888 | AFLA_066890 – AFLA_066940 | | | |
| A. flavus | Cluster 24 | 4508 | AFLA_069320 – AFLA_069340 | 3 | 10 | SMURF |
| A. flavus | Cluster 25 | 4219 | AFLA_070860 – AFLA_080890 | 4+4 | 26 | SMURF |
| | | 5148 | AFLA_070910 – AFLA_070950 | | | |
| A. flavus | Cluster 27 | 2012 | AFLA_082140 – AFLA_082160 | 3 | 14 | SMURF |
| A. flavus | Cluster 33 | 5797 | AFLA_101700 – AFLA_101770 | 8 | 6 | SMURF |
| A. flavus | Cluster 36 | 1026 | AFLA_105410 – AFLA_105450 | 5 | 5 | SMURF |
| A. flavus | Cluster 37 | 13236 | AFLA_108550 – AFLA_108580 | 4 | 18 | SMURF |
| A. flavus | Cluster 41 | 2503 | AFLA_116130 – AFLA_116150 | 3+3 | 26 | SMURF |
| | | 1331 | AFLA_116170 – AFLA_116190 | | | |
| A. flavus | Cluster 44 | 6277 | AFLA_118390 – AFLA_118410 | 3 | 11 | SMURF |
| A. flavus | Cluster 45 | 2494 | AFLA_118940 – AFLA_119000 | 7 | 19 | SMURF |
| A. flavus | Cluster 46 | 4420 | AFLA_119080 – AFLA_119120 | 5 | 6 | SMURF |
| A. flavus | Cluster 47 | 12300 | AFLA_121470 – AFLA_121540 | 8 | 8 | SMURF |
| A. flavus | Cluster 49 | 1429 | AFLA_128030 – AFLA_128110 | 9 | 13 | SMURF |
| A. flavus | Cluster 53 | 2813 | AFLA_137830 – AFLA_137860 | 4+3 | 15 | SMURF |
| | | 1844 | AFLA_137890 – AFLA_137910 | | | |
The detection threshold is >95th quantile (false positive rate 0.05).
The most induced combinations of culture conditions are listed in Appendix S2.
(a) Clusters with numbers are those predicted by SMURF. The list of the predicted gene clusters can be downloaded from http://jcvi.org/smurf/precomputed.php.
(b) Gene IDs are for the annotated genome sequences in GenBank (A. oryzae, F. verticillioides, and A. flavus) as described in Appendix S1.
(c) Two numbers are given when the predicted cluster is divided into two regions that together correspond to the listed cluster.
(d) Cluster size as experimentally validated or as predicted by SMURF (see the Source column for details).
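The 95th-quantile detection threshold noted under the table can be sketched as follows. This record does not say which score distribution the quantile is taken over, so applying it to one flat array of window scores is an assumption, as is the helper name `detect_above_quantile`.

```python
import numpy as np

def detect_above_quantile(scores, quantile=0.95):
    """Return the threshold and the indices of scores strictly above it.

    Assumption: the quantile is computed over all candidate scores at once.
    """
    scores = np.asarray(scores, dtype=float)
    threshold = np.quantile(scores, quantile)
    return threshold, np.flatnonzero(scores > threshold)

# With 100 evenly spaced scores, the top 5 exceed the 95th quantile.
threshold, hits = detect_above_quantile(np.arange(1, 101))
print(threshold, hits.tolist())
```

Under a null distribution, a cut at the 95th quantile lets through roughly 5% of scores, which is the sense in which the footnote equates it with a false positive rate of 0.05.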
Figure 4. SMB gene cluster detection by MIDDAS-M in A. flavus.
(A) A 3D view of the ωmax scores for all genes and combinations of culture conditions. Comprehensive detection of SMB gene clusters was performed on all 378 pairwise combinations of culture conditions from 28 transcriptomes. The gray and green areas denote blocks of synteny and non-synteny, respectively, with the A. nidulans genome. The positions of gene clusters possessing PKS and NRPS core genes predicted by SMURF are shown in orange and blue, respectively. The chemical structures of four A. flavus secondary metabolites are shown at the positions of corresponding SMB gene clusters; the ustiloxin B gene cluster was first identified in this paper. (B) Magnified view of the area on chromosome 2 corresponding to the black square in A. As an example, a yellow circle designates the peak observed specifically at particular positions, from which conditions for producing the corresponding compound were determined.
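The count of 378 condition pairs follows from choosing 2 of 28 transcriptomes. A short sketch of the enumeration, with placeholder condition labels that are hypothetical and not taken from the study:

```python
from itertools import combinations
from math import comb

def condition_pairs(conditions):
    """All unordered pairs of culture conditions to compare."""
    return list(combinations(conditions, 2))

# Hypothetical labels; the real condition names are in the paper's appendix.
conditions = [f"condition_{i:02d}" for i in range(28)]
pairs = condition_pairs(conditions)
print(len(pairs), comb(28, 2))
```

Each pair yields one induction-ratio profile across the genome, so every gene position carries one score per pair in the 3D view.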
Figure 5. Frequency of SMB-related genes in clusters detected by MIDDAS-M.
(A) Ratios of SMB-related genes (Q-genes) detected by KOG analysis among the cluster genes detected by MIDDAS-M (hatched bars) and among all the genes in the corresponding genome (gray bars). (B) The proportion of clusters containing genes annotated as P450 enzymes (pink), C6 transcription factors (blue), and major facilitator superfamily members (green) was calculated for the clusters detected at each ωmax threshold score in A. flavus. The values are plotted up to an ωmax of 18,350, at which 10 clusters remain detected.
Figure 6. Identification of the ustiloxin B cluster in A. flavus based on the MIDDAS-M prediction.
(A) MIDDAS-M results from a combination of culture conditions in maize at 28°C versus 37°C. The leftmost distinct peak corresponds to the aflatoxin gene cluster. The other two peaks were designated as clusters a and b. The step line plot in gray denotes the chromosomes. (B) Peaks at a retention time of 8.9 min detected in the extracted ion chromatograms of m/z 644.2±0.1 in negative ion mode were not observed in the A. flavus deletion mutants of the genes in cluster a (red). Chromatograms are for medium only (blue, negative control), the control strain (pyrG revertant, black), the aflatoxin cluster deletion mutant, and three mutants with deletions in cluster b (gray). (C) The mass spectra of the peaks at a retention time of 8.9 min in the control strain (above) and the deletion mutant ΔAF_a (below). The MS peak of m/z 644.2 in the control strain was not present in the deletion mutant. (D) Comparison of the mass spectra for ustiloxin B and the compound with m/z 644.2 (in negative ion mode) isolated from the control strain. (E) Comparison of the chromatograms of the ustiloxin B reference standard and the compound isolated in this study. The extracted ion chromatogram of m/z 644.23 in negative ion mode and UV chromatograms at 290, 254, and 220 nm are indicated.
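An extracted ion chromatogram such as the one in panel B keeps, for each scan, only the signal within a narrow m/z window. The sketch below assumes a hypothetical in-memory layout (a list of retention times paired with `(m/z, intensity)` peaks) and made-up intensities; it does not reflect the instrument software used in the study.

```python
def extracted_ion_chromatogram(scans, target_mz=644.2, tol=0.1):
    """Sum intensities within target_mz +/- tol for each scan.

    scans: list of (retention_time_min, [(mz, intensity), ...]).
    Assumption: this data layout is illustrative only.
    """
    return [
        (rt, sum(inten for mz, inten in peaks if abs(mz - target_mz) <= tol))
        for rt, peaks in scans
    ]

# Illustrative values only, not measurements from the paper.
scans = [
    (8.8, [(644.25, 10.0), (500.0, 99.0)]),
    (8.9, [(644.2, 120.0), (644.28, 30.0), (644.35, 7.0)]),
]
print(extracted_ion_chromatogram(scans))
```

Comparing such traces between the control strain and a deletion mutant is what shows the 8.9 min peak disappearing when cluster a is deleted.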