Mohd Zeeshan Ansari, Jyoti Sharma, Rajesh S Gokhale, Debasisa Mohanty.
Abstract
BACKGROUND: Secondary metabolites biosynthesized by the polyketide synthase (PKS) and nonribosomal peptide synthetase (NRPS) families of enzymes constitute several classes of therapeutically important natural products such as erythromycin, rapamycin and cyclosporine. In view of their relevance for natural-product-based drug discovery, identification of novel secondary metabolites by genome mining has been an area of active research. A number of different tailoring enzymes catalyze a variety of chemical modifications to the polyketide or nonribosomal peptide backbone of these secondary metabolites to enhance their structural diversity. Therefore, development of powerful bioinformatics methods for identifying these tailoring enzymes and assigning their substrate specificity is crucial for deciphering novel secondary metabolites by genome mining.
RESULTS: In this work, we have carried out a comprehensive bioinformatics analysis of methyltransferase (MT) domains present in multifunctional type I PKS and NRPS proteins encoded by PKS/NRPS gene clusters with known secondary metabolite products. Based on the results of this analysis, we have developed a novel knowledge-based computational approach for detecting MT domains present in PKS and NRPS megasynthases, delineating their correct boundaries and classifying them as N-MT, C-MT or O-MT using profile HMMs. Analysis of proteins in the nr database of NCBI using these class-specific profiles has revealed several interesting examples, namely C-MT domains in NRPS modules, N-MT domains with significant homology to C-MT proteins, and NRPS/PKS MTs in association with other catalytic domains. Our analysis of the chemical structures of the secondary metabolites and their sites of methylation suggested that a possible evolutionary basis for the presence of a novel class of N-MT domains with significant homology to C-MT proteins could be the close resemblance of the chemical structures of the acceptor substrates, as in the case of pyochelin and yersiniabactin. These two classes of MTs recognize similar acceptor substrates, but transfer methyl groups to N and C positions on these substrates.
CONCLUSION: We have developed a novel knowledge-based computational approach for identifying MT domains present in type I PKS and NRPS multifunctional enzymes and predicting their site of methylation. Analysis of the nr database using this approach has revealed the presence of several novel MT domains. Our analysis has also given interesting insight into the evolutionary basis of the novel substrate specificities of these MT proteins.
Year: 2008 PMID: 18950525 PMCID: PMC2613160 DOI: 10.1186/1471-2105-9-454
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. A schematic overview of the different bioinformatics analyses carried out in the current work on MT domains present in type I PKS and NRPS proteins.
List of ORFs containing C-, O- and N-Methyltransferase domains
| Name of gene cluster | ORF | Accession no. | CDD search | C-MT | O-MT | N-MT | Total |
|---|---|---|---|---|---|---|---|
| NRPS clusters | | | | | | | |
| Actinomycin | acmC | AAF42473 | PF08242 | - | - | 2 | 2 |
| Anabaenopeptilide | apdB | CAC01604 | PF08242 | - | - | 2 | |
| | apdE | CAC01607 | PF08241 | - | 1 | - | 3 |
| Complestatin | comC | AAK81826 | PF08242 | - | - | 1 | 1 |
| Cyclosporine | simA | CAA82227 | PF08242 | - | - | 7 | 7 |
| Enniatin | esyn1 | CAA79245 | PF08242 | - | - | 1 | 1 |
| Pristinamycin | snbDE | T30289 | PF08242 | - | - | 1 | 1 |
| Pyochelin | pchF | AAD55801 | PF08242 | - | - | 1 | 1 |
| Thaxtomin | txtA | AAG27087 | PF08242 | - | - | 1 | |
| | txtB | AAG27088 | COG2226 | - | - | 1 | 2 |
| PKS clusters | | | | | | | |
| Compactin | mlcA | BAC20564 | PF08242 | 1 | - | - | |
| | mlcB | BAC20566 | PF08242 | 1 | - | - | 2 |
| Erythromycin | eryG | CAA42929 | COG2226 | - | 1 | - | 1 |
| Equisetin | eqiS | AAV66106 | PF08242 | 1 | - | - | 1 |
| Fumonisin | fum1 | AAD43562 | PF08242 | 1 | - | - | 1 |
| Lovastatin | lovB | Q9Y8A5 | PF08242 | 1 | - | - | |
| | lovF | AAD34559 | PF08242 | 1 | - | - | 2 |
| Stigmatellin | stiD | CAD19088 | PF08242 | - | 1 | - | |
| | stiE | CAD19089 | PF08242 | - | 1 | - | |
| | stiK | CAD19094 | PF01209 | - | 1 | - | 3 |
| Hybrid NRPS-PKS | | | | | | | |
| Bleomycin | blmVIII | AAG02357 | PF08242 | 1 | - | - | 1 |
| Barbamide | barF | AAN32980 | PF08242 | - | 1 | - | |
| | barG | AAN32981 | PF08242 | - | - | 1 | 2 |
| Epothilone | epoD | AAF26922 | PF08242 | 1 | - | - | 1 |
| Jamaicamide A | jamJ | AAS98781 | PF08242 | 1 | - | - | |
| | jamN | AAS98785 | COG0500 | - | 1 | - | 2 |
| Leinamycin | lnmJ | AAN85523 | PF08242 | 1 | - | - | 1 |
| Melithiazol | melE | CAD89776 | PF08242 | - | 1 | - | |
| | melF | CAD89777 | PF08242 | - | 1 | - | 2 |
| Microcystin | mcyD | BAB12210 | PF08242 | 1 | - | - | |
| | mcyE | BAB12211 | - | 1 | - | - | |
| | mcyG | BAB12213 | PF08242 | 1 | - | - | |
| | mcyA | BAA83992 | PF08242 | - | - | 1 | 4 |
| Myxothiazol | mtaE | AAF19813 | PF08242 | - | 1 | - | |
| | mtaF | AAF19814 | PF08242 | - | 1 | - | 2 |
| Nodularin | ndaC | AAO64404 | - | 1 | - | - | |
| | ndaD | AAO64405 | PF08242 | 1 | - | - | |
| | ndaF | AAO64407 | - | 1 | - | - | |
| | ndaA | AAO64403 | PF08242 | - | - | 1 | |
| | ndaE | AAO64406 | PF08242 | - | 1 | - | 5 |
| Onnamide | onnB | AAV97870 | PF08242 | - | 1 | - | |
| | onnD | AAV97872 | PF08242 | - | 1 | - | |
| | onnG | AAV97875 | PF08242 | - | 1 | - | |
| | onnH | AAV97876 | COG2226 | - | 1 | - | |
| | onnI | AAV97877 | PF08242 | - | 1 | - | 5 |
| Pederin | pedF | AAS47564 | PF08242 | 1 | - | - | |
| | pedA | AAS47557 | PF08241 | - | 1 | - | |
| | pedE | AAS47560 | COG2226 | - | 1 | - | 3 |
| Tubulysin | tubF | CAF05651 | PF08242 | 1 | - | - | |
| | tubB | CAF05647 | PF08242 | - | - | 1 | |
| | tubC | CAF05648 | PF08242 | - | - | 1 | 3 |
| Yersiniabactin | HMWP-1 | AAC69588 | PF08242 | 2 | - | - | 2 |
| Total | | | | 20 | 19 | 22 | 61 |
List of ORFs containing methyltransferase domains in various experimentally characterized NRPS/PKS gene clusters. The table also lists the number and type of MT domains in each ORF and the results of the PFAM analysis using CDD search.
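The per-cluster totals in the table are simple sums over the three MT classes. A minimal sketch of that bookkeeping for the NRPS rows follows; the tuple layout and the `tally` helper are illustrative assumptions, and only the counts themselves come from the table.

```python
from collections import defaultdict

# (cluster, ORF, C-MT, O-MT, N-MT), copied from the NRPS rows of the table;
# a '-' in the table is written as 0 here.
NRPS_ROWS = [
    ("Actinomycin", "acmC", 0, 0, 2),
    ("Anabaenopeptilide", "apdB", 0, 0, 2),
    ("Anabaenopeptilide", "apdE", 0, 1, 0),
    ("Complestatin", "comC", 0, 0, 1),
    ("Cyclosporine", "simA", 0, 0, 7),
    ("Enniatin", "esyn1", 0, 0, 1),
    ("Pristinamycin", "snbDE", 0, 0, 1),
    ("Pyochelin", "pchF", 0, 0, 1),
    ("Thaxtomin", "txtA", 0, 0, 1),
    ("Thaxtomin", "txtB", 0, 0, 1),
]


def tally(rows):
    """Return (MT domains per cluster, MT domains per class) for the given rows."""
    per_cluster = defaultdict(int)
    per_class = {"C-MT": 0, "O-MT": 0, "N-MT": 0}
    for cluster, _orf, c_mt, o_mt, n_mt in rows:
        per_cluster[cluster] += c_mt + o_mt + n_mt
        per_class["C-MT"] += c_mt
        per_class["O-MT"] += o_mt
        per_class["N-MT"] += n_mt
    return dict(per_cluster), per_class


per_cluster, per_class = tally(NRPS_ROWS)
print(per_cluster)
print(per_class)
```

Extending `NRPS_ROWS` with the PKS and hybrid rows would allow the same function to be checked against the grand totals in the last row of the table.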
Figure 2. Chemical structures of representative secondary metabolites such as nodularin, leinamycin, pyochelin, yersiniabactin and stigmatellin, containing methyl groups (highlighted by arrows) added by C-MT, N-MT and O-MT enzymatic domains.
Figure 3. Alignment of the sequence stretch containing the A and N-MT domains from a C-A-MT-T NRPS module with the sequence of the A domain from a C-A-T module. A split alignment is obtained because the N-MT domain is integrated between the A-8 and A-9 signature motifs of the A domain.
Threading analysis of 18 representative MT containing sequences
| melit01_OM_001 | 609 | C | - | C {100} | C | - | C | - | - | - | - | C |
| anaba01_OM_001 | 263 | - | - | C {100} | C {100} | C | - | C | - | {100} | - | - |
| onnam04_OM_001 | 268 | - | - | C {100} | C | C | - | - | - | {100} | - | - |
| peder01_OM_001 | 312 | - | - | C {100} | C | - | - | - | - | {100} | - | - |
| stigm03_OM_001 | 256 | - | - | C {100} | C {100} | C | - | - | - | {100} | - | - |
| eryth01_OM_001 | 306 | - | - | C {100} | C | - | C | - | - | {100} | - | - |
| barba01_OM_001 | 422 | - | - | C {100} | C | C | C | - | - | {100} | - | - |
| onnam01_OM_001 | 484 | - | - | {100} | C {100} | - | - | C | - | {100} | {100} | - |
| bleom01_CM_001 | 640 | C | - | C {100} | C {100} | C | C | - | - | {100} | {100} | C |
| nodul02_CM_001 | 720 | - | C | C {100} | C {100} | - | C | C | - | {100} | - | C |
| compa01_CM_001 | 490 | - | - | C {100} | C {100} | - | C | C | - | {100} | - | - |
| leina01_CM_001 | 432 | - | C | C {100} | C {100} | C {100} | C | C | - | - | - | - |
| yersi01_CM_002 | 479 | - | C | C | C | - | C | C | - | - | {95} | - |
| actin01_NM_001 | 422 | - | - | {100} | {100} | C {100} | C | - | - | {100} | - | - |
| anaba01_NM_001 | 367 | - | - | - | - | - | C | - | H {100} | - | - | - |
| cyclo01_NM_001 | 431 | - | - | {100} | {100} | - | H | - | - | {100} | - | - |
| pyoch01_NM_001 | 390 | - | H | {100} | C {100} | - | C | C | - | {100} | {100} | - |
| thaxt02_NM_001 | 362 | - | - | {100} | C {100} | C | C | - | - | {100} | - | - |
Column 1 gives the unique name assigned to each MT-containing sequence stretch, while column 2 gives its length. Columns 3–13 list the PDB IDs of the structures which show alignment with these sequences in fold recognition analysis using the GenTHREADER and PHYRE servers. GenTHREADER results corresponding to confidence levels CERTAIN and HIGH are labeled C and H respectively. Similarly, PHYRE results corresponding to precision levels of 100% and 95% are labeled 100 and 95 respectively, in curly braces. (-) indicates the absence of that particular fold.
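The cell notation described in this legend can be decoded mechanically. The sketch below is one possible parser, assuming each cell holds an optional GenTHREADER letter (C or H) followed by an optional PHYRE precision in curly braces; the names `ThreadingHit` and `parse_cell` are hypothetical and not part of the original work.

```python
import re
from typing import NamedTuple, Optional


class ThreadingHit(NamedTuple):
    genthreader: Optional[str]  # "CERTAIN", "HIGH", or None when no hit is reported
    phyre: Optional[int]        # PHYRE precision in percent, or None when no hit is reported


_GENTHREADER_LABELS = {"C": "CERTAIN", "H": "HIGH"}
# Optional letter C/H, then optional "{number}", e.g. "C {100}", "{95}", "H".
_CELL_PATTERN = re.compile(r"^([CH])?\s*(?:\{(\d+)\})?$")


def parse_cell(cell: str) -> ThreadingHit:
    """Decode one table cell such as 'C {100}', '{95}', 'H' or '-'."""
    text = cell.strip()
    if text in ("", "-"):
        return ThreadingHit(None, None)
    match = _CELL_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized threading cell: {cell!r}")
    letter, precision = match.groups()
    return ThreadingHit(
        _GENTHREADER_LABELS[letter] if letter else None,
        int(precision) if precision else None,
    )


for example in ("C {100}", "{95}", "H", "-"):
    print(example, "->", parse_cell(example))
```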
Figure 4. Schematic representation of the results of threading analysis for a typical long C-MT-containing sequence stretch (from bleomycin ORF blmVIII). The central stretch aligns with various methyltransferase crystal structures such as 1VLM and 1QZZ. A 200 amino acid C-terminal stretch aligns with the structural half of the KR domain in the crystal structure 2FR0, while a 60 amino acid N-terminal stretch aligns with the terminal stretch of the KS-AT di-domain structure 2HG4. The query sequence containing the MT domain is represented as a black line, while rectangular colored boxes represent matches with various structural folds. The corresponding structures are shown in the same color.
Figure 5. Multiple sequence alignments of N-MT domains from experimentally characterized NRPS/PKS clusters with the structural template 1VLM.
Figure 6. Multiple sequence alignments of C-MT domains from experimentally characterized NRPS/PKS clusters with the structural template 1VLM.
Figure 7. Multiple sequence alignments of O-MT domains from experimentally characterized NRPS/PKS clusters with the structural template 1VLM.
Figure 8. Threading alignment of the C-MT-containing sequence stretch (ORF blmVIII) from the bleomycin gene cluster. (a) Alignment of the 60 amino acid N-terminal stretch with the structure of the KS-AT di-domain (2HG4) from erythromycin PKS. (b) Alignment of the 200 amino acid C-terminal region with the structure (2FR0) of the KR domain from erythromycin PKS.
Figure 9. Dendrogram of 60 MT domains from experimentally characterized NRPS, PKS and hybrid NRPS/PKS biosynthetic clusters. The C-MT, O-MT and N-MT domains are colored pink, yellow and green respectively. The 18 representative MT sequences used as templates for detecting MT domains in a query are marked by "*". Two MT domains from onnamide-A which are annotated as O-MTs but cluster with C-MTs are marked with a hash (#) symbol.
Scores and E-values for the alignment of 18 representative MT domains with the HMM profiles of N-MT, O-MT and C-MT.
| MT domain | N-MT score | N-MT E-value | O-MT score | O-MT E-value | C-MT score | C-MT E-value |
|---|---|---|---|---|---|---|
| actin01_NM_001 | 390.1 | 1.10E-117 | - | - | - | - |
| anaba01_NM_001 | 252.8 | 2.40E-076 | - | - | - | - |
| anaba01_OM_001 | - | - | 375.5 | 2.80E-113 | - | - |
| barba01_OM_001 | - | - | 385.5 | 2.80E-116 | - | - |
| bleom01_CM_001 | - | - | 385.5 | 2.80E-116 | | |
| compa01_CM_001 | - | - | - | - | 332.7 | 2.10E-100 |
| cyclo01_NM_001 | 449.7 | 1.30E-135 | - | - | - | - |
| eryth01_OM_001 | - | - | 369.6 | 1.60E-111 | - | - |
| leina01_CM_001 | - | - | 395.5 | 2.60E-119 | | |
| melit01_OM_001 | - | - | 178.7 | 4.80E-054 | - | - |
| nodul02_CM_001 | - | - | 270.3 | 1.30E-081 | | |
| onnam01_OM_001 | - | - | 289.7 | 1.80E-087 | | |
| onnam04_OM_001 | - | - | 378.1 | 4.60E-114 | - | - |
| peder01_OM_001 | - | - | 234.7 | 6.90E-071 | - | - |
| pyoch01_NM_001 | 187.4 | 1.10E-056 | - | | | |
| stigm03_OM_001 | - | - | 210.1 | 1.70E-063 | - | - |
| thaxt02_NM_001 | 318.9 | 3.10E-096 | - | - | - | - |
| yersi01_CM_002 | - | - | - | - | 321.4 | 5.20E-097 |
A '-' sign indicates that the alignment resulted in a score with an E-value higher than 1.0E-6. Several C-MTs also align with the O-MT profile, while only the N-MT of pyochelin synthase also aligns with the C-MT profile.
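The note above implies a simple decision rule: a profile hit counts only if its E-value does not exceed 1.0E-6. The sketch below illustrates one way to turn such hits into a class call. Choosing the highest-scoring significant profile as the winner is an assumption made here for illustration, not a rule stated in the source. The column order (N-MT, O-MT, C-MT) is likewise inferred from the table title, and the `classify` helper and the `HYPOTHETICAL_DUAL` values are invented.

```python
from typing import Dict, List, Optional, Tuple

E_VALUE_CUTOFF = 1.0e-6  # from the table note: '-' means the E-value is above this

Hit = Optional[Tuple[float, float]]  # (score, E-value), or None for a '-' cell

# Rows copied from the table (only rows in which every cell is present).
TABLE_ROWS: Dict[str, Dict[str, Hit]] = {
    "actin01_NM_001": {"N-MT": (390.1, 1.10e-117), "O-MT": None, "C-MT": None},
    "eryth01_OM_001": {"N-MT": None, "O-MT": (369.6, 1.60e-111), "C-MT": None},
    "compa01_CM_001": {"N-MT": None, "O-MT": None, "C-MT": (332.7, 2.10e-100)},
}

# Hypothetical domain with two significant hits; these numbers are invented
# purely to illustrate the tie-breaking assumption and are not from the paper.
HYPOTHETICAL_DUAL: Dict[str, Hit] = {
    "N-MT": None,
    "O-MT": (300.0, 1.0e-90),
    "C-MT": (320.0, 1.0e-96),
}


def classify(hits: Dict[str, Hit]) -> Tuple[Optional[str], List[str]]:
    """Return (best class, all significant classes) for one MT domain.

    Assumption: among profiles passing the E-value cutoff, the one with the
    highest score is taken as the predicted class.
    """
    significant = {
        name: hit
        for name, hit in hits.items()
        if hit is not None and hit[1] <= E_VALUE_CUTOFF
    }
    if not significant:
        return None, []
    best = max(significant, key=lambda name: significant[name][0])
    return best, sorted(significant)


for domain, hits in TABLE_ROWS.items():
    print(domain, classify(hits))
print("hypothetical", classify(HYPOTHETICAL_DUAL))
```

Returning the full list of significant classes alongside the best one keeps cross-class matches visible, which matters for cases like the pyochelin N-MT mentioned in the note.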
Figure 10. Histograms showing the number of proteins in the nr database having N-MT, O-MT and C-MT domains as identified by our HMM profile search. (a) PKS proteins, (b) NRPS proteins, (c) hybrid NRPS/PKS proteins and (d) proteins other than NRPS/PKS proteins.
List of protein sequences from nr database which contain C-MT domains adjacent to the core NRPS domains
| Gene identifier | Domain organization | Identity (%) | Closest representative MT domain | Organism |
|---|---|---|---|---|
| 108809363 | ACP-C-A- | 25 | yersi01_CM_002 | |
| 108809365 | C-A- | 33 | nodul02_CM_001 | |
| 108813376 | C-A- | 33 | nodul02_CM_001 | |
| 116215693 | | 47 | yersi01_CM_002 | |
| 148271509 | C-A- | 33 | nodul02_CM_001 | |
| 153947007 | C-A- | 33 | nodul02_CM_001 | |
| 153954130 | ACP-C-A- | 29 | nodul02_CM_001 | |
| 16121089 | C-A- | 33 | nodul02_CM_001 | |
| 16121091 | ACP-C-A- | 25 | yersi01_CM_002 | |
| 17546530 | C-A- | 33 | leina01_CM_001 | |
| 21225943 | C-A- | 33 | nodul02_CM_001 | |
| 22127285 | ACP-C-A- | 25 | yersi01_CM_002 | |
| 26248281 | ACP-C- | 100 | yersi01_CM_002 | |
| 28869792 | ACP-C-A- | 27 | leina01_CM_001 | |
| 41409840 | C-A- | 34 | yersi01_CM_002 | |
| 45443170 | C-A | 33 | nodul02_CM_001 | |
| 51597596 | C-A- | 33 | nodul02_CM_001 | |
| 89102588 | ACP-C-A- | 25 | yersi01_CM_002 | |
| 89895567 | C-A- | 37 | yersi01_CM_002 | |
| 90424526 | C-A- | 38 | yersi01_CM_002 | |
List of protein sequences from the nr database which contain C-MT domains adjacent to the core NRPS domains, as identified by our MT HMM profile search. Gene identifier number, domain organization and organism name are listed in columns 1, 2 and 5 respectively. Column 4 lists the closest match (as identified by pairwise BLAST) among the representative MT domains from experimentally characterized NRPS/PKS clusters, and column 3 lists the corresponding percentage identity.
List of representative protein sequences from nr database which contain MT domains in combination with other functional domains
| Gene identifier | Domain organization | No. of proteins | Organism |
|---|---|---|---|
| 113477217 | Glycos_transf_1- | 19 | |
| 110599935 | TPR_1-TPR_2-TPR_1-TPR_2-TPR_2-TPR_1-TPR_2- | 18 | |
| 154319269 | KS-KS-AT | 7 | |
| 126178605 | | 4 | |
| 110634799 | Abhydrolase_1- | 3 | |
| 31794901 | Radical_SAM- | 3 | |
| 30681189 | DUF248- | 2 | |
| 62290714 | HTH_3- | 2 | |
| 71013608 | Amidohydro_3- | 2 | |
| 107099798 | FA_hydroxylase- | 2 | |
| 147791135 | Amino_oxidase- | 2 | |
| 156064387 | E1-E2_ATPase- | 2 | |
| 157382467 | A-ACP-C-A- | 1 | |
| 157752876 | | 1 | |
| 153813751 | AstE_AspA- | 1 | |
| 71030506 | | 1 | |
| 66825109 | KS-KS-AT- | 1 | |
| 145608084 | HET- | 1 | |
| 147858936 | DUF642- | 1 | |
| 158520366 | DUF1365- | 1 | |
| 115402313 | | 1 | |
| 110681402 | C-A- | 1 | |
| 17546523 | | 1 | |
| 51892502 | Phosphodiest- | 1 | |
| 116207616 | MFS_1-KS-KS-AT- | 1 | |
| 41407518 | C-A-Strep_SA_rep-ACP-C-A- | 2 | |
| 26541536 | ACP-KR-DH-KS-KS-ACP-ACP-KS-KS-KR-DH-ACP | 1 | |
| 108757966 | A-ACP-C-A- | 1 | |
List of representative protein sequences from nr database which contain C-MT, N-MT or O-MT domains in combination with functional domains other than core PKS or NRPS domains. Gene identifier number, domain organization and organism names are listed in columns 1, 2 and 4 respectively, while column 3 lists total number of proteins having similar domain organization.