Generate a bioactive natural product library by mining bacterial cytochrome P450 patterns.

Abstract

The increasing number of annotated bacterial genomes provides a vast resource for genome mining. Several bacterial natural products with epoxide groups have been identified as pre-mRNA spliceosome inhibitors and antitumor compounds through genome mining. These epoxide-containing natural products share a common biosynthetic characteristic: cytochrome P450s (CYPs) and their patterns, such as epoxidases, are employed in the tailoring reactions. The tailoring enzyme patterns are essential to both the biological activities and the structural diversity of natural products, and can be used for enzyme pattern-based genome mining. Recent developments in direct cloning, heterologous expression, manipulation of biosynthetic pathways and the CRISPR-Cas9 system have provided molecular biology tools to turn on or pull out nascent biosynthetic gene clusters to generate a microbial natural product library. This review focuses on a library of epoxide-containing natural products and their associated CYPs, with the intention of providing strategies for diversifying the structures of CYP-catalyzed bioactive natural products. It is conceivable that a library of diversified bioactive natural products will be created by pattern-based genome mining, direct cloning and heterologous expression, as well as genomic manipulation.

Entities: Chemical

Keywords: Genome mining; Microbial P450; Natural product library; Synthetic biology

Year: 2016 PMID: 29062932 PMCID: PMC5640691 DOI: 10.1016/j.synbio.2016.01.007

Source DB: PubMed Journal: Synth Syst Biotechnol ISSN: 2405-805X

“Discovery consists of seeing what everybody else has seen, and thinking what nobody else has thought.”

Introduction

The concept of the natural product library originated from advocacy for the Biological Resource Center (BRC) by the Organization for Economic Cooperation and Development (OECD) in 1999. During the past 15 years, the production of libraries of microbial natural products has been driven mainly by the development of high-throughput screening methods and the increasing number of reported marine natural products.2, 3, 4 Broadly speaking, research on natural product libraries has included creating a genomic DNA prioritized strain library, a metagenome library,6, 7 a combinatorial library, a crude extract library,9, 10, 11 a fraction library12, 13, 14 and a pure compound library. Additionally, bioactive compound libraries are commercially available from Selleckchem, OTAVA and Cromadex. Strategies for developing a high-quality microbial natural product library include diversification and de-replication of strains and natural product structures, and high-throughput screening methods.2, 19, 20, 21, 22, 23 Even so, the yields from crude extract libraries and fraction libraries have fallen far short of expectations in drug discovery, which has diminished the interest of big pharma in natural product libraries. Nature is always full of diversity and natural product biosynthesis is no exception.
The “co-linearity” rule, and the diversity and variations in nonribosomal peptide synthetases (NRPSs), polyketide synthases (PKSs), hybrid NRPS/PKS systems and beyond, have been extensively explored.25, 26, 27, 28, 29, 30, 31 During the last two decades, hundreds of natural product biosynthetic pathways and thousands of natural scaffolds such as nonribosomal peptides, polyketides, terpenoids and oligosaccharides have been characterized.32, 33 Besides the enzymes that catalyze formation of chemical scaffolds, there is a group of tailoring enzymes that increase functional group density, acting sequentially to convert the building blocks into complex molecular structures. The tailoring enzymes can be classified into nonoxidative and oxidative enzymes. Nonoxidative enzymes include acyltransferases, methyltransferases and glycosyltransferases, while oxidative enzymes encompass those that introduce water-solubilizing oxygen functionalities during maturation and are probably the most consequential for introducing both scaffold and functional group complexity. When considering how to diversify nature's small molecule inventory, the oxygenases have proven the most valuable to characterize, with the twin goals of predicting new natural product scaffolds and combinatorial engineering of intermediates.33, 34 The most prominent of the oxidative enzymes are the heme iron cytochrome P450 (CYP) monooxygenases found in microbes such as Streptomyces species, Bacillus species and Cyanobacteria.
Traditionally, natural product genome mining means “use analysis of DNA sequence data to predict structural elements of new natural products and then use this information to design strategies for rapidly identifying, purifying, and structurally characterizing the compounds”.39, 40 However, the peculiarities of biosynthetic pathways indicate that textbook “co-linearity” rules cannot be applied to deduce structures from all DNA data.41, 42 Thanks to the development of gene cluster prediction software such as antiSMASH, genome mining has become a quick and inexpensive way to analyze the biosynthetic potential of sequenced microbes. Current genome mining approaches include sequence-based genome mining,44, 45 bioactivity-guided genome mining, enzyme-based genome mining, pattern-based genome mining48, 49 and genome neighborhood network analysis.50, 51 Sequence-based genome mining is designed to detect and extract condensation (C) and ketosynthase (KS) domains from DNA or amino acid sequence data. High degrees of sequence similarity (E-value <10e−40 for eSNAPD or 85–90% protein sequence identity for NaPDoS) suggest that the identified biosynthetic gene clusters are responsible for the biosynthesis of natural products similar to those of the reference gene clusters.6, 7, 44, 52 In bioactivity-guided genome mining, the conserved structural motif of bioactive natural products can be used as a reference to perform genome mining. For example, the enediyne PKS (PKSE) is proposed to be involved in the formation of the highly reactive chromophore ring structure (or “warhead”) found in all enediynes. Through PKSE-based genome mining, enediyne biosynthetic gene clusters have been identified in sequenced actinomycete genomes.
FK228 is an antitumor drug whose tumor cytotoxicity results from a functional disulfide group, the formation of which is catalyzed by an FAD-dependent disulfide oxidoreductase.54, 55 BLAST analysis of the FAD-dependent disulfide oxidoreductase led to the discovery of a homologous gene cluster, that of the thailandepsins, in Burkholderia thailandensis E264. Enzyme-based genome mining searches conserved synthase domains against the NCBI database of sequenced bacterial genomes in order to obtain the presumptive enzyme sequences. “Pattern-based genome mining” refers to connecting MS/MS fragmentation patterns to biosynthetic pathways for genome mining and de-replication of certain bacterial species. For example, the MS/MS fragmentation pattern of 827.492 associated with arenicolide A production was used to identify the uncharacterized gene cluster in S. pacifica strains CNQ-748 and CNT-138. “Genome neighborhood network (GNN) analysis” is a bioinformatics strategy to predict enzymatic functions on a large scale based on genomic context. In one case, PepM and phosphonate GNN bioinformatics were applied to 278 sequenced bacterial genomes and led to the discovery of 19 new phosphonate natural products. Enediyne GNNs generated for the virtual screening of sequenced bacterial genomes revealed 87 potential enediyne gene clusters from 78 different bacterial strains. Pattern-based genome mining and genome neighborhood analysis together provide a comprehensive method to identify the biosynthetic gene clusters of sequenced bacterial strains. With the Nobel Prize in Physiology or Medicine awarded for the natural products avermectin and artemisinin, the discovery of natural products has entered a golden age. Our long-term goal is to create a high-quality, diversified natural product library.
In the postgenomic era, the challenge in generating a microbial natural product library has shifted from traditional de-replication strategies to the question of how to translate annotated biosynthetic gene clusters of interest into a bioactive natural product library. This review focuses on genome mining of CYPs involved in the biosynthetic gene clusters of bacterial secondary metabolites, especially those with epoxide functional groups, with the intention of sharing considerations for building a diversified natural product library through CYP pattern-based genome mining, direct cloning and heterologous expression, and genome manipulation.

Genome mining of CYP-catalyzed bacterial natural products

This section will cover the introduction of CYP, two genome mining methods, and application of genome mining in two genera.

The importance of microbial CYPs

Microbial natural products produced by CYP-containing biosynthetic pathways have diverse biological activities, including antitumor, antibacterial, antifungal, anti-HIV, anti-parasitic and anti-cholesterol activities. Natural products such as pladienolides/FD-895,59, 60 GEX1/herboxidiene, FR901464 (FR)/spliceostatins/thailanstatins (Fig. 1A) are known to exert antitumor activity by targeting the pre-mRNA spliceosome. Specifically, pladienolide B and spliceostatin A have been reported to exert antiproliferative activity against tumor cells in the nM range by targeting splicing factor 3B subunit 1 (SF3B1) with the help of their three-membered epoxide groups. Furthermore, CYP-catalyzed natural products with antitumor activities may act through different mechanisms. For instance, epothilone A binds tubulin, griseorhodin A is a telomerase inhibitor, tirandamycin is an RNA polymerase inhibitor and epoxomicin is a proteasome inhibitor (Fig. 1B).

Fig. 1

P450-related microbial natural compounds. Except for staurosporine, all the selected compounds contain epoxides. Other epoxide compounds without known biosynthetic pathways, such as the antitumor compound trapoxin, are not listed. Panel A: Natural products with known spliceosome inhibitory activities; Panel B: Natural products with other activities, such as the tubulin-binding compound epothilone A, the telomerase inhibitor griseorhodin A, the RNA polymerase inhibitor tirandamycin and the proteasome inhibitor epoxomicin.

Microbial CYPs are among the most widely distributed groups of tailoring enzymes that catalyze the formation of the final bioactive natural products.31, 35, 37, 67, 68, 69 CYPs catalyze hydroxylation and/or epoxidation reactions in the late stages of biosynthesis, after macrolide formation by PKSs.70, 71, 72 Reactions catalyzed by CYP monooxygenases include hydroxylation of saturated C-H bonds, epoxidation of C=C double bonds, and oxidative decarboxylation (Table 1). For example, the CYP PldB hydroxylates 6-deoxy pladienolide B to pladienolide B in the pladienolide biosynthetic pathway. The CYP HerG catalyzes the stereospecific hydroxylation at C-18 of herboxidiene. In FR biosynthesis, the CYP Fr9R is not only involved in the hydroxylation at C-4 of FR901464, but is also related to the formation of the hemiketal group at C-1. CYPs in the biosynthetic pathways of natural products often work together with their patterns, such as transferases and epoxidases, to catalyze the production of epoxide groups. Moreover, dual-function CYPs have been reported to catalyze sequential epoxidation and hydroxylation of the same substrate.65, 88, 107

Table 1

CYPs involved in the biosynthesis of microbial natural products.

P450 enzyme	No. of amino acids (AA)	Accession no.	Match in the databaseᵃ: CYP name as annotated	Match: protein identifier	Match: identity (%)	Match: AA overlap	Reference
Hydroxylation of saturated C—H bonds
AmphL	396	AAK73504	CYP107E	AAK73504	100%	396	⁷³
AziB1	401	B4XY99.1	CYP107-like	WP_018771011.1	34%	388	⁷⁴
ChmPI	407	AAS79447	CYP107B	AAS79447	100%	407	⁷⁵
EpnK	401	AHB38512	CYP162-like	WP_030872520.1	40%	399	⁶⁶
EryF	404	Q00441.2	CYP107A	WP_009950397.1	100%	404	⁷⁶
EryK	397	CYP113A1	CYP113A-like	WP_009950895.1	100%	397	⁷⁷
HerG	422	AEZ64507	CYP107B-like	WP_016577953.1	48%	404	⁶¹
MeiE	459	AAM97314	CYP171A	AAM97314	100%	459	⁷⁸
MycCI	383	BAC57023	CYP105U	BAC57023	100%	383	⁷⁹
NcsB3	410	AAM77997	CYP154J	AAM77997	100%	410	⁸⁰
NysL	394	AAF71769	CYP107E	AAF71769	100%	394	⁸¹
OxyD	396	CCD33151	CYP146A	CCD33151	100%	396	⁸²
PikC	416	O87605	CYP107L	AAC64105	100%	416	⁸³
PldB	399	BAH02272	CYP107B-like	BAH02272	100%	399	⁵⁹
PteC	399	BAC68123	CYP105P1	WP_010981850.1	100%	399	⁸⁴
TylHI	436	AAD41818	CYP105U	AAD41818	100%	436	⁸⁵
TylI	417	AAA21341	CYP107-like	AAA21341	100%	417	⁸⁶
ZbmVIIc	416	ACG60779	CYP185-like	WP_030613068.1	51%	430	⁷⁸
TxtC	395	AAL36838.1	CYP105A	WP_009073396.1	41%	406	⁸⁷
Dual function, hydroxylation and epoxidation
GsfF	414	BAJ16472	CYP105A	BAJ16472	100%	414	⁸⁸
MycG	397	BAA03672	CYP107B	BAA03672	100%	397	⁸⁹
Fma-P450	400	AHL19974	–ᵇ				⁹⁰
Fr9R	482	AIC32704	CYP136-like	WP_022984814.1	32%	455	⁶²
PenD	298	ADO85592	–				⁹¹
PntD	299	ADO85576	–				⁹²
TamI	413	ADC79647.1	CYP107B-like	ADC79647.1	100%	413	⁶⁵
TstR	482	AGN11891	CYP136-like	BAP15297	33%	455	⁹³
Catalyze epoxidation
Asm30	1005	AAM54108	CYP102F	AAM54108	100%	1005	⁹⁴
ChmPII	401	AAS79446	CYP107B	AAS79446	100%	401	⁷⁵
EpnI	431	AHB38510	CYP107B-like	WP_030776170.1	48%	412	⁶⁶
EpoK	419	Q9KIZ4	CYP167A	Q9KIZ4	100%	419	⁶³
EpxC	425	AHB38496	CYP107B-like	WP_020576451.1	42%	406	⁶⁶
GrhO3	416	AAM33670	CYP105-like	AAM33670	100%	416	⁶⁴
HedR	409	AAP85338	CYP105-like	AAP85338	100%	409	⁹⁵
OleP	407	AAA92553	CYP107B	AAA92553	100%	407	⁹⁶
PimD	397	CAC20932	CYP107E	CAC20932	100%	397	⁹⁷
PimG	398	CAC20928	CYP105H	CAC20928	100%	398	⁹⁷
TamI	413	ADC79647	CYP107B-like	ADC79647	100%	413	⁹⁸
Oxidation of methyl group
AmphN	399	AAK73509	CYP105H	AAK73509	100%	399	⁹⁹
NysN	398	AAF71771	CYP105H	AAF71771	100%	398	⁸¹
C-C coupling
DynOrf19	403	ACB47070	CYP107M	WP_015621195.1	51%	398	¹⁰⁰
HmtS	397	CBZ42153	CYP113A	WP_030360446.1	52%	398	¹⁰¹
OxyB	398	AAL90878	CYP165B	Q8RN04	100%	398	¹⁰²
OxyC	406	AAL90879	CYP165B	Q8RN03	100%	406	¹⁰³
C-N-C coupling
DynE10	400	ACB47071	CYP107B	WP_029899036.1	53%	400	¹⁰⁰
SpcN	390	AGL96571	CYP244A	AGL96571	100%	390	¹⁰⁴
Catalyzes the nitration using NO and O₂
TxtE	406	ELP66108.1	CYPP450TXTE	ELP66108.1	100%	406	⁸⁷
Oxidative decarboxylation
HmtN	419	4E2P_A	CYP113A	4E2P_A	100%	419	¹⁰¹
HmtT	418	4GGV_A	CYP113A	4GGV_A	100%	418	¹⁰¹
Mei-Orf4	398	ADC45514	CYP107-like	ADC45514	100%	398	⁷⁸
StaP	426	ABI94389	CYP245A	ABI94389	100%	426	¹⁰⁵
SpcP	427	AGL96575	CYP245A	AGL96575	100%	427	¹⁰⁴

ᵃ CYP names as annotated at website: https://cyped.biocatnet.de/.

ᵇ Not found in database.

Bioinformatics of microbial CYPs

Information on three-dimensional structures is essential for understanding the molecular basis of substrate recognition and specificity of CYPs. Three elements are generally considered essential in all CYP sequences: (a) the conserved cysteine, which is the fifth ligand to the heme Fe atom and lies within the signature FXXGXXXCXG, (b) the EXXR motif forming a charge pair in the K helix (possibly involved in heme binding) and (c) the overall CYP fold topology, including stability. However, bacterial CYPs show some unusual exceptions, such as variations in conserved motifs, heme incorporation and topology, substrate binding and functionalities. For example, the amino acid sequence of CYP157C1 contains EQSLW in place of the conserved EXXR. The conserved threonine is not present in CYP107A1; instead, a hydroxyl group of the substrate 6-deoxyerythronolide B can directly donate a hydrogen bond to the Fe-linked dioxygen for proton transfer. Unlike CYP107A1, CYP158A2 binds two molecules of flaviolin in its active site; the 2-OH group of flaviolin anchors the substrate in the active site, while the 5-OH and 7-OH stabilize water molecules that are important for catalysis. To discover bacterial CYPs in sequenced genomes, genes encoding the CYP heme-binding domain (FXXGXXXCXG) can be screened for the presence of the highly conserved threonine in the putative I-helix, which is proposed to be involved in oxygen activation in most CYPs, and of the conserved EXXR motif located in the K-helix. The sequences of polypeptides containing all three motifs can then be used as queries for BLAST searches of the GenBank non-redundant protein database (www.ncbi.nlm.nih.gov/BLAST/) to identify their closest homologues in other organisms and tentatively assign the CYP proteins to subfamilies. Briefly, >40% amino acid sequence identity places a CYP in the same family and >55% places it in the same subfamily.
Besides GenBank, the CYPED database retrieves CYP sequences from GenBank by BLAST searching and compiles them (https://cyped.biocatnet.de/).112, 113, 114, 115 This database contains most of the bacterial CYPs (Table 1) that are involved in the biosynthesis of bioactive natural products (Fig. 1). Another online CYP database (http://drnelson.uthsc.edu/CytochromeP450.html) contains 1042 bacterial CYP genes.
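The sequence-motif part of the screen described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy peptide are hypothetical, the motifs are treated as plain regular expressions with X as any residue, and the I-helix threonine and the overall fold are not checked at all.

```python
import re

# X in the published motifs means "any residue", written here as ".".
# Lookaheads let overlapping occurrences be reported as well.
HEME_BINDING = re.compile(r"(?=F..G...C.G)")  # FXXGXXXCXG
EXXR = re.compile(r"(?=E..R)")                # EXXR (K-helix)


def scan_cyp_motifs(protein):
    """Return 0-based start positions of each motif in a protein sequence."""
    seq = protein.upper()
    return {
        "heme_binding": [m.start() for m in HEME_BINDING.finditer(seq)],
        "exxr": [m.start() for m in EXXR.finditer(seq)],
    }


def has_core_motifs(protein):
    """True when both sequence motifs occur at least once."""
    hits = scan_cyp_motifs(protein)
    return bool(hits["heme_binding"]) and bool(hits["exxr"])


# Hypothetical toy peptide, not a real CYP sequence.
toy = "MSTAEELRAAAFGHGIHYCLGAA"
print(scan_cyp_motifs(toy))
print(has_core_motifs(toy))
```

A strict screen of this kind would miss the unusual cases mentioned above, such as CYP157C1 with EQSLW in place of EXXR, so sequences that fail it are better flagged for inspection than discarded.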

The logic for CYP pattern-based genome mining

There is evidence that CYPs can be used in genome mining for pre-mRNA spliceosome inhibitors. First, all current pre-mRNA spliceosome inhibitors are epoxide-containing natural products (Fig. 1A) whose biosynthetic gene clusters contain genes encoding putative CYP oxygenases or epoxidases. The presence of the epoxide group has been reported to be important in conferring activity on FR analogues.116, 117 Second, although CYPs may not be directly involved in the formation of some epoxide groups,70, 118 they are present in the relevant biosynthetic gene clusters (Fig. 2). For example, in a recent metagenome mining report, all six epoxyketone gene clusters contained CYPs, indicating that CYP-containing gene clusters may be a rich source of epoxide compounds. Third, even CYP-catalyzed compounds whose antitumor activities are attributed to other mechanisms can inhibit splicing. Small molecule screenings identified oxaspiro compounds (analogues of the farnesyltransferase inhibitor manumycin A) as pre-mRNA splicing inhibitors. In addition, a number of compounds, such as the CYP-related non-epoxide compound staurosporine (a kinase inhibitor), are novel inhibitors of spliceosome assembly. Fourth, homologous CYPs can be identified through genome mining at 50–60% similarity, in contrast to the 80–90% rule used for PKS and nonribosomal peptide synthetase (NRPS) domain searching. For example, BLAST analysis demonstrated 50% identity between the CYPs PldB and HerG from the pladienolide and herboxidiene pathways, respectively (both products are pre-mRNA spliceosome inhibitors). In the biosynthesis of FD-891, the CYP GfsF showed 57% identity to the CYP monooxygenase Mflv2418. Finally, the CYP gene can be used as a signature gene for gene cluster cloning and gene identification. However, the CYP pattern should be considered because CYPs and other tailoring enzymes such as transferases and epoxidases act together on the products released from PKS or NRPS megasynthases68, 122, 123 (Fig. 3).
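To make the identity figures in the fourth point concrete, the following Python sketch computes percent identity from two sequences that have already been aligned, and compares the result against class-specific cutoffs. Several things here are assumptions rather than anything the review specifies: the function names, the toy strings, the choice of denominator (all aligned columns except those where both sequences have a gap), and the use of the lower bounds of the quoted ranges (50% for CYPs, 80% for PKS/NRPS domains) as cutoffs. BLAST itself reports identity over its own local alignment.

```python
# Lower bounds of the ranges quoted in the text, used as cutoffs (assumption).
MIN_IDENTITY = {"cyp": 50.0, "pks_nrps": 80.0}


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored; every other
    column counts toward the denominator (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def passes_cutoff(identity, enzyme_class):
    """True when an identity value meets the cutoff for its enzyme class."""
    return identity >= MIN_IDENTITY[enzyme_class]


# Hypothetical toy alignment, not real PldB/HerG sequence data.
print(percent_identity("ACDE", "ACDF"))
# The 50% PldB/HerG identity from the text under each cutoff.
print(passes_cutoff(50.0, "cyp"), passes_cutoff(50.0, "pks_nrps"))
```

Under these assumed cutoffs, the PldB/HerG pair qualifies as a CYP homologue pair but would be rejected by the stricter PKS/NRPS rule, which is the contrast the paragraph draws.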

Fig. 2

Neighbor-joining tree of selected P450 enzymes including 13 putative P450 enzymes from Kutzneria species. CYPs catalyze epoxide formation (●), hydroxylations (■), oxidation of methyl groups (♦), decarboxylations (▲) and C–C or C–N–C formations (▼). The tree was generated with MEGA6 using the neighbor-joining method. Significant bootstrap values are indicated at the nodes. The scale bar represents 0.1 mutational events per site. The PldB-, Fr9R-, GfsF- and StaP-associated gene clusters produce the pre-mRNA spliceosome inhibitors pladienolide, FR901464, FD-891 and staurosporine, respectively.

Fig. 3

Pattern-based genome mining of homologous gene clusters of spliceosome inhibitors which contain CYPs. The biosynthetic gene clusters of thailanstatins (Accession no. KJ461964.1) and FR901464 (Accession no. JX307851.1), pladienolide (Accession no. AB435553.1) and herboxidiene (Accession no. JN671974.1) as well as staurosporine (Accession no. AB088119.1) were used for the search of homologous gene clusters by using antiSMASH 3.0.

Discovery of thailanstatins from Burkholderia sp.

Recently, the genus Burkholderia has attracted the attention of several research groups, which have employed different genome mining strategies to study its bioactive natural products. Among those products, FR is a general spliceosome inhibitor discovered in 1992 from Pseudomonas sp. No. 2663,125, 126, 127 now identified as Burkholderia sp. FERM BP3421. Three oxygenase activities are encoded in the FR gene cluster: (a) the flavin-dependent monooxygenase (FMO) domain in the last module of fr9GH, (b) the CYP encoded by fr9R, and (c) the Fe(II)/α-ketoglutarate-dependent dioxygenase encoded by fr9P. The hemiketal FR is mainly biosynthesized through epoxidation at C-3 by the FMO domain of Fr9GH, hydroxylation at C-4 by Fr9R, and hydroxylation at C-1 by Fr9P followed by decarboxylation. Two enzymes served as starting points for genome mining. First, BLAST analysis shows that Fr9R has a high identity (82%) to TstR in the sequenced genome of B. thailandensis MSMB43 (currently being described as “Burkholderia humptydooensis”).93, 128, 129 Second, Fr9K is a key 3-hydroxy-3-methylglutaryl-CoA synthase (HCS) homologue that catalyzes the transfer of -CH2COO- from acyl-S-acyl carrier protein (ACP) to a β-ketothioester polyketide intermediate. Fr9K has a high identity (87%) to one of the enzymes (TstK) predicted from the genome of B. thailandensis MSMB43. Further, regulator gene expression studies to optimize the production media, together with targeted isolation and purification efforts toward diene compounds (UV 235 nm), led to the discovery of three compounds called thailanstatins and a group of similar or identical compounds called spliceostatins130, 131 (Fig. 4). In particular, thailanstatin A contains a carboxyl group, which not only makes it more stable than FR in PBS solution, but also leads to increased activity compared to FR when the carboxylic group is esterified (with enhanced membrane permeability).

Fig. 4

Genome mining protocol of Burkholderia thailandensis MSMB43 for thailanstatins/spliceostatins. Starting from the two key biosynthetic enzymes, the CYP Fr9R and the 3-hydroxy-3-methylglutaryl-CoA synthase (HCS) Fr9K, which catalyze the formation of the hemiketal hydroxyl group and the epoxide group of FR901464, the homologous enzymes and the biosynthetic gene cluster of thailanstatin were predicted. Reverse transcription (RT)-PCR helped to identify the optimum medium for expression of the regulatory gene tstA. The novel compounds were finally obtained through microbial fermentation, natural product isolation and structural elucidation, tracking the characteristic UV absorption of the diene motif at 235 nm.

Kutzneria species are potential resources of CYP-catalyzed compounds

In the phylum Actinobacteria, the genus Kutzneria is a minor branch of the family Pseudonocardiaceae, currently containing eight species (Fig. 5). Only aculeximycin and the kutznerides are known to be produced, by Kutzneria albida DSM 43870T and Kutzneria sp. 744, respectively.132, 133 Genome mining of Kutzneria species suggests that they could produce compounds similar to the previously reported pre-mRNA spliceosome inhibitors, as well as a group of interesting CYP-catalyzed compounds. For example, the pladienolide biosynthetic gene cluster has only four genes (including the CYP gene pldB and the putative epoxidase-encoding gene pldD) besides the PKS genes. Inspired by the successful discovery of thailanstatins, a bioinformatics analysis based on the pladienolide B CYP monooxygenase was conducted; when the CYP PldB sequence was used as the query for a homology search of GenBank, the Kutzneria sp. 744 genome was found to encode the most homologous enzyme (78%), KUTG_06291 (EWM16617) (Fig. 6). Therefore, the genus Kutzneria was postulated to produce compounds similar to pladienolide. However, there are still many gaps in the sequenced genome of Kutzneria sp. 744.

Fig. 5

Phylogenetic tree for taxa of the genus Kutzneria and other antitumor compound-producing species (Streptomyces and Burkholderia). The tree was calculated from complete 16S rRNA gene sequences using the neighbor-joining method, illustrating the position of the genus Kutzneria relative to other selected species. All species are type strains except for Kutzneria sp. 744 and Burkholderia sp. MSMB43 and MSMB121. The 16S rRNA sequence of the pladienolide producer Streptomyces platensis Mer-11107 is not available, but DNA hybridization suggests that Streptomyces platensis Mer-11107 has 87% similarity to the type strain CGMCC4.1975. Percentages at nodes represent levels of bootstrap support from 1000 resampled datasets; scale bar, 0.02 nucleotide substitutions per site.

Fig. 6

Strategy for the discovery of novel compounds from Kutzneria species. By starting with the structure of pladienolide B, and moving through phylogenetic analysis of the streptomycetes-related cytochrome P450s, Kutzneria species were selected for further analysis of P450-related polyketides. The genome sequencing of the closest strain, Kutzneria sp. 744, was finished in February 2014 without detailed annotation, and there are still many gaps waiting to be closed.

Kutzneria albida DSM 43870T is the only Kutzneria strain with a completely sequenced genome. So far, the K. albida genome (9.87 Mb) is among the largest actinobacterial genomes sequenced. Thus, Kutzneria albida DSM 43870T was selected for further phylogenetic analysis of CYPs in parallel with a library of 44 CYPs originating from the biosynthetic gene clusters of 32 bioactive natural products.37, 61, 62, 63, 64, 73, 74, 76, 78, 80, 81, 82, 83, 84, 85, 86, 88, 89, 90, 91, 92, 94, 95, 96, 97, 98, 99, 100, 102, 103, 104, 105 AntiSMASH analysis revealed that the K. albida DSM 43870T genome encodes 47 biosynthetic gene clusters, which contain at least 13 different CYPs distributed among eight gene clusters (GC4, GC10, GC11, GC18, GC19, GC33, GC34 and GC40) (Fig. 5, Table S1). Among those gene clusters, only GC40 is known to be responsible for the biosynthesis of aculeximycin. The CYP KALB_6568 is predicted to catalyze the formation of the C14 hydroxyl group. However, the compounds biosynthesized by the other CYP-related gene clusters are not yet known. BLAST analysis (Fig. 2) suggests that in the K. albida DSM 43870T genome, GC19 encodes compounds with CYP-installed functional groups similar to those of meilingmycin (KALB_3944, 99% identity to Mei-Orf4) and tirandamycin B (66% identity to TamI). As for GC11, the 99% identity between CYP KALB_3411 and StaP (in the staurosporine biosynthesis gene cluster) suggests that GC11 may be responsible for the biosynthesis of compounds with functional groups formed by similar oxidative decarboxylations. Additionally, antiSMASH analysis indicates that GC11 has 20% similarity to the biosynthetic gene cluster of the pre-mRNA spliceosome inhibitor staurosporine. Similarly, the 99% identity between CYP KALB_3295 (in GC10) and Asm30 (the CYP in the ansamitocin biosynthesis gene cluster) implies that GC10 may produce epoxide compounds similar to ansamitocin. Similar analysis suggests that tirandamycin-like compounds may be produced through GC19 based on the 66% identity between KALB_3945 and TamI.98, 135 Unfortunately, the 27% identity between KALB_5792 (in GC33) and AziB1 (the CYP in the azinomycin B biosynthetic gene cluster), and the 19% identity between KALB_5804 (in GC33) and EpoK (the CYP in the epothilone biosynthesis gene cluster), make it difficult to predict the natural products biosynthesized by GC33. Besides Kutzneria albida DSM 43870T, five other top strains, Actinoplanes sp. N902-109 (NC_021191.1), Micromonospora aurantiaca ATCC 27029 (CP002162.1), Streptomyces violaceusniger Tu 4113 (CP002994.1), Streptomyces bingchenggensis BCW-1 (CP002047.1) and Streptomyces sp. PVA 94-07 (CM002273.1), were prioritized through CYP pattern-based genome mining of the pladienolide biosynthetic pathway. These strains have potential CYP pattern pathways to produce unknown compounds (Tables S2–S6).
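The reasoning applied to the K. albida gene clusters above can be laid out as a small triage table. The Python sketch below takes the identity values quoted in the text and labels each cluster by its best CYP hit, using the >40% (family) and >55% (subfamily) identity rule from the bioinformatics section. Applying those cutoffs to cluster triage is an assumption made here for illustration, as are the function names; it is not presented as the procedure the authors followed.

```python
FAMILY_MIN = 40.0     # >40% identity: same CYP family (from the text)
SUBFAMILY_MIN = 55.0  # >55% identity: same CYP subfamily (from the text)

# (gene cluster, K. albida CYP, reference CYP, % identity) as quoted in the text.
HITS = [
    ("GC10", "KALB_3295", "Asm30", 99.0),
    ("GC11", "KALB_3411", "StaP", 99.0),
    ("GC19", "KALB_3944", "Mei-Orf4", 99.0),
    ("GC19", "KALB_3945", "TamI", 66.0),
    ("GC33", "KALB_5792", "AziB1", 27.0),
    ("GC33", "KALB_5804", "EpoK", 19.0),
]


def classify(identity):
    """Label an identity value with the family/subfamily rule."""
    if identity > SUBFAMILY_MIN:
        return "same subfamily"
    if identity > FAMILY_MIN:
        return "same family"
    return "unassigned"


def triage(hits):
    """Keep the best-scoring CYP hit per gene cluster and label it."""
    best = {}
    for cluster, cyp, reference, identity in hits:
        if cluster not in best or identity > best[cluster][2]:
            best[cluster] = (cyp, reference, identity)
    return {
        cluster: {"cyp": cyp, "reference": ref, "identity": ident,
                  "call": classify(ident)}
        for cluster, (cyp, ref, ident) in best.items()
    }


for cluster, row in sorted(triage(HITS).items()):
    print(cluster, row["cyp"], row["reference"], row["identity"], row["call"])
```

With these inputs, GC10, GC11 and GC19 each have a close reference CYP, while GC33 remains unassigned, which matches the statement that its products are difficult to predict.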

Conceivable methods to translate biosynthetic pathways to natural products

This section covers the methods (Fig. 7) or tools that have been used or have the potential to be applied to translate biosynthetic pathways to natural products.

Fig. 7

Schematic representation of methods to translate biosynthetic pathways to novel natural products.

Direct cloning and heterologous expression of biosynthetic gene clusters

Heterologous expression was primarily used to circumvent the lack of genetic tools and the regulatory complexity of native hosts by using strains that are more amenable to engineering. Direct cloning, genetic engineering and heterologous expression of microbial natural product biosynthesis pathways were reviewed in 2013. Heterologous expression methods can also be borrowed from those used for the discovery of small molecules from metagenomic cosmid libraries.7, 137, 138 The approaches to cloning targeted gene clusters directly from genomic DNA include RecE-mediated homologous recombination, oriT-directed capture, transformation-associated recombination (TAR)141, 142, 143 to facilitate functional expression experiments, and phage φBT1 integrase-mediated site-specific recombination. Among those, TAR cloning in Saccharomyces cerevisiae has been extensively applied to capture and express large biosynthetic gene clusters from environmental DNA samples.145, 146, 147 Recently, the TAR direct cloning approach was used to clone targeted whole gene clusters from rare actinomycetes and Gram-negative bacteria. Using TAR, the targeted gene cluster was cloned into plasmid pCAP01 by homologous recombination in Saccharomyces cerevisiae strain VL6-48 to obtain the captured vector, which was then transformed into the corresponding host, Streptomyces coelicolor for rare actinomycetes and E. coli for Gram-negative bacteria. Examples are the heterologous production of the lipopeptide taromycin A (from the marine actinomycete Saccharomonospora sp. CNQ-490) in Streptomyces coelicolor M512 and the heterologous production of alterochromide lipopeptides (from Pseudoalteromonas piscicida JCM 20779) in E. coli BL21 (DE3) utilizing native and E. coli-based T7 promoter sequences.
Because the TAR-based genetic platform allows heterologous production of lipopeptides in different hosts, efforts have focused on developing a high-throughput TAR capture method for pathway expression. Successful production of the desired products often requires an optimal relationship of timing and flux between primary and secondary cellular metabolism. Besides the use of genetic engineering for the direct cloning of a whole gene cluster, there has been considerable interest in developing engineered bacterial strains for efficient heterologous production of secondary metabolites.148, 149, 150, 151 For example, deletion of a 1.4 Mb segment from the left subtelomeric region of the 9.02 Mb Streptomyces avermitilis genome generated two large-deletion mutants. Using these large-deletion mutants, i.e., S. avermitilis SUKA17 or 22, twenty entire biosynthetic gene clusters for secondary metabolites, including aminoglycosides, nonribosomal peptides, polyketides and terpenes, were successfully expressed. The biosynthetic gene cluster of the polyketide pladienolide has been expressed in a deletion mutant of Streptomyces avermitilis carrying an extra copy of the regulatory gene pldR under control of an alternative promoter. It is worth noting that the engineered hosts are useful not only for the production of exogenous secondary metabolites; their “clean” background also facilitates scale-up production and preparation of promising natural products.

Manipulation of the biosynthetic pathways

Recent approaches to the activation and up-regulation of microbial biosynthetic pathways for the discovery of natural products have been reviewed.153, 154 Beyond these methods, four types of research reports illustrate how biosynthetic gene clusters can be translated into novel natural products by synthetic biology methods. First, optimized bioactive compounds can be produced by engineering the biosynthetic operon and reconstituting the biosynthetic pathway. The group led by Müller and Brönstrup deleted a gene in the hydroxymalonyl-CoA biosynthetic operon of the bengamide biosynthetic gene cluster and inserted a promoter on the expression construct pBen32. Heterologous expression of the modified biosynthetic gene cluster led to the discovery of more potent compounds. Second, new compounds with increased activities can be produced by inactivating CYP pattern genes. Bills' team manipulated the pneumocandin biosynthetic pathway. They inactivated three genes (GLP450-1, GLP450-2, and GLOXY1) and generated 13 different pneumocandin analogues that lack one, two, three, or four hydroxyl groups on the 4R, 5R-dihydroxy-ornithine and 3S, 4S-dihydroxy-homotyrosine of the parent hexapeptide. Seven of these analogues were previously unreported. Third, natural product analogues can be produced by creating promoter-driven tailoring enzyme constructs. Brady's group created an ermE promoter cassette, which was introduced upstream of the first ORF of the biosynthetic gene cluster of interest. Further introduction of the cosmid containing the constitutively expressed tailoring operon into S. toyocaensis:ΔStaL resulted in the production of three new glycopeptides. Fourth, stable compounds with increased titer can be produced by engineering the CYP pathway. Researchers at Pfizer engineered a pAE-PF29 vector that enabled the overexpression of Fr9R, encoded by the CYP gene fr9R, in Burkholderia sp. FERM BP-3421, and led to enhanced production of the stable thailanstatin A from spliceostatin C.

Synthetic biology tool kits

In addition to the four examples of genetic manipulation above, synthetic biology tools for editing bacterial genomes have been reviewed by other researchers. The Clustered Regularly Interspaced Short Palindromic Repeats and Cas protein (CRISPR-Cas) systems provide a powerful and broadly applicable set of tools for manipulating Streptomyces genomes. Three research groups have worked on gene deletion in Streptomyces species using different CRISPR-Cas toolkits. The group led by Zhao developed a temperature-sensitive pCRISPomyces system, which is applicable to genome editing of Streptomyces lividans 66, Streptomyces viridochromogenes DSM 40736 and Streptomyces albus J1074. Lee and Webber's group focused on deleting the actinorhodin biosynthetic gene cluster in Streptomyces coelicolor A3(2) using an engineered CRISPR-Cas system with improved efficiency. Sun's team reported a novel CRISPR-Cas system used in combination with the counterselection system CodA(sm), the D314A mutant of cytosine deaminase, to delete the actinorhodin biosynthetic gene cluster in the Streptomyces coelicolor M145 genome.

The advantages and challenges of the above-mentioned strategies

The advantages and challenges of generating a bioactive natural product library by CYP pattern-based genome mining, as compared with traditional strategies such as microbial fermentation in different media and chromatographic fractionation by polarity, can be summarized as follows.

Advantages

CYPs are among the most important classes of tailoring enzymes and can be explored and exploited for the structural diversification of bioactive natural products. Genome mining is an efficient method. The combination of PCR screening and genome mining prioritizes the cryptic gene clusters of interest for annotation and heterologous expression. This is more efficient than the traditional concept of one strain, many active compounds (OSMAC). Genome mining can be applied at large scale and high throughput to accelerate drug discovery, as exemplified by a four-year case study in which genome mining of 10,000 actinomycetes yielded 19 novel phosphonic acid natural products. Both heterologous expression and activation of cryptic pathways can generate novel compounds that have unique biological activities but are difficult to access by synthetic chemistry. CRISPR-Cas9-enabled genome editing tools are very efficient. First, compared with the 5% (2/38) efficiency of the traditional in-frame gene deletion method, the efficiency of CRISPR-Cas9 systems can reach 100% without unwanted side or off-target effects. Second, CRISPR-Cas9 systems remove the need for a counterselectable marker to select double crossover mutants. Third, CRISPR-Cas9 systems avoid leaving a scar sequence at the target site and allow multiple gene deletions. Fourth, the CRISPR-Cas system provides new modularity: the site of interest is targeted by inserting a short spacer into a CRISPR array/sgRNA construct. The insertion can be achieved at high throughput using modern DNA assembly techniques. CRISPR-Cas9 genome editing can shorten the two-month period required by the previous gene deletion method to a time course of 1–2 weeks.
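As a rough illustration of pattern-based prioritization, the sketch below flags predicted proteins that carry a simplified form of the widely cited P450 heme-binding consensus, FxxGx(R/H)xCxG. This is an assumption made for illustration only and is not the procedure used in the work reviewed here: the regular expression is a deliberately loose simplification, the function name flag_cyp_candidates is hypothetical, and the two sequences in the usage example are invented. A real workflow would rely on curated profile models and manual inspection.

```python
import re

# Simplified heme-binding consensus FxxGx(R/H)xCxG (illustrative assumption,
# looser than curated signatures; expect false positives and negatives).
HEME_MOTIF = re.compile(r"F..G.[RH].C.G")


def flag_cyp_candidates(proteins):
    """Return the names of proteins whose sequence contains the motif.

    proteins: dict mapping a protein name to its amino acid sequence
    (one-letter code). Input order is preserved in the result.
    """
    return [
        name
        for name, seq in proteins.items()
        if HEME_MOTIF.search(seq.upper())
    ]


if __name__ == "__main__":
    # Hypothetical sequences, invented for this example.
    demo = {
        "orf1": "MKTFGHGIHLCLGAAA",  # contains FGHGIHLCLG
        "orf2": "MKTAAAAAAAA",       # no motif
    }
    print(flag_cyp_candidates(demo))
```

Clusters containing a flagged protein could then be ranked ahead of the others for annotation, which is the spirit of the prioritization step described above.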

Main challenges

How to prioritize the characterization of orphan biosynthetic gene clusters, and how to rapidly connect genes to the small molecules they produce, are questions that will become more pressing as DNA sequence data increase. Pathway manipulations in native hosts are subject to the genetic complexity of the bacteria. For example, researchers have developed a procedure that takes about 10 days for chromosomal knockout in the Gram-negative bacterium B. pseudomallei using the pheS-gat cassette, assisted by plasmid pKaKa2. However, this method is not applicable to gene knockout in another species, B. gladioli, which requires a different, pCR4Blunt-TOPO-based vector for recombination assisted by plasmid pKD46. To date, Cas9 targeting is still limited by protospacer adjacent motif (PAM) sequences. Targeting efficiency is also site-dependent. Although the development of genetic manipulation tools could greatly improve the chances of discovering and producing natural products, a major challenge in microbial genome mining is to produce compounds in high titers. Bacterial genomes are publicly available, but access to strains of interest is limited by international or domestic property regulations. Decreasing DNA synthesis costs and advances in DNA assembly could help to address the issue of limited material access. Gaining access to a larger fraction of the strains isolated in research labs worldwide will be a future challenge.
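The PAM constraint mentioned above can be made concrete with a short sketch. It assumes the commonly used Streptococcus pyogenes Cas9, which recognizes an NGG PAM immediately 3' of a 20-nt protospacer; the function names are hypothetical, and the sketch does not score off-target risk or the site-dependent efficiency noted in the text.

```python
import re

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_protospacers(seq, spacer_len=20):
    """List candidate Cas9 target sites followed by an NGG PAM.

    Returns tuples (strand, start, protospacer, pam). For the "-" strand,
    start is a 0-based coordinate on the reverse complement of seq.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        # The lookahead lets overlapping candidate sites be reported.
        for match in pattern.finditer(s):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    # Toy sequence, invented for this example: 20 A's followed by TGG.
    for hit in find_protospacers("A" * 20 + "TGG"):
        print(hit)
```

A region that returns no hits has no NGG-adjacent site on either strand, which is the situation in which PAM availability limits where a deletion can be targeted.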

Conclusion

The current review provides strategies for the discovery of a group of epoxide-containing natural products (Fig. 8) and highlights that (1) genome mining based on CYPs and their pattern enzymes can guide the generation of a diversified natural product library, and (2) direct cloning and heterologous expression of biosynthetic pathways, together with genomic manipulation methods and tools, can translate the selected biosynthetic gene clusters into a bioactive natural product library. It is worth noting that tailoring reactions play important roles in the diversification of scaffolds such as polyketide, peptide and hybrid polyketide-peptide backbones, and that tailoring enzyme patterns such as CYPs, ligases, cyclases, ketoreductases, transferases, and oxygenases can often be used for genome mining. This advanced genome mining will avoid the de-replication of biosynthetic gene clusters and quickly identify the annotated biosynthetic gene clusters from the vast pool of sequenced bacterial genomes. It is predicted that we will be sequencing genomes for pennies as early as 2020 (https://youtu.be/j88APStUcp4). Synthetic biology tools pioneered by different researchers will continue to be developed toward high-throughput use as the number of sequenced bacterial genomes increases. This review also points out the potential mechanisms and diversity of CYPs in the microbial biosynthesis of natural product antitumor agents. The Kutzneria strains, as rare actinobacterial species, can be explored and exploited for CYP-catalyzed compounds, and can serve as a rich resource for the diversification of microbial CYPs.

Fig. 8

The proposed protocol for the development of a natural product library in the postgenomic stage.
