Vincent Libis, Logan W MacIntyre, Rabia Mehmood, Liliana Guerrero, Melinda A Ternei, Niv Antonovsky, Ján Burian, Zongqiang Wang, Sean F Brady.
Abstract
Bacterial genomes contain large reservoirs of biosynthetic gene clusters (BGCs) that are predicted to encode unexplored natural products. Heterologous expression of previously unstudied BGCs should facilitate the discovery of additional therapeutically relevant bioactive molecules from bacterial culture collections, but the large-scale manipulation of BGCs remains cumbersome. Here, we describe a method to parallelize the identification, mobilization and heterologous expression of BGCs. Our solution simultaneously captures large numbers of BGCs by cloning the genomes of a strain collection in a large-insert library and uses the CONKAT-seq (co-occurrence network analysis of targeted sequences) sequencing pipeline to efficiently localize clones carrying intact BGCs which represent candidates for heterologous expression. Our discovery of several natural products, including an antibiotic that is active against multi-drug resistant Staphylococcus aureus, demonstrates the potential of leveraging economies of scale with this approach to systematically interrogate cryptic BGCs contained in strain collections.
Year: 2022 PMID: 36068239 PMCID: PMC9448795 DOI: 10.1038/s41467-022-32858-0
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Fig. 1 General strategy for multiplexed capture, identification, and mobilization of biosynthetic gene clusters (BGCs).
A strain collection is cloned at random in the form of a large-insert genomic library (PACs, P1-derived artificial chromosomes). CONKAT-seq (co-occurrence network analysis of targeted sequences) is used to detect captured BGCs, evaluate their novelty and determine their physical location in the arrayed library. Finally, cryptic BGCs are mobilized into heterologous hosts to access their products.
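The co-occurrence idea behind this step can be illustrated with a minimal sketch. It is not the published CONKAT-seq implementation: the Jaccard threshold, the function names, the amplicon labels and the well identifiers are all assumptions chosen for illustration. The sketch links amplicons that are detected in the same library wells and groups linked amplicons into networks, each of which would point to the wells holding a candidate BGC.

```python
from itertools import combinations


def cooccurrence_edges(presence, min_jaccard=0.8, min_wells=2):
    """Link amplicons that are detected in largely the same wells.

    presence maps an amplicon id to the set of well ids where it was detected.
    The Jaccard cut-off is an illustrative stand-in for a proper statistical test.
    """
    edges = []
    for a, b in combinations(sorted(presence), 2):
        shared = presence[a] & presence[b]
        union = presence[a] | presence[b]
        if len(shared) >= min_wells and len(shared) / len(union) >= min_jaccard:
            edges.append((a, b))
    return edges


def connected_components(nodes, edges):
    """Group linked amplicons into networks using a small union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical toy data: amplicon labels and well ids are made up.
presence = {
    "KS_1": {"A1", "B7"},
    "KS_2": {"A1", "B7"},
    "A_1": {"A1", "B7"},
    "A_2": {"C3", "D4"},
    "KS_3": {"C3", "D4"},
}
networks = connected_components(presence, cooccurrence_edges(presence))
```

In this toy input, three amplicons share wells A1 and B7 and two share wells C3 and D4, so the sketch yields two networks. A real analysis would need a significance test on co-occurrence rather than a fixed similarity cut-off.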
Fig. 2 Experimental performance of a streamlined genes-to-molecules discovery pipeline.
a Amplicons derived from biosynthetic genes found in close proximity on chromosomes are connected in the form of networks using CONKAT-seq. Networks may correspond to a single BGC or in some cases multiple BGCs found close to each other on the original chromosome. Each node in a network corresponds to an amplified biosynthetic domain and is colored based on its amino acid identity to the closest known BGC for which a metabolite has been reported. b Individual BGCs are transferred into hosts and cultures are interrogated by LC-HRMS. Untargeted metabolomics is used to identify BGC-specific mass features through an all-versus-all comparison. c Examples of extracted ion chromatograms (EIC) of BGC-specific masses associated with cryptic BGCs. BGCs are depicted on the left with their core PKS and NRPS genes colored in black. In each row, the signal associated with a BGC is depicted in blue when in S. lividans and red when in S. albus. d Metabolites identified in this study. Source data are provided as a Source Data file.
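The all-versus-all comparison in panel b can be sketched as follows, under stated assumptions. The caption does not describe the actual metabolomics software or thresholds, so the fold-change cut-off, the intensity floor, the function name and the feature labels below are hypothetical. The sketch flags a mass feature as BGC-specific when its intensity in one culture exceeds its highest intensity in every other culture by a chosen factor.

```python
def bgc_specific_features(intensity, fold=10.0, floor=1.0):
    """Return, per sample, the features enriched relative to all other samples.

    intensity maps a sample name to {feature id: peak intensity}.
    fold and floor are illustrative assumptions, not published parameters.
    """
    specific = {}
    for sample, feats in intensity.items():
        hits = []
        for feat, value in feats.items():
            # Highest intensity of this feature in any other sample (0 if absent).
            others = max(
                (other.get(feat, 0.0)
                 for name, other in intensity.items() if name != sample),
                default=0.0,
            )
            if value >= fold * max(others, floor):
                hits.append(feat)
        specific[sample] = sorted(hits)
    return specific


# Hypothetical toy data: sample names, masses and intensities are made up.
intensity = {
    "BGC_1": {"m/z 512.30 @ 6.1 min": 5e5, "m/z 301.14 @ 3.2 min": 2e4},
    "BGC_2": {"m/z 301.14 @ 3.2 min": 2.1e4, "m/z 845.45 @ 9.8 min": 3e5},
    "empty_vector": {"m/z 301.14 @ 3.2 min": 1.9e4},
}
specific = bgc_specific_features(intensity)
```

Here the feature shared by all three cultures is treated as host background, while each feature seen in only one BGC-carrying culture is reported for that culture.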