
ARTS 2.0: feature updates and expansion of the Antibiotic Resistant Target Seeker for comparative genome mining.

Mehmet Direnç Mungan¹,², Mohammad Alanjary³, Kai Blin⁴, Tilmann Weber⁴, Marnix H Medema³, Nadine Ziemert¹,².

Abstract

Multi-drug resistant pathogens have become a major threat to human health and new antibiotics are urgently needed. Most antibiotics are derived from secondary metabolites produced by bacteria. To avoid suicide, these bacteria usually encode resistance genes, in some cases within the biosynthetic gene cluster (BGC) of the respective antibiotic compound. Modern genome mining tools enable researchers to computationally detect and predict BGCs that encode the biosynthesis of secondary metabolites. The major challenge now is the prioritization of the most promising BGCs encoding antibiotics with novel modes of action. A recently developed target-directed genome mining approach allows researchers to predict the mode of action of the encoded compound of an uncharacterized BGC based on the presence of resistant target genes. In 2017, we introduced the 'Antibiotic Resistant Target Seeker' (ARTS). ARTS allows for specific and efficient genome mining for antibiotics with interesting and novel targets by rapidly linking housekeeping and known resistance genes to BGC proximity, duplication and horizontal gene transfer (HGT) events. Here, we present ARTS 2.0, available at http://arts.ziemertlab.com. ARTS 2.0 now includes options for automated target-directed genome mining in all bacterial taxa as well as in metagenomic data. Furthermore, it enables comparison of similar BGCs from different genomes and their putative resistance genes.


Year: 2020 PMID: 32427317 PMCID: PMC7319560 DOI: 10.1093/nar/gkaa374

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Due to the continuous increase of drug-resistant bacteria, antibiotic resistance is regarded as a global public health threat (1). The lack of new antibiotics with novel modes of action in the current drug development pipeline makes finding new compounds to fight off resistant pathogens a critical task (2). Since the discovery of penicillin, secondary metabolites (SMs) produced by various living organisms have been foundational to the development of antimicrobial drugs (3). The majority of antibiotic compounds are isolated as natural products from fungi and bacteria (4). For many decades, screening biological samples for desired bioactivity has been the traditional methodology for natural product discovery (5). Due to the high rediscovery rates and labor-intensive nature of the process, in silico methods have become a promising way to guide modern drug discovery efforts (6,7). Gene-centered methods, such as genome mining, now enable researchers to computationally detect the biosynthetic gene clusters (BGCs) encoding enzymes necessary for the biosynthesis of antibiotics and to predict the encoded compounds (8). Over the last decade, greatly improved genome mining tools such as antiSMASH (9), EvoMining (10), PRISM (11) or DeepBGC (12) have used methods like Hidden Markov Models, phylogeny or deep learning to highlight a variety of natural product classes. Combined with databases such as MIBiG (13), Natural Product Atlas (14) and the antiSMASH database (15), these tools allow for fast and efficient mining and dereplication of thousands of bacterial genomes and BGCs. According to the latest version of the Atlas of Biosynthetic Gene Clusters (IMG-ABC) (16), there are currently ∼400 000 predicted BGCs. Moreover, <1% of these clusters are experimentally verified, which raises an important question: which of these clusters should be examined further in wet-lab experiments?
Recently, researchers adopted a prioritization approach for antibiotic discovery that is based on the observation that antibiotic producers have to be resistant against their own products to avoid suicide (17). This so-called target-directed or self-resistance-based genome mining approach allows the prediction of the mode of action of the encoded compound of an uncharacterized BGC based on resistance genes, which in some cases are co-located within the antibiotic BGC (18). Multiple resistance mechanisms exist, such as inactivation and export of antibiotics as well as target modification. In the latter case, a duplicated and antibiotic-resistant homologue of an essential housekeeping gene is detectable within the antibiotic BGC and allows the prediction of the mode of action of the encoded compound even without knowing a chemical structure (19–21). Moore et al., for example, were able to identify a fatty acid synthase inhibiting antibiotic by screening for duplicated fatty acid synthase genes within orphan BGCs (22). In 2017, we introduced the first version of the ‘Antibiotic Resistant Target Seeker’ (ARTS) (23), a user-friendly web server that automates target-directed genome mining to prioritize promising strains that produce antibiotics with new modes of action. Since a resistant copy of the antibiotic target gene is typically detectable in the genome, can be observed within the BGC of the antibiotic and can be horizontally acquired together with the BGC (23), ARTS automatically detects possible resistant housekeeping genes based on three criteria: duplication, localization within a biosynthetic gene cluster, and evidence of Horizontal Gene Transfer (HGT). One previous limitation of the ARTS pipeline was its focus on actinobacterial genomes. Although natural product discovery has historically focused on the phylum Actinobacteria, prominent families from other phyla such as Proteobacteria or Firmicutes are known to have high natural product biosynthetic potential (24–26).
Here, we introduce a greatly improved version 2 of the ARTS webserver, now allowing the analysis of the entire kingdom of bacteria, metagenomic data, and the comparison of multiple genomes. This update therefore will facilitate natural product prioritization and antibiotic discovery efforts beyond actinomycetes.

NEW FEATURES AND UPDATES

The workflow of the ARTS pipeline involves a few key steps: First, query genomes are screened for BGCs using antiSMASH (9). At the same time essential housekeeping (core) genes within the genome are determined using TIGRFAM models that have been identified by comparing a reference set of similar genomes (27) (Figure 1B). During the next steps the identified core and known resistance genes are screened for their location within BGCs. Duplication thresholds are determined for each core gene model, based on their respective frequencies among the reference set. Finally, possible HGT events are detected via phylogenetic screening with the help of constructed species trees and gene trees. All the results are summarized into interactive output tables.
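The way the three screening criteria combine for a single core gene model can be illustrated with a minimal Python sketch. It is not the ARTS implementation: the `CoreGeneHit` record, the model names and, in particular, the threshold rule (reference mean plus two standard deviations) are assumptions made for illustration, since the text only states that thresholds are derived from copy-number frequencies in the reference set.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class CoreGeneHit:
    model: str          # core gene model identifier (hypothetical placeholder names below)
    copies: int         # copies of this model found in the query genome
    in_bgc: bool        # at least one copy lies inside a predicted BGC
    hgt_evidence: bool  # phylogenetic screening suggests horizontal transfer


def duplication_threshold(reference_counts, n_sd=2.0):
    """Assumed rule: mean copy number in the reference set plus n_sd standard deviations."""
    return mean(reference_counts) + n_sd * pstdev(reference_counts)


def criteria_met(hit, reference_counts):
    """Return the one-letter codes (D, B, P) for the criteria a hit satisfies."""
    codes = []
    if hit.copies > duplication_threshold(reference_counts):
        codes.append("D")  # duplication
    if hit.in_bgc:
        codes.append("B")  # BGC proximity
    if hit.hgt_evidence:
        codes.append("P")  # phylogeny (possible HGT)
    return codes


# Toy copy numbers per model across five reference genomes (made-up values).
reference = {"model_A": [1, 1, 1, 1, 2], "model_B": [1, 1, 1, 1, 1]}
hits = [
    CoreGeneHit("model_A", copies=1, in_bgc=True, hgt_evidence=False),
    CoreGeneHit("model_B", copies=2, in_bgc=True, hgt_evidence=True),
]
for hit in hits:
    print(hit.model, criteria_met(hit, reference[hit.model]))
```

In this toy input, `model_B` satisfies all three criteria and would be the kind of hit a user inspects first, whereas `model_A` is only located inside a BGC.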

Figure 1.

Outline representation of the ARTS pipeline. (A) Basic machinery of creating reference sets. Housekeeping core genes and duplication thresholds are detected per clade of organisms and gene alignments and trees are created for fast HGT detection. (B) Workflow with multi-genome comparative analysis. Input data is screened for ARTS selection criteria. All found BGCs are then subjected to BiG-SCAPE clustering algorithm. Finally, interactive output tables are presented for comparative analysis.

Reference sets of organisms and core genes

Since the determination of core gene content and the construction of phylogenetic trees are more specific and accurate when query genomes are compared with genomes from similar organisms, we aimed to generate phylum-specific reference sets. However, since the number of genomes in the different phyla varied significantly, reference sets were sometimes also created by class or for a group of closely related phyla (Supplementary Table S1). In a first step, sequences of all classified bacteria were downloaded from NCBI’s RefSeq database (28) for further evaluation (Figure 1A). Redundant sequences were filtered with MASH (29) at a similarity cutoff of 95% or higher. Where applicable, only complete genomes were used in a reference set. If the number and diversity of complete genomes within a phylum were not sufficient (distributed among a genus or two with <100 sequences), contig-level assemblies were also taken into consideration to expand the particular reference. Around 330 genome sequences were used for the creation of each individual reference set, adding up to 4936 genomes in total. Based on the number of genomes for each reference set, different boundaries were then selected for phyla with different levels of diversity. Given the diversity and large number of proteobacterial genomes deposited in RefSeq (30), four different reference sets were created for proteobacterial genomes (Alpha, Beta, Gamma, Delta-Epsilon). In cases where a phylum does not comprise enough sequenced genomes (fewer than 100), multiple phyla were grouped into one reference set. In that way, 22 phyla were grouped into three reference sets. Groupings were based on phylogenetic distances in the tree of life (31) and the NCBI Lifemap (32). Another feature of the grouped sets is the high coverage of bacteria from harsh environments, allowing the analysis of extremophiles.
For example, group 2, which was created from 214 organisms, is mainly comprised of the phyla Thermotogae and Chloroflexi (Supplementary Table S1), which are known to be mostly thermophilic (33,34).
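The redundancy filtering and the grouping rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used to build the ARTS reference sets: the pairwise distances are assumed to be precomputed (for instance from Mash output), similarity is taken to be `1 - distance`, and the greedy selection order, function names and toy values are all hypothetical.

```python
def dereplicate(genome_ids, distance, max_similarity=0.95):
    """Greedily keep genomes that are below max_similarity to every genome kept so far.

    `distance(a, b)` is assumed to return a precomputed pairwise distance in [0, 1];
    similarity is approximated as 1 - distance (an assumption of this sketch).
    """
    kept = []
    for genome in genome_ids:
        if all(1.0 - distance(genome, other) < max_similarity for other in kept):
            kept.append(genome)
    return kept


def phyla_to_group(genome_counts, minimum=100):
    """Return the phyla with fewer than `minimum` genomes, i.e. candidates for grouped sets."""
    return sorted(p for p, n in genome_counts.items() if n < minimum)


# Toy pairwise distances between three hypothetical genomes.
toy = {
    frozenset({"g1", "g2"}): 0.01,  # near-identical pair, one of them is redundant
    frozenset({"g1", "g3"}): 0.20,
    frozenset({"g2", "g3"}): 0.21,
}


def toy_distance(a, b):
    return toy[frozenset({a, b})]


print(dereplicate(["g1", "g2", "g3"], toy_distance))
print(phyla_to_group({"phylum_X": 40, "phylum_Y": 350, "phylum_Z": 99}))
```

A greedy pass like this depends on input order, so a real pipeline would likely sort genomes by assembly quality first so that the best representative of each redundant group is retained.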

Reference set and core gene analysis

Determination of core genes

Core genes were determined for each reference set using the method developed for the previous version of ARTS (23). Subsequently, the core genes from each set were compared with sequences from the Database of Essential Genes (DEG) v 1.5 (46). On average, 85% of genes had a match to one or more records (Supplementary Table S2). The majority of the genes that are not found in DEG belong to the gene categories ‘unclassified’, ‘unknown function’ or ‘energy metabolism’. Furthermore, functional classification of each reference set revealed that, on average, genes with functions such as protein and amino acid synthesis, energy and metabolism were the most abundant, as would be expected from essential genes (Supplementary Figure S1). The importance of individual reference sets is highlighted by the fact that a single set accounts for only ∼40% of the total unique core genes from all sets (Supplementary Table S4). Additionally, the reliability of the generated gene trees for each reference set was estimated by branch support (Supplementary Figure S2) and by comparison to taxonomically correct species trees generated by the Accurate Species TRee ALgorithm (ASTRAL) (47) (Supplementary data).

Positive controls and detection frequencies

In order to test ARTS’ ability to detect resistant targets in non-actinobacterial genomes using the new reference sets, we analyzed known examples of self-resistance mechanisms. We identified several known non-actinobacterial examples as positive controls (Table 1). Out of 11 antibiotic natural products with identified resistance mechanisms, five had genome sequences available for the specific isolates that contain the respective BGCs. All of these cases showed at least two ARTS hits when run in normal mode with default cutoffs. To detect the accA gene, a known transferase, exploration mode had to be used. Otherwise, ARTS 2.0 predicted resistance genes in all control BGCs except one. The CoA reductase resistance gene was not detected because specific CoA reductase models were missing in both the core and known resistance sets. We also analyzed ∼5000 genomes belonging to all reference sets for statistical evaluation (Supplementary Table S3). On average, only one gene model shows positive hits for three or more ARTS criteria. Also, most of the core genes from the respective sets are found in each analyzed genome. Around 2–5% of core genes are highlighted for each criterion. The percentage of core genes that went through HGT is consistent with the HGT estimates in the literature (48,49).

Table 1.

Default ARTS analysis for positive examples of genomes and BGCs with known self-resistance mechanisms

Product	Resistance gene	Organism	ARTS hits	Criteria hits (>2, >3)	Genes (core, total)
Thiocillin	ribosomal protein L11 (35)	Bacillus cereus ATCC 14579	D,B,P	9, 1	472, 5231
Myxovirescin	lspA: signal peptidase II (36)	Myxococcus xanthus DK 1622	D,B,P	15, 2	372, 7267
Thailandamide	accA: acetyl-CoA carboxylase (37)	Burkholderia thailandensis E264	D,B,P,R*	42, 5	838, 6347
Indolmycin	trypS: tryptophan-tRNA synthetase (38)	Pseudoalteromonas luteoviolacea	D,B	13, 2	540, 4963
Agrocin 84	leu tRNA synthase (39)	Agrobacterium radiobacter K84	D,P	41, 2	470, 6876
Bengamide	methionine aminopeptidase (40)	Myxococcus virescens DSM 15898	Core	N/A	1, 18
Mupirocin	Ile-tRNA synthetase (41)	Pseudomonas fluorescens NCIMB 10586	Core	N/A	1, 36
Andrimid	accD: acetyl-CoA carboxylase (42)	Pantoea agglomerans Eh335	Core	N/A	1, 18
Cystobactamid	Pentapeptide repeat protein (43)	Cystobacter sp. Cbv34	R*	N/A	0, 24
Phaseolotoxin	ornithine carbamoyltransferase (44)	Pseudomonas savastanoi pv. phaseolicola	Core, R*	N/A	3, 26
Kalimantacin	fabI: enoyl reductase (45)	Pseudomonas fluorescens BCCM ID9359	No hits	N/A	3, 29

Hits to ARTS criteria are shown as follows: D: duplication, B: BGC proximity, P: phylogeny, R: resistance model. Rows shaded gray in the original table (those listing N/A for criteria hits) used only the complete gene cluster as input rather than the whole genome. Stars indicate exploration mode.


Reference sets for metagenomic data

Since metagenomic approaches are becoming increasingly popular in natural product research (50,51), submissions of whole metagenomes to the ARTS webserver are also showing a significant increase. Therefore, we have built an additional reference set available for metagenome analysis, which does not include phylogeny and duplications. Given that metagenomes are usually quite diverse and comprise more than one single phylum, core genes are defined as genes belonging to the Database of Essential Genes (DEG) (Supplementary Table S3).

Comparative analysis

ARTS 2.0 now makes it easier for users to analyze multiple genomes and applies a comparative analysis of provided organisms (Figure 2). Throughout the analysis, individual ARTS results are accessible upon completion of each run. Once all the sequences of interest are analyzed, an interactive summary table representing all genomes with each resulting criterion is provided. In addition, shared core genes with their respective hits and their observed frequencies among all genomes can be inspected via dynamic output tables. This aids in further prioritizing ARTS hits for those that are detected in multiple contexts or related BGCs and therefore are more likely to be involved in resistance. For example, users can now narrow HGT hits by inspecting those that are shared across multiple organisms. In addition to these data, the BiG-SCAPE algorithm (52) is applied on all detected BGCs, allowing users to investigate similar BGCs from multiple sources by constructing gene cluster sequence similarity networks and identifying gene cluster families inside these networks. Furthermore, each of the BGCs in a gene cluster family can be examined in order to assess whether they have core or resistance models as shared hits, as well as whether a cluster stands out with unique hits compared to its relatives from other species.
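The cross-genome summary described above, in which shared core gene hits are reported with their frequencies among all analyzed genomes, can be sketched in Python. The input layout (a mapping from genome to core gene model to the set of criteria codes it hit) and all names are assumptions for illustration and do not reflect the actual ARTS output format.

```python
from collections import Counter, defaultdict


def shared_hits(results):
    """Summarise, per core gene model, how often it is flagged across genomes.

    `results` maps genome name -> {model name -> set of criteria codes}, using the
    D/B/P/R codes from Table 1. This layout is hypothetical, not the ARTS format.
    """
    genomes_with_hit = defaultdict(set)
    criterion_counts = defaultdict(Counter)
    for genome, hits in results.items():
        for model, codes in hits.items():
            if codes:  # ignore models present without any criterion hit
                genomes_with_hit[model].add(genome)
                criterion_counts[model].update(codes)
    total = len(results)
    return {
        model: {
            "frequency": len(genomes) / total,  # fraction of genomes with a hit
            "criteria": dict(criterion_counts[model]),
        }
        for model, genomes in genomes_with_hit.items()
    }


# Toy results for three hypothetical genomes.
results = {
    "genome_1": {"model_A": {"P"}, "model_B": {"D", "B"}},
    "genome_2": {"model_A": {"P", "B"}},
    "genome_3": {"model_A": set()},
}
summary = shared_hits(results)
for model, info in sorted(summary.items()):
    print(model, round(info["frequency"], 2), info["criteria"])
```

Sorting such a summary by frequency is one way to narrow HGT hits to those shared across multiple organisms, as suggested in the text.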

Figure 2.

Example output of multi-genome ARTS analysis. The top part of the page shows the summaries of individual ARTS runs and the core genes shared throughout the whole analysis, with their respective ARTS hits. At the bottom, shared BGCs and resistance models can easily be navigated, and an interactive BiG-SCAPE graph output can be found via the ‘Open BiG-SCAPE overview’ option.

Server-side updates and speed up

In order to keep the ARTS pipeline at high standards, third party tools used in the workflow were updated. ARTS 2.0 now uses antiSMASH v5 and is able to analyze antiSMASH results in their newest JSON format. The most time-consuming part of the ARTS pipeline is the creation of species and gene trees for phylogenetic analysis via ASTRAL. By updating antiSMASH and ASTRAL, the average runtime of the whole pipeline could be cut in half. Also, in order to satisfy the increasing demand, ARTS 2.0 is now hosted on the highly scalable de.NBI cloud system with seven times the computational power. With these hardware and software updates, the ARTS 2.0 webserver can now analyze multiple inputs of up to 100 MB and, depending on the genomes and selected parameters, runs 3–8 times faster than the previous version.

CONCLUSIONS AND FUTURE PERSPECTIVES

Currently, ARTS is the only platform to automate resistance and putative drug-target based genome mining in bacteria via a user-friendly webserver. By design, ARTS aims to survey a wide scope of potential genes as drug targets while minimizing manual inspection by using the dynamic output and multiple screening criteria for more confident target predictions. Thus, it is incumbent on the user to examine potential hits with the provided metadata and contextual framing. Some of the ARTS hits might be more likely involved in biosynthesis and not associated with resistance. Although we removed common biosynthesis genes from the core gene sets to avoid false positives (23), it is currently not possible to automatically distinguish whether genes are more likely involved in biosynthesis or resistance; fatty acid synthases, for example, are involved in both (22). The occasional high counts of positive hits in exploration mode, largely due to undefined cluster boundaries, can be easily and rapidly filtered in the interactive output page. As shown previously, this inspection can even serve to help define the true boundaries of clusters, which remains a largely unresolved challenge when dealing with bacterial BGCs (23). Newly introduced features now make ARTS 2.0 a fast and comprehensive pipeline allowing users to: analyze sequences from all bacterial genomes as well as metagenomic samples, apply comparative analysis on multiple genomes, and interrogate similar BGCs for shared resistance genes. For future applications, we are working on increasing ARTS’ availability by making it directly accessible through other webservers such as antiSMASH. This will enable researchers to easily apply target-directed genome mining approaches, as a plugin, to sequences from different databases.
Furthermore, we are currently in the process of creating the ARTS database, which will contain preanalyzed ARTS results for all bacterial genomes within the RefSeq database and will allow global analysis and comparisons of resistant targets within BGCs. We hope that with this update, ARTS 2.0 will provide even broader access to resistance-based genome mining methods and facilitate the discovery of competitive antibiotics.

52 in total

1. Thirteen posttranslational modifications convert a 14-residue peptide into the antibiotic thiocillin.

Authors: Laura C Wieland Brown; Michael G Acker; Jon Clardy; Christopher T Walsh; Michael A Fischbach
Journal: Proc Natl Acad Sci U S A Date: 2009-02-05 Impact factor: 11.205

Review 2. Phylogeny and molecular signatures for the phylum Thermotogae and its subgroups.

Authors: Radhey S Gupta; Vaibhav Bhandari
Journal: Antonie Van Leeuwenhoek Date: 2011-04-19 Impact factor: 2.271

3. Cystobactamids: myxobacterial topoisomerase inhibitors exhibiting potent antibacterial activity.

Authors: Sascha Baumann; Jennifer Herrmann; Ritesh Raju; Heinrich Steinmetz; Kathrin I Mohr; Stephan Hüttel; Kirsten Harmrolfs; Marc Stadler; Rolf Müller
Journal: Angew Chem Int Ed Engl Date: 2014-12-15 Impact factor: 15.336

Review 4. Discovery of novel bioactive natural products driven by genome mining.

Authors: Zhongyue Li; Deyu Zhu; Yuemao Shen
Journal: Drug Discov Ther Date: 2018

5. MIBiG 2.0: a repository for biosynthetic gene clusters of known function.

Authors: Satria A Kautsar; Kai Blin; Simon Shaw; Jorge C Navarro-Muñoz; Barbara R Terlouw; Justin J J van der Hooft; Jeffrey A van Santen; Vittorio Tracanna; Hernando G Suarez Duran; Victòria Pascal Andreu; Nelly Selem-Mojica; Mohammad Alanjary; Serina L Robinson; George Lund; Samuel C Epstein; Ashley C Sisto; Louise K Charkoudian; Jérôme Collemare; Roger G Linington; Tilmann Weber; Marnix H Medema
Journal: Nucleic Acids Res Date: 2020-01-08 Impact factor: 16.971

6. antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline.

Authors: Kai Blin; Simon Shaw; Katharina Steinke; Rasmus Villebro; Nadine Ziemert; Sang Yup Lee; Marnix H Medema; Tilmann Weber
Journal: Nucleic Acids Res Date: 2019-07-02 Impact factor: 16.971

7. The Natural Products Atlas: An Open Access Knowledge Base for Microbial Natural Products Discovery.

Authors: Jeffrey A van Santen; Grégoire Jacob; Amrit Leen Singh; Victor Aniebok; Marcy J Balunas; Derek Bunsko; Fausto Carnevale Neto; Laia Castaño-Espriu; Chen Chang; Trevor N Clark; Jessica L Cleary Little; David A Delgadillo; Pieter C Dorrestein; Katherine R Duncan; Joseph M Egan; Melissa M Galey; F P Jake Haeckl; Alex Hua; Alison H Hughes; Dasha Iskakova; Aswad Khadilkar; Jung-Ho Lee; Sanghoon Lee; Nicole LeGrow; Dennis Y Liu; Jocelyn M Macho; Catherine S McCaughey; Marnix H Medema; Ram P Neupane; Timothy J O'Donnell; Jasmine S Paula; Laura M Sanchez; Anam F Shaikh; Sylvia Soldatou; Barbara R Terlouw; Tuan Anh Tran; Mercia Valentine; Justin J J van der Hooft; Duy A Vo; Mingxun Wang; Darryl Wilson; Katherine E Zink; Roger G Linington
Journal: ACS Cent Sci Date: 2019-11-14 Impact factor: 14.553

8. Ornithine Transcarbamylase ArgK Plays a Dual role for the Self-defense of Phaseolotoxin Producing Pseudomonas syringae pv. phaseolicola.

Authors: Li Chen; Pin Li; Zixin Deng; Changming Zhao
Journal: Sci Rep Date: 2015-08-10 Impact factor: 4.379

9. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements.

Authors: Hao Luo; Yan Lin; Feng Gao; Chun-Ting Zhang; Ren Zhang
Journal: Nucleic Acids Res Date: 2013-11-15 Impact factor: 16.971

10. Resistance-gene-directed discovery of a natural-product herbicide with a new mode of action.

Authors: Yan Yan; Qikun Liu; Xin Zang; Shuguang Yuan; Undramaa Bat-Erdene; Calvin Nguyen; Jianhua Gan; Jiahai Zhou; Steven E Jacobsen; Yi Tang
Journal: Nature Date: 2018-07-11 Impact factor: 49.962
