
PADS Arsenal: a database of prokaryotic defense systems related genes.

Yadong Zhang1,2,3,4, Zhewen Zhang1,2,3, Hao Zhang1,2,3,4, Yongbing Zhao5, Zaichao Zhang6, Jingfa Xiao1,2,3,4.   

Abstract

Defense systems are vital weapons that allow prokaryotes to resist heterologous DNA and survive the constant invasion of viruses, and they are widely used in biochemical investigation and antimicrobial drug research. So far, numerous types of defense systems have been discovered, but there is no comprehensive defense systems database to organize prokaryotic defense gene datasets. To fill this gap, we unveil the prokaryotic antiviral defense system (PADS) Arsenal (https://bigd.big.ac.cn/padsarsenal), a public database dedicated to gathering, storing, analyzing and visualizing prokaryotic defense gene datasets. The initial version of PADS Arsenal integrates 18 distinctive categories of defense system with the annotation of 6 600 264 genes retrieved from 63 701 genomes across 33 390 species of archaea and bacteria. PADS Arsenal provides various ways to retrieve information on defense systems related genes and to visualize it through multiple functional modes. Moreover, an online analysis pipeline is integrated into PADS Arsenal to facilitate annotation and evolutionary analysis of defense genes. PADS Arsenal can also visualize the dynamic variation of defense genes derived from pan-genome analysis. Overall, PADS Arsenal is a state-of-the-art, open and comprehensive resource to accelerate research on prokaryotic defense systems.
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.


Year:  2020        PMID: 31620779      PMCID: PMC7145686          DOI: 10.1093/nar/gkz916

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

As described by the Red Queen hypothesis, the ongoing competitive arms race is one of the most powerful driving forces in the co-evolution of prokaryotic organisms and viruses (1–3). As a consequence, prokaryotes have evolved numerous diverse and elaborate defense systems to protect themselves against viruses (4). Based on their modes of action, the defense systems can be divided into two major groups: immunity, and dormancy induction or programmed cell death (5,6). The immunity group contains the restriction-modification (RM) system (7,8), the DNA phosphorothioation system (known as the DND system) (9–11), the defense island system associated with restriction-modification (DISARM) (12), the bacteriophage exclusion (BREX) system (13), the prokaryotic Argonautes (pAgos) system (14,15), and the clustered regularly interspaced short palindromic repeats and adjacent cas genes (CRISPR-Cas) system (16–19). The dormancy induction or programmed cell death group includes the toxin-antitoxin (TA) system (20–22) and the abortive infection (ABI) system (23). Recently, several new types of defense systems have been discovered, such as DRUANTIA, GABIJA, and ZORYA (24). All of these defense systems not only prevent the introduction of heterologous DNA from plasmids or viruses, but are also widely applied in multiple fields: the ABI and RM systems are used to avoid phage contamination in the fermentation industry (23,25,26), the CRISPR-Cas system is used for precise genetic editing in biochemistry (27,28), and the TA system is used in clone picking and single-protein expression in living bacterial cells (29).
Several databases have been developed to integrate different defense systems. CRISPRdb and CRISPRone collect data on spacers and repeats and provide tools to search and display CRISPR-associated genes (30,31); REBASE is centered on the RM system and covers restriction enzymes, methylases, and methylation specificity (32); TADB integrates information on type 2 toxin-antitoxin loci and their genetic features and provides similarity search, genome context browsing, and phylogenetic tools (33). However, all the databases and platforms mentioned above focus on a single defense system or subtype. Given the ever-increasing volume of prokaryotic genomic data and the fast-emerging newly found defense systems, an integrated database with an in-depth analysis platform for multiple defense systems is urgently needed. To fill this gap, we present PADS Arsenal, a comprehensive database of prokaryotic defense systems related genes. With a large collection of prokaryotic genomic data from public databases, PADS Arsenal is dedicated to gathering, storing, analyzing and visualizing prokaryotic defense system gene datasets from over 33 000 species.

DATABASE IMPLEMENTATION

For data collection, all prokaryotic genomic data were retrieved from NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/) (34). To identify defense systems related genes, we first extracted a set of defense systems related genes as seed sequences through literature curation (12,13,19,24). To expand the seed dataset, we also downloaded protein families/sequences from the COG (35), Pfam (36), REBASE (32), TIGRFAMs (37) and TADB (33) databases. Second, PSI-BLAST (38) was used for homology searches of defense systems related genes. Sequences with an identity value ≥30% were selected as putative defense systems related genes for further analyses (39). Third, all putative defense systems related genes were confirmed by checking the conserved domains within the defense genes using InterProScan (40). In addition, for quality control we randomly selected some strains from eight species in PADS Arsenal (Pseudomonas aeruginosa, Bacillus cytotoxicus, Listeria ivanovii, Listeria monocytogenes, Neisseria meningitidis, Streptococcus pyogenes, Escherichia coli and Mycoplasma pneumoniae). The CRISPR-Cas systems related genes identified in these strains were compared with the results of a well-known CRISPR-Cas identification tool, CRISPRCasFinder (41). About 96% of the cas genes detected by CRISPRCasFinder were archived in PADS Arsenal. The small number of missing genes is attributable to the slightly lower coverage of our seed datasets. We will integrate more seed sequences in the next version of PADS Arsenal to achieve a higher defense gene detection rate. Prokka was employed for genome annotation (42), Roary was applied for orthologous clustering of defense system genes (43), ComplexHeatmap was used to construct the heatmap of defense system genes (44), and MAFFT was used for multiple sequence alignment (45).
For database construction, we used PHP, HTML5, CSS, Bootstrap and jQuery for front-end rendering and the implementation of interactive events. ECharts, D3, circosJs, MSAviewer (46) and phylotree.js (47) were adopted for building interactive graphs. DataTables and Bootstrap Table were used to render data tables. On the back-end, MySQL was employed to store data, and the bioinformatics applications were implemented with PSI-BLAST (38), MAFFT (45), PhyML (48) and Python.
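The identity filter and the quality-control comparison described above can be illustrated with a short Python sketch. This is not the authors' pipeline: it assumes the homology search results were exported in BLAST+ tabular format (`-outfmt 6`, where the third column is percent identity), and the function names `filter_hits` and `recall` are hypothetical helpers introduced only for illustration.

```python
import csv
import io

IDENTITY_THRESHOLD = 30.0  # percent identity cutoff stated in the text


def filter_hits(tabular_text, min_identity=IDENTITY_THRESHOLD):
    """Return (query, subject, identity) tuples with identity >= min_identity.

    Assumption: BLAST+ tabular output (-outfmt 6), where column 1 is the
    query ID, column 2 the subject ID and column 3 the percent identity.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        identity = float(row[2])
        if identity >= min_identity:
            hits.append((row[0], row[1], identity))
    return hits


def recall(reference_genes, detected_genes):
    """Fraction of reference genes that also appear in the detected set."""
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference set is empty")
    return len(reference & set(detected_genes)) / len(reference)
```

In a comparison like the one reported here, `reference_genes` would hold the cas genes found by the reference tool and `detected_genes` those archived in the database, so a return value near 0.96 would correspond to the reported figure.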

DATABASE CONTENT AND USAGE

In the current version 1.0, we have annotated 6 600 264 genes from 18 distinctive categories of defense systems. These genes were retrieved from 63 701 genomes representing a total of 33 390 species of archaea and bacteria (Table 1). PADS Arsenal provides not only a user-friendly interface but also rich analysis functions, offering flexible ways to retrieve and present defense systems related genes together with a dynamic, interactive annotation pipeline.
Table 1.

The statistics of annotated genes of each defense system in PADS Arsenal

Defense system | Archaea (1043 species) | Bacteria (32 347 species)
Abortive infection/phage exclusion systems (ABI) | 909 | 70 595
Bacteriophage Exclusion (BREX) | 9648 | 465 752
Clustered regularly interspaced short palindromic repeats with cas genes (CRISPR-CAS) | 10 836 | 143 345
Defence island system associated with restriction–modification (DISARM) | 6461 | 414 500
DNA phosphorothioation (DND) | 1866 | 99 150
DRUANTIA | 8642 | 726 041
GABIJA | 2728 | 321 625
HACHIMAN | 9675 | 713 810
KIWA | 56 | 4780
LAMASSU | 1026 | 127 721
Prokaryotic Argonautes (PAGOS) | 90 | 1539
Restriction-Modification (RM) | 13 276 | 1 016 565
SEPTU | 990 | 151 706
SHEDU | 16 | 2658
Toxin–Antitoxin (TA) | 19 997 | 1 227 160
THOERIS | 663 | 45 056
WADJET | 2311 | 98 485
ZORYA | 7456 | 873 130
In the browse module, all the completed prokaryotic genomes can be visualized at different taxonomic hierarchies. Users can select a taxonomy label, type a few characters in the input box and click the corresponding taxonomic group. For example, when searching for E. coli (Figure 1), a table of all related strains with a color bar is shown. Users can intuitively observe the composition of defense systems related genes in each strain and how this composition varies between strains. Each colored block can be clicked to show the details of all genes in that defense system. In addition, clicking the last thumbnail displays the loci of the defense systems related genes, together with the GC content and GC skew of the genome, in a Circos graph. Strips of different colors represent different types of defense systems related genes, and each strip can be clicked for further information. Users can estimate the regions of defense islands by combining the arrangement of defense systems related genes across the genome, the GC skew value, and the difference in GC content relative to the genome average.
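To make the GC-based part of this estimate concrete, the following Python sketch computes GC content and GC skew over fixed windows and flags windows whose GC content departs from the genome average. It is only an illustration under stated assumptions: GC skew is taken as the commonly used (G − C)/(G + C), and the window size, step and `min_delta` threshold are arbitrary values chosen here, not parameters documented for PADS Arsenal.

```python
def gc_windows(sequence, window=1000, step=1000):
    """Yield (start, gc_content, gc_skew) for consecutive windows.

    gc_content = (G + C) / window length
    gc_skew    = (G - C) / (G + C), taken as 0.0 when a window has no G or C
    """
    seq = sequence.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / window
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        yield start, gc_content, gc_skew


def deviating_windows(sequence, window=1000, step=1000, min_delta=0.05):
    """Return windows whose GC content differs from the genome-wide
    average by at least min_delta (a hypothetical threshold)."""
    seq = sequence.upper()
    genome_gc = (seq.count("G") + seq.count("C")) / len(seq)
    return [
        (start, gc, skew)
        for start, gc, skew in gc_windows(seq, window, step)
        if abs(gc - genome_gc) >= min_delta
    ]
```

Windows returned by `deviating_windows` that also overlap clusters of annotated defense genes would be the natural candidates to inspect as possible defense islands.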
Figure 1.

Screenshots of the browse page. (A) The E. coli search table based on the species label on the browse page. (B) The ZORYA defense system gene table of E. coli BL21(DE3), shown by clicking the colored block. (C) Circos graphs of E. coli BL21(DE3), shown by clicking the shortcut link. (D) Detailed information about a ZORYA defense system gene, shown by clicking the strip (only partially shown).

To better search and explore the database, we provide four search approaches (Figure 2). System-based and gene-based approaches can be applied when users are interested in a certain defense system or in a certain gene of a defense system, respectively. Species-based and assembly accession-based searches are provided for users looking for a species or an assembly accession ID. The four approaches return the same kinds of information, such as defense system category, defense system subtype, and gene symbol.
Figure 2.

Screenshots of search page. (A) Species-based search results with Acidianus hospitalis. (B) Assembly accession-based search results for ‘GCA_900248165.1’. (C) System-based search for ABI defense system. (D) Gene-based search for the cas6 gene of CRISPR–Cas system.

An interactive online pipeline for the annotation of defense systems related genes is integrated in the analysis module, combining sequence homology search, multiple sequence alignment, and phylogenetic analysis. Users can upload a protein sequence for a sequence similarity search. The resulting hits can be further filtered by BLAST identity value, and users can select seed sequences of interest for multiple sequence alignment. Users can also construct a phylogenetic tree to further annotate their uploaded sequence. As an example (Figure 3), we use sequences from the DND and BREX systems and show the corresponding results of homologous sequence search, multiple sequence alignment, and phylogenetic analysis.
Figure 3.

Screenshots of the annotation page. (A) The upload of a sequence, the program selection and the parameter settings. (B) The preliminary results of the annotation and the setting of the filtering threshold. (C) Selected filtered results based on the threshold, and parameters for multiple sequence alignment and building an evolutionary tree. (D) The result of multiple sequence alignment (only partially shown due to limited space). (E) The constructed evolutionary tree.

Gene conservation is an important characteristic for understanding the mechanisms of defense systems. To visualize the dynamic variation of defense systems related genes across species, a static presence-absence variation (PAV) analysis function is integrated in PADS Arsenal. In PAV analysis, users can select a species of interest to view the heatmap of the PAV analysis result, and then choose a defense system to view the variation of its genes at the species level from a pan-genome perspective. All defense system gene families (core, shared, unique) are listed in a table. For example, the results for Chlamydia muridarum with the DISARM defense system selected are shown in Figure 4. The heatmap of C. muridarum suggests that genes associated with the DISARM system are highly conserved. In addition, the orthologous clustering of defense system genes identified in PAV analysis paves the way for downstream analyses.
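The core/shared/unique grouping mentioned above can be sketched in a few lines of Python. The text does not define the three categories, so this sketch assumes the usual pan-genome convention: a family is core if it is present in every strain, unique if it is present in exactly one strain, and shared otherwise. The function name and input layout are hypothetical.

```python
def classify_gene_families(presence, strains):
    """Classify gene families from presence-absence data.

    presence: dict mapping gene family name -> set of strains carrying it
    strains:  iterable of all strains analysed for the species

    Assumed convention: 'core' = present in every strain,
    'unique' = present in exactly one strain, 'shared' = anything else.
    """
    all_strains = set(strains)
    classes = {}
    for family, carriers in presence.items():
        carriers = set(carriers) & all_strains  # ignore strains not analysed
        if carriers == all_strains:
            classes[family] = "core"
        elif len(carriers) == 1:
            classes[family] = "unique"
        else:
            classes[family] = "shared"
    return classes
```

Under this convention, a species whose heatmap shows nearly all families of a system as core, as described for DISARM in C. muridarum, would be read as having a highly conserved system.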
Figure 4.

Screenshots of PAV analysis page. (A) The heatmap of defense system genes distribution for C. muridarum. (B) The detailed information of DISARM defense system orthologous gene clusters based on the heatmap. (C) The results of multiple sequence alignment of an orthologous gene cluster by clicking the ‘MSA’ button.

In the statistics module, interactive charts are provided (Supplementary Figure S1). Users can view the overall distribution of defense systems related genes in archaea and bacteria through two pie charts. In the histogram, two browsing modes (single/multiple) are provided based on multiple taxonomic hierarchies (from phylum to genus). Users can examine the presence or absence of different defense systems related genes at different taxonomic hierarchies through dynamic histograms. For instance, ZORYA defense systems related genes are widespread in archaeal phyla, while ABI genes are more specific and are only observed in some archaeal genera. In addition, our statistics for four species, E. coli, S. enterica, S. pyogenes and M. pneumoniae, show that some defense systems (TA, RM and ZORYA) might include different numbers of defense genes in different strains of the same species (Supplementary Figure S2). However, the numbers of defense genes in the GABIJA, LAMASSU and WADJET defense systems are relatively stable. All the processed results for these 6 600 264 defense systems related genes are publicly available in the download section. We also provide the data tables retrieved from the browse page and the search page, as well as the results of PAV analysis and online annotation.

FUTURE DIRECTIONS

Over the last several decades, defense systems related genes have served as important editing, engineering and regulation tools due to their natural and powerful enzymatic activities, and the development of these tools has gone through two generations to date (6). RM enzymes were used as key genetic engineering tools in the early stage (49–51). Recently, CRISPR–Cas systems have been widely used as genetic editing tools owing to their functional diversity, which includes versatile mechanisms of crRNA guide processing, self/non-self discrimination, and target cleavage (48). Moreover, prokaryotic Argonaute proteins have been reported to mediate nucleic acid-guided cleavage of cognate DNA targets (52,53) or RNA targets (54,55) in vitro. This might lead to a new generation of genome-editing tools (56,57). In this study, we constructed PADS Arsenal to support a wide variety of applications, including displaying defense systems related genes at complete genome scale across different taxonomic hierarchies, searching defense systems related genes, annotating and analyzing specific sequences with multiple tools, and depicting the dynamic variation of defense systems related genes across species. PADS Arsenal archives defense systems related genes rather than indicating complete defense systems. This is mainly because there is no definite description of a complete or active defense system for some multi-gene systems (more than three genes in a system), such as DISARM, DND and DRUANTIA. Identifying the integrity of all 18 defense systems and their subtypes is a great challenge, and it is also the future development direction for PADS Arsenal. In the current version, PADS Arsenal helps users to detect potential defense systems related genes as engineering tools, but none of these systems can be functional if they are not complete.
To assess defense system integrity, we counted the number of strains with or without complete systems. The results show that the RM and TA defense systems are complete in all analyzed strains of E. coli, S. enterica and S. pyogenes (Supplementary Figure S3). This implies that complete RM and TA defense systems might be essential for these species. However, the integrity of the HACHIMAN, KIWA and SEPTU defense systems varies between strains of the same species (E. coli and S. enterica). Some recent studies indicate that defense genes are the most evolutionarily dynamic functional class of genes and that gene loss is about three times as frequent as gene gain (57,58). Many new defense systems have yet to be discovered (2,5). In the future, PADS Arsenal, as one of the important database resources in the BIG Data Center (59), will continuously collect and organize more types of defense systems and more prokaryotic genomic data. Defense islands, formed by many physically clustered genes that are involved in archaeal and bacterial defense functions, provide a shortcut for discovering new defense systems (4,6,60). We will develop and integrate novel prediction methods to facilitate the identification of defense islands. In some defense systems, genomic modification plays a key role in self/non-self discrimination. For instance, in the BREX system, methylation at the fifth position of non-palindromic TAGGAG motifs guides self/non-self discrimination (13), and in the DISARM system, methylation at the second position of CCWGG motifs serves as a marker of self DNA (12). Greater integration of motif and gene modification site information for self/non-self discrimination, through literature curation and deep mining of genome modification information, will be a welcome improvement.
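As a small illustration of the two modification motifs mentioned above, the Python sketch below locates TAGGAG and CCWGG motifs in a DNA string and reports the coordinate of the fifth and second base, respectively. It is a toy example rather than a planned PADS Arsenal feature: it scans only the strand it is given, W is expanded using the IUPAC code (A or T), and the names `MOTIFS` and `modification_sites` are hypothetical.

```python
import re

# Motif pattern and the 1-based position of the modified base within the
# motif, as described in the text. Lookaheads allow overlapping matches.
# W is the IUPAC ambiguity code for A or T.
MOTIFS = {
    "BREX": (re.compile(r"(?=TAGGAG)"), 5),
    "DISARM": (re.compile(r"(?=CC[AT]GG)"), 2),
}


def modification_sites(sequence, system):
    """Return 0-based coordinates of the modified base for each motif
    occurrence on the given strand (the reverse strand is not scanned)."""
    pattern, position = MOTIFS[system]
    seq = sequence.upper()
    return [match.start() + position - 1 for match in pattern.finditer(seq)]
```

Because TAGGAG is non-palindromic, a fuller analysis would also scan the reverse complement; that step is left out here to keep the sketch short.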
  58 in total

1.  MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.

Authors:  Kazutaka Katoh; Kazuharu Misawa; Kei-ichi Kuma; Takashi Miyata
Journal:  Nucleic Acids Res       Date:  2002-07-15       Impact factor: 16.971

Review 2.  Prokaryotic toxin-antitoxin stress response loci.

Authors:  Kenn Gerdes; Susanne K Christensen; Anders Løbner-Olesen
Journal:  Nat Rev Microbiol       Date:  2005-05       Impact factor: 60.633

3.  DNA phosphorothioation is widespread and quantized in bacterial genomes.

Authors:  Lianrong Wang; Shi Chen; Kevin L Vergin; Stephen J Giovannoni; Simon W Chan; Michael S DeMott; Koli Taghizadeh; Otto X Cordero; Michael Cutler; Sonia Timberlake; Eric J Alm; Martin F Polz; Jarone Pinhassi; Zixin Deng; Peter C Dedon
Journal:  Proc Natl Acad Sci U S A       Date:  2011-02-01       Impact factor: 11.205

Review 4.  The phage-host arms race: shaping the evolution of microbes.

Authors:  Adi Stern; Rotem Sorek
Journal:  Bioessays       Date:  2011-01       Impact factor: 4.345

5.  Defense islands in bacterial and archaeal genomes and prediction of novel defense systems.

Authors:  Kira S Makarova; Yuri I Wolf; Sagi Snir; Eugene V Koonin
Journal:  J Bacteriol       Date:  2011-09-09       Impact factor: 3.490

Review 6.  Prokaryotic Argonaute proteins: novel genome-editing tools?

Authors:  Jorrit W Hegge; Daan C Swarts; John van der Oost
Journal:  Nat Rev Microbiol       Date:  2017-07-24       Impact factor: 60.633

Review 7.  DNA modification and restriction.

Authors:  W Arber; S Linn
Journal:  Annu Rev Biochem       Date:  1969       Impact factor: 23.643

Review 8.  Toxin-antitoxin systems in bacterial growth arrest and persistence.

Authors:  Rebecca Page; Wolfgang Peti
Journal:  Nat Chem Biol       Date:  2016-04       Impact factor: 15.040

9.  CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins.

Authors:  David Couvin; Aude Bernheim; Claire Toffano-Nioche; Marie Touchon; Juraj Michalik; Bertrand Néron; Eduardo P C Rocha; Gilles Vergnaud; Daniel Gautheret; Christine Pourcel
Journal:  Nucleic Acids Res       Date:  2018-07-02       Impact factor: 16.971

10.  Systematic discovery of antiphage defense systems in the microbial pangenome.

Authors:  Shany Doron; Sarah Melamed; Gal Ofir; Azita Leavitt; Anna Lopatina; Mai Keren; Gil Amitai; Rotem Sorek
Journal:  Science       Date:  2018-01-25       Impact factor: 47.728

