
A large-scale structural classification of antimicrobial peptides.

Hao-Ting Lee, Chen-Che Lee, Je-Ruei Yang, Jim Z C Lai, Kuan Y Chang.

Abstract

Antimicrobial peptides (AMPs) are potent drug candidates against microbial organisms such as bacteria, fungi, parasites, and viruses. AMP sequences and structures are abundant and are two fundamental resources for bioinformatics research, but analyses of how they relate to each other are either nonexistent or limited to partial classifications and data. We therefore present A Database of Anti-Microbial peptides (ADAM), which contains 7,007 unique sequences and 759 structures, to systematically establish comprehensive associations between AMP sequences and structures through structural folds and to provide easy access to their relationships. Thirty distinct AMP structural fold clusters containing more than one structure are detected, and about a thousand AMPs are associated with at least one structural fold cluster. According to ADAM, AMP structural folds are limited: AMPs cover only about 3% of the overall protein fold space.

Year:  2015        PMID: 26000295      PMCID: PMC4426897          DOI: 10.1155/2015/475062

Source DB:  PubMed          Journal:  Biomed Res Int            Impact factor:   3.411


1. Introduction

Antimicrobial peptides (AMPs) are potent drug candidates against microbial organisms such as bacteria, fungi, parasites, and viruses. To date, more than 10 AMPs have entered clinical trials [1]. Because of their importance, several databases dedicated to AMPs have been released in the past few years. Some are species-specific, such as BACTIBASE [2], BAGEL2 [3], DADP [4], PenBase [5], and PhytAMP [6]; others curate a broad spectrum of species, such as AMPer [7], APD2 [8], CAMP2 [9], DAMPD [10], Defensins Knowledgebase [11], and YADAMP [12]. These databases range in size from hundreds to a couple of thousand AMP sequences. However, none of them contains all known AMP sequences.

Understanding sequence-structure relationships is important for AMP-based drug design, yet a major limitation of existing AMP databases is their poor use of structural information. Like AMP sequences, many AMP structures have been resolved. Classified by secondary structure, the four traditional AMP structural classes are alpha helices, beta strands, loop structures, and extended structures [13, 14]. An alternative structural classification based on peptide backbone torsion angles also reveals many different AMP folds [1]. A few AMP databases, such as APD2, have attempted to associate AMP sequences with their secondary structures, but none has established associations between AMP sequences and AMP structural folds. Examining AMP tertiary structures would improve our understanding of AMPs and support antimicrobial drug discovery.

In this work, we present A Database of Anti-Microbial peptides (ADAM), available at http://bioinformatics.cs.ntou.edu.tw/ADAM. ADAM collects AMPs comprehensively and systematically establishes associations between AMP sequences and structures. Integrated from various sources, ADAM contains the most complete collection of AMP sequences and structures. ADAM not only allows biomedical researchers to search basic AMP information but also provides easy access for linking AMP sequences to structures and vice versa.

2. Data Collection and Methods

2.1. AMP Sequences

ADAM contains 7,007 unique AMP sequences extracted from twelve databases (Figure 1): APD2 [8], AVPpred [15], BACTIBASE [2], BAGEL3 [3], CAMP2 [9], DADP [4], DAMPD [10], HIPdb [16], PenBase [5], PhytAMP [6], RAPD [17], and YADAMP [12]. The AMP sequences in ADAM were mostly derived from natural sources and cover a broad spectrum of species, including archaea, bacteria, plants, and animals. Of the 7,007 sequences, 2,497 have been validated experimentally and recorded in the literature. Table 1 compares the AMPs of the twelve databases. Among the large AMP databases (APD2, CAMP2, DAMPD, DADP, and YADAMP), CAMP2 contains the most overlapping sequences. Among the species-specific AMP databases, AVP and HIPdb contain fewer overlapping sequences.
Figure 1

Simplified conceptual diagram of ADAM.

Table 1

Comparison of overlapping identical sequence counts of the twelve AMP databases.

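          | APD  | CAMP | DADP | DAMPD | YADAMP | BACTIBASE | BAGEL | PenBase | PhytAMP | RAPD | AVP | HIPdb
APD       | 2436 | 2100 | 744  | 376   | 1601   | 86        | 39    | 1       | 107     | 55   | 56  | 33
CAMP      | 2100 | 3052 | 858  | 586   | 1994   | 122       | 56    | 1       | 145     | 71   | 65  | 33
DADP      | 744  | 858  | 1792 | 220   | 772    | 0         | 0     | 0       | 0       | 5    | 13  | 11
DAMPD     | 376  | 586  | 220  | 1068  | 528    | 31        | 70    | 5       | 19      | 10   | 18  | 11
YADAMP    | 1601 | 1994 | 772  | 528   | 2782   | 113       | 43    | 1       | 60      | 67   | 76  | 49
BACTIBASE | 86   | 122  | 0    | 31    | 113    | 204       | 52    | 0       | 0       | 10   | 0   | 1
BAGEL     | 39   | 56   | 0    | 70    | 43     | 52        | 431   | 0       | 0       | 0    | 1   | 0
PenBase   | 1    | 1    | 0    | 5     | 1      | 0         | 0     | 28      | 0       | 0    | 0   | 0
PhytAMP   | 107  | 145  | 0    | 19    | 60     | 0         | 0     | 0       | 272     | 3    | 10  | 0
RAPD      | 55   | 71   | 5    | 10    | 67     | 10        | 0     | 0       | 3       | 119  | 9   | 5
AVP       | 56   | 65   | 13   | 18    | 76     | 0         | 1     | 0       | 10      | 9    | 604 | 156
HIPdb     | 33   | 33   | 11   | 11    | 49     | 1         | 0     | 0       | 0       | 5    | 156 | 744
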
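A pairwise comparison of this kind can be sketched as follows. This is a minimal illustration, not ADAM's actual pipeline: the function name, the toy database names, and the toy sequences are all hypothetical, and it assumes that "identical" means an exact string match after normalising case and whitespace.

```python
from itertools import product

def overlap_matrix(databases):
    """Count identical sequences shared by every pair of databases.

    `databases` maps a database name to an iterable of peptide sequences.
    The entry for (name, name) is that database's number of unique sequences.
    """
    # Normalise case and whitespace so identical peptides compare equal.
    unique = {name: {s.strip().upper() for s in seqs}
              for name, seqs in databases.items()}
    return {(a, b): len(unique[a] & unique[b])
            for a, b in product(unique, repeat=2)}

# Toy input (hypothetical names and sequences, not real database content).
toy = {
    "DB_A": ["GIGKFLHSAKKF", "KWKLFKKIEK", "gigkflhsakkf"],
    "DB_B": ["KWKLFKKIEK", "FLPLIGRVLSGIL"],
}
counts = overlap_matrix(toy)

# Size of the merged, de-duplicated collection across all databases.
union_size = len({s.strip().upper() for seqs in toy.values() for s in seqs})
```

The resulting matrix is symmetric, which is also what allows the flattened rows of Table 1 to be cross-checked against each other.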
Each unique AMP sequence was assigned an ADAM ID. The ADAM ID is linked to the basic AMP information, structural view, physicochemical properties, amino acid composition, and external resources. The structural view displays the best corresponding PDB structure and, if available, the representative PDB structure of the fold cluster to which the AMP sequence belongs. The physicochemical properties comprise peptide length, net charge, instability index**, aliphatic index**, and grand average of hydropathicity index**. The composition is the fraction of each amino acid in the AMP. The external resources link to PDB, CATH, SCOP, Pfam, and other AMP databases associated with the AMP (**see Supplementary Material available online at http://dx.doi.org/10.1155/2015/475062).
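Several of these properties can be computed directly from a sequence. The sketch below is an illustration under stated assumptions, not ADAM's implementation (the exact formulas ADAM uses are given in its Supplementary Material): it uses the standard Kyte-Doolittle hydropathy scale for the grand average of hydropathicity, Ikai's formula for the aliphatic index, and a deliberately simplified net charge that counts only Lys/Arg as +1 and Asp/Glu as -1. The instability index is omitted because it requires a dipeptide weight table. The function name is hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def describe_peptide(seq):
    """Return basic sequence-derived properties for a peptide.

    Only the 20 standard residues are supported; any other letter
    raises KeyError when the hydropathy value is looked up.
    """
    seq = seq.strip().upper()
    n = len(seq)
    counts = Counter(seq)

    # Fraction of each amino acid present in the peptide.
    composition = {aa: counts[aa] / n for aa in sorted(counts)}

    # Simplified net charge (assumption): His and the termini are ignored.
    net_charge = counts["K"] + counts["R"] - counts["D"] - counts["E"]

    # Grand average of hydropathicity: mean Kyte-Doolittle value.
    gravy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n

    # Ikai aliphatic index, computed from mole percentages.
    def mole_pct(aa):
        return 100.0 * counts[aa] / n

    aliphatic_index = (mole_pct("A") + 2.9 * mole_pct("V")
                       + 3.9 * (mole_pct("I") + mole_pct("L")))

    return {
        "length": n,
        "net_charge": net_charge,
        "gravy": gravy,
        "aliphatic_index": aliphatic_index,
        "composition": composition,
    }

props = describe_peptide("AAKK")
```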

2.2. AMP Structures

The AMP structures were obtained by running BLAST with the experimentally validated AMP sequences against the Protein Data Bank [18]. In total, 408 sequences had 759 matching structures with either 100% sequence identity or at least 90% sequence identity and an E-value < 10^-5. Each matching structure was annotated by SCOP v1.75B [19] and CATH v4.0 [20]. Because not every AMP structure had a CATH or SCOP annotation, the unique AMP structural folds could not all be determined from these annotations alone. Tables 2 and 3 record the number of AMP structures according to CATH v4.0 and SCOP v1.75B, respectively. The four hierarchical levels of CATH are class, architecture, topology, and homologous superfamily; the four levels of SCOP are class, fold, superfamily, and family. The topology level of CATH corresponds to the fold level of SCOP. The AMP structures span all four fundamental CATH classes (Table 2) and seven SCOP classes (Table 3). Among the 759 AMP structures, 40 of the 1,375 CATH folds (Table 2) and 47 of the 1,390 SCOP folds (Table 3) are found. These AMP structures therefore cover about 3% of the protein fold space defined by CATH and SCOP.
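The hit-selection rule can be sketched as a filter over BLAST tabular output. This is a hedged illustration only: it assumes the standard 12-column tabular format (percent identity in column 3, E-value in column 11), it reads the rule as "100% identity, or at least 90% identity together with an E-value below 10^-5", and the function names and sample identifiers are hypothetical.

```python
def keep_hit(pident, evalue, evalue_cutoff=1e-5):
    """Apply the selection rule to one hit (one reading of the criterion)."""
    return pident == 100.0 or (pident >= 90.0 and evalue < evalue_cutoff)

def filter_blast_tabular(lines):
    """Yield (query, subject, pident, evalue) for hits that pass the rule.

    Expects tab-separated BLAST output with the 12 default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        pident, evalue = float(fields[2]), float(fields[10])
        if keep_hit(pident, evalue):
            yield fields[0], fields[1], pident, evalue

# Toy hits with made-up identifiers and scores.
sample = [
    "amp1\t1ABC_A\t100.000\t20\t0\t0\t1\t20\t1\t20\t2e-04\t40.0",
    "amp2\t2DEF_A\t92.500\t40\t3\t0\t1\t40\t1\t40\t1e-20\t80.0",
    "amp3\t3GHI_A\t95.000\t20\t1\t0\t1\t20\t1\t20\t0.01\t25.0",
    "amp4\t4JKL_A\t80.000\t50\t10\t0\t1\t50\t1\t50\t1e-30\t90.0",
]
kept = list(filter_blast_tabular(sample))
```

If the E-value cutoff is meant to apply to the 100% identity hits as well, only `keep_hit` needs to change.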
Table 2

Structural classification of the AMPs according to CATH v4.0 classification.

         | Class | Architecture | Topology | Homologous superfamily
ADAM     | 4     | 11           | 40       | 41
CATH 4.0 | 4     | 40           | 1375     | 2738
Table 3

Structural classification of the AMPs according to SCOP v1.75B.

           | Class | Fold | Superfamily | Family
ADAM       | 7     | 47   | 53          | 72
SCOP 1.75B | 11    | 1390 | 2220        | 4609
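The "about 3%" coverage figure follows directly from the fold counts in Tables 2 and 3. A short worked calculation, using only the numbers reported above (the helper name is illustrative):

```python
def coverage_percent(found, total):
    """Percentage of a classification's folds that contain AMP structures."""
    return 100.0 * found / total

cath_coverage = coverage_percent(40, 1375)   # CATH topologies with AMP structures
scop_coverage = coverage_percent(47, 1390)   # SCOP folds with AMP structures

print(f"CATH: {cath_coverage:.1f}%")
print(f"SCOP: {scop_coverage:.1f}%")
```

Both values, roughly 2.9% for CATH and 3.4% for SCOP, round to the 3% quoted in the text.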

2.3. AMP Structural Fold Clusters

A graph-based clustering procedure was applied to assess the unique AMP folds. In this graph, the vertices represent AMP structures, and an edge connects two vertices if the two AMP structures are similar. The AMP structures came from the previous BLAST results, but only the 264 best matching structures were collected, under more stringent selection conditions. Each AMP is allowed to have at most one best matching structure, and multiple AMPs can map to the same AMP structure. The similarity of two AMP structures was then measured by TM-score, whose value ranges from 0 to 1 [21]. An edge exists if the TM-score > 0.5, which indicates that the two structures should belong to the same fold [22]. In total, 136 AMP fold clusters were formed, 30 of which contain more than one AMP structure, as shown in Figure 2. The top 10 common AMP structural folds with CATH and SCOP annotations are listed in Table 4. The structures in a fold cluster can share the same CATH and SCOP annotations, as in cluster #1 in Table 4. One CATH fold can map to multiple SCOP folds, as in cluster #4 in Table 4; one SCOP fold can also map to multiple CATH folds, as in cluster #9 in Table 4. Note that some AMP structures have neither a CATH nor a SCOP annotation.
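One simple way to realise such a graph-based clustering is to take the connected components of the similarity graph. The text does not state which clustering rule ADAM uses, so the sketch below is an assumption: it links two structures when their TM-score exceeds 0.5 and groups linked structures with a union-find structure. The function name, structure labels, and scores are hypothetical, and a single TM-score per pair is assumed.

```python
from collections import defaultdict

def cluster_by_tm_score(structures, tm_scores, threshold=0.5):
    """Group structures into connected components of the TM-score graph.

    `tm_scores` maps a pair (a, b) of structure IDs to their TM-score.
    Two structures are linked when the score is strictly above `threshold`.
    """
    parent = {s: s for s in structures}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in tm_scores.items():
        if score > threshold:
            parent[find(a)] = find(b)  # merge the two components

    clusters = defaultdict(list)
    for s in structures:
        clusters[find(s)].append(s)

    # Largest clusters first; ties broken by member names.
    return sorted(clusters.values(), key=lambda c: (-len(c), c))

# Toy example with made-up structure IDs and scores.
structures = ["s1", "s2", "s3", "s4"]
tm_scores = {
    ("s1", "s2"): 0.72,
    ("s2", "s3"): 0.55,
    ("s1", "s3"): 0.41,
    ("s3", "s4"): 0.30,
}
clusters = cluster_by_tm_score(structures, tm_scores)
multi_member = [c for c in clusters if len(c) > 1]
```

In this toy graph s1 and s3 end up in the same cluster through s2 even though their direct TM-score is below the threshold, which is a property of connected-component clustering worth keeping in mind.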
Figure 2

Network representation of AMP structural fold clusters.

Table 4

Top 10 common AMP structural folds annotated by CATH and SCOP.

Fold cluster ID | CATH class | CATH architecture | CATH topology | SCOP class | SCOP fold
1 | Alpha beta | 2-layer sandwich | Defensin A-like | Small proteins | Knottins
2 | Mainly beta | Beta barrel | OB fold | Alpha and beta proteins (a + b) | IL8-like
3 | Mainly alpha | Up-down bundle | Single alpha-helices involved in coiled-coils or other helix-helix interfaces | Peptides | Antimicrobial helix
3 |  |  |  | Peptides | Liposaccharide-binding protein CAP18
3 |  |  |  | Peptides | Peptide hormones
4 | Alpha beta | Roll | Antimicrobial peptide, beta-defensin 2; chain A | Small proteins | Defensin-like
5 |  |  |  | Small proteins | Knottins
6 | Mainly alpha | Orthogonal bundle | Histone, subunit A | All alpha proteins | Histone-fold
7 | Mainly alpha | Orthogonal bundle | Lysozyme | Alpha and beta proteins (a + b) | Lysozyme-like
8 | Alpha beta | 2-layer sandwich | Crambin | Small proteins | Crambin-like
9 | Mainly alpha | Orthogonal bundle | NK-lysin | All alpha proteins | Saposin-like
9 | Mainly alpha | Up-down bundle | Bacteriocin As-48; chain A | All alpha proteins | Saposin-like
10 | Alpha beta | Roll | P-30 protein | Alpha and beta proteins (a + b) | RNase A-like

In Figure 2, the vertices represent the AMP structures, and an edge between two vertices exists if the TM-score > 0.5, indicating that the two structures fall into the same fold [22]. Of the 136 fold clusters in ADAM, the 30 that contain more than one structure are displayed.

2.4. AMP Structures Associated with ADAM Sequences

In the sequence-to-structure direction, AMP structures were obtained by running BLAST with the experimentally validated AMPs against PDB. In the structure-to-sequence direction, about one-eighth of the ADAM sequences, over a thousand sequences, were found to be associated with the AMP structures; these associations were determined by running BLAST against the best matching AMP structures with an E-value < 10^-5. Table 5 lists the top 10 common Pfam domains and families [23] found in the experimentally validated AMPs, together with their associated AMP structural fold clusters. Seven of these common Pfam domains and families fall within the top 10 AMP structural folds. Table 5 also indicates that no structures are available for the AMPs of the Pfam family antimicrobial_1.
Table 5

Top 10 common Pfam domains and families associated with the AMP structural folds.

Rank | Pfam | AMP structural fold cluster ID
1  | Antimicrobial_2 | 3
2  | Antimicrobial_1 | NA
3  | Defensin_beta   | 4
4  | Gamma-thionin   | 1
5  | Cyclotide       | 5
6  | Defensin_2      | 1
7  | Defensin_1      | 4
8  | Bacteriocin_II  | 33
9  | Cecropin        | 106
10 | DD_K            | 3
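The statement that seven of these Pfam entries fall within the top 10 structural folds can be checked from the table itself. The sketch below hard-codes Table 5 as read here; note that the split of the flattened entries "Bacteriocin_II33" and "Cecropin106" into Pfam name and cluster ID (33 and 106) is an interpretation based on the Pfam family names, and the variable names are illustrative.

```python
# Table 5 as a mapping from Pfam entry to fold cluster ID (None = no structure).
TOP_PFAM_TO_CLUSTER = {
    "Antimicrobial_2": 3,
    "Antimicrobial_1": None,
    "Defensin_beta": 4,
    "Gamma-thionin": 1,
    "Cyclotide": 5,
    "Defensin_2": 1,
    "Defensin_1": 4,
    "Bacteriocin_II": 33,
    "Cecropin": 106,
    "DD_K": 3,
}

# Entries whose cluster is among the ten most common folds (clusters 1 to 10).
in_top10_folds = [name for name, cluster in TOP_PFAM_TO_CLUSTER.items()
                  if cluster is not None and cluster <= 10]

# Entries with no associated structure at all.
without_structure = [name for name, cluster in TOP_PFAM_TO_CLUSTER.items()
                     if cluster is None]
```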

3. Implementations and Results

ADAM was built using AppServ 2.6.0: it runs on the Apache HTTP server, its server-side scripts are written in PHP, and its database is implemented in MySQL.

3.1. Multiple Search Capabilities

ADAM offers multiple search capabilities in two basic categories: sequence search and structural search. Each AMP entry is assigned an ADAM ID, which has a unique sequence and, if found, a corresponding structure. The sequence search covers the direct information of an AMP sequence, including its description, source species, sequence length, and Pfam domain. Because ADAM focuses on AMP structure and sequence information, it does not contain all of the information that other AMP databases provide; external links to those databases are therefore provided. In addition, the structural search allows users to retrieve the AMP information associated with specific PDB structures or ADAM fold clusters.

3.2. Structure-Sequence Cluster Browsing

ADAM offers 136 AMP fold clusters, built by TM-score, for browsing. Each structure in an AMP cluster is annotated by CATH, SCOP, and Pfam, if available. The AMP structures from all of the clusters occupy about 3% of the protein fold space defined by CATH and SCOP. Each cluster lists its associated AMP sequences. For example, ADAM cluster #1 (AC_001) is a cluster of 26 structures associated with 207 AMP sequences; detailed information can be found in Table S1. The structures in this cluster, gathered by TM-score, are consistently classified into the same CATH fold, the alpha-beta 2-layer sandwich defensin A-like structure, and the same SCOP fold, small protein knottins. SCOP further classifies these structures into four different SCOP families. In addition, this AMP structural fold is associated with six different Pfam domains (antimicrobial_6, defensin_2, gamma-thionin, toxin_2, toxin_3, and toxin_37), which supports the idea that different sequences folding into the same structure could behave similarly. Another interesting example is ADAM cluster #5 (AC_005), which contains 53 AMP sequences belonging to the cyclotide Pfam family. Within this cluster, only four structures are annotated by SCOP. All four are again classified into the same SCOP fold, knottins, but fall into multiple SCOP families. ADAM also allows users to retrieve the relevant AMP structures according to CATH or SCOP classification through the hyperlinks beneath each entry. Both structure-to-sequence and sequence-to-structure browsing can thus be performed in ADAM.

Each AMP cluster was further examined. Interestingly, the peptides in one AMP cluster consistently share the same mechanism of microbial killing, either transmembrane pore formation or metabolic inhibition of intracellular targets [24], suggesting that AMP structures may play a role in the killing action. For example, the AMPs in ADAM cluster #3 (AC_003) act by transmembrane pore formation, whereas those in ADAM cluster #6 (AC_006) are metabolic inhibitors of intracellular targets.

4. Discussions

ADAM, a comprehensive AMP database, provides easy access to AMP sequences, structures, and their relations. Its two distinguishing features are its size and its sequence-structure analysis. ADAM contains 7,007 unique AMP sequences and 759 structures. To our knowledge, this is the first comprehensive study to analyze the various AMP structural folds. Our analysis demonstrates that AMP structures cover about 3% of the overall CATH or SCOP folds. Biologically, this implies that AMPs use more than one scheme to fight microbes. The results also indicate that AMP structural folds are limited: the majority of protein structural folds lack antimicrobial activity.

The development of ADAM raises some interesting research topics that are beyond the scope of this study and remain to be explored. First, Table 5 shows that little is known about the structure of the Pfam family antimicrobial_1; such AMP structures need to be resolved by X-ray crystallography or NMR spectroscopy. Second, Table 4 illustrates the long-standing observation that the CATH and SCOP classifications are not always consistent with each other [21]; the best approach to annotating protein structure is still to be determined. Third, despite the sequence differences between the Pfam antimicrobial_2 and DD_K domains, the two domains share the same alpha-helical structural fold; how two different domains maintain the same structural fold as well as antimicrobial activity needs further study.

ADAM, which offers complete AMP sequence and structure information, can benefit a number of different areas of AMP research, such as biomimetics in drug development, comparative immunomics, and structure-function analysis. For example, ADAM cluster #1 (AC_001) has 26 structures associated with 207 AMP sequences (Table S1). Not every structure in the cluster is annotated, but those that are belong to the same CATH and SCOP fold and match six different Pfam families. Such information can help to identify key elements for antimicrobial drug design.

The content of the Supplementary Material falls into three main categories: (1) an example of an ADAM fold cluster, (2) the technical descriptions of the aliphatic index**, instability index**, and hydropathicity**, and (3) the frequently asked questions about ADAM. In more detail, Table S1 lists ADAM fold cluster AC_001, with 26 AMP structures associated with 207 unique AMP sequences. Figures S1 and S2 illustrate how to browse ADAM either from AMP sequence to structure or from AMP structure to sequence. Figures S3 and S4 show the basis of the AMP prediction tools provided in ADAM, which use support vector machines and hidden Markov models.
  24 in total

1.  How significant is a protein structure similarity with TM-score = 0.5?

Authors:  Jinrui Xu; Yang Zhang
Journal:  Bioinformatics       Date:  2010-02-17       Impact factor: 6.937

2.  RAPD: a database of recombinantly-produced antimicrobial peptides.

Authors:  Yifeng Li; Zhengxin Chen
Journal:  FEMS Microbiol Lett       Date:  2008-12       Impact factor: 2.742

Review 3.  The expanding scope of antimicrobial peptide structures and their modes of action.

Authors:  Leonard T Nguyen; Evan F Haney; Hans J Vogel
Journal:  Trends Biotechnol       Date:  2011-06-15       Impact factor: 19.536

4.  BAGEL2: mining for bacteriocins in genomic data.

Authors:  Anne de Jong; Auke J van Heel; Jan Kok; Oscar P Kuipers
Journal:  Nucleic Acids Res       Date:  2010-05-12       Impact factor: 16.971

5.  AMPer: a database and an automated discovery tool for antimicrobial peptides.

Authors:  Christopher D Fjell; Robert E W Hancock; Artem Cherkasov
Journal:  Bioinformatics       Date:  2007-03-06       Impact factor: 6.937

6.  The Pfam protein families database.

Authors:  Marco Punta; Penny C Coggill; Ruth Y Eberhardt; Jaina Mistry; John Tate; Chris Boursnell; Ningze Pang; Kristoffer Forslund; Goran Ceric; Jody Clements; Andreas Heger; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman; Robert D Finn
Journal:  Nucleic Acids Res       Date:  2011-11-29       Impact factor: 16.971

7.  Data growth and its impact on the SCOP database: new developments.

Authors:  Antonina Andreeva; Dave Howorth; John-Marc Chandonia; Steven E Brenner; Tim J P Hubbard; Cyrus Chothia; Alexey G Murzin
Journal:  Nucleic Acids Res       Date:  2007-11-13       Impact factor: 16.971

8.  APD2: the updated antimicrobial peptide database and its application in peptide design.

Authors:  Guangshun Wang; Xia Li; Zhe Wang
Journal:  Nucleic Acids Res       Date:  2008-10-28       Impact factor: 16.971

9.  PhytAMP: a database dedicated to antimicrobial plant peptides.

Authors:  Riadh Hammami; Jeannette Ben Hamida; Gérard Vergoten; Ismail Fliss
Journal:  Nucleic Acids Res       Date:  2008-10-04       Impact factor: 16.971

10.  BACTIBASE second release: a database and tool platform for bacteriocin characterization.

Authors:  Riadh Hammami; Abdelmajid Zouhir; Christophe Le Lay; Jeannette Ben Hamida; Ismail Fliss
Journal:  BMC Microbiol       Date:  2010-01-27       Impact factor: 3.605
