
Prediction of linear cationic antimicrobial peptides based on characteristics responsible for their interaction with the membranes.

Boris Vishnepolsky¹, Malak Pirtskhalava.

Abstract

Most available antimicrobial peptide (AMP) prediction methods use a common approach for different classes of AMP. In contrast to these approaches, we suggest that a prediction strategy should be based on the fact that there are several kinds of AMP that vary in mechanism of action, structure, mode of interaction with the membrane, etc. In our view, a particular approach has to be developed for each kind of AMP in order to achieve high efficacy. Consequently, in this paper, one particular class of AMP, the largest one, linear cationic antimicrobial peptides (LCAP), is considered and a newly developed simple method of LCAP prediction is described. The aim of this study is to develop a simple method for discriminating AMP from non-AMP, the efficiency of which is determined by the efficiencies of the selected descriptors only, and to compare the results of the discrimination procedure with those obtained by more complicated discriminative methods. As descriptors, the physicochemical characteristics responsible for the capability of the peptide to interact with an anionic membrane were considered. The following characteristics were studied: hydrophobicity, amphipathicity, location of the peptide in relation to the membrane, charge density, and propensities to disordered structure and aggregation. On the basis of these characteristics, a new simple prediction algorithm was developed and the efficacies of the characteristics as descriptors were evaluated. The results show that three descriptors, hydrophobic moment, charge density and location of the peptide along the membrane, can be used as discriminators of LCAPs. For the training set, our method gives the same level of accuracy as the more complicated machine learning approaches offered as CAMP database service tools. For the test set, the accuracy obtained by our method is even higher than that obtained by the CAMP prediction tools.
The AMP prediction tool based on the considered method is available at http://www.biomedicine.org.ge/dbaasp/.


Year: 2014 PMID： 24730612 PMCID： PMC4038373 DOI： 10.1021/ci4007003

Source DB: PubMed Journal: J Chem Inf Model ISSN： 1549-9596 Impact factor: 4.956

Introduction

Antimicrobial peptides (AMP) are short peptides that interact with bacterial cells and kill them. The great interest in these peptides is explained by their possible clinical use as a substitute for conventional antibiotics when resistance arises.[1] Most AMP act directly on the bacterial membrane; consequently, it is difficult for bacteria to develop resistance against antimicrobial peptides.[2] Recently, a large number of both theoretical and experimental studies have focused on the properties of AMP, their mechanism of action and the design of novel peptides (see, for example, ref (3)). Of particular interest are in silico methods that make it possible both to predict the antimicrobial activity of peptides from their sequence and to serve as the first step in designing new antimicrobial peptides. Methods for predicting AMP are based on general properties that distinguish AMP from similar peptides that do not have antimicrobial activity. Available prediction methods are generally based on discriminative analysis, essentially machine learning methods.[4−12] These methods use the full set of antimicrobial peptide sequences as a positive training set, without taking into account variation in mechanism of action, structure, mode of interaction with the membrane and other differences. In contrast to the available approaches, we think that the prediction strategy should be based on the fact that there are at least four kinds of AMP, for which four independent prediction algorithms have to be developed in order to achieve high efficacy. We consider these four types of AMP to be: linear cationic antimicrobial peptides (LCAP), cationic peptides stabilizing structure by interchain covalent bond (CCP), peptides rich in proline and arginine (PRP) and anionic antimicrobial peptides (AAP).
Cationic antimicrobial peptides (CAP) of the LCAP type, in addition to positive charge and amphiphilicity, possess a simple mechanism of structure stabilization in the membrane: hydrogen bonding only.[13,14] The absence of any other stabilization factors makes it possible to determine the forces governing peptide–lipid or peptide–peptide interactions and to predict the structure of the peptides in water and in the membrane environment on the basis of sequence information only. Consequently, quantitative characteristics for prediction can be readily derived from the sequence alone. Structures of CAP of the CCP type are, due to interchain bonds, more stable and structurally more complicated both in water and in membrane environments. Although the forces governing CCP–membrane or CCP–CCP interactions are identical to those in the case of LCAP, the complicated 3D structure and the lack of information about it require a fundamentally different approach to the development of a CCP prediction algorithm. It is known that peptides of the PRP type are penetrating. In other words, they do not destabilize membranes and, as a rule, have a target inside the cell.[15,16] It is clear that the development of an algorithm for the prediction of CAP of the PRP type requires its own specific approach. For AMP of the AAP type, the mode of action differs fundamentally from that of CAP, and the development of an algorithm for AAP prediction also requires its own approach. In this work, only CAP of the LCAP type are considered. According to the available databases,[17] this is the largest class of antimicrobial peptides. Prediction accuracy is largely determined by the set of descriptors used in the prediction. Most current methods use a large number of characteristics for AMP prediction and optimize them by machine learning methods, such as artificial neural networks (ANN) and support vector machines (SVM).[4−12] Meanwhile, the influence of the individual characteristics on AMP prediction has been studied much less extensively.
In this paper, we describe the influence of the characteristics that may be responsible for the prediction of LCAP on the basis of their basic function, interaction with the bacterial membrane. A large number of proteins also interact with the membrane and so resemble AMP in this regard. For instance, the so-called transmembrane proteins are generally inserted into the membrane without destroying it. It is clear that selection pressure on random sequence variation directs the evolution of peptides with a particular function (for instance, transmembrane protein fragments (TMP), LCAP, etc.). So, in order to determine which characteristics efficiently distinguish LCAP from other peptides (other membrane-interactive or nonfunctional (random) ones), we think that it is reasonable to make a comparative analysis of the sequences of three sets of peptides: LCAP, TMP and randomly selected fragments from soluble proteins (RFP). This work concerns precisely this comparative analysis of LCAP, TMP and RFP sequences. Accordingly, an attempt has been made to reveal the characteristics that can discriminate antimicrobial peptides from both soluble nonmembrane proteins and transmembrane proteins (or fragments of membrane proteins). Taking into account the structure of the bacterial membrane, which is an anionic lipid bilayer, amphipathic in nature, it can be assumed that the following characteristics are suitable as discriminators: (1) hydrophobicity, (2) amphipathicity, (3) charge density, (4) propensity to aggregation and (5) propensity to disordering. We think that the values of these characteristics are responsible for: (a) the capability of the peptide to interact with an anionic membrane and (b) the results of the interaction (mechanisms of action). Quantitative estimation of all the characteristics except amphipathicity requires information on the amino acid sequences of the peptides only. Amphipathicity in addition needs three-dimensional structure information.
The exact three-dimensional structure of most linear antimicrobial peptides is unknown. But in the case of linear peptides, based on the theory of Wimley and White[13,14] and the fact that all transmembrane domains of membrane proteins consist mainly of regular secondary structure elements (α-helices or β-sheets saturated with hydrogen bonds), we can assume that the membrane environment will drive the peptide into a regular conformation. So, we are motivated to evaluate the hydrophobic moment of LCAP in the regular-structure approximation, in order to see whether the hydrophobic moment can be a good discriminator and which regular structure is more suitable for effective discrimination. There are various statistical approaches for the prediction of AMP that take into account a number of different characteristics.[4,5,11,18−23] In this paper, our goal is (a) to develop the simplest method of discrimination (based on a threshold value only) of AMP from non-AMP, the efficiency of which will be determined by the efficiencies of the selected descriptors only, and (b) to compare the results of the discrimination procedure with the results obtained by more refined and complicated discriminative methods such as SVM, ANN, etc.

Methods

Benchmarks

Training Sets

For the analysis of the characteristics, the following benchmarks were selected: a set of LCAP, a set of randomly selected fragments from soluble proteins and a set of fragments from transmembrane proteins. The LCAP set was selected from the APD2 database[17] and consists of 1083 peptides (positive set). To estimate the discriminative efficiency of the characteristics, a set of nonantimicrobial peptides was required. Because only a small number of peptides have been experimentally verified to lack antimicrobial activity,[5] we have used a large set of random sequences; in other words, a set of sequences with a great variety of functions. This set can therefore be considered nonfunctional on average, as well as nonantimicrobial (negative set). The set of random sequences was selected from UniProt using the filters: non-AMP, non-membrane and non-secretory proteins. Three such sets were used. The first set (RFP10000) was used for optimizing the parameters of the various descriptors and consists of 10 000 randomly chosen fragments for each peptide length from 4 to 50 amino acids. The other sets were used for the estimation of the descriptors by receiver operating characteristic (ROC) curves. These sets contain 500 (for RFP500) and 10 (for RFP10) randomly selected fragments from globular proteins for each peptide from the LCAP set, with lengths matching that peptide. The last set (RFP10) was used for comparing our results with other available prediction tools. For membrane proteins, the full set of transmembrane (helical) fragments of more than 11 residues from the database of transmembrane proteins PDB-TM[24,25] was chosen. This set contains 1691 sequences (TMP set).

Test Sets

Two test sets were used for the evaluation of AMP descriptors. The first test set, compiled on the basis of the CAMP[11] predicted data set, contained 1153 sequences identified as antimicrobial based on the evidence of similarity or annotations in NCBI as “antimicrobial regions”, without experimental evidence. After eliminating sequences that contain nonstandard amino acids or disulfide bonds, have a negative net charge, are longer than 50 amino acids, or are rich in Pro and Arg, only 98 sequences were left (TPS1). TPS1 serves as an independent positive test data set. An additional test set (TPS2) was obtained from the DBAASP database (http://www.biomedicine.org.ge/dbaasp). Only experimentally validated peptides with AMP activity have been included in this set. After excluding peptides found in the training LCAP set, peptides that do not satisfy the conditions described above for TPS1, and peptides with more than 80% homology, the TPS2 set contains 174 peptides. As mentioned above, no additional independent sample was available as an independent negative test set. So, we have used RFP10 as the negative data set for evaluating the accuracy of the selected descriptors.

Optimization of the Parameters Defining the Characteristics of AMP

There is evidence, especially for disulfide-bonded AMP, that despite their short length, they are assemblies of functional (structural) blocks.[26] So, we can propose that linear peptides are also organized on the block principle and that not the whole peptide, but only a part of it, may participate in the interaction with the membrane. Accordingly, for each peptide, the descriptors were calculated for all fragments (windows) of a certain length, and the peptide is characterized by the particular fragment selected according to certain criteria. The values of the different descriptors, in most cases, depend on various parameters, such as the length of the fragment for which the considered characteristic is computed, the hydrophobicity scale (see below), etc. It is necessary to choose optimal parameters for the descriptors on the basis of the LCAP set. The descriptors were optimized by maximizing the fraction (percent) of peptides for which the probability of their sequence appearing as a result of a random normal process is less than P. The value of P was determined by the z-score. That is, for each descriptor of each peptide, its own z-score is defined as zpd (where d denotes hydrophobicity, hydrophobic moment or another descriptor, and p denotes a particular peptide defined by its sequence). The main criterion of optimality (MOC) of the descriptors was maximization of the number of peptides from the LCAP set having z-score zpd > 2. The exception was the location of the peptide in relation to the membrane, for which the optimization was performed differently (see below).
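The MOC criterion can be illustrated with a minimal sketch. It assumes that the z-score of a peptide's descriptor value is taken against the mean and standard deviation of the same descriptor over random fragments of matching length (as a set like RFP10000 could supply); the function names and toy numbers are hypothetical and are not taken from the paper.

```python
import statistics


def z_score(value, background):
    """z-score of one descriptor value against a background sample.

    Assumption: the background is the descriptor computed on random
    fragments of the same length as the peptide.
    """
    mu = statistics.mean(background)
    sigma = statistics.pstdev(background)
    return (value - mu) / sigma


def moc_fraction(peptide_values, backgrounds, cutoff=2.0):
    """Fraction of peptides whose z-score exceeds the cutoff (z > 2 in the text).

    backgrounds[i] is the random-fragment sample matched to peptide i.
    """
    hits = sum(
        1 for value, bg in zip(peptide_values, backgrounds)
        if z_score(value, bg) > cutoff
    )
    return hits / len(peptide_values)
```

Under this reading, a parameter set (scale, window length, angle) would be preferred when it gives a larger `moc_fraction` on the LCAP set.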

Hydrophobicity

The AMP overall hydrophobicity, defined as the sum of transfer (from water into the hydrophobic environment) energy of the residue (hydrophobicity), can be used as an AMP characteristic. In the literature, there is a large number of papers[27−32] that define transfer energies of the amino acids (hydrophobicity scales). The values of the transfer energies in these scales depend on the method of determination and differ from scale to scale. Therefore, the hydrophobicity scale can be used as an optimization parameter for assessing the suitability of the hydrophobicity as AMP characteristics. The following hydrophobicity scales were considered: KD,[27] WW,[28] UHC,[29] Hes,[30] EG[31] and MF.[32] For each peptide, hydrophobicity was calculated for all fragments of a certain length and the peptide characteristic was defined by the fragment of the highest hydrophobicity. Therefore, peptide fragment length can be the other optimization parameter. The optimal length and hydrophobicity scale were chosen by MOC. Fragment length was varied within the range of 4–50 residues. Moreover, if the peptide length was less than the length of the considered fragment, hydrophobicity was computed for the full peptide. A similar method for optimizing the fragment length was used for the other descriptors.
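The windowed calculation described above can be sketched as follows. This is an illustration only: it uses the Kyte-Doolittle (KD) scale, takes the hydrophobicity of a fragment as the plain sum of residue values, and falls back to the full peptide when it is shorter than the window; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_sums(seq, window, scale=KD):
    """Summed hydrophobicity of every fragment of the given length."""
    if len(seq) <= window:
        # Peptide shorter than the window: use the full peptide.
        return [sum(scale[aa] for aa in seq)]
    return [
        sum(scale[aa] for aa in seq[i:i + window])
        for i in range(len(seq) - window + 1)
    ]


def max_window_hydrophobicity(seq, window, scale=KD):
    """Descriptor value: the most hydrophobic fragment represents the peptide."""
    return max(window_sums(seq, window, scale))
```

Scanning `window` over 4–50 and swapping `scale` would reproduce the two optimization parameters named in the text.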

Amphipathicity

One of the main features of antimicrobial peptides is their amphipathicity.[33] The separation of hydrophobic and hydrophilic regions in these peptides can be realized in one of the two ways: due to the internal 3D structure and by the linear separation that is due to the uneven distribution of hydrophobic and hydrophilic residues along the peptide chain. Accordingly, two characteristics were used for the evaluation of amphipathicity: hydrophobic moment[34] and linear hydrophobic moment (see below).

Hydrophobic Moment

Hydrophobic moment was estimated according to Eisenberg:[34]

$$\mu = \sqrt{\left(\sum_{n=1}^{N} h_n \sin(n\vartheta)\right)^{2} + \left(\sum_{n=1}^{N} h_n \cos(n\vartheta)\right)^{2}}$$

where μ is the hydrophobic moment of a peptide containing N amino acids, h_n is the numerical hydrophobicity of the nth residue, and ϑ is the turn of the residue along the helix axis. The formula assumes the existence of a regular conformation. As mentioned above, LCAP in a membrane environment is likely to have regular secondary structure. So, ϑ is used as a parameter that determines the hydrophobic moment. This parameter was used as an optimization parameter and varied from 60 to 180°. Hydrophobicity scale and fragment length were also used as optimization parameters. Optimization of the parameters was carried out by MOC.
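A minimal sketch of the Eisenberg hydrophobic moment for a single fragment is given below. The Kyte-Doolittle scale is used here only as a stand-in (the optimal scale reported in Table 1 is MF, whose values are not reproduced here), and the default angle of 96° is the optimum reported in Table 1; the function name is hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values, used as an illustrative stand-in scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, scale=KD, theta_deg=96.0):
    """Eisenberg hydrophobic moment of a fragment in a regular conformation.

    theta_deg is the turn per residue around the axis
    (about 100 degrees for an ideal alpha-helix, 180 for a beta-strand).
    """
    theta = math.radians(theta_deg)
    sin_sum = sum(scale[aa] * math.sin(n * theta)
                  for n, aa in enumerate(seq, start=1))
    cos_sum = sum(scale[aa] * math.cos(n * theta)
                  for n, aa in enumerate(seq, start=1))
    return math.hypot(sin_sum, cos_sum)
```

Sweeping `theta_deg` from 60 to 180 for each window length and scale corresponds to the parameter search described in the text.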

Linear Hydrophobic Moment

As mentioned above, separation of the hydrophobic and hydrophilic parts may also arise from an uneven distribution of hydrophobic and hydrophilic residues along the peptide chain. To estimate the separation along the chain, we have introduced the characteristic “linear hydrophobic moment”, which is defined as follows:

[equations missing]

Here D is the distance between the centers of the hydrophobic and hydrophilic parts of the considered fragment of length N; k = 1, …, N; h+ and h– are the transfer energies of the k-th residue from water to the hydrophobic environment, where h+ > 0 corresponds to a hydrophobic residue and h– < 0 corresponds to a hydrophilic residue. Hydrophobicity scale and fragment length were used as optimization parameters. Optimization of the parameters was carried out by MOC.

Charge Density

Cationic antimicrobial peptides at neutral pH have a positive charge due to the large percentage of Lys and Arg, which facilitates their interaction with the negatively charged membrane. So, it is natural to assume that the charge of the peptide can be considered as a characteristic of LCAP. Because electrostatic interaction is long-range, we think that the net charge of the whole peptide determines the results of the interaction with the membrane. So the charge was calculated for the entire peptide. Charge density, determined as the full charge divided by the peptide molecular weight, was used as the AMP descriptor. Initially, the full net charge normalized by the peptide length was used as the charge descriptor, but following a suggestion from one of the reviewers, we found that the charge divided by the molecular weight gives better discrimination of AMP from non-AMP, and so this definition was adopted.
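As an illustration, charge density for a whole peptide might be computed as below. The sketch makes several assumptions that the text does not specify: Lys and Arg count as +1, Asp and Glu as −1, His and the termini are ignored, and the molecular weight is the sum of average residue masses plus one water; the function names are hypothetical.

```python
# Average residue masses in Da (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153


def net_charge(seq):
    """Net charge at neutral pH (assumption: K/R = +1, D/E = -1, rest neutral)."""
    positive = sum(1 for aa in seq if aa in "KR")
    negative = sum(1 for aa in seq if aa in "DE")
    return positive - negative


def molecular_weight(seq):
    """Approximate molecular weight of the free linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def charge_density(seq):
    """Net charge of the whole peptide divided by its molecular weight."""
    return net_charge(seq) / molecular_weight(seq)
```

Dividing by mass rather than length penalizes peptides whose charge is carried by few residues among many heavy ones, which is one plausible reason for the improved discrimination the authors mention.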

Location of the Peptide in Relation to Membrane (LPM)

Mechanism of action of AMP largely depends on their energetically most favorable location within the membrane bilayer. Taking into account the fact that the majority of the LCAP peptides has an α-helical conformation in the membrane environment (see above and the Results section), LPM was described by the penetration depth (d), i.e., distance of the geometrical center of peptide helix from membrane surface and angle (θ) between peptide helix axis and perpendicular to the membrane surface. It would be interesting to explore the possibility of using d and θ as discriminators to distinguish antimicrobial from nonantimicrobial peptides. In contrast to the previous LCAP characteristics, location of the LCAP within the bilayer is an integrated feature that will largely depend on the other previously considered characteristics (hydrophobicity, amphipathicity, charge density). To calculate d and θ, the hydrophobic potential designed by Senes et al.,[35] which represents the energy difference between the residue in water and within the bilayer at a given depth, is used. All calculations of LPM were performed for different fragment (window) lengths and the peptide characteristic was defined by the fragment with minimal energy. Another (different from MOC) approach was used for the optimization of d and θ. The approach is based on receiver operating characteristic (ROC) curve analysis. ROC curve, which represents the dependence of sensitivity (Sn) (y-axis) versus 1 – specificity (Sp) (x-axis) was used for quantifying differences of LCAP from membrane proteins and soluble protein fragments. RFP500 and TMP were used as the negative sets. For each LPM, the area under the ROC curve AUC was calculated, defined as AUC_R, relative to the RFP500 negative set and as AUC_T, relative to the TMP negative set. As mentioned above, peptide fragment length varied and so it was used as an optimization parameter. The maximization (AUC_R + AUC_T) value was used to optimize LPM (d and θ). 
During optimization, d and θ were varied from 0 to 30 Å and from 0 to 180°, respectively, and were used as optimization parameters for LPM. Two further variables, the interval half-widths δ(d) and δ(θ), were used for plotting the ROC curve. For each pair of values d and θ, δ(d) and δ(θ) were varied, and the ith pair of values δ(d), δ(θ) determines the intervals d ± δ(d), θ ± δ(θ), i.e., the ith area on the (d, θ) plane. The number of peptides from the positive data set whose energetically most favorable depth and orientation lie within the interval d ± δ(d), θ ± δ(θ) determines the sensitivity Sn, and the number of peptides from the negative data sets whose energetically most favorable depth and orientation lie within the same interval determines the specificity Sp. Sn and Sp give the ith point of the ROC curve. For each combination of fragment length, d and θ, ROC curves and consequently (AUC_R + AUC_T) values were calculated, and the maximum among the calculated values corresponds to the optimal length, d and θ.

Disordering

It is reasonable to consider such short cationic peptides as LCAP to be disordered in water environments. Indeed, there are experimental data for many LCAP showing disordered structure in water environments.[36] It is interesting to ask whether the disordering is connected with the short length only, or whether other causes of structure destabilization exist (for example, total positive charge). Uversky[37] investigated disordered proteins and concluded that disorder can be predicted on the basis of the hydrophobicity/charge (h/r) ratio. As LCAPs are characterized by a very particular balance between hydrophobic and positively charged residues, we think that it is interesting to estimate whether, according to Uversky’s rule, the h/r ratio can be the cause of the disordered structure of LCAP in a water environment, and therefore to estimate the efficiency of Uversky’s relation as a discriminator. According to Uversky’s formula, the degree of disordering of a globular protein under physiological conditions is defined by the relation

$$S = \langle H \rangle - \frac{\langle R \rangle + 1.151}{2.785}$$

where ⟨H⟩ is the average hydrophobicity of the protein and ⟨R⟩ its charge. Negative values of S correspond to a disordered protein.

Aggregation Propensity

We have used two descriptors for aggregation propensity: aggregation in solution (in vitro aggregation) and aggregation in the bacterial membrane (in vivo aggregation). In vitro aggregation propensity was evaluated with the TANGO software.[38] TANGO computes the partition function of the conformational phase space assuming that every segment of the protein populates one state: random coil, β-turn, α-helix, α-helix aggregation or β-sheet aggregation. Therefore, TANGO can predict aggregation in solution, considering only structural parameters defined by the peptide sequence. In vivo aggregation propensity was calculated using AGGRESCAN, an algorithm based on an amino acid aggregation-propensity scale derived from in vivo experiments and on the assumption that short and specific sequence stretches modulate protein aggregation. The algorithm can thus predict the aggregation propensity of peptides in the presence of cell material.[39]

Evaluation of the Efficiency of Characteristics

Receiver operating characteristic (ROC) curves were used to evaluate the effectiveness of various characteristics. Each point, i, of the ROC curve corresponds to values of sensitivity and specificity (Sn and Sp), which are calculated for the variable z (where z changes from min(zpd) to max(zpd) with step 0.1). The number of peptides from the positive data set (LCAP) with z-score zpd > z (for hydrophobic moment, charge density and linear hydrophobic moment) or with z-score zpd < z (for hydrophobicity and disordering) determines Sn, and the number of peptides from the negative data sets (RFP500 for ROC_R and TMP for ROC_T) satisfying the same condition determines Sp. Quantitative evaluations of the effectiveness of the characteristics are made on the basis of the area under the ROC curve.
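The threshold sweep described above can be sketched as follows. The sketch assumes that the inputs are per-peptide z-scores for the positive and negative sets, extends the sweep by one step on each side so that the curve reaches (0,0) and (1,1), and integrates the curve with the trapezoidal rule; the function names are hypothetical.

```python
import math


def roc_points(pos, neg, step=0.1, greater=True):
    """ROC points (1 - Sp, Sn) obtained by sweeping a threshold z.

    greater=True  : a peptide is called positive when its z-score > z
    greater=False : a peptide is called positive when its z-score < z
    """
    values = list(pos) + list(neg)
    lo, hi = min(values), max(values)
    n_steps = math.ceil((hi - lo) / step) + 3  # one extra step on each side
    points = set()
    for i in range(n_steps):
        z = lo - step + i * step
        if greater:
            tp = sum(v > z for v in pos)
            fp = sum(v > z for v in neg)
        else:
            tp = sum(v < z for v in pos)
            fp = sum(v < z for v in neg)
        points.add((fp / len(neg), tp / len(pos)))
    return sorted(points)


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

With the LCAP z-scores as `pos`, passing the RFP500 z-scores as `neg` would give AUC_R and passing the TMP z-scores would give AUC_T.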

Evaluation of the Prediction Quality

A threshold for each characteristic was evaluated and the prediction of the existence of antimicrobial activity of the peptide was done on the basis of it. The threshold is determined by a point on the ROC curve closest to the point (0,1). Sensitivity, specificity and accuracy for the thresholds have been evaluated. The following equations were used for the prediction quality:

$$Sn = \frac{TP}{TP + FN}, \qquad Sp = \frac{TN}{TN + FP}, \qquad BAC = \frac{Sn + Sp}{2}, \qquad AC = \frac{TP + TN}{TP + TN + FP + FN}$$

where Sn is the sensitivity, Sp the specificity, BAC the balanced accuracy and AC the accuracy, and TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively. The calculation of balanced accuracy is used for the evaluation of the prediction quality because the negative sets contain more peptides than the positive ones and the balanced accuracy reflects equal influence of positive and negative sets irrespective of the number of peptides they contain.
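The quality measures and the choice of threshold can be illustrated with a short sketch. It assumes the standard confusion-matrix definitions of Sn, Sp, BAC and AC and a descriptor for which larger z-scores indicate AMP; the function names and candidate thresholds are hypothetical.

```python
import math


def quality(tp, fn, tn, fp):
    """Sensitivity, specificity, balanced accuracy and accuracy."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {
        "Sn": sn,
        "Sp": sp,
        "BAC": (sn + sp) / 2.0,
        "AC": (tp + tn) / (tp + fn + tn + fp),
    }


def closest_threshold(pos, neg, thresholds):
    """Threshold whose ROC point (1 - Sp, Sn) lies closest to (0, 1)."""
    def distance(z):
        sn = sum(v > z for v in pos) / len(pos)    # positives called AMP
        sp = sum(v <= z for v in neg) / len(neg)   # negatives called non-AMP
        return math.hypot(1.0 - sp, 1.0 - sn)
    return min(thresholds, key=distance)
```

For example, with 8 true positives, 2 false negatives, 90 true negatives and 10 false positives, AC is dominated by the larger negative set, while BAC weights the two sets equally.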

Results and Discussion

Optimization of the Descriptors

The following descriptors were considered: hydrophobic moment, charge density, location of the peptide in relation to membrane (LPM), linear hydrophobic moment, disordering and propensities to in vitro and in vivo aggregation. Most descriptors (all except LPM) were optimized by the MOC criterion (see the Methods section). The corresponding data are given in Table 1. For a normal (random) distribution, the probability that z-score > 2 is approximately 0.02. So, on the basis of the obtained results, we can say that for all optimized descriptors the probabilities that z-score > 2 are higher than expected from fully random processes. This suggests that some kind of selection pressure acts on random sequence variation.

Table 1

Optimal Parameters for Different Descriptors (a)

descriptor	hydrophobicity scale	fragment length	angle ϑ (deg)	MOC (b), No.	MOC (b), %	d (Å)	θ (deg)	AUC_R	AUC_T
hydrophobic moment	MF	24	96	679	62.70	–	–	–	–
hydrophobicity	KD	21	–	258	22.30	–	–	–	–
linear hydrophobic moment	EG	31	–	119	10.00	–	–	–	–
LPM	–	17	–	–	–	12.9	81.0	0.76	0.78

(a) For charge density, disordering, and propensity to aggregation in vitro and in vivo, parameter optimization was not carried out.

(b) Number (No.) and percent (%) of the peptides from the LCAP set that satisfy the MOC criterion.

For hydrophobic moment, for example, the optimal value of the turn (ϑ) of the residue (ϑ varied from 60° to 180°) in the regular-structure approximation is 96°, which shows that the optimal secondary structure for discriminating LCAP from non-AMP is an α-helix. This result is to be expected. For LPM, another criterion of optimality (different from MOC) was used, based on the d and θ distributions in the considered sets (see the Methods section). Assuming the peptide is α-helical (see above), for each peptide we can calculate the energetically most favorable location (d and θ) in the membrane of peptide fragments of a particular length. Figure 1 shows plots of the relative density of the orientation and the depth of the energetically most favorable fragments of length 17 for the three considered sets (LCAP, RFP500 and TMP).

Figure 1

Plots of relative density of the orientation (θ) and the depth (d) of energetically most favorable fragments of length 17 on the basis of the (A) LCAP, (B) TMP and (C) RFP500 set. The values of the density are given relative to the densities of uniform distribution.

From Figure 1, it is clear that the orientation distributions of the peptides differ between the sets. For most LCAP, the energetically most favorable depth is within 8–15 Å, which corresponds to the boundary between the interface site and the hydrophobic core of the membrane (Figure 1a). Figure 1a also shows that most of the peptides are located at a relatively small angle to the membrane surface (θ ∼ 90°). These results are consistent with experimental data, according to which most CAP penetrate into the membrane at a shallow depth, parallel to the membrane surface.[40] The maximum density on the (d, θ) plot for the LCAP set is higher than for the other peptide sets (RFP500 and TMP). From Figure 1b, it can be seen that it is energetically more favorable for the membrane proteins to penetrate more deeply into the membrane (d = 2–10 Å). At the same time, peptides from the data set of random protein fragments are located closer to the membrane surface (d = 12–30 Å; see Figure 1c). Peptides from this last set are distributed on the (d, θ) plot less densely than those from the other peptide sets. On the basis of these data, we have decided to use the receiver operating characteristic (ROC) curve to quantify the differences of LCAP from membrane proteins and soluble protein fragments (see the Methods section). Calculations have shown that the optimal values of the penetration depth and angle (d and θ) are 12.9 Å and 81° at a fragment length of 17 amino acids. The corresponding optimal AUC_R and AUC_T values (see the Methods section) for the two data sets (RFP500 and TMP) are 0.76 and 0.78, respectively. Therefore, we concluded that the location of the peptide in relation to the membrane can be used as a descriptor to distinguish linear cationic antimicrobial peptides from other peptides.

Evaluation of the Efficiency of LCAP Prediction

Receiver operating characteristic (ROC) curves were used to evaluate the effectiveness of various characteristics for LCAP prediction. ROC curves, plotted for each characteristic, are shown in Figures 2–10. Quantitative evaluation of effectiveness of the characteristics is made on the basis of the following quantities: (a) area under the ROC curve (defined as AUC_R relative to the RFP500 negative set and defined as AUC_T relative to the TMP negative set) and (b) a threshold for each characteristic by which prediction of peptide antimicrobiality will be done. A threshold is determined by a point on the ROC curve closest to the point (0,1). Sensitivity, specificity and balanced accuracy for the thresholds have been evaluated.

Figure 2

ROC curves for evaluation of the prediction quality of linear hydrophobic moment for training sets: ROC_R corresponds to positive LCAP and negative RFP500 sets and ROC_T corresponds to positive LCAP and negative TMP sets.

Figure 3

ROC curves for evaluation of the prediction quality of propensity to aggregation in vitro for training sets: ROC_R corresponds to positive LCAP and negative RFP500 sets and ROC_T corresponds to positive LCAP and negative TMP sets.

Figure 4

ROC curves for evaluation of the prediction quality of disordering for training sets: ROC_R corresponds to positive LCAP and negative RFP500 sets and ROC_T corresponds to positive LCAP and negative TMP sets.

Figure 5

ROC curve (ROC_RM) for evaluation of the prediction quality of linear hydrophobic moment for transmembrane helices, for TMP and RFP500 sets.

Figure 6

ROC curves for evaluation of the prediction quality of hydrophobic moment for training sets: ROC_R corresponds to positive LCAP and negative RFP500 sets and ROC_T corresponds to positive LCAP and negative TMP sets.

Figure 7

ROC curves for evaluation of the prediction quality of charge density for training sets: ROC_R corresponds to positive LCAP and negative RFP500 sets and ROC_T corresponds to positive LCAP and negative TMP sets.

Figure 8

ROC curves for evaluation of the prediction quality of location of the peptide along the membrane (LPM) for training sets: ROC_R corresponds to positive LCAP and negative RFP500 sets and ROC_T corresponds to positive LCAP and negative TMP sets.

Figure 9

ROC curves for evaluation of the prediction quality of hydrophobicity for training sets: ROC_R corresponds to positive LCAP and negative RFP500 sets and ROC_T corresponds to positive LCAP and negative TMP sets.

Figure 10

ROC curves for evaluation of the prediction quality of propensity to aggregation in vitro for training sets: ROC_R corresponds to positive LCAP and negative RFP500 sets and ROC_T corresponds to positive LCAP and negative TMP sets.

The threshold z was varied from min(zpd) to max(zpd). For descriptors whose values were assumed to be higher for LCAP than for non-AMP (hydrophobic moment, charge density, linear hydrophobic moment, and propensities to aggregation in vitro and in vivo), the condition zpd > z was used to calculate the sensitivity and specificity (Sn and Sp) that define the ith point of the ROC curve. For descriptors whose values were assumed to be lower for LCAP than for non-AMP (hydrophobicity and disordering), the condition zpd < z was used instead. If the assumption is true, the value of AUC for the descriptor will be higher than 0.5. The higher the value of AUC, the better the descriptor discriminates AMP from non-AMP; for a good descriptor, AUC must be no less than 0.7. However, our results show that the values of AUC_R for the linear moment and in vitro aggregation are close to 0.5 (see Table 2 and Figures 2 and 3), and for disordering AUC_R is even less than 0.5 (see Table 2 and Figure 4). This means that these characteristics cannot distinguish antimicrobial from nonantimicrobial peptides. The low value of AUC_R = 0.56 for the linear moment suggests that for most antimicrobial peptides there is no significant linear separation of hydrophobic and hydrophilic residues along the peptide chain. On the other hand, the ROC curve plotted for the linear moment of the TMP set relative to the negative set RFP500 (ROC_RM, Figure 5) gives an area under the curve of AUC_RM = 0.73. The difference between AUC_R and AUC_RM can be explained by the fact that, in contrast to antimicrobial peptides, transmembrane peptides do show a linear separation of hydrophobic and hydrophilic groups of residues. Such separation was also revealed by other authors,[41,42] who supposed that amphiphilic residues are concentrated at the ends of the transmembrane helix while hydrophobic residues are located in the middle.
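The threshold sweep described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `roc_points` and `auc`, the use of every observed descriptor value as a candidate threshold, and the trapezoidal integration are all choices made for this sketch.

```python
import numpy as np

def roc_points(pos, neg, higher_is_positive=True):
    """Sweep a threshold z over the observed descriptor values and
    return arrays of (1 - Sp) and Sn, one pair per threshold.

    pos: descriptor values for the positive set (e.g. LCAP)
    neg: descriptor values for the negative set (e.g. RFP500 or TMP)
    higher_is_positive: True for the 'zpd > z' rule, False for 'zpd < z'.
    """
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    if not higher_is_positive:
        # Negating the values turns 'zpd < z' into the 'zpd > z' case.
        pos, neg = -pos, -neg
    thresholds = np.unique(np.concatenate([pos, neg]))
    fpr, tpr = [1.0], [1.0]  # threshold below the minimum: all predicted positive
    for z in thresholds:
        sn = np.mean(pos > z)    # sensitivity at this threshold
        sp = np.mean(neg <= z)   # specificity at this threshold
        fpr.append(1.0 - sp)
        tpr.append(sn)
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    order = np.lexsort((tpr, fpr))  # sort by 1 - Sp, then by Sn
    f, t = fpr[order], tpr[order]
    return float(np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2.0))

# Toy values (hypothetical, not from the paper): a descriptor that is
# lower in the positive set gives AUC < 0.5 under the 'higher' assumption
# and AUC > 0.5 once the direction of the comparison is reversed.
pos_vals, neg_vals = [0.0, 1.0, 2.0], [3.0, 4.0, 5.0]
auc_wrong = auc(*roc_points(pos_vals, neg_vals, higher_is_positive=True))
auc_right = auc(*roc_points(pos_vals, neg_vals, higher_is_positive=False))
```

An AUC below 0.5, as reported for disordering, therefore signals that the assumed direction of the comparison does not hold for that descriptor.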

Table 2

Comparison of the Different Descriptors for Training Set (LCAP and RFP500)

	AUC_R*100	AUC_T*100	R_min^a*100	S_n*100	S_p*100	BAC*100
hydrophobic moment	88.63	92.01	23.32	80.79	86.77	83.78
charge	90.29	97.61	22.99	86.24	81.58	83.91
LPM	76.12	78.32	37.38	76.36	68.12	72.24
hydrophobicity	71.18	11.09	47.96	64.27	69.15	66.14
linear hydrophobic moment	56.09	33.37	68.65	40.63	65.54	53.08
disordering	46.59	93.75	74.96	50.97	43.30	47.13
in vitro aggregation^b	57.41	4.34	64.23	46.63	64.26	55.45
in vivo aggregation^c	75.38	7.87	40.57	75.81	67.43	71.62

^a Distance from the point (0,1) to the point on the ROC curve closest to the point (0,1).

^b TANGO AGG index.

^c AGGRESCAN Na4vSS index.
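As an illustration of footnote a, the distance from the ideal corner (0,1) of the ROC plane to a point with sensitivity Sn and specificity Sp is sqrt((1 - Sp)^2 + (1 - Sn)^2), and the balanced accuracy is taken here in its standard form, (Sn + Sp)/2. The sketch below uses hypothetical helper names (`closest_to_ideal`, `balanced_accuracy`) and assumes that the tabulated Sn and Sp are reported at the R_min point, which the text does not state explicitly.

```python
import math

def closest_to_ideal(points):
    """Given (Sn, Sp) pairs as fractions in [0, 1], return the tuple
    (r_min, Sn, Sp) for the ROC point closest to the corner (0, 1)."""
    best = None
    for sn, sp in points:
        r = math.hypot(1.0 - sp, 1.0 - sn)  # distance in the (1 - Sp, Sn) plane
        if best is None or r < best[0]:
            best = (r, sn, sp)
    return best

def balanced_accuracy(sn, sp):
    """Mean of sensitivity and specificity."""
    return (sn + sp) / 2.0

# Hydrophobic moment row of Table 2, converted from percent to fractions.
r_min, sn, sp = closest_to_ideal([(0.8079, 0.8677)])
bac = balanced_accuracy(sn, sp)
```

For this row the computed distance is about 0.233 and the balanced accuracy about 0.838, in line with the tabulated 23.32 and 83.78 up to rounding.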

Figure 3

Figure 4

ROC curves for evaluating the prediction quality of disordering for the training sets: ROC_R corresponds to the positive LCAP and negative RFP500 sets, and ROC_T corresponds to the positive LCAP and negative TMP sets.

Figure 5

ROC curve (ROC_RM) for evaluating the quality of prediction of transmembrane helices by the linear hydrophobic moment, for the TMP and RFP500 sets.

The AUC_R value for disordering is 0.47, which is less than 0.5; thus, according to the criteria proposed by Uversky,[37] antimicrobial peptides are more ordered than fragments of random proteins. This may be due to the fact that the Uversky criterion, which determines the degree of disorder of globular proteins, is not suitable for evaluating the disorder of small peptides. For hydrophobicity, hydrophobic moment, LPM, charge density and propensity to aggregation in vivo, AUC_R > 0.7 (see Table 2 and Figures 6–10), which indicates that these characteristics can be used to distinguish antimicrobial from soluble nonantimicrobial peptides. For the hydrophobic moment and charge density, AUC_T > AUC_R. On this basis, we suggest that if the value of either of these two characteristics discriminates a peptide (a potential LCAP) from nonmembrane peptides, it should discriminate the peptide from transmembrane peptides as well. So, for these characteristics, a single threshold defined from ROC_R can be used: for this threshold, the sensitivities are the same for ROC_R and ROC_T, but the specificity, and thus the accuracy, obtained from ROC_T is larger than that from ROC_R. Consequently, LCAP can be discriminated from membrane peptides with at least the accuracy obtained using the threshold defined from ROC_R alone.

Figure 6

ROC curves for evaluating the prediction quality of the hydrophobic moment for the training sets: ROC_R corresponds to the positive LCAP and negative RFP500 sets, and ROC_T corresponds to the positive LCAP and negative TMP sets.

LPM was already optimized so as to maximize its difference from the RFP500 and TMP sets (see above). AUC_R for this descriptor is 0.76, so it can be used as an LCAP characteristic (see Table 2). For hydrophobicity, AUC_R = 0.71 but AUC_T = 0.11 < 0.5 (see Table 2); this means that LCAP have a lower average hydrophobicity than transmembrane helices but a higher one than random fragments from soluble proteins. Therefore, a single threshold on this descriptor cannot be used to discriminate nonantimicrobial from antimicrobial peptides. Analogous results were obtained for the propensity to in vivo aggregation (AUC_R = 0.75 but AUC_T = 0.08 < 0.5). The question of AMP aggregation is difficult and unclear. There is speculation that AMP differ greatly in their predisposition to aggregation.[43] Our results support this speculation, because the AMP from the considered benchmarks differ greatly in aggregation index, especially for in vitro aggregation. The wide range of proposed mechanisms of AMP action can be explained by the fact that AMP behave differently in terms of the stability of their aggregates, both in the membrane and in the aqueous environment. Our results (Table 1) show that the propensity to in vitro aggregation does not discriminate AMP from non-AMP, and that the propensity to in vivo aggregation discriminates AMP from nonmembrane non-AMP but not from transmembrane non-AMP. So, we have not used these descriptors as discriminating LCAP characteristics. The highest values of AUC_R correspond to the hydrophobic moment and charge density. Thus, we suggest that these characteristics are the best separators between nonantimicrobial and antimicrobial peptides. Three descriptors, hydrophobic moment, charge density and LPM, were therefore selected as the most effective LCAP descriptors. Given the above, it can be assumed that the combined use of these three characteristics can improve the prediction of LCAP.
To combine these characteristics, we took into account the fact that LCAP and non-AMP are separated by a specific set of threshold values, which can be obtained from the analysis of the ROC curves. Accordingly, by changing the thresholds for the different characteristics synchronously, we can simply optimize these thresholds to obtain the greatest accuracy. The balanced accuracies for charge density alone, hydrophobic moment alone, hydrophobic moment and charge density together, and all three characteristics together for the training set (LCAP and RFP500) are 83.91, 83.78, 87.95 and 88.26, respectively (Tables 2 and 3).
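One way to picture the joint threshold optimization is a plain grid search over one threshold per descriptor. The sketch below is a guess at the general idea rather than the published algorithm: the decision rule (a peptide is called LCAP only if every descriptor exceeds its threshold), the evenly spaced grid, and the names `predict`, `balanced_accuracy` and `optimize_thresholds` are all assumptions made for illustration.

```python
import itertools
import numpy as np

def predict(X, thresholds):
    """Assumed rule: positive only if every descriptor exceeds its threshold.
    X has one row per peptide and one column per descriptor."""
    X = np.asarray(X, dtype=float)
    return np.all(X > np.asarray(thresholds, dtype=float), axis=1)

def balanced_accuracy(X_pos, X_neg, thresholds):
    sn = predict(X_pos, thresholds).mean()        # fraction of positives kept
    sp = 1.0 - predict(X_neg, thresholds).mean()  # fraction of negatives rejected
    return (sn + sp) / 2.0

def optimize_thresholds(X_pos, X_neg, grid_size=20):
    """Exhaustive search over a grid of threshold combinations,
    returning the combination with the highest balanced accuracy."""
    X_pos = np.asarray(X_pos, dtype=float)
    X_neg = np.asarray(X_neg, dtype=float)
    both = np.vstack([X_pos, X_neg])
    grids = [np.linspace(col.min(), col.max(), grid_size) for col in both.T]
    best_bac, best_t = -1.0, None
    for t in itertools.product(*grids):
        bac = balanced_accuracy(X_pos, X_neg, t)
        if bac > best_bac:
            best_bac, best_t = bac, t
    return best_t, best_bac

# Toy data (hypothetical): neither column separates the two sets on its own,
# but a pair of thresholds applied together does.
X_pos = [[0.8, 0.9], [0.7, 0.8], [0.9, 0.7]]
X_neg = [[0.2, 0.9], [0.8, 0.1], [0.1, 0.2]]
thresholds, bac = optimize_thresholds(X_pos, X_neg)
```

The cost of an exhaustive grid grows exponentially with the number of descriptors, which is acceptable for two or three descriptors as considered here.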

Table 3

Prediction Quality of Combined Use of Three Descriptors for Training Set (LCAP and RFP500)

	AUC_R*100	S_n*100	S_p*100	BAC*100
hydrophobic moment + charge	91.23	84.21	91.69	87.95
hydrophobic moment + charge + LPM	91.38	84.03	92.50	88.26

We have also compared our results with other prediction methods. As mentioned above, several computational methods[4,5,11,18−23] have been proposed for predicting AMPs. However, some methods[4,5,18] do not provide web services with which our data sets could be tested. The BACTIBASE[19,20] and PhytAMP[21] methods were specifically designed for bacteriocins and plant AMPs, respectively. The AntiBP[22] and AntiBP2[23] methods were designed to identify AMPs within a protein sequence and hence could not be compared with our method. So, to make the comparison meaningful, our method was compared with the CAMP method,[11] which is based on random forests (RF), SVM, ANN and discriminant analysis (DA). This method can be used to evaluate the sensitivity, specificity and accuracy for the considered positive training set. As a set of nonantimicrobial peptides (negative set), we used a set of 10 peptide fragments (instead of 500) for each peptide in the AMP set (RFP10). The balanced accuracy for the three considered characteristics together on this set is 88.52. For the same positive and negative sets, the CAMP method gives the following values of balanced accuracy: random forests (RF), 91.04; SVM, 90.51; discriminant analysis (DA), 89.81; ANN, 88.21 (Table 4).

Table 4

Prediction Quality for Different Methods for Sets LCAP and RFP10

	S_n*100	S_p*100	BAC*100	AC*100
hydrophobic moment	80.79	89.54	85.17	88.74
charge	86.24	74.78	80.51	75.83
hydrophobic moment + charge	84.21	92.36	88.23	89.61
hydrophobic moment + charge + LPM	84.03	93.00	88.52	90.20
SVM	93.44	87.57	90.51	88.11
RF	95.57	86.51	91.04	87.33
ANN	90.21	86.20	88.21	86.56
DA	92.89	86.72	89.81	87.28
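The two summary columns of Table 4 weight the classes differently, which matters because RFP10 holds ten fragments per AMP. A small worked example, assuming AC denotes overall accuracy (the fraction of all peptides classified correctly) and that negatives outnumber positives exactly ten to one; both points are assumptions of this sketch, and the function names are hypothetical.

```python
def balanced_accuracy(sn, sp):
    """Mean of sensitivity and specificity; independent of class sizes."""
    return (sn + sp) / 2.0

def accuracy(sn, sp, n_pos, n_neg):
    """Overall accuracy: each class weighted by its number of peptides."""
    return (sn * n_pos + sp * n_neg) / (n_pos + n_neg)

# RF row of Table 4 (values in percent), with an assumed 1:10 class ratio.
sn_rf, sp_rf = 95.57, 86.51
bac_rf = balanced_accuracy(sn_rf, sp_rf)
ac_rf = accuracy(sn_rf, sp_rf, n_pos=1, n_neg=10)
```

Under these assumptions the RF row gives a balanced accuracy of 91.04 and an overall accuracy of about 87.33, matching the tabulated values. With this class ratio the overall accuracy is dominated by specificity, whereas the balanced accuracy rewards sensitivity and specificity equally.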

For testing purposes, two independent positive sets (TPS1 and TPS2) (see the Methods section) were used. The results of comparison for test sets are shown in Table 5.

Table 5

Prediction Quality for Different Methods for Test Sets^a

	TPS1 set			TPS2 set
	S_n*100	BAC*100	AC*100	S_n*100	BAC*100	AC*100
hydrophobic moment	80.61	85.08	88.82	86.78	88.18	89.49
charge	72.45	73.62	74.76	81.03	77.91	74.88
hydrophobic moment + charge	81.63	87.00	92.27	89.66	91.01	92.32
hydrophobic moment + charge + LPM	81.63	87.32	92.93	89.66	91.33	92.98
SVM	84.69	86.13	87.55	91.95	89.76	87.64
RF	81.63	84.07	86.47	93.68	90.10	86.62
ANN	83.67	84.93	86.17	89.66	87.93	86.25
DA	83.67	85.20	86.69	91.38	89.05	86.80

^a Specificities for test sets have been calculated for the RFP10 set (see Table 4).

We could not use any additional independent test set for non-AMP (negative set), so the RFP10 set, which was not employed for training purposes, was used as the negative test set. As can be seen from Tables 4 and 5, the best prediction quality for the training set and both test sets was obtained when all three descriptors were used together, although a pair of descriptors (hydrophobic moment and charge density) gives very close results. We also note that for the test sets, the prediction quality (balanced accuracy) based on the hydrophobic moment and charge density is better than that obtained from all the CAMP prediction algorithms. We want to emphasize that the CAMP method uses a combination of numerous characteristics and more complicated, refined and effective discriminative methods.[11] The high performance of our approach can be explained by the fact that we have considered only one class of AMP, linear cationic peptides. Our results support the assumption that AMP prediction is better performed for each particular class separately, using a particular approach in each case. The AMP prediction tool based on the considered method is included in the Database of Antimicrobial Activity and Structure of Peptides (DBAASP) and is available at http://www.biomedicine.org.ge/dbaasp/.

43 in total

Review 1. The structure, dynamics and orientation of antimicrobial peptides in membranes by multidimensional solid-state NMR spectroscopy.

Authors: B Bechinger
Journal: Biochim Biophys Acta Date: 1999-12-15

2. Why are "natively unfolded" proteins unstructured under physiologic conditions?

Authors: V N Uversky; J R Gillespie; A L Fink
Journal: Proteins Date: 2000-11-15

Review 3. Role of membranes in the activities of antimicrobial cationic peptides.

Authors: Robert E W Hancock; Annett Rozek
Journal: FEMS Microbiol Lett Date: 2002-01-10 Impact factor: 2.742

4. Amphiphilicity index of polar amino acids as an aid in the characterization of amino acid preference at membrane-water interfaces.

Authors: Shigeki Mitaku; Takatsugu Hirokawa; Toshiyuki Tsuji
Journal: Bioinformatics Date: 2002-04 Impact factor: 6.937

Review 5. Mechanisms of antimicrobial peptide action and resistance.

Authors: Michael R Yeaman; Nannette Y Yount
Journal: Pharmacol Rev Date: 2003-03 Impact factor: 25.468

Review 6. Mode of action of membrane active antimicrobial peptides.

Authors: Yechiel Shai
Journal: Biopolymers Date: 2002 Impact factor: 2.505

7. Energetics, stability, and prediction of transmembrane helices.

Authors: S Jayasinghe; K Hristova; S H White
Journal: J Mol Biol Date: 2001-10-05 Impact factor: 5.469

8. Identification of crucial residues for the antibacterial activity of the proline-rich peptide, pyrrhocoricin.

Authors: Goran Kragol; Ralf Hoffmann; Michael A Chattergoon; Sandor Lovas; Mare Cudic; Philippe Bulet; Barry A Condie; K Johan Rosengren; Luis J Montaner; Laszlo Otvos
Journal: Eur J Biochem Date: 2002-09

9. PDBTM: Protein Data Bank of transmembrane proteins after 8 years.

Authors: Dániel Kozma; István Simon; Gábor E Tusnády
Journal: Nucleic Acids Res Date: 2012-11-30 Impact factor: 16.971

10. CAMP: Collection of sequences and structures of antimicrobial peptides.

Authors: Faiza Hanif Waghu; Lijin Gopi; Ram Shankar Barai; Pranay Ramteke; Bilal Nizami; Susan Idicula-Thomas
Journal: Nucleic Acids Res Date: 2013-11-21 Impact factor: 16.971
