AmylPepPred: Amyloidogenic Peptide Prediction tool.

Smitha Sunil Kumaran Nair, N V Subba Reddy, K S Hareesha.

Abstract

We present AmylPepPred, an efficient computational architecture built on a supervised machine learning model to predict amyloid fibril-forming protein segments. The prediction model is based on bio-physio-chemical properties of primary sequences and the auto-correlation function of their amino acid indices. AmylPepPred provides a user-friendly web interface that lets researchers easily observe the fibril-forming and non-fibril-forming hexamers in a given protein sequence. We expect this strategy to help discover fibril-forming regions in proteins, and thereby support the search for therapeutic agents that specifically target these sequences to inhibit and cure amyloid diseases. AVAILABILITY: AmylPepPred is freely available for academic use at www.zoommicro.in/amylpeppred.

Keywords: AmylPepPred; Amyloid fibrils; Auto-correlation function; Bio-physio-chemical properties; Support Vector Machine

Year: 2012 PMID: 23275694 PMCID: PMC3524944 DOI: 10.6026/97320630008994

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063

Background

Amyloid fibril-forming proteins are associated with amyloid diseases. Recent experiments suggest that short fragments, rather than the whole protein, are responsible for amyloidosis [1]. The major limitations of wet-lab experimental methods are the time, cost and effort involved, so computational approaches offer a viable alternative. Several web tools are available online, such as AGGRESCAN [2], AMYLPRED [3] and FOLDAMYLOID [4], but, as evaluated in [5-7], they are limited to varying degrees in balancing true positive rates against false positive rates. AmylPepPred therefore provides an open-access platform for easy and comprehensive retrieval of short fibril-forming stretches, and aims to fill this gap in existing amyloid fibril prediction tools by keeping sensitivity and specificity in balance. The prediction model is a practical implementation of the computational architecture shown in Figure 1, which follows a purely sequence-based design strategy.
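The balance between true positive and false positive rates is usually expressed through sensitivity and specificity computed from a confusion matrix. The text does not state which summary metrics AmylPepPred reports, so the following is a minimal sketch of the standard definitions, with the Matthews correlation coefficient (MCC) added as a common single-number summary. The function name is hypothetical, and "positive" is assumed to mean a fibril-forming hexamer.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumption: 'positive' means a fibril-forming hexamer.
    """
    sensitivity = tp / (tp + fn) if tp + fn else 0.0      # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0      # true negative rate
    fpr = fp / (fp + tn) if fp + tn else 0.0              # false positive rate = 1 - specificity
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # MCC stays informative when the positive and negative classes are unbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": fpr,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Illustrative counts only; these are not results reported for AmylPepPred.
print(classification_metrics(tp=80, fp=10, tn=90, fn=20))
```

A tool that raises sensitivity by labelling most hexamers as fibril-forming pays for it in specificity. Reporting both numbers, or MCC, makes that trade-off visible.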

Methodology

The training dataset (Amylpreddataset) was compiled from proteins experimentally shown to be related to amyloidosis and from proteins with no experimentally determined amyloidogenic regions, as described in [6, 7]. Because the experimentally verified positive regions vary in length, long positive segments were broken up into smaller fragments of six amino acids to make the data uniform. From the 559 properties identified, a new and complementary set of 40 physicochemical and biochemical properties was extracted using a Memetic Algorithm, an evolutionary feature selection approach built around a Support Vector Machine (SVM). In addition, the auto-correlation function was computed along the residues of each fragment for the 5 best features, pre-selected through SVM from the AAindex database [8] (accession nos. VINM940104, ENGD860101, PRAM900101, KUHL950101 and JANJ790101). Together these form the feature vector used to train the SVM model. The overall methodology is illustrated in Figure 1. The programs are written in C#.
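The text does not spell out how fragments are cut or which auto-correlation formula is used. The sketch below assumes an overlapping six-residue sliding window (step 1) and one common normalized form, AC(d) = 1/(L-d) · Σ P(i)·P(i+d) over i = 1..L-d, where P(i) is the standardized index value of residue i and d is the lag. The Kyte-Doolittle hydropathy scale (AAindex KYTJ820101) stands in for the five AAindex properties named above, whose values are not reproduced here. All function names are hypothetical, and this is not AmylPepPred's C# implementation.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale (AAindex accession KYTJ820101). It is only a
# stand-in: the five AAindex properties named in the text are not reproduced here.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(index: dict[str, float]) -> dict[str, float]:
    """Rescale an amino acid index to mean 0 and standard deviation 1 over the 20 residues."""
    values = list(index.values())
    mu, sigma = mean(values), pstdev(values)
    return {aa: (v - mu) / sigma for aa, v in index.items()}


def hexamers(segment: str, k: int = 6) -> list[str]:
    """Break a segment into overlapping k-residue windows (a step of 1 is an assumption)."""
    return [segment[i:i + k] for i in range(len(segment) - k + 1)]


def autocorrelation(peptide: str, index: dict[str, float], max_lag: int = 3) -> list[float]:
    """Return AC(d) = 1/(L-d) * sum_{i=1}^{L-d} P(i) * P(i+d) for d = 1..max_lag."""
    values = [index[aa] for aa in peptide]
    n = len(values)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must be between 1 and len(peptide) - 1")
    return [
        sum(values[i] * values[i + d] for i in range(n - d)) / (n - d)
        for d in range(1, max_lag + 1)
    ]


# Usage: fragment an example segment, then compute one property's
# auto-correlation features for each hexamer.
scale = standardize(KYTE_DOOLITTLE)
for hexamer in hexamers("KLVFFAE"):
    print(hexamer, [round(x, 3) for x in autocorrelation(hexamer, scale)])
```

In a full pipeline, one such list would be computed per selected property and concatenated with the 40 directly selected property values to give each hexamer's feature vector. Standardizing the index first keeps properties with large numeric ranges from dominating the SVM.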

Figure 1

Flowchart illustrating the computational architecture of AmylPepPred

Software input/output

Download all the related files into the same directory and double-click the application named Hexpepfinder. Choose Finder from the menu in the main window. Browse to the input text file containing the protein sequence in FASTA format, specify an output text file, and click Run Finder. The program separates the header from the sequence and checks whether the input is valid. Wait for a pop-up window to appear. To view the output, choose Output file viewer from the menu. By selecting the appropriate radio buttons, the user can view the fibril-forming hexamers, the non-fibril-forming hexamers, or both, along with their positions.
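The input/output workflow above can be sketched as follows: parse FASTA text into header and sequence, validate the sequence, slide a six-residue window along it, and list each hexamer with its 1-based position and predicted label, filtered as the viewer's radio buttons do. This is a sketch under stated assumptions, not Hexpepfinder's code. The validity rule (only the 20 standard one-letter residue codes, at least six residues) is assumed, and `toy_predictor` is a hypothetical placeholder for the trained SVM.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")   # assumed validity rule: 20 standard residues
HYDROPHOBIC = set("VILFYWM")


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before a '>' header line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def validate(sequence: str) -> None:
    """Reject non-standard residue symbols and sequences shorter than one hexamer."""
    bad = sorted(set(sequence) - VALID_RESIDUES)
    if bad:
        raise ValueError(f"invalid residue symbols: {''.join(bad)}")
    if len(sequence) < 6:
        raise ValueError("sequence shorter than one hexamer")


def toy_predictor(hexamer: str) -> bool:
    """Hypothetical placeholder, NOT AmylPepPred's SVM: four or more hydrophobic residues."""
    return sum(aa in HYDROPHOBIC for aa in hexamer) >= 4


def scan_hexamers(sequence, is_fibril_forming):
    """Yield (1-based start position, hexamer, label) for every six-residue window."""
    for i in range(len(sequence) - 5):
        hexamer = sequence[i:i + 6]
        yield i + 1, hexamer, is_fibril_forming(hexamer)


def report(fasta_text, predictor, show="both"):
    """Mimic the output viewer: show 'fibril', 'non-fibril' or 'both' hexamers."""
    if show not in ("fibril", "non-fibril", "both"):
        raise ValueError("show must be 'fibril', 'non-fibril' or 'both'")
    lines = []
    for header, seq in parse_fasta(fasta_text):
        validate(seq)
        lines.append(f">{header}")
        for pos, hexamer, fibril in scan_hexamers(seq, predictor):
            if show == "both" or (show == "fibril") == fibril:
                label = "fibril" if fibril else "non-fibril"
                lines.append(f"{pos}\t{hexamer}\t{label}")
    return lines


print("\n".join(report(">example\nKLVFFAEDVGSNK\n", toy_predictor)))
```

Swapping `toy_predictor` for a function that builds the feature vector and calls a trained classifier would leave the parsing, validation and reporting steps unchanged.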

Conclusion

The study of protein aggregation is crucial for developing rational therapeutic strategies against amyloid diseases. Computational prediction models are a promising way to spot such deposits. Although these models cannot substitute for wet-lab experiments, they can help identify regions of concern for further molecular research. AmylPepPred provides a user-friendly, menu-driven interface that allows efficient discrimination of fibril-forming and non-fibril-forming short protein sequences.

References (8 in total)

1. AAindex: amino acid index database. S Kawashima, M Kanehisa. Nucleic Acids Res, 2000-01-01.

2. Machine learning study of classifiers trained with biophysiochemical properties of amino acids to predict fibril forming Peptide motifs. Smitha Sunil Kumaran Nair, N V Subba Reddy, K S Hareesha. Protein Pept Lett, 2012-09.

3. FoldAmyloid: a method of prediction of amyloidogenic regions from protein sequence. Sergiy O Garbuzynskiy, Michail Yu Lobanov, Oxana V Galzitskaya. Bioinformatics, 2009-12-17.

4. Exploiting heterogeneous features to improve in silico prediction of peptide status - amyloidogenic or non-amyloidogenic. Smitha Sunil Kumaran Nair, N V Subba Reddy, K S Hareesha. BMC Bioinformatics, 2011-11-30.

5. Motif mining: an assessment and perspective for amyloid fibril prediction tool.

6. AGGRESCAN: a server for the prediction and evaluation of "hot spots" of aggregation in polypeptides.

7. Prediction of amyloid fibril-forming segments based on a support vector machine.

8. Amyloidogenic determinants are usually not buried.