Chyn Liaw, Chun-Wei Tung, Shinn-Ying Ho.
Abstract
Antibody amyloidogenesis, the aggregation of soluble proteins into amyloid fibrils, is one of the major causes of failure of humanized antibodies. Predicting and preventing antibody amyloidogenesis helps restore and enhance therapeutic effects. The existing method establishes an individual model for each known germline; because the number of possible germlines is large, it is not practical for predicting sequences of novel germlines. This study proposes the first automatic, across-germline prediction method (named AbAmyloid) capable of predicting antibody amyloidogenesis from sequences. Since amyloidogenesis is determined by the whole sequence of an antibody rather than by germline-dependent properties such as mutated residues, this study assesses three types of germline-independent sequence features (amino acid composition, dipeptide composition and physicochemical properties). AbAmyloid, using a Random Forests classifier with dipeptide composition, performs well on a data set of 12 germlines. The within- and across-germline prediction accuracies are 83.10% and 83.33%, respectively, using Jackknife tests, and the novel-germline prediction accuracy using a leave-one-germline-out test is 72.22%. A thorough analysis of sequence features is conducted to identify informative properties that provide further insight into antibody amyloidogenesis. The identified informative physicochemical properties include amphiphilicity, hydrophobicity, reverse turn, helical structure, isoelectric point, net charge, mutability, coil, turn, linker and nuclear protein. Additionally, the numbers of ubiquitylation sites in amyloidogenic and non-amyloidogenic antibodies are found to be significantly different, which reveals that antibodies less likely to be ubiquitylated tend to be amyloidogenic.
The AbAmyloid method, capable of automatically predicting antibody amyloidogenesis of novel germlines, is implemented as a publicly available web server at http://iclab.life.nctu.edu.tw/abamyloid.
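The two composition-based features named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions (amino acid composition as the 20 residue frequencies, dipeptide composition as the 400 frequencies of adjacent residue pairs); the function names, the handling of non-standard residues and the toy sequence are illustrative assumptions and are not taken from the AbAmyloid implementation.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return 20 residue frequencies; non-standard letters are ignored (assumption)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 frequencies of adjacent residue pairs, ordered AA, AC, ..., YY."""
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts.values()]


# Toy sequence fragment, purely illustrative.
features = dipeptide_composition("DIQMTQSPSSLSASVGDRVTITC")
print(len(features))  # 400
```

Both vectors have a fixed length regardless of sequence length or germline, which is what makes them usable as germline-independent classifier inputs.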
Year: 2013 PMID: 23308169 PMCID: PMC3538782 DOI: 10.1371/journal.pone.0053235
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Performance comparison of three general methods.
| Method | Sensitivity | Specificity | Accuracy (%) | AUC |
| --- | --- | --- | --- | --- |
|  | 1.000 | 0.000 | 56.94 | 0.528 |
|  | 0.911 | 0.059 | 54.40 | 0.519 |
|  | 0.886 | 0.156 | 57.18 | 0.612 |
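The sensitivity, specificity and accuracy columns follow from confusion-matrix counts. The sketch below assumes the standard definitions with amyloidogenic as the positive class; the function name is hypothetical. As a check, it uses the class sizes from the AA-432 dataset table (246 amyloidogenic, 186 non-amyloidogenic): a predictor that labels every sequence amyloidogenic gives sensitivity 1.000, specificity 0.000 and accuracy 56.94%, which is consistent with the first row of the table, although reading that row this way is an inference from the numbers.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy (%) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Every one of the 432 sequences is predicted amyloidogenic (positive).
sens, spec, acc = binary_metrics(tp=246, fn=0, tn=0, fp=186)
print(f"{sens:.3f} {spec:.3f} {acc:.2f}")  # 1.000 0.000 56.94
```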
Figure 1. The receiver operating characteristic (ROC) curves of three general methods.
Figure 2. Three evaluation methods for AbAmyloid.
(A) There are 12 individual germline models. Each model is evaluated using a Jackknife test. (B) Only one model is constructed using a dataset of 12 germlines (AA-432). (C) The leave-one-germline-out test is applied to evaluate the novel-germline prediction of AbAmyloid, where the dataset of each germline serves in turn as the test dataset of a novel germline.
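The leave-one-germline-out protocol in (C) can be sketched as a grouped split in which each germline is held out once as the test set while the remaining germlines form the training set. The sample layout (germline, features, label), the function name and the toy records are assumptions made for illustration only.

```python
from collections import defaultdict


def leave_one_germline_out(samples):
    """Yield (held_out_germline, train, test) for each germline in turn.

    `samples` is a list of (germline, features, label) tuples (assumed layout).
    """
    by_germline = defaultdict(list)
    for sample in samples:
        by_germline[sample[0]].append(sample)
    for held_out in sorted(by_germline):
        test = by_germline[held_out]
        train = [s for g, group in by_germline.items() if g != held_out for s in group]
        yield held_out, train, test


# Toy records with hypothetical germline names.
toy = [("g1", [0.1], 1), ("g1", [0.2], 0), ("g2", [0.3], 1), ("g3", [0.4], 0)]
for germline, train, test in leave_one_germline_out(toy):
    print(germline, len(train), len(test))
```

Because no sequence from the held-out germline is seen during training, the resulting accuracy estimates performance on a germline that is new to the model.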
Performance comparison among various types of sequence features and methods for the within-germline prediction in terms of Jackknife test accuracy.
| Method | Sensitivity | Specificity | Accuracy (%) |
| --- | --- | --- | --- |
|  | 0.813 | 0.742 | 78.24 |
|  | 0.829 | 0.833 | 83.10 |
|  | 0.825 | 0.694 | 76.85 |
|  | 0.829 | 0.801 | 81.71 |
|  | 0.850 | 0.720 | 79.40 |
|  | 0.850 | 0.747 | 80.56 |
|  | 0.850 | 0.753 | 80.79 |
|  | 0.756 | 0.823 | 78.47 |
Performance comparison among various types of sequence features and methods for the across-germline prediction in terms of Jackknife test accuracy.
| Method | Sensitivity | Specificity | Accuracy (%) |
| --- | --- | --- | --- |
|  | 0.846 | 0.731 | 79.63 |
|  | 0.854 | 0.806 | 83.33 |
|  | 0.846 | 0.720 | 79.17 |
|  | 0.854 | 0.801 | 83.10 |
|  | 0.854 | 0.731 | 80.09 |
|  | 0.870 | 0.774 | 82.87 |
|  | 0.858 | 0.763 | 81.71 |
Performance comparison among various types of sequence features and methods for the across-germline prediction using 10-fold cross-validation.
| Method | Sensitivity | Specificity | Accuracy (%) |
| --- | --- | --- | --- |
|  | 0.846 | 0.742 | 80.09 |
|  | 0.862 | 0.823 | 84.49 |
|  | 0.837 | 0.704 | 78.01 |
|  | 0.870 | 0.801 | 84.03 |
|  | 0.837 | 0.715 | 78.47 |
|  | 0.862 | 0.780 | 82.64 |
|  | 0.862 | 0.785 | 82.87 |
Figure 3. Histogram and percentages for sequence pairs with sequence identities between training and test datasets.
Figure 4. Histogram for sequence pairs with sequence identities between amyloidogenic and non-amyloidogenic sequences.
Performance comparison among various types of sequence features and methods for the novel-germline prediction.
| Method | Sensitivity | Specificity | Accuracy (%) |
| --- | --- | --- | --- |
|  | 0.626 | 0.581 | 60.65 |
|  | 0.785 | 0.640 | 72.22 |
|  | 0.675 | 0.484 | 59.26 |
|  | 0.768 | 0.634 | 71.06 |
|  | 0.671 | 0.522 | 60.65 |
|  | 0.732 | 0.629 | 68.75 |
|  | 0.699 | 0.602 | 65.74 |
|  | 0.411 | 0.645 | 51.16 |
The novel-germline prediction performances for 12 germlines using dipeptide composition (DPC).
| Germline | Sensitivity | Specificity | Accuracy (%) |
| --- | --- | --- | --- |
|  | 1.000 | 0.600 | 73.91 |
|  | 0.333 | 0.900 | 68.75 |
|  | 0.875 | 0.684 | 74.07 |
|  | 0.848 | 0.375 | 69.39 |
|  | 0.579 | 1.000 | 75.76 |
|  | 1.000 | 0.000 | 35.71 |
|  | 1.000 | 0.000 | 56.67 |
|  | 0.118 | 1.000 | 34.78 |
|  | 0.400 | 1.000 | 78.57 |
|  | 0.962 | 0.588 | 81.40 |
|  | 0.943 | 0.444 | 77.36 |
|  | 1.000 | 0.853 | 94.05 |
Figure 5. Learning curves using various numbers of germlines for training classifiers.
The prediction performance is evaluated by using a leave-one-germline-out test.
Figure 6. Feature importance of Amino Acid Composition (AAC).
The feature with the largest value of mean decrease of Gini index (MDGI) is the most important.
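As background for the MDGI ranking, the sketch below computes the Gini impurity of a two-class node and the impurity decrease produced by a single split. In a random forest, a feature's mean decrease of Gini index accumulates such decreases over the nodes that split on that feature and averages them over the trees. The function names and the example split are hypothetical; only the class sizes 246 and 186 come from the AA-432 dataset.

```python
def gini(pos, neg):
    """Gini impurity of a two-class node: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def gini_decrease(left, right):
    """Impurity decrease of one split; `left` and `right` are (pos, neg) counts."""
    (lp, ln), (rp, rn) = left, right
    n_left, n_right = lp + ln, rp + rn
    n = n_left + n_right
    parent = gini(lp + rp, ln + rn)
    children = (n_left / n) * gini(lp, ln) + (n_right / n) * gini(rp, rn)
    return parent - children


# Root node holding all of AA-432, then a hypothetical split of those samples.
print(gini(246, 186))
print(gini_decrease(left=(200, 50), right=(46, 136)))
```

A split that separates the two classes well yields a large decrease, so features that are repeatedly chosen for such splits receive high MDGI values.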
Figure 7. Feature importance of Dipeptide Composition (DPC).
The feature with the largest value of mean decrease of Gini index (MDGI) is the most important.
Figure 8. The heatmap of DPC feature importance.
Figure 9. Feature importance of Physicochemical Properties (PPs).
The feature with the largest value of mean decrease of Gini index (MDGI) is the most important.
Top 30 informative physicochemical properties.
| AAindex ID | Description |
| --- | --- |
|  | Amphiphilicity index (Mitaku et al., 2002) |
|  | Isoelectric point (Zimmerman et al., 1968) |
|  | AA composition of CYT of multi-spanning proteins (Nakashima-Nishikawa, 1992) |
|  | Linker propensity from 3-linker dataset (George-Heringa, 2003) |
|  | Optimized transfer energy parameter (Oobatake et al., 1985) |
|  | Optimized propensity to form reverse turn (Oobatake et al., 1985) |
|  | Average relative fractional occurrence in ER(i) (Rackovsky-Scheraga, 1982) |
|  | Composition of amino acids in nuclear proteins (percent) (Cedano et al., 1997) |
|  | Normalized relative frequency of helix end (Isogai et al., 1980) |
|  | Transfer free energy, CHP/water (Lawson et al., 1984) |
|  | Signal sequence helical potential (Argos et al., 1982) |
|  | Fraction of site occupied by water (Krigbaum-Komoriya, 1979) |
|  | Net charge (Klein et al., 1984) |
|  | alpha-NH chemical shifts (Bundi-Wuthrich, 1979) |
|  | Average reduced distance for C-alpha (Meirovitch et al., 1980) |
|  | Membrane-buried preference parameters (Argos et al., 1982) |
|  | SD of AA composition of total proteins (Nakashima et al., 1990) |
|  | Normalized frequency of zeta R (Maxfield-Scheraga, 1976) |
|  | Side chain orientational preference (Rackovsky-Scheraga, 1977) |
|  | Relative mutability (Dayhoff et al., 1978b) |
|  | Average reduced distance for C-alpha (Rackovsky-Scheraga, 1977) |
|  | Information measure for middle turn (Robson-Suzuki, 1976) |
|  | Partition energy (Guy, 1985) |
|  | Residue accessible surface area in folded protein (Chothia, 1976) |
|  | Weights for alpha-helix at the window position of 6 (Qian-Sejnowski, 1988) |
|  | Average relative fractional occurrence in AL(i) (Rackovsky-Scheraga, 1982) |
|  | Weights for alpha-helix at the window position of 1 (Qian-Sejnowski, 1988) |
|  | Surface composition of amino acids in nuclear proteins (percent) (Fukuchi-Nishikawa, 2001) |
|  | Helix termination parameter at position j-2,j-1,j (Finkelstein et al., 1991) |
|  | Intercept in regression analysis (Prabhakaran-Ponnuswamy, 1982) |
Categorization of the top 30 informative physicochemical properties.
| Categorized property | AAindex ID | References |
| --- | --- | --- |
|  | MITS020101, NAKH920106 |  |
|  | OOBM850103, ARGP820103, LAWE840101, CHOC760102, GUYH850101, KRIW790102, RACS770103 |  |
|  | QIAN880113, QIAN880108, BUNA790101, ARGP820102, ISOY800106 |  |
|  | RACS820106, ROBB760110 | new |
|  | GEOR030104 | new |
|  | OOBM850102 |  |
|  | RACS770101, PRAM820101, MEIH800101 | new |
|  | ZIMJ680104, FINA910103 |  |
|  | KLEP840101 |  |
|  | DAYM780201 | new |
|  | CEDJ970105, FUKS010104 | new |
|  | RACS820103, NAKH900102, MAXF760103 | new |
Figure 10. The 100% stacked column chart of the numbers of lysines (K) and putative ubiquitylated lysines (Ub-K).
Figure 11. Distribution of lysines for amyloidogenic and non-amyloidogenic antibodies.
Figure 12. Feature importance of 12 ubiquitylation features.
The feature with the largest value of mean decrease of Gini index (MDGI) is the most important.
Figure 13. Feature importance of 12 ubiquitylation features and Dipeptide Composition (DPC).
The feature with the largest value of mean decrease of Gini index (MDGI) is the most important.
The numbers of amyloidogenic and non-amyloidogenic antibodies in the dataset AA-432.
| Germline | Amyloidogenic | Non-Amyloidogenic |
| --- | --- | --- |
|  | 8 | 15 |
|  | 6 | 10 |
|  | 8 | 19 |
|  | 33 | 16 |
|  | 19 | 14 |
|  | 5 | 9 |
|  | 17 | 13 |
|  | 34 | 12 |
|  | 5 | 9 |
|  | 26 | 17 |
|  | 35 | 18 |
|  | 50 | 34 |
| Total | 246 | 186 |