Puneet Rawat, R Prabakaran, Sandeep Kumar, M Michael Gromiha.
Abstract
Light chain (AL) amyloidosis is caused by the aggregation of antibody light chains into amyloid fibrils. Many computational resources are available for predicting short aggregation-prone regions within proteins. However, predicting the amyloidogenic nature of a whole protein from sequence or structure information remains challenging. For antibody light chains, the common architecture and known binding sites can provide vital information for predicting amyloidogenicity under physiological conditions. In this work, we compared classical sequence-based, aggregation-related features (such as hydrophobicity, presence of gatekeeper residues, disorder, and β-propensity) calculated for the CDR, FR or VL regions of amyloidogenic and non-amyloidogenic antibody light chains, and implemented the insights gained in a machine learning-based webserver called "VLAmY-Pred" (https://web.iitm.ac.in/bioinfo2/vlamy-pred/). The model shows a prediction accuracy of 79.7% (sensitivity: 78.7%; specificity: 79.9%) with a ROC value of 0.88 on a dataset of 1828 variable region sequences of antibody light chains. This model will be helpful for improving prognosis for patients who are likely to suffer from diseases caused by light chain amyloidosis, understanding the origins of aggregation in antibody-based biotherapeutics, large-scale in silico analysis of antibody sequences generated by next-generation sequencing, and the rational engineering of aggregation-resistant antibodies.
Year: 2021 PMID: 34215782 PMCID: PMC8253744 DOI: 10.1038/s41598-021-93019-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Residue-wise Shannon entropy and occupancy plotted for consensus sequences from amyloidogenic (red) and non-amyloidogenic (blue) light chains of the complete dataset, the kappa (κ) isotype and the lambda (λ) isotype. The bar graph shows the Shannon entropy (left axis) and the line graph shows the percent occupancy (right axis). The CDR regions in the consensus sequence (x-axis) are colored yellow. Low occupancy values denote more gaps in the multiple sequence alignment.
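The two per-residue quantities in Figure 1 can be illustrated with a minimal sketch. It assumes that entropy is computed per alignment column over non-gap residues using log base 2, and that occupancy is the percentage of sequences with a residue (not a gap) in that column. The record does not state the authors' exact conventions, and the function name `column_stats` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_stats(alignment, gap="-"):
    """Return a list of (shannon_entropy_bits, percent_occupancy) per column.

    Assumption: gaps are excluded from the entropy calculation and counted
    only against occupancy. The original study may treat gaps differently.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    stats = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != gap]
        occupancy = 100.0 * len(residues) / len(alignment)
        total = len(residues)
        # H = sum over residue types of p * log2(1 / p); zero for empty columns
        entropy = sum(
            (n / total) * math.log2(total / n)
            for n in Counter(residues).values()
        )
        stats.append((entropy, occupancy))
    return stats


# Toy alignment (hypothetical, not taken from AL-Base)
toy_alignment = ["AC-", "AD-", "ACG", "ADG"]
for position, (entropy, occupancy) in enumerate(column_stats(toy_alignment), 1):
    print(f"column {position}: entropy={entropy:.2f} bits, occupancy={occupancy:.0f}%")
```

In the toy alignment, the fully conserved first column has zero entropy, the second column (two residue types in equal proportion) has an entropy of 1 bit, and the third column has 50% occupancy because half of the sequences carry a gap there.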
Figure 2. Aggregation-related features: (a) consensus hydrophobicity of the CDRs, (b) percentage of gatekeeper residues in the FRs and (c) disorder score (from the IUPred2A server), calculated for different segments of the VL region of amyloidogenic (grey) and non-amyloidogenic (white) antibodies.
Figure 3. Aggregation-related features: (a) average hydrophobicity of the CDRs, (b) percentage of gatekeeper residues in the FRs and (c) disorder score (from the IUPred2A server), calculated for the kappa (grey) and lambda (white) isotypes.
Performance of the classification model for distinguishing between amyloidogenic and non-amyloidogenic variable domains of antibody light chains.
| Performance | Accuracy (%) | Sensitivity (%) | Specificity (%) | ROC |
|---|---|---|---|---|
| Self-consistency | 81.9 | 82.4 | 81.8 | 0.9 |
| Leave one out CV | 80 | 75.1 | 81.2 | 0.83 |
| 10-fold CV | 72.7 | 78 | 71.4 | 0.82 |
| Resampling | 71.7 ± 6.9 | 77.2 ± 9.9 | 70.4 ± 9.9 | 0.82 ± 0.04 |
| Test set (AL-Test) | 71 | 65.7 | 72.3 | 0.74 |
| Test set (David et al.) | 73.2 | 77.3 | 57.7 | 0.66 |
| Novel germline prediction | 65.2 ± 11.5 | 62.2 ± 33.6 | 45.4 ± 29.4 | 0.65 ± 0.16 |
| Final model | 79.7 | 78.7 | 79.9 | 0.88 |
| CST dataset | 75.6 | – | – | – |
| Human antibody repertoire | 94.1 | – | – | – |
Standard deviations are given next to the performance measures for resampling and novel germline prediction. The final model was developed on the complete AL-Base dataset (1828 sequences). The accuracy values for the CST dataset and the human antibody repertoire represent the percentage of sequences predicted as non-amyloidogenic.
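For reference, the three percentage measures in the table can be sketched from confusion-matrix counts using their standard definitions, with amyloidogenic sequences as the positive class. That the authors used exactly these definitions is an assumption, and the function name and the counts in the example are hypothetical, not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity (all in percent).

    tp/fn: amyloidogenic sequences predicted correctly/incorrectly.
    tn/fp: non-amyloidogenic sequences predicted correctly/incorrectly.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "accuracy": 100.0 * (tp + tn) / (positives + negatives),
        "sensitivity": 100.0 * tp / positives,   # true positive rate
        "specificity": 100.0 * tn / negatives,   # true negative rate
    }


# Hypothetical counts for illustration only
print(classification_metrics(tp=80, fn=20, tn=300, fp=100))
```

For a dataset with no amyloidogenic sequences, such as a set assumed to be entirely non-amyloidogenic, only the fraction predicted as non-amyloidogenic is defined, which matches how the last two rows of the table report a single value.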
Figure 4. The receiver operating characteristic (ROC) curve plotted for the classification model using the training dataset (blue) and leave-one-out cross-validation (red).
Performance of aggregation-prone region (APR) prediction algorithms (TANGO and WALTZ), RFAmyloid and VLAmY-Pred on the AL-Base dataset (1828 sequences).
| Measure | TANGO | WALTZ | RFAmyloid | VLAmY-Pred |
|---|---|---|---|---|
| Accuracy (%) | 48.4 | 22.2 | 19 | 80 |
| Sensitivity (%) | 34.5 | 96 | 100 | 75.1 |
| Specificity (%) | 51.7 | 4.8 | 0 | 81.2 |
For TANGO and WALTZ, a sequence is considered amyloidogenic if an APR is present in the variable region of the light chain. The VLAmY-Pred results are based on leave-one-out cross-validation.
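The table can be read as a consistency check on class imbalance. RFAmyloid's 100% sensitivity and 0% specificity mean it labeled every sequence amyloidogenic, so its 19% accuracy equals the fraction of amyloidogenic sequences in the dataset. That fraction is inferred from the table here, not stated in this record. The sketch below, with a hypothetical helper name, uses it to recompute each method's accuracy as a prevalence-weighted average of sensitivity and specificity.

```python
def expected_accuracy(sensitivity, specificity, positive_fraction):
    """Accuracy implied by sensitivity and specificity at a given class balance."""
    return sensitivity * positive_fraction + specificity * (1.0 - positive_fraction)


# Assumption: ~19% of sequences are amyloidogenic, inferred from RFAmyloid's
# accuracy (a predictor that labels everything positive scores the prevalence).
POSITIVE_FRACTION = 0.19

# (sensitivity %, specificity %, reported accuracy %) copied from the table
reported = {
    "TANGO": (34.5, 51.7, 48.4),
    "WALTZ": (96.0, 4.8, 22.2),
    "RFAmyloid": (100.0, 0.0, 19.0),
    "VLAmY-Pred": (75.1, 81.2, 80.0),
}

for method, (sens, spec, accuracy) in reported.items():
    implied = expected_accuracy(sens, spec, POSITIVE_FRACTION)
    print(f"{method}: implied {implied:.1f}%, reported {accuracy}%")
```

The implied and reported accuracies agree to within about 0.1 percentage points for all four methods. This also shows why high sensitivity alone (WALTZ, RFAmyloid) yields low accuracy on a dataset dominated by non-amyloidogenic sequences.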