Yuwei Zhou, Shiyang Xie, Yue Yang, Lixu Jiang, Siqi Liu, Wei Li, Hamza Bukari Abagna, Lin Ning, Jian Huang.
Abstract
Therapeutic antibodies play a crucial role in the treatment of various diseases. However, the success rate of antibody drug development is low, partly because of unfavourable biophysical properties of antibody drug candidates such as a high aggregation tendency, which is mainly driven by hydrophobic interactions between antibody molecules. Therefore, early screening of antibody drug candidates for the risk of hydrophobic interaction is crucial. Experimental screening is laborious, time-consuming, and costly, warranting the development of efficient, high-throughput computational tools for predicting the hydrophobic interactions of therapeutic antibodies. In the present study, 131 antibodies with hydrophobic interaction experiment data were used to train a new support vector machine-based ensemble model, termed SSH2.0, to predict the hydrophobic interactions of antibodies. Feature selection was performed on CKSAAGP features by using the graph-based algorithm MRMD2.0. Based on the antibody sequence alone, SSH2.0 achieved a sensitivity of 100.00% and an accuracy of 83.97%. This approach eliminates the need for the three-dimensional structure of antibodies and enables rapid screening of therapeutic antibody candidates in the early developmental stage, thereby saving time and cost. In addition, a web server was constructed that is freely available at http://i.uestc.edu.cn/SSH2/.
Keywords: developability; hydrophobic interactions; prediction model; support vector machine; therapeutic antibody
Year: 2022 PMID: 35368659 PMCID: PMC8965096 DOI: 10.3389/fgene.2022.842127
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Three experimental thresholds for evaluating the hydrophobic interaction of antibodies (Jain et al., 2017b).
| Assay | Worst 10% threshold | Units (flag) |
|---|---|---|
| Standup monolayer adsorption chromatography (SMAC) | 12.8 | Retention time (min) (>) |
| Salt-gradient affinity-capture self-interaction nanoparticle spectroscopy (SGAC-SINS) | 370 | Salt concentration (mM) (<) |
| Hydrophobic interaction chromatography (HIC) | 11.7 | Retention time (min) (>) |
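The thresholds in the table above can be applied mechanically to a set of assay readings. The following is a minimal sketch, not the authors' code: the function name `flagged_assays` and the example measurements are hypothetical, and only the three thresholds and their flag directions come from the table.

```python
# Worst-10% thresholds from the table above (Jain et al., 2017b).
# Each entry maps an assay to (threshold, direction that raises a flag).
THRESHOLDS = {
    "SMAC": (12.8, ">"),        # retention time, min
    "SGAC-SINS": (370.0, "<"),  # salt concentration, mM
    "HIC": (11.7, ">"),         # retention time, min
}


def flagged_assays(measurements):
    """Return the assays whose value lies on the flagged side of its threshold."""
    flagged = []
    for assay, (threshold, direction) in THRESHOLDS.items():
        value = measurements[assay]
        if direction == ">" and value > threshold:
            flagged.append(assay)
        elif direction == "<" and value < threshold:
            flagged.append(assay)
    return flagged


# Hypothetical readings for illustration only (not data from the study).
example = {"SMAC": 13.5, "SGAC-SINS": 400.0, "HIC": 11.0}
flags = flagged_assays(example)
print(flags, len(flags))
```

The number of flags per antibody is then `len(flagged_assays(...))`. How flag counts map to the positive and negative classes is not stated in this excerpt, so the sketch stops at counting.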
FIGURE 1. The number of hydrophobic interaction flags and the classification of antibodies.
FIGURE 2. The ACC of different feature numbers during the sequential forward selection process of three sub-datasets (Group 1, Group 2, Group 3).
The prediction performance of three sub-models evaluated through leave-one-out cross-validation and that of the ensemble model evaluated through voting strategy.
| Model | Sn (%) | Sp (%) | ACC (%) | MCC | AUC |
|---|---|---|---|---|---|
| SSH_a | 81.08 | 80.64 | 80.88 | 0.6159 | 0.8086 |
| SSH_b | 81.08 | 74.19 | 77.94 | 0.5544 | 0.7763 |
| SSH_c | 78.37 | 71.87 | 75.36 | 0.5038 | 0.7513 |
| SSH2.0 | 100.00 | 77.66 | 83.97 | 0.7039 | 0.8883 |
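The metrics in the table above follow from a binary confusion matrix using their standard definitions. The sketch below is illustrative only. The counts passed to it (37 positives, 94 negatives, 21 false positives) are not given in this excerpt. They are an assumption, inferred as the integer split of 131 antibodies that matches the reported SSH2.0 percentages.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


# Assumed counts (inferred, not stated in the source): 37 positives, 94 negatives.
m = binary_metrics(tp=37, fn=0, tn=73, fp=21)
print(f"Sn  = {m['Sn'] * 100:.2f}%")
print(f"Sp  = {m['Sp'] * 100:.2f}%")
print(f"ACC = {m['ACC'] * 100:.2f}%")
print(f"MCC = {m['MCC']:.4f}")
```

Under these assumed counts the function returns Sn 100.00%, Sp 77.66%, ACC 83.97%, and MCC 0.7039, matching the SSH2.0 row. The reported AUC of 0.8883 also equals (Sn + Sp) / 2 for these values, which is the AUC of a classifier evaluated at a single decision threshold. Whether the AUC was computed that way is not confirmed by this excerpt.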
FIGURE 3. The ROC curves of three sub-models for predicting all 131 antibodies.
FIGURE 4. Analysis of MRMD2.0 dimensionality reduction results. (A) The reduced ratio and (B) the number of features retained for each of the three sub-datasets. The numbers in parentheses are the original feature numbers of each feature extraction algorithm.
The prediction performance of the ensemble model based on 20 feature extraction algorithms.
| Feature | Sn (%) | Sp (%) | ACC (%) | MCC | AUC |
|---|---|---|---|---|---|
| CKSAAGP | 100.00 | 77.66 | 83.97 | 0.7039 | 0.8883 |
| CTriad | 100.00 | 75.53 | 82.44 | 0.6825 | 0.8777 |
| DPC | 100.00 | 72.34 | 80.15 | 0.6518 | 0.8617 |
| TPC | 100.00 | 71.28 | 79.39 | 0.6419 | 0.8564 |
| AAC | 100.00 | 70.21 | 78.63 | 0.6322 | 0.8511 |
| CKSAAP | 100.00 | 69.15 | 77.86 | 0.6226 | 0.8457 |
| NMBroto | 97.30 | 69.15 | 77.10 | 0.5983 | 0.8322 |
| DDE | 100.00 | 65.96 | 75.57 | 0.5947 | 0.8298 |
| GTPC | 100.00 | 63.83 | 74.05 | 0.5767 | 0.8191 |
| CTDC | 97.30 | 65.96 | 74.81 | 0.5699 | 0.8163 |
| CTDT | 91.89 | 63.83 | 71.76 | 0.5021 | 0.7786 |
| CTDD | 97.30 | 56.38 | 67.94 | 0.4910 | 0.7684 |
| Geary | 100.00 | 53.19 | 66.41 | 0.4929 | 0.7660 |
| SOCNumber | 100.00 | 52.13 | 65.65 | 0.4850 | 0.7606 |
| Moran | 100.00 | 50.00 | 64.12 | 0.4693 | 0.7500 |
| QSOrder | 83.78 | 60.64 | 67.18 | 0.4003 | 0.7221 |
| KSCTriad | 100.00 | 40.43 | 57.25 | 0.4010 | 0.7021 |
| GAAC | 75.68 | 62.77 | 66.41 | 0.3464 | 0.6922 |
| GDPC | 100.00 | 30.85 | 50.38 | 0.3345 | 0.6543 |
| PAAC | 100.00 | 0.00 | 28.24 | 0.0000 | 0.5000 |
The top 10 CKSAAGP features of three sub-models. Features marked in red occur in at least two sub-models (neg: negatively charged group; pos: positively charged group).
| SSH_a | SSH_b | SSH_c |
|---|---|---|
| aromatic.uncharge.gap0 | aromatic.aliphatic.gap1 | aliphatic.pos.gap0 |
| uncharge.uncharge.gap0 | aliphatic.neg.gap3 | uncharge.aliphatic.gap4 |
| aromatic.aliphatic.gap3 | pos.aliphatic.gap2 | uncharge.aromatic.gap2 |
| pos.neg.gap0 | uncharge.uncharge.gap2 | neg.aromatic.gap5 |
| aliphatic.aromatic.gap5 | aliphatic.pos.gap0 | pos.uncharge.gap5 |
| uncharge.uncharge.gap2 | neg.uncharge.gap4 | aliphatic.uncharge.gap5 |
| pos.uncharge.gap0 | aliphatic.aromatic.gap5 | aromatic.aliphatic.gap3 |
| pos.uncharge.gap4 | neg.aliphatic.gap2 | aliphatic.uncharge.gap1 |
| neg.pos.gap2 | aromatic.uncharge.gap2 | aliphatic.aliphatic.gap2 |
| aliphatic.uncharge.gap5 | aromatic.pos.gap1 | neg.neg.gap3 |
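Feature names such as `aromatic.uncharge.gap0` follow the CKSAAGP pattern: the frequency of residue pairs, separated by a fixed gap, whose members belong to a given pair of physicochemical groups. The sketch below shows one way such features could be computed. It is not the authors' implementation. The residue-to-group assignment and the gap range 0 to 5 are assumptions based on a commonly used grouping, since this excerpt does not define them, and `cksaagp` is a hypothetical helper name.

```python
from itertools import product

# Assumed residue grouping (not defined in the source excerpt).
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "pos": "KRH",
    "neg": "DE",
    "uncharge": "STCPNQ",
}
RESIDUE_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def cksaagp(sequence, max_gap=5):
    """Frequencies of k-spaced amino acid group pairs for gaps 0..max_gap."""
    features = {}
    for gap in range(max_gap + 1):
        counts = {f"{a}.{b}.gap{gap}": 0 for a, b in product(GROUPS, repeat=2)}
        total = 0
        for i in range(len(sequence) - gap - 1):
            first = RESIDUE_TO_GROUP.get(sequence[i])
            second = RESIDUE_TO_GROUP.get(sequence[i + gap + 1])
            if first is None or second is None:
                continue  # skip pairs containing a non-standard residue
            counts[f"{first}.{second}.gap{gap}"] += 1
            total += 1
        # Normalise counts to frequencies within each gap value.
        for name, count in counts.items():
            features[name] = count / total if total else 0.0
    return features


# Toy sequence for illustration only.
feats = cksaagp("AFKDE")
print(len(feats))
print(feats["aliphatic.aromatic.gap0"])
```

With five groups and six gap values this sketch yields 5 × 5 × 6 = 150 features per sequence, from which a selection step such as MRMD2.0 would keep a subset.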
Comparison of the feature and performance between SSH2.0 and SSH.
| Model | Feature | Feature extraction method | Feature number of sub-models | Sn (%) | Sp (%) | ACC (%) | AUC |
|---|---|---|---|---|---|---|---|
| SSH | TPC | F-scores | 313, 315, 315 | 84.30 | 96.39 | 91.23 | 0.9620 |
| SSH2.0 | CKSAAGP | MRMD2.0 | 29,31,35 | 100.00 | 77.66 | 83.97 | 0.8883 |
FIGURE 5. Correlation coefficient matrix of DI and 12 experimental assays. The lower triangle shows the Spearman correlation coefficients, and the upper triangle represents the corresponding correlation values. The radius of the circles is proportional to the magnitude of the correlation coefficient. Red represents a positive correlation, and blue represents a negative correlation.
FIGURE 6. Screenshots of the SSH2.0 web server. (A) Homepage of the SSH2.0 web server. (B) If the input sequence contains illegal characters, clicking the “predict” button opens a prompt page showing “There is the illegal character!”. Users can click “submit another job.” to return to the home page and resubmit the sequence. (C) Result display page. A “1” in the “Result” column denotes that the submitted antibody candidate exhibits a high risk of hydrophobic interaction and should be excluded from the development pipeline. The “Probability” column gives the predicted probability of a high risk of hydrophobic interaction; an antibody is predicted to be high risk if this probability is 0.5 or higher. The result table can be sorted by any column, and a custom display box allows users to select and display specific information as needed.
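The input check and the 0.5 decision rule described in the Figure 6 caption can be sketched as follows. This is an illustration, not the server's code: both function names are hypothetical, and treating the 20 standard one-letter amino acid codes as the only legal characters is an assumption, since the source does not list which characters the server accepts.

```python
# Assumption: only the 20 standard amino acid letters count as legal input.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def illegal_characters(sequence):
    """Return the sorted distinct characters that are not standard residues."""
    return sorted(set(sequence.upper()) - VALID_RESIDUES)


def result_label(probability, threshold=0.5):
    """Return 1 (high risk of hydrophobic interaction) if probability >= threshold, else 0."""
    return 1 if probability >= threshold else 0


# Toy inputs for illustration only.
print(illegal_characters("EVQLVESGGX1"))
print(result_label(0.5), result_label(0.49))
```

A probability of exactly 0.5 maps to label 1, which matches the caption's "0.5 or higher" wording.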