Rob Patro, Raquel Norel, Robert J Prill, Julio Saez-Rodriguez, Peter Lorenz, Felix Steinbeck, Bjoern Ziems, Mitja Luštrek, Nicola Barbarini, Alessandra Tiengo, Riccardo Bellazzi, Hans-Jürgen Thiesen, Gustavo Stolovitzky, Carl Kingsford.
Abstract
BACKGROUND: Understanding the interactions between antibodies and the linear epitopes that they recognize is an important task in the study of immunological diseases. We present a novel computational method for the design of linear epitopes of specified binding affinity to Intravenous Immunoglobulin (IVIg).
Keywords: Antibodies; Machine learning; Protein binding; Protein design
Year: 2016 PMID: 27059896 PMCID: PMC4826543 DOI: 10.1186/s12859-016-1008-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Performance of the various classifiers used within the Pythia method
| Features | AUROC | AUPR | | |
|---|---|---|---|---|
| k-spectrum | 0.85 | 0.70 | −0.043 | −0.072 |
| Sparse Spatial Sample | 0.87 | 0.73 | −0.023 | −0.042 |
| Nonlinear Fisher Mat. | 0.86 | 0.69 | −0.024 | −0.082 |
| Statistical Analysis Mat. | 0.85 | 0.67 | −0.025 | −0.102 |
| BLOSUM Encoding | 0.86 | 0.70 | −0.024 | −0.072 |
| Local Composition (a) | 0.88 | 0.74 | −0.013 | −0.032 |
| Structure | 0.74 | 0.53 | −0.153 | −0.242 |
| Ensemble | | | | |

(a) The best single classifier under both the AUROC and AUPR metrics
Boldface indicates the best solution
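As a reminder of what the two metrics in the table measure, the following is a minimal sketch in plain Python. It is not the evaluation code used for Pythia: the function names are hypothetical, AUROC is computed through the pairwise ranking (Mann-Whitney) identity, and AUPR is approximated by average precision, which is only one of several common estimators of the area under the precision-recall curve.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count correctly ordered positive/negative pairs; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values at each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / hits


# Toy data (illustrative only): 1 = high binder, 0 = low binder.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auroc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

Both functions assume at least one positive and one negative example; the quadratic pair count in `auroc` is chosen for readability, not speed.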
Fig. 1 Quality of designed peptides from two approaches. The distribution of measured affinities for the designed peptides predicted to belong to the low (L) and high (H) binding affinity classes for (a) the Pythia-design method and (b) the method of Barbarini et al. [2]. The horizontal line at 10,000 indicates the binding affinity cutoff above which a peptide is considered to have a high binding affinity. Both methods produce a statistically significant separation of high and low binders (P < 0.001), but Pythia-design is much better at generating high-affinity binders.
Fig. 2 (a) Precision, (b) recall and (c) accuracy of peptide design. Performance is shown for Pythia-design, the method of [2] (labeled Pavia), and the aggregate formed by taking the positives from Pythia-design and the negatives from [2]. In (a), the lines for Pythia and the aggregate overlap. The x-axis δ is a measure of the activities from the high- and low-binding affinities (see text), and (d) shows the fraction of peptides excluded for a given δ.
Fig. 3 Performance of Pythia-design and the method of Barbarini et al. [2] (labeled Pavia) for designing peptides with desired reactivities. ROC curves were determined from predicted peptides incubated with IVIg (5 mg/ml) diluted 1:100, 1:400 and 1:1000, with epitope-antibody reactivities (EAR) determined as described by Luštrek et al. 2013.
Fig. 4 Diversity of the designed peptides predicted to have high and low binding affinity. The sequence diversity among the Pythia-design peptides is significantly higher than that obtained with the approach of [2]. The y-axis gives a measure of diversity of a set of designed peptides (see text) under a particular Hamming-distance threshold defining similar peptides (x-axis). Almost all the Pythia-designed peptides differ in at least 9 of their 15 possible positions.
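The diversity measure itself is defined in the main text, which is not reproduced here. As a rough illustration of the idea behind the x-axis, the sketch below computes Hamming distances between equal-length peptides and reports the fraction of peptide pairs that differ in at least a given number of positions. The function names and the pairwise-fraction statistic are assumptions made for illustration, not the measure plotted in the figure.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length")
    return sum(x != y for x, y in zip(a, b))


def fraction_dissimilar_pairs(peptides, threshold):
    """Fraction of peptide pairs with Hamming distance >= threshold.

    Hypothetical stand-in for a diversity score; not the paper's definition.
    """
    pairs = list(combinations(peptides, 2))
    if not pairs:
        return 0.0
    far = sum(hamming(a, b) >= threshold for a, b in pairs)
    return far / len(pairs)


# Toy 15-mers (illustrative only, not peptides from the study).
toy = ["AAAAAAAAAAAAAAA", "AAAAAAAAAAAAAAC", "CCCCCCCCCCCCCCC"]
print(fraction_dissimilar_pairs(toy, threshold=9))
```

With a threshold of 9 out of 15 positions, a value close to 1 would correspond to the situation described in the caption, where almost all designed peptides are far apart from one another.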
Confusion matrix for designed peptides
| | Barbarini et al. [2], Binder | Barbarini et al. [2], Nonbinder | Pythia-design, Binder | Pythia-design, Nonbinder |
|---|---|---|---|---|
| Bound (1:100) | 261 | 25 | 387 | 117 |
| Not bound (1:100) | 139 | 175 | 13 | 83 |
| Bound (1:400) | 128 | 7 | 325 | 40 |
| Not bound (1:400) | 272 | 193 | 75 | 160 |
| Bound (1:1000) | 99 | 3 | 270 | 18 |
| Not bound (1:1000) | 301 | 197 | 130 | 182 |
For each of Pythia-design and the method of Barbarini et al., 600 designed peptides (400 predicted binders and 200 predicted non-binders) were incubated with IVIg. The confusion matrix indicates that the peptides selected by Pythia-design bind antibodies with higher affinity than the peptides designed by Barbarini et al.
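The counts in the confusion matrix can be converted into precision, recall and accuracy with a few lines of arithmetic. The sketch below does so for every method and dilution. It assumes that the "Binder"/"Nonbinder" columns are the predicted classes and the "Bound"/"Not bound" rows are the measured outcomes; the class and function names are hypothetical.

```python
from typing import NamedTuple


class Confusion(NamedTuple):
    tp: int  # predicted binder, measured bound
    fn: int  # predicted nonbinder, measured bound
    fp: int  # predicted binder, measured not bound
    tn: int  # predicted nonbinder, measured not bound


def precision(c):
    return c.tp / (c.tp + c.fp)


def recall(c):
    return c.tp / (c.tp + c.fn)


def accuracy(c):
    return (c.tp + c.tn) / sum(c)


# Counts copied from the confusion matrix table, keyed by (method, dilution).
COUNTS = {
    ("Barbarini et al.", "1:100"): Confusion(tp=261, fn=25, fp=139, tn=175),
    ("Barbarini et al.", "1:400"): Confusion(tp=128, fn=7, fp=272, tn=193),
    ("Barbarini et al.", "1:1000"): Confusion(tp=99, fn=3, fp=301, tn=197),
    ("Pythia-design", "1:100"): Confusion(tp=387, fn=117, fp=13, tn=83),
    ("Pythia-design", "1:400"): Confusion(tp=325, fn=40, fp=75, tn=160),
    ("Pythia-design", "1:1000"): Confusion(tp=270, fn=18, fp=130, tn=182),
}

for (method, dilution), c in COUNTS.items():
    print(f"{method:17s} {dilution:7s} "
          f"precision={precision(c):.3f} recall={recall(c):.3f} "
          f"accuracy={accuracy(c):.3f}")
```

Under this reading of the table, Pythia-design has the higher precision at every dilution (for example 387/400 versus 261/400 at 1:100), while the method of Barbarini et al. has the higher recall.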
Fig. 5 Performance of a method that combines Pythia-design and Barbarini et al. [2] (labeled Pavia)
The presence of citrulline and cysteine in the designed peptides and the training and test sets
| | Total | With Z | With C | With Z and C |
|---|---|---|---|---|
| Training set High | 3420 | 8 | 906 | 3 |
| Training set Low | 10218 | 326 | 1971 | 71 |
| Test set High | 3421 | 5 | 944 | 3 |
| Test set Low | 10219 | 356 | 2032 | 93 |
| Pythia-design “H” | 1500 | 1286 | 1302 | 1093 |
| Pythia-design “M” | 3000 | 2885 | 2365 | 2253 |
| Pythia-design “L” | 1500 | 1484 | 1044 | 1029 |
| Pythia-design “H” tested | 400 | 318 | 344 | 265 |
| Pythia-design “L” tested | 200 | 185 | 150 | 136 |
| Barbarini et al. [ | 1100 | 0 | 420 | 0 |
| Barbarini et al. [ | 1100 | 0 | 628 | 0 |
| Barbarini et al. [ | 400 | 0 | 196 | 0 |
| Barbarini et al. [ | 200 | 0 | 110 | 0 |
The prevalence of citrulline (Z) and cysteine (C) in the designed peptides is likely because these residues were less represented in the peptides used in the training data sets, allowing designed peptides that contain them to satisfy the imposed diversity requirements more easily.
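The columns of the table above are simple presence counts over a peptide set. A minimal sketch of that tally is given below, assuming peptides are one-letter strings in which Z denotes citrulline and C denotes cysteine; the function name and the toy peptides are hypothetical.

```python
def residue_presence(peptides):
    """Count peptides containing Z (citrulline), C (cysteine), or both."""
    counts = {"total": 0, "with_Z": 0, "with_C": 0, "with_Z_and_C": 0}
    for peptide in peptides:
        has_z = "Z" in peptide
        has_c = "C" in peptide
        counts["total"] += 1
        counts["with_Z"] += has_z
        counts["with_C"] += has_c
        counts["with_Z_and_C"] += has_z and has_c
    return counts


# Toy peptides (illustrative only, not sequences from the study).
toy_peptides = ["AZKLC", "GGCWT", "ZZRRK", "PLMNQ"]
print(residue_presence(toy_peptides))
```

Dividing each count by `total` gives the per-set fractions; for example, the table reports 318 of the 400 tested Pythia-design "H" peptides as containing Z.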
Fig. 6 Position-specific peptide propensity within true positive Pythia-design peptides (at dilution 1:1000) divided by the PWM of the negative set of peptides. PWM segments in red indicate amino acids that are predicted to interfere with antibody binding. Green highlights amino acids that favor binding of antibodies present in IVIg. An over-representation of cysteine (C) and tryptophan (W) in all positions is seen.
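The ratio of two position weight matrices described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to draw the figure: the 21-letter alphabet (20 standard amino acids plus Z for citrulline), the pseudocount, the log2 scaling and the function names are all choices made here for the example.

```python
import math

# Assumed alphabet: 20 standard amino acids plus Z for citrulline.
ALPHABET = "ACDEFGHIKLMNPQRSTVWYZ"


def pwm(peptides, pseudocount=1.0):
    """Per-position residue frequencies for equal-length peptides."""
    length = len(peptides[0])
    matrix = []
    for pos in range(length):
        # The pseudocount keeps unseen residues from producing zero frequencies.
        counts = {aa: pseudocount for aa in ALPHABET}
        for peptide in peptides:
            counts[peptide[pos]] += 1
        total = sum(counts.values())
        matrix.append({aa: n / total for aa, n in counts.items()})
    return matrix


def log_ratio(positive, negative, pseudocount=1.0):
    """log2 of the positive-set PWM divided by the negative-set PWM."""
    fg = pwm(positive, pseudocount)
    bg = pwm(negative, pseudocount)
    return [{aa: math.log2(f[aa] / b[aa]) for aa in ALPHABET}
            for f, b in zip(fg, bg)]


# Toy 2-mers (illustrative only).
toy_pos = ["CW", "CW", "CA"]
toy_neg = ["AA", "AA", "AW"]
scores = log_ratio(toy_pos, toy_neg)
print(scores[0]["C"], scores[0]["A"])
```

On this scale, positive values mark residues enriched among the true positives (the green segments in the figure) and negative values mark residues enriched in the negative set (the red segments).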