Aram Avila-Herrera, Katherine S Pollard.
Abstract
BACKGROUND: When biomolecules physically interact, natural selection operates on them jointly. Contacting positions in protein and RNA structures exhibit correlated patterns of sequence evolution due to constraints imposed by the interaction, and molecular arms races can develop between interacting proteins in pathogens and their hosts. To evaluate how well methods developed to detect coevolving residues within proteins can be adapted for cross-species, inter-protein analysis, we used statistical criteria to quantify the performance of these methods in detecting inter-protein residues within 8 angstroms of each other in the co-crystal structures of 33 bacterial protein interactions. We also evaluated their performance for detecting known residues at the interface of a host-virus protein complex with a partially solved structure.
Year: 2015 PMID: 26303588 PMCID: PMC4549020 DOI: 10.1186/s12859-015-0677-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
List of methods benchmarked
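| Category | Method | APC | Re-weighting | Reference | Software package |
|---|---|---|---|---|---|
| Information-based | MI | No | None | [71] | infCalc |
| Information-based | VI | No | None | [65] | infCalc |
| Information-based | MIj | No | None | [8] | infCalc |
| Information-based | MIHmin | No | None | [8] | infCalc |
| Information-based | MIw | No | seq %id | [9] | DCA |
| Direct | DI | Yes | seq %id, pseudocount | [9] | DCA |
| Direct | DI256 | Yes | seq %id, pseudocount | [68] | Code S1 in [ |
| Direct | DI32 | Yes | seq %id, pseudocount | [68] | Code S1 in [ |
| Direct | DIplm | Yes | seq %id | [72] | plmDCA |
| Direct | PSICOV | Yes | Blosum, pseudocount | [14] | PSICOV |
| Phylogenetic | CMPcor | No | Downsampling | [10] | CoMap |
| Phylogenetic | CMPchg | No | Downsampling | [2] | CoMap |
| Phylogenetic | CMPvol | No | Downsampling | [2] | CoMap |
| Phylogenetic | CMPpol | No | Downsampling | [2] | CoMap |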
Coevolution methods benchmarked fall into three categories. Information-based methods: MI, mutual information [71]; VI, variation of information [65]; MIj, MI divided by alignment column-pair entropy; MIHmin, MI divided by minimum column entropy [8]; MIw, MI with adjusted amino acid probabilities. Direct methods: DI, direct information (MI with re-estimated joint probabilities) [9]; DI256 and DI32, DI using Hopfield-Potts for dimensional reduction (256 and 32 patterns, respectively) [68]; DIplm, Frobenius norm of coupling matrices in a 21-state Potts model using pseudolikelihood maximization [72]; PSICOV, sparse inverse covariance estimation [14]. Phylogenetic methods (CoMap P-values for four analyses): CMPcor, substitution correlation analysis [10]; CMPpol, polarity compensation; CMPchg, charge compensation; CMPvol, volume compensation [2].
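As a rough illustration of the information-based scores defined above, the following sketch computes MI, VI, MIj, and MIHmin for a single pair of alignment columns. It assumes plain plug-in frequency estimates, base-2 logarithms, and that gaps are treated as ordinary symbols; the function names are hypothetical, and the benchmarked infCalc package may handle these details differently.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def information_scores(col_i, col_j):
    """Plug-in MI, VI, MIj, and MIHmin for two equal-length alignment columns.

    Assumption: frequencies are raw counts with no pseudocounts or
    sequence re-weighting.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must contain the same number of sequences")
    h_i = entropy(col_i)
    h_j = entropy(col_j)
    h_ij = entropy(list(zip(col_i, col_j)))  # column-pair (joint) entropy
    mi = h_i + h_j - h_ij                    # mutual information
    vi = h_ij - mi                           # variation of information
    h_min = min(h_i, h_j)
    return {
        "MI": mi,
        "VI": vi,
        "MIj": mi / h_ij if h_ij > 0 else 0.0,       # MI / joint entropy
        "MIHmin": mi / h_min if h_min > 0 else 0.0,  # MI / min column entropy
    }


# Toy columns: residues at two positions across four paired sequences.
covarying = information_scores("AAGG", "LLVV")
independent = information_scores("AAGG", "LVLV")
```

In this toy case the perfectly covarying columns reach the maximum normalized scores, while the independent columns have zero MI. Note that VI is a distance, so lower values indicate stronger covariation.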
TP: True positive, FP: False positive, TN: True negative, FN: False negative
| Distance | Predicted coevolving | Predicted not coevolving |
|---|---|---|
| <8 Å | TP | FN |
| ≥8 Å | FP | TN |
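The performance measures used in the figure captions (power/TPR, precision/PPV, and FPR) follow from these four counts by their standard definitions. A minimal sketch, assuming each residue pair comes with a boolean prediction and an inter-residue distance in angstroms; the function names and toy values are illustrative only.

```python
def confusion_counts(predicted, distances, cutoff=8.0):
    """Count TP, FP, TN, FN for residue pairs.

    A pair is a true contact when its distance is below `cutoff` (angstroms).
    """
    tp = fp = tn = fn = 0
    for is_predicted, distance in zip(predicted, distances):
        in_contact = distance < cutoff
        if is_predicted and in_contact:
            tp += 1
        elif is_predicted:
            fp += 1
        elif in_contact:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Standard power (TPR), precision (PPV), and false positive rate (FPR)."""
    nan = float("nan")
    return {
        "TPR": tp / (tp + fn) if tp + fn else nan,
        "PPV": tp / (tp + fp) if tp + fp else nan,
        "FPR": fp / (fp + tn) if fp + tn else nan,
    }


# Toy data: five residue pairs with predictions and distances in angstroms.
predicted = [True, True, False, False, True]
distances = [5.0, 12.0, 6.5, 20.0, 7.9]
counts = confusion_counts(predicted, distances)
metrics = rates(*counts)
```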
Fig. 1 Coevolution statistics differ in their ability to detect residue contacts in HisKA-RR sub-alignments. Direct methods benefit from larger, more diverse alignments. Left: Precision (PPV) at false positive rate (FPR) < 0.1 %. Right: Power (TPR) at false positive rate (FPR) < 5 %. Blue lines indicate a loess fit to each method; 95 % confidence intervals are shown in gray. See Abbreviations and Table 1 for abbreviations
Fig. 2 Detecting coevolving alignment columns is easier when individual alignment columns have similar levels of variation. Column pairs in the HisKA-RR sub-alignments are partitioned according to above- or below-median entropy for each alignment size (number of sequences: N). Left: Median precision (PPV) at FPR < 0.1 %. Right: Median power (TPR) at FPR < 5 %. See Abbreviations and Table 1 for abbreviations
Fig. 3 Null distributions for coevolution statistics differ in their control of the false positive rate (FPR). Nominal FPRs for a target FPR of 0.1 % (dashed orange line) are shown for the HisKA-RR sub-alignments. Left: Nominal FPRs using the empirical distribution of score ranks as the null distribution (i.e. using P ). Right: Nominal FPRs assuming standardized scores have a standard normal null distribution (i.e. using P ). Blue lines indicate a loess fit for each method; 95 % confidence intervals are shown in gray. See Abbreviations and Table 1 for abbreviations
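The two null distributions contrasted in Fig. 3 can be sketched as follows. This is only an illustration under stated assumptions: higher scores are taken to mean stronger coevolution, the rank-based P-value is the fraction of scores at least as large as the observed one, and standardization uses the population standard deviation. The exact definitions used in the study are not visible in this record, and the function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def empirical_pvalues(scores):
    """Rank-based P-values: fraction of all scores at least as large as each score."""
    n = len(scores)
    return [sum(1 for other in scores if other >= s) / n for s in scores]


def normal_pvalues(scores):
    """Upper-tail P-values assuming standardized scores are standard normal."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance and cannot be standardized")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [1.0 - std_normal.cdf((s - mu) / sd) for s in scores]


# Toy scores for five column pairs; the last one is an outlier.
scores = [0.1, 0.2, 0.3, 0.4, 5.0]
p_empirical = empirical_pvalues(scores)
p_normal = normal_pvalues(scores)
```

With only five scores, the rank-based P-value cannot fall below 1/5, whereas the normal approximation assigns the outlier a much smaller P-value. This shows how the choice of null distribution changes which pairs pass a fixed threshold.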
Important residues for the Vif-A3G interaction
| Protein | Position | Notes |
|---|---|---|
| Vif | 21–23, 26 | A3G-specific |
| Vif | 30 | A3G-specific |
| Vif | 40–44 | A3G-specific |
| Vif | 55–72 | A3G and A3F |
| A3G | 121–149 | essential for Vif-binding |
HIV1 Vif [56–59]. Human A3G [60, 61]
Fig. 4 Power (TPR), precision (PPV), and false positive rate (FPR) for predicting residues (not pairs) of the antiviral protein A3G that are essential for interacting with its viral antagonist Vif, at P < α thresholds that maximize PPV for each coevolution method. Residues defined as positive are taken from the previous functional mutation studies listed in Table 3. See Abbreviations and Table 1 for abbreviations
Fig. 5 A3G residues currently known to be essential for binding its viral antagonist Vif. Predictions of residues that coevolve with Vif (red), made at a threshold that maximizes precision (PPV) with respect to the currently known essential residues, identify position D130, which was previously implicated in species-specific resistance
Versions and sources of coevolution methods benchmarked
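| Category | Method | Software package | Version |
|---|---|---|---|
| Information-based | MI | infCalc | v0.1.2 |
| Information-based | VI | infCalc | v0.1.2 |
| Information-based | MIj | infCalc | v0.1.2 |
| Information-based | MIHmin | infCalc | v0.1.2 |
| Information-based | MIw | DCA | “2011/12” |
| Direct | DI | DCA | “2011/12” |
| Direct | DI256 | Code S1 in [ | “2013” |
| Direct | DI32 | Code S1 in [ | “2013” |
| Direct | DIplm | plmDCA | symmetric_v2 |
| Direct | PSICOV | PSICOV | V1.09 |
| Phylogenetic | CMPcor | CoMap | 1.5.1b5 |
| Phylogenetic | CMPchg | CoMap | 1.5.1b5 |
| Phylogenetic | CMPvol | CoMap | 1.5.1b5 |
| Phylogenetic | CMPpol | CoMap | 1.5.1b5 |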