Miguel Andrade, Camila Pontes, Werner Treptow.
Abstract
Here, we investigate the contributions of coevolutive, evolutive and stochastic information in determining protein-protein interactions (PPIs) based on the primary sequences of two interacting protein families A and B. Specifically, under the assumption that coevolutive information is imprinted on the interacting amino acids of two proteins, in contrast to other (evolutive and stochastic) sources spread over their sequences, we dissect those contributions in terms of compensatory mutations at physically coupled and uncoupled amino acids of A and B. We find that physically coupled amino acids at short-range distances store the largest per-contact mutual information content, with a significant fraction of that content resulting from coevolutive sources alone. The information stored in coupled amino acids is further shown to discriminate multi-sequence alignments (MSAs) with the largest expectation fraction of PPI matches, a conclusion that holds against various definitions of intermolecular contacts and binding modes. When compared to the informational content resulting from evolution at long-range interactions, the mutual information in physically coupled amino acids is the strongest signal to distinguish PPIs derived from cospeciation and, likely, the unique indication in the case of molecular coevolution in independent genomes, as the evolutive information must vanish for uncorrelated proteins.
Keywords: Coevolution; Evolution; Mutual information; Protein network; Protein-protein interaction
Year: 2019 PMID: 31871588 PMCID: PMC6906720 DOI: 10.1016/j.csbj.2019.10.005
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Scheme 1. Structural contacts mapped into M-long multi-sequence alignment (MSA) of protein interologs A and B. A set of pairwise protein-protein interactions is defined by associating each sequence l in MSA B to a sequence k in MSA A in one unique arrangement, {l(k)|z}, determined by the coevolution process z to which these protein families were subjected. Shown is a “scrambled” concatenated MSA of A and B associated to a given process z (red dashes). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
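The pairing arrangement l(k) described in the scheme can be pictured as a permutation of the rows of MSA B. The following is a minimal sketch, not the authors' implementation: the function names (`scramble_pairing`, `fraction_of_matches`, `concatenate`) and the convention that the native arrangement is l(k) = k are assumptions made for illustration.

```python
import numpy as np

def scramble_pairing(M, n_native, rng):
    """Return l(k): the row of MSA B paired with row k of MSA A.

    Assumption: the native arrangement is l(k) = k. n_native randomly chosen
    rows keep their native partner; the other M - n_native rows are shuffled
    among themselves (a shuffled row may land on its native partner by chance).
    """
    pairing = np.arange(M)
    scrambled = rng.choice(M, size=M - n_native, replace=False)
    pairing[scrambled] = rng.permutation(scrambled)
    return pairing

def fraction_of_matches(pairing):
    """Fraction of rows whose partner is the native one."""
    return float(np.mean(pairing == np.arange(len(pairing))))

def concatenate(msa_a, msa_b, pairing):
    """Concatenate MSA A with the rows of MSA B reordered by the pairing."""
    return np.concatenate([msa_a, msa_b[pairing]], axis=1)

rng = np.random.default_rng(0)
msa_a = np.array([list("ACD"), list("AEF"), list("GCD"), list("GEF")])
msa_b = np.array([list("KL"), list("KM"), list("RL"), list("RM")])
pairing = scramble_pairing(len(msa_a), n_native=2, rng=rng)
joint_msa = concatenate(msa_a, msa_b, pairing)
```

Setting `n_native = M` reproduces the native concatenated alignment, while `n_native = 0` gives a fully scrambled model.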
Protein systems A and B considered in the study.
| Complex type | Complex description | PDB ID | Protein A | Protein B | M | MSA length |
|---|---|---|---|---|---|---|
| Obligate dimer | Carbamoyl phosphate synthetase | 1BXR | | | 1004 | 1452 |
| Obligate dimer | Lactococcus lactis dihydroorotate dehydrogenase B | 1EP3 | | | 552 | 572 |
| Obligate dimer | Polysulfide reductase native structure | 2VPZ | | | 676 | 927 |
| Obligate dimer | Heterohexameric TusBCD proteins | 2D1P | | | 216 | 214 |
| Obligate dimer | 3-oxoadipate CoA-transferase | 3RRL | | | 1330 | 437 |
| Obligate dimer | Bovine heart cytochrome | 2Y69 | | | 1484 | 740 |
| Non-obligate dimer | Toxin-antitoxin complex RelBE2 from Mycobacterium tuberculosis | 3G5O | | | 904 | 173 |
Fig. 1. Informational analysis of protein complex TusBCD, chains B and C. (A) Three-dimensional representation of stochastic variables X and Y as defined from physically coupled amino acids at short-range cutoff distances r ≤ 8.0 Å (turquoise) and physically uncoupled amino acids at long-range cutoff distances r > 8.0 Å (gray). Calculation of r involved Cβ-Cβ atomic separation distances. (B) Conditional mutual information as a function of the number M − n of randomly paired proteins in the reference (native) MSA, for 0 ≤ n ≤ M. ⟨I(X; Y|z)⟩ are expectation values estimated from a generated ensemble of 500 MSA models. Mutual information of fully “scrambled” models featuring M unpaired sequences is similar to that calculated from randomized sequence alignments generated by random swapping of lines within columns. (C) Mutual information gap ΔI_M between reference and 100 fully “scrambled” models featuring M unpaired sequences. (D) Per-contact mutual information gap N⁻¹ΔI. (E) Mutual information decomposition according to Eq. (11) and comparison with functional mutual information (MIp,rc≤8) and direct information (DI≤8). In B, C, D and E, error bars correspond to standard deviations.
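To make the quantities in panels C and D concrete, here is a minimal sketch of a per-contact mutual information gap between a native alignment and fully scrambled models. It rests on assumptions not confirmed by this record: mutual information is estimated with the plug-in (frequency-count) formula in nats, the information of the variables X and Y is approximated as a sum over contact column pairs, and the native pairing is the identity. The names `column_mi`, `summed_mi` and `per_contact_gap` are hypothetical.

```python
import numpy as np
from collections import Counter

def column_mi(col_x, col_y):
    """Plug-in mutual information (nats) between two aligned MSA columns."""
    M = len(col_x)
    joint = Counter(zip(col_x, col_y))
    px = Counter(col_x)
    py = Counter(col_y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ) written with raw counts
        mi += (c / M) * np.log(c * M / (px[a] * py[b]))
    return mi

def summed_mi(msa_a, msa_b, pairs, pairing):
    """Sum of column MI over contact pairs (i in A, j in B) for a given pairing."""
    b = msa_b[pairing]
    return sum(column_mi(msa_a[:, i], b[:, j]) for i, j in pairs)

def per_contact_gap(msa_a, msa_b, pairs, n_models=100, seed=0):
    """Mean and standard deviation of (native MI - scrambled MI) / N contacts."""
    rng = np.random.default_rng(seed)
    M = msa_a.shape[0]
    native = summed_mi(msa_a, msa_b, pairs, np.arange(M))
    scrambled = np.array([
        summed_mi(msa_a, msa_b, pairs, rng.permutation(M))
        for _ in range(n_models)
    ])
    gaps = (native - scrambled) / len(pairs)
    return gaps.mean(), gaps.std()

# Toy alignment: column 0 of A is perfectly coupled to column 0 of B.
msa_a = np.array([list("A"), list("A"), list("C"), list("C")] * 5)
msa_b = np.array([list("D"), list("D"), list("E"), list("E")] * 5)
mean_gap, std_gap = per_contact_gap(msa_a, msa_b, pairs=[(0, 0)])
```

In this toy case the native alignment carries the maximal information for a two-letter column, so scrambling can only lower it and the gap is non-negative.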
Fig. 2. Degeneracy and error analysis for stochastic variables X and Y involving interacting amino acids at short-range distances r ≤ 8.0 Å (turquoise) and long-range distances r > 8.0 Å (gray). (A) Total number ω of native-like MSA models at various mutual-information resolutions δI. (B) Per-contact gaps of mutual information N⁻¹ΔI as a function of the number M − n of “scrambled” sequence pairs in the reference native alignment. (C) Expectation values ⟨ε⟩ (Eq. (15)) for the fraction of sequence matches across native-like MSA models at various mutual-information resolutions δI. Dashed lines highlight differences at δI values of 0.01 and 0.02.
Fig. 3. Dependence on contact definition r* and docking decoys. (A) Per-contact mutual information gap N⁻¹ΔI and mutual information subtracted from structural-functional relationships MI at various r*. (B) Per-contact mutual information gap N⁻¹ΔI (turquoise), information content resulting from coevolution alone N⁻¹ΔΔI (green) and mutual information subtracted from structural or functional relationships MI (blue) at alternative interfaces generated by docking; only physically coupled amino acids as defined for r ≤ 8.0 Å were included in the calculations. Black bars represent the root-mean-square deviation (RMSD, in Å) between the native bound structure and docking decoys as generated by GRAMM-X [30]. Docking solutions were selected following a binding-energy stability criterion according to the scoring function of GRAMM; all docking decoys considered in the study are low-energy configurations despite large RMSD values relative to the native structure. (C) Illustration of four docking decoys of chain B in the protein complex TusBCD (chain C is shown in gray). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
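The two geometric ingredients of this figure, a distance-based contact definition and an RMSD between native and decoy structures, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: coordinates are taken to be Cβ positions in Å stored as NumPy arrays, the RMSD is computed without any superposition step (both structures are assumed to share a reference frame already), and the function names `contact_pairs` and `rmsd` are hypothetical.

```python
import numpy as np

def contact_pairs(cb_a, cb_b, cutoff=8.0):
    """Residue index pairs (i in A, j in B) with Cβ-Cβ distance <= cutoff (Å)."""
    dist = np.linalg.norm(cb_a[:, None, :] - cb_b[None, :, :], axis=-1)
    i, j = np.nonzero(dist <= cutoff)
    return list(zip(i.tolist(), j.tolist()))

def rmsd(coords_ref, coords_model):
    """Root-mean-square deviation between two equally sized coordinate sets.

    Assumption: no superposition is performed; both sets are in the same frame.
    """
    diff = coords_ref - coords_model
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Toy coordinates (Å): two residues per chain.
cb_a = np.array([[0.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
cb_b = np.array([[3.0, 0.0, 0.0], [0.0, 30.0, 0.0]])

short_range = contact_pairs(cb_a, cb_b, cutoff=8.0)   # contacts at r <= 8.0 Å
wider = contact_pairs(cb_a, cb_b, cutoff=17.0)        # a larger cutoff r*

# A decoy of chain B rigidly shifted by (3, 4, 0) Å relative to the native pose.
decoy_b = cb_b + np.array([3.0, 4.0, 0.0])
deviation = rmsd(cb_b, decoy_b)
```

Varying `cutoff` changes which column pairs enter the mutual information sum, which is the dependence explored in panel A.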