Jordi Gómez Borrego, Marc Torrent Burgas.
Abstract
Adhesion and colonization of host cells by pathogenic bacteria depend on protein-protein interactions (PPIs). These interactions are of pharmacological interest because molecules that inhibit host-pathogen PPIs could act as new antimicrobials. Most of these interactions are discovered using high-throughput methods that may have a high false positive rate, and the lack of curation in the resulting databases can make the available data unreliable. To address this issue, a comprehensive filtering process was developed to obtain a reliable list of domains and motifs that participate in PPIs between bacteria and human cells. From a structural point of view, our analysis revealed that the human proteins involved in these interactions are rich in alpha-helical and disordered regions and poorer in beta structure. Disordered regions in human proteins harbor short sequence motifs that are specifically recognized by certain domains in pathogenic proteins. The most relevant domain-domain interactions were validated with AlphaFold, showing that a proper analysis of host-pathogen PPI databases can reveal structurally conserved patterns. Domain-motif interactions, by contrast, were more difficult to validate because they involve unstructured regions, for which AlphaFold could not make a good prediction. Moreover, these interactions are also likely to be accommodated by post-translational modifications, especially phosphorylation, which can potentially occur in 25-50% of host proteins. Hence, while common structural patterns are involved in host-pathogen PPIs and can be retrieved from available databases, more information is required to properly infer the full interactome. Resolving these issues, in combination with new prediction tools such as AlphaFold, could enable the discovery of new classes of antimicrobials through a more detailed understanding of these interactions.
Keywords: AlphaFold; domain; host; motif; pathogen; protein interaction
Year: 2022 PMID: 36232803 PMCID: PMC9569774 DOI: 10.3390/ijms231911489
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Figure 1. Workflow followed to retrieve HP-PPIs from the PHISTO database considering enriched motifs, domains, domain–domain and domain–motif associations, and to analyze the interface regions.
Figure 2. Analysis of enriched domains in host-pathogen PPIs. (A) Bar plot representation of the 10 most enriched domains for human proteins (left). The three-dimensional structure of the most enriched human domain, IPR001101, is displayed. Representative enriched GO terms for host domains are displayed on the right. (B) Bar plot representation of the 10 most enriched observed domains for bacterial proteins (left). The three-dimensional structures of the two most enriched bacterial domains, IPR025875 and IPR019931, are also displayed. Representative enriched GO terms for bacterial domains are displayed on the right. In all cases, the GO term frequency is displayed on the x-axis and the GO term on the y-axis. Colors represent adjusted p-values for each GO term as calculated by dcGO [31].
Figure 3. Physicochemical and structural properties of human proteins participating in host–pathogen interactions. (A) Boxplot representation of major properties for human proteins in PPIs compared with five groups of proteins randomly selected from the human proteome. (B) Overall representation of all scales used in CleverMachine for evaluating protein features. For a list of all properties evaluated, see [32]. Statistical comparisons were made using the Mann-Whitney U-test. **** p ≤ 0.0001.
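The comparison named in the Figure 3 caption can be illustrated with a minimal sketch of a two-sided Mann-Whitney U-test. This is not the authors' code: it uses a normal approximation with tie correction and no continuity correction, so p-values may differ slightly from those of statistical packages, particularly for small samples. The function names and the example values are hypothetical. Only the `****` threshold (p ≤ 0.0001) comes from the caption; the other star thresholds are a common convention assumed here.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        t = j - i
        tie_term += t**3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value is tied: no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))

def significance_stars(p):
    # Only the "****" threshold is stated in the caption; the rest are assumed.
    for threshold, label in ((1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (0.05, "*")):
        if p <= threshold:
            return label
    return "ns"

# Hypothetical disorder fractions, for illustration only.
ppi_proteins = [0.42, 0.51, 0.38, 0.47, 0.55]
random_proteins = [0.21, 0.30, 0.25, 0.33, 0.28]
u_stat, p_value = mann_whitney_u(ppi_proteins, random_proteins)
print(u_stat, p_value, significance_stars(p_value))
```

In the setting described by the caption, the test would be repeated once for each of the five randomly picked protein groups.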
Figure 4. Analysis of domain–domain associations. (A) Bar graph representation of the 10 most enriched DD associations. (B) Network representation of the DD associations in the PHISTO dataset. Interactions highlighted in dark gray represent the enriched DD associations present in at least three interactions. Host proteins are identified by letters and bacterial proteins by numbers. More details on these proteins can be found in Table 1.
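The "present in at least three interactions" criterion from the Figure 4 caption can be sketched as a simple counting step. The snippet below is an illustration, not the published pipeline: it assumes each interaction has already been annotated with the InterPro domains of its host and pathogen proteins, it does not reproduce the enrichment statistic, and the function name and input records are hypothetical.

```python
from collections import Counter

def frequent_dd_associations(interactions, min_interactions=3):
    """Count host/pathogen domain pairs and keep the frequent ones.

    `interactions` is an iterable of (host_domains, pathogen_domains),
    one entry per host-pathogen PPI. A pair is counted at most once
    per interaction.
    """
    counts = Counter()
    for host_domains, pathogen_domains in interactions:
        pairs = {(h, p) for h in host_domains for p in pathogen_domains}
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_interactions}

# Hypothetical annotated interactions (the InterPro IDs are real, the records are not).
example = [
    (["IPR000626"], ["IPR029487"]),
    (["IPR000626"], ["IPR029487"]),
    (["IPR000626", "IPR001680"], ["IPR029487"]),
    (["IPR001680"], ["IPR000795"]),
]
print(frequent_dd_associations(example))
```

Counting each pair once per interaction keeps a protein with repeated copies of the same domain from inflating the tally.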
Figure 5. Analysis of enriched motifs and domain–motif associations. (A) Bar plot representation of the 10 most enriched motifs and (B) domain–motif combinations. (C) Network representation of domain–motif associations. Domains are colored in blue and motifs in green.
Table 1. InterPro identifiers and short descriptions of the host and pathogenic enriched domains, depicted in Figure 5B.
| Network Identifier (Host) | InterPro Identifier (Host) | Description | Network Identifier (Pathogen) | InterPro Identifier (Pathogen) | Description |
|---|---|---|---|---|---|
| A | IPR001715 | Calponin homology domain | 1 | IPR014016 | UvrD-like helicase, ATP-binding domain |
|  |  |  | 2 | IPR014017 | UvrD-like DNA helicase, C-terminal |
| B | IPR000504 | RNA recognition motif domain | 3 | IPR003343 | Bacterial Ig-like domain, group 2 |
|  |  |  | 4 | IPR032781 | ABC-transporter extension domain |
|  |  |  | 5 | IPR003344 | Big-1 domain |
|  |  |  | 6 | IPR002314 | Aminoacyl-tRNA synthetase, class II (G/P/S/T) |
|  |  |  | 7 | IPR018392 | LysM domain |
| C | IPR003961 | Fibronectin type III | 8 | IPR019931 | LPXTG cell wall anchor domain |
| D | IPR001245 | S-T/Y-protein kinase | 9 | IPR010918 | PurM-like, C-terminal domain |
| E | IPR000626 | Ubiquitin-like domain | 10 | IPR029487 | Novel E3 ligase domain |
| F | IPR001781 | Zinc finger, LIM-type | 11 | IPR006680 | Amidohydrolase-related |
| G | IPR001007 | VWFC domain | 12 | IPR001036 | Acriflavin resistance protein |
|  |  |  | 13 | IPR007642 | RNA polymerase Rpb2, domain 2 |
| H | IPR001680 | WD40 repeat | 14 | IPR004161 | Translation elongation factor EFTu-like, domain 2 |
|  |  |  | 15 | IPR005475 | Transketolase-like, pyrimidine-binding domain |
|  |  |  | 16 | IPR033248 | Transketolase, C-terminal |
|  |  |  | 17 | IPR005474 | Transketolase, N-terminal |
|  |  |  | 18 | IPR000795 | Translational (tr)-type GTP-binding domain |
| I | IPR001881 | EGF-like calcium-binding domain | 12 | IPR001036 | Acriflavin resistance protein |
| J | IPR000157 | Toll/interleukin-1 receptor homology (TIR) domain | 19 | IPR001029 | Flagellin, N-terminal domain |
|  |  |  | 20 | IPR002423 | Chaperonin Cpn60/GroEL/TCP-1 family |
|  |  |  | 21 | IPR001702 | Porin, Gram-negative type |
| K | IPR000488 | Death domain | 19 | IPR001029 | Flagellin, N-terminal domain |
Figure 6. Post-translational modifications in enriched motifs. The sequences were inspected using MusiteDeep [36] and modifications were reported as the percentage of proteins containing a given modification for each motif.
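The quantity reported in Figure 6, the percentage of proteins containing a given modification for each motif, can be computed as in the sketch below. It assumes the modification predictions have already been obtained and parsed into per-protein sets. It does not call MusiteDeep, and the function name, motif labels and protein identifiers are hypothetical placeholders.

```python
from collections import Counter

def ptm_percentages(motif_proteins, protein_ptms):
    """Percentage of proteins per motif that carry each modification type.

    motif_proteins: motif name -> set of protein IDs containing the motif
    protein_ptms:   protein ID -> set of modification types predicted for it
    """
    result = {}
    for motif, proteins in motif_proteins.items():
        # Each protein contributes at most once per modification type.
        counts = Counter(
            ptm for protein in proteins for ptm in protein_ptms.get(protein, ())
        )
        result[motif] = {
            ptm: 100 * n / len(proteins) for ptm, n in counts.items()
        }
    return result

# Hypothetical inputs, for illustration only.
motif_proteins = {"MOTIF_A": {"P1", "P2", "P3", "P4"}, "MOTIF_B": {"P5"}}
protein_ptms = {
    "P1": {"phosphorylation"},
    "P2": {"phosphorylation", "ubiquitination"},
    "P5": set(),
}
print(ptm_percentages(motif_proteins, protein_ptms))
```

Proteins without any predicted modification still count in the denominator, which matches a "percentage of proteins" reading of the caption.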
Figure 7. Domain–domain interactions predicted by AlphaFold-Multimer. (A) E9KL35-FusA; (B) RACK1-FusA; (C) UBA52-IpaH; (D) UBA52-sspH2; (E) RACK1-TktA.
Figure 8. Domain–motif interactions predicted by AlphaFold-Multimer. (A) HADHA-mtaD; (B) ENKD1-tuf; (C) ABHD17A-yopM; (D) IGHG1-yopM.
Figure 9. Analysis of complexes between IpaH and sspH2 with ubiquitin UBA52. (A) Structure of UBA52 and sspH2 showing the degree of sequence conservation. The conservation scale is displayed at the bottom of the figure. (B) Residue contact network between UBA52 and sspH2 showing several contact signatures. (C) BLAST search results using the ubiquitin-protein ligase domain as the query sequence. Newly identified domains are highlighted by brown boxes. (D) Predicted structures of selected proteins using AlphaFold [26].