Andrew C Doxey, Brendan J McConkey.
Abstract
Molecular mimicry of host proteins is a common strategy adopted by bacterial pathogens to interfere with and exploit host processes. Despite the availability of pathogen genomes, few studies have attempted to predict virulence-associated mimicry relationships directly from genomic sequences. Here, we analyzed the proteomes of 62 pathogenic and 66 non-pathogenic bacterial species, and screened for the top pathogen-specific or pathogen-enriched sequence similarities to human proteins. The screen identified approximately 100 potential mimicry relationships, including well-characterized examples among the top-scoring hits (e.g., RalF, internalin, yopH, and others), with about one-third of the predicted relationships supported by existing literature. Examination of homology to virulence factors and of statistically enriched functions, together with comparison to the literature, indicated that the detected mimics target key host structures (e.g., extracellular matrix, ECM) and pathways (e.g., cell adhesion, lipid metabolism, and immune signaling). The top-scoring and most widespread mimicry pattern detected among pathogens consisted of elevated sequence similarities to ECM proteins, including collagens and leucine-rich repeat proteins. Unexpectedly, analysis of the pathogen counterparts of these proteins revealed that they have evolved independently in different species of bacterial pathogens from separate repeat amplifications. Thus, our analysis provides evidence for two classes of mimics: complex proteins such as enzymes that have been acquired by eukaryote-to-pathogen horizontal transfer, and simpler repeat proteins that have independently evolved to mimic the host ECM. Ultimately, computational detection of pathogen-specific and pathogen-enriched similarities to host proteins provides insights into potentially novel mimicry-mediated virulence mechanisms of pathogenic bacteria.
Keywords: bacteria; collagen; extracellular matrix; leucine-rich repeats; mimicry; pathogenomics; pathogens; proteins; proteomes; virulence factors
Year: 2013 PMID: 23715053 PMCID: PMC5359739 DOI: 10.4161/viru.25180
Source DB: PubMed Journal: Virulence ISSN: 2150-5594 Impact factor: 5.882

Figure 1. Computational pipeline for detection of molecular mimicry candidates in human pathogenic bacteria.
Table 1. Top 25 unique predictions of molecular mimicry relationships.
| No. | Pathogen protein | Human protein (NCBI gi number, gene name) | Description of human protein | No. human proteins in cluster | No. pathogen species in which mimic is found | Min. BLAST E-value |
|---|---|---|---|---|---|---|
| 1 | spr1403 | 29725624, COL23A1 | collagen α-1(XXIII) chain | 42 | 7 | 6.00E−25 |
| 2 | BA_3841 | 11386161, COL9A2 | collagen α-2(IX) chain precursor | 1 | 6 | 1.00E−11 |
| 3 | SpyM3_1561 | 122937309, LRRC4B | leucine-rich repeat-containing protein 4B precursor | 33 | 5 | 5.00E−15 |
| 4 | SpyM3_0738 | 115527062, COL6A2 | collagen α-2(VI) chain isoform 2C2 precursor | 3 | 5 | 2.00E−11 |
| 5 | lpl2569 | 116256356, COL4A4 | collagen α-4(IV) chain precursor | 5 | 5 | 2.00E−12 |
| 6 | VV1_2676 | 7656971, N4BP2L2 | NEDD4-binding protein 2-like 2 isoform 2 | 1 | 5 | 3.00E−12 |
| 7 | lpl2411 | 4505983, PPFIA1 | liprin-α-1 isoform b | 34 | 5 | 4.00E−10 |
| 8 | ML2177 | 31742508, UPP1 | uridine phosphorylase 1 | 4 | 3 | 4.00E−22 |
| 9 | CPE0622 | 119395714, FKTN | fukutin isoform a | 2 | 3 | 6.00E−16 |
| 10 | BA_2967 | 296434275, GPRASP2 | G-protein coupled receptor-associated sorting protein 2 | 5 | 3 | 3.00E−12 |
| 11 | nfa38270 | 22748883, TMEM68 | transmembrane protein 68 | 1 | 3 | 1.00E−15 |
| 12 | MT_0370 | 14149793, QRICH2 | glutamine-rich protein 2 | 1 | 3 | 4.00E−14 |
| 13 | CTC_02331 | 85386053, ATP6V0A4 | V-type proton ATPase 116 kDa subunit a isoform 4 | 7 | 3 | 1.00E−11 |
| 14 | ECH_0498 | 20270337, LEO1 | RNA polymerase-associated protein LEO1 | 1 | 3 | 3.00E−10 |
| 15 | nfa31870 | 21618331, CRAT | carnitine O-acetyltransferase precursor | 22 | 2 | 1.00E−53 |
| 16 | CBU_1158 | 117414150, TM7SF2 | delta(14)-sterol reductase | 5 | 2 | 2.00E−50 |
| 17 | lpl1919 | 51479145, ARFGEF1 | brefeldin A-inhibited guanine nucleotide-exchange protein 1 | 19 | 2 | 3.00E−34 |
| 18 | RP374 | 117606360, PSD3 | PH and SEC7 domain-containing protein 3 isoform a | 1 | 2 | 3.00E−15 |
| 19 | LMOf2365_0212 | 8922995, PLCXD1 | PI-PLC X domain-containing protein 1 | 2 | 2 | 8.00E−14 |
| 20 | BPSS0088 | 239747149, LOC100287429 | PREDICTED: zinc finger protein 84-like isoform 2 | 27 | 2 | 4.00E−14 |
| 21 | APH_0455 | 132626688, MDC1 | mediator of DNA damage checkpoint protein 1 | 1 | 2 | 1.00E−13 |
| 22 | SpyM3_0116 | 4502313, ATP6V0C | V-type proton ATPase 16 kDa proteolipid subunit | 1 | 2 | 4.00E−09 |
| 23 | TDE_0021 | 5174415, CEPT1 | choline/ethanolaminephosphotransferase 1 | 2 | 2 | 1.00E−12 |
| 24 | BCE_5203 | 310115369, LOC100294033 | PREDICTED: protein FAM115A-like isoform 2 | 4 | 2 | 4.00E−11 |
| 25 | TP_0671 | 50726996, CHPT1 | cholinephosphotransferase 1 | 1 | 2 | 3.00E−11 |
All 25 predictions were specific to human pathogenic bacteria and were absent from the non-pathogen data set.
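Table 1 summarizes each prediction by the number of pathogen species carrying a mimic and the minimum BLAST E-value, and its footnote notes that every hit was absent from non-pathogens. The filter below is a minimal sketch of that idea, not the authors' pipeline: it assumes best-hit E-values have already been computed for every (human protein, species) pair, and the function name, the 1e-5 cutoff, and the ranking key are all hypothetical choices.

```python
from collections import defaultdict


def pathogen_specific_hits(
    best_hits: dict[tuple[str, str], float],
    pathogens: set[str],
    nonpathogens: set[str],
    cutoff: float = 1e-5,  # hypothetical significance cutoff, not from the paper
) -> list[tuple[str, int, float]]:
    """Return (human_protein, n_pathogen_species, min_evalue) for human proteins
    with a hit at or below `cutoff` in at least one pathogen and in no non-pathogen.

    `best_hits` maps (human_protein, species) to that species' best BLAST E-value;
    pairs without any hit are simply absent.
    """
    pathogen_evalues: dict[str, list[float]] = defaultdict(list)
    seen_in_nonpathogen: set[str] = set()
    for (human, species), evalue in best_hits.items():
        if evalue > cutoff:
            continue  # too weak to count as a similarity
        if species in nonpathogens:
            seen_in_nonpathogen.add(human)
        elif species in pathogens:
            pathogen_evalues[human].append(evalue)

    results = [
        # One dict entry per (human, species), so len() is the species count.
        (human, len(evalues), min(evalues))
        for human, evalues in pathogen_evalues.items()
        if human not in seen_in_nonpathogen
    ]
    # Hypothetical ranking: most widespread mimics first, then strongest hit.
    results.sort(key=lambda r: (-r[1], r[2]))
    return results


# Toy, made-up inputs: "HB" also matches a non-pathogen, so it is dropped.
hits = {("HA", "p1"): 1e-11, ("HA", "p2"): 3e-8, ("HB", "p1"): 4e-22, ("HB", "n1"): 1e-9}
print(pathogen_specific_hits(hits, {"p1", "p2"}, {"n1"}))  # [('HA', 2, 1e-11)]
```

Requiring zero non-pathogen hits matches the "pathogen-specific" case in the footnote; a "pathogen-enriched" variant would instead compare hit frequencies between the two groups.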

Figure 2. Top BLAST matches for human proteins in pathogen vs. non-pathogen proteomes. Left: −log10(E-value) of top BLAST matches to human proteins in 62 human pathogens vs. 66 non-pathogens. Right: −log10(E-value) of top BLAST matches to human proteins under a different host/pathogen definition (6 plant pathogens vs. 16 non-pathogens). Values above ~60 are not shown. Collagens (the top detected mimicry relationship) and ADP-ribosylation factors (a positive-control mimicry relationship) have pathogen-elevated −log10(E-value) distributions.
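Figure 2 plots −log10(E-value) so that stronger matches appear as larger numbers, and a "pathogen-elevated" distribution is one shifted to the right in pathogens. The sketch below shows that transformation and a simple per-group median comparison; the function names are hypothetical, the floor applied to E-values reported as 0.0 is an arbitrary assumption, and the toy E-values are made up.

```python
import math
import statistics


def neg_log10_evalue(evalue: float, floor: float = 1e-180) -> float:
    """Return -log10(E). BLAST can report E = 0.0 for very strong hits, so values
    are first raised to `floor` (an arbitrary assumption) to keep the log finite."""
    return -math.log10(max(evalue, floor))


def median_scores(pathogen_evalues: list[float], nonpathogen_evalues: list[float]) -> tuple[float, float]:
    """Median -log10(E) of top hits in pathogen vs. non-pathogen proteomes.
    A higher pathogen median indicates a pathogen-elevated distribution."""
    path = [neg_log10_evalue(e) for e in pathogen_evalues]
    nonpath = [neg_log10_evalue(e) for e in nonpathogen_evalues]
    return statistics.median(path), statistics.median(nonpath)


# Toy E-values, made up for illustration.
print(median_scores([1e-25, 1e-11, 0.0], [1e-3, 0.5, 1e-4]))
```

A median is only a coarse summary; the figure itself compares whole distributions.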

Figure 3. Top pathogen vs. non-pathogen protein similarities to a selected set of predicted human mimicry targets. Predicted human mimicry targets were selected from the top 25 detected relationships (Table 1), and the top BLAST matches in pathogen vs. non-pathogen proteomes are plotted as frequency distributions (bitscore on the x-axis, frequency on the y-axis). In each case, a subset of pathogen proteomes encodes putative mimics whose similarity to the human protein is much greater than that of any non-pathogen protein. A selected portion of the alignment is shown for the top-scoring pathogen mimic in each case. See for additional details regarding pairwise alignments.
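Figure 3 contrasts, for each predicted human target, the best bitscore found in each pathogen and non-pathogen proteome. The helper below is a hypothetical sketch of how the outlying pathogen proteomes could be picked out: it assumes a precomputed mapping from species to top bitscore for one human target, and reports each pathogen whose score exceeds the best non-pathogen score, together with the margin.

```python
def pathogen_outliers(
    top_bitscores: dict[str, float],
    pathogens: set[str],
    nonpathogens: set[str],
) -> tuple[float, dict[str, float]]:
    """For one human target, return (best non-pathogen bitscore, {pathogen: margin})
    for pathogens whose top bitscore exceeds every non-pathogen top bitscore.
    Species missing from `top_bitscores` are treated as having no hit."""
    nonpath_best = max((top_bitscores.get(s, 0.0) for s in nonpathogens), default=0.0)
    margins = {
        s: top_bitscores[s] - nonpath_best
        for s in pathogens
        if s in top_bitscores and top_bitscores[s] > nonpath_best
    }
    # Largest margin first.
    return nonpath_best, dict(sorted(margins.items(), key=lambda kv: -kv[1]))


# Made-up bitscores for a single human target.
scores = {"p1": 120.0, "p2": 45.0, "p3": 80.5, "n1": 50.0, "n2": 38.5}
print(pathogen_outliers(scores, {"p1", "p2", "p3"}, {"n1", "n2"}))
```

Bitscores, unlike E-values, do not depend on database size, which makes them convenient for comparing hits across proteomes of different sizes.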
Table 2. Function enrichment analysis of predicted mimicry candidates
| Rank | Term | Count | Fold enrichment | P-value | Benjamini | Ontology, term ID |
|---|---|---|---|---|---|---|
| 1 | Extracellular matrix | 54 | 12.52 | 4.14E−44 | 3.39E−42 | PANTHER_MF_ALL, MF00178 |
| 2 | collagen | 22 | 48.84 | 1.17E−31 | 3.11E−29 | GOTERM_CC_ALL, GO:0005581 |
| 3 | Extracellular matrix structural protein | 24 | 24.36 | 6.24E−26 | 2.56E−24 | PANTHER_MF_ALL, MF00179 |
| 4 | ARF guanyl-nucleotide exchange factor activity | 14 | 61.54 | 6.22E−22 | 2.11E−19 | GOTERM_MF_ALL, GO:0005086 |
| 5 | Extracellular matrix part | 23 | 15.81 | 3.51E−20 | 4.65E−18 | GOTERM_CC_ALL, GO:0044420 |
| 21 | O-acyltransferase activity | 9 | 17.24 | 3.69E−08 | 2.08E−06 | GOTERM_MF_ALL, GO:0008374 |
| 23 | cell adhesion | 29 | 3.57 | 6.77E−09 | 2.95E−06 | GOTERM_BP_ALL, GO:0007155 |
| 51 | Inflammation mediated by chemokine and cytokine signaling pathway | 14 | 2.68 | 1.06E−03 | 6.84E−03 | PANTHER_PATHWAY, P00031 |
| 58 | Lysosome | 8 | 4.98 | 9.28E−04 | 1.38E−02 | KEGG_PATHWAY, has |
| 64 | Toll-Like Receptor Pathway | 4 | 9.47 | 6.20E−03 | 5.15E−02 | BIOCARTA, h_tollPathway |
| 71 | Interaction with host | 5 | 11.20 | 9.70E−04 | 9.30E−02 | GOTERM_BP_ALL, GO:0051701 |
The first five rows give statistics for the top five enriched function terms; the remaining rows give additional selected enrichments mentioned in the text. Full results are available in .
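The category labels in Table 2 (e.g., GOTERM_CC_ALL, PANTHER_MF_ALL, BIOCARTA) resemble those of the DAVID functional annotation tool, which reports fold enrichment as (count / list total) / (population hits / population total) and a Benjamini column of Benjamini-Hochberg adjusted p-values. Assuming those definitions apply here, the sketch below computes both; the function names and toy numbers are hypothetical, and reproducing Table 2's values would also require the list and population totals, which the table does not give.

```python
def fold_enrichment(count: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    """(count / list_total) / (pop_hits / pop_total): over-representation of a term
    in the candidate list relative to the background (assumed DAVID-style definition)."""
    return (count / list_total) / (pop_hits / pop_total)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up numbers for illustration only.
print(fold_enrichment(10, 100, 50, 10_000))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Because the adjustment depends on every p-value tested, a term's Benjamini value (and hence its rank) cannot be recomputed from its own row alone.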
Table 3. Predicted mimicry candidates in human pathogenic bacteria and potential roles in virulence
| Pred. # | Human protein/s | Pathogen protein/s (putative mimic/s) | Pathogen species | Virulence mechanism (known or proposed) |
|---|---|---|---|---|
| 1 | Collagen | spr1403 + 11 others | Diverse | Involvement in host-cell adherence/invasion |
| 3, 50, 60 | Leucine-rich repeat proteins | SpyM3_1561, lmo0801 (internalin), + 7 others; RC0370; RC0830 | Diverse | Internalin-related; adhesion and invasion of host epithelia |
| 6, 74, 79 | P-loop NTPases and ATPases | VC_1610 | | Possible modulation of host external ATP pool and macrophage cell death |
| 7 | Coiled-coil proteins | lpl2411 (lepB) + 5 others | | Disruption of vesicular protein trafficking |
| 8 | Uridine phosphorylase | ML2177 | | Use of host uridine for pyrimidine scavenging by |
| 9 | Fukutin | LicD proteins | | *Modification of cell-surface glycoproteins or glycolipids; transfer of phosphorylcholine residues |
| 11, 54 | Acyltransferases | MT_1971, nfa38270 | | Possible involvement in lipid biosynthesis from host-derived precursors; possible use as an energy store and/or as virulence-related, immunomodulatory lipids |
| 15 | Carnitine palmitoyltransferase | MPN114 | | Possible biosynthesis or modification of host phosphatidylcholine |
| 16 | Delta(14)-sterol reductase | CBU_1158 | | Cholesterol metabolism for structural and/or signaling roles during parasitophorous vacuole formation |
| 17, 18 | ADP-ribosylation factor (ARF) guanine nucleotide exchange factor | lpl1919 (RalF), RP374 (sec7) | | RalF recruits ARF GTPases for |
| 19 | Phosphatidylinositol-specific phospholipase C | LMOf2365_0212 | | Escape from primary vacuole |
| 21 | Mediator of DNA damage checkpoint protein 1 (MDC1) | APH_0455 | | Possible disruption of host cell apoptotic signaling in neutrophils |
| 23, 25 | Choline/ethanolaminephosphotransferase | TDE_0021, TP_0671 | | Possible production of phosphatidylcholine; host-phospholipid mimicry |
| 24 | Predicted FAM115A-like proteins | BCE_5203 + 2 others (enhancins) | | Proteolysis of host proteins (e.g., mucin) |
| 30 | Ankyrin | TP_0835 + 1 other | | Possible interaction with host cytoskeleton, as previously hypothesized |
| 31 | Tyrosine-protein phosphatase non-receptor type 20 | YPCD1.67c (yopH) + 3 others | | Disruption of host macrophage signal transduction pathways |
| 38 | Paraoxonase | LA0399 | | Possible loss of hemostasis through hydrolysis of platelet-activating factor |
| 39 | Periaxin, apoB (leucine/proline-rich repeats) | MT_1796, Rv1753c (PPE family proteins) | | Targeting of the DRP2-dystroglycan complex; possible interaction with host lipids |
| 43 | Ectonucleoside triphosphate diphosphohydrolase (CD39/NTPDase) | lpl1869 | | Replication in host macrophages; manipulation of host pathways regulated by CD39; modulation of host NTPs and NDPs |
| 53 | Zonadhesin | BA_0871 | | Acts as an adhesin that mediates cell attachment to collagen |
| 62 | Glucoside xylosyltransferase 1 isoform 1 | RP476 | | Possible involvement in biosynthesis of lipopolysaccharide |
| 64 | Serine protease | VC_1200 | | May process cholera toxin A (CT A) in the host |
| 68 | Fucosyltransferase | HP0379 + 2 others | | Production of Lewis x trisaccharide; mimicry of host cell-surface sugars; immune evasion |
| 69 | Phospholipase A2 | BA_3805 | | Possible role in invasion/entry of host cells, escape from phagosome, and cell lysis |
| 70 | PCTP-like protein | VV2_0046, VVA0553 | | Possible modulation of host lipid (e.g., cholesterol) metabolism |
| 76 | Adenylate kinase | CPE2384 + 7 others | Diverse | An adenylate kinase toxin from |

Figure 4. Independent evolution of ECM mimics from separate repeat amplifications. High-scoring collagen-like (A) and leucine-rich repeat (B) protein mimics were selected and divided into their constituent repeats, which were aligned and used to generate sequence logos. Differences among the sequence logos of the individual repeat proteins suggest evolution from separate progenitor peptides and repeat amplifications. (C) An example demonstrating similar leucine-rich repeat conservation patterns in a human NOD-like receptor (NLRC3) and a predicted mimicry candidate (lpl1579) from Legionella pneumophila. The sequence similarity detected between these two proteins is far above that observed in non-pathogens (blue) and other pathogens (red), as indicated by the BLAST bitscore distribution (left panel).
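Figure 4 builds sequence logos from the aligned repeats of each mimic. Logo letter heights are commonly scaled by per-column information content, which for proteins is log2(20) minus the Shannon entropy of the residue frequencies in that column. The function below is a minimal sketch of that calculation without the small-sample correction that logo tools often apply; the function name is hypothetical, and the toy repeats are made up, chosen only because collagen-like repeats follow a Gly-X-Y pattern.

```python
import math
from collections import Counter


def column_information(aligned_repeats: list[str], alphabet_size: int = 20) -> list[float]:
    """Per-column information content in bits: log2(alphabet_size) minus the Shannon
    entropy of residue frequencies. Gap characters ('-') are ignored; no
    small-sample correction is applied."""
    length = len(aligned_repeats[0])
    if any(len(r) != length for r in aligned_repeats):
        raise ValueError("repeats must be aligned to equal length")
    max_bits = math.log2(alphabet_size)
    info = []
    for col in range(length):
        counts = Counter(r[col] for r in aligned_repeats if r[col] != "-")
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)  # all-gap column: treated as carrying no information
            continue
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(max_bits - entropy)
    return info


# Made-up Gly-X-Y repeats: the invariant glycine column scores highest.
print([round(x, 2) for x in column_information(["GPP", "GAP", "GPE", "GKP"])])  # [4.32, 2.82, 3.51]
```

Comparing such per-column profiles between two repeat families is one way to make "different sequence logos" concrete.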

Figure 5. Phylogenetic trees of the pathogen-encoded collagen-like repeats (left) and leucine-rich repeats (right) from Figure 4. Repeats are colored according to their parent protein; top-aligning repeats from human proteins are also included and colored light green. Repeats cluster predominantly by protein of origin, suggesting that different pathogen repeat proteins evolved by independent repeat amplifications. Interestingly, each class of pathogen repeats generally clusters with a specific human repeat, suggesting that the ancestral progenitor repeats may be host-derived.
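Figure 5 infers from phylogenetic trees that repeats cluster by protein of origin. A much cruder check of the same pattern, offered only as a hypothetical illustration and not the tree-based method used here, is to ask how often each repeat's nearest neighbor by p-distance (fraction of differing aligned positions) comes from the same parent protein; values near 1 indicate protein-specific clustering. The function names and toy repeats are made up.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two aligned repeats, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no comparable sites: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def same_parent_nn_fraction(repeats: list[tuple[str, str]]) -> float:
    """repeats: (parent_protein, aligned_repeat) pairs, at least two.
    Return the fraction of repeats whose nearest neighbor by p-distance
    comes from the same parent protein (ties go to the earlier repeat)."""
    if len(repeats) < 2:
        raise ValueError("need at least two repeats")
    same = 0
    for i, (parent, seq) in enumerate(repeats):
        nearest = min(
            (k for k in range(len(repeats)) if k != i),
            key=lambda k: p_distance(seq, repeats[k][1]),
        )
        same += repeats[nearest][0] == parent
    return same / len(repeats)


# Made-up repeats from two hypothetical parent proteins.
toy = [("protA", "GPPGPP"), ("protA", "GPPGAP"), ("protB", "GKDGEK"), ("protB", "GKDGER")]
print(same_parent_nn_fraction(toy))  # 1.0
```

A nearest-neighbor count ignores tree topology and branch support, so it can only hint at the clustering that the trees show directly.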