Arnau Domenech1, Javier Moreno1, Carmen Ardanuy1, Josefina Liñares1, Adela G de la Campa2, Antonio J Martin-Galiano3.
Abstract
The diverse pneumococcal diseases are associated with different pneumococcal lineages, or clonal complexes. Nevertheless, intra-clonal genomic variability, which influences pathogenicity, has been reported for surface virulence factors. These factors constitute the communication interface between the pathogen and its host, and their corresponding genes are subject to strong selective pressures affecting functionality and immunogenicity. First, the presence and allelic dispersion of 97 outer protein families were screened in 19 complete pneumococcal genomes. Seventeen families were deemed variable and were then examined in 216 draft genomes. This procedure allowed the generation of binary vectors with 17 positions and the classification of strains into surfotypes. Surfotypes represent the outer protein subsets with the highest inter-strain discriminative power. A total of 116 non-redundant surfotypes were identified. Those sharing a critical number of common protein features were hierarchically clustered into 18 surfogroups. Most clonal complexes with comparable epidemiological characteristics belonged to the same or similar surfogroups. However, the very large CC156 clonal complex was dispersed over several surfogroups. To establish a relationship between surfogroup and pathogenicity, the surfotypes of 95 clinical isolates with different serogroup/serotype combinations were analyzed. We found a significant correlation between surfogroup and type of pathogenic behavior (primary invasive, opportunistic invasive, and non-invasive). We conclude that the virulent behavior of S. pneumoniae is related to the activity of collections of, rather than individual, surface virulence factors. Since surfotypes evolve faster than MLSTs and directly reflect virulence potential, this novel typing protocol is appropriate for the identification of emerging clones.
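The encoding step described in the abstract (one binary position per variable family, with identical vectors collapsed into non-redundant surfotypes) can be illustrated with a minimal Python sketch. The position order below simply follows the alphabetical order of the table of selected surface proteins, which is an assumption; the strain names and the helper names `surfotype` and `non_redundant` are hypothetical and not taken from the paper.

```python
from typing import Dict, Iterable, List, Tuple

# The 17 variable families; the ordering of positions is an assumption.
FAMILIES = (
    "CbpF", "CbpG", "CbpI", "CbpJ", "CbpL", "DiiA", "NanC", "NanE", "PclA",
    "PhtA", "PspC2", "PsrP", "RrgB", "SP_1796", "SrtD", "ZmpC", "ZmpD",
)

def surfotype(present_full: Iterable[str]) -> Tuple[int, ...]:
    """Encode a strain as a 17-position binary vector.

    1 = family present with a full-length match, 0 = absent or truncated.
    """
    present = set(present_full)
    unknown = present - set(FAMILIES)
    if unknown:
        raise ValueError(f"unknown families: {sorted(unknown)}")
    return tuple(1 if family in present else 0 for family in FAMILIES)

def non_redundant(strains: Dict[str, Iterable[str]]) -> Dict[Tuple[int, ...], List[str]]:
    """Group strain names by identical surfotype vector."""
    groups: Dict[Tuple[int, ...], List[str]] = {}
    for name, families in strains.items():
        groups.setdefault(surfotype(families), []).append(name)
    return groups

# Hypothetical strains: two share a surfotype, one differs.
example = {
    "strain_1": ["CbpF", "PsrP", "ZmpC"],
    "strain_2": ["ZmpC", "CbpF", "PsrP"],
    "strain_3": ["RrgB", "SrtD"],
}
groups = non_redundant(example)
```

Using tuples as dictionary keys makes the redundancy check a plain equality test on the vectors, which is all the non-redundant count requires.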
Keywords: diagnosis; emergent clones; genomics; surface proteins; virulence factors
Year: 2016 PMID: 27064593 PMCID: PMC4815138 DOI: 10.3389/fmicb.2016.00420
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Figure 1. Selection of surface proteins showing inter-strain variability. (A) Procedure flowchart used to detect variable proteins. (B) Occurrence distribution of protein families according to surface anchor. (C) Identity and alignment length averages of families with an occurrence >80% in the reference genomes. The dashed lines split off protein families not selected as a consequence of low variability.
List of selected surface proteins.
| Protein (class) | TIGR4 locus | Description | Domain architecture |
| CbpF (CBP) | SP0391 | Choline binding protein F | [CB]6 |
| CbpG (CBP) | SP0390 | Choline binding protein G | Trypsin-[CB]3 |
| CbpI (CBP) | SP0069 | Choline binding protein I | [CB]6 |
| CbpJ (CBP) | SP0378 | Choline binding protein J | [CB]8 |
| CbpL (CBP) | SP0667 | Choline binding protein L | Excalibur-[CB]7-Lipoprotein_Ltp |
| DiiA (GPA) | SP1992 | Dimorphic invasion-involved protein A | [B02864]1-2-DUF1542-LPxTG |
| NanC (NC) | SP1326 | Neuraminidase C | Sialidase-BNR-BNR_2 |
| NanE (LPP) | SP1330 | N-acetylmannosamine-6-P epimerase | NanE |
| PclA (GPA) | NF | Collagen-like surface-anchored protein | YSIRK-G5-[Collagen]6 |
| PhtA (NC) | SP1175 | Pneumococcal histidine triad protein A | [Strep_His_triad]2-B01076-Strep_His_triad |
| PspC2 (GPA) | NF | Pneumococcal surface protein C | <YSIRK>-RICH-<B16622>-<B9758>-<[B503]1-2>-LPxTG |
| PsrP (GPA) | SP1772 | Pneumococcal serine-rich repeat protein | [B214]10-LPxTG |
| RrgB (GPA) | SP0463 | Ancillary pilus subunit B | Cna_B-LPxTG |
| SP_1796 (LPP) | SP1796 | Unknown substrate ABC transporter | SBP_bac_1 |
| SrtD (LPP) | SP0468 | Sortase D | Sortase |
| ZmpC (GPA) | SP0071 | Zinc metalloproteinase | YSIRK-B134-B5460-B1438-G5-Peptidase_M26_N-B1656-Peptidase_M26_C |
| ZmpD (GPA) | NF | Zinc metalloproteinase | B5200-LPxTG-G5 |
Protein class. CBP, Choline-binding protein; GPA, Gram-positive anchor (LPxTG motif-containing) protein; LPP, lipoprotein; NC, Non-classical surface protein.
NF, not found in TIGR4 strain.
PfamA and PfamB (those starting with "B") domains are in sequential order. Accessory domains are in angle brackets. Repeated motifs are in square brackets together with the observed number of repeats. CB, Choline-binding motif. LPxTG, Gram-positive anchor containing the "LPxTG" sortase motif. Pfam domains with the lowest E-values were prioritized. PfamB domains (E-value < 0.01) were also considered, but only if they overlapped more significant domains by < 50% of their length.
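The domain-selection rule in the footnote above (lowest E-values first; PfamB hits admitted only with E-value < 0.01 and < 50% overlap with more significant domains) can be sketched as follows. This is only an illustration under stated assumptions: the `Hit` record, the coordinates, and the example hits are hypothetical; PfamB hits are recognized as "B" followed by digits so that PfamA names such as BNR are not misclassified; and every PfamA hit is kept, because the footnote does not say how overlapping PfamA hits were resolved.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class Hit:
    name: str
    start: int      # 1-based, inclusive
    end: int        # inclusive
    evalue: float

    @property
    def is_pfam_b(self) -> bool:
        # Assumption: PfamB names are "B" plus digits (e.g. B134), unlike BNR.
        return self.name.startswith("B") and self.name[1:].isdigit()

def covered_fraction(hit: Hit, others: Iterable[Hit]) -> float:
    """Fraction of `hit` covered by the union of `others`."""
    spans = sorted(
        (max(hit.start, o.start), min(hit.end, o.end))
        for o in others
        if o.start <= hit.end and o.end >= hit.start
    )
    covered, cursor = 0, hit.start - 1
    for start, end in spans:
        start = max(start, cursor + 1)   # skip residues already counted
        if end >= start:
            covered += end - start + 1
            cursor = end
    return covered / (hit.end - hit.start + 1)

def select_domains(hits: Iterable[Hit], max_evalue: float = 0.01,
                   max_overlap: float = 0.5) -> List[Hit]:
    """Keep hits in order of significance, filtering PfamB hits by the rule."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.is_pfam_b and (hit.evalue >= max_evalue
                              or covered_fraction(hit, kept) >= max_overlap):
            continue
        kept.append(hit)
    return sorted(kept, key=lambda h: h.start)   # sequential order

# Hypothetical hits on one protein.
hits = [
    Hit("Sortase", 10, 200, 1e-30),
    Hit("B1234", 150, 260, 1e-5),   # 51 of 111 residues overlap: kept
    Hit("B999", 20, 100, 1e-4),     # fully inside Sortase: dropped
    Hit("B77", 300, 350, 0.5),      # E-value too high: dropped
]
architecture = "-".join(h.name for h in select_domains(hits))
```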
Figure 2. Hierarchical clustering of surfotypes and correlation to clinical behavior. Surfogroup signature cells: dark gray (presence/full feature match homogeneity >60%); white (absence/truncated feature match homogeneity >60%); dashed (match homogeneity < 60%). The surfogroup clades are labeled with the most abundant clonal complex together with pathogenic tendency: primary invasive (red circles), opportunistic invasive (blue squares), and non-invasive (yellow triangles). Minority clonal complexes are listed in smaller font size below. Specific protein families responsible for branching (>80% of surfotypes in one branch, < 20% of surfotypes in the other) are labeled in the tree.
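The two thresholds in the Figure 2 caption can be made concrete with a short Python sketch: a per-position surfogroup signature from the 60% homogeneity cut-off, and the detection of branch-defining families from the 80%/20% rule. The function names and toy vectors are hypothetical, and treating a homogeneity of exactly 60% as ambiguous is an assumption, since the caption does not cover that boundary.

```python
from typing import List, Optional, Sequence

Vector = Sequence[int]

def _fraction_present(members: Sequence[Vector], pos: int) -> float:
    return sum(m[pos] for m in members) / len(members)

def signature(members: Sequence[Vector], threshold: float = 0.6) -> List[Optional[int]]:
    """Per-position signature: 1 (dark gray), 0 (white) or None (dashed)."""
    if not members:
        raise ValueError("a surfogroup needs at least one surfotype")
    width = len(members[0])
    if any(len(m) != width for m in members):
        raise ValueError("all surfotypes must have the same length")
    sig: List[Optional[int]] = []
    for pos in range(width):
        frac = _fraction_present(members, pos)
        if frac > threshold:
            sig.append(1)          # presence/full homogeneity > 60%
        elif 1 - frac > threshold:
            sig.append(0)          # absence/truncated homogeneity > 60%
        else:
            sig.append(None)       # no state reaches the threshold
    return sig

def branching_features(branch_a: Sequence[Vector], branch_b: Sequence[Vector],
                       high: float = 0.8, low: float = 0.2) -> List[int]:
    """Positions present in >80% of one branch and <20% of the other."""
    positions = []
    for pos in range(len(branch_a[0])):
        fa = _fraction_present(branch_a, pos)
        fb = _fraction_present(branch_b, pos)
        if (fa > high and fb < low) or (fb > high and fa < low):
            positions.append(pos)
    return positions

# Toy 3-position surfotypes (real vectors have 17 positions).
toy_group = [(1, 0, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0)]
toy_signature = signature(toy_group)
```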
Figure 3. Correlation between surfogroup and clinical isolates. (A) Each bubble represents a unique SG-ST combination. Bubble size (see pattern in the inset): number of clinical isolates. Bubbles are colored according to type of pathogenicity after surfogroup prediction according to Figure 2. (B) Measures of the classification performance.
Figure 4. Methodological scheme for surfotype and surfogroup assignment of test isolates. Raw data derived from either sequencing or PCR are processed into a 17-mer Boolean vector (presence-full or absence-truncation). Assignment of surfotypes to the surfogroups shown in Figure 2 can be done through feature-by-feature comparison against the surfogroup signatures.
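The feature-by-feature comparison mentioned in the Figure 4 caption could look roughly like the sketch below. The scoring is an assumption, not the published procedure: ambiguous (dashed) signature positions are ignored, the score is the fraction of the remaining positions that match, and ties go to the alphabetically first surfogroup. The surfogroup names and the 4-position toy vectors are hypothetical; real vectors have 17 positions.

```python
from typing import Dict, Optional, Sequence, Tuple

Signature = Sequence[Optional[int]]   # 1 = present/full, 0 = absent/truncated, None = ambiguous

def match_score(surfotype: Sequence[int], sig: Signature) -> float:
    """Fraction of informative signature positions matched by the surfotype."""
    if len(surfotype) != len(sig):
        raise ValueError("surfotype and signature must have the same length")
    informative = [(s, g) for s, g in zip(surfotype, sig) if g is not None]
    if not informative:
        return 0.0
    return sum(s == g for s, g in informative) / len(informative)

def assign_surfogroup(surfotype: Sequence[int],
                      signatures: Dict[str, Signature]) -> Tuple[str, float]:
    """Return the best-matching surfogroup name and its score."""
    scores = {name: match_score(surfotype, sig) for name, sig in signatures.items()}
    best = max(sorted(scores), key=scores.get)   # sorted() makes ties deterministic
    return best, scores[best]

# Hypothetical signatures and test isolate.
toy_signatures = {
    "SG-A": (1, 1, 0, None),
    "SG-B": (0, 0, 1, 1),
}
assignment = assign_surfogroup((1, 1, 0, 1), toy_signatures)
```

Returning the score alongside the name lets a caller flag weak assignments instead of forcing every isolate into a surfogroup.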
Essential differences between MLST and Surfotyping.
| Feature | MLST | Surfotyping |
| Number of genes | Fixed (seven) | Variable (species-specific) |
| Type of variability | SNPs | Presence/absence, distant allelic variants (many residue changes), large insertions/deletions, mosaicism |
| Gene evolution rate | Slow | Fast |
| Protein location | Cytoplasm | Cell wall |
| Protein role | Housekeeping | Virulence |
| Correlation to pathogenicity | Indirect association | Direct causality |
| Protein structural nature | Globular | Disordered regions, tandem repeats, anchor modules |
| Gene phyletic dispersion | Universal | Species-specific |