Chrispin Chaguza, Marie Yang, Jennifer E Cornick, Mignon du Plessis, Rebecca A Gladstone, Brenda A Kwambana-Adams, Stephanie W Lo, Chinelo Ebruke, Gerry Tonkin-Hill, Chikondi Peno, Madikay Senghore, Stephen K Obaro, Sani Ousmane, Gerd Pluschke, Jean-Marc Collard, Betuel Sigaùque, Neil French, Keith P Klugman, Robert S Heyderman, Lesley McGee, Martin Antonio, Robert F Breiman, Anne von Gottberg, Dean B Everett, Aras Kadioglu, Stephen D Bentley.
Abstract
Hyper-virulent Streptococcus pneumoniae serotype 1 strains are endemic in sub-Saharan Africa and frequently cause lethal meningitis outbreaks. It remains unknown whether genetic variation in serotype 1 strains modulates tropism into cerebrospinal fluid to cause central nervous system (CNS) infections, particularly meningitis. Here, we address this question through a large-scale linear mixed model genome-wide association study of 909 African pneumococcal serotype 1 isolates collected from CNS and non-CNS human samples. By controlling for host age, geography, and strain population structure, we identify genome-wide statistically significant genotype-phenotype associations in surface-exposed choline-binding (P = 5.00 × 10−08) and helicase proteins (P = 1.32 × 10−06) important for invasion, immune evasion and pneumococcal tropism to the CNS. The small effect sizes and negligible heritability indicate that causation of CNS infection requires multiple genetic and other factors, reflecting a complex and polygenic aetiology. Our findings suggest that certain pathogen genetic variation modulates pneumococcal survival and tropism to CNS tissue and, therefore, virulence for meningitis.
Year: 2020 PMID: 33033372 PMCID: PMC7545184 DOI: 10.1038/s42003-020-01290-9
Source DB: PubMed Journal: Commun Biol ISSN: 2399-3642
Fig. 1 Characteristics of the African S. pneumoniae serotype 1 isolates.
a Study design of the pathogen genome-wide association study (GWAS) showing the number of central nervous system (CNS) and non-CNS isolates and the three types of genetic variation used for the analysis, namely single-nucleotide polymorphisms (SNPs), unitigs and accessory clusters of orthologous genes (COGs). b Histogram showing the age distribution of patients from whom the CNS and non-CNS isolates were sampled. The two histograms are coloured by isolation source, with darker shades indicating an overlap. c Country of origin of the isolates; their frequency and the proportion of CNS and non-CNS isolates at each location are shown as pie charts. The size of each pie chart is proportional to the number of isolates from that country, as shown by the scale represented by the concentric circles at the bottom left of the diagram. The country names are designated by their international two-letter codes as follows: South Africa (ZA), Malawi (MW), The Gambia (GM), Ghana (GH), Niger (NE), Nigeria (NG), Togo (TG), Benin (BJ), Côte d’Ivoire or Ivory Coast (CI) and Senegal (SN). All the metadata associated with the isolates in the phylogeny are provided in the appendix (Supplementary Table 1), while the data shown in the figure are provided in Supplementary Data 2.
Fig. 2 Whole-genome phylogenetic tree showing genetic similarity of the 909 African S. pneumoniae serotype 1 isolates.
A mid-point rooted whole-genome phylogenetic tree depicting the genetic relatedness of the isolates after filtering out genomic regions with recombination. The coloured strips at the tips of the tree indicate isolate metadata: sequence type (ST), isolation source and country. The country names are designated by their international two-letter codes as follows: South Africa (ZA), Malawi (MW), The Gambia (GM), Ghana (GH), Niger (NE), Nigeria (NG), Togo (TG), Benin (BJ), Côte d’Ivoire or Ivory Coast (CI) and Senegal (SN). All the metadata associated with the isolates in the phylogeny are provided in the appendix (Supplementary Table 1), while the data shown in the figure are provided in Supplementary Data 2.
Fig. 3 Genetic diversity of the African S. pneumoniae serotype 1 isolates.
Boxplots overlaid with dot plots showing the distribution of a the average number of SNPs between isolates in each clade, b the geographical distance between pairs of isolates in each clade, c the Simpson diversity index values for the composition of the isolates in each clade by country of origin and d the Simpson diversity index values for the composition of the isolates in each clade by sequence type (ST). e Scatter plot showing the relationship between the number of SNPs and the geographical distance (in kilometres [km]) between pairs of isolates. Both axes are shown on a logarithmic scale (base 10) for clarity. The points coloured in blue in panel e represent pairs of isolates from the same country, while those coloured in red represent pairs from different countries. The density of the points along each axis is represented by the dashed lines at the top and far right of the scatter plot. Further breakdown of plot e is provided in the appendix (Supplementary Table 1), while the data shown in the figure are provided in Supplementary Data 2.
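The Simpson diversity index in panels c and d can be illustrated with a minimal sketch. It assumes the common 1 − Σ pᵢ² form (the caption does not state which variant was used), and the function name and the example clade composition are hypothetical, chosen only for illustration.

```python
from collections import Counter


def simpson_diversity(labels):
    """Return 1 - sum(p_i^2), where p_i is the proportion of each category.

    Assumption: this is the Gini-Simpson form of the index; the source
    does not specify the exact variant.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must be non-empty")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical clade: four isolates from three countries (not real data).
clade_countries = ["MW", "MW", "ZA", "GM"]
print(simpson_diversity(clade_countries))  # 0.625
```

A value of 0 means every isolate in the clade shares one category (a single country or ST), and values approaching 1 indicate an even spread across many categories.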
Fig. 4 SNPs, unitigs and accessory genes associated with CNS infection.
a shows the statistical significance (−log10 P) for the unitigs (unordered) from the pathogen genome-wide association study (GWAS) analysis using FaST-LMM and GEMMA. b shows the statistical significance of the accessory clusters of orthologous genes (COGs) using the same methods. c shows the chromosomal locations and statistical significance of the single-nucleotide polymorphisms (SNPs). The green and red lines designate the genome-wide significant and suggestive P-value thresholds as discussed in the ‘methods’ section, while the variants highlighted in yellow were identified by both the FaST-LMM and GEMMA methods. All the variants had minor allele frequency (MAF) < 1% and missingness >5%. The data shown in the figure are provided in Supplementary Data 2.
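The −log10 P scale used on the vertical axes can be sketched as follows. The two P-values are the genome-wide significant ones quoted in the abstract; the function name is hypothetical, and the significance thresholds are not reproduced here because they are defined in the methods section rather than in this record.

```python
import math


def neg_log10(p_value):
    """Convert a P-value to the -log10 scale used in Manhattan-style plots."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P-value must lie in (0, 1]")
    return -math.log10(p_value)


# P-values quoted in the abstract for the two genome-wide significant hits.
abstract_p_values = {
    "choline-binding protein": 5.00e-08,
    "helicase protein": 1.32e-06,
}

for label, p in abstract_p_values.items():
    print(f"{label}: -log10(P) = {neg_log10(p):.2f}")
```

On this scale a smaller P-value plots higher, so the choline-binding protein association sits at about 7.30 and the helicase association at about 5.88.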
Fig. 5 Phylogenetic distribution of the genome-wide significant and suggestive unitigs.
a Schematic representation of the serotype 1 genome showing the location of the capsule biosynthetic locus and the identified genome-wide significant unitigs in the P1031 serotype 1 reference genome (GenBank accession: CP000920), with coding sequences reannotated with common gene names. b, c Circular phylogenetic trees showing the distribution of the two genome-wide significant unitigs, ID 8805 and ID 47853. d, e Frequency of the unitigs in CNS and non-CNS isolates in each phylogenetic clade defined in Fig. 2. The full list of the genome-wide significant unitigs represented by the patterns shown in this figure is provided in the appendix (Supplementary Table 1), while the data shown in the figure are provided in Supplementary Data 2.
Summary of the genome-wide significant and suggestive unitigs associated with CNS infection identified by the GWAS analysis.
| Unitig | Genome accession | Locus tag | Risk allele | MAF | Odds ratio | P-value | Gene description |
|---|---|---|---|---|---|---|---|
| 8805a | CP000936.1 | SPH_2388 | Absence | 0.01 | 0.70 | 5.0 × 10−08 | Surface protein PspC |
| 47853a | NC_014498.1 | SP670_1521 | Absence | 0.07 | 0.71 | 1.3 × 10−06 | DnaQ exonuclease/DinG helicase family |
| 41314 | NC_011072.1 | SPG_1486 | Absence | 0.03 | 1.18 | 1.2 × 10−05 | Phosphoglucosamine mutase |
| 72152 | AP018043.1 | Intergenic region | Absence | 0.02 | 1.24 | 3.5 × 10−05 | Intergenic region |
| 80564 | CP000920.1 | SPP_1579 | Presence | 0.03 | 1.19 | 3.0 × 10−05 | tRNA nucleotidyltransferase |
| 81567 | AP019192.2 | ASP0581_08080 | Presence | 0.01 | 0.76 | 3.9 × 10−05 | Cysteine desulfurase |
| 90414 | NC_017592.1 | SPNOXC_13690 | Presence | 0.03 | 1.18 | 1.2 × 10−05 | Putative phosphoglucosamine mutase |
| 102497 | AKBW01000001.1 | Intergenic region | Absence | 0.07 | 1.14 | 9.1 × 10−06 | Intergenic region |
| 102498 | AP017971.1 | KK0981_35330 | Presence | 0.07 | 1.14 | 1.1 × 10−05 | Cytoplasmic protein |
| 106507 | AP017971.1 | KK0981_35330 | Presence | 0.07 | 1.13 | 2.4 × 10−05 | Cytoplasmic protein |
| 108518 | AP017971.1 | KK0981_35330 | Presence | 0.07 | 1.13 | 2.9 × 10−05 | Cytoplasmic protein |
| 110000 | NC_014498.1 | SP670_1747 | Presence | 0.01 | 0.75 | 1.6 × 10−05 | Hypothetical protein |
| 47853 | NC_014498.1 | SP670_1521 | Presence | 0.07 | 0.71 | 1.3 × 10−06 | DnaQ exonuclease/DinG helicase family |
| 47851 | AP018043.1 | KK0381_02650 | Absence | 0.07 | 0.70 | 3.0 × 10−06 | Bifunctional ATP-dependent DNA helicase |
| 108605 | NC_014498.1 | SP670_1521 | Presence | 0.07 | 0.70 | 3.0 × 10−06 | DnaQ exonuclease/DinG helicase family |
| 45790 | AP019192.2 | ASP0581_14110 | Presence | 0.47 | 0.78 | 9.4 × 10−06 | N-acetyltransferase |
| 70431 | AP019192.2 | ASP0581_14110 | Presence | 0.47 | 0.76 | 1.4 × 10−05 | N-acetyltransferase |
| 102497 | AKBW01000001.1 | NA | Absence | 0.07 | 1.14 | 2.1 × 10−05 | Intergenic region |
The risk genotype refers to the minor allele used as the effect/non-reference allele.
The likelihood ratio P-values shown were estimated by the method that detected the variant as genome-wide significant or suggestive. When both GEMMA and FaST-LMM identified a variant as statistically significant, the P-value from GEMMA was used.
MAF, minor allele frequency.
a Genome-wide significant unitigs.
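As a rough illustration of the MAF column for presence/absence variants such as unitigs, the sketch below takes the less common of the two states as the minor allele. The function name and the example calls are hypothetical, and the source does not describe how MAF was actually computed in the analysis.

```python
def minor_allele_frequency(presence_calls):
    """Return the frequency of the less common state among 0/1 calls.

    Assumption: each isolate contributes one call, 1 if the unitig is
    present and 0 if it is absent.
    """
    calls = list(presence_calls)
    if not calls:
        raise ValueError("at least one call is required")
    presence_freq = sum(calls) / len(calls)
    return min(presence_freq, 1.0 - presence_freq)


# Hypothetical example: a unitig present in 93 of 100 isolates.
example_calls = [1] * 93 + [0] * 7
print(round(minor_allele_frequency(example_calls), 2))  # 0.07
```

In this example absence is the minor allele, which matches the way the table labels the risk allele as either "Presence" or "Absence".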
Summary of the suggestive SNPs identified by the GWAS analysis.
| SNP | Genomic region | Allele (risk/safe) | MAF | Odds ratio | P-value | Gene description |
|---|---|---|---|---|---|---|
| rs266945 | Genic | A/G | 0.097 | 1.29 | 4.67 × 10−05 | Glutamine-fructose-6-phosphate transaminase (isomerising) |
| rs721084 | Genic | A/G | 0.042 | 1.19 | 1.05 × 10−04 | Membrane-fusion protein |
| rs1466695a | Genic | A/G | 0.015 | 1.22 | 3.41 × 10−04 | Phosphoglucosamine mutase |
| rs1474820a | Genic | G/A | 0.036 | 1.14 | 4.08 × 10−04 | ATP-dependent Clp protease, ATP-binding subunit ClpX |
The SNPs were identified and annotated using the S. pneumoniae serotype 1 reference genome strain P1031 (GenBank accession: CP000920).
The risk genotype refers to the minor allele used as the effect/risk/non-reference allele, while the safe allele refers to the reference allele/genotype in the GWAS analysis.
MAF, minor allele frequency.
a SNP identified by FaST-LMM.