Luis Delaye, Susana Ruiz-Ruiz, Enrique Calderon, Sonia Tarazona, Ana Conesa, Andrés Moya.
Abstract
Pneumocystis species are ascomycete fungi adapted to live inside the lungs of mammals. These ascomycetes show extensive stenoxenism, meaning that each species of Pneumocystis infects a single species of host. Here, we study the effect exerted by natural selection on gene evolution in the genomes of three Pneumocystis species. We show that genes involved in host interaction evolve under positive selection. First, we found strong evidence of episodic diversifying selection in major surface glycoproteins (Msg). These proteins are located on the surface of Pneumocystis and are used for host attachment and probably for immune system evasion. Consistent with their function as antigens, most sites under diversifying selection in Msg code for residues with large relative surface accessibility areas. We also found evidence of positive selection in part of the cell machinery used to export Msg to the cell surface. Specifically, we found that genes participating in glycosylphosphatidylinositol (GPI) biosynthesis show an increased rate of nonsynonymous substitutions (dN) versus synonymous substitutions (dS). GPI is a molecule synthesized in the endoplasmic reticulum that is used to anchor proteins to membranes. We interpret these findings as evidence of selective pressure exerted by the host immune system on Pneumocystis species, shaping the evolution of Msg and several proteins involved in GPI biosynthesis. We suggest that genome evolution in Pneumocystis is well described by the Red-Queen hypothesis, whereby genes relevant for biotic interactions show accelerated rates of evolution.
Year: 2018 PMID: 29893833 PMCID: PMC6012782 DOI: 10.1093/gbe/evy116
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Selection Analysis of Gene Families in Pneumocystis Species
| Species | Protein Coding Genes | Single Copy Gene Families | H0 Gene Families | H1 Gene Families | H0 Gene Families Sharing GO Terms | H1 Gene Families Sharing GO Terms |
|---|---|---|---|---|---|---|
| | 3,646 | 2,967 | | | | |
| | 3,623 | | 2,781 | 186 | 2,467 | 171 |
| | 3,761 | | | | | |
Note: Of the 2,967 single-copy gene families (orthologs), 2,781 did not reject the null hypothesis of a single omega rate and 186 did. From these gene families (2,781 + 186), we selected for gene set enrichment analysis (GSEA) those whose genes share GO terms; these are the 2,467 + 171 gene families in the last two columns. Only GO terms shared within a family were used for GSEA.
H0, those families not rejecting the null hypothesis of a single omega rate of evolution; H1, those rejecting the null hypothesis (Q < 0.05).
Fig. 1. GO categories enriched in gene families showing high or low omega (dN/dS) values for Pneumocystis jirovecii. The red line indicates the median dN/dS for all genes. GO categories are ordered from small to large normalized enrichment score (NES) within each GO type (biological process, cellular component, and molecular function). Each box denotes the median and the 25th and 75th percentiles.
Number of GO Terms Significantly Enriched (Q < 0.05) in Genes Showing Distinctive Omega Rates
| Species | Number of GO Terms Used for GSEA | Number of GO Terms with (−) NES | Number of GO Terms with (+) NES |
|---|---|---|---|
| | | 50 | 1 |
| | 227 | 52 | 6 |
| | | 42 | 4 |
Note: GO terms with negative NES are enriched in genes showing lower omega rates; GO terms with positive NES are enriched in genes showing higher omega rates.
NES, normalized enrichment score.
Number of GO terms associated with the 2,638 gene families (see text for details).
Fig. 2. Most genes in the biosynthesis of GPI show elevated omega rates. In red, we show those proteins having higher omega rates. These proteins contribute to the statistical significance of the GO:0006506 term. In blue, we show those proteins not having higher rates of evolution. Hatched proteins were not included in GSEA. Gray proteins were not identified in Pneumocystis. Protein complexes are shown within dashed boxes.
Fig. 3. Episodic selection on Msg protein coding genes from Pneumocystis jirovecii. (A) The left tree shows variation in dN/dS. Both the length and the color of the branches are proportional to dN/dS. The right tree shows branch lengths proportional to genetic distance as estimated by maximum likelihood (GTR + G + I model). Both trees have identical topologies. (B) Genetic distance versus dN/dS. While most branches evolve under negative selection (log10(dN/dS) < 0), a few branches evolve under strong positive selection (log10(dN/dS) > 1).
Selection on Exposed Residues
| Family | N of Seq | A | B | C | D | Odds Ratio | P Value |
|---|---|---|---|---|---|---|---|
| Msg A1 | 78 | 83 | 11 | 337 | 195 | 4.36 | 3.6e-07 |
| Msg A1 (sample) | 11 | 51 | 13 | 471 | 224 | 1.86 | 0.031 |
| Msg A3(II) | 14 | 22 | 5 | 437 | 207 | 2.08 | 0.097 |
| Msg A3(III) | 9 | 83 | 11 | 337 | 195 | 2.45 | 0.011 |
| Msg B | 11 | 28 | 9 | 233 | 121 | 1.61 | 0.151 |
| Msg D | 17 | 53 | 7 | 402 | 194 | 3.65 | 3.0e-04 |
| Msg E | 5 | 10 | 3 | 286 | 66 | 0.77 | 0.78 |
Note: Association between codons under positive selection and residue surface accessibility to solvent among msg gene families. For categories A, B, C, and D, we included only gap-free sites. The P value is calculated with a one-sided Fisher’s exact test.
A, codons under positive selection coding for residues showing surface accessibility >0.2 (i.e., exposed).
B, codons under positive selection coding for residues showing surface accessibility <0.2 (i.e., buried).
C, codons not under positive selection coding for residues showing surface accessibility >0.2 (i.e., exposed).
D, codons not under positive selection coding for residues showing surface accessibility <0.2 (i.e., buried).
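The odds ratio and one-sided P value in the table above can be recomputed from the four counts of each row. The following is a minimal sketch using only the Python standard library; it assumes the test asks whether positively selected codons are enriched among exposed residues (the "greater" alternative), which the note does not state explicitly. The function names `odds_ratio` and `fisher_one_sided` are illustrative and do not come from the original analysis.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio (a/b) / (c/d) for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test: P(X >= a) with all margins fixed.

    Assumption: the alternative is enrichment of selected codons among
    exposed residues, so the upper tail of the hypergeometric
    distribution is summed.
    """
    selected = a + b            # codons under positive selection
    exposed = a + c             # codons coding for exposed residues
    total = a + b + c + d
    upper = min(selected, exposed)
    tail = sum(
        comb(exposed, k) * comb(total - exposed, selected - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, selected)


# Counts A, B, C, D taken from the Msg A1 and Msg E rows of the table.
for name, counts in {"Msg A1": (83, 11, 337, 195), "Msg E": (10, 3, 286, 66)}.items():
    print(f"{name}: odds ratio = {odds_ratio(*counts):.2f}, "
          f"P = {fisher_one_sided(*counts):.2g}")
```

An odds ratio above 1 means that selected codons fall on exposed residues more often than non-selected codons do; the P value gives the probability of a table at least this extreme when selection and exposure are independent.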
Fig. 4. msg protein coding genes show a distinctive high G + C pattern in Pneumocystis jirovecii. Each gene from P. jirovecii was divided into 50 segments, and the G + C content of each segment was measured. In gray, we show the G + C content of 10 random samples of genes from P. jirovecii. Each random sample consists of 100 genes. The number of msg genes from families A1, A3(II), A3(III), B, D, and E is: 78, 9, 14, 11, 17, and 5, respectively.
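The per-segment G + C measurement described in the caption of Fig. 4 can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: segments are taken to be contiguous and of near-equal length, with boundaries set by integer division, and the function name `gc_profile` is hypothetical.

```python
def gc_profile(sequence, n_segments=50):
    """Return the G + C fraction of each of n_segments contiguous segments.

    Assumption: segment i covers positions [i*L//n, (i+1)*L//n), so
    segment lengths differ by at most one base.
    """
    seq = sequence.upper()
    length = len(seq)
    if length < n_segments:
        raise ValueError("sequence is shorter than the number of segments")
    profile = []
    for i in range(n_segments):
        start = i * length // n_segments
        end = (i + 1) * length // n_segments
        segment = seq[start:end]
        gc_count = sum(base in "GC" for base in segment)
        profile.append(gc_count / len(segment))
    return profile


# Toy example with 4 segments instead of 50.
print(gc_profile("GGCCAATT", n_segments=4))  # [1.0, 1.0, 0.0, 0.0]
```

Averaging such profiles over all genes in a family, or over a random sample of 100 genes, would give curves comparable to those described in the caption.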
Recombination Events on msg Gene Families
| Family (Ma et al. 2016) | Family (Schmid-Siegert et al. 2017) | N of Seq | Recombination Events |
|---|---|---|---|
| Msg A1 | I | 78 | 9 |
| Msg A1 (sample) | I | 11 | 4 |
| Msg A3(II) | II | 14 | 4 |
| Msg A3(III) | III | 9 | 3 |
| Msg B | IV | 11 | 0 |
| Msg D | V | 17 | 2 |
| Msg E | VI | 5 | 0 |
Note: Results of applying GARD (a genetic algorithm for recombination detection), from the HyPhy 2.220 package (Kosakovsky Pond et al. 2006), to msg gene families. Only recombination events with a P value < 0.01 are reported.
Nomenclature by Ma et al. (2016).
Nomenclature by Schmid-Siegert et al. (2017).