Gabriel E Rech, José M Sanz-Martín, Maria Anisimova, Serenella A Sukno, Michael R Thon.
Abstract
Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an arms race model of evolution. The 5′ untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms, likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on these findings, we propose that even though adaptive substitutions in coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen.
Keywords: Colletotrichum graminicola; PAML; arms race hypothesis; pathogenicity; positive selection
Year: 2014 PMID: 25193312 PMCID: PMC4202328 DOI: 10.1093/gbe/evu192
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Figure. Distribution of Tajima’s D values for each class of sequences. (A) Typical eukaryotic protein-coding gene and sequence classes analyzed in the present study. (B) Boxplots showing the distribution of Tajima’s D values in each class of sequences. Each box spans the middle 50% of the data (between the 25th [Q1] and 75th [Q3] percentiles), and the red line within the box represents the median. The ends of the vertical dotted lines (whiskers) at the top and bottom of each box mark the limits beyond which values are considered extreme outliers, based on the interquartile range (IQR = Q3 − Q1): Q3 + 3 × IQR (upper) and Q1 − 3 × IQR (lower). Values outside these limits (red crosses) are extreme outliers. Red stars and the values at the top of the boxplots indicate the mean Tajima’s D value for each class of sequence.
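The whisker limits described in this caption can be illustrated with a minimal Python sketch. The function names and the quartile interpolation method (`method="inclusive"`) are assumptions made for illustration; the caption does not state how the quartiles were computed, so the fences from the original analysis may differ slightly.

```python
import statistics


def extreme_outlier_fences(values, k=3.0):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR.

    k=3 matches the whisker definition in the caption. The quartile
    method is an assumption, not taken from the paper.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def extreme_outliers(values, k=3.0):
    """Return the values lying outside the fences (the red crosses)."""
    lower, upper = extreme_outlier_fences(values, k)
    return [v for v in values if v < lower or v > upper]
```

Calling `extreme_outliers` on the Tajima’s D values of one sequence class would return the points plotted beyond the whiskers for that class.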
Summary Statistics and Characteristics of Cg Isolates
| Isolate | Origin | Mapped Reads | %Used Reads | Read Depth | %Ns | SNPs | Genes |
|---|---|---|---|---|---|---|---|
| M.1001 | Missouri, USA | Reference | — | — | 9.21 | — | 12,006 |
| | Nigeria | 60,957,326 | 86.6 | 121× | 9.83 | 9,170 | 12,004 |
| | Zimbabwe | 58,434,572 | 69.8 | 120× | 14.43 | 160,983 | 11,920 |
| | Michigan, USA | 52,486,812 | 72.9 | 108× | 13.9 | 141,118 | 11,929 |
| | Brazil | 11,251,096 | 40.6 | 24× | 25.24 | 155,561 | 11,900 |
| | Alabama, USA | 46,081,744 | 89.7 | 93× | 14.72 | 82,206 | 11,968 |
| | Germany | 62,416,798 | 92.9 | 132× | 19.79 | 115,695 | 11,925 |
| | Nagano, Japan | 14,884,038 | 43.2 | 31× | 19.53 | 139,134 | 11,952 |
Note.—Origin, region where the isolates were collected; mapped reads, the total number of reads from each isolate effectively mapped to the Cg M.1001 genome; %used reads, percentage of sequenced reads effectively used for the assembly; read depth, the average per-base depth for each genome, taking into account only the unambiguous sites; %Ns, for M.1001, the percentage of ambiguously called bases in the reference genome (not A, T, C, or G), and for the sequenced isolates, the percentage of the genome covered by fewer than three reads, where SNPs were therefore not called; SNPs, number of single nucleotide polymorphisms identified relative to the Cg M.1001 reference genome; genes, number of M.1001 genes present in each isolate, considering a gene “present” if more than 50% of its length consists of unambiguous bases.
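The gene “presence” criterion in this note can be written as a short Python sketch. The function names are hypothetical, and treating any character other than A, C, G, or T as ambiguous is an assumption about how the consensus sequences were encoded.

```python
def unambiguous_fraction(seq):
    """Fraction of positions called as A, C, G, or T (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "ACGT" for base in seq) / len(seq)


def is_gene_present(seq, min_fraction=0.5):
    """A gene counts as present if MORE than 50% of its length is unambiguous."""
    return unambiguous_fraction(seq) > min_fraction
```

Because the note says “more than 50%”, the comparison is strict: a sequence that is exactly half ambiguous would not count as present under this reading.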
Summary Statistics for Coding and Noncoding Sequences
| Sequence Class | Num. Seq. | D* | D* < 0 | D* > 0 | Mean Tajima’s D | Num. Seq. (PS) | ω or ζ > 1 |
|---|---|---|---|---|---|---|---|
| Synonymous | 11,860 | 872 | 331 (−1.31) | 541 (1.44) | 0.102 | | |
| Nonsynonymous | 11,860 | 812 | 309 (−1.31) | 503 (1.44) | 0.019 (**) | 11,995 | 224 |
| 3′-Downstream | 5,706 | 476 | 204 (−1.44) | 272 (1.60) | 0.063 (ns) | 5,693 | 668 |
| 3′-UTR | 9,652 | 537 | 221 (−1.31) | 316 (1.44) | 0.065 (*) | 9,648 | 613 |
| 5′-Upstream | 7,949 | 715 | 370 (−1.31) | 345 (1.60) | 0.080 (ns) | 7,944 | 728 |
| 5′-UTR | 10,733 | 611 | 329 (−1.05) | 282 (1.44) | 0.052 (**) | 10,724 | 456 |
| Introns | 8,893 | 741 | 388 (−1.05) | 353 (1.44) | 0.059 (*) | 8,742 | 457 |
Note.—Num. Seq., total number of sequences analyzed in each class; D*, number of sequences with Tajima’s D < 5th percentile or D > 95th percentile; D* < 0, number of sequences with Tajima’s D < 5th percentile; D* > 0, number of sequences with Tajima’s D > 95th percentile; mean Tajima’s D, symbols in parentheses indicate significant differences based on a Wilcoxon rank-sum test versus synonymous (**P < 1e-10, *P < 1e-3, ns, not significant); Num. Seq. (PS), total number of sequences analyzed for positive selection (PS); ω or ζ > 1, sequences under PS in each class: coding sequences were classified as under PS when P < 0.05 in the likelihood ratio test (LRT) M0vsM3 and P < 0.05 in the LRT M1avsM2a or M7vsM8. Noncoding sequences were classified as under PS when either LRT (NMvs2CM or NMvs3CM) showed P < 0.05.
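As a worked illustration of the statistics summarized in this table, the following Python sketch computes Tajima’s D from summary statistics using the standard formula of Tajima (1989), then flags values in the lower and upper 5% tails in the manner of the D* columns. The function names and the percentile interpolation method are assumptions for illustration, and this is not the authors’ pipeline.

```python
import math
import statistics


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, segregating sites S, and mean
    pairwise differences pi (standard formula; needs n >= 4, S >= 1)."""
    if n < 4 or seg_sites < 1:
        raise ValueError("need n >= 4 and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Difference between the two estimators of theta, scaled by its SD.
    return (pi - seg_sites / a1) / math.sqrt(variance)


def flag_tails(d_values):
    """Split out values below the 5th and above the 95th percentile.

    The interpolation method is an assumption; the note only names
    the percentiles.
    """
    cuts = statistics.quantiles(d_values, n=20, method="inclusive")
    low_cut, high_cut = cuts[0], cuts[-1]
    low = [d for d in d_values if d < low_cut]
    high = [d for d in d_values if d > high_cut]
    return low, high
```

With the eight strains sampled in this study, `n` would be 8. Sites dominated by rare variants pull `pi` below `S / a1` and give negative D, whereas an excess of intermediate-frequency variants gives positive D.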
Figure. Enrichment of putative nonneutrally evolving sequences in different functional gene categories related to pathogenicity. Table values represent the number of sequences for each class and gene category. Tests: D* (D < 5th percentile or D > 95th percentile), D* < 0 (D < 5th percentile), D* > 0 (D > 95th percentile), and PS (sequences under positive selection according to the LRTs). Background colors indicate the significance of Fisher’s exact test for enrichment after correction for multiple comparisons by the false discovery rate (FDR). See supplementary table S2, Supplementary Material online, for more details.
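The enrichment procedure named in this legend can be sketched in Python as follows. Two points are assumptions rather than details from the source: the test is taken to be one-sided (over-representation only), and the FDR correction is taken to be the Benjamini-Hochberg procedure. The function and argument names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(cat_hit, cat_miss, other_hit, other_miss):
    """One-sided Fisher's exact test: probability of seeing at least
    `cat_hit` flagged sequences in the category under the
    hypergeometric null."""
    cat_total = cat_hit + cat_miss
    hit_total = cat_hit + other_hit
    total = cat_total + other_hit + other_miss
    denom = comb(total, hit_total)
    upper = min(cat_total, hit_total)
    numer = sum(
        comb(cat_total, k) * comb(total - cat_total, hit_total - k)
        for k in range(cat_hit, upper + 1)
    )
    return numer / denom


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this setup, one p-value would be computed per combination of sequence class, test, and gene category, and the whole set would then be passed to `benjamini_hochberg` before choosing the significance colors.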