Abstract
We characterized ectopic gene conversions in the genomes of ten hemiascomycete yeast species. Of the ten species, three diverged prior to the whole genome duplication (WGD) event present in the yeast lineage and seven diverged after it. We analyzed gene conversions from three separate datasets: paralogs from the three pre-WGD species, paralogs from the seven post-WGD species, and common ohnologs from the seven post-WGD species. In paralogs from pre- and post-WGD species, gene conversions have similar lengths and frequencies and occur between sequences having similar degrees of divergence. However, the sequences of ohnologs are both more divergent and less frequently converted than those of paralogs. This likely reflects the fact that ohnologs are more often found on different chromosomes and are evolving under stronger selective pressures than paralogs. Our results also show that ectopic gene conversions tend to occur more frequently between closely linked genes. They also suggest that the mechanisms responsible for the loss of introns in S. cerevisiae are probably also involved in the 3'-end gene conversion bias observed between the paralogs of this species.
Year: 2010 PMID: 21350633 PMCID: PMC3042673 DOI: 10.4061/2011/970768
Source DB: PubMed Journal: Int J Evol Biol ISSN: 2090-052X
Figure 1. Schematic representation of ohnologs and paralogs. Genes A and A′ represent ohnologs created by a genome duplication. These genes are therefore located on different chromosomes. Genes B and C represent paralogs created by tandem duplications of gene A. These genes are therefore on the same chromosome as gene A.
Number of ohnologs and paralogs in the pre- and post-WGD genomes.
| Genome | Number of ohnolog families | Number of paralog families |
|---|---|---|
| Post-WGD | | |
| | 551 (2) | 30 (3–40) |
| | 436 (2) | 80 (3–68) |
| | 412 (2) | 86 (3–37) |
| | 226 (2) | 13 (3–20) |
| | 462 (2) | 75 (3–23) |
| | 300 (2) | 16 (3–7) |
| | 398 (2) | 17 (3–10) |
| Pre-WGD | | |
| | n.a. | 15 (3–9) |
| | n.a. | 43 (3–9) |
| | n.a. | 60 (3–26) |
Notes. The range of multigene family sizes is provided in brackets. n.a.: not applicable.
Figure 2. The distribution of the average number of paralog gene families (mean ± S.D.) within the seven postduplication genomes and three preduplication genomes is shown. Five outlier families, including two families of size 63 and 68 from S. paradoxus, two families of size 32 and 40 from S. cerevisiae, and a single family of 38 genes from S. mikatae, are not shown in the figure to improve the visual clarity of the data.
Percentage of gene comparisons between multigene family members located on the same chromosome.
| Genome | Ohnologs | Paralogs |
|---|---|---|
| Post-WGD | | |
| | 4.0% (22/551) | 8.4% (163/1930) |
| | 5.0% (15/300) | 38.6% (29/75) |
| Pre-WGD | | |
| | n.a. | 21% (26/124) |
| | n.a. | 31% (86/270) |
| | n.a. | 18% (158/884) |
Notes. The ratios in brackets are the number of gene comparisons between genes found on the same chromosome divided by the total number of gene comparisons. n.a.: not applicable.
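The denominators in these ratios are counts of pairwise gene comparisons. As a rough illustration, assuming that every pair of genes within a family is compared, a family of n genes contributes n(n − 1)/2 comparisons. This assumption is consistent with the ohnolog rows, where 551 two-member families give 551 comparisons, but it is not stated explicitly here. The function name and the paralog family sizes below are hypothetical.

```python
from math import comb


def pairwise_comparisons(family_sizes):
    """Total number of within-family gene pairs, assuming all pairs are compared."""
    return sum(comb(n, 2) for n in family_sizes)


# 551 ohnolog families of two genes each (counts taken from the tables above).
print(pairwise_comparisons([2] * 551))

# Hypothetical paralog family sizes, for illustration only.
print(pairwise_comparisons([3, 3, 5, 40]))
```

Because the count grows quadratically with family size, a single large paralog family can dominate the total number of comparisons for a genome.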
Intra- and interchromosomal gene conversion frequencies for pre- and post-WGD genomes.
| Genome | Ohnologs: intrachromosomal frequency | Ohnologs: interchromosomal frequency | Paralogs: intrachromosomal frequency | Paralogs: interchromosomal frequency |
|---|---|---|---|---|
| Post-WGD | | | | |
| | 9.1% (2/22) | 2.1% (11/529) | 9.2% (15/163) | 5.4% (95/1767) |
| | 0% (0/15) | 0.7% (2/285) | 24.1% (7/29) | 2.2% (1/46) |
| Pre-WGD | | | | |
| | n.a. | n.a. | 11.5% (3/26) | 12.2% (12/98) |
| | n.a. | n.a. | 36% (31/86) | 11.4% (21/184) |
| | n.a. | n.a. | 1.9% (3/158) | 2.9% (21/726) |
Notes. Values in brackets indicate the ratio of the number of gene conversions divided by the number of gene comparisons. Data for S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus, and S. castellii are not provided because position data was not available for the genes of these genomes. n.a.: not applicable.
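The percentages in this table follow directly from the bracketed ratios. The short sketch below recomputes them for the ohnologs in the first post-WGD row; the function name is hypothetical and the counts are copied from the table.

```python
def conversion_frequency(conversions: int, comparisons: int) -> float:
    """Gene conversions as a percentage of gene comparisons."""
    if comparisons <= 0:
        raise ValueError("comparisons must be a positive count")
    return 100.0 * conversions / comparisons


# Ohnologs, first post-WGD row: same chromosome vs. different chromosomes.
intra = conversion_frequency(2, 22)
inter = conversion_frequency(11, 529)
print(f"intrachromosomal: {intra:.1f}%")
print(f"interchromosomal: {inter:.1f}%")
print(f"intra/inter ratio: {intra / inter:.1f}")
```

Recomputing the percentage from the ratio is also a quick way to spot entries in which a fraction was reported in place of a percentage.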
The number and frequency of gene conversions in ohnologs and paralogs.
| Genomes | Ohnologs: number | Ohnologs: frequency (%) with respect to total number of comparisons | Ohnologs: frequency (%) with respect to total number of multigene family members | Paralogs: number | Paralogs: frequency (%) with respect to total number of comparisons | Paralogs: frequency (%) with respect to total number of multigene family members |
|---|---|---|---|---|---|---|
| Post-WGD | | | | | | |
| | 13 | 2.35 | 1.17 | 110 | 5.71 | 51.40 |
| | 7 | 1.60 | 0.80 | 44 | 1.54 | 9.20 |
| | 6 | 1.45 | 0.73 | 26 | 1.50 | 4.80 |
| | 2 | 0.88 | 0.44 | 20 | 7.96 | 29.80 |
| | 14 | 3.03 | 1.51 | 50 | 3.60 | 12.40 |
| | 2 | 0.67 | 0.33 | 8 | 10.67 | 14.80 |
| | 2 | 0.50 | 0.25 | 8 | 5.06 | 10.80 |
| Pre-WGD | | | | | | |
| | n.a. | n.a. | n.a. | 15 | 12.09 | 23.80 |
| | n.a. | n.a. | n.a. | 52 | 19.25 | 31.70 |
| | n.a. | n.a. | n.a. | 24 | 2.71 | 8.20 |
Notes. n.a.: not applicable.
Gene conversion lengths of pre- and post-WGD species.
| Genome | Ohnologs: median (bp) | Ohnologs: 1st quartile (bp) | Ohnologs: 3rd quartile (bp) | Ohnologs: min (bp) | Ohnologs: max (bp) | Paralogs: median (bp) | Paralogs: 1st quartile (bp) | Paralogs: 3rd quartile (bp) | Paralogs: min (bp) | Paralogs: max (bp) | Wilcoxon test |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Post-WGD | | | | | | | | | | | |
| | 272 | 107 | 465 | 60 | 773 | 382.5 | 141 | 869 | 8 | 2642 | 0.22 |
| | 235 | 98 | 354 | 50 | 531 | 106 | 51.5 | 232 | 14 | 1060 | 0.17 |
| | 165.5 | 95 | 431 | 68 | 568 | 167 | 83 | 366 | 14 | 535 | 0.64 |
| | 270.5 | 146 | 395 | 146 | 395 | 136 | 85 | 172 | 25 | 391 | 0.19 |
| | 149.5 | 71 | 315 | 45 | 905 | 126 | 76 | 203 | 21 | 724 | 0.50 |
| | 83.5 | 27 | 140 | 27 | 140 | 130 | 83.5 | 386 | 59 | 668 | 0.36 |
| | 144 | 118 | 170 | 118 | 170 | 226 | 73.5 | 581.5 | 44 | 862 | 0.69 |
| Pre-WGD | | | | | | | | | | | |
| | n.a. | n.a. | n.a. | n.a. | n.a. | 99 | 40 | 236 | 32 | 1127 | n.a. |
| | n.a. | n.a. | n.a. | n.a. | n.a. | 183 | 104.5 | 310.5 | 18 | 1309 | n.a. |
| | n.a. | n.a. | n.a. | n.a. | n.a. | 83 | 27.5 | 196 | 16 | 1770 | n.a. |
Note. Wilcoxon two-sample tests were used to detect differences between the median gene conversion lengths of ohnologs and paralogs. n.a.: not applicable.
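To illustrate the kind of comparison described in this note, the following is a minimal sketch of a two-sided Wilcoxon rank-sum (two-sample) test using the large-sample normal approximation. It is not the authors' implementation, and the software they used is not stated here. The function names are hypothetical, and the conversion lengths in the example are invented, since the individual conversion lengths are not given in this record.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Return (U for x, z score, two-sided p-value) by normal approximation.

    No tie or continuity correction is applied, so p-values for very small
    samples should be read as rough approximations.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p


# Invented conversion lengths (bp), for illustration only.
ohnolog_lengths = [95, 150, 210, 260, 330, 480]
paralog_lengths = [40, 90, 130, 170, 400, 900, 1500]
u, z, p = rank_sum_test(ohnolog_lengths, paralog_lengths)
print(f"U = {u}, z = {z:.2f}, p = {p:.2f}")
```

With only a handful of ohnolog conversions in some genomes, an exact test would be preferable to the normal approximation used in this sketch.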
Maximum flanking similarity of gene conversions in pre- and post-WGD species.
| Genome | Ohnolog maximum flanking similarity (%): median | Ohnolog: 1st quartile | Ohnolog: 3rd quartile | Ohnolog: min | Ohnolog: max | Paralog maximum flanking similarity (%): median | Paralog: 1st quartile | Paralog: 3rd quartile | Paralog: min | Paralog: max | Wilcoxon test |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Post-WGD | | | | | | | | | | | |
| | 88 | 84 | 94 | 80 | 97 | 95.6 | 91 | 99 | 80 | 100 | 0.001 |
| | 89 | 83 | 94 | 82 | 97 | 90.3 | 87 | 97.5 | 80 | 100 | 0.24 |
| | 87.5 | 82 | 92 | 81 | 96 | 91.7 | 86.6 | 95.6 | 81 | 100 | 0.15 |
| | 86.8 | 85.7 | 88 | 85.7 | 88 | 94 | 93 | 97 | 85 | 99 | 0.07 |
| | 87.6 | 85 | 92 | 80 | 98 | 92.9 | 86 | 99 | 81 | 100 | 0.04 |
| | 84.5 | 83 | 86 | 83 | 86 | 92.6 | 90 | 99.5 | 86 | 100 | 0.08 |
| | 87 | 86 | 88 | 86 | 88 | 93 | 87 | 97 | 85 | 100 | 0.35 |
| Pre-WGD | | | | | | | | | | | |
| | n.a. | n.a. | n.a. | n.a. | n.a. | 90 | 86 | 97 | 81 | 98 | n.a. |
| | n.a. | n.a. | n.a. | n.a. | n.a. | 93 | 86.3 | 97 | 80 | 100 | n.a. |
| | n.a. | n.a. | n.a. | n.a. | n.a. | 86.5 | 83.3 | 93.5 | 80 | 100 | n.a. |
Note. Wilcoxon two-sample tests were used to detect differences between the median flanking similarities of ohnologs and paralogs. n.a.: not applicable.
Figure 3. Correlation between gene conversion length and maximum flanking sequence similarity. (a) Conversions detected between the ohnologs of the six Saccharomyces species and C. glabrata. There are 107 conversions, 46 of which have ≥80% flanking similarity. (b) Conversions detected between the paralogs of the six Saccharomyces species and C. glabrata. There are 401 conversions, 311 of which have ≥80% flanking similarity. (c) Conversions detected between the paralogs of the three pre-WGD genomes. There are 147 conversions, 91 of which have ≥80% flanking similarity.
Nonsynonymous substitutions per nonsynonymous site (Ka), synonymous substitutions per synonymous site (Ks), and Ka/Ks ratios (± standard deviations) for pairs of converted genes in pre- and post-WGD species.
| Genome | Ka: ohnologs | Ka: paralogs | Ks: ohnologs | Ks: paralogs | Ka/Ks: ohnologs | Ka/Ks: paralogs |
|---|---|---|---|---|---|---|
| Post-WGD | | | | | | |
| | 0.04 ± 0.03 | 0.09 ± 0.08 | 0.96 ± 0.49 | 0.37 ± 0.44 | 0.04 ± 0.02 | 0.38 ± 0.27 |
| | 0.09 ± 0.11 | 0.18 ± 0.20 | 0.91 ± 0.76 | 0.56 ± 0.40 | 0.10 ± 0.05 | 0.46 ± 0.57 |
| | 0.09 ± 0.11 | 0.17 ± 0.19 | 1.87 ± 1.06 | 0.56 ± 0.31 | 0.04 ± 0.04 | 0.34 ± 0.45 |
| | 0.06 ± 0.04 | 0.08 ± 0.04 | 0.95 ± 0.46 | 0.47 ± 0.59 | 0.06 ± 0.01 | 0.38 ± 0.34 |
| | 0.11 ± 0.09 | 0.13 ± 0.12 | 1.91 ± 1.68 | 0.40 ± 0.46 | 0.07 ± 0.05 | 0.40 ± 0.28 |
| | 0.25 ± 0.17 | 0.04 ± 0.04 | 1.32 ± 0.08 | 0.36 ± 0.55 | 0.18 ± 0.12 | 0.37 ± 0.45 |
| | 0.18 ± 0.09 | 0.13 ± 0.07 | 2.80 ± 1.18 | 0.29 ± 0.12 | 0.06 ± 0.01 | 0.61 ± 0.46 |
| Pre-WGD | | | | | | |
| | n.a. | 0.20 ± 0.26 | n.a. | 0.61 ± 0.40 | n.a. | 0.49 ± 0.58 |
| | n.a. | 0.10 ± 0.07 | n.a. | 0.50 ± 0.40 | n.a. | 0.31 ± 0.17 |
| | n.a. | 0.25 ± 0.19 | n.a. | 1.12 ± 0.38 | n.a. | 0.46 ± 0.38 |
n.a.: not applicable.
GO terms associated with biological processes for the ohnologs and paralogs of S. cerevisiae.
| GO term | Cluster frequency | Background frequency | P-value |
|---|---|---|---|
| Biological regulation | 21.9% | 13.8% | 5.2 × 10⁻¹³ |
| Regulation of biological process | 18.0% | 11.3% | 3.9 × 10⁻¹⁰ |
| Regulation of cellular process | 16.8% | 10.5% | 2.6 × 10⁻⁹ |
| External encapsulating structure organization and biogenesis | 6.4% | 2.8% | 6.8 × 10⁻⁹ |
| Cell wall organization and biogenesis | 6.4% | 2.8% | 6.8 × 10⁻⁹ |
| Protein amino acid phosphorylation | 4.0% | 1.4% | 2.1 × 10⁻⁸ |
| Cellular polysaccharide biosynthetic process | 2.0% | 0.5% | 4.1 × 10⁻⁸ |
| Polysaccharide biosynthetic process | 2.0% | 0.5% | 9.9 × 10⁻⁸ |
| Carbohydrate biosynthetic process | 2.7% | 0.9% | 9.3 × 10⁻⁷ |
| Cellular carbohydrate metabolic process | 5.2% | 2.3% | 1.0 × 10⁻⁶ |
| Carbohydrate metabolic process | 5.5% | 2.5% | 1.7 × 10⁻⁶ |
| Transposition | 32.8% | 1.3% | 9.7 × 10⁻¹⁰⁹ |
| Transposition, RNA-mediated | 32.8% | 1.3% | 9.7 × 10⁻¹⁰⁹ |
| Carbohydrate transport | 5.2% | 0.5% | 9.5 × 10⁻⁹ |
| Monosaccharide transport | 4.0% | 0.3% | 4.0 × 10⁻⁷ |
| Hexose transport | 4.0% | 0.3% | 4.0 × 10⁻⁷ |
| Thiamin and derivative metabolic process | 3.2% | 0.3% | 4.0 × 10⁻⁵ |
| Thiamin biosynthetic process | 2.8% | 0.2% | 2.0 × 10⁻⁴ |
| Thiamin and derivative biosynthetic process | 2.8% | 0.3% | 3.1 × 10⁻⁴ |
| Thiamin metabolic process | 2.8% | 0.3% | 3.1 × 10⁻⁴ |
| Telomere maintenance via recombination | 2.8% | 0.3% | 4.8 × 10⁻⁴ |
| Amino acid catabolic process | 3.6% | 0.5% | 1.0 × 10⁻³ |
| Cellular response to nitrogen levels | 1.6% | 0.1% | 1.6 × 10⁻³ |
Notes. Frequencies were calculated from 1100 ohnologs, 250 paralogs, and 7159 background genes. Only the twelve most significant results for each type of genes are shown.
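Enrichment of a GO term in a gene set relative to a background is commonly assessed with an upper-tail hypergeometric probability. The sketch below shows that calculation under stated assumptions and is not necessarily the procedure used to produce the table: the tool and any multiple-testing correction are not given here, so the output is not expected to reproduce the tabulated values. The function name is hypothetical, and the gene counts in the example (82 of 250 and 93 of 7159) are back-calculated from the rounded percentages of the "Transposition" row.

```python
from fractions import Fraction
from math import comb


def enrichment_p_value(cluster_hits, cluster_size, background_hits, background_size):
    """Upper-tail hypergeometric probability P(X >= cluster_hits).

    X is the number of annotated genes in a sample of cluster_size genes
    drawn without replacement from background_size genes, of which
    background_hits carry the annotation.
    """
    total = comb(background_size, cluster_size)
    upper = min(cluster_size, background_hits)
    tail = sum(
        comb(background_hits, i)
        * comb(background_size - background_hits, cluster_size - i)
        for i in range(cluster_hits, upper + 1)
    )
    # Exact integer arithmetic first, then one conversion to float.
    return float(Fraction(tail, total))


# Approximate counts derived from rounded percentages (an assumption):
# 32.8% of 250 genes is about 82; 1.3% of 7159 genes is about 93.
p = enrichment_p_value(82, 250, 93, 7159)
print(f"uncorrected hypergeometric p ≈ {p:.1e}")
```

Exact integer arithmetic is used because the tail probabilities involved are far too small to be accumulated reliably from floating-point binomial terms.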
Correlations between the location of the converted regions and their position in the converted genes in pre- and post-WGD genomes.
| Genome | Ohnolog: R-value | Ohnolog: power | Paralog: R-value | Paralog: power |
|---|---|---|---|---|
| Post-WGD | | | | |
| | −0.07 | 0.036 | 0.73* | n.a. |
| | 0.12 | 0.049 | −0.19 | 0.072 |
| | 0.00 | 0.025 | −0.19 | 0.076 |
| | −0.17 | 0.065 | −0.09 | 0.043 |
| | 0.24 | 0.095 | 0.11 | 0.047 |
| | 0.00 | 0.025 | 0.06 | 0.034 |
| | 0.17 | 0.066 | −0.09 | 0.043 |
| Pre-WGD | | | | |
| | n.a. | n.a. | −0.32 | 0.14 |
| | n.a. | n.a. | 0.02 | 0.028 |
| | n.a. | n.a. | 0.14 | 0.055 |
The R-values indicate correlation values. Significant correlations (Spearman rank correlation test P < 0.05) are labeled with *. The power of each correlation test is provided except for S. cerevisiae paralogs, where the null hypothesis was rejected, and for ohnologs for which a power test could not be performed. n.a.: not applicable.
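For readers unfamiliar with the statistic, a Spearman rank correlation is the Pearson correlation computed on the ranks of two paired variables. The sketch below is a minimal standard-library version under that definition. The function names are hypothetical, and the paired values are invented, since the per-conversion measurements behind the table are not available in this record.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented paired measurements for five conversions, for illustration only.
first_variable = [0.10, 0.35, 0.50, 0.72, 0.90]
second_variable = [120, 80, 200, 150, 400]
print(f"rho = {spearman_rho(first_variable, second_variable):.2f}")
```

Because only ranks enter the calculation, the coefficient is insensitive to the units of either variable and to monotonic transformations of them.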