Stephen A James1, Claire West1, Robert P Davey2, Jo Dicks1, Ian N Roberts1.
Abstract
Despite the considerable number and taxonomic breadth of past and current genome sequencing projects, many of which necessarily encompass the ribosomal DNA, detailed information on the prevalence and evolutionary significance of sequence variation in this ubiquitous genomic region is severely lacking. Here, we attempt to address this issue in two closely related yet contrasting yeast species, the baker's yeast Saccharomyces cerevisiae and the wild yeast Saccharomyces paradoxus. By drawing on existing datasets from the Saccharomyces Genome Resequencing Project, we identify a rich seam of ribosomal DNA sequence variation, characterising 1,068 and 970 polymorphisms in 34 S. cerevisiae and 26 S. paradoxus strains respectively. We discover that the two species sets exhibit distinct mutational profiles. Furthermore, we show for the first time that unresolved rDNA sequence variation resulting from imperfect concerted evolution of the ribosomal DNA region follows a U-shaped allele frequency distribution in each species, similar to loci that evolve under non-concerted mechanisms but arising through rather different evolutionary processes. Finally, we link differences between the shapes of these allele frequency distributions to the two species' contrasting population histories.
Year: 2016 PMID: 27345953 PMCID: PMC4921842 DOI: 10.1038/srep28555
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
rDNA Sequence Variation Uncovered Within the S. paradoxus Dataset.
| Strain | Population | SNP | INS | DEL | pSNP | pINS | pDEL | CX | Total |
|---|---|---|---|---|---|---|---|---|---|
| Q89.8 | European | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| Q32.3 | European | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 2 |
| S36.7 | European | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 2 |
| Q95.3 | European | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 3 |
| Y7.2 | European | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 3 |
| Z1.1 | European | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 3 |
| T21.4 | European | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 4 |
| Y6.5 | European | 1 | 1 | 0 | 0 | 1 | 2 | 0 | 5 |
| Q62.5 | European | 2 | 1 | 0 | 2 | 0 | 1 | 0 | 6 |
| Q59.1 | European | 0 | 1 | 0 | 5 | 2 | 0 | 1 | 9 |
| CBS 432 (T) | European | 5 | 2 | 1 | 0 | 0 | 2 | 0 | 10 |
| DBVPG 4650 | European | 2 | 1 | 0 | 4 | 1 | 3 | 0 | 11 |
| CBS 5829 | European | 6 | 2 | 1 | 3 | 0 | 0 | 0 | 12 |
| KPN 3828 | European | 7 | 2 | 3 | 1 | 0 | 0 | 0 | 13 |
| KPN 3829 | European | 7 | 2 | 2 | 1 | 0 | 1 | 0 | 13 |
| N-17 | European | 1 | 2 | 0 | 16 | 2 | 6 | 1 | 28 |
| IFO 1804 | Far Eastern | 39 | 2 | 4 | 0 | 1 | 0 | 0 | 46 |
| N-44 | Far Eastern | 38 | 3 | 3 | 1 | 0 | 1 | 0 | 46 |
| N-43 | Far Eastern | 40 | 4 | 5 | 1 | 0 | 0 | 0 | 50 |
| N-45 | Far Eastern | 4 | 1 | 0 | 36 | 4 | 5 | 0 | 50 |
| A12 | American | 84 | 9 | 10 | 0 | 0 | 0 | 0 | 103 |
| A4 | American | 88 | 10 | 6 | 0 | 0 | 0 | 1 | 105 |
| UFRJ 50816 | American | 91 | 6 | 10 | 0 | 1 | 0 | 0 | 108 |
| YPS138 | American | 95 | 8 | 7 | 0 | 0 | 0 | 0 | 110 |
| UFRJ 50791 | American | 95 | 10 | 8 | 0 | 0 | 0 | 0 | 113 |
| DBVPG 6304 | American | 97 | 6 | 6 | 2 | 1 | 1 | 1 | 114 |
| Total | | 704 | 80 | 69 | 72 | 15 | 25 | 5 | 970 |
Table of variation for each S. paradoxus strain, compared to the reference strain CBS 432, as identified using the TURNIP software. The three fixed, inter-genomic variants - single nucleotide polymorphism, insertion and deletion - are denoted as SNP, INS and DEL respectively. Their unresolved, intra-genomic forms - partial single nucleotide polymorphism, partial insertion and partial deletion - are denoted as pSNP, pINS and pDEL respectively. Complex mutations, the manifestation of different mutation types at a single rDNA site, are denoted as CX. The population from which each strain derives is also given. Ordering the strains by total polymorphism count splits them into their population groups.
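The closing observation of this caption can be checked directly from the table. The following is a minimal sketch, not part of the original analysis: the strain names, populations and totals are copied from the table above, while the variable and function names are illustrative assumptions.

```python
from itertools import groupby

# (strain, population, total polymorphisms), copied from the S. paradoxus table.
STRAINS = [
    ("Q89.8", "European", 1), ("Q32.3", "European", 2), ("S36.7", "European", 2),
    ("Q95.3", "European", 3), ("Y7.2", "European", 3), ("Z1.1", "European", 3),
    ("T21.4", "European", 4), ("Y6.5", "European", 5), ("Q62.5", "European", 6),
    ("Q59.1", "European", 9), ("CBS 432 (T)", "European", 10),
    ("DBVPG 4650", "European", 11), ("CBS 5829", "European", 12),
    ("KPN 3828", "European", 13), ("KPN 3829", "European", 13),
    ("N-17", "European", 28),
    ("IFO 1804", "Far Eastern", 46), ("N-44", "Far Eastern", 46),
    ("N-43", "Far Eastern", 50), ("N-45", "Far Eastern", 50),
    ("A12", "American", 103), ("A4", "American", 105),
    ("UFRJ 50816", "American", 108), ("YPS138", "American", 110),
    ("UFRJ 50791", "American", 113), ("DBVPG 6304", "American", 114),
]


def population_blocks(strains):
    """Sort strains by total count and return the run of populations seen.

    If every population appears exactly once in the result, ordering by
    total polymorphism count separates the populations cleanly.
    """
    ordered = sorted(strains, key=lambda row: row[2])
    return [population for population, _ in groupby(ordered, key=lambda row: row[1])]


blocks = population_blocks(STRAINS)
print(blocks)
print("cleanly separated:", len(blocks) == len(set(blocks)))
```

Each population appearing as a single contiguous block reflects the non-overlapping ranges of the totals: 1 to 28 for European, 46 to 50 for Far Eastern and 103 to 114 for American strains.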
rDNA Sequence Variation Uncovered Within the S. cerevisiae Dataset.
| Strain | Genome Type | Group | SNP | INS | DEL | pSNP | pINS | pDEL | CX | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| W303 | Mosaic | OM | 0 | 0 | 0 | 3 | 0 | 1 | 0 | 4 |
| L_1374 | Structured mosaic | W/E | 6 | 0 | 7 | 2 | 0 | 1 | 0 | 16 |
| DBVPG 1106 | Structured mosaic | W/E | 7 | 0 | 9 | 1 | 0 | 1 | 0 | 18 |
| DBVPG 1788 | Structured mosaic | W/E | 8 | 0 | 11 | 0 | 0 | 0 | 0 | 19 |
| YJM975 | Structured mosaic | W/E | 6 | 0 | 7 | 4 | 0 | 3 | 0 | 20 |
| YJM978 | Structured mosaic | W/E | 6 | 0 | 7 | 4 | 0 | 3 | 0 | 20 |
| YJM981 | Structured mosaic | W/E | 6 | 0 | 6 | 4 | 0 | 5 | 0 | 21 |
| YPS128 | Structured clean | NA | 14 | 0 | 10 | 0 | 0 | 0 | 0 | 24 |
| S288c | Mosaic | OM | 0 | 0 | 0 | 14 | 1 | 10 | 0 | 25 |
| BC187 | Structured mosaic | W/E | 7 | 0 | 5 | 7 | 1 | 5 | 0 | 25 |
| DBVPG 1373 | Structured mosaic | W/E | 8 | 0 | 5 | 7 | 1 | 4 | 0 | 25 |
| YPS606 | Structured clean | NA | 14 | 0 | 9 | 2 | 0 | 2 | 0 | 27 |
| NCYC 110 | Structured clean | WA+ | 15 | 2 | 8 | 2 | 0 | 1 | 0 | 28 |
| DBVPG 6765 | Structured mosaic | W/E | 13 | 0 | 8 | 3 | 0 | 4 | 0 | 28 |
| DBVPG 6044 | Structured clean | WA+ | 15 | 1 | 9 | 2 | 1 | 1 | 0 | 29 |
| SK1 | Mosaic | WA+ | 16 | 0 | 7 | 3 | 0 | 3 | 0 | 29 |
| UWOPS87-2421 | Mosaic | UM | 14 | 0 | 11 | 4 | 0 | 1 | 0 | 30 |
| 322134S | Mosaic | OM | 6 | 0 | 5 | 12 | 2 | 5 | 0 | 30 |
| Y9 | Structured mosaic | SA | 8 | 1 | 6 | 10 | 3 | 4 | 1 | 33 |
| Y55 | Mosaic | WA+ | 15 | 1 | 9 | 7 | 0 | 2 | 0 | 34 |
| 273614N | Mosaic | OM | 4 | 2 | 6 | 15 | 1 | 6 | 0 | 34 |
| Y12 | Structured mosaic | SA | 9 | 1 | 6 | 11 | 4 | 5 | 0 | 36 |
| 378604X | Mosaic | OM | 0 | 0 | 1 | 20 | 4 | 11 | 0 | 36 |
| DBVPG 6040 | Mosaic | OM | 0 | 0 | 0 | 27 | 2 | 11 | 0 | 40 |
| K11 | Structured mosaic | SA | 23 | 5 | 9 | 2 | 0 | 2 | 0 | 41 |
| YS9 | Mosaic | OM | 1 | 0 | 1 | 27 | 4 | 8 | 0 | 41 |
| YIIc17_E5 | Mosaic | YII | 7 | 0 | 6 | 18 | 4 | 6 | 0 | 41 |
| UWOPS05-227-2 | Structured clean | MA | 24 | 0 | 11 | 7 | 0 | 0 | 0 | 42 |
| UWOPS05-217-3 | Structured clean | MA | 27 | 0 | 7 | 3 | 0 | 6 | 0 | 43 |
| UWOPS83-787-3 | Mosaic | UM | 8 | 0 | 6 | 21 | 1 | 7 | 0 | 43 |
| UWOPS03-461-4 | Structured clean | MA | 29 | 0 | 15 | 0 | 0 | 0 | 0 | 44 |
| NCYC 361 | Mosaic | OM | 0 | 0 | 0 | 27 | 4 | 13 | 0 | 44 |
| YS4 | Mosaic | OM | 9 | 1 | 6 | 24 | 4 | 5 | 0 | 49 |
| DBVPG 1853 | Mosaic | OM | 14 | 1 | 2 | 18 | 5 | 9 | 0 | 49 |
| Total | | | 339 | 15 | 215 | 311 | 42 | 145 | 1 | 1068 |
Table of variation for each S. cerevisiae strain, compared to the reference strain S288c, as identified using the TURNIP software. The seven mutation types - single nucleotide polymorphism, insertion, deletion, partial single nucleotide polymorphism, partial insertion, partial deletion and complex mutation - are again denoted as SNP, INS, DEL, pSNP, pINS, pDEL and CX respectively. For each strain, the genome type (mosaic, structured clean or structured mosaic) and the geographic/industrial group (MA [Malaysian], NA [North American], SA [Sake], WA+ [West African + other mosaics], W/E [Wine/European], YII [strain YIIc17-E5], UM [UWOPS mosaics] or OM [Other Mosaics]), as determined in ref. 14, are also given.
Figure 1. Geographical Location of the S. paradoxus Strain Collection.
World map with the location of the collection sites for the S. paradoxus strains indicated by stars. Stars are coloured by population type. In brackets following each strain are the numbers of SNPs, insertions, deletions, pSNPs, partial insertions, partial deletions and complex mutations identified for that strain in this study. World map template downloaded and modified from https://commons.wikimedia.org/wiki/File:World_map_blank_shorelines_semiwikimapia.svg.
Figure 2. Mutational Profiles of S. paradoxus and S. cerevisiae.
Percentages of each type of mutation within each species are shown, clearly indicating contrasting mutational profiles between the two species.
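The percentages behind such a profile can be recomputed from the "Total" rows of the two strain tables. The sketch below is one way to do that arithmetic; the counts come from those rows, whereas the names `TOTALS`, `mutation_profile` and `intra_genomic_share` are illustrative assumptions and not taken from the study.

```python
# Counts per mutation type, copied from the "Total" rows of the strain tables.
TOTALS = {
    "S. paradoxus": {"SNP": 704, "INS": 80, "DEL": 69,
                     "pSNP": 72, "pINS": 15, "pDEL": 25, "CX": 5},
    "S. cerevisiae": {"SNP": 339, "INS": 15, "DEL": 215,
                      "pSNP": 311, "pINS": 42, "pDEL": 145, "CX": 1},
}

# Partial (unresolved, intra-genomic) mutation types.
INTRA_GENOMIC = ("pSNP", "pINS", "pDEL")


def mutation_profile(counts):
    """Return each mutation type as a percentage of all polymorphisms."""
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()}


def intra_genomic_share(counts):
    """Return the percentage of polymorphisms that are intra-genomic."""
    total = sum(counts.values())
    return 100.0 * sum(counts[kind] for kind in INTRA_GENOMIC) / total


for species, counts in TOTALS.items():
    profile = mutation_profile(counts)
    summary = ", ".join(f"{kind} {pct:.1f}%" for kind, pct in profile.items())
    print(f"{species}: {summary}")
    print(f"  intra-genomic share: {intra_genomic_share(counts):.1f}%")
```

Summing the three partial types separately makes the contrast easy to read: fixed SNPs dominate in S. paradoxus, while partial variants make up a far larger share in S. cerevisiae.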
Estimates of Type 1 and Type 2 partial insertions and deletions in S. paradoxus and S. cerevisiae.
| Species | Group | pINS Type 1 | pINS Type 2 | pINS Total | pDEL Type 1 | pDEL Type 2 | pDEL Total |
|---|---|---|---|---|---|---|---|
| S. paradoxus | European | 8 | 0 | 8 | 16 | 2 | 18 |
| S. paradoxus | Far Eastern | 2 | 3 | 5 | 5 | 1 | 6 |
| S. paradoxus | American | 2 | 0 | 2 | 0 | 1 | 1 |
| S. cerevisiae | Structured clean | 1 | 0 | 1 | 4 | 6 | 10 |
| S. cerevisiae | Structured mosaic | 5 | 4 | 9 | 16 | 21 | 37 |
| S. cerevisiae | Mosaic | 22 | 10 | 32 | 66 | 32 | 98 |
S. paradoxus strains are subdivided by their population groups and S. cerevisiae strains by their genome structure groups.
Region-By-Region Breakdown of rDNA Variation in S. paradoxus and S. cerevisiae.
| Region | S. paradoxus SNP | S. paradoxus INS | S. paradoxus DEL | S. paradoxus pSNP | S. paradoxus pINS | S. paradoxus pDEL | S. paradoxus CX | S. paradoxus Total | S. cerevisiae SNP | S. cerevisiae INS | S. cerevisiae DEL | S. cerevisiae pSNP | S. cerevisiae pINS | S. cerevisiae pDEL | S. cerevisiae CX | S. cerevisiae Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 26S | 7 | 0 | 0 | 1 | 0 | 0 | 0 | 8 | 4 | 0 | 0 | 18 | 2 | 1 | 0 | 25 |
| ETS2 | 77 | 25 | 26 | 7 | 1 | 7 | 1 | 144 | 4 | 0 | 9 | 12 | 0 | 12 | 0 | 37 |
| IGS1 | 237 | 48 | 37 | 30 | 14 | 15 | 4 | 385 | 110 | 11 | 125 | 122 | 20 | 97 | 1 | 486 |
| 5S | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| IGS2 | 316 | 7 | 6 | 19 | 0 | 0 | 0 | 348 | 178 | 4 | 11 | 111 | 9 | 10 | 0 | 323 |
| ETS1 | 48 | 0 | 0 | 9 | 0 | 1 | 0 | 58 | 24 | 0 | 54 | 18 | 0 | 16 | 0 | 112 |
| 18S | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 9 | 0 | 1 | 0 | 10 |
| ITS1 | 15 | 0 | 0 | 5 | 0 | 0 | 0 | 20 | 19 | 0 | 0 | 18 | 5 | 0 | 0 | 42 |
| 5.8S | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ITS2 | 2 | 0 | 0 | 1 | 0 | 1 | 0 | 4 | 0 | 0 | 16 | 3 | 6 | 8 | 0 | 33 |
| Total | 704 | 80 | 69 | 72 | 15 | 25 | 5 | 970 | 339 | 15 | 215 | 311 | 42 | 145 | 1 | 1068 |
The number of polymorphisms of each type split according to different regions of the ribosomal DNA unit for S. paradoxus and S. cerevisiae.
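To see where the variation concentrates, the per-region totals can be ranked and the share falling in the rRNA-coding regions computed. This is a minimal sketch using the "Total" columns of the region table; the grouping of 26S, 5S, 18S and 5.8S as coding regions follows standard rDNA nomenclature, and all names in the code are illustrative assumptions.

```python
# Per-region totals, copied from the two "Total" columns of the region table.
REGION_TOTALS = {
    "S. paradoxus": {"26S": 8, "ETS2": 144, "IGS1": 385, "5S": 1, "IGS2": 348,
                     "ETS1": 58, "18S": 2, "ITS1": 20, "5.8S": 0, "ITS2": 4},
    "S. cerevisiae": {"26S": 25, "ETS2": 37, "IGS1": 486, "5S": 0, "IGS2": 323,
                      "ETS1": 112, "18S": 10, "ITS1": 42, "5.8S": 0, "ITS2": 33},
}

# rRNA-coding regions of the rDNA unit; the rest are spacers.
CODING_REGIONS = {"26S", "5S", "18S", "5.8S"}


def top_regions(totals, n=3):
    """Return the n regions with the most polymorphisms, highest first."""
    return sorted(totals, key=totals.get, reverse=True)[:n]


def coding_share(totals):
    """Return the percentage of polymorphisms located in coding regions."""
    coding = sum(count for region, count in totals.items() if region in CODING_REGIONS)
    return 100.0 * coding / sum(totals.values())


for species, totals in REGION_TOTALS.items():
    print(species, top_regions(totals), f"coding: {coding_share(totals):.2f}%")
```

In both species the spacer regions carry almost all of the polymorphisms, with only a few percent located in the rRNA genes themselves.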
Figure 3. Distribution of Sequence Variation Along the rDNA Unit.
The distribution of rDNA sequence variants within the rDNA unit and their unit occupancy frequencies along the tandem array. (a) Inter- and intra-genomic variants within the S. paradoxus dataset; fixed, inter-genomic variants (SNPs + INSs + DELs) are shown as blue bars while partial, intra-genomic variants (pSNPs + pINSs + pDELs) along with complex variants are shown as red bars, with the boxed areas in light grey highlighting coding RNA regions. A representation of an rDNA unit is shown below. Most variation is identified in the ETS2, IGS1 and IGS2 regions. (b) Inter- and intra-genomic variants within the S. cerevisiae dataset; data represented as in (a). Most variation is identified in the IGS1, IGS2 and ETS1 regions. (c) Bar chart showing unit occupancies of most intra-genomic variants (pSNPs + Type 1 pINSs + Type 1 pDELs) in the S. paradoxus and S. cerevisiae datasets, in unit occupancy bins of size 10%. Although both exhibit U-shaped curves, the variant occupancy distributions for S. cerevisiae and S. paradoxus are markedly different.
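The binning described for panel (c) can be sketched as follows. This is only an illustration of grouping unit occupancies into 10% bins: the occupancy values below are made up for demonstration and are not data from the study, and the function name and bin-edge convention (each bin includes its lower edge, with 100% placed in the last bin) are assumptions.

```python
def occupancy_histogram(occupancies, bin_width=10):
    """Count variants per unit-occupancy bin.

    `occupancies` are percentages of rDNA units carrying a variant.
    Bins are [0, 10), [10, 20), ..., [90, 100], so 100% falls in the last bin.
    """
    n_bins = 100 // bin_width
    counts = [0] * n_bins
    for occupancy in occupancies:
        if not 0 <= occupancy <= 100:
            raise ValueError(f"occupancy out of range: {occupancy}")
        index = min(int(occupancy // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical occupancies (percent), chosen only to show a U-shaped pattern.
example = [2.5, 4.0, 8.1, 15.0, 47.3, 88.0, 93.5, 97.2, 99.0]
for low, count in zip(range(0, 100, 10), occupancy_histogram(example)):
    print(f"{low:3d}-{low + 10:3d}%  {'#' * count}")
```

A U-shaped distribution appears as high counts in the first and last bins with few variants at intermediate occupancies.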