Claire West1, Stephen A James1, Robert P Davey2, Jo Dicks2, Ian N Roberts3.
Abstract
The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. 
We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of closely related organisms, and discuss how it could be extended to future studies of multilocus rDNA systems. [concerted evolution; genome hybridisation; phylogenetic analysis; ribosomal DNA; whole genome sequencing; yeast].
Year: 2014 PMID: 24682414 PMCID: PMC4055870 DOI: 10.1093/sysbio/syu019
Source DB: PubMed Journal: Syst Biol ISSN: 1063-5157 Impact factor: 15.683
rDNA sequence variation uncovered within the S. paradoxus dataset
| Strain | Population | SNP | pSNP | Total | Copy Number (S.E.) |
| Q32.3 | European | 0 | 0 | 0 | 74 (0.109) |
| Q89.8 | European | 0 | 0 | 0 | 81 (0.228) |
| Q95.3 | European | 0 | 0 | 0 | 46 (0.093) |
| S36.7 | European | 0 | 0 | 0 | 57 (0.109) |
| T21.4 | European | 0 | 0 | 0 | 66 (0.074) |
| Y6.5 | European | 1 | 0 | 1 | 65 (0.085) |
| Y7.2 | European | 1 | 0 | 1 | 78 (0.108) |
| Z1.1 | European | 1 | 0 | 1 | 83 (0.103) |
| Q62.5 | European | 2 | 2 | 4 | 68 (0.096) |
| CBS 432 (T) | European | 5 | 0 | 5 | 68 (0.072) |
| Q59.1 | European | 0 | 5 | 5 | 52 (0.062) |
| DBVPG 4650 | European | 2 | 4 | 6 | 87 (0.107) |
| KPN 3828 | European | 7 | 1 | 8 | 82 (0.112) |
| KPN 3829 | European | 7 | 1 | 8 | 79 (0.124) |
| CBS 5829 | European | 6 | 3 | 9 | 88 (0.095) |
| N-17 | European | 1 | 17 | 18 | 78 (0.068) |
| IFO 1804 | Far Eastern | 39 | 0 | 39 | 96 (0.187) |
| N-44 | Far Eastern | 38 | 1 | 39 | 52 (0.085) |
| N-45 | Far Eastern | 4 | 36 | 40 | 66 (0.049) |
| N-43 | Far Eastern | 40 | 1 | 41 | 64 (0.126) |
| A12 | American | 84 | 0 | 84 | 45 (0.065) |
| A4 | American | 88 | 0 | 88 | 66 (0.098) |
| UFRJ 50816 | American | 92 | 0 | 92 | 72 (0.090) |
| UFRJ 50791 | American | 95 | 0 | 95 | 64 (0.107) |
| YPS138 | American | 95 | 0 | 95 | 76 (0.099) |
| DBVPG 6304 | American | 97 | 2 | 99 | 53 (0.094) |
| Total | | 705 | 73 | 778 | |
Notes: Table of SNP and pSNP polymorphisms for each S. paradoxus strain, compared to the reference strain CBS432, as identified using the TURNIP software. Polymorphism counts are taken from West et al., in preparation. For each strain, the population and estimated ribosomal DNA copy number (along with the standard error of the copy number estimate) are also given. Ordering the strains by total polymorphism count results in the strains being split into their population groups.
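The ordering claim in the notes above can be checked directly from the table. The following is a minimal sketch, not the authors' procedure: it sorts the strains by SNP + pSNP total and tests whether each population then occupies one unbroken run. The function names are illustrative assumptions; the counts are copied from the table.

```python
from itertools import groupby

# (strain, population, SNP count, pSNP count), copied from the table above.
ROWS = [
    ("Q32.3", "European", 0, 0), ("Q89.8", "European", 0, 0),
    ("Q95.3", "European", 0, 0), ("S36.7", "European", 0, 0),
    ("T21.4", "European", 0, 0), ("Y6.5", "European", 1, 0),
    ("Y7.2", "European", 1, 0), ("Z1.1", "European", 1, 0),
    ("Q62.5", "European", 2, 2), ("CBS 432 (T)", "European", 5, 0),
    ("Q59.1", "European", 0, 5), ("DBVPG 4650", "European", 2, 4),
    ("KPN 3828", "European", 7, 1), ("KPN 3829", "European", 7, 1),
    ("CBS 5829", "European", 6, 3), ("N-17", "European", 1, 17),
    ("IFO 1804", "Far Eastern", 39, 0), ("N-44", "Far Eastern", 38, 1),
    ("N-45", "Far Eastern", 4, 36), ("N-43", "Far Eastern", 40, 1),
    ("A12", "American", 84, 0), ("A4", "American", 88, 0),
    ("UFRJ 50816", "American", 92, 0), ("UFRJ 50791", "American", 95, 0),
    ("YPS138", "American", 95, 0), ("DBVPG 6304", "American", 97, 2),
]


def populations_in_total_order(rows):
    """Return population labels after sorting strains by SNP + pSNP total."""
    ordered = sorted(rows, key=lambda row: row[2] + row[3])
    return [population for _, population, _, _ in ordered]


def is_contiguous(labels):
    """True if every distinct label forms exactly one unbroken run."""
    runs = [label for label, _ in groupby(labels)]
    return len(runs) == len(set(runs))


labels = populations_in_total_order(ROWS)
print(is_contiguous(labels))  # True
```

The same rows also reproduce the column totals in the table (705 SNPs and 73 pSNPs).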
rDNA sequence variation uncovered within the S. cerevisiae dataset
| Strain | Group | Genome type | Modified genome type | SNP | pSNP | Total | Copy number (S.E.) |
| W303 | OM | Mosaic | Mosaic | 0 | 3 | 3 | 182 (0.142) |
| L_1374 | W/E | Structured | Structured mosaic | 6 | 2 | 8 | 60 (0.096) |
| DBVPG 1106 | W/E | Structured | Structured mosaic | 7 | 1 | 8 | 98 (0.112) |
| DBVPG 1788 | W/E | Structured | Structured mosaic | 8 | 0 | 8 | 67 (0.101) |
| YJM981 | W/E | Structured | Structured mosaic | 6 | 3 | 9 | 354 (0.495) |
| YJM975 | W/E | Structured | Structured mosaic | 6 | 4 | 10 | 65 (0.095) |
| YJM978 | W/E | Structured | Structured mosaic | 6 | 4 | 10 | 65 (0.136) |
| YPS128 | NA | Structured | Structured clean | 14 | 0 | 14 | 62 (0.094) |
| S288c | OM | Mosaic | Mosaic | 0 | 14 | 14 | 111 (0.163) |
| BC187 | W/E | Structured | Structured mosaic | 7 | 7 | 14 | 71 (0.135) |
| DBVPG 1373 | W/E | Structured | Structured mosaic | 8 | 7 | 15 | 75 (0.127) |
| DBVPG 6765 | W/E | Structured | Structured mosaic | 13 | 3 | 16 | 70 (0.077) |
| YPS606 | NA | Structured | Structured clean | 14 | 2 | 16 | 67 (0.096) |
| NCYC 110 | WA + | Structured | Structured clean | 15 | 2 | 17 | 163 (0.199) |
| DBVPG 6044 | WA + | Structured | Structured clean | 15 | 2 | 17 | 107 (0.121) |
| Y9 | SA | Structured | Structured mosaic | 8 | 10 | 18 | 79 (0.149) |
| UWOPS87-2421 | UM | Mosaic | Mosaic | 14 | 4 | 18 | 57 (0.109) |
| 322134S | OM | Mosaic | Mosaic | 6 | 12 | 18 | 109 (0.140) |
| SK1 | WA + | Mosaic | Mosaic | 16 | 3 | 19 | 72 (0.080) |
| 27361N | OM | Mosaic | Mosaic | 4 | 15 | 19 | 93 (0.119) |
| Y12 | SA | Structured | Structured mosaic | 9 | 11 | 20 | 78 (0.143) |
| 378604X | OM | Mosaic | Mosaic | 0 | 20 | 20 | 87 (0.117) |
| Y55 | WA + | Mosaic | Mosaic | 15 | 7 | 22 | 72 (0.060) |
| K11 | SA | Structured | Structured mosaic | 23 | 2 | 25 | 50 (0.082) |
| YIIc17_E5 | YII | Mosaic | Mosaic | 7 | 18 | 25 | 80 (0.117) |
| DBVPG 6040 | OM | Mosaic | Mosaic | 0 | 27 | 27 | 132 (0.106) |
| NCYC 361 | OM | Mosaic | Mosaic | 0 | 27 | 27 | 189 (0.189) |
| YS9 | OM | Mosaic | Mosaic | 1 | 27 | 28 | 56 (0.130) |
| UWOPS83-787-3 | UM | Mosaic | Mosaic | 8 | 21 | 29 | 64 (0.102) |
| UWOPS03-461-4 | MA | Structured | Structured clean | 29 | 0 | 29 | 89 (0.090) |
| UWOPS05-217-3 | MA | Structured | Structured clean | 27 | 3 | 30 | 133 (0.186) |
| UWOPS05-227-2 | MA | Structured | Structured clean | 24 | 7 | 31 | 70 (0.108) |
| YS4 | OM | Mosaic | Mosaic | 9 | 24 | 33 | 88 (0.110) |
| DBVPG 1853 | OM | Mosaic | Mosaic | 14 | 23 | 37 | 144 (0.205) |
| Total | | | | 339 | 315 | 654 | |
Notes: Table of SNP and pSNP polymorphisms for each S. cerevisiae strain, compared to the reference strain S288c, as identified using the TURNIP software. Polymorphism counts are taken from West et al. (in preparation). For each strain, the strain group (geographic or phylogenetic origin/industrial usage), the genome type (mosaic or structured), the modified genome type (mosaic, structured clean, and structured mosaic) determined in this study, and the estimated ribosomal DNA copy number (along with the standard error of the copy number estimate) are also given. Key for groups: MA (Malaysian); NA (North American); SA (Sake); WA + (West African + other mosaics); W/E (Wine/European); YII (strain YIIc17-E5); UM (UWOPS mosaics); OM (Other Mosaics).
Figure 1. pSNP + SNP polymorphism counts in S. paradoxus strains. Bar chart of pSNP plus SNP variation in each S. paradoxus strain, labeled to show the split into distinct populations. The strains are ordered by increasing number of pSNPs + SNPs, and naturally split into the three geographical locations.
Figure 2. Neighbor-joining phylogenetic trees of the S. paradoxus strain set. a) Saccharomyces paradoxus neighbor-joining phylogenetic tree with S. cerevisiae strain S288c as the nominated root. Bootstrap support values greater than 50 shown. Clear separation into strain collection site can be observed. Little variation within the European group, particularly the 10 UK strains (Q95.3 to Q59.1) and the 2 Siberian strains (KPN3828 and KPN3829), is apparent. N-45 is found to be the most divergent of the 4 Far Eastern strains. The American strains proved to be most divergent as a group. b) Saccharomyces paradoxus phylogenetic tree derived from 623 287 genome-wide SNPs (Liti et al. 2009). Reprinted by permission from Macmillan Publishers Ltd: Nature 458:337-341, ©2009.
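This record does not state which software or distance measure was used to build the neighbor-joining trees. As a hedged illustration only, the sketch below shows the pair-selection step of the standard neighbor-joining algorithm: for n taxa, Q(i, j) = (n − 2)·d(i, j) − Σ_k d(i, k) − Σ_k d(j, k), and the pair with the smallest Q is joined first. The distance matrix `TOY` is a hypothetical five-taxon example, not data from the study, and the function names are assumptions.

```python
def nj_q_matrix(dist):
    """Compute the neighbor-joining Q matrix for a symmetric distance matrix."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
    return q


def pair_to_join(dist):
    """Return the index pair (i, j), i < j, with the smallest Q value."""
    q = nj_q_matrix(dist)
    n = len(dist)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda pair: q[pair[0]][pair[1]])


# Hypothetical distances between five taxa (not taken from the paper).
TOY = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

print(pair_to_join(TOY))  # (0, 1)
```

A full implementation would then replace the joined pair with a new internal node, recompute distances, and repeat until the tree is resolved.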
Phylogenetic grouping of polymorphisms in S. paradoxus and S. cerevisiae
| Polymorphism type | Phylogenetic grouping | S. paradoxus | S. cerevisiae |
| SNPs | Within strain | 17 | 18 |
| | Within group | 91 | 11 |
| | Across group | 1 | 3 |
| | With root | 244 | 291 |
| pSNPs | Within strain | 15 | 66 |
| | Within group | 0 | 22 |
| | Across group | 1 | 7 |
| pSNPs + SNPs | Within group | 16 | 14 |
| | Across group | 26 | 36 |
| Total | | 411 | 468 |
Notes: The number of polymorphisms of each type (SNP, pSNP, or pSNP + SNP) across the entire strain set for S. paradoxus and S. cerevisiae.
Figure 3. Neighbor-joining phylogenetic trees of the S. cerevisiae strain set. a) Saccharomyces cerevisiae neighbor-joining tree with S. paradoxus strain Q32.3 as the nominated root. Bootstrap support values greater than 50 shown. Dotted line equivalent to 0.355 units of distance. Groups of interest are shown as colored boxes. b) Saccharomyces cerevisiae phylogenetic tree derived from 235 127 genome-wide SNPs (Liti et al. 2009). Reprinted by permission from Macmillan Publishers Ltd: Nature 458:337-341, ©2009.
Correlation and regression analysis of rDNA copy number with strain features
| Species | Factor 1 | No. of strains | Factor 2 | No. of levels | r | P |
| S. paradoxus | rDNA copy number | 26 | Geographical origin | 3 | −0.287 | 0.293 |
| S. cerevisiae | rDNA copy number | 34 | Strain group | 8 | −0.253 | 0.240 |
| S. cerevisiae | rDNA copy number | 34 | Genome type | 2 | −0.057 | 0.676 |
| S. cerevisiae | rDNA copy number | 34 | Modified genome type | 3 | −0.037 | 0.896 |
| S. cerevisiae | rDNA copy number | 32 | Strain group | 8 | −0.627 | 3.89 × 10⁻⁵ (a) |
| S. cerevisiae | rDNA copy number | 32 | Genome type | 2 | 0.299 | 0.049 |
| S. cerevisiae | rDNA copy number | 32 | Modified genome type | 3 | −0.129 | 0.006 |
Notes: Pearson's correlation coefficients (r) were calculated between rDNA copy number and strain features for various numbers of strains. Negative Binomial Generalised Linear Models were also fitted to the same datasets; the P-values are those of the resulting χ² analysis of deviance tests.
(a) Furthermore, the Negative Binomial regression indicated that the rDNA copy numbers of the Sake, Wine/European, North American and UWOPS Mosaics groups were significantly different from those of the Other Mosaics group, with P = 0.002, P = 1.09 × 10⁻⁵, P = 0.003, and P = 0.001, respectively.
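For reference, Pearson's correlation coefficient between n paired observations is conventionally defined as below. Here x_i stands for the rDNA copy number of strain i and y_i for a numeric coding of the strain feature. How the categorical features were coded is not stated in this record, so that coding is an assumption of this illustration.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```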
Figure 4. The variation of copy number within groups. a) Box plot of S. paradoxus geographical groups and copy number of each strain. b) Box plot of S. cerevisiae groups versus copy number of each strain. Outliers YJM981 and DBVPG 1106 are represented as circles in the Wine/European group.