Anders Bergström, Jared T Simpson, Francisco Salinas, Benjamin Barré, Leopold Parts, Amin Zia, Alex N Nguyen Ba, Alan M Moses, Edward J Louis, Ville Mustonen, Jonas Warringer, Richard Durbin, Gianni Liti.
Abstract
The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies.
Keywords: functional variation; genome evolution; loss-of-function variants; population genomics; subtelomeres; yeast
Year: 2014 PMID: 24425782 PMCID: PMC3969562 DOI: 10.1093/molbev/msu037
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Sequencing and De Novo Assembly of Yeast Strain Genomes.
| Strain | Subpopulation | Source | Location | Cov | Number of Scaffolds | Assembly Size | Contig N50 | Max. Scaffold Size | Scaffold N50 |
|---|---|---|---|---|---|---|---|---|---|
| UWOPS87-2421 | Mosaic | | Hawaii | 821 | 559/536 | 11,658,429/11,671,772 | 103,653/108,473 | 546,126/546,126 | 160,939/200,122 |
| UWOPS83-787.3 | Mosaic | | Bahamas | 627 | 513/495 | 11,676,943/11,678,122 | 113,702/113,702 | 557,771/885,033 | 187,881/244,661 |
| YPS128 | North American | | USA | 64 | 855/791 | 11,741,720/11,768,379 | 94,895/99,341 | 451,294/518,960 | 109,555/187,623 |
| SK1 | Mosaic | Soil | USA | 56 | 1,315/1,119 | 11,704,669/11,753,398 | 49,149/54,072 | 321,072/569,610 | 67,234/267,272 |
| L1528 | Wine/European | Wine | Chile | 49 | 951/808 | 11,565,290/11,594,025 | 36,108/39,574 | 246,181/463,986 | 55,077/101,460 |
| W303 | Mosaic | Laboratory | USA | 48 | 1,391/1,219 | 11,630,510/11,666,829 | 34,537/37,986 | 209,223/284,547 | 52,228/122,824 |
| DBVPG6765 | Wine/European | Unknown | Unknown | 46 | 967/779 | 11,622,418/11,670,953 | 46,113/51,348 | 280,644/540,506 | 65,667/208,326 |
| Y12 | Sake | Sake | Japan | 44 | 1,721/1,582 | 11,610,064/11,649,747 | 25,846/26,998 | 113,242/229,213 | 36,944/60,148 |
| DBVPG1106 | Wine/European | Grapes | Australia | 43 | 1,280/1,163 | 11,539,734/11,580,105 | 26,613/27,246 | 172,964/183,080 | 36,291/47,371 |
| Y55 | Mosaic | Grapes | France | 38 | 1,516/1,219 | 11,635,423/11,701,139 | 24,282/25,808 | 263,008/418,695 | 42,157/141,898 |
| DBVPG6044 | West African | Bili wine | West Africa | 35 | 1,137/983 | 11,598,603/11,656,308 | 42,143/45,282 | 236,917/350,970 | 52,144/94,331 |
| YJM975 | Wine/European | Clinical | Italy | 33 | 3,069/2,429 | 11,406,731/11,688,902 | 5,701/5,849 | 33,464/59,403 | 7,414/15,038 |
| UWOPS03-461.4 | Malaysian | Bertram palm | Malaysia | 32 | 3,646/3,213 | 11,499,146/11,694,268 | 5,691/5,844 | 40,466/60,006 | 6,919/10,747 |
| DBVPG1373 | Wine/European | Soil | Netherlands | 32 | 1,232/970 | 11,559,454/11,626,504 | 21,249/22,464 | 185,406/368,851 | 35,542/98,526 |
| YJM978 | Wine/European | Clinical | Italy | 32 | — | — | — | — | — |
| DBVPG1788 | Wine/European | Soil | Finland | 30 | — | — | — | — | — |
| L1374 | Wine/European | Wine | Chile | 26 | — | — | — | — | — |
| BC187 | Wine/European | Wine | USA | 23 | — | — | — | — | — |
| YJM981 | Wine/European | Clinical | Italy | 16 | — | — | — | — | — |
| Y8.5 | European | | UK | 502 | 439/ | 11,623,026/ | 111,332/ | 547,620/ | 204,111/ |
| Z1.1 | European | | UK | 423 | 426/412 | 11,616,631/11,624,232 | 113,350/115,435 | 555,399/555,399 | 238,384/289,391 |
| Y9.6 | European | | UK | 408 | 743/ | 11,615,713/ | 67,180/ | 271,892/ | 83,447/ |
| Z1 | European | | UK | 367 | 685/ | 11,686,590/ | 102,641/ | 357,366/ | 128,755/ |
| Q59.1 | European | | UK | 54 | 931/803 | 11,714,439/11,757,637 | 67,680/73,926 | 362,301/496,642 | 71,043/150,995 |
| N-44 | Far Eastern | | Russia | 51 | 2,058/1,488 | 11,576,182/11,733,376 | 11,054/12,242 | 66,254/156,128 | 13,806/36,398 |
| YPS138 | American | | USA | 43 | 2,047/1,529 | 11,613,204/11,740,006 | 11,025/11,801 | 69,807/117,759 | 14,797/33,285 |
| S36.7 | European | | UK | 42 | 1,153/1,112 | 11,658,851/11,672,986 | 46,580/46,880 | 174,638/187,287 | 50,727/56,021 |
| Y6.5 | European | | UK | 41 | 1,134/981 | 11,655,419/11,704,717 | 47,030/52,107 | 197,044/357,414 | 50,244/78,826 |
| Y7 | European | | UK | 39 | 1,164/976 | 11,653,944/11,721,355 | 43,703/46,679 | 163,882/230,460 | 45,872/88,762 |
| Q95.3 | European | | UK | 35 | 1,229/1,002 | 11,660,442/11,734,036 | 42,716/46,351 | 190,059/307,994 | 48,261/111,311 |
| UFRJ50816 | American | | Brazil | 34 | — | — | — | — | — |
| T21.4 | European | | UK | 34 | 1,187/991 | 11,656,698/11,699,313 | 35,788/41,410 | 131,328/340,019 | 40,273/70,866 |
| IFO1804 | Far Eastern | | Russia | 26 | — | — | — | — | — |
| W7 | European | | UK | 23 | 2,070/ | 11,647,694/ | 12,927/ | 94,615/ | 14,511/ |
| Q74.4 | European | | UK | 23 | — | — | — | — | — |
| Q89.8 | European | | UK | 22 | — | — | — | — | — |
| Q62.5 | European | | UK | 20 | — | — | — | — | — |
| Q69.8 | European | | UK | 18 | — | — | — | — | — |
| Y8.1 | European | | UK | 16 | — | — | — | — | — |
| KPN3829 | European | | Russia | 16 | — | — | — | — | — |
| Q32.3 | European | | UK | 12 | — | — | — | — | — |
Note: Strains without values were not de novo assembled. Additional information on the sequenced strains was reported in Liti, Carter, et al. (2009). The S. cerevisiae strain Y12 was originally reported as isolated from Ivory Coast palm wine; it was subsequently clarified (Fay J, personal communication) that the strain sent was the sake strain K12 from Japan.
a. Cov refers to raw coverage.
b. Values before and after forward slashes correspond to before and after additional scaffolding with low-coverage paired-end Sanger data. All values are in units of base pairs.
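The Contig N50 and Scaffold N50 columns can be made concrete with a minimal Python sketch. It uses the common definition of N50 (the length at which sequences of that length or longer contain at least half of the assembled bases). The source does not state how the authors computed these statistics, so this definition is an assumption, and the example lengths are hypothetical rather than taken from any strain in the table.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths, not taken from any strain in the table.
example_lengths = [500_000, 300_000, 200_000, 100_000, 50_000]
example_n50 = n50(example_lengths)
print(example_n50)
```

Applied once to contig lengths and once to scaffold lengths, the same function would give the two kinds of N50 reported per strain.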
Fig. 1. Yeast genome structures revealed by de novo assemblies augmented by genetic linkage data. (A) Scaffolding de novo assemblies using genetic linkage information from advanced intercross lines dramatically improves assembly connectivity and reveals extensive structural conservation of the core chromosomes in four of the major S. cerevisiae lineages. Displayed is a dot plot of sequence similarity between the assembly scaffolds of the strain YPS128 from the North American phylogenetic lineage and the 16 nuclear chromosomes of the S. cerevisiae reference genome (strain S288c), before and after the incorporation of the genetic linkage data into the scaffolding process. After scaffolding by genetic linkage, the majority of the assembly sequence is contained in 16 large scaffolds that are collinear with the chromosomes of the reference genome. Results are highly similar for the other three strains for which genetic linkage data are available: the West African strain DBVPG6044, the Wine/European strain DBVPG6765, and the Sake/Japanese strain Y12. (The recent sequencing of the sake strain Kyokai no. 7 (Akao et al. 2011) revealed two intrachromosomal inversions in chromosomes V and XIV relative to the reference strain S288c; however, these are not shared by the sake strain Y12 sequenced here.) Only scaffolds bigger than 50 kb are displayed. (B) Structural rearrangements relative to the chromosome organization of the S. cerevisiae reference genome, all localized to the subtelomeric regions. A directed arrow indicates that a sequence region aligns to the part of the reference genome where the arrow starts but, in the de novo assembly, is located in the part of the genome corresponding to where the arrow ends. (C) A subtelomeric 18-kb region that assembled well in several strains and could be localized by genetic linkage is displayed with coordinates corresponding to the YPS128 chromosome XIII scaffold. Six genes were found in this region by ab initio gene prediction (arrows indicate coding direction).
Fig. 2. Genome content variation within natural yeast populations. (A) The relationship between genetic distance between strains as measured in SNPs and the amount of genomic material being present/absent between strains. All pairwise strain comparisons within each of the two species are included. (B) The number of nonreference genes found in each strain genome. Strain colors denote subpopulation origin (for S. cerevisiae: green = Wine/European, red = West African, cyan = Malaysian, yellow = North American, dark blue = Sake/Japanese, black = mosaic genome; for S. paradoxus: orange = American, brown = Far Eastern, magenta = European). The strain trees are neighbor-joining trees based on genome-wide SNP distances and the scale bars indicate sequence distance in units of SNPs per basepair (distance scales differ between the species).
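A pairwise distance in units of SNPs per basepair, as used for the scale bars, can be illustrated with a minimal Python sketch. The aligned sequences, the strain names, and the rule of skipping ambiguous positions are all assumptions made for illustration; they are not the authors' pipeline or data.

```python
from itertools import combinations

AMBIGUOUS = {"N", "-"}  # assumption: uncalled or gapped positions are skipped


def snp_distance(seq_a, seq_b):
    """Return differing sites divided by compared sites for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in AMBIGUOUS or b in AMBIGUOUS:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


# Hypothetical aligned sequences, not real strain data.
alignment = {
    "strain1": "ACGTACGTAC",
    "strain2": "ACGTACGTAT",
    "strain3": "ACNTACGAAT",
}

# One distance per unordered strain pair, keyed by the sorted pair of names.
distances = {
    (x, y): snp_distance(alignment[x], alignment[y])
    for x, y in combinations(sorted(alignment), 2)
}
```

A matrix of such pairwise distances is the usual input to a neighbor-joining tree builder.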
Fig. 3. Convergent evolution of ARR cluster copy number. (A) Growth rate, length of mitotic lag, and mitotic growth efficiency in medium containing 5 mM sodium arsenite oxide for strains with different ARR cluster copy number. Units are on a log2 scale and relative to the S. cerevisiae reference strain derivative BY4741. The strain data points are jittered along the horizontal dimension to increase visibility. (B) Distribution of the ARR cluster copy number variant within the populations of S. cerevisiae and S. paradoxus. Strain colors denote subpopulation origin as in figure 2. The strain trees are neighbor-joining trees based on genome-wide SNP distances, and the scale bars indicate sequence distance in units of SNPs per basepair (distance scales differ between the species). (C) The two copies of the ARR gene cluster in the Wine/European strain BC187 were computationally phased and the sequences of the two copies were clustered with the corresponding sequences from the clean lineage strains of S. cerevisiae using the neighbor-joining algorithm. Although the Japanese/Sake strain (Y12) carries two copies, the haplotypes are very similar in sequence and are represented here by a consensus version where the few positions that are polymorphic between the two haplotypes have been masked out. The scale bar indicates sequence distance in units of SNPs per basepair.
Fig. 4. Distribution of SNPs within the S. cerevisiae population. (A) The derived allele frequency spectrum for SNPs with different coding effects. The ancestral state of each SNP was inferred by using S. paradoxus as an outgroup. (B) SNP alleles inferred to be derived are much more frequently predicted to be deleterious by SIFT than alleles predicted to be ancestral (21.5% vs. 5.4%, respectively). (C) The effect on gene sequences of derived alleles that are found in only a single strain. Strain colors denote subpopulation origin as in figure 2. The strain tree is a neighbor-joining tree based on genome-wide SNP distances and the scale bar indicates sequence distance in units of SNPs per basepair.
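The outgroup-based polarization behind a derived allele frequency spectrum can be sketched in a few lines of Python. The caption states only that S. paradoxus was used as the outgroup, so the filtering rules here (biallelic sites only, outgroup allele must match one of the two alleles) and the input layout are assumptions, and the example sites are hypothetical.

```python
from collections import Counter


def derived_allele_spectrum(sites):
    """Count sites by the number of ingroup strains carrying the derived allele.

    sites: iterable of (outgroup_allele, ingroup_alleles) pairs, where
    ingroup_alleles holds one allele per ingroup strain.
    """
    spectrum = Counter()
    for outgroup, ingroup in sites:
        alleles = set(ingroup)
        # Assumption: keep only biallelic sites whose outgroup allele matches
        # one of the two ingroup alleles; that allele is taken as ancestral.
        if len(alleles) != 2 or outgroup not in alleles:
            continue
        derived = (alleles - {outgroup}).pop()
        spectrum[list(ingroup).count(derived)] += 1
    return spectrum


# Hypothetical sites: (outgroup allele, alleles in four ingroup strains).
example_sites = [
    ("A", ["A", "A", "G", "A"]),  # derived G in 1 strain
    ("C", ["T", "T", "T", "C"]),  # derived T in 3 strains
    ("G", ["A", "T", "A", "A"]),  # outgroup matches neither allele: skipped
    ("T", ["T", "T", "T", "T"]),  # monomorphic: skipped
]
example_spectrum = derived_allele_spectrum(example_sites)
```

Dividing each count class by the number of strains would turn the counts into derived allele frequencies.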
Fig. 5. Loss-of-function variants in the S. cerevisiae population. (A) Frequencies of indels and stop-gain SNPs in different categories of genes. Essential genes refer to genes for which the deletion in the BY reference strain background is not viable. (B) The distribution of the number of paralogs for genes with loss-of-function variants and for genes overall. The number of paralogs for each protein coding gene in the S. cerevisiae reference genome was estimated as the number of other genes in the genome returning BlastP hits with an e-value < 10^-50 and with the alignment covering at least 80% of the query protein length. We note that because of CNV the exact number of paralogs for a given gene will vary between strains. The fraction of genes with zero paralogs is omitted. (C) A 2-bp insertion in the strain DBVPG6765 disrupts the translational reading frame of the gene RIM15. Sequences of S. cerevisiae strains and one S. paradoxus strain (the reference strain CBS432) for a segment surrounding the insertion in RIM15 are displayed. (D) The phenotypic effect of the frameshifting insertion variant was tested by deleting the RIM15 gene in the DBVPG6765 strain and in three other strains representing major phylogenetic lineages within S. cerevisiae. Diploid hybrids were then constructed between DBVPG6765 and the other three strains, containing alleles of RIM15 from both parental strains or only from one of them. These diploid strains were tested for their ability to sporulate in KAc medium by scoring the proportion of cells that have undergone sporulation at different time points. In all of the three genetic backgrounds, presence of only the DBVPG6765 RIM15 allele leads to dramatically lower sporulation efficiency.
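The paralog-counting rule in panel (B) (BlastP hits to other genes with an e-value below 10^-50 and an alignment covering at least 80% of the query protein) can be sketched as a simple filter in Python. The tuple layout for hits, the gene names, and the use of query start and end coordinates to measure coverage are assumptions for illustration; the source does not describe how the hits were parsed.

```python
from collections import defaultdict


def count_paralogs(hits, protein_lengths, max_evalue=1e-50, min_coverage=0.8):
    """Count, per query gene, the other genes passing both thresholds.

    hits: iterable of (query, subject, evalue, query_start, query_end),
          with 1-based inclusive query coordinates (assumed layout).
    protein_lengths: dict mapping gene name to protein length in residues.
    """
    paralogs = defaultdict(set)
    for query, subject, evalue, q_start, q_end in hits:
        if query == subject:
            continue  # a gene is not its own paralog
        coverage = (q_end - q_start + 1) / protein_lengths[query]
        if evalue < max_evalue and coverage >= min_coverage:
            paralogs[query].add(subject)
    return {gene: len(paralogs[gene]) for gene in protein_lengths}


# Hypothetical genes and hits, not real BlastP output.
example_lengths = {"geneA": 100, "geneB": 100, "geneC": 200}
example_hits = [
    ("geneA", "geneA", 0.0, 1, 100),    # self hit: ignored
    ("geneA", "geneB", 1e-60, 5, 95),   # passes both thresholds
    ("geneA", "geneC", 1e-55, 1, 50),   # covers only 50% of the query
    ("geneB", "geneA", 1e-60, 5, 95),   # passes both thresholds
    ("geneC", "geneA", 1e-20, 1, 180),  # e-value too high
]
example_counts = count_paralogs(example_hits, example_lengths)
```

Collecting subjects in a set means several hits to the same subject gene are counted once, which matches counting "other genes" rather than individual alignments.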