Yury A Barbitoff, Andrew G Matveenko, Anton B Matiiv, Evgeniia M Maksiutenko, Svetlana E Moskalenko, Polina B Drozdova, Dmitrii E Polev, Alexandra Y Beliavskaia, Lavrentii G Danilov, Alexander V Predeus, Galina A Zhouravleva.
Abstract
Thousands of yeast genomes have been sequenced with both traditional and long-read technologies, and multiple observations about modes of genome evolution for both wild and laboratory strains have been drawn from these sequences. In our study, we applied Oxford Nanopore and Illumina technologies to assemble complete genomes of two widely used members of a distinct laboratory yeast lineage, the Peterhof Genetic Collection (PGC), and investigate the structural features of these genomes including transposable element content, copy number alterations, and structural rearrangements. We identified numerous notable structural differences between genomes of PGC strains and the reference S288C strain. We discovered a substantial enrichment of mid-length insertions and deletions within repetitive coding sequences, such as in the SCH9 gene or the NUP100 gene, with possible impact of these variants on protein amyloidogenicity. High contiguity of the final assemblies allowed us to trace back the history of reciprocal unbalanced translocations between chromosomes I, VIII, IX, XI, and XVI of the PGC strains. We show that formation of hybrid alleles of the FLO genes during such chromosomal rearrangements is likely responsible for the lack of invasive growth of yeast strains. Taken together, our results highlight important features of laboratory yeast strain evolution using the power of long-read sequencing.
Keywords: FLO genes; 74-D694; Yeast genome; long reads; structural variant
Year: 2021 PMID: 33677552 PMCID: PMC8759820 DOI: 10.1093/g3journal/jkab029
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Yeast strains used in this work
| Strain | Genotype | Reference | Source |
|---|---|---|---|
| | | ( | Laboratory collection |
| | | ( | Laboratory collection |
| | Sigma1278b background: | ( | Gift from H.-U. Mösch |
| | Sigma1278b background: | ( | Gift from H.-U. Mösch |
| | | ( | Gift from Y.I. Pavlov |
| | | ( | Gift from J. McCusker |
| | | ( | Gift from T. Petes |
| | | This work | Laboratory collection |
Comparison of the main assembly statistics for the sequenced genomes
| Quality metric | U-1A-D1628 (draft) | | 74-D694 (draft) | | S288C |
|---|---|---|---|---|---|
| | 36 | 32 | 31 | 29 | 17 |
| | 824 | 825 | 806 | 808 | 924 |
| | 263.9 | 257.3 | 291.3 | 294.8 | n.a. |
| | 206.5 | 35.4 | 241.5 | 41.4 | n.a. |
| | 1,769 | 2,124 | 1,657 | 2,125 | 2,126 |
| | 225 | 4 | 286 | 3 | 2 |
| | 143 | 9 | 194 | 9 | 9 |
Contigs corresponding to the mitochondrion sequences were replaced by a complete circular mitochondrial DNA sequence obtained by hybrid assembly prior to quality control for both final assemblies.
Figure 1. Chromosome-level assembly of the PGC strain genomes. (A) Major quality statistics for the final assemblies of the U-1A-D1628 and 74-D694 genomes compared to the S288C reference. Assembly metrics were estimated using QUAST (Gurevich ). * indicates that the numbers of SNPs and indels per 100 kbp were estimated using the S288C sequence as reference. The number of BUSCOs was estimated using the saccharomycetes_odb10 lineage-specific database. (B) Dotplot visualization of the whole-genome alignment of the U-1A-D1628 and 74-D694 assemblies onto the S288C reference genome. Plots were obtained using D-GENIES (http://dgenies.toulouse.inra.fr). (C) Alignment of the mitochondrial genome sequences of the respective strains. For PGC strains, the mitochondrial genome sequence was assembled using a hybrid assembly framework. Circular mtDNA sequences were aligned using MAFFT, and the resulting MSA was visualized using a custom set of scripts. The resulting picture was manually corrected to reflect alignment gaps in repetitive regions.
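The per-100-kbp normalization mentioned in panel (A) can be illustrated with a minimal sketch. The function name `per_100_kbp` and the choice of aligned bases as the denominator are assumptions made for illustration; the example numbers are placeholders, not values from the study.

```python
def per_100_kbp(event_count: int, aligned_bases: int) -> float:
    """Normalize a raw count of SNPs or indels to a rate per 100 kbp.

    Assumption: the denominator is the number of assembly bases aligned
    to the reference; the source does not state this explicitly.
    """
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    # Multiply first so that simple cases divide out exactly.
    return event_count * 100_000 / aligned_bases


# Placeholder numbers for illustration only (not data from the study).
example_rate = per_100_kbp(event_count=300, aligned_bases=1_200_000)
print(example_rate)
```

Normalizing in this way makes rates comparable between assemblies of different total aligned length, which is why the draft and final assemblies can be placed side by side.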
Figure 2. Analysis of TE location in the genomes of the U-1A-D1628, 74-D694, S288C, and W303 strains. (A) Locations of complete Ty elements in the genomes of the indicated strains. For U-1A-D1628 (designated as “1A”), 74-D694 (“74”), and W303, short contigs were omitted. Line type indicates whether an element matches the contig in the forward or reverse direction. (B) NJ trees constructed using the proportion of shared Ty element locations (left) or the number of single-nucleotide substitutions (right) estimated by assembly-to-reference alignment of complete genomes. (C) A scatterplot showing the relationship between the pairwise fraction of shared Ty element locations and the number of single-nucleotide substitutions.
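The pairwise fraction of shared Ty element locations used in panels (B) and (C) could be computed along the following lines. This is a sketch under stated assumptions: the shared fraction is taken to be the Jaccard index (shared locations divided by the union of locations), which the caption does not specify, and the strain names and loci are hypothetical placeholders rather than data from the study.

```python
from itertools import combinations


def shared_fraction(a: set, b: set) -> float:
    """Fraction of Ty locations shared by two strains.

    Assumption: shared fraction = |A & B| / |A | B| (Jaccard index).
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def distance_matrix(locations: dict) -> dict:
    """Pairwise distances (1 - shared fraction), keyed by strain-name pairs."""
    names = sorted(locations)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = 1.0 - shared_fraction(locations[x], locations[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


# Hypothetical placeholder loci (chromosome, bin index); not real data.
ty_locations = {
    "strainA": {("chrI", 10), ("chrII", 25), ("chrIV", 40)},
    "strainB": {("chrI", 10), ("chrII", 25), ("chrV", 7)},
    "strainC": {("chrI", 10), ("chrIX", 3)},
}
dist = distance_matrix(ty_locations)
```

A symmetric matrix of this kind is the usual input to a neighbor-joining implementation; which tool was used to build the trees is not stated in the caption.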
Figure 3. Translocations detected in the strains and their influence on invasive growth. (A) Translocations in the 74-D694 and U-1A-D1628 strains affecting FLO genes. Gradient-colored fields between reference chromosomes depict possible translocation routes. Genes rendered nonfunctional by frameshift mutations are marked with an asterisk. (B) Confirmation of the detected translocations by PCR. Numbers correspond to the primers shown in (A); S, 74, and 1A correspond to S1, 74-D694, and U-1A-D1628 genomic DNA used as a template; M, DNA molecular weight marker (SibEnzyme, 1 kb). (C) Agar invasion visualized using the wash test. The S. cerevisiae strains U-1A-D1628 and 74-D694 were transformed with plasmids containing the FLO1, FLO5, FLO8, or FLO11 genes and streaked onto YPD plates. Plates were photographed before and after the invasive growth assay (see Materials and Methods). As a positive control, invasive growth was assessed for the 10560-6B and 10560-23C yeast strains.
Figure 4. Analysis of CUP1 and ENA gene copy number. (A) Visualization of the long-read coverage profiles in the region of chromosome VIII spanning the CUP1 cluster. Orange lines represent aligned long reads; red dashed lines represent reads supporting duplication of the region. S288C reads are taken from the study by Giordano (see Materials and Methods). (B) Number of CUP1 gene copies relative to S288C estimated using qPCR analysis. Each point represents an independent biological replicate. (C) Copper sensitivity of different strains assessed by growth assay. Cells of the respective strains were plated onto SC media containing the indicated concentrations of CuSO4. Tenfold serial dilutions are shown. (D) Schematic representation of the genes in the ENA locus on chromosome IV for S288C (top) and PGC strains (bottom). For the 74-D694 strain, the ENA locus is identical to that of U-1A-D1628 and is located at chromosome IV: 527,320–546,117. (E) A heatmap representation of the numbers of single-nucleotide substitutions between the coding sequences of the ENA genes from the PGC strains and S288C.
Figure 5. Amyloidogenicity prediction for the proteins harboring insertions or deletions in tandem repeat coding sequences. The amyloidogenic properties of the Nup100 (A) and Sch9 (B) proteins in different strains are displayed. The stars indicate significant substitutions. Coordinates of the substitutions represent positions in the alignment. Profiles were generated using ArchCandy software (Ahmed ).