
Genomic diversity, chromosomal rearrangements, and interspecies hybridization in the Ogataea polymorpha species complex.

Sara J Hanson1, Eoin Ó Cinnéide2, Letal I Salzberg2, Kenneth H Wolfe2, Jamie McGowan3,4, David A Fitzpatrick3,4, Kate Matlin1.   

Abstract

The methylotrophic yeast Ogataea polymorpha has long been a useful system for recombinant protein production, as well as a model system for methanol metabolism, peroxisome biogenesis, thermotolerance, and nitrate assimilation. It has more recently become an important model for the evolution of mating-type switching. Here, we present a population genomics analysis of 47 isolates within the O. polymorpha species complex, including representatives of the species O. polymorpha, Ogataea parapolymorpha, Ogataea haglerorum, and Ogataea angusta. We found low levels of nucleotide sequence diversity within the O. polymorpha species complex and identified chromosomal rearrangements both within and between species. In addition, we found that one isolate is an interspecies hybrid between O. polymorpha and O. parapolymorpha and present evidence for loss of heterozygosity following hybridization.
© The Author(s) 2021. Published by Oxford University Press on behalf of Genetics Society of America.

Keywords: Ogataea; chromosomal rearrangements; interspecies hybridization; mating-type switching; population genomics

Year:  2021        PMID: 34849824      PMCID: PMC8496258          DOI: 10.1093/g3journal/jkab211

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


Introduction

The yeast Ogataea polymorpha is one of a small number of yeasts in the Pichiaceae family (Shen et al. 2018) with the ability to metabolize methanol as a sole carbon source (Yamada ; Kurtzman and Robnett 2010). The methylotrophic characteristics of O. polymorpha have made it an important model system for examining metabolic processes and peroxisome biology (Siverio 2002; van der Klei and Veenhuis 2002; Yurimoto ; Hartner and Glieder 2006). In addition, the strongly inducible promoters for genes involved in methanol metabolism have made it a useful tool for recombinant protein production (Gellissen and Melber 1996). O. polymorpha has also emerged as a model for the evolution of yeast mating-type switching. Yeast mating occurs between haploid cells of opposite mating types (a and α), which are designated by the transcription factors present at the mating-type locus (MATa or MATα). When a haploid cell does not have an available mating partner, it can undergo a programmed DNA rearrangement to displace the genes found at the MAT locus and replace them with genes for the opposite mating type. This switching mechanism occurs through a two-locus “flip/flop” system in O. polymorpha, in which the MATa and MATα genes are separated by 19 kb of sequence and flanked by 2 kb inverted repeat sequences (Hanson ; Maekawa and Kaneko 2014). The MAT region is adjacent to a centromere, resulting in transcriptional silencing of the MAT genes closest to the centromere and the designation of mating type by the distal MAT genes. Mating-type switching occurs through recombination between the flanking inverted repeats, which causes an inversion of the MAT region and a change in mating type. This mode of mating-type switching has been demonstrated in five yeast species (Hanson ; Maekawa and Kaneko 2014; Riley ; Yoko-O ; Wongwisansri ), has been inferred from genome structure in 26 additional species, and appears to have evolved independently 11 times (Riley ; Krassowski ). 
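The flip/flop logic described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class name `MatRegion` and its fields are hypothetical, and the model tracks nothing beyond which gene set sits next to the centromere (silenced) and which is distal (expressed).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatRegion:
    """Toy model of the two-locus MAT region (hypothetical names)."""
    proximal: str  # gene set nearest the centromere, transcriptionally silenced
    distal: str    # gene set farther from the centromere, expressed

    def expressed_type(self) -> str:
        # Mating type is designated by the distal (non-silenced) MAT genes.
        return self.distal

    def switch(self) -> "MatRegion":
        # Recombination between the flanking inverted repeats inverts the
        # region, so the two gene sets exchange positions.
        return MatRegion(proximal=self.distal, distal=self.proximal)


cell = MatRegion(proximal="MATalpha", distal="MATa")
switched = cell.switch()
print(cell.expressed_type(), "->", switched.expressed_type())
```

Because the operation is an inversion, applying `switch()` twice returns the original arrangement, which is why the system is described as a flip/flop.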
Investigation into the genetic diversity in O. polymorpha will be a valuable resource for future investigation into its cellular processes and the impacts of mating-type switching on the evolution of its genome. Exploration of genetic diversity in yeast populations has revealed insights into intraspecific variation and the influence of recombination and selection on genome evolution (Peter and Schacherer 2016). Extensive datasets have been created to examine population dynamics in Saccharomyces cerevisiae (Strope ; Gallone ; Gonçalves ; Zhu ; Peter ), Schizosaccharomyces pombe (Fawcett ; Jeffares ), and pathogenic yeasts (Ford ; Hirakawa ; Carreté ; Ropars ; Wang ; Chow ). Population genomics have also been performed on a variety of nonmodel yeast species (Almeida ; Bergström ; Friedrich ; Ortiz-Merino ), including the methylotroph Komagataella phaffii (Braun-Galleani ) and nonmethylotrophic yeasts in the Pichiaceae family (Douglass ; Gounot ). In this study, we sequenced 47 isolates of yeast in the O. polymorpha species complex, representing four species (O. polymorpha, Ogataea parapolymorpha, Ogataea angusta, and Ogataea haglerorum). We examined the genome-wide genetic diversity across the isolates, as well as the genetic diversity in functional regions including centromeres, telomeres, and the MAT region. We further identified evidence of chromosomal rearrangements within and between species and found that one isolate is a diploid interspecies hybrid between O. polymorpha and O. parapolymorpha.

Materials and methods

Genomic DNA extractions

Overnight cultures of yeast were grown in YPD broth (1% w/v yeast extract, 2% w/v peptone, 2% w/v glucose) in a 37°C shaking incubator. Genomic DNA was extracted from the yeast samples using an Epicentre MasterPure Yeast DNA Purification Kit (Lucigen) according to manufacturer’s instructions or by acid-washed bead homogenization, phenol chloroform extraction, and concentration using a Genomic DNA Clean & Concentrator-10 kit (Zymo Research). For MinION library preparation, CBS1977 genomic DNA was extracted using the Qiagen Genomic Tip 100/G kit according to manufacturer’s instructions with the following modifications: a final wash step of 2 × 1 ml ethanol, during which the DNA pellet was transferred to an Eppendorf tube; the sample was then vacuum-centrifuge dried.

Genome sequencing and assembly

Genomic DNA libraries were prepared and sequenced by BGI Tech Solutions (Hong Kong). Approximately 100X genome coverage was generated for each isolate with 150-bp paired-end reads on an Illumina HiSeq 4000. Reads were assembled using SPAdes version 3.11 (Bankevich ) and contaminating sequences were removed using coverage-versus-length plots (Douglass ). Assembly statistics were generated using QUAST version 5.0.2 (Gurevich ). Structural variation was assessed by generating genome-wide pairwise dot plots using D-Genies version 1.2.0 (Cabanettes and Klopp 2018). For MinION sequencing library preparation, 400 ng of CBS1977 DNA was barcoded using a Rapid Barcoding Kit (SQK-RBK004). The final sample was concentrated using AMPure XP beads and re-eluted in 10 mM Tris, 50 mM NaCl. The library sample was applied to a MinION flow cell (version FLO-MIN106) and run for 50 hours. Approximately 332,000 reads were generated; read quality was assessed using NanoPlot version 1.30.1, and all reads <1000 bp were removed using NanoFilt version 2.7.1. The genome of CBS1977 was assembled using Canu version 1.8 (Koren ), using the following command: “canu -p canu -d canu_run2_fitlered_reads genomeSize=8.9m corOutCoverage=200 'batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50' -nanopore-raw CBS1977_all_filtered_q7.fastq &”. Truncated or frameshifted protein-coding ORFs were predicted using IDEEL (Watson 2018). The assembly was polished two times with Pilon version 1.23 (Walker ) using Illumina sequencing data from CBS1977 (this study). IDEEL plots were generated to evaluate the expected ORF length.
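Two quantities in this section are simple to compute by hand: the length filter applied to the MinION reads and the N50 statistic reported for each assembly. The sketch below is a minimal illustration, not the NanoFilt or QUAST implementation; the function names `filter_reads` and `n50` are assumptions made for this example.

```python
def filter_reads(reads, min_len=1000):
    """Keep reads of at least min_len bases (reads <1000 bp are dropped)."""
    return [read for read in reads if len(read) >= min_len]


def n50(lengths):
    """Return the length L such that contigs of length >= L together
    contain at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


contig_lengths = [10, 20, 30, 40]  # toy values, not data from this study
print(n50(contig_lengths))
```

A higher N50 with fewer contigs generally indicates a more contiguous assembly, which is how the values in Table 2 are usually read.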

Genome annotation

Gene annotation on each assembled genome was performed using Augustus version 3.3.3 (Stanke and Morgenstern 2005) with the following parameters: --strand=both --species=lodderomyces_elongisporus. tRNAs were annotated using tRNAscan-SE version 2.0.5 (Chan and Lowe 2019) with the -E parameter. MAT regions for each genome were identified by performing a local blastn version 2.2.31 (Altschul ) search using the O. polymorpha NCYC495 MAT region sequence as a query. Identified MAT region annotations were manually curated using Artemis version 18.0.3 (Carver ).

Variant calling

Variant calling was performed within each species using as references the previously published O. polymorpha NCYC495 (Riley ) or O. parapolymorpha DL-1 (Ravin ) genome assemblies, and the O. angusta 61-244 (Oang9) or O. haglerorum 81-453-3 (Ohag10) genome assemblies from this study. Reference genome FASTA files were indexed with BWA version 0.7.17 (Li and Durbin 2009), SAMtools version 1.10 (Li ), and Picard version 2.22.5 (http://broadinstitute.github.io/picard/). Sequencing reads were mapped to the reference FASTA files using the BWA-MEM algorithm with the -M, -Y, and -R parameters. SAM alignment files generated by BWA were converted to BAM format, sorted, and indexed using SAMtools. Deduplication and indexing were performed using Picard. Structural variants were identified with Delly version 0.8.3 (Rausch ), filtered for “PASS,” and assessed through the manual confirmation of evidence in read mapping. Variants were called using GATK version 4 (Poplin ) HaplotypeCaller and compiled across isolates for each species using CombineGVCFs. VCF files for each isolate were generated from the GVCF files using GenotypeGVCFs. Heterozygous variants were filtered from the dataset using VariantFiltration and SelectVariants. SNP density, nucleotide diversity (π), and Tajima’s D were calculated using VCFtools version 0.1.16 (Danecek ).
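For readers unfamiliar with the summary statistics named above, the following sketch computes the mean number of pairwise differences and Tajima's D from a small alignment of haploid sequences, using the standard formulas from Tajima (1989). It is an illustration only and not the VCFtools implementation; the function names and the toy sequences are assumptions made for this example. Dividing the mean pairwise difference by the number of sites surveyed would give a per-site nucleotide diversity.

```python
from itertools import combinations
from math import sqrt


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns with more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs):
    """Tajima's D for equal-length haploid sequences."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    k = mean_pairwise_differences(seqs)
    # D compares two estimators of diversity: k and S / a1.
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


toy = ["AA", "AA", "AT", "TT"]  # toy SNP alignment, not data from this study
print(mean_pairwise_differences(toy), segregating_sites(toy), tajimas_d(toy))
```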

Population structure analysis

VCF files were converted to PHYLIP SNP alignments using a python script (Ortiz 2019). Maximum likelihood phylogenetic analysis was performed using PhyML version 3.1 (Guindon and Gascuel 2003) with GTR substitution model (Waddell and Steel 1997) and 100 bootstrap replicates.
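The conversion step can be pictured with a minimal sketch. This is not the cited script; it is an illustration that assumes a small VCF held in memory, keeps only biallelic or multiallelic single-base SNPs, and writes an `N` for missing or heterozygous calls. The function name `vcf_to_phylip` is hypothetical.

```python
import re


def vcf_to_phylip(vcf_text):
    """Build a relaxed PHYLIP SNP alignment from VCF text (sketch)."""
    samples, seqs = [], {}
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and blank lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            seqs = {sample: [] for sample in samples}
            continue
        alleles = [fields[3]] + fields[4].split(",")
        if any(allele not in ("A", "C", "G", "T") for allele in alleles):
            continue  # skip indels and symbolic alleles
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = set(re.split(r"[/|]", call.split(":")[gt_index]))
            if len(gt) == 1 and "." not in gt:
                seqs[sample].append(alleles[int(gt.pop())])
            else:
                seqs[sample].append("N")  # missing or heterozygous call
    n_sites = len(next(iter(seqs.values()))) if seqs else 0
    lines = [f"{len(samples)} {n_sites}"]
    lines += [f"{sample} {''.join(seqs[sample])}" for sample in samples]
    return "\n".join(lines)
```

The first output line holds the number of taxa and the number of sites, followed by one line per sample, which is the layout that PhyML and similar tools accept as input.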

Phylogenomics

A phylogenomics analysis was performed on all O. polymorpha species complex isolates, the previously published O. polymorpha NCYC495 genome (Riley ), an additional 20 Ogataea species (Shen ), and Pichia kudriavzevii (Douglass ) as an outgroup. A second analysis was also performed on a dataset containing one representative isolate for each of the newly sequenced O. angusta (Oang9), O. haglerorum (Ohag10), and O. parapolymorpha (Opar4) species. BUSCO analysis (Waterhouse ) revealed 1148 BUSCO families present in all isolates and 1278 BUSCO families that are present in all 25 species. Each BUSCO family was individually aligned with MUSCLE (Edgar 2004) and trimmed using trimAl (Capella-Gutierrez ) with the parameter “-automated1” to remove poorly aligned regions. Trimmed alignments were concatenated together resulting in a supermatrix alignment of 632,568 amino acids for the analysis including all isolates, and 644,187 amino acids for the analysis with one representative per species. To speed up computation, phylogenetically uninformative sites were removed from the alignment that contained one representative per species generating a final alignment of 319,116 amino acids. Maximum-likelihood (ML) phylogenetic reconstruction was performed using IQ-TREE (Nguyen ) with the LG+F+R4 model, which was the best-fit model according to ModelFinder (Kalyaanamoorthy ), and 100 bootstrap replicates were undertaken to infer branch support values. For the alignment that contained all isolates, an approximately maximum-likelihood phylogeny, and local support values were generated using FastTree (Price ). Both phylogenies were visualized and annotated using the Interactive Tree of Life (iTOL) (Letunic and Bork 2019).
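The supermatrix construction and the removal of uninformative sites can be sketched as follows. This is an illustration rather than the pipeline used in the study: it assumes that "phylogenetically uninformative" means not parsimony-informative (fewer than two states that each occur in at least two taxa), and the function names are hypothetical.

```python
from collections import Counter


def concatenate(alignments, taxa):
    """Join per-family alignments (dicts of taxon -> aligned sequence)
    into one supermatrix; absent taxa are padded with gaps."""
    parts = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        width = len(next(iter(alignment.values())))
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def informative_columns(matrix):
    """Indices of parsimony-informative columns (assumed definition)."""
    taxa = list(matrix)
    keep = []
    for i, column in enumerate(zip(*(matrix[t] for t in taxa))):
        counts = Counter(c for c in column if c not in ("-", "X", "?"))
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            keep.append(i)
    return keep


def select_columns(matrix, columns):
    """Reduce the supermatrix to the chosen columns."""
    return {t: "".join(seq[i] for i in columns) for t, seq in matrix.items()}


families = [  # toy alignments, not data from this study
    {"a": "MK", "b": "MK", "c": "MR", "d": "MR"},
    {"a": "LL", "b": "LI", "c": "LL", "d": "LL"},
]
matrix = concatenate(families, ["a", "b", "c", "d"])
reduced = select_columns(matrix, informative_columns(matrix))
print(reduced)
```

Constant columns and columns with a single variant taxon are dropped because they cannot favor one tree topology over another under parsimony, which reduces the alignment length without changing the inferred branching order in most cases.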

Hybrid genome analysis

The size of the genome assembly for CBS1977 indicated that it was likely a diploid. BLAST analysis of segments of the genome assembly suggested that it resulted from a hybridization event between O. polymorpha and O. parapolymorpha. To determine parental contributions to the diploid hybrid CBS1977 genome, SWeBLAST (Fourment ) was used to perform nucleotide BLAST on 1000 bp windows of each MinION and Illumina contig in the genome assembly against the O. polymorpha NCYC495 and O. parapolymorpha DL-1 reference genomes with a 97% nucleotide identity cutoff.
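The windowed assignment of parental origin can be sketched as below. This is not SWeBLAST; it assumes non-overlapping 1000-bp windows and that the percent identity of each window against each reference has already been obtained from BLAST. The labels returned for windows that match both or neither parent are assumptions, since the text does not state how such windows were handled.

```python
def windows(seq, size=1000):
    """Yield (start, subsequence) for consecutive non-overlapping windows."""
    for start in range(0, len(seq), size):
        yield start, seq[start:start + size]


def assign_parent(identity_pol, identity_par, cutoff=97.0):
    """Assign a window to a parent from its best percent identity to each
    reference; None means no BLAST hit against that reference."""
    hits_pol = identity_pol is not None and identity_pol >= cutoff
    hits_par = identity_par is not None and identity_par >= cutoff
    if hits_pol and hits_par:
        return "ambiguous"
    if hits_pol:
        return "O. polymorpha"
    if hits_par:
        return "O. parapolymorpha"
    return "unassigned"


contig = "ACGT" * 625  # toy 2500-bp contig, not data from this study
for start, window in windows(contig):
    print(start, len(window))
```

Runs of consecutive windows assigned to the same parent then outline the parental blocks along each contig, and regions where only one parent is represented are candidates for loss of heterozygosity.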

Data availability

The genome sequence data, genome assemblies, and annotations generated in this study were submitted to the NCBI database under the BioProject accession number PRJNA706707. BioSample accessions are SAMN18128820-SAMN18128866, SRA accessions are SRR13943463-SRR13943509 and SRR13944969, and genome assembly and annotation accessions are JAHKSL000000000, JAHLUA000000000-JAHLUZ000000000, and JAHLVA000000000-JAHLVT000000000. Supplementary material is available at G3 online.

Results and discussion

Genome sequencing and assembly of 47 Ogataea isolates

We obtained 47 yeast isolates identified as O. polymorpha in the Phaff collection (University of California-Davis, CA, USA), the CBS collection (Westerdijk Fungal Diversity Institute, Utrecht, The Netherlands), and the NRRL collection (Agricultural Research Service, National Center for Agricultural Utilization Research, Peoria, IL, USA) (Table 1). Many of the isolates were originally isolated from decaying plant matter, soil, and insect frass, consistent with their methylotrophic characteristics, as the methanol and methoxy groups in decaying plant matter can be metabolized as a source of carbon (Fall and Benson 1996; Kurtzman and Robnett 2010). In addition, several clinical and agricultural samples were included from an infected human knee, catheter fungemia, swine intestine, and cow mastitis. The geographic distribution of the isolates included North America, Europe, Australia, and South Africa.
Table 1

Ogataea isolates sequenced in this study

Strain | Species | Strain ID | Source (c) | Location (c) | Ploidy
Opol1 (a) | O. polymorpha | CBS4732/Y-5445/ATCC 34438 | Soil | Brazil | Haploid
Opol2 | O. polymorpha | CBS1976/NRRL Y-1798/ATCC 14754/NCYC495 | Spoiled Florida orange juice | USA | Haploid
Opol3 | O. polymorpha | Phaff 72-225 | Glutinous/nonglutinous rice | USA | Haploid
Opol4 | O. polymorpha | NRRL Y-2423 | Swine intestinal tract | Portugal | Haploid
Opol5 | O. polymorpha | CBS8852/NRRL Y-27293 | Knee replacement | Worcester, MA, USA | Haploid
Opol6 | O. polymorpha | NRRL Y-27863/ATCC MYAA-3665 | Patient's blood, catheter infection | Chicago, IL, USA | Haploid
Opol7 | O. polymorpha | NRRL Y-6005 | Waste liquid from olive processing | Spain | Haploid
Opol8 | O. polymorpha | NRRL YB-179 | Soil | Costa Rica | Haploid
Opol9 | O. polymorpha | CBS5032 | Maize meal | South Africa | Haploid
Opol10 | O. polymorpha | CBS7031 | Soil | Unknown | Haploid
Opol11 | O. polymorpha | CBS7239 | Catalase-negative mutant of CBS4732 (PMID 7000025) | Germany | Haploid
Opar1 (a) | O. parapolymorpha | CBS12304/NRRL YB-1982 | Insect frass, quaking aspen | Duluth, MN, USA | Haploid
Opar2 | O. parapolymorpha | Phaff 73-26 | Soil | MA, USA | Haploid
Opar3 × Opol | Hybrid (O. polymorpha × O. parapolymorpha) | CBS1977 | Milk from cow with mastitis | UK | Diploid
Opar4 | O. parapolymorpha | CBS11895/NRRL Y-7560/ATCC 26012 | Soil | Cambridge, MA, USA | Haploid
Oang1 | O. angusta | Phaff 50-165/NRRL Y-2217 | Drosophila pseudoobscura | Jacksonville, CA, USA | Haploid
Oang2 | O. angusta | Phaff 50-97/NRRL Y-2212 | D. pseudoobscura | Keen Camp, CA, USA | Haploid
Oang3 | O. angusta | Phaff 51-138 | D. pseudoobscura | Mather, CA, USA | Haploid
Oang4 | O. angusta | Phaff 51-177 | Aulacigaster sp. | Mather, CA, USA | Haploid
Oang5 | O. angusta | Phaff 52-251 | D. pseudoobscura | Mather, CA, USA | Haploid
Oang6 | O. angusta | Phaff 60-394/ATCC 24190 | D. pseudoobscura | Winters, CA, USA | Haploid
Oang7 | O. angusta | Phaff 61-224 | Aulacigaster sp. | Gualala River, CA, USA | Haploid
Oang8 | O. angusta | Phaff 61-235 | Drosophila viridis | Gualala River, CA, USA | Haploid
Oang9 (b) | O. angusta | Phaff 61-244 | Aulacigaster sp. | Gualala River, CA, USA | Haploid
Oang10 | O. angusta | CBS2575/NCYC1450 | Aulacigaster sp. | USA | Haploid
Ohag1 | O. haglerorum | Phaff 78-557.3 | Opuntia stricta | Hemmant, Queensland, AU | Haploid
Ohag2 | O. haglerorum | Phaff 79-204.41 | O. stricta | Hemmant, Queensland, AU | Haploid
Ohag3 | O. haglerorum | Phaff 81-408.1 | O. phaeacantha | Saguaro Natl. Monument West, AZ, USA | Haploid
Ohag4 | O. haglerorum | Phaff 81-410 | O. phaeacantha | Saguaro Natl. Monument West, AZ, USA | Haploid
Ohag5 | O. haglerorum | Phaff 81-419.3 | O. phaeacantha | Bear Canyon, Tucson, AZ, USA | Haploid
Ohag6 | O. haglerorum | Phaff 81-419.5 | O. phaeacantha | Bear Canyon, Tucson, AZ, USA | Haploid
Ohag7 | O. haglerorum | Phaff 81-433.4 | O. phaeacantha | Santa Rita Mountains, Tucson, AZ, USA | Haploid
Ohag8 | O. haglerorum | Phaff 81-436.3 | O. phaeacantha | Santa Rita Mountains, Tucson, AZ, USA | Haploid
Ohag9 | O. haglerorum | Phaff 81-440.2 | O. phaeacantha | Santa Rita Mountains, Tucson, AZ, USA | Haploid
Ohag10 (b) | O. haglerorum | Phaff 81-453.3 | O. phaeacantha | Near Sells, AZ, USA | Haploid
Ohag11 | O. haglerorum | Phaff 81-461.3 | O. phaeacantha | Near Sells, AZ, USA | Haploid
Ohag12 | O. haglerorum | Phaff 81-463.1 | O. phaeacantha | Near Sells, AZ, USA | Haploid
Ohag13 | O. haglerorum | Phaff 81-471.3 | O. phaeacantha | Rincon Mountains, AZ, USA | Haploid
Ohag14 | O. haglerorum | Phaff 81-480 | O. phaeacantha | Rincon Mountains, AZ, USA | Haploid
Ohag15 | O. haglerorum | Phaff 83-405.1 | O. phaeacantha | Tucson Mountains, AZ, USA | Haploid
Ohag16 | O. haglerorum | Phaff 83-425.4 | O. phaeacantha | Tucson, AZ, USA | Haploid
Ohag17 | O. haglerorum | Phaff 83-437.2.1 | O. phaeacantha | Santa Rita Mountains, Tucson, AZ, USA | Haploid
Ohag18 | O. haglerorum | Phaff 83-437.2.2 | O. phaeacantha | Santa Rita Mountains, Tucson, AZ, USA | Haploid
Ohag19 | O. haglerorum | Phaff 83-442.1 | O. phaeacantha | AZ, USA | Haploid
Ohag20 | O. haglerorum | Phaff 83-471.3 | O. phaeacantha | Santa Catalina Mountains, AZ, USA | Haploid
Ohag21 | O. haglerorum | Phaff 83-474.2 | O. phaeacantha | Pima Canyon, Tucson, AZ, USA | Haploid
Ohag22 | O. haglerorum | Phaff 83-476.5 | O. phaeacantha | Pima Canyon, Tucson, AZ, USA | Haploid

(a) Type strain.
(b) Reference strain for variant analysis.
(c) Information provided in culture collection database.

Following short-read sequencing and genome assembly of the isolates, we identified them as representatives of four distinct species previously described as members of the O. polymorpha species complex (Suh and Zhou 2010; Kurtzman 2011; Naumov ): O. polymorpha (11 isolates), O. angusta (10 isolates), O. haglerorum (22 isolates), and O. parapolymorpha (3 isolates) (Supplementary Figure S1). Post-zygotic isolation has been described between O. polymorpha, O. angusta, and O. haglerorum, for which hybrids show reduced spore viability (Naumov ). O. parapolymorpha industrial strain DL-1 is “semi-sterile” (Lahtchev ) due to a mutation in the nitrogen-sensing transcription factor EFG1, although it is able to form rare diploids when crossed with O. polymorpha (Hanson ). Of the three O. parapolymorpha isolates, two showed very high nucleotide similarity (>99%) to DL-1. The third isolate, Opar1 (CBS12304T), has an intact EFG1 locus without the single nucleotide insertion found in DL-1. This may account for the previously described homothallic behavior of this isolate (Suh and Zhou 2010; Kurtzman 2011). The genome assemblies for all four species were similar in their overall size (8.8–8.9 Mb; Table 2). The N50 of the genome assemblies for the haploid strains ranged from 263.7 to 856.3 kb and the number of contigs ranged from 45 to 154 (Table 2). The GC content of O. parapolymorpha strain DL-1 (47.8%) has previously been described as higher than that of many other yeast species such as S. cerevisiae (38%; Peter ), which may be related to its thermotolerant characteristics (Ravin ). We found that the GC content for the O. polymorpha isolates was similarly high (47.7–47.8%), and that the GC content for the O. angusta and O. haglerorum isolates was even higher at 49.4–49.5% (Table 2).
Table 2

Ogataea genome assembly and annotation statistics

Strain | Strain ID | Genome length (Mb) | N50 (kb) | Contigs | GC (%) | tRNA genes | Protein-coding genes
Opol1 (a) | CBS4732/Y-5445/ATCC 34438 | 8.93 | 556.0 | 67 | 47.7 | 97 | 5,442
Opol2 | CBS1976/NRRL Y-1798/ATCC 14754/NCYC495 | 8.93 | 556.4 | 64 | 47.7 | 96 | 5,446
Opol3 | Phaff 72-225 | 8.95 | 788.8 | 45 | 47.7 | 97 | 5,444
Opol4 | NRRL Y-2423 | 8.92 | 608.6 | 79 | 47.7 | 98 | 5,436
Opol5 | CBS8852/NRRL Y-27293 | 8.95 | 636.5 | 64 | 47.7 | 99 | 5,451
Opol6 | NRRL Y-27863/ATCC MYAA-3665 | 8.97 | 616.1 | 51 | 47.7 | 97 | 5,454
Opol7 | NRRL Y-6005 | 8.91 | 631.1 | 51 | 47.7 | 96 | 5,431
Opol8 | NRRL YB-179 | 8.93 | 552.0 | 85 | 47.7 | 99 | 5,440
Opol9 | CBS5032 | 8.9 | 626.9 | 52 | 47.8 | 99 | 5,417
Opol10 | CBS7031 | 8.97 | 636.4 | 79 | 47.7 | 99 | 5,455
Opol11 | CBS7239 | 8.94 | 516.4 | 67 | 47.7 | 97 | 5,442
Opar1 (a) | CBS12304/NRRL YB-1982 | 8.87 | 557.1 | 55 | 47.7 | 97 | 5,417
Opar2 | Phaff 73-26 | 8.92 | 263.7 | 112 | 47.8 | 99 | 5,456
Opar3 × Opol | CBS1977 | 14.88 | 30.1 | 1521 | 47.9 | 155 | 9,920
Opar4 | CBS11895/NRRL Y-7560/ATCC 26012 | 8.92 | 618.1 | 87 | 47.8 | 97 | 5,424
Oang1 | Phaff 50-165/NRRL Y-2217 | 8.88 | 848.7 | 54 | 49.5 | 97 | 5,409
Oang2 | Phaff 50-97/NRRL Y-2212 | 8.88 | 655.9 | 73 | 49.4 | 97 | 5,437
Oang3 | Phaff 51-138 | 8.89 | 553.9 | 127 | 49.5 | 100 | 5,430
Oang4 | Phaff 51-177 | 8.89 | 743.6 | 96 | 49.5 | 97 | 5,443
Oang5 | Phaff 52-251 | 8.89 | 651.6 | 108 | 49.5 | 97 | 5,437
Oang6 | Phaff 60-394/ATCC 24190 | 8.88 | 856.3 | 57 | 49.5 | 97 | 5,419
Oang7 | Phaff 61-224 | 8.91 | 651.8 | 154 | 49.4 | 99 | 5,453
Oang8 | Phaff 61-235 | 8.9 | 557.2 | 107 | 49.5 | 97 | 5,452
Oang9 (b) | Phaff 61-244 | 8.91 | 850.2 | 101 | 49.4 | 97 | 5,446
Oang10 | CBS2575/NCYC1450 | 8.91 | 787.0 | 109 | 49.5 | 97 | 5,441
Ohag1 | Phaff 78-557.3 | 8.85 | 583.5 | 50 | 49.4 | 97 | 5,390
Ohag2 | Phaff 79-204.41 | 8.85 | 555.7 | 49 | 49.4 | 98 | 5,392
Ohag3 | Phaff 81-408.1 | 8.87 | 556.8 | 68 | 49.4 | 99 | 5,393
Ohag4 | Phaff 81-410 | 8.86 | 416.0 | 73 | 49.4 | 97 | 5,412
Ohag5 | Phaff 81-419.3 | 8.87 | 465.9 | 109 | 49.4 | 103 | 5,419
Ohag6 | Phaff 81-419.5 | 8.86 | 632.4 | 63 | 49.4 | 100 | 5,415
Ohag7 | Phaff 81-433.4 | 8.86 | 583.9 | 65 | 49.4 | 100 | 5,401
Ohag8 | Phaff 81-436.3 | 8.87 | 556.4 | 71 | 49.4 | 97 | 5,407
Ohag9 | Phaff 81-440.2 | 8.87 | 555.9 | 75 | 49.4 | 97 | 5,404
Ohag10 (b) | Phaff 81-453.3 | 8.88 | 556.7 | 73 | 49.4 | 99 | 5,411
Ohag11 | Phaff 81-461.3 | 8.85 | 579.9 | 52 | 49.4 | 99 | 5,395
Ohag12 | Phaff 81-463.1 | 8.86 | 584.1 | 62 | 49.4 | 97 | 5,404
Ohag13 | Phaff 81-471.3 | 8.86 | 556.9 | 64 | 49.4 | 97 | 5,413
Ohag14 | Phaff 81-480 | 8.86 | 437.7 | 66 | 49.4 | 97 | 5,423
Ohag15 | Phaff 83-405.1 | 8.86 | 466.2 | 66 | 49.4 | 97 | 5,408
Ohag16 | Phaff 83-425.4 | 8.87 | 583.5 | 74 | 49.4 | 99 | 5,422
Ohag17 | Phaff 83-437.2.1 | 8.86 | 466.2 | 56 | 49.4 | 98 | 5,413
Ohag18 | Phaff 83-437.2.2 | 8.86 | 497.3 | 57 | 49.4 | 98 | 5,401
Ohag19 | Phaff 83-442.1 | 8.86 | 584.3 | 59 | 49.4 | 97 | 5,402
Ohag20 | Phaff 83-471.3 | 8.86 | 556.4 | 65 | 49.4 | 97 | 5,412
Ohag21 | Phaff 83-474.2 | 8.86 | 582.9 | 71 | 49.4 | 99 | 5,402
Ohag22 | Phaff 83-476.5 | 8.89 | 632.2 | 89 | 49.4 | 101 | 5,406

(a) Type strain.
(b) Reference strain for variant analysis.

Phylogenomic analysis resolves interspecies relationships in the O. polymorpha species complex

To establish the relationships among the four species in the O. polymorpha species complex, and their relationship to other species in the Ogataea genus, we performed a phylogenomic analysis using shared Benchmarking Universal Single-Copy Orthologs (BUSCO) (Waterhouse ). The annotation was performed using the 2137 genes in the Saccharomycetes BUSCO set and was highly complete (96.7–97.3%) for each of the 47 genomes sequenced (Supplementary Figure S2). Maximum likelihood analysis of a concatenated alignment of the 1278 BUSCO sequences shared across 25 yeast species resolved the relationship among the four members of the O. polymorpha species complex (Figure 1), which matches the relationships shown by previous analysis of the rDNA and translation elongation factor-1α sequences (Naumov ). The topology of the rest of the tree is consistent with previous analysis (Shen ).
Figure 1

Relationship of the O. polymorpha species complex to other Ogataea species. Supermatrix phylogeny of 24 Ogataea species derived from 1278 BUSCO families giving an alignment 319,116 amino acids in length. P. kudriavzevii is included as an outgroup. Maximum Likelihood phylogeny was reconstructed with IQ-TREE implementing the JTT+F+R5 model. Bootstrap support values are indicated at all nodes.


Population structure in the O. polymorpha species complex

Previous studies showed that species within the O. polymorpha species complex occupy different environmental niches, with specificity of O. parapolymorpha to insect-damaged trees (Kurtzman 2011), O. haglerorum to rotting Opuntia cacti (Naumov ), and O. angusta to insects (Suh and Zhou 2010). In contrast, O. polymorpha has been described as a generalist, having been isolated from a variety of sources and with broad geography (Kurtzman 2011). The information available for our sequenced isolates is consistent with these observations, as the O. polymorpha isolates were sampled from clinical, soil, and agricultural sources from a broad geographic distribution, the O. haglerorum isolates were sampled from rotting Opuntia samples in Australia and Arizona, and O. angusta isolates were sampled from Drosophila and Aulacigaster insect species (Table 1). For each of the Ogataea species, we examined genomic diversity by aligning reads to a reference genome [previously published NCYC495 for O. polymorpha (Riley ) and DL-1 for O. parapolymorpha (Ravin ), and Oang9 for O. angusta and Ohag10 for O. haglerorum generated in this study]. Single nucleotide polymorphisms (SNPs) and insertion-deletions (indels) from the mapped reads were quantified (Table 3). We then analyzed the population structure for three of the four Ogataea species in our study using maximum likelihood analysis of SNP alignments (O. parapolymorpha was excluded due to the low number of representative isolates).
Table 3

Summary of genetic variation in Ogataea

Strain | Total SNPs | Total indels | SNPs per kb | Indels per kb
Relative to NCYC495
Opol1 (a) | 26,824 | 1,275 | 3.00 | 0.14
Opol2 | 9,609 | 661 | 1.08 | 0.07
Opol3 | 42,399 | 1,820 | 4.74 | 0.20
Opol4 | 31,882 | 1,425 | 3.57 | 0.16
Opol5 | 35,033 | 1,578 | 3.91 | 0.18
Opol6 | 38,878 | 1,665 | 4.34 | 0.19
Opol7 | 42,586 | 1,881 | 4.78 | 0.21
Opol8 | 36,772 | 1,707 | 4.12 | 0.19
Opol9 | 33,877 | 1,546 | 3.80 | 0.17
Opol10 | 49,307 | 2,049 | 5.50 | 0.23
Opol11 | 27,073 | 1,219 | 3.03 | 0.14
Relative to DL-1
Opar1 (a) | 113,030 | 3,221 | 12.74 | 0.36
Opar2 | 207 | 569 | 0.02 | 0.06
Opar4 | 197 | 558 | 0.02 | 0.06
Relative to Oang9
Oang1 | 52,192 | 1,991 | 5.88 | 0.22
Oang2 | 47,937 | 1,868 | 5.40 | 0.21
Oang3 | 48,832 | 1,789 | 5.49 | 0.20
Oang4 | 48,960 | 1,788 | 5.51 | 0.20
Oang5 | 48,844 | 1,795 | 5.49 | 0.20
Oang6 | 47,059 | 1,742 | 5.30 | 0.20
Oang7 | 48,613 | 1,778 | 5.46 | 0.20
Oang8 | 47,735 | 1,642 | 5.36 | 0.18
Oang9 (b) | n/a | n/a | n/a | n/a
Oang10 | 49,183 | 1,766 | 5.52 | 0.20
Relative to Ohag10
Ohag1 | 19,971 | 979 | 2.26 | 0.11
Ohag2 | 19,938 | 931 | 2.25 | 0.11
Ohag3 | 20,581 | 1,055 | 2.32 | 0.12
Ohag4 | 20,508 | 974 | 2.31 | 0.11
Ohag5 | 20,238 | 1,009 | 2.28 | 0.11
Ohag6 | 20,189 | 1,001 | 2.28 | 0.11
Ohag7 | 19,665 | 989 | 2.22 | 0.11
Ohag8 | 20,899 | 1,026 | 2.36 | 0.12
Ohag9 | 20,725 | 1,059 | 2.34 | 0.12
Ohag10 (b) | n/a | n/a | n/a | n/a
Ohag11 | 15,424 | 821 | 1.74 | 0.09
Ohag12 | 19,523 | 974 | 2.20 | 0.11
Ohag13 | 20,997 | 1,053 | 2.37 | 0.12
Ohag14 | 20,793 | 1,003 | 2.35 | 0.11
Ohag15 | 20,447 | 1,002 | 2.31 | 0.11
Ohag16 | 20,234 | 1,015 | 2.28 | 0.11
Ohag17 | 19,953 | 943 | 2.25 | 0.11
Ohag18 | 19,913 | 934 | 2.25 | 0.11
Ohag19 | 20,699 | 1,043 | 2.34 | 0.12
Ohag20 | 20,788 | 1,078 | 2.35 | 0.12
Ohag21 | 20,663 | 1,047 | 2.33 | 0.12
Ohag22 | 19,534 | 969 | 2.21 | 0.11

(a) Type strain.
(b) Reference strain for variant analysis.

The population structure of O. polymorpha corresponded better to geography than to the source from which the samples were isolated (Figure 2A). North American samples grouped together, except for Opol3, for which the isolation location is unclear in the Phaff collection database. The clustering of isolates was also consistent with geography for two European (Opol4 and Opol7) and two South American isolates, although Opol11 is a derivative of Opol1 (CBS4732). Two clinical samples isolated from the United States (Opol5 and Opol6) were grouped together and are most closely related to the industrial strain NCYC495 (Opol2).
Figure 2

Population structure of the O. polymorpha species complex. Maximum likelihood phylogenies created using SNP alignments for (A) O. polymorpha, (B) O. angusta, and (C) O. haglerorum isolates. Bootstrap support was 100% except where indicated below the branch, and branch lengths are given above each branch. Geographic information for isolates is indicated using colored boxes. (D) Supermatrix phylogeny of 48 Ogataea isolates generated using 1,148 BUSCO families.

O. angusta and O. haglerorum isolates have a much more limited geographic distribution (Table 1), and did not show a high degree of population structure (Figure 2, B and C). The O. angusta isolates were obtained from insect samples in northern California, near Sacramento. The topology of the O. angusta tree shows that samples collected from the same location are more similar to one another independent of the species of insect from which they were isolated (Figure 2B). O. haglerorum isolates were sampled from Opuntia cacti in southern Arizona, except for two from Queensland, Australia. Although the two Australian isolates (Ohag1 and Ohag2) group together, the population structure of the other isolates does not correspond to geography. For example, isolates from the Santa Rita Mountains (Ohag7, Ohag8, Ohag9, Ohag17, and Ohag18) do not form a monophyletic group (Figure 2C). The Australian samples do not show a high amount of divergence from the rest of the samples, which likely reflects the introduction of Opuntia species to Australia from their native United States within the last two to three centuries (Friedel 2020).

Evidence for structural variation within and between species

Structural variation, including inversions, translocations, copy number variations, and fusions/fissions, has roles in adaptation and speciation by impacting gene expression and recombination (Mérot ). Isolates of O. polymorpha, O. parapolymorpha, and O. haglerorum show an overall high amount of synteny, with very few structural rearrangements. Two isolates of O. polymorpha contained rearrangements relative to the NCYC495 reference genome. Opol9 (CBS5032) has a 335-kb pericentromeric inversion in chromosome 4, as well as two translocations that combine parts of chromosomes 3, 5, and 7 (Figure 3, A–C). Opol4 contains a translocation between chromosomes 1 and 5 (Figure 3D). These rearrangements do not involve repetitive genomic elements; most occur in intergenic regions, while one breakpoint of the Opol9 chromosome 4 inversion occurs within the cytochrome b2 locus (CYB2) (Figure 3, A and B). The inversion splits the 1377-bp CYB2 locus into 654-bp and 723-bp fragments. One of the translocation breakpoints occurs adjacent to ZPS1, which encodes a GPI-anchored protein that responds to low-zinc conditions and is present in two tandem copies on chromosome 7 (Figure 3C).
Figure 3

Structural rearrangements in O. polymorpha. Chromosomal breakpoints identified in O. polymorpha isolate Opol9 (CBS5032) on (A) NODE_10, (B) NODE_5, and (C) NODE_2, and in O. polymorpha isolate Opol4 (NRRL Y-2423) on (D) NODE_14 are detailed. Chromosomes are numbered based on O. polymorpha NCYC495 genome assembly (shown at the bottom in each panel) and color-coding of genes corresponds to their locations in the NCYC495 genome. White circles indicate the location of centromeres and white boxes indicate the location of a genomic repeat sequence that is found on four chromosomes in the NCYC495 genome.

In O. haglerorum, six isolates have structural rearrangements relative to the O. polymorpha NCYC495 reference genome assembly (Figure 4). Three of these rearrangements are translocations that occur in repetitive genomic elements. Ohag3 (Phaff 81-408-1) contains a translocation between chromosomes 2 and 7 at the rDNA locus (Figure 4A), which is found adjacent to a centromere containing repetitive Ty-like retrotransposable and long terminal-repeat elements on chromosome 7 in O. polymorpha. For a translocation between chromosomes 1 and 6 that is shared between Ohag10 and Ohag11 (Figure 4B), as well as a translocation between chromosomes 6 and 7 in Ohag21 (Figure 4C), the breakpoints occur at a 1-kb repeat sequence. In NCYC495, this sequence occurs on four chromosomes (chromosomes 1, 2, 6, and 7) with 97–98% nucleotide identity between each copy. Based on the topology of the SNP phylogeny (Figure 2C), the shared translocation in Ohag10 and Ohag11 is likely to be the result of a single rearrangement event. Two isolates have translocations in intergenic regions: Ohag17 contains a translocation between chromosomes 1 and 4 (Figure 4D), and Ohag9 has a translocation between chromosomes 1 and 2 (Figure 4E). In Ohag9, the translocated regions of chromosomes 1 and 2 are separated by three genes that are found on chromosomes 2, 3, and 5 in NCYC495.
Figure 4

Structural rearrangements in O. haglerorum. Chromosomal breakpoints identified in O. haglerorum isolates (A) Ohag3 (81-408-1) on NODE_5, (B) Ohag10 (81-453-3) on NODE_15 and Ohag11 (81-461-3) on NODE_16, (C) Ohag21 (83-474-2) on NODE_11, (D) Ohag17 (83-437-2-1) on NODE_14, and (E) Ohag9 (81-440-2) on NODE_10 are detailed. Chromosomes are numbered based on O. polymorpha NCYC495 genome assembly and color-coding of genes corresponds to their locations in the NCYC495 genome. White circles indicate the location of centromeres and white boxes indicate the location of a genomic repeat sequence that is found on four chromosomes in the NCYC495 genome.

In O. angusta, multiple translocations are found between the genome of Oang4 (51–177) and the NCYC495 genome (Figure 5). One of the breakpoints is a translocation between the centromeres of chromosomes 5 and 7 (Figure 5B). The rest of the breakpoints are translocations between chromosomes at the same 1-kb repeat region involved in O. haglerorum translocations (Figure 5, A–D). Other O. angusta isolates have contig breaks in their assemblies at many of the rearrangement breakpoints, leaving the possibility that these rearrangements are shared broadly within the species. These chromosomal rearrangements could explain the reduced spore viability observed in interspecies crosses within the O. polymorpha species complex (Naumov ).
Figure 5

Structural rearrangements in O. angusta. Chromosomal breakpoints identified in O. angusta isolate Oang4 (51-177) on (A) NODE_1, (B) NODE_5, (C) NODE_2, and (D) NODE_6 are detailed. Chromosomes are numbered based on O. polymorpha NCYC495 genome assembly and color-coding of genes corresponds to their locations in the NCYC495 genome. White circles indicate the location of centromeres and white boxes indicate the location of a genomic repeat sequence that is found on four chromosomes in the NCYC495 genome.

We further examined the genomes of the O. polymorpha species complex isolates for evidence of copy number variations (CNVs). We identified two duplications and 22 deletions that were at least 100 bp in length (Supplementary Figure S3). Several of these impact genes with roles in nutrient uptake (e.g., allantoin permease, inositol transporter, amino acid transporter, and sugar transporter). In Oang3, we identified a deletion that included the transcription factor RME1, which is required for mating-type switching and mating in O. polymorpha (Hanson ; Yamamoto ). Loss of this gene potentially impacts the fertility of this isolate. If so, along with the semi-sterility observed in O. parapolymorpha DL-1 resulting from the loss of function of the transcription factor EFG1 (Hanson ), this would be the second example of fertility loss due to disruption of a transcriptional regulator.

Genetic variation in the O. polymorpha species complex

A concatenated alignment of 1,148 BUSCO amino acid sequences for the O. polymorpha species complex isolates indicated a low level of sequence divergence between species (Figure 2D). We used JSpecies (Richter ) to compare the genome sequences among the 47 isolates and found that the pairwise average nucleotide identity between species in the O. polymorpha species complex ranges from 86.7% (O. polymorpha vs O. haglerorum) to 93.7% (O. polymorpha vs O. parapolymorpha). We assessed the SNP diversity between isolates within each species in the O. polymorpha species complex (Table 3). We found that O. parapolymorpha isolate Opar1 (CBS12304) showed the highest SNP density (12.74 SNPs/kb) relative to the reference DL-1 genome sequence (Table 3, Supplementary Table S1). O. haglerorum isolates showed the least diversity among isolates, with average genome-wide SNP density between 1.74 and 2.36 SNPs/kb relative to Ohag10 (Table 3, Supplementary Table S2). SNP density in O. angusta isolates ranged between 5.30 and 5.88 SNPs/kb relative to Oang9 (Table 3, Supplementary Table S3), which is higher than what has been described in wild isolates of S. cerevisiae (median 4.1 SNPs/kb) (Peter ). SNP density in O. polymorpha isolates was 3.00–5.50 SNPs/kb relative to the NCYC495 reference genome sequence (Table 3, Supplementary Table S4), which is comparable to the 1.66–4.66 SNPs/kb observed in the methylotrophic yeast K. phaffii (Braun-Galleani ). The previously published O. polymorpha genome sequence (Riley ), which comes from a laboratory strain derived from CBS1976/NCYC495 (Opol2), had 1.08 SNPs/kb relative to our Opol2 genome assembly. We next examined the distribution of SNPs across the genome for each species (Figure 6). SNP density was similar between chromosomes/contigs for each isolate (Figure 6, Supplementary Tables S1–S4).
For each species, SNP density was higher in telomeric regions than the rest of the genome and was higher than or similar to the genome-wide average in centromeric regions (Figures 6 and 8, Supplementary Table S5). We did not observe notable large-scale fluctuations in signatures of selection (Tajima’s D; Figure 6).
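As an illustration of the windowed measure used here, SNP density can be computed by binning SNP coordinates into consecutive windows and dividing each count by the window length in kb. The following is a minimal sketch, not the pipeline used in this study; the function name, the 0-based coordinate input, and the handling of the final partial window are assumptions made for the example.

```python
from collections import Counter


def snp_density_per_kb(snp_positions, contig_length, window=10_000):
    """Return SNPs/kb for consecutive non-overlapping windows on one contig.

    snp_positions: 0-based SNP coordinates (hypothetical input, for example
    parsed from a VCF file). The final window may be shorter than `window`
    and is normalised by its actual length.
    """
    counts = Counter(pos // window for pos in snp_positions)
    densities = []
    for index, start in enumerate(range(0, contig_length, window)):
        width = min(window, contig_length - start)
        densities.append(counts[index] / (width / 1000))
    return densities


# Four SNPs on a 30 kb contig, summarised in 10 kb windows.
example = snp_density_per_kb([5, 9_999, 10_000, 25_000], contig_length=30_000)
```

Genome-wide averages such as those in Table 3 would correspond to the total SNP count divided by the total assembly length in kb, rather than to the mean of the window values.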
Figure 6

Genome-wide genetic diversity in Ogataea species. Plots show density of SNPs (SNPs/kb) and Tajima’s D calculated in 10 kb windows across the genome for all isolates of (A) O. polymorpha, (B) O. haglerorum, and (C) O. angusta. Schematics below each set of plots indicate chromosome with position of centromeres indicated by purple circles and the MAT region indicated by orange boxes. O. haglerorum and O. angusta contigs greater than 50 kb in length were ordered according to their alignment with the O. polymorpha genome, and contig break locations in reference genomes (Oang9 and Ohag10) are indicated by dashed gray lines.
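For readers unfamiliar with Tajima's D, the statistic compares the mean number of pairwise differences with the number of segregating sites scaled by a sample-size constant. The sketch below implements the standard formula for a single window; it is an assumption that a comparable per-window calculation underlies the plots, since the software used is not named in this section, and the function name and input format are hypothetical.

```python
import math


def tajimas_d(alt_counts, n):
    """Tajima's D for one window.

    alt_counts: alternate-allele count k at each segregating site (0 < k < n).
    n: number of haploid sequences sampled.
    Returns None when the statistic is undefined (fewer than four sequences,
    for which the variance term is zero, or no segregating sites).
    """
    s = len(alt_counts)
    if n < 4 or s == 0:
        return None
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in alt_counts)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants (for example, ten singleton sites among eight sequences) gives a negative value, whereas an excess of intermediate-frequency variants gives a positive value.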

Figure 8

SNP Density at Genomic Features in Ogataea. Box and whisker plots show the SNPs/kb at telomeres (within 50 kb of terminal contig ends in genome assemblies), centromeres, at the centromere of chromosome 3, the mating-type locus, and genome-wide for O. polymorpha, O. angusta, and O. haglerorum.


Genetic diversity of the MAT region in the O. polymorpha species complex

All four species in the O. polymorpha species complex have previously been described as homothallic (Suh and Zhou 2010; Kurtzman 2011; Kurtzman ; Naumov ), suggesting that they are all able to undergo the flip/flop mating-type switching mechanism previously described in O. polymorpha and other species in the Ogataea genus (Hanson ; Maekawa and Kaneko 2014; Krassowski ; Yoko-O ; Wongwisansri ). We annotated the MAT region in the newly sequenced O. angusta and O. haglerorum genomes. The O. angusta MAT region was the same size and contained the same set of genes as the previously annotated O. polymorpha and O. parapolymorpha sequences (Figure 7A) (Hanson ). The O. haglerorum MAT region was ∼500 bp shorter in length (18 vs 18.5 kb) due to the HPODL_4020 locus, which has no known role in sexual processes, undergoing pseudogenization (Figure 7A). The degradation of the HPODL_4020 sequence was shared across all 22 of the O. haglerorum sequences. Other variations in Ogataea MAT region gene content have been described. The O. thermomethanolica MAT region lacks the TPK3 locus (Wongwisansri ), while in O. minuta the MAT region is longer (23 vs 18.5 kb) despite containing the same set of genes. O. minuta also has longer inverted repeat sequences that flank the MAT region (3.6 vs 2 kb) (Yoko-O ).
Figure 7

Genetic diversity in the Ogataea MAT Region. (A) Schematic of 19 kb MAT region content, drawn to scale. The genes specifying mating-type a are shown in green, and those specifying mating-type α are shown in pink. The gene HPODL_4020 (shown in gray) is a pseudogene in O. haglerorum. Plots show density of SNPs (SNPs/kb) and Tajima’s D calculated in 1 kb windows across the MAT region, and 100 kb upstream and downstream for (B) O. polymorpha, (C) O. angusta, and (D) O. haglerorum. Gray dashed lines indicate contig breaks. Schematic at the bottom shows the location of the centromere (purple), the MAT region (orange), and the inverted repeat sequences (blue).

Chromosomal inversions are an example of a negative recombination modifier due to the inviability of products that result when meiotic recombination occurs between chromosomes that are heterozygous for the inversion (Schaeffer 2008; Wellenreuther and Bernatchez 2018). The structure of the MAT region causes it to be a heterozygous inversion in any diploid cell formed by mating between two cells of opposite mating types. A recombination event in this region in a diploid cell would lead to large-scale chromosomal rearrangements that would result in inviable meiotic products (Hanson ). We therefore expect that there should be little or no recombination in the region between the MATa and MATα genes in natural populations of Ogataea. To assess the impact of the MAT inversion on genetic diversity, we examined SNP density in the MAT region. Overall, SNP density in the MAT region (including the adjacent inverted repeat sequence) in each species is lower than in the rest of the genome (Figures 7, B–D and 8, Supplementary Table S5). In addition, for the majority of isolates the centromere adjacent to the MAT region (CEN3) showed SNP density similar to the MAT region (Figures 7, B–D and 8, Supplementary Table S5), much lower than the genome-wide average and the average across all centromeres (Figure 8, Supplementary Table S5).
The pattern of reduced SNP density at the MAT region may be due to its centromere proximity. In S. cerevisiae, meiotic recombination is suppressed within 10 kb of centromeres (Mancera ; Pan ) due to the suppression of Spo11-mediated double-strand breaks by the kinetochore and pericentric cohesin complexes found in these regions (Vincenten ; Nambiar and Smith 2018; Kuhl and Vader 2019). In addition, the centromeres of O. polymorpha contain repetitive LTR and Ty-like retrotransposable elements. These elements have 15-fold suppression of meiotic recombination on average compared to the rest of the genome in S. cerevisiae (Pan ), although there is substantial variation in rates of double-strand break formation across specific Ty elements in this species (Sasaki ). These features suggest that the low nucleotide sequence diversity at centromeres and across the MAT region in the O. polymorpha species complex may be the result of the low levels of recombination at these features. Rates of recombination and nucleotide diversity are hypothesized to be positively correlated due to background selection (Kaiser and Charlesworth 2009; Charlesworth and Campos 2014). Background selection can lead to Hill-Robertson interference when neutral variants are purged due to their linkage to deleterious mutations. The evidence for the relationship between recombination and nucleotide diversity varies across species of plants, animals, and fungi (Cutter and Payseur 2013), and has been shown to be correlated in Sch. pombe (Jeffares ) and to have a weak correlation in S. cerevisiae (Cutter and Moses 2011; Cutter and Payseur 2013). In the methylotrophic yeast K.
phaffii, which also uses flip/flop mating-type switching, meiotic recombination rates were 3.5X lower genome-wide than in S. cerevisiae, and nucleotide diversity was lower in the 150–200 kb surrounding centromeres (Braun-Galleani ). In K. phaffii, the invertible MAT region is much larger than in O. polymorpha (138 kb) and contains a centromere. Although there was no evidence for meiotic recombination in this region in K. phaffii, high nucleotide diversity was observed in contrast to the other centromeres. In “pseudo-homothallic” fungal species, linkage of the MAT loci to a low or no recombination region like a centromere is thought to ensure that mating types will segregate during meiosis I (Hood and Antonovics 2004; Knop 2006; Ellison ). A similar logic might apply in this case, in which recombination would not only potentially prevent proper segregation of mating types in meiosis, but could result in gross structural rearrangements in the genome for a diploid that has a heterozygous inversion. The proximity of the MAT region to a centromere may therefore reduce the likelihood of recombination occurring in this region. In the case of K. phaffii, an additional set of inverted repeats within the invertible region may allow for recombination events to occur that reestablish collinearity in a diploid to allow for meiotic recombination to occur in the region (Hanson ).

CBS1977 is an interspecies diploid hybrid between O. polymorpha and O. parapolymorpha

The short-read genome assembly for CBS1977 was nearly twice the length of the assemblies of the other 46 isolates (Table 2), and blastn analysis of several contigs suggested that it is an interspecies diploid hybrid that resulted from a cross between O. polymorpha and O. parapolymorpha. Interspecies hybridization has played a critical role in the evolution of yeasts (Marcet-Houben and Gabaldón 2015; Gabaldón 2020) and has been demonstrated to facilitate adaptation and generate biodiversity (Smukowski Heil ; Tusso ; Zhang ). Yeast interspecies hybrids have been observed frequently in isolates from anthropogenic environments, such as industrial, agricultural, or clinical samples (Louis ; Hittinger 2013; Pryszcz , 2015; Wendland 2014; Hagen ; Schröder ; Ortiz-Merino , 2018; Braun-Galleani ; Lopandic 2018; Mixão and Gabaldón 2018; Piombo ; Smukowski Heil ; Samarasinghe ), which may be attributed to increased stress tolerance for these isolates due to their heterozygosity. CBS1977 is potentially an example of this, as it was originally isolated from milk from a cow with mastitis (Table 1). The high levels of heterozygosity in the CBS1977 genome reduced the quality of the short-read genome assembly (N50 = 30.1 kb; Table 2). To improve the assembly and examine the structure of the CBS1977 genome in more detail, we performed long-read MinION sequencing, which resulted in an assembly of 59 contigs with a much larger N50 of 1.26 Mb. To examine the contributions of each parent to the hybrid genome sequence, we performed a sliding window blastn analysis of the hybrid genome contigs from both the long-read and short-read assemblies against the two reference genomes for O. polymorpha (Riley ) and O. parapolymorpha (Ravin ). Both the short-read assembly (Illumina/SPAdes) and the long-read assembly (MinION/canu, with a pilon correction step) indicated that large sections of the CBS1977 genome are heterozygous because DNA from both the O. polymorpha parent and the O.
parapolymorpha parent was retained (Figure 9). However, the two assemblies differed significantly in the amount of the genome that was estimated to be heterozygous. The long-read assembly contained 3.1 Mb of heterozygous sequence, with the remaining 5.7 Mb of the genome assembly appearing to be homozygous (Table 4). However, when the latter regions were compared to the short-read assembly, an additional 2.8 Mb of the genome was found to be heterozygous (Table 4). Thus, “homozygous” regions totaling 2.8 Mb in the long-read assembly corresponded to “heterozygous” regions totaling 2 × 2.8 Mb in the short-read assembly. Since the short-read contigs in these regions matched the reference genome sequences of the two parental species, we believe that these regions are in fact heterozygous, and therefore that the difference between the two assemblies is due to over-aggressive “collapsing” of the heterozygous regions into single contigs by the canu assembler. Figure 9 shows the parental contributions that we infer from a combined analysis of the two assemblies. We also found a few regions of the genome that were represented by additional (third) long-read assembly contigs, suggesting locations where duplications may have occurred. These additional contigs are indicated in Figure 9. The mitochondrial genome of CBS1977 comes from O. parapolymorpha.
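The logic of assigning windows to a parental genome and then calling regions heterozygous or homozygous can be sketched as follows. This is an illustrative outline only: the function names, the representation of blastn results as percent identities, and the `min_margin` threshold are hypothetical and are not taken from the analysis described above.

```python
def assign_parent(identity_opol, identity_opar, min_margin=1.0):
    """Label one window by the reference genome it matches best.

    identity_*: percent identity of the window's best blastn hit to each
    reference (None if there is no hit). min_margin is a hypothetical
    threshold, in percentage points, below which the window is unassigned.
    """
    if identity_opol is None and identity_opar is None:
        return "unassigned"
    if identity_opar is None:
        return "opol"
    if identity_opol is None:
        return "opar"
    if abs(identity_opol - identity_opar) < min_margin:
        return "unassigned"
    return "opol" if identity_opol > identity_opar else "opar"


def classify_region(labels_of_contigs):
    """Call a region heterozygous if its contigs derive from both parents."""
    parents = {label for label in labels_of_contigs if label != "unassigned"}
    if parents == {"opol", "opar"}:
        return "heterozygous"
    if len(parents) == 1:
        return "homozygous_" + next(iter(parents))
    return "unassigned"


# A region covered by one contig from each parent is heterozygous; a region
# covered by a single collapsed contig appears homozygous.
two_contigs = classify_region([assign_parent(99.8, 93.5), assign_parent(93.1, 99.9)])
one_contig = classify_region([assign_parent(99.8, 93.5)])
```

Under this scheme, a region collapsed into a single contig by the long-read assembler would be called homozygous unless the short-read assembly contributes a second contig matching the other parent, which is the situation described above for the 2.8 Mb of sequence.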
Figure 9

Inferred Genome Structure for Interspecies Diploid Hybrid CBS1977. Nucleotide identity for hybrid genome was determined by BLAST analysis of 1 kb sliding windows across the CBS1977 genome assembly against the O. polymorpha NCYC 495 and O. parapolymorpha DL-1 reference genome sequences. Regions that most closely match the O. polymorpha and O. parapolymorpha parental genomes are indicated in blue and orange, respectively. The right telomere of chromosome 4 could not be assigned due to high sequence identity to both parental genomes and is indicated in gray. Centromeric regions are denoted by white circles, ∼1 kb genomic repeat sequences found on NCYC 495 chromosomes 1, 2, 6, and 7 are denoted by a black line, MATa and MATα loci on chromosome 3 are denoted by green and pink boxes, respectively, and the ribosomal DNA locus on chromosome 7 is denoted by yellow boxes. Regions of the genome that contained more than one contig in either the MinION or Illumina assemblies that matched the same parental genome are indicated below the chromosome, and the names of the contigs are indicated.

Table 4

Summary of homozygous and heterozygous composition for the interspecies diploid hybrid isolate CBS1977

Chromosome | BLAST hit length (kb) | MinION heterozygous (kb) a | Illumina uniquely heterozygous (kb) b | Combined heterozygous (kb) c | Homozygous opol parent (kb) d | Homozygous opar parent (kb) e | % LOH
1 | 1507 | 153 | 573 | 2 | 369 | 410 | 51.69
2 | 1565 | 756 | 276 | 0 | 0 | 536 | 34.25
3 | 1339 | 441 | 106 | 9 | 380 | 404 | 58.55
4 | 1243 | 493 | 720 | 0 | 8 | 22 | 2.41
5 | 1263 | 555 | 519 | 2 | 188 | 2 | 15.04
6 | 981 | 232 | 195 | 8 | 37 | 509 | 55.66
7 | 985 | 482 | 426 | 0 | 42 | 33 | 7.63
Total | 8,883 | 3,112 | 2,815 | 21 | 1,024 | 1,916 | 33.08

a Total length of heterozygous regions supported by MinION assembly.

b Total length of heterozygous regions supported only by Illumina assembly (homozygous in MinION assembly).

c Total length of heterozygous regions supported by one Illumina contig and one MinION contig or scaffold.

d Total length of homozygous regions that have higher sequence identity to Opol parental genome.

e Total length of homozygous regions that have higher sequence identity to Opar parental genome.
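The % LOH column of Table 4 appears to be the combined length of the two homozygous categories divided by the BLAST hit length. This relationship is inferred from the numbers rather than stated explicitly, so the short check below should be read as a sketch under that assumption. It uses the per-chromosome values as read from Table 4; small differences from the reported percentages presumably reflect rounding of the lengths to whole kb.

```python
# Values in kb as read from Table 4:
# (BLAST hit length, homozygous Opol, homozygous Opar, reported % LOH)
table4 = {
    "1": (1507, 369, 410, 51.69),
    "2": (1565, 0, 536, 34.25),
    "3": (1339, 380, 404, 58.55),
    "4": (1243, 8, 22, 2.41),
    "5": (1263, 188, 2, 15.04),
    "6": (981, 37, 509, 55.66),
    "7": (985, 42, 33, 7.63),
    "Total": (8883, 1024, 1916, 33.08),
}


def percent_loh(blast_kb, homo_opol_kb, homo_opar_kb):
    """Percentage of the aligned length that is homozygous for either parent."""
    return 100 * (homo_opol_kb + homo_opar_kb) / blast_kb


# Recompute % LOH for each row and record the difference from the table.
recomputed = {name: percent_loh(*row[:3]) for name, row in table4.items()}
differences = {name: abs(recomputed[name] - table4[name][3]) for name in table4}
```

The recomputed values agree with the reported column to within a few hundredths of a percentage point, and they make the uneven distribution of LOH clear: more than half of chromosomes 1, 3, and 6 is homozygous, compared with about 2% of chromosome 4.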

After determining the heterozygous regions from the combination of long-read and short-read assemblies, we found loss of heterozygosity (LOH) has occurred for 33% of the genome in CBS1977 (Table 4).
The interspecies hybrid likely formed through a mating event between two haploid cells, based on the presence on chromosome 3 of one MAT locus contributed by each parental species (Figure 9, Supplementary Figure S4). The homozygous regions of the genome are derived from both the O. polymorpha and O. parapolymorpha parental genomes (Table 4), suggesting that the LOH has not resulted from backcrossing of the hybrid to a specific parent. Heterozygosity has been maintained for the centromeric regions of each chromosome, which includes the ribosomal DNA locus adjacent to the centromere on chromosome 7 (Figure 9), so CBS1977 has retained both of the parental rDNA sequences. The maintenance of heterozygosity in centromeric regions may indicate a general suppression of recombination at these sites, consistent with observed recombination patterns in yeasts (Peter ; Tattini ). Genome stabilization through LOH has frequently been observed following hybridization events (Morales and Dujon 2012), and has been associated with adaptation (Smukowski Heil ; Zhang ). LOH results from homologous recombination that resolves by reciprocal genetic exchange (interhomolog crossover) or nonreciprocal genetic exchange between homologs (gene conversion or break-induced replication) (Symington ). In O. polymorpha and O. parapolymorpha, the sexual processes of mating and sporulation are induced by the same environmental conditions (nitrogen starvation), resulting in transient diploids that readily enter meiosis to return to a haploid state (Hanson ; Maekawa and Kaneko 2014). The sustained diploid state of the hybrid suggests that it cannot sporulate, despite the ability of a laboratory-created interspecies hybrid between O. polymorpha and O. parapolymorpha to undergo meiosis (Hanson ). Crossover events are rare in mitotic recombination, in which synthesis-dependent strand annealing (SDSA) is a more common mechanism for double-strand break resolution (Symington ). 
Small interspersed regions of LOH, which are found throughout the genome of CBS1977 (Figure 9), may be accounted for by SDSA. However, much of the LOH in the genome has occurred in long stretches at the ends of chromosomes (Figure 9). This pattern is more readily explained by break-induced replication (BIR), which is a replication-dependent nonreciprocal genetic exchange that occurs when only one end of a DSB is used for homologous recombination (Kramara ). An experimental evolution study with S. cerevisiae × S. paradoxus hybrids showed that gene conversion events leading to LOH were reduced relative to intraspecies hybrids, which may be explained by reduced recombination due to sequence differences between the parental genomes. BIR occurs more frequently under stressful conditions (Kramara ), as well as when only one side of a DSB has homology with a repair template (Malkova ; Symington ), which may explain why BIR is prevalent in hybrid genomes, which occur more frequently in conditions requiring stress-tolerance and where sequence divergence between homologous chromosomes may make recombination less efficient.

Conclusions

Our study is a first examination of the population genomics of the O. polymorpha species complex. Using phylogenomics, we have established the relationships among the four species in the complex. We surveyed the genetic variation within and between species by comparing the genome sequences of 47 isolates. We found evidence for structural rearrangements in O. polymorpha, O. angusta, and O. haglerorum, and identified one isolate as an interspecies hybrid between O. polymorpha and O. parapolymorpha that formed through haploid mating and has since undergone loss of heterozygosity. These data will provide a useful resource for the continued use of O. polymorpha as a model system in genetics, cell biology, and recombinant protein production.

1.  tRNAscan-SE: Searching for tRNA Genes in Genomic Sequences.

Authors:  Patricia P Chan; Todd M Lowe
Journal:  Methods Mol Biol       Date:  2019

2.  Polymorphism, divergence, and the role of recombination in Saccharomyces cerevisiae genome evolution.

Authors:  Asher D Cutter; Alan M Moses
Journal:  Mol Biol Evol       Date:  2011-01-03       Impact factor: 16.240

3.  High-resolution mapping of meiotic crossovers and non-crossovers in yeast.

Authors:  Eugenio Mancera; Richard Bourgon; Alessandro Brozzi; Wolfgang Huber; Lars M Steinmetz
Journal:  Nature       Date:  2008-07-09       Impact factor: 49.962

4.  Identification of a novel interspecific hybrid yeast from a metagenomic spontaneously inoculated beer sample using Hi-C.

Authors:  Caiti Smukowski Heil; Joshua N Burton; Ivan Liachko; Anne Friedrich; Noah A Hanson; Cody L Morris; Joseph Schacherer; Jay Shendure; James H Thomas; Maitreya J Dunham
Journal:  Yeast       Date:  2017-10-19       Impact factor: 3.239

5.  The genomic and phenotypic diversity of Schizosaccharomyces pombe.

Authors:  Daniel C Jeffares; Charalampos Rallis; Adrien Rieux; Doug Speed; Martin Převorovský; Tobias Mourier; Francesc X Marsellach; Zamin Iqbal; Winston Lau; Tammy M K Cheng; Rodrigo Pracana; Michael Mülleder; Jonathan L D Lawson; Anatole Chessel; Sendu Bala; Garrett Hellenthal; Brendan O'Fallon; Thomas Keane; Jared T Simpson; Leanne Bischof; Bartlomiej Tomiczek; Danny A Bitton; Theodora Sideri; Sandra Codlin; Josephine E E U Hellberg; Laurent van Trigt; Linda Jeffery; Juan-Juan Li; Sophie Atkinson; Malte Thodberg; Melanie Febrer; Kirsten McLay; Nizar Drou; William Brown; Jacqueline Hayles; Rafael E Carazo Salas; Markus Ralser; Nikolas Maniatis; David J Balding; Francois Balloux; Richard Durbin; Jürg Bähler
Journal:  Nat Genet       Date:  2015-02-09       Impact factor: 38.330

6.  Population genomics reveals chromosome-scale heterogeneous evolution in a protoploid yeast.

Authors:  Anne Friedrich; Paul Jung; Cyrielle Reisser; Gilles Fischer; Joseph Schacherer
Journal:  Mol Biol Evol       Date:  2014-10-27       Impact factor: 16.240

7.  Hybridization and emergence of virulence in opportunistic human yeast pathogens.

Authors:  Verónica Mixão; Toni Gabaldón
Journal:  Yeast       Date:  2017-09-14       Impact factor: 3.239

8.  Accurate Tracking of the Mutational Landscape of Diploid Hybrid Genomes.

Authors:  Lorenzo Tattini; Nicolò Tellini; Simone Mozzachiodi; Melania D'Angiolo; Sophie Loeillet; Alain Nicolas; Gianni Liti
Journal:  Mol Biol Evol       Date:  2019-12-01       Impact factor: 16.240

9.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937

10.  DELLY: structural variant discovery by integrated paired-end and split-read analysis.

Authors:  Tobias Rausch; Thomas Zichner; Andreas Schlattl; Adrian M Stütz; Vladimir Benes; Jan O Korbel
Journal:  Bioinformatics       Date:  2012-09-15       Impact factor: 6.937
