Sara J Hanson, Eoin Ó Cinnéide, Letal I Salzberg, Kenneth H Wolfe, Jamie McGowan, David A Fitzpatrick, Kate Matlin.
Abstract
The methylotrophic yeast Ogataea polymorpha has long been a useful system for recombinant protein production, as well as a model system for methanol metabolism, peroxisome biogenesis, thermotolerance, and nitrate assimilation. It has more recently become an important model for the evolution of mating-type switching. Here, we present a population genomics analysis of 47 isolates within the O. polymorpha species complex, including representatives of the species O. polymorpha, Ogataea parapolymorpha, Ogataea haglerorum, and Ogataea angusta. We found low levels of nucleotide sequence diversity within the O. polymorpha species complex and identified chromosomal rearrangements both within and between species. In addition, we found that one isolate is an interspecies hybrid between O. polymorpha and O. parapolymorpha and present evidence for loss of heterozygosity following hybridization.
Keywords: Ogataea; chromosomal rearrangements; interspecies hybridization; mating-type switching; population genomics
Year: 2021 PMID: 34849824 PMCID: PMC8496258 DOI: 10.1093/g3journal/jkab211
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Ogataea isolates sequenced in study
| Strain | Species | Strain ID | Source | Location | Ploidy |
|---|---|---|---|---|---|
| Opol1 | O. polymorpha | CBS4732/Y-5445/ATCC 34438 | Soil | Brazil | Haploid |
| Opol2 | O. polymorpha | CBS1976/NRRL Y-1798/ATCC 14754/NCYC495 | Spoiled Florida orange juice | USA | Haploid |
| Opol3 | O. polymorpha | Phaff 72-225 | Glutinous/nonglutinous rice | USA | Haploid |
| Opol4 | O. polymorpha | NRRL Y-2423 | Swine intestinal tract | Portugal | Haploid |
| Opol5 | O. polymorpha | CBS8852/NRRL Y-27293. | Knee replacement | Worcester, MA, USA | Haploid |
| Opol6 | O. polymorpha | NRRL Y-27863/ATCC MYAA-3665 | Patient's blood, catheter infection | Chicago, IL, USA | Haploid |
| Opol7 | O. polymorpha | NRRL Y-6005 | Waste liquid from olive processing | Spain | Haploid |
| Opol8 | O. polymorpha | NRRL YB-179 | Soil | Costa Rica | Haploid |
| Opol9 | O. polymorpha | CBS5032 | Maize meal | South Africa | Haploid |
| Opol10 | O. polymorpha | CBS7031 | Soil | Unknown | Haploid |
| Opol11 | O. polymorpha | CBS7239 | Catalase-negative mutant of CBS4732 (PMID 7000025) | Germany | Haploid |
| Opar1 | O. parapolymorpha | CBS12304/NRRL YB-1982 | Insect frass, quaking aspen | Duluth, MN, USA | Haploid |
| Opar2 | O. parapolymorpha | Phaff 73-26 | Soil | MA, USA | Haploid |
| Opar3 × Opol | Hybrid (O. parapolymorpha × O. polymorpha) | CBS1977 | Milk from cow with mastitis | UK | Diploid |
| Opar4 | O. parapolymorpha | CBS11895/NRRL Y-7560/ATCC 26012 | Soil | Cambridge, MA, USA | Haploid |
| Oang1 | O. angusta | Phaff 50-165/NRRL Y-2217 | | Jacksonville, CA, USA | Haploid |
| Oang2 | O. angusta | Phaff 50-97/NRRL Y-2212 | | Keen Camp, CA, USA | Haploid |
| Oang3 | O. angusta | Phaff 51-138 | | Mather, CA, USA | Haploid |
| Oang4 | O. angusta | Phaff 51-177 | | Mather, CA, USA | Haploid |
| Oang5 | O. angusta | Phaff 52-251 | | Mather, CA, USA | Haploid |
| Oang6 | O. angusta | Phaff 60-394/ATCC 24190 | | Winters, CA, USA | Haploid |
| Oang7 | O. angusta | Phaff 61-224 | | Gualala River, CA, USA | Haploid |
| Oang8 | O. angusta | Phaff 61-235 | | Gualala River, CA, USA | Haploid |
| Oang9 | O. angusta | Phaff 61-244 | | Gualala River, CA, USA | Haploid |
| Oang10 | O. angusta | CBS2575/NCYC1450 | | USA | Haploid |
| Ohag1 | O. haglerorum | Phaff 78-557.3 | | Hemmant, Queensland, AU | Haploid |
| Ohag2 | O. haglerorum | Phaff 79-204.41 | | Hemmant, Queensland, AU | Haploid |
| Ohag3 | O. haglerorum | Phaff 81-408.1 | | Saguaro Natl. Monument West, AZ, USA | Haploid |
| Ohag4 | O. haglerorum | Phaff 81-410 | | Saguaro Natl. Monument West, AZ, USA | Haploid |
| Ohag5 | O. haglerorum | Phaff 81-419.3 | | Bear Canyon, Tucson, AZ, USA | Haploid |
| Ohag6 | O. haglerorum | Phaff 81-419.5 | | Bear Canyon, Tucson, AZ, USA | Haploid |
| Ohag7 | O. haglerorum | Phaff 81-433.4 | | Santa Rita Mountains, Tucson, AZ, USA | Haploid |
| Ohag8 | O. haglerorum | Phaff 81-436.3 | | Santa Rita Mountains, Tucson, AZ, USA | Haploid |
| Ohag9 | O. haglerorum | Phaff 81-440.2 | | Santa Rita Mountains, Tucson, AZ, USA | Haploid |
| Ohag10 | O. haglerorum | Phaff 81-453.3 | | Near Sells, AZ, USA | Haploid |
| Ohag11 | O. haglerorum | Phaff 81-461.3 | | Near Sells, AZ, USA | Haploid |
| Ohag12 | O. haglerorum | Phaff 81-463.1 | | Near Sells, AZ, USA | Haploid |
| Ohag13 | O. haglerorum | Phaff 81-471.3 | | Rincon Mountains, AZ, USA | Haploid |
| Ohag14 | O. haglerorum | Phaff 81-480 | | Rincon Mountains, AZ, USA | Haploid |
| Ohag15 | O. haglerorum | Phaff 83-405.1 | | Tucson Mountains, AZ, USA | Haploid |
| Ohag16 | O. haglerorum | Phaff 83-425.4 | | Tucson, AZ, USA | Haploid |
| Ohag17 | O. haglerorum | Phaff 83-437.2.1 | | Santa Rita Mountains, Tucson, AZ, USA | Haploid |
| Ohag18 | O. haglerorum | Phaff 83-437.2.2 | | Santa Rita Mountains, Tucson, AZ, USA | Haploid |
| Ohag19 | O. haglerorum | Phaff 83-442.1 | | AZ, USA | Haploid |
| Ohag20 | O. haglerorum | Phaff 83-471.3 | | Santa Catalina Mountains, AZ, USA | Haploid |
| Ohag21 | O. haglerorum | Phaff 83-474.2 | | Pima Canyon, Tucson, AZ, USA | Haploid |
| Ohag22 | O. haglerorum | Phaff 83-476.5 | | Pima Canyon, Tucson, AZ, USA | Haploid |
Type strain.
Reference strain for variant analysis.
Information provided in culture collection database.
Ogataea genome assembly and annotation statistics
| Strain | Strain ID | Genome length (Mb) | N50 (kb) | # Contigs | % GC | tRNA Genes | Protein-coding genes |
|---|---|---|---|---|---|---|---|
| Opol1 | CBS4732/Y-5445/ATCC 34438 | 8.93 | 556.0 | 67 | 47.7 | 97 | 5,442 |
| Opol2 | CBS1976/NRRL Y-1798/ATCC 14754/NCYC495 | 8.93 | 556.4 | 64 | 47.7 | 96 | 5,446 |
| Opol3 | Phaff 72-225 | 8.95 | 788.8 | 45 | 47.7 | 97 | 5,444 |
| Opol4 | NRRL Y-2423 | 8.92 | 608.6 | 79 | 47.7 | 98 | 5,436 |
| Opol5 | CBS8852/NRRL Y-27293. | 8.95 | 636.5 | 64 | 47.7 | 99 | 5,451 |
| Opol6 | NRRL Y-27863/ATCC MYAA-3665 | 8.97 | 616.1 | 51 | 47.7 | 97 | 5,454 |
| Opol7 | NRRL Y-6005 | 8.91 | 631.1 | 51 | 47.7 | 96 | 5,431 |
| Opol8 | NRRL YB-179 | 8.93 | 552.0 | 85 | 47.7 | 99 | 5,440 |
| Opol9 | CBS5032 | 8.9 | 626.9 | 52 | 47.8 | 99 | 5,417 |
| Opol10 | CBS7031 | 8.97 | 636.4 | 79 | 47.7 | 99 | 5,455 |
| Opol11 | CBS7239 | 8.94 | 516.4 | 67 | 47.7 | 97 | 5,442 |
| Opar1 | CBS12304/NRRL YB-1982 | 8.87 | 557.1 | 55 | 47.7 | 97 | 5,417 |
| Opar2 | Phaff 73-26 | 8.92 | 263.7 | 112 | 47.8 | 99 | 5,456 |
| Opar3 × Opol | CBS1977 | 14.88 | 30.1 | 1521 | 47.9 | 155 | 9,920 |
| Opar4 | CBS11895/NRRL Y-7560/ATCC 26012 | 8.92 | 618.1 | 87 | 47.8 | 97 | 5,424 |
| Oang1 | Phaff 50-165/NRRL Y-2217 | 8.88 | 848.7 | 54 | 49.5 | 97 | 5,409 |
| Oang2 | Phaff 50-97/NRRL Y-2212 | 8.88 | 655.9 | 73 | 49.4 | 97 | 5,437 |
| Oang3 | Phaff 51-138 | 8.89 | 553.9 | 127 | 49.5 | 100 | 5,430 |
| Oang4 | Phaff 51-177 | 8.89 | 743.6 | 96 | 49.5 | 97 | 5,443 |
| Oang5 | Phaff 52-251 | 8.89 | 651.6 | 108 | 49.5 | 97 | 5,437 |
| Oang6 | Phaff 60-394/ATCC 24190 | 8.88 | 856.3 | 57 | 49.5 | 97 | 5,419 |
| Oang7 | Phaff 61-224 | 8.91 | 651.8 | 154 | 49.4 | 99 | 5,453 |
| Oang8 | Phaff 61-235 | 8.9 | 557.2 | 107 | 49.5 | 97 | 5,452 |
| Oang9 | Phaff 61-244 | 8.91 | 850.2 | 101 | 49.4 | 97 | 5,446 |
| Oang10 | CBS2575/NCYC1450 | 8.91 | 787.0 | 109 | 49.5 | 97 | 5,441 |
| Ohag1 | Phaff 78-557.3 | 8.85 | 583.5 | 50 | 49.4 | 97 | 5,390 |
| Ohag2 | Phaff 79-204.41 | 8.85 | 555.7 | 49 | 49.4 | 98 | 5,392 |
| Ohag3 | Phaff 81-408.1 | 8.87 | 556.8 | 68 | 49.4 | 99 | 5,393 |
| Ohag4 | Phaff 81-410 | 8.86 | 416.0 | 73 | 49.4 | 97 | 5,412 |
| Ohag5 | Phaff 81-419.3 | 8.87 | 465.9 | 109 | 49.4 | 103 | 5,419 |
| Ohag6 | Phaff 81-419.5 | 8.86 | 632.4 | 63 | 49.4 | 100 | 5,415 |
| Ohag7 | Phaff 81-433.4 | 8.86 | 583.9 | 65 | 49.4 | 100 | 5,401 |
| Ohag8 | Phaff 81-436.3 | 8.87 | 556.4 | 71 | 49.4 | 97 | 5,407 |
| Ohag9 | Phaff 81-440.2 | 8.87 | 555.9 | 75 | 49.4 | 97 | 5,404 |
| Ohag10 | Phaff 81-453.3 | 8.88 | 556.7 | 73 | 49.4 | 99 | 5,411 |
| Ohag11 | Phaff 81-461.3 | 8.85 | 579.9 | 52 | 49.4 | 99 | 5,395 |
| Ohag12 | Phaff 81-463.1 | 8.86 | 584.1 | 62 | 49.4 | 97 | 5,404 |
| Ohag13 | Phaff 81-471.3 | 8.86 | 556.9 | 64 | 49.4 | 97 | 5,413 |
| Ohag14 | Phaff 81-480 | 8.86 | 437.7 | 66 | 49.4 | 97 | 5,423 |
| Ohag15 | Phaff 83-405.1 | 8.86 | 466.2 | 66 | 49.4 | 97 | 5,408 |
| Ohag16 | Phaff 83-425.4 | 8.87 | 583.5 | 74 | 49.4 | 99 | 5,422 |
| Ohag17 | Phaff 83-437.2.1 | 8.86 | 466.2 | 56 | 49.4 | 98 | 5,413 |
| Ohag18 | Phaff 83-437.2.2 | 8.86 | 497.3 | 57 | 49.4 | 98 | 5,401 |
| Ohag19 | Phaff 83-442.1 | 8.86 | 584.3 | 59 | 49.4 | 97 | 5,402 |
| Ohag20 | Phaff 83-471.3 | 8.86 | 556.4 | 65 | 49.4 | 97 | 5,412 |
| Ohag21 | Phaff 83-474.2 | 8.86 | 582.9 | 71 | 49.4 | 99 | 5,402 |
| Ohag22 | Phaff 83-476.5 | 8.89 | 632.2 | 89 | 49.4 | 101 | 5,406 |
Type strain.
Reference strain for variant analysis.
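For readers unfamiliar with the N50 column above, the following is a minimal sketch of the standard N50 definition: the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The contig lengths used here are toy values chosen for illustration, not data from the study, and the function name `n50` is hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the summed
    assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in kb (illustrative only, not from the study).
toy_contigs_kb = [800, 600, 400, 200, 100, 50]
print(n50(toy_contigs_kb))
```

A higher N50 with fewer contigs generally indicates a more contiguous assembly, which is why the hybrid isolate (1521 contigs, N50 of 30.1 kb) stands out in the table.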
Figure 1. Relationship of the O. polymorpha species complex to other Ogataea species. Supermatrix phylogeny of 24 Ogataea species derived from 1278 BUSCO families, giving an alignment of 319,116 amino acids in length. P. kudriavzevii is included as an outgroup. The maximum likelihood phylogeny was reconstructed with IQ-TREE implementing the JTT+F+R5 model. Bootstrap support values are indicated at all nodes.
Summary of genetic variation in Ogataea
| Strain | Total SNPs | Total indels | SNPs per kb | Indels per kb |
|---|---|---|---|---|
| O. polymorpha | | | | |
| Opol1 | 26,824 | 1,275 | 3.00 | 0.14 |
| Opol2 | 9,609 | 661 | 1.08 | 0.07 |
| Opol3 | 42,399 | 1,820 | 4.74 | 0.20 |
| Opol4 | 31,882 | 1,425 | 3.57 | 0.16 |
| Opol5 | 35,033 | 1,578 | 3.91 | 0.18 |
| Opol6 | 38,878 | 1,665 | 4.34 | 0.19 |
| Opol7 | 42,586 | 1,881 | 4.78 | 0.21 |
| Opol8 | 36,772 | 1,707 | 4.12 | 0.19 |
| Opol9 | 33,877 | 1,546 | 3.80 | 0.17 |
| Opol10 | 49,307 | 2,049 | 5.50 | 0.23 |
| Opol11 | 27,073 | 1,219 | 3.03 | 0.14 |
| O. parapolymorpha | | | | |
| Opar1 | 113,030 | 3,221 | 12.74 | 0.36 |
| Opar2 | 207 | 569 | 0.02 | 0.06 |
| Opar4 | 197 | 558 | 0.02 | 0.06 |
| O. angusta | | | | |
| Oang1 | 52,192 | 1,991 | 5.88 | 0.22 |
| Oang2 | 47,937 | 1,868 | 5.40 | 0.21 |
| Oang3 | 48,832 | 1,789 | 5.49 | 0.20 |
| Oang4 | 48,960 | 1,788 | 5.51 | 0.20 |
| Oang5 | 48,844 | 1,795 | 5.49 | 0.20 |
| Oang6 | 47,059 | 1,742 | 5.30 | 0.20 |
| Oang7 | 48,613 | 1,778 | 5.46 | 0.20 |
| Oang8 | 47,735 | 1,642 | 5.36 | 0.18 |
| Oang9 | n/a | n/a | n/a | n/a |
| Oang10 | 49,183 | 1,766 | 5.52 | 0.20 |
| O. haglerorum | | | | |
| Ohag1 | 19,971 | 979 | 2.26 | 0.11 |
| Ohag2 | 19,938 | 931 | 2.25 | 0.11 |
| Ohag3 | 20,581 | 1,055 | 2.32 | 0.12 |
| Ohag4 | 20,508 | 974 | 2.31 | 0.11 |
| Ohag5 | 20,238 | 1,009 | 2.28 | 0.11 |
| Ohag6 | 20,189 | 1,001 | 2.28 | 0.11 |
| Ohag7 | 19,665 | 989 | 2.22 | 0.11 |
| Ohag8 | 20,899 | 1,026 | 2.36 | 0.12 |
| Ohag9 | 20,725 | 1,059 | 2.34 | 0.12 |
| Ohag10 | n/a | n/a | n/a | n/a |
| Ohag11 | 15,424 | 821 | 1.74 | 0.09 |
| Ohag12 | 19,523 | 974 | 2.20 | 0.11 |
| Ohag13 | 20,997 | 1,053 | 2.37 | 0.12 |
| Ohag14 | 20,793 | 1,003 | 2.35 | 0.11 |
| Ohag15 | 20,447 | 1,002 | 2.31 | 0.11 |
| Ohag16 | 20,234 | 1,015 | 2.28 | 0.11 |
| Ohag17 | 19,953 | 943 | 2.25 | 0.11 |
| Ohag18 | 19,913 | 934 | 2.25 | 0.11 |
| Ohag19 | 20,699 | 1,043 | 2.34 | 0.12 |
| Ohag20 | 20,788 | 1,078 | 2.35 | 0.12 |
| Ohag21 | 20,663 | 1,047 | 2.33 | 0.12 |
| Ohag22 | 19,534 | 969 | 2.21 | 0.11 |
Type strain.
Reference strain for variant analysis.
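The per-kb columns in the table above appear to be the total counts divided by each isolate's assembly length from the assembly statistics table. The source does not state the denominator explicitly, so this is an inference from the numbers. A minimal sketch that reproduces three rows under that assumption (the function name `per_kb` is hypothetical):

```python
def per_kb(count, genome_length_mb):
    """Variants per kb, assuming the denominator is the assembly length in Mb."""
    return round(count / (genome_length_mb * 1000), 2)


# (SNPs, indels, genome length in Mb) copied from the two tables.
isolates = {
    "Opol1": (26824, 1275, 8.93),
    "Opar1": (113030, 3221, 8.87),
    "Ohag11": (15424, 821, 8.85),
}

for name, (snps, indels, length_mb) in isolates.items():
    print(name, per_kb(snps, length_mb), per_kb(indels, length_mb))
```

Under this assumption the computed values agree with the tabulated ones for these rows (for example, 12.74 SNPs per kb and 0.36 indels per kb for Opar1).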
Figure 2. Population structure of the O. polymorpha species complex. Maximum likelihood phylogenies created using SNP alignments for (A) O. polymorpha, (B) O. angusta, and (C) O. haglerorum isolates. Bootstrap support was 100% except where indicated below the branch, and branch lengths are given above each branch. Geographic information for isolates is indicated using colored boxes. (D) Supermatrix phylogeny of 48 Ogataea isolates generated using 1,148 BUSCO families.
Figure 3. Structural rearrangements in O. polymorpha. Chromosomal breakpoints identified in O. polymorpha isolate Opol9 (CBS5032) on (A) NODE_10, (B) NODE_5, and (C) NODE_2, and in O. polymorpha isolate Opol4 (NRRL Y-2423) on (D) NODE_14 are detailed. Chromosomes are numbered based on the O. polymorpha NCYC495 genome assembly (shown at the bottom in each panel) and color-coding of genes corresponds to their locations in the NCYC495 genome. White circles indicate the location of centromeres and white boxes indicate the location of a genomic repeat sequence that is found on four chromosomes in the NCYC495 genome.
Figure 4. Structural rearrangements in O. haglerorum. Chromosomal breakpoints identified in O. haglerorum isolates (A) Ohag3 (81-408-1) on NODE_5, (B) Ohag10 (81-453-3) on NODE_15 and Ohag11 (81-461-3) on NODE_16, (C) Ohag21 (83-474-2) on NODE_11, (D) Ohag17 (83-437-2-1) on NODE_14, and (E) Ohag9 (81-440-2) on NODE_10 are detailed. Chromosomes are numbered based on the O. polymorpha NCYC495 genome assembly and color-coding of genes corresponds to their locations in the NCYC495 genome. White circles indicate the location of centromeres and white boxes indicate the location of a genomic repeat sequence that is found on four chromosomes in the NCYC495 genome.
Figure 5. Structural rearrangements in O. angusta. Chromosomal breakpoints identified in O. angusta isolate Oang4 (51-177) on (A) NODE_1, (B) NODE_5, (C) NODE_2, and (D) NODE_6 are detailed. Chromosomes are numbered based on the O. polymorpha NCYC495 genome assembly and color-coding of genes corresponds to their locations in the NCYC495 genome. White circles indicate the location of centromeres and white boxes indicate the location of a genomic repeat sequence that is found on four chromosomes in the NCYC495 genome.
Figure 6. Genome-wide genetic diversity in Ogataea species. Plots show density of SNPs (SNPs/kb) and Tajima’s D calculated in 10 kb windows across the genome for all isolates of (A) O. polymorpha, (B) O. haglerorum, and (C) O. angusta. Schematics below each set of plots indicate chromosomes, with the positions of centromeres indicated by purple circles and the MAT region indicated by orange boxes. O. haglerorum and O. angusta contigs greater than 50 kb in length were ordered according to their alignment with the O. polymorpha genome, and contig break locations in reference genomes (Oang9 and Ohag10) are indicated by dashed gray lines.
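The two statistics plotted in Figure 6 can be sketched as follows. This is an illustrative implementation of windowed SNP density and of the standard Tajima (1989) D statistic, not the authors' pipeline, which the captions do not describe. The function names, the 0-based position convention, and the encoding of haploid genotypes as strings of '0' and '1' are all assumptions made for the example.

```python
import math
from itertools import combinations


def snps_per_kb_by_window(snp_positions, seq_length, window=10_000):
    """SNP density per window (0-based positions; nominal window size used
    as the denominator, so a shorter final window is slightly underestimated)."""
    counts = [0] * math.ceil(seq_length / window)
    for pos in snp_positions:
        counts[pos // window] += 1
    return [c / (window / 1000) for c in counts]


def tajimas_d(haplotypes):
    """Tajima's D for one window.

    haplotypes: one string per haploid isolate, one character per site.
    Returns None when D is undefined (fewer than 4 samples, for which the
    variance term is zero, or no segregating sites).
    """
    n = len(haplotypes)
    if n < 4:
        return None
    n_sites = len(haplotypes[0])
    # Number of segregating sites S.
    seg = sum(1 for j in range(n_sites) if len({h[j] for h in haplotypes}) > 1)
    if seg == 0:
        return None
    # Mean number of pairwise differences (pi).
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - seg / a1) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))


# Toy data: an excess of rare variants gives D < 0,
# an excess of intermediate-frequency variants gives D > 0.
print(tajimas_d(["1000", "0100", "0010", "0001"]))
print(tajimas_d(["1111", "1111", "0000", "0000"]))
```

In practice each 10 kb window would be evaluated separately, using only the sites that fall inside that window.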
Figure 7. Genetic diversity in the Ogataea MAT region. (A) Schematic of 19 kb MAT region content, drawn to scale. The genes specifying mating-type a are shown in green, and those specifying mating-type α are shown in pink. The gene HPODL_4020 (shown in gray) is a pseudogene in O. haglerorum. Plots show density of SNPs (SNPs/kb) and Tajima’s D calculated in 1 kb windows across the MAT region, and 100 kb upstream and downstream for (B) O. polymorpha, (C) O. angusta, and (D) O. haglerorum. Gray dashed lines indicate contig breaks. Schematic at the bottom shows the location of the centromere (purple), the MAT region (orange), and the inverted repeat sequences (blue).
Figure 8. SNP density at genomic features in Ogataea. Box and whisker plots show the SNPs/kb at telomeres (within 50 kb of terminal contig ends in genome assemblies), centromeres, at the centromere of chromosome 3, the mating-type locus, and genome-wide for O. polymorpha, O. angusta, and O. haglerorum.
Figure 9. Inferred genome structure for interspecies diploid hybrid CBS1977. Nucleotide identity for the hybrid genome was determined by BLAST analysis of 1 kb sliding windows across the CBS1977 genome assembly against the O. polymorpha NCYC 495 and O. parapolymorpha DL-1 reference genome sequences. Regions that most closely match the O. polymorpha and O. parapolymorpha parental genomes are indicated in blue and orange, respectively. The right telomere of chromosome 4 could not be assigned due to high sequence identity to both parental genomes and is indicated in gray. Centromeric regions are denoted by white circles, ∼1 kb genomic repeat sequences found on NCYC 495 chromosomes 1, 2, 6, and 7 are denoted by a black line, MATa and MATα loci on chromosome 3 are denoted by green and pink boxes, respectively, and the ribosomal DNA locus on chromosome 7 is denoted by yellow boxes. Regions of the genome that contained more than one contig in either the MinION or Illumina assemblies that matched the same parental genome are indicated below the chromosome, and the names of the contigs are indicated.
Summary of homozygous and heterozygous composition for the interspecies diploid hybrid isolate CBS1977
| Chromosome | BLAST hit length (kb) | MinION heterozygous (kb) | Illumina uniquely heterozygous (kb) | Combined heterozygous (kb) | Homozygous Opol parent (kb) | Homozygous Opar parent (kb) | % LOH |
|---|---|---|---|---|---|---|---|
| 1 | 1507 | 153 | 573 | 2 | 369 | 410 | 51.69 |
| 2 | 1565 | 756 | 276 | 0 | 0 | 536 | 34.25 |
| 3 | 1339 | 441 | 106 | 9 | 380 | 404 | 58.55 |
| 4 | 1243 | 493 | 720 | 0 | 8 | 22 | 2.41 |
| 5 | 1263 | 555 | 519 | 2 | 188 | 2 | 15.04 |
| 6 | 981 | 232 | 195 | 8 | 37 | 509 | 55.66 |
| 7 | 985 | 482 | 426 | 0 | 42 | 33 | 7.63 |
| Total | 8,883 | 3,112 | 2,815 | 21 | 1,024 | 1,916 | 33.08 |
Total length of heterozygous regions supported by MinION assembly.
Total length of heterozygous regions supported only by Illumina assembly (homozygous in MinION assembly).
Total length of heterozygous regions supported by one Illumina contig and one MinION contig or scaffold.
Total length of homozygous regions that have higher sequence identity to Opol parental genome.
Total length of homozygous regions that have higher sequence identity to Opar parental genome.
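The % LOH column in the table above appears to be the combined length of the two homozygous categories as a percentage of the BLAST hit length. The source does not give the formula, so the sketch below is an inference from the numbers, and the function name `percent_loh` is hypothetical.

```python
def percent_loh(hom_opol_kb, hom_opar_kb, total_kb):
    """Percentage of a chromosome that is homozygous for either parent,
    assuming the BLAST hit length is the denominator."""
    return round(100 * (hom_opol_kb + hom_opar_kb) / total_kb, 2)


# (BLAST hit length, homozygous Opol, homozygous Opar) in kb, from the table.
rows = {
    "1": (1507, 369, 410),
    "4": (1243, 8, 22),
    "6": (981, 37, 509),
}

for chrom, (total, opol, opar) in rows.items():
    print(chrom, percent_loh(opol, opar, total))
```

This reproduces the tabulated values for chromosomes 1 to 6. For chromosome 7 and the Total row it gives 7.61 and 33.10 rather than the tabulated 7.63 and 33.08; those two tabulated values match instead when the denominator is the sum of the five category columns (983 kb and 8,888 kb), so the exact denominator used in the original analysis remains uncertain.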