Literature DB >> 35199143

Population genetics and microevolution of clinical Candida glabrata reveals recombinant sequence types and hyper-variation within mitochondrial genomes, virulence genes, and drug targets.

Nicolas Helmstetter¹, Aleksandra D Chybowska², Christopher Delaney³, Alessandra Da Silva Dantas¹, Hugh Gifford¹, Theresa Wacker¹, Carol Munro², Adilia Warris¹, Brian Jones⁴, Christina A Cuomo⁵, Duncan Wilson¹, Gordon Ramage³, Rhys A Farrer^1,5.

Abstract

Candida glabrata is the second most common etiological cause of worldwide systemic candidiasis in adult patients. Genome analysis of 68 isolates from 8 hospitals across Scotland, together with 83 global isolates, revealed insights into the population genetics and evolution of C. glabrata. Clinical isolates of C. glabrata from across Scotland are highly genetically diverse, including at least 19 separate sequence types that have been recovered previously in globally diverse locations, and 1 newly discovered sequence type. Several sequence types had evidence for ancestral recombination, suggesting transmission between distinct geographical regions has coincided with genetic exchange arising in new clades. Three isolates were missing MATα1, potentially representing a second mating type. Signatures of positive selection were identified in every sequence type including enrichment for epithelial adhesins thought to facilitate fungal adhesin to human epithelial cells. In patent microevolution was identified from 7 sets of recurrent cases of candidiasis, revealing an enrichment for nonsynonymous and frameshift indels in cell surface proteins. Microevolution within patients also affected epithelial adhesins genes, and several genes involved in drug resistance including the ergosterol synthesis gene ERG4 and the echinocandin target FKS1/2, the latter coinciding with a marked drop in fluconazole minimum inhibitory concentration. In addition to nuclear genome diversity, the C. glabrata mitochondrial genome was particularly diverse, with reduced conserved sequence and conserved protein-encoding genes in all nonreference ST15 isolates. Together, this study highlights the genetic diversity within the C. glabrata population that may impact virulence and drug resistance, and 2 major mechanisms generating this diversity: microevolution and genetic exchange/recombination.

Entities: Chemical

Keywords: zzm321990 Candida glabratazzm321990 ; candidiasis; drug-resistance; epidemiology; evolution; genome sequencing; microevolution; mitochondria

Mesh：

Substances：
Antifungal Agents

Year: 2022 PMID： 35199143 PMCID： PMC9071574 DOI： 10.1093/genetics/iyac031

Source DB: PubMed Journal: Genetics ISSN： 0016-6731 Impact factor: 4.402

Introduction

Candida is the most prominent genus of the Debaryomycetaceae family, with over 400 genetically and phenotypically diverse species currently described (Daniel ). Many of these species are harmless commensals of the mucous membranes and digestive tracts of healthy individuals. Approximately 30 Candida species are of clinical importance in humans. Most of these species that are capable of causing disease in humans belong to the CTG-Serine clade, including Candida albicans, Candida dubliniensis, Candida tropicalis, Candida parapsilosis, Candida lusitaniae, Candida guilliermondii, and Candida auris, while others such as Candida glabrata and Candida bracarensis belong to the genetically distant Nakaseomyces clade (Daniel ; Turner and Butler 2014). In adult patients, C. glabrata is the second most commonly isolated species after C. albicans, which together cause approximately 3 quarters of all systemic candidiasis (Perlroth ; Rajendran ). Infections caused by these species range from mild vulvovaginal candidiasis (VVC or thrush) to severe, drug resistant and difficult to treat invasive infections affecting single organs or the blood stream (candidemia) with or without dissemination to the heart, brain, kidneys, and other parts of the body (Mullick ). Bloodstream infections caused by Candida spp. are associated with mortality rates of 30–60% (Hirano ; Lamoth ). Candidemia is associated with diverse risk factors including neutropenia, chemotherapy, diabetes, old age, compromised immune function, prolonged antibiotic and steroid treatment, and intravenous catheters that can harbor fungal biofilms (Yapar 2014). Pathogenic Candida species including C. glabrata have also exhibited alarming increases in resistance against all major classes of antifungal drugs, hindering effective treatments and resulting in increasing mortality rates (Cowen ; Odds ; Pfaller ; Rajendran ; Barber ). Candida glabrata typically grows in the yeast form and is considered to have evolved an infection strategy based on stealth and evasion without causing severe damage in murine models (Brunke and Hube 2013). This ability of C. glabrata and some of its relatives in the Nakaseomyces clade to infect humans is thought to have evolved recently (Gabaldón ), as several of its closest relatives have to date been exclusively isolated environmentally (Caenorhabditis castellii, Nakaseomyces baccilisporus, and Nakaseomyces delphensis) (Gabaldón ). Pathogenicity in the Nakaseomyces correlates with the number of epithelial adhesins (EPA) encoded in their genomes, which facilitate adherence and colonization of human epithelial cells (Cormack ). In contrast to C. albicans, the pathogenicity of Nakaseomyces species does not coincide with number or presence of Phospholipase-B and Superoxide Dismutase genes (Ibrahim ; Martchenko ; Cafarchia ; Bink ). Many Candida genes involved in virulence are therefore likely to have diverse functions, some of which may not be conserved between distant clades. Like many fungal pathogens, C. glabrata’s niche(s) and life cycle are poorly understood. Candida glabrata is increasingly identified among clinical samples where it is responsible for an increasing proportion of cases of candidemia (Beck-Sagué and Jarvis 1993; Rajendran ). Candida glabrata has also been identified environmentally, including as a component of the mycobiota of yellow-legged gulls (Al-Yasiri ), in droppings and cloaca swabs of birds of prey, migratory birds and passeriformes (Cafarchia ), and other potentially transitory sources including spontaneously fermenting coffee beans (de Melo Pereira ). Concerted efforts for sampling are required to determine the true ecological distribution of C. glabrata, as they are for several other important Candida species such as C. auris (Arora ). Furthermore, the relatedness of global isolates and their routes of transmission (either patient to patient, or between patient and the environment) requires further studies comparing genotypes to collected metadata including the location of isolation. High levels of genetic heterogeneity have been identified in the C. glabrata population, as molecular methods have identified diverse strains, clades and sequence types (STs) both inter- and intra-nationally (Biswas ; Carreté ). As of January 2022, the multi locus sequence type (MLST) database for C. glabrata included 233 STs from 1,414 isolates from 29 countries, based on the sequence identity for 6 genetic loci (Gabaldón ). All isolates of C. glabrata reported to date have been haploid, with occasional aneuploids e.g. transitory disomies of chromosomes E and G (Carreté ). Genetic heterogeneity within many fungal populations is shaped by their ability to switch between clonal and sexual recombination (Drenth ). The ability for C. glabrata to undergo a sexual cycle remains unknown, with all reported attempts in the laboratory to encourage mating thus far unsuccessful. Candida glabrata has therefore been regarded as an asexual species, despite the presence of well-conserved mating loci (Wong ; Gabaldón and Fairhead 2019) and 14 examples of phylogenetic incompatibilities from multi locus sequencing (Dodgson ). More recent genomic analysis from 34 globally isolated C. glabrata strains revealed evidence of population admixture, suggesting a thus far undiscovered sexual cycle (Carreté ). Greater sampling efforts and genomic analyses are therefore required to fully explore signatures of adaptation, virulence, and recombination. In this study, we explore the population genetics and microevolution of C. glabrata using comparative genomic analysis of 68 clinical C. glabrata isolates from 8 hospitals across Scotland, combined with 83 publicly available and globally isolated genomes, finding evidence of recombinant STs, hypervariable mitochondrial genomes, as well as variation in virulence genes and drug targets between STs and between serial isolates from prolonged or recurrent infection.

Materials and methods

Library preparation, sequencing, and antifungal tests

Candida glabrata was collected from blood in 2012 from 8 hospitals in Scotland (Supplementary Table 1). These isolates were collected as part of a retrospective study of all cases of Candida blood stream infections carried out within Scotland under NHS Caldicott Guardian approval from March 2012 to February 2013, as described previously (Rajendran , 2016b). Genomic and mitochondrial DNA was extracted from 68 isolates using the QIAamp DNA mini extraction kit (Qiagen) according to the manufacturer’s instructions. A small modification was made prior to extraction which was to mechanical disrupt the yeast. This was achieved by bead-beating the cells with sterile acid-washed 0.5 mm diameter glass beads (Thistle Scientific) for 3 × 30 s. Following isolation and extraction using the QIAamp columns the DNA was eluted into elution buffer before storage at −20°C and transport to the sequencing facility. Library preparation was performed by the Centre for Genome-Enabled Biology and Medicine at the University of Aberdeen. Briefly, gDNA quality was assessed on a Tapestation 4200 with a high sensitivity genomic DNA tape (Agilent) and quantified by fluorimetry using Quant-IT dsDNA high-sensitivity (HS) assay (Thermo Fisher). Dual indexed Illumina libraries were prepared from 1 ng purified gDNA using an Illumina Nextera XT DNA library preparation kit and Nextera XT v2 indices, which were purified from free adapters using AMPure XP beads (Beckman Coulter). Libraries were quantified using Quant-IT dsDNA HS assay (Thermo Fisher) and average fragment size was calculated on Tapestation 4200 (Agilent), then equimolar pooled at 10 nM. Concentration of the pool was verified by qPCR (Kapa library quantification kit, Roche) on QuantStudio 6 using SYBR green, and 1.8 pM of the library pool was sequenced on an Illumina NextSeq500 with 150 bp paired-end reads and 8 bp index reads to average alignment depths of 41.9X (Supplementary Table 2). This data was supplemented with paired-end illumina reads from Carreté , and isolate CBS138 from Xu . Minimum inhibitory concentration (MIC) tests for fluconazole were performed at the Mycology Reference Laboratory, Public Health England, Bristol, according to standard Clinical and Laboratory Standards Institute (CLSI) broth microdilution M27 guidelines (Alexander and Clinical and Laboratory Standards Institute 2017).

Variant calling

The Genome Analysis Toolkit (GATK) v.4.1.2.0 (McKenna ) was used to call variants. Our workflow description language (WDL) scripts were executed by Cromwell workflow execution engine v.48 (Voss ). Briefly, raw sequences were pre-processed by mapping reads to the reference genome C. glabrata CBS138 using BWA-MEM v.0.7.17 (Li 2013). Next, duplicates were marked, and the resulting file was sorted by coordinate order. Intervals were created using a custom bash script allowing parallel analysis of large batches of genomics data. Using the scatter-gather approach, HaplotypeCaller was executed in GVCF mode with the haploid ploidy flag. Variants were imported to GATK 4 GenomicsDB and hard filtered if QualByDepth (QD) < 2.0, FisherStrand (FS) > 60.0, root mean square mapping quality (MQ) < 40.0, genotype quality (GQ) ≥50, allele depth (AD) ≥ 0.8, or Coverage (DP) ≥ 10. To identify aneuploid chromosomes, depth of coverage was calculated for each of 206 fungal samples. Sorted BAM files prepared in the pre-processing phase of SNP calling were passed to Samtools v.1.2 (Li ) and mpileup files were generated. Read depth was normalized by total alignment depth and plotted against the location in the genome using 10 kb nonoverlapping sliding windows. To identify structural variation, assembly de novo was achieved using Spades v3.12 (Bankevich ) using default parameters.

Phylogenetic and population genetic analysis

To construct species-specific phylogenetic trees, all sites that were either a homozygous reference or SNP in every isolate were identified using ECATools (https://github.com/rhysf/ECATools) and concatenated into a FASTA file. Our rooted tree C. bracarensis included 1,198 phylogenetically informative sites, while the unrooted C. glabrata tree included 34,980 phylogenetically informative sites. Phylogenetic trees were constructed with RAxML PThreads v.7.7.8 (Stamatakis 2006) using the general-time-reversible model and CAT rate approximation with 100 bootstrap support, both with rooting to C. bracarensis AGP (Correia ) or midpoint rooting without C. bracarensis. We constructed a tree using the same models with 1000 bootstrap support for all Saccharomycetaceae species that had a genome assembly in NCBI or JGI Mycocosm. We also constructed neighbor-joining trees using PAUP v4.0b10 (Swofford n.d.) and a NeighborNet Network with SplitsTree v.4.15.1 (Huson 1998). Trees were visualized using FigTree v. 1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/). A multisample variant call format (VCF) corresponding to all 151 genomes was made with VCFTools v0.1.12 vcf-merge (Danecek ) and converted to ped and map file formats for use in PLINK v1.90 (Purcell ). VCFTools was used to calculate genetic diversity metric pi, using the –site-pi parameter. Unsupervised ADMIXTURE (Alexander ) (settings: –haploid=“*” –s time) was run on a moderately linkage disequilibrium (LD)-pruned alignment (PLINK –indep-pairwise 60 10 0.1) for values of K between 1 and 35. A value of K = 20 provided the lowest cross-validation error. Principle component analysis was performed using SmartPCA v4 (Abraham and Inouye 2014). Consensus gene sequences for each isolate were generated, and genes FKS2 (CAGL0K04037g) positions 240-828, LEU2 (CAGL0H03795g), NMT1 (CAGL0A04059g), TRP1 (CAGL0C04092g), UGP1 (CAGL0L01925g), and URA3 (CAGL0I03080g) were used to identify known MLST for each of the isolates, and confirm that isolate CG57 was an unknown MLST and registered as the new ST204 on PubMLST (https://pubmlst.org/organisms/candida-glabrata [last accessed January 2022]) (Jolley ). We applied Weir’s estimator (Hudson and Kaplan 1985) of Wright’s Fixation Index (F) according to the equations given in Multilocus 1.3 (Agapow and Burt 2001) using nonoverlapping sliding windows. The scripts have been made available online (https://github.com/rhysf/FSTwindows).

Selection and microevolutionary analysis

The direction and magnitude of natural selection for each ST were assessed by measuring the rates of nonsynonymous substitution (dN), synonymous substitution (dS), and omega (ω = dN/dS) using the yn00 program of PAML (Yang 2007), which implements the Yang and Nielsen method, taking into account codon bias (Yang and Nielsen 2000). Further GC corrections were not applied. The program was run on every gene in each isolate using the standard nuclear code translation table. To examine the functional significance of genes with ω > 1, we evaluated their Pfam domains and gene ontology (GO) terms for statistical enrichment (genes with ω > 1 vs, the remaining genes) using the 2-tailed Fisher exact test with Storey false discovery rate (FDR)-corrected P-values (q) of < 0.05. GO terms were acquired using Blast2Go v6.0.1 (Conesa , p. 2) using Blastp-fast to the NCBI BLAST nr-database (E-value < 1E-5). Genes of interest were defined including both FKS and 12 ERG pathway genes, as well as all genes listed in Table 1 of Weig , which included adhesins including EPA genes, aspartic proteases, phospholipases, cell wall biogenesis, structural wall proteins, regulatory, efflux pumps. This gene list was then screened for genes with either signature of positive selection or those undergoing microevolutionary changes (nonsynonymous and frameshift indels). GO-term and PFAM enrichment [2-tailed Fisher exact test with FDR-corrected P-values (q) of < 0.05] for genes with dN/dS (ω) > 1, and genes with either microevolutionary frameshifts or nonsynonymous mutations across the 7 sets of serial isolates. The relative proportion (Rel. prop) was calculated as (number of terms in set 1/number of terms in set 2) × (genes with any terms in set 2/genes with any terms in set 1).

Results

Recombinant sequence types in Scottish clinical samples

Clinical isolates of C. glabrata from across Scotland are highly genetically diverse. Using whole-genome sequencing, we analyzed the genomes for 68 isolates of C. glabrata from 47 separate patients across 8 Scottish hospitals, generating the largest panel of C. glabrata genome sequences to date. These 68 isolates belonged to 20 separate sequence-types (STs) of C. glabrata, which represent genetically related sub-populations based on alleles from 6 loci/genes. One isolate (CG57 from a single patient in Forth Valley Royal Hospital) belonged to a new ST that has not been previously identified anywhere else (ST204) (Fig. 1, Supplementary Tables 1 and 2). Variant calling using the diploid model of GATK found few examples of heterozygosity (<0.41 per kb for every isolate) suggesting all isolates were haploid (Supplementary Table 3). Our panel of C. glabrata isolates was supplemented with a further 83 genomes from 3 recent studies of global C. glabrata isolates (Biswas ; Carreté ; Xu ), as well as sequences from the outgroup C. bracarensis (Correia ), which has also been identified from clinical settings and is the closest known relative of C. glabrata (Gabaldón ).

Fig. 1.

Candida glabrata isolates were collected across 8 health boards across Scotland in 2012, belonging to 20 separate sequence types, including the newly described ST204. Duplicate isolates stemming from the same patient at different time points have been excluded. Phylogenetic analysis of our Scottish collection along with the worldwide C. glabrata isolates revealed high genetic diversity among the 29 separate STs represented by our combined panel (Fig. 2, Supplementary Fig. 1). Allelic diversity among C. glabrata isolates [mean nucleotide diversity (π) = 0.00665, σ = 0.047 was higher than previously reported [π = 0.00485 based on the Internal Transcribed Spacer (ITS) of 29 strains (Irinyi )]. Our WGS-based calculation of C. glabrata π was the highest of any species in the Saccharomycetaceae that had both an available genome assembly and a calculation of π (albeit those are based on ITS sequences and fewer strains than we had) (Irinyi ) (Fig. 3a). However, C. glabrata genetic diversity was typical among the Saccharomycotina (mean/ = 0.0055, median = 0.0039, standard deviaton/σ = 0.0055) (Fig. 3b). Nucleotide diversity within the population was present across the nuclear genome (Fig. 3c), with window length having some impact on the result [smaller window lengths (5 kb) resulted in higher average π in approximately half of the genome: chromosomes A–F, M, and H]. Most of the allelic diversity across the 151 C. glabrata isolates came from the nuclear genome (min. = 0.09 SNPs/kb, max. = 6.54 SNPs/kb, = 5.55 SNPs/kb) compared with the mitochondrial genome (min. = 0.05 SNPs/kb, max. = 3.64 SNPs/kb, = 1.21 SNPs/kb). Indeed, a significant difference between nuclear SNPs/kb and mitochondrial SNPs/kb was found using a 2-tailed t-test for all 151 genomes (P = 5.6147E-111).

Fig. 2.

Fig. 3.

a) A RAxML phylogenetic tree with 1000 bootstrap support of all Saccharomycetaceae species that had a genome assembly in NCBI or JGI Mycocosm and nucleotide diversity (π). Note: C. glabrata is calculated from whole-genome sequence data presented in this study, while the other species are based on ITS sequences only (Irinyi ). b) π (based on ITS sequences only) for all Saccharomycotina and non-Saccharomycotina that are listed in the ISHAM ITS reference DNA barcoding database (Irinyi ). c) nonoverlapping 5, 10, and 20 kb windows of π (π for all sites in the genome divided by window length).

A NeighborNet network using SplitsTree, with ST labels replacing isolate names at the nodes. Green = found in Scotland, purple = not found in Scotland. The scale bar represents nucleotide substitutions per site. a) A RAxML phylogenetic tree with 1000 bootstrap support of all Saccharomycetaceae species that had a genome assembly in NCBI or JGI Mycocosm and nucleotide diversity (π). Note: C. glabrata is calculated from whole-genome sequence data presented in this study, while the other species are based on ITS sequences only (Irinyi ). b) π (based on ITS sequences only) for all Saccharomycotina and non-Saccharomycotina that are listed in the ISHAM ITS reference DNA barcoding database (Irinyi ). c) nonoverlapping 5, 10, and 20 kb windows of π (π for all sites in the genome divided by window length). Seven clade (C) delineations for C. glabrata were recently proposed (Carreté ), which were equivalent to pre-existing STs including C1 (ST19), C2 (ST7), C3 (ST8), C4 (ST22), C6 (ST136), and C7 (ST3). We found that C5 was polyphyletic, encompassing isolates belonging to the genetically divergent ST6, ST10, and ST15 (Supplementary Fig. 1). Therefore, we recommend the use of ST delineations rather than those clade delineations. Several C. glabrata STs had evidence of genetic recombination. Our neighbor-net network tree of all isolates suggested historic gene-flow between several STs including for example ST7, ST55, and ST162 (Fig. 2). Genomic regions with low Wright’s fixation index (F) values, consistent with genetic exchange, were also identified from pairwise comparisons (n = 435) across 5- and 10-kb nonoverlapping windows of all STs (Supplementary Fig. 2). F values calculated from 5-kb windows were slightly lower than those calculated from 10-kb windows (averaging −0.046 for each ST pairwise comparison), indicating that window length impacts this measure of genetic variation. Twelve pairwise comparisons from 10-kb windows had F < 0.9 across the genome (Supplementary Fig. 3), with the lowest for ST18 and ST26 (F = 0.64). In addition, ST7, ST55, and ST162 had lower F values across the genome (F = 0.65–0.83) than other pairwise comparisons demonstrating incompatible phylogenetic signals between these STs (Supplementary Fig. 2). PCA of whole-genome SNPs revealed little evidence of clustering of several C. glabrata STs, which is consistent with gene flow between them (Fig. 4a). For unsupervised model-based clustering with ADMIXTURE, we first identified that K = 20 had the lowest cross-validation error (Fig. 4b), and was therefore used for subsequent analysis. Two isolates were consistently (6 independent Admixture runs) found to have evidence for mixed ancestry: ST177 CG1 and our newly discovered ST204 CG57 (Fig. 4c, Supplementary Fig. 4). Other isolates were found to have evidence of mixed ancestry in the majority of runs including ST124 WM18.66, ST126 WM05.155, and ST8 M17.

Fig. 4.

Population genetics of C. glabrata ST. a) PCA of whole-genome SNPs using SmartPCA revealed little evidence of sub-clustering among STs (isolates are calculated and plotted individually, but labeled by their ST alone for clarity). SmartPCA failed to calculate the eigenvalues for some isolates including those belonging to ST4. b) The CV error from running unsupervised ADMIXTURE for variant-sites across the C. glabrata population, testing K-values between 1 and 35. K = 20 provided the lowest CV error. c) ADMIXTURE plot for all isolates using K = 20, revealing several isolates with evidence of mixed ancestry. Isolates are ordered according to the neighbor-joining tree constructed with PAUP in Supplementary Fig. 1. Only one of the Scottish isolates (CG46) had evidence for Chromosome Copy Number Variation (CCNV)/aneuploidy, found in chromosome C (Supplementary Fig. 5). Distributions of normalized chromosome read depths of chromosome C (average depth per 10 kb window = 0.68) differ significantly from the rest of the genome of CG46 (average depth per 10 kb window = 1.05; Kolmogorov–Smirnov test: P = 2.09E-25), with coverages of chromosome C significantly lower than in the rest of the genome (Wilcoxon rank-sum test: P = 1.353E-25). No other CCNVs were found, despite many isolates having been treated with antifungals that have previously been correlated with CCNV (Carreté ). Together, these results suggest occasional genetic recombination within the C. glabrata population, without an association with aneuploidy. Mating types and mating-type switching are poorly understood in C. glabrata, although it is thought that Mating-type regulatory protein α2 is expressed in all MTLα strains and not in MTLa strains (Srikantha ; Butler ). MATα2 (CAGL0B01265g) was present in all Scottish isolates (breadth of coverage; BOC > 87%). However, MATα1 appeared to be absent or partially absent in 3/9 ST6 isolates (CG12 = 16% BOC, CG121 = 18% BOC, CG42 = 12.5% BOC), while present in the remaining 6 ST6 isolates, and all the other STs (BOC 100%). The functional relevance of MATα1 deletion or truncation is unclear but may be a hallmark of the rarer of the 2 mating types.

Hypervariable mitochondrial genomes among sequence types

F analysis highlighted the mitochondrial genome of C. glabrata as hyper-variable (Supplementary Table 2, Supplementary Fig. 6). Forty-three genes were identified in ≥10 pairwise F comparisons, including all 11 mitochondrial protein-encoding genes. To explain this enrichment of low F mitochondrial genes, we studied the 151 genome alignments. While the nuclear genome had 97.3–99.4% BOC, the mitochondrial genome had 20.4–99.9% BOC, with 42% of isolates (n = 63/151) containing >10% ambiguous mitochondrial bases (2 kb) (Supplementary Table 2) (here, we define ambiguous as sites with too few reads aligning to be called by GATK, or reads that cannot be called by GATK due to not passing variant filtration). Surprisingly, a pattern of low and/or patchy read coverage was identified in every isolate including the ST15 reference isolate CBS138 (Supplementary Fig. 6), indicating that the reference mitochondrial sequence assembly (Koszul ) may have a high error rate, and given additional differences identified in nonreference isolates, that C. glabrata mitochondrial genomes are highly heterogenous. The mitochondrial genome for some C. glabrata isolates appears reduced in size and encodes fewer protein-encoding genes (Fig. 5). As many as 22/37 (59%) mitochondrial encoded genes were entirely absent in at least 1 isolate, including Cg1, Cg1II, and Cg1III (putative endonucleases of exons and introns in the mitochondrial COX1 gene), ATP8 and ATP9 (subunits 8 and 9 of the enzyme complex required for ATP synthesis), RPM1/RPR1 (RNA component of mitochondrial RNAse P), VAR1 (putative mitochondrial ribosomal protein of the small subunit,) and most of the tRNA genes (15/23). Nine separate STs had absent mitochondrial genes. Normalized depth of coverage was variable across the genes, with < 1 average normalized depth across all isolates for Cg1, Cg1II, and Cg1III, ATP8, RPM1, VAR1, and tRNA-Met1. While nonuniform coverage in terms of depth and breadth was found across all mitochondrial genomes belonging to all datasets, our newly sequenced isolates have the lowest mean breadth across mitochondrial genes ( = 92.18, σ = 20.07) compared with Biswas ( = 98.48, σ = 11.47) and Carreté ( = 97.03, σ = 14.11), suggesting there are some discrepancies between library preparation impacting mitochondrial read sequencing.

Fig. 5.

Breadth of coverage and depth of coverage across each of the 37 mitochondrial encoded genes for all 151 isolates compared in this study (Each point represents an isolate). a) Breadth of coverage as a % across each gene. b) The normalized depth of coverage for each gene (total read depth for each gene/total read depth across both nuclear and mitochondrial genomes). c) Breadth of coverage as a % across each gene, categorized by sequence types (ST)s. d) Normalized depth of coverage for each gene, categorized by ST. Notably, only 1/50 Biswas isolates (WM03.450) had entirely absent mitochondrial genes compared with 8/32 Carreté isolates and 15/68 of our newly sequence isolates. Total sequencing depth can be ruled out as the main cause for low mitochondrial coverage, given Carreté had the highest sequencing depth ( = 360X) and had many isolates with absent mitochondrial genes, compared with Biswas ( = 74X) and ours ( = 42X). We used assembly de novo to further explore the mitochondrial sequence for isolate WM03.450 (ST83), which had the greatest number of ambiguous bases across its mitochondrial genome (16 kb/80%). Our WM03.450 Illumina-based assembly (12.9 Mb; N.contigs = ∼3 thousand; N50 = 85 kb) is 562 kb longer than the CBS138 reference sequence, indicating substantial genomic differences between these isolates and STs. Aligning our assembly to the reference CBS138 mitochondrial genome using Blastn identified 10 contig matches with a combined alignment length of only 1.9 kb (mean 157 nt per contig), suggesting the low alignment is not due to conserved nucleotide sequences that have undergone large rearrangements. Aligning the assembly to the 11 mitochondrial protein sequences using Blastx identified only 6/11 genes across 6 separate contigs, 4 of which were <364 nt length, and 2 that are 10.4 and 81.5 kb. Conversely, assembly de novo and Blastx of our Illumina reads for the reference isolate CBS138 against the published CBS138 genome identified all 11 mitochondrial genes present on 4 contigs, with 18.9 kb total sequence length, of which Blastn aligned 9.3 kb to the published mitochondrial assembly. Together, these analyses suggest that whole gene deletions in the C. glabrata mitochondria are common.

Signatures of selection identified among sequence types

In contrast to the C. glabrata mitochondrial genome, we found that gene deletions in the nuclear genome are rare. Indeed, fewer than 6 presence/absence (P/A) polymorphisms (strictly defined as zero reads aligning) were identified per isolate (∼0.1% of 5,210 protein-encoding genes) (Supplementary Table 4). Of these, 2 consecutive nuclear-encoded genes (CAGL0A02255g and CAGL0A02277g) on chromosome A were entirely absent of read coverage in 25 out of the 68 Scottish isolates (37%), which included all representatives of 11 separate STs (ST4, 7, 8, 24, 25, 55, 67, 83, 177, and our newly described 204). These STs do not cluster phylogenetically, ruling out a single evolutionary event causing this deletion. The 2 genes have identical nucleotide sequences and encode the same amino acid sequence, which is conserved across a range of other Ascomycota species, as well as having sequence similarity to the K62 Killer Preprotoxin protein encoded by the Saccharomyces paradoxus L-A virus M62 satellite (BLASTp E-value = 1e-36), suggesting a possible viral origin. CAGL0F09273g is a separate, putative adhesin-like protein (adhesin cluster V) with a “hyphally regulated cell wall protein N-terminal” PFAM that is lost in 11 isolates including all ST4 (CG68A and CG77), 4 ST7 (CG157, CG48A, CG48F, and CG78), 3 ST8 (CG127, CG52, and CG82), ST19 CG119, ST24 CG166, and ST147 CG133. Again, this gene must have been lost multiple times, given its presence in several ST7 and ST8 isolates. This gene is the last gene on chromosome F, has an π = 0.00244, which is lower than the overall average across the genome, and has previously been reported to undergo CCNV within serial clinical isolates (Carreté ), suggesting it is able to undergo variation within microevolutionary timescales, which may impact the adhesive ability of these C. glabrata isolates. Between 61 and 85 genes with a signature of positive selection [dN/dS = ω, and ω > 1 (Kryazhimskiy and Plotkin 2008)] were found in each ST apart from the reference ST (ST15 CG151), for which only a single gene with ω > 1 (ω = 1.0019) was identified (Supplementary Table 5). Apart from the reference ST, STs had between 4 and 14 genes with ω > 2, showing stronger signatures of diversifying or positive selection. Of the 2,083 total genes with ω > 1 across all clades, 608 were unique genes (11.6% of all genes) i.e. they had this signature in multiple clades, owing to either ancestry or selection acting on the same gene families. To explore selection, we took an unbiased approach using PFAM and GO-term enrichment comparing the numbers of each term in those 608 genes compared with the remaining nonselected genes, as well as a targeted approach for genes of interest (see Materials and methods: Selection and microevolutionary analysis) including adhesins, proteases, efflux pumps, FKS, and ERG pathway genes. Genes with signatures of positive selection within the C. glabrata population targets diverse genes and gene functions. Our unbiased approach for enrichment of functional domains in 608 gene products with signatures of positive selection identified only 3 significantly enriched [2-tailed Fisher exact test with FDR-corrected P-values (q) of < 0.05] PFAM domains and 16 GO terms (Table 1). The enriched PFAM domains were (1) Flocculin repeat (PF00624.20; q = 7.42E-17), (2) GLEYA domain (PF10528.11; q = 0.01), and (3) Armadillo repeat (PF00514.25; q = 0.02). Flocculin is a sub-telomeric gene family involved in flocculation or cell aggregation in S. cerevisiae (Teunissen and Steensma 1995), while GLEYA domains are present in C. glabrata EPA proteins. Thirty Flocculin PFAM domains were assigned to only 6 genes in C. glabrata, 2 of which have ω > 1: CAGL0I07293g and CAGL0I00220g, and together account for 23/30 Flocculin repeat PFAMs. Enriched GO-terms covered a range of possible biological functions including ribosomal/RNA-binding and mitochondrial structural proteins.

Table 1

GO-term and PFAM enrichment [2-tailed Fisher exact test with FDR-corrected P-values (q) of < 0.05] for genes with dN/dS (ω) > 1, and genes with either microevolutionary frameshifts or nonsynonymous mutations across the 7 sets of serial isolates.

Category
dN/dS > 1	GO/PFAM term	Genes ω < 1	Genes ω > 1	Fisher P	Q-value	Rel. prop	GO/PFAM description
	GO: 0003723	428	25	2.26E-04	3.04E-02	1.95	RNA binding
	GO: 0003735	152	3	7.38E-05	1.49E-02	5.78	Structural constituent of ribosome
	GO: 0003824	1745	155	8.61E-05	1.64E-02	1.28	Catalytic activity
	GO: 0005515	1100	81	4.48E-06	1.41E-03	1.55	Protein binding
	GO: 0005740	278	11	5.08E-05	1.06E-02	2.88	Mitochondrial envelope
	GO: 0005759	160	4	1.90E-04	2.65E-02	4.56	Mitochondrial matrix
	GO: 0006412	287	11	2.68E-05	6.05E-03	2.98	Translation
	GO: 0019693	100	1	3.63E-04	4.35E-02	11.4	Ribose phosphate metabolic process
	GO:0022626	87	0	1.16E-04	1.99E-02	N/A	Cytosolic ribosome
	GO:0022857	277	13	4.08E-04	4.70E-02	2.43	Transmembrane transporter activity
	GO:0036094	724	52	2.58E-04	3.31E-02	1.59	Small molecule binding
	GO:0043168	689	48	1.89E-04	2.65E-02	1.64	Anion binding
	GO:0044281	498	27	1.57E-05	3.86E-03	2.1	Small molecule metabolic process
	GO:0044391	143	3	2.32E-04	3.05E-02	5.44	Ribosomal subunit
	GO:0071840	1325	113	2.76E-04	3.47E-02	1.34	Cellular component organization or biogenesis
	GO:1901362	364	19	1.92E-04	2.65E-02	2.18	Organic cyclic compound biosynthetic process
	PF00624.20	4	23	6.53E-20	7.42E-17	0.02	Flocculin repeat
	PF10528.11	12	10	1.87E-05	1.07E-02	0.13	GLEYA domain
	PF00514.25	8	8	5.72E-05	2.17E-02	0.11	Armadillo repeat
Microevolution (Frameshift)	GO/PFAM term	Genes without frameshift	Genes with frameshift	Fisher P	Q-value	Rel. prop	GO/PFAM description
	GO:0009986	30	9	7.53E-12	4.26E-08	0.03	Cell surface
	GO:0009987	3616	17	1.20E-05	3.38E-02	1.7	Cellular process
	PF05001.15	0	17	1.10E-39	1.25E-36	0	RNA polymerase Rpb1 C-terminal repeat
	PF10528.11	13	9	5.54E-15	3.15E-12	0.01	GLEYA domain
	PF00399.21	24	9	4.09E-13	1.55E-10	0.02	Yeast PIR protein repeat
	PF08238.14	14	6	2.64E-09	7.49E-07	0.02	Sel1 repeat
	PF11765.10	6	3	2.58E-05	5.86E-03	0.01	Hyphally regulated cell wall protein N-terminal
	PF09770.11	0	2	4.79E-05	9.08E-03	0	Topoisomerase II-associated protein PAT1
Microevolution (nonsynonymous)	GO/PFAM term	Genes without nonsynonymous	Genes with nonsynonymous	Fisher P	Q-value	Rel. prop	GO/PFAM description
	GO:0009986	28	11	7.88E-10	4.45E-06	0.06	Cell surface
	PF10528.11	12	10	3.14E-11	3.58E-08	0.03	GLEYA domain
	PF11765.10	4	5	1.03E-06	5.87E-04	0.02	Hyphally regulated cell wall protein N-terminal

The relative proportion (Rel. prop) was calculated as (number of terms in set 1/number of terms in set 2) × (genes with any terms in set 2/genes with any terms in set 1).

Our targeted approach highlighted 21/129 genes of interest that have ω > 1, with at least 1 found in every ST apart from the reference ST15 and ST46 (Supplementary Table 6). Notably, none of the aspartic proteases, phospholipases, cell wall biogenesis, efflux pumps, ergosterol biosynthesis pathway genes or FKS genes were found to have hallmarks of positive selection, implying these are conserved within the population. Several genes with ω > 1 were found in multiple STs, including adhesive protein CAGL0J01727g (adhesin cluster VI) that is under positive selection in 7 STs (18, 26, 36, 45, 147, 177, and 204) and adhesive protein CAGL0I07293g (adhesin cluster V) under positive selection in 7 mostly distinct STs (3, 8, 25, 83, 123, 136, and 177). C. glabrata encodes 17 putative adhesive proteins without N-terminal signal peptides, casting doubt on their role in adhesin. One of these is a pseudogene (CAGL0E00110g) with ω > 1 in 13/29 STs. The structural cell wall protein AWP7 belonging to the Srp1p/Tip1p family was under selection in 7 STs.

Candida glabrata nosocomial in-patient microevolution targets pathogenicity factors and drug targets

Our Scottish C. glabrata panel included 7 sets of between 2 and 9 isolates from recurrent cases of candidiasis. To explore the microevolution of C. glabrata within a human host, and the effects of antifungal treatment (fluconazole, nystatin, and posaconazole) on fungal genetics, we documented all genetic changes between serial isolates (Table 2, Supplementary Table 7). Although exact dates of isolation were not documented, phylogenetic analysis (Supplementary Fig. 1, Fig. 6) confirmed these serial isolates were highly related, with between 64 and 140 mutations (1.13468 × 10−5 per base pair) identified between pairs of serial isolates (Fig. 6). While the mutation rate or generation time for C. glabrata is not known (Carreté ), this small number of mutations likely suggests recent clonal origins appropriate for microevolutionary analysis. Serial isolates had an estimated time between isolation (based on blood culture dates) between 0 and 239 days (mean 15 days). Five serial isolates from 4 separate patients/cases showed MIC changes from the earlier sampled isolate (Table 3), including 2 increases (CG107A–B + 8 µg/ml, CG97B–C + 4 µg/ml), 1 decrease (CG84G–H −4 µg/ml,) and 1 large transient increase (CG93A–E = 4 µg/ml; CG93H, I, K > 64 µg/ml, CG93K = 4 µg/ml).

Table 2

Summary of microevolution across 7 sets of between 2 and 9 C. glabrata isolates from recurrent cases of candidiasis.

Case ID	Initial	Relapse	ST	All Mutations	Coding	Noncoding	Coding Indel	Coding Indel (frameshift)	Coding (Non.Syn.)	Coding Nonsense	Coding (Syn.)	Coding (revert to ref.)
1	CG107A	CG107B	36	97	83	14	18	9	16	0	17	32
2	CG18A	CG18B	10	140	127	13	20	7	24	0	20	63
3	CG191A	CG191B	10	64	53	11	13	4	11	0	8	21
3	CG191B	CG191C	10	78	66	12	14	9	10	0	8	34
3	CG191C	CG191D	10	83	72	11	17	12	14	0	10	31
3	CG191D	CG191E	10	96	85	11	14	4	12	0	9	50
3	CG191E	CG191F	10	87	77	10	23	16	19	0	17	18
4	CG48A	CG48F	7	92	79	13	10	3	11	0	13	45
5	CG84F	CG84G	67	76	64	12	17	5	8	0	5	34
5	CG84G	CG84H	67	71	59	12	21	5	9	0	3	26
6	CG93A	CG93B	162	125	114	11	20	4	15	0	16	63
6	CG93B	CG93C	162	119	105	14	23	10	23	0	18	41
6	CG93C	CG93D	162	124	110	14	19	7	16	0	9	66
6	CG93D	CG93E	162	112	97	15	24	10	20	1	12	40
6	CG93E	CG93H	162	119	105	14	21	5	20	0	18	46
6	CG93H	CG93I	162	116	102	14	20	6	13	0	14	55
6	CG93I	CG93J	162	96	82	14	22	13	13	0	11	36
6	CG93J	CG93K	162	135	120	15	21	11	22	1	28	48
7	CG97A	CG97B	25	79	66	13	14	3	15	0	10	27
7	CG97B	CG97C	25	86	75	11	11	4	12	0	18	34

We documented 1,995 mutations between all serial isolates, which were either in protein-coding regions (Coding) or Intergenic and intron regions (noncoding). Coding mutations were further characterized into Coding Indels, some of which caused frameshifts [Coding Indel (frameshift)], nonsynonymous mutations (Coding Non.Syn.), nonsense mutations, (Coding Nonsense), synonymous mutations (Coding Syn.), and bases that reverted back to the ST15 CBS138 reference base (either from a previous microevolutionary change or a pre-existing variant between the initial isolate and the reference ST15 CBS138).

Fig. 6.

Table 3

MIC values of fluconazole for each of the serial isolates.

Case ID	Strain	MIC (µg/ml)	Change
1	107a	8
1	107b	16	8
2	18a	8
2	18b	8
3	191a	8
3	191b	8
3	191c	8
3	191d	8
3	191e	8
3	191f	8
4	48a	4
4	48f	4
5	84f	8
5	84g	8
5	84h	4	−4
6	93a	4
6	93b	4
6	93c	4
6	93d	4
6	93e	4
6	93h	>64	>60
6	93i	>64
6	93j	>64
6	93k	4	−60
7	97a	4
7	97b	4
7	97c	8	4

Microevolutionary changes across 7 sets of C. glabrata isolates. a) a RAxML phylogenetic tree of the serial isolates using the general-time-reversible model and CAT rate approximation with 100 bootstrap support. Branch lengths indicate the mean number of changes per site. b) The number of serial mutations total (All), those within protein-coding sequence (CDS), intergenic and intronic regions. c) Those same serial mutations per kb [calculated as the count of serial mutations divided by the total length of the feature (where All = whole genome) and multiplied by 1,000]. d) Serial mutations within CDS categorized by their effect on the sequencing: Insertion/Deletion (Indel), synonymous mutation (Syn.), nonsynoynmous mutation (Non.Syn.), and nonsense mutation. e) Those same serial mutations within CDS per kb. Summary of microevolution across 7 sets of between 2 and 9 C. glabrata isolates from recurrent cases of candidiasis. We documented 1,995 mutations between all serial isolates, which were either in protein-coding regions (Coding) or Intergenic and intron regions (noncoding). Coding mutations were further characterized into Coding Indels, some of which caused frameshifts [Coding Indel (frameshift)], nonsynonymous mutations (Coding Non.Syn.), nonsense mutations, (Coding Nonsense), synonymous mutations (Coding Syn.), and bases that reverted back to the ST15 CBS138 reference base (either from a previous microevolutionary change or a pre-existing variant between the initial isolate and the reference ST15 CBS138). MIC values of fluconazole for each of the serial isolates. Mutations identified between serial isolates were mostly in protein-coding sequence (CDS) regions (between 53 and 127 mutations per pairs of serial isolates, collectively adding up to 1,741/1,995 total mutations = 87%), despite protein-coding regions taking up only 7.9/12.3 Mb (64%) (Fig. 6, b and c). The remaining serial mutations were within intergenic regions (236 mutations; 12%) and intronic regions (18 mutations; 1%). Intronic regions had the highest count of serial mutations after accounting for the total sequence in introns (Fig. 6c), albeit with ≤3 found per pair of serial isolates. Hypergeometric tests revealed that the number of mutations in coding sequence compared with noncoding sequence was higher than expected by chance, suggesting a highly significant enrichment of mutations in protein-coding genes (P = 3e-120). To explore the 1,741 microevolutionary changes within coding regions, we categorized them into 5 groups of newly arising mutations (regardless of prior state): (1) insertions/deletions (indels) (n = 362; 21%), (2) synonymous mutations (n = 264; 15%), (3) nonsynonymous mutations (n = 303, 17%), (4) nonsense mutations (n = 2), and (5) reversion back to reference base (n = 810; 47%). Of the indels, 147/362 (41%) were frameshifts that disrupted 54 genes. Nonsynonymous mutations were detected in 139 genes (Fig. 6, d and e). Enrichment for PFAM/GO-terms of these genes with frameshift and nonsynonymous mutations [2-tailed Fisher exact test with FDR-corrected P-values (q) of < 0.05] revealed 3 enriched GO-terms and 8 enriched terms (Table 1). Both categories (frameshifts and nonsynonymous mutations) were enriched for GO: 0009986 cell surface (q = 3.21E-08 and 1.09E-06, respectively), suggesting that C. glabrata undergoes rapid mutations in several of its cell surface proteins during prolonged/serial blood stream infections. Enriched PFAM terms included the “RNA polymerase RPB1 C-terminal repeat” for genes with frameshift indels (q = 1.25E-36), GLEYA domains for genes with either frameshift (q = 3.15E-12) or nonsynonymous mutations (q = 3.58E-08). Several repeat-associated PFAMs and the “Hyphally regulated cell wall protein N-terminal” domain were enriched for nonsynonymous mutations (q = 5.87E-04). Several genes of interest (see Materials and methods: Selection and microevolutionary analysis) had microevolutionary changes (n = 29/129) (Supplementary Table 8). Notably, 1 of the 2 newly acquired nonsense mutations was identified in FKS2 (Case 6 J–K), coinciding with a substantial drop in fluconazole MIC (Table 3). The other was in the uncharacterized CAGL0K04631g at an earlier time point in the same patient (Case 6 D–E). Twenty adhesins including EPA genes were mutated between serial isolates, including in all 7 sets/cases of isolates and at every time point. For example, EPA3 had 5 indels in Case 1 (A–B), a synonymous mutation in Case 2 [A–B; nucleotide position (pos.) 2304], Case 3 (D–E; pos. 1539), Case 4 (A–F; pos. 1119), Case 5 (F–G; pos. 2259), a nonsynonymous mutation (pos. 2224) and large (30 nt) insertion in Case 5 (G–H), 2 large deletions (42 and 16 nt), and 2 synonymous and 1 nonsynonymous mutations in Case 6 (A–B; pos. 1002, 2319 and 2276, respectively). The longer 42 nt deletion from Case 6 (A–B) reverts back to reference in Case 6 (B–C), suggesting either (1) a nondescendent isolate (intra-host variation), (2) a false-negative reference in 6C or (3) a false-positive deletion in 6A. The same 42 nt deletion, along with a new insertion at the site of the previous synonymous mutation appears in Case 6 (C–D), thereby suggesting the variant is real and option c less likely. That 42 nt deletion reverts back to reference in Case 6 (D–E), and appears again in Case 6 (E–H). By Case 6 (H–I), the gene has a new synonymous mutation, and in Case 6 (I–J) it has accumulated a new 15 nt deletion. EPA3 is therefore a hot-spot of variation. Another EPA gene that accumulated a large number of mutations was AWP12, which accumulated 5 nonsynonymous mutations and 1 synonymous mutation (Case 6H–I). Other genes that had accumulated mutations between serial isolates included those encoding an aspartic protease YPS5, several structural wall proteins belonging to the Srp1/Tip family, regulatory protein PDR1, the ergosterol synthesis gene ERG4 (a nonsynonymous mutation in Case 3A–B), and both FKS1 and FKS2. Therefore, C. glabrata genes that are antifungal targets and gene families involved in drug-resistance and pathogenicity can therefore undergo rapid mutation within a human host.

Discussion

In this study, we sequenced and analyzed the largest panel of C. glabrata genomes to date. These isolates were collected from blood-stream infections of patients at several Scottish hospitals in 2012. Our 68 genomes were analyzed alongside 83 further publicly available and globally isolated genomes (Biswas ; Carreté ; Xu ), revealing greater genetic diversity than previously recognized, including a nucleotide diversity of 0.00665, which is much higher than has been calculated for the distantly related C. albicans at 0.00298 (Irinyi ). Surprisingly, we found that only one of our Scottish isolates had evidence of aneuploidy, despite many having been treated with antifungals, which has previously been correlated (Carreté ). Chromosome C in CG46 had lower depth of coverage compared with the rest of the genome, perhaps due to chromosome loss in a subset of cells. The patient that CG46 was isolated from was initially treated with Fluconazole. Following resistance to Fluconazole being detected, the patient was subsequently treated with Caspofungin, suggesting a potential link between those antifungal treatments and the observed aneuploidy. We found that the mitochondrial genome of C. glabrata was hyper-diverse compared with its nuclear genome for many isolates, including several long deletions spanning one or more genes, with the potential to impact many important biological functions including drug resistance and persistence (Garcia-Rubio ). High levels of variation in mitochondrial genomes within the major fungal phyla have previously been noted in terms of gene order, genome size, composition of intergenic regions, presence of repeats, introns, associated ORFs, and evidence for mitochondrial recombination in all fungal phyla (Aguileta ). This variation is lacking in Metazoa (Aguileta ). Our results suggest some of these types of mitochondrial genetic diversity are likely present within the C. glabrata population. Isolates in this study belonged to 29 separate STs of C. glabrata, each of which was separated by large number of variants. However, as many as 193 MLST STs have been documented (Jolley ). Therefore, it is likely that the true genetic diversity of C. glabrata is much higher than we have been able to calculate with whole-genome sequences (albeit the largest panel studied to date). Indeed, several further STs may yield further evidence of recombination or lack of, and may ultimately require a new effort to group STs into lineages (also dependent on the frequency of recombination that erode these divisions). The genetic diversity of C. glabrata in hospitals around Scotland is extremely high, with representatives from 20 STs. Such high genetic diversity (and many of the same STs) have also been found from genome sequencing and phylogenetic analysis of isolates collected in other countries such as Australia (Biswas ), suggesting they must have been transported across or between continents, perhaps by anthropogenic or even natural means [for example its association with birds (Cafarchia ; Al-Yasiri ) and food (de Melo Pereira )]. Greater sampling and genotyping of clinical and environmental isolates will be required for understanding ancestry or endemicity. Candida glabrata has long been regarded as a haploid asexual yeast, although evidence has recently emerged of a cryptic sexual cycle (Wong ; Dodgson ; Gabaldón and Fairhead 2019). Our genome sequencing and population genetics supports this work, revealing compelling evidence that at least 12 STs stem from recombination between other STs. However, further work remains to document and describe individual recombinant isolates. Providing genetic recombination between isolates is naturally occurring, the mechanisms of genetic exchange are also unknown, although likely relate to the conserved mating-type locus, which play a central role in the sexual cycle of diverse fungi (Fraser and Heitman 2003). Here, we show that the MATα1 gene was absent or partially absent in 3 isolates belonging to ST6, which could potentially impact or even be a hallmark of a rarer second mating type of C. glabrata. Together, genetic recombination among C. glabrata isolates appears much more common than previously recognized, and likely contributes to increased genetic diversity. The nuclear genome for isolates belonging to every ST (apart from the reference ST15 that was included as a control) included evidence of positive or diversifying selection. Signatures of positive selection were found enriched in genes with diverse functions, including several with repeat domains, as well as EPA genes with GLEYA domains. EPA genes are a large sub-telomeric family of virulence-related surface glycoprotein-encoding genes encoded by several other pathogens including Plasmodium, Trypanosoma, and Pneumocystis (De Las Peñas ). Such gene differences between STs of C. glabrata may result in clinically relevant phenotypic differences. In host microevolutionary changes between serial isolates were enriched within coding-sequences, which is a surprise, given the expectation for intergenic regions to be more permissive to mutations due to relaxed selection within intergenic regions and purifying selection within coding sequence. The reason for this abundance of serial mutations in coding sequence is unclear, although it could potentially be technical (e.g. false-negative variants within repetitive sequences) or biological (e.g. drug exposure and host immune pressure). Alternatively, enriched mutations in genes could potentially be driven by processes such as DNA polymerase induced mutations, or differences in chromatin states (e.g. heterochromatin could lead to increased exposure to DNA damaging agents resulting in higher mutation rates, or conversely, greater surveillance and correction of mutations in euchromatin regions by cellular DNA repair enzymes, Makova and Hardison 2015). Selection may explain why we identified similar numbers of nonsynonymous mutations to synonymous mutations, given random mutations are expected to be nonsynonymous in ∼2/3 nucleotides of each codon. Furthermore, accumulations of deleterious mutations could be occurring in the serial isolates due to small population sizes, although population size estimates could not be calculated accurately from the metadata. Genes with GLEYA domains including EPA genes were significantly enriched for frameshift and nonsynonymous mutations in the coding sequence between serial isolates. Combined with our finding of positive selection in EPA genes across STs, suggests that EPA genes are undergoing variation at both longer time frames and microevolutionary time-scales. Genes encoding several important drug targets also underwent mutations between serial isolates, including a nonsynonymous mutation in the ergosterol biosynthesis pathway gene ERG4, and a nonsense mutation in the 1,3-β-glucan synthase component FKS2 [mutations in these genes can confer resistance to azoles (Feng ) and echinocandins (Katiyar , p. 1), respectively]. Indeed, the nonsense mutation in FKS2 coincided with a marked drop in fluconazole MIC for isolate CG93K, suggesting a possible link. Our study highlights the need for further sampling and genomic analysis of C. glabrata in order to better inform the population structure and mechanisms underlying its increasing emergence, pathogenicity and multi-drug resistance. While we have largely focused on differences among the conserved regions of the C. glabrata ST15 CBS138 genome using an alignment-based strategy, our discoveries of a hyper-diverse mitochondrial sequence highlight the value for future long-read sequencing and assemblies to characterize the pan-genomes of C. glabrata and structural genomic diversity that exists among and perhaps within STs, and to explore the mechanisms driving those changes. Furthermore, given the genetic diversity between STs that we document, it would likely be valuable to sequence and assemble additional high-quality reference sequences for the purposes of increasing variant-calling accuracy and quantifying gene content between different STs. Given the low and patchy alignment depth across the ST15 CBS138 mitochondrial sequence for that same isolate, a review and update for the published CBS138 mitochondrial genome is likely required as well. Indeed, high (∼0.5–1%) frequencies of structural variation in the nuclear genomes of C. glabrata isolates was recently found using de novo assemblies from long single-molecule real-time reads (Xu ). The rapidity that C. glabrata can mutate important genes and gene families, both via microevolution and putative recombination highlights an obstacle for future drug-development, given that individual gene targets are able to mutate within short time spans, and substantial diversity already present between STs. In addition, the epidemiology of C. glabrata is poorly understood. Future sampling and genomic comparison studies are necessary to identify the routes and mechanisms of its spread and evolution.

Data availability

Raw sequences for all haploid isolates of C. glabrata from this study have been deposited in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA669061. Supplemental material is available at GENETICS online. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file.

63 in total

1. SplitsTree: analyzing and visualizing evolutionary data.

Authors: D H Huson
Journal: Bioinformatics Date: 1998 Impact factor: 6.937

2. Mutations and/or Overexpressions of ERG4 and ERG11 Genes in Clinical Azoles-Resistant Isolates of Candida albicans.

Authors: Wenli Feng; Jing Yang; Zhiqin Xi; Zusha Qiao; Yaping Lv; Yiru Wang; Yan Ma; Yanqing Wang; Wen Cen
Journal: Microb Drug Resist Date: 2016-12-15 Impact factor: 3.431

3. International Society of Human and Animal Mycology (ISHAM)-ITS reference DNA barcoding database--the quality controlled standard tool for routine identification of human and animal pathogenic fungi.

Authors: Laszlo Irinyi; Carolina Serena; Dea Garcia-Hermoso; Michael Arabatzis; Marie Desnos-Ollivier; Duong Vu; Gianluigi Cardinali; Ian Arthur; Anne-Cécile Normand; Alejandra Giraldo; Keith Cassia da Cunha; Marcelo Sandoval-Denis; Marijke Hendrickx; Angela Satie Nishikaku; Analy Salles de Azevedo Melo; Karina Bellinghausen Merseguel; Aziza Khan; Juliana Alves Parente Rocha; Paula Sampaio; Marcelo Ribeiro da Silva Briones; Renata Carmona e Ferreira; Mauro de Medeiros Muniz; Laura Rosio Castañón-Olivares; Daniel Estrada-Barcenas; Carole Cassagne; Charles Mary; Shu Yao Duan; Fanrong Kong; Annie Ying Sun; Xianyu Zeng; Zuotao Zhao; Nausicaa Gantois; Françoise Botterel; Barbara Robbertse; Conrad Schoch; Walter Gams; David Ellis; Catriona Halliday; Sharon Chen; Tania C Sorrell; Renaud Piarroux; Arnaldo L Colombo; Célia Pais; Sybren de Hoog; Rosely Maria Zancopé-Oliveira; Maria Lucia Taylor; Conchita Toriello; Célia Maria de Almeida Soares; Laurence Delhaes; Dirk Stubbe; Françoise Dromer; Stéphane Ranque; Josep Guarro; Jose F Cano-Lira; Vincent Robert; Aristea Velegraki; Wieland Meyer
Journal: Med Mycol Date: 2015-03-22 Impact factor: 4.076

4. Evidence for recombination in Candida glabrata.

Authors: Andrew R Dodgson; Claude Pujol; Michael A Pfaller; David W Denning; David R Soll
Journal: Fungal Genet Biol Date: 2005-01-22 Impact factor: 3.495

5. Evidence implicating phospholipase as a virulence factor of Candida albicans.

Authors: A S Ibrahim; F Mirbod; S G Filler; Y Banno; G T Cole; Y Kitajima; J E Edwards; Y Nozawa; M A Ghannoum
Journal: Infect Immun Date: 1995-05 Impact factor: 3.441

6. Opportunistic fungal pathogen Candida glabrata circulates between humans and yellow-legged gulls.

Authors: Mohammed Hashim Al-Yasiri; Anne-Cécile Normand; Coralie L'Ollivier; Laurence Lachaud; Nathalie Bourgeois; Stanislas Rebaudet; Renaud Piarroux; Jean-François Mauffrey; Stéphane Ranque
Journal: Sci Rep Date: 2016-10-26 Impact factor: 4.379

7. Fungal clones win the battle, but recombination wins the war.

Authors: André Drenth; Alistair R McTaggart; Brenda D Wingfield
Journal: IMA Fungus Date: 2019-10-29 Impact factor: 3.515

8. Whole Genome Sequencing of Australian Candida glabrata Isolates Reveals Genetic Diversity and Novel Sequence Types.

Authors: Chayanika Biswas; Vanessa R Marcelino; Sebastiaan Van Hal; Catriona Halliday; Elena Martinez; Qinning Wang; Sarah Kidd; Karina Kennedy; Deborah Marriott; C Orla Morrissey; Ian Arthur; Kerry Weeks; Monica A Slavin; Tania C Sorrell; Vitali Sintchenko; Wieland Meyer; Sharon C-A Chen
Journal: Front Microbiol Date: 2018-12-03 Impact factor: 5.640

9. Genome Comparisons of Candida glabrata Serial Clinical Isolates Reveal Patterns of Genetic Variation in Infecting Clonal Populations.

Authors: Laia Carreté; Ewa Ksiezopolska; Emilia Gómez-Molero; Adela Angoulvant; Oliver Bader; Cécile Fairhead; Toni Gabaldón
Journal: Front Microbiol Date: 2019-02-12 Impact factor: 5.640

10. One year prospective survey of Candida bloodstream infections in Scotland.

Authors: Frank C Odds; Mary F Hanson; Amanda D Davidson; Mette D Jacobsen; Pauline Wright; Julie A Whyte; Neil A R Gow; Brian L Jones
Journal: J Med Microbiol Date: 2007-08 Impact factor: 2.472