Literature DB >> 34909054

Comparative genomic analysis of clinical Candida glabrata isolates identifies multiple polymorphic loci that can improve existing multilocus sequence typing strategy.

A Arastehfar1, M Marcet-Houben2,3,4, F Daneshnia1, S J Taj-Aldeen5, D Batra6, S R Lockhart6, E Shor1,7, T Gabaldón2,3,4, D S Perlin1,7,8.   

Abstract

Candida glabrata is the second leading cause of candidemia in many countries and is one of the most concerning yeast species of nosocomial importance due to its increasing rate of antifungal drug resistance and emerging multidrug-resistant isolates. Application of multilocus sequence typing (MLST) to clinical C. glabrata isolates revealed an association of certain sequence types (STs) with drug resistance and mortality. The current C. glabrata MLST scheme is based on single nucleotide polymorphisms (SNPs) at six loci and is therefore relatively laborious and costly. Furthermore, only a few high-quality C. glabrata reference genomes are available, limiting rapid analysis of clinical isolates by whole genome sequencing. In this study we provide long-read based assemblies for seven additional clinical strains belonging to three different STs and use this information to simplify the C. glabrata MLST scheme. Specifically, a comparison of these genomes identified highly polymorphic loci (HPL) defined by frequent insertions and deletions (indels), two of which proved to be highly resolutive for ST. When challenged with 53 additional isolates, a combination of TRP1 (a component of the current MLST scheme) with either of the two HPL fully recapitulated ST identification. Therefore, our comparative genomic analysis identified a new typing approach combining SNPs and indels and based on only two loci, thus significantly simplifying ST identification in C. glabrata. Because typing tools are instrumental in addressing numerous clinical and biological questions, our new MLST scheme can be used for high throughput typing of C. glabrata in clinical and research settings.
© 2021 Westerdijk Fungal Biodiversity Institute. Production and hosting by ELSEVIER B.V.

Entities:  

Keywords:  Candida glabrata; MLST; Sequence type; Whole-genome sequencing

Year:  2021        PMID: 34909054      PMCID: PMC8640552          DOI: 10.1016/j.simyco.2021.100133

Source DB:  PubMed          Journal:  Stud Mycol        ISSN: 0166-0616            Impact factor:   16.097


Introduction

Candida species are among the most prevalent mycobiome constituents (Romo & Kumamoto 2020) and also a major cause of invasive fungal infections in humans worldwide (Brown ). It has been estimated that > 400 000 life-threatening infections are caused by Candida albicans alone (Brown ). Although C. albicans once was the most prevalent cause of candidemia, recent studies have revealed a stark increase in candidemias caused by non-albicans Candida (NAC) species, especially C. glabrata, C. parapsilosis, and C. tropicalis (Astvad , Lamoth et al., 2018, Fuller ; Pfaller et al., 2019, Song , Stavrou et al., 2019; Won ). Candida glabrata has been identified as the second leading cause of candidemia in the US (Pfaller , Pfaller et al., 2019, Tsay ), Canada (Fuller ), Australia (Chapman ), and some European (Astvad ) and Asian countries (Taj-Aldeen , Arastehfar , Kord ). Candida glabrata has reduced susceptibility to fluconazole (Healey & Perlin 2018, Arastehfar ) and can rapidly develop drug resistance during infection (Healey & Perlin 2018, Ksiezopolska & Gabaldón 2018, Arastehfar ). Indeed, the increasing incidence of candidemia due to C. glabrata in recent years has coincided with a notable increase in the number of fluconazole-resistant (FLZR) isolates in many countries (Hou , Astvad , Pfaller et al., 2019, Arastehfar , Won ). Patients infected with FLZR C. glabrata isolates had the highest mortality rate and experienced a shorter median survival days after diagnosis compared to those infected with fluconazole-susceptible-dose-dependent (FLZ-SDD) isolates (Won ). These observations are especially concerning for developing countries, where fluconazole is the frontline antifungal drug used to treat candidemia (Chakrabarti , Arastehfar , Kord , Megri ). However, the emerging echinocandin-resistant (ECR) and multidrug-resistant (MDR) C. glabrata isolates, which constitute > 30 % of the ECR isolates (Astvad , Won ), may also threaten the clinical efficacy of echinocandins, the frontline antifungal drugs recommended by international guidelines (Pappas et al., 2016). Together, these features make C. glabrata one of the most challenging fungal pathogens of the present time. Although C. glabrata is considered to be predominantly asexual, various types of genome analyses of C. glabrata clinical isolates revealed a high degree of genetic diversity both in terms of chromosome structure and sequence (Carreté , Xu ), and this diversity was estimated to be greater than that of C. albicans (Carreté , Gabaldón & Fairhead 2019). Understanding the genetic diversity of C. glabrata isolates from both clinical and biological standpoints is of paramount importance, as it will help to answer key questions regarding its epidemiology, aid in implementing appropriate infection control strategies by identifying the route of infection (Megri ), and also enables researchers to uncover the evolution of drug resistance of genetically-related isolates during the course of infection (Carreté ). Additionally, several studies have uncovered an association between genotype and mortality (Byun , Arastehfar ) and drug resistance (Won ). Studies dissecting the genetic structure of C. glabrata isolates have employed various tools, such as whole-genome sequencing (WGS) (Biswas , Carreté , 2019), polymorphic locus sequence typing (Katiyar , Katiyar & Edlind 2021), pulsed field gel electrophoresis (PFGE) (Lin ), amplified fragment length polymorphism analysis (Arastehfar ), multilocus microsatellite typing (MLMT) (Hou et al., 2018, Bordallo-Cardona et al., 2019, Arastehfar ), and multilocus sequence typing (MLST) (Lott , 2012, Hou , Hou et al., 2018, Byun , Bordallo-Cardona et al., 2019, Khalifa , Won ). Among these, WGS provides the most information and resolution, and short read based and nanopore long read based sequencing platforms are gradually becoming more accessible and affordable (Gabaldón 2019). However, the extensive chromosome structure diversity between clinical isolates makes it difficult to use standard tools for short read sequence analysis, which rely on mapping the reads to a reference genome with a common structure. Therefore, there is a need for additional high quality assembled C. glabrata genomes that can serve as references, facilitating WGS analysis of clinical isolates. Although WGS provides the most information and resolution, the still high costs and sophisticated analysis hinder its wide use in clinical settings, especially among medical mycologists (Gabaldón ). Therefore, most of the studies conducted so far have used MLMT and MLST as the most popular techniques to type clinical C. glabrata isolates (Gabaldón ). The MLST method is highly reproducible and provides data that are consistent with WGS (Carreté ). These data can be archived at “https://pubmlst.org/organisms/candida-glabrata” (Gabaldón ), making the worldwide, temporal, and nationwide analysis of clinical C. glabrata isolates possible. The currently used MLST scheme (Dodgson ) has identified 1 414 isolates of C. glabrata belonging to > 100 STs, which underscores its genetic diversity. This scheme is based on SNPs at six loci (FKS1, LEU2, NMT1, TRP1, UGP1, and URA3), together comprising > 3 000 bps. Of course, it is desirable to reduce the required number of loci to reduce the time and cost associated with MLST analysis, without losing resolutive power. In this study, we obtained high quality genome data based on PacBio and Illumina sequencing for seven additional clinical isolates of C. glabrata belonging to ST10, 15, and 16. Comparative analysis of the seven genomes identified a number of highly polymorphic loci (HPL) characterized by a high number of insertions/deletions (indels). These loci were enriched for the functional categories governing stress response pathways, particularly those involved in membrane and cell wall stresses. Interestingly, we found that two of these HPLs were highly resolutive for ST identity. Using an additional set of 53 clinical isolates showed that each of these loci in combination with TRP1 provided the same resolution as that generated by the traditional six-locus MLST protocol (Dodgson ). Therefore, we propose an improved MLST protocol for C. glabrata, which necessitates sequencing only two loci (totalling ≤1 Kbp) and offers a quicker and cheaper approach without the loss of resolution, and which can be used for high throughput typing studies of C. glabrata.

Materials and methods

Isolates and growth conditions

The 53 C. glabrata strains used in this study are listed in Table 1. The isolates originated from various geographical regions and had various antifungal susceptibility profiles. All isolates were grown on yeast peptone dextrose (YPD) agar and incubated at 37 °C for 24–48 h.
Table 1

List of clinical Candida glabrata isolates used in this study.

Isolate #Original #Sequence typeMinimum inhibitory concentration (μg/ml)
Susceptibility profilesComments
FluconazoleMicafunginAnidulafunginAMB
1CAS08-0293ST364441MDR
2CAS08-0725 (CMD00311)ST3640.521MDR
3CAS09-0869 (CMD00373)ST3640.521MDR
4BG2ST340.0150.1251S
5Qatar 36ST340.0150.1250.5S
6DPK 159ST340.0150.1251S
7DPL27 (5962)ST3424 or 21ECR
8ATCC 66032ST640.0150.060.5S
9CAS09-1225 (CGA00908)ST6> 64422MDR
10CAS09-1786 (CGA01258)ST664421MDR
11DPL155 (M234)ST610.2511ECR
12DPL157 (17351, CGC2)ST664441MDR
16CAS08-0089ST840.0150.1251S
17CAS08-494ST840.0150.1251S
18CAS08-528ST840.0150.1251S
19CAS11-3112 (CGA02083)ST8640.250.51MDR
20DPL209 (M2952)STX64441MDR
25CAS08-0205ST1040.0150.1250.5S
26CAS09-1083 (CGA00822)ST10640.510.5MDR
27CAS09-1437 (CGA01045)ST106421 or 0.51MDR
28DPK 305ST1040.0150.1251SWGS
29CAS08-0425 (CMD00008)ST1064441MDRWGS
35DPL1021 (ATCC 90030)ST1040.0150.1251SWGS
21Qatar 38ST1540.0150.251S
22Qatar 57ST1540.0150.1251S
23Qatar 71ST15> 640.060.251FLZR
24ATCC 2001 (CBS 138)ST1540.0150.1250.5S
30CAS08-0027ST15640.0150.1251FLZRWGS
31CAS08-485ST1540.0150.1251S
32CAS09-755ST1540.0150.1251S
33DPK 762ST1540.0150.1251SWGS
34DPL274 (3-CPH-W20800)ST151144ECR
36CAS08-0092STY20.0150.1251S
37CAS11-3129 (CMD01408STY64440.5MDR
38DPL245 (1611)STY16440.5ECRWGS
39DPL38 (42997)STY32120.25ECR
40CAS08-0016 (CGA00019)STY64211MDRWGS
41DPL217ST1720.510.5ECR
42DPL219ST1720.510.5ECR
44LB599-02ST4640.0150.1251S
45LB906-05ST46640.0150.1251FLZR
46Qatar 51ST4620.0150.1251S
48Qatar 59ST4620.0150.061S
50Qatar 62ST4620.0150.061S
53Qatar 69ST4640.030.251S
54Qatar 72ST46640.0150.1251FLZR
56Qatar 76ST4640.0150.1251S
47Qatar 53ST720.0150.061S
49Qatar 61ST740.0150.061S
51Qatar 63ST740.0150.061S
52Qatar 64ST720.0150.061S
55Qatar 73ST740.0150.1251S
57CAS08-0094ST76> 64241MDR

AMB: Amphotericin B, MDR: Multidrug-resistant, S: Susceptible, ECR: Echinocandin-resistant, FLZR: Fluconazole-resistant, WGS: Whole-genome sequenced.

List of clinical Candida glabrata isolates used in this study. AMB: Amphotericin B, MDR: Multidrug-resistant, S: Susceptible, ECR: Echinocandin-resistant, FLZR: Fluconazole-resistant, WGS: Whole-genome sequenced. To enrich the genomes available for comparative purposes and due to the limited number of high-quality genome data, we performed whole-genome PacBio and Illumina sequencing of seven isolates, DPK305, CAS08-0425, DPK762, DPL1021, CAS080027, DPL245, CAS08-0016 (Table 1), which belonged to three STs (10, 15, and 16) and had different susceptibility profiles.

Antifungal susceptibility testing (AFST)

Antifungal susceptibility testing (AFST) followed the CLSI-M60 (Clinical and Laboratory Standards Institute. 2017) protocol and included fluconazole (Pfizer, New York, NY, USA), amphotericin B (AMB) (Sigma-Aldrich, Milwaukee, WI, USA), micafungin (Astellas Pharma, Tokyo, Japan), and anidulafungin (Pfizer). Plates containing antifungal drug-C. glabrata cell suspensions were incubated at 37 °C for 24 h and minimum inhibitory concentrations (MICs) were visually recorded. Type strains of C. krusei (ATCC 6258) and C. parapsilosis (ATCC 22019) were used in each individual AFST experiment for quality control purposes. Susceptibility profiles were denoted based on the availability of the clinical breakpoints and epidemiological cut-off values (ECVs) as suggested (Pfaller & Diekema 2012). Briefly, isolates showing MIC ≥ 64 μg/ml, ≥ 0.5 μg/ml, and ≥ 0.25 μg/ml were regarded as fluconazole-resistant, anidulafungin-resistant, and micafungin-resistant, respectively (Pfaller & Diekema 2012). The MIC data of AMB was interpreted based on ECVs and isolates showing a MIC > 2μg/ml were regarded as non-wild-type (Pfaller & Diekema 2012).

DNA extraction

The DNA extraction method varied depending on the sequencing method used. DNA samples subjected to Illumina and Sanger sequencing were extracted using the Quick-DNA Fungal/Bacterial Miniprep Kit (ZymoResearch, Irvine, CA, USA) following the protocol suggested by manufacturer. DNA isolation for PacBio sequencing followed an old-fashioned DNA isolation protocol. Briefly, overnight C. glabrata cultures were harvested by centrifugation, the cells were disrupted by vortexing in lysis buffer (100 mM Tris pH 8.0; 50 mM EDTA; 1 % SDS), and the liquid phase was collected. This was followed by incubation with ammonium sulphate, followed by chloroform extraction, and isopropanol precipitation. Finally, the pellets were washed with 70 % ethanol, air-dried, and resuspended in RNase/DNase free water.

Illumina library preparation and WGS

Genomic DNA was quantified using the Qubit v. 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA). The DNA integrity was checked with ∼1 % agarose gel with 50–100 ng sample loaded in each well. Samples were then chosen for library preparation based on the QC results. NEBNext® Ultra™ II DNA Library Prep Kit for Illumina, clustering, and sequencing reagents were used throughout the process following the manufacturer’s recommendations. Briefly, the genomic DNA was fragmented by acoustic shearing with a Covaris S220 instrument. Fragmented DNA was cleaned up and end repaired. Adapters were ligated after adenylation of the 3’ ends followed by enrichment by limited cycle PCR. DNA libraries were validated using a DNA 1000 Chip on the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA), and were quantified using Qubit v. 2.0 Fluorometer. The DNA libraries were also quantified by real-time PCR (Applied Biosystems, Carlsbad, CA, USA), clustered on one lane of a flowcell, and loaded on the Illumina HiSeq instrument according to manufacturer’s instructions. The samples were sequenced using a 2× 150 paired-end (PE) configuration. Image analysis and base calling were conducted by the HiSeq Control Software (HCS) on the HiSeq instrument. Raw sequence data (.bcl files) generated from Illumina HiSeq were converted into fastq files and de-multiplexed using Illumina's bcl2fastq v. 2.17 software. One mismatch was allowed for index sequence identification.

PacBio library preparation and genome assembly

PacBio DNA sequencing of C. glabrata strains was performed at the Waksman Institute, Rutgers University. Library preparation closely followed the multiplexed microbial library protocol provided by Pacific Biosciences. Briefly, the DNA quality was evaluated using a Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) and Nanodrop spectrophotometer (Thermo Fisher Scientific). Two micrograms genomic DNA was fragmented using Magaruptor (Diagenode, Glen Ridge, NJ, USA) with the 10 kbp fragments settings according to the manufacturer's instructions. This is followed by 0.45× AMPure PB Bead Purification and removal of Single-Strand Overhangs. Samples were then subjected to DNA Damage Repair, End Repair, A-tailing and ligation to prepare SMRTbell DNA template libraries. After another round of 0.45× PB Bead Purification libraries were pooled in equimolar ratio and size selected with BluePippin (Sage Science, Beverly, MA, USA). Library quality was analysed by Qubit, and average fragment size was estimated using an Agilent Fragment Analyzer (Agilent, Santa Clara, CA, USA). SMRT sequencing was performed using a Pacific Biosciences Sequel I sequencer (PacBio, Menlo Park, CA, USA) and standard protocols (MagBead Standard Seq v. 2 loading, 600 min movie) using SMRT Cell 1M v. 2. Single molecule real-time sequencing reads were demultiplexed using SMRT Analysis software suite v. 5.1 (Pacific Biosciences Inc., Menlo Park, CA, USA) and de novo assembled using the CANU v. 1.7 workflow (Koren ). Scaffolding was performed using the SSPACE long-read scaffolder (Boetzer & Pirovano 2014). Genome assembly was further improved for assembly contiguity using PBJelly (English ). The PBJelly running parameters were as follows: -minMatch 8 -sdpTupleSize 8 -minPctIdentity 75 -bestn 1 -nCandidates 10 -maxScore -500 -nproc 13 -noSplitSubreads. WGS data obtained from Illumina was used to correct the those generated by PacBio. SNP calling was performed by aligning reads from each strain to each of the other strains using BWA mem v. 0.7.17-r1188 (Li & Durbin 2009) and GATK v. 4.0.4.0 (DePristo ).

Annotation and gene prediction

Gene prediction was performed using a combination of methods. Reference sets were formed by the annotation of the reference genome of C. glabrata, and the collection of proteomes used in YGOB (Byrne & Wolfe 2005). Exonerate v. 2.4.0 (Slater & Birney 2005) was used to search for the presence of each protein in YGOB against each of the genomes. RATT (Otto ) (downloaded 2018) from the PAGIT package was used to transfer annotations from the C. glabrata reference genome to the strain genomes; it was run in three modes: strain, species and multi, and the best transference was kept. YGAP web server (Proux-Wéra ) (accessed October 2020) was then used to obtain a second gene prediction. MAKER2 v. 2.31.10 (Holt & Yandell 2011) was the third annotation program used. Results from RATT, YGAP, MAKER and exonerate were joined together into a single gene prediction using EVM (Haas ) (downloaded 2018). This annotation was then improved by searching for the presence of genes that were predicted in the C. glabrata reference but not included in the strain annotation. As C. glabrata strains tend to be highly syntenic, the pipeline first associated each predicted protein in the strain genome to the reference genome using a best reciprocal hit approach. Then, based on location, it tried to associate unmatched genes if enough similarity was found even if they were not best reciprocal hits. In the same way it corrected spurious matches that were not congruent with the gene order conservation. Once the list of missing genes was found, the pipeline scanned the RATT annotation for those genes. If found they were incorporated to the gene prediction, if not the pipeline located surrounding genes and then used GenomeThreader v. 1.7.1 (Gremme ) to search the intergenic space between those genes for the presence of the missing genes. In a last step, for genes still not found, the whole genome was scanned for their presence.

Identification of highly polymorphic loci (HPL) and primer design

Individual chromosomes of the seven clinical isolates (Table 1) were aligned with those of the reference strain CBS138/ATCC2001 (Xu ) and strain DSY562 (Vale-Silva ) using the Mauve alignment tool (DNASTAR, Madison, USA). Highly polymorphic loci (HPL) were defined based on the presence of indels that were different between strains of different STs. To avoid overlap with previously published sets of loci, HPL associated with satellite regions, megasatellites, minisatellites, and microsatellites, were excluded (Thierry ). Loci obtained were subjected to gene ontology (GO) enrichment using FungiFun (Priebe ), and those involved in cellular integrity pathways (osmotic stress tolerance, cell wall integrity and membrane integrity pathways, etc.) were selected for primer design. The primers were chosen to amplify the shortest possible regions without losing any indel information. All primer information is listed in Table 2.
Table 2

Oligos used in this study, their functions, and their PCR conditions.

Oligo nameFunctionOligo sequence (5'--3')
USA1-FPCR/SequencingCCTGGAGAAGATGTATGTGTT
USA1-R
PCR/Sequencing
TCTTCATGGTCGTGCTGAT
DUF1-FPCR/SequencingTATCAAGTGTGTGGTGCC
DUF1-R
PCR/Sequencing
CGTGTTCGACAGATTGTCC
SLG1-FPCR/SequencingGATGCTACTTATACTGGCGG
SLG1-R
PCR/Sequencing
TCAATCGGTTTCGTCTGG
HKR1-FPCR/SequencingGAGAAAGCTATTGCTTTTGGT
HKR1-R
PCR/Sequencing
TCAATAGATGATGGCTGCAC
H09053-FPCR/SequencingAATACGAAAGCCACGACG
H09053-R
PCR/Sequencing
ATATCTGGAATGCACTACCTG
A02255-FPCR/SequencingTGGATTCGAATGAGGGACT
A02255-R
PCR/Sequencing
CGAGTTCACACCGATTATCC
G00825-FexPCR/SequencingTCAAATGCTCCTCCTGGC
C00825-F1SequencingCTCTACAGGCGGCAAAA
G00825-R1SequencingGAATAATTAGAGTCGCCTCCG
25F-NSequencingTGG CAG AAA TAA ACG CCA G
G00825-Rex
PCR/Sequencing
CCAACATCAATTTCAGGAGC
SSR1-FPCR/SequencingGAGCTGGAACTCGATCCG
SSR1-R
PCR/Sequencing
AGGAAGGGGAGTAATGATGG
PIR2-FPCR/SequencingATGCAATACAAAAAGACTCTAGC
PIR2-R
PCR/Sequencing
TGGATCATTTGGTGCCTTAC
PIR1-FPCR/SequencingCTTCTTCCTCTGTCGCTAAG
PIR1-R
PCR/Sequencing
CAATTTGAGAAGCAGCGC
C00715-FPCR/SequencingAGCCTCTGTCCCTACTTTATC
C00715-R
PCR/Sequencing
GTCGCTGTTGGTGTGTAA
G03839-FPCR/SequencingCCCAATCCCTTTCTCTGCT
G03839-R
PCR/Sequencing
TATCCTCTTCTCATCGCTCG
G06281-FPCR/SequencingGATGTTATGGTCTAGCTTTGC
G06281-RPCR/SequencingCTGCCTATCTTAGATTGCTAGA

Note that except for G00825 and PIRI, the rest of the loci had the same PCR program (94 °C 5 mins, 35 cycles of (94 °C 30 s, 60 °C 30 s, 72 °C 40 s), 72 °C 8 mins). PCR programs for G00825 was 94 °C 5 mins, 35 cycles of (94 °C 30 s, 58 °C 30 s, 72 °C 2 mins), 72 °C 8 mins and for PIRI was 94 °C 5 mins, 35 cycles of (94 °C 30 s, 58 °C 30 s, 72 °C 40 s), 72 °C 8 mins.

Oligos used in this study, their functions, and their PCR conditions. Note that except for G00825 and PIRI, the rest of the loci had the same PCR program (94 °C 5 mins, 35 cycles of (94 °C 30 s, 60 °C 30 s, 72 °C 40 s), 72 °C 8 mins). PCR programs for G00825 was 94 °C 5 mins, 35 cycles of (94 °C 30 s, 58 °C 30 s, 72 °C 2 mins), 72 °C 8 mins and for PIRI was 94 °C 5 mins, 35 cycles of (94 °C 30 s, 58 °C 30 s, 72 °C 40 s), 72 °C 8 mins.

MLST schemes used and tree construction

The original MLST scheme developed by Dodgson was used as the gold standard (Dodgson ). Strain types (STs) were determined with the use of the https://pubmlst.org/about-us database. For each strain, a BLAST search was performed to determine the allele identity of each gene. Then the allelic combination was introduced in order to identify the STs. In one case, the allelic combination did not produce a known sequence type, and was named STX. The trees obtained from the MLST scheme based on HPL were categorised as follows: trees that were obtained based on the concatenation of all 13 HPL, those constructed based on each individual HPL, those generated based on combination of two most resolutive HPL, and trees constructed based on the combination of the most resolutive HPL in combination with each loci of the original MLST scheme (Dodgson ).

Robinson and Foulds comparison

ETE v. 3 was used to calculate the normalized Robinson and Foulds (RF) distances between trees (Huerta-Cepas ). To calculate this value trees were first rooted to one of the leaves so that all trees had potentially the same structure. The RF calculates the number of common partitions between two trees. Normalization was carried out by dividing this value over the total number of partitions found in both trees. A high RF indicates little overlap between the two trees, although a single leaf moving to a very different position in the tree can cause a marked increase in the RF metric as it will affect many partitions.

Clade comparison

We used ETE v. 3 to assess the monophyly of sequences from the same ST. This would indicate that the sequences used to reconstruct the tree have the capacity to correctly classify strains into the same clades as the MLST genes. In order to quantify the consistency between clades we calculated the precision, recall and F1 values of each ST in the following way: for each node in the tree, we established to which ST most of the sequences belong. Considering all OTUs (strains) contained within this partition, we considered as true positives (TP) the strains that belonged to the ST, and false positives (FP) those belonging to alternative STs, how many strains from the same ST appeared outside that partition, false negatives (FN) and how many leaves from the rest of the tree were from a different ST, true negatives (TN). For each node we calculated the precision, recall and F1 for the chosen ST. Then, for each ST we selected the node that offered the best F1. So for a ST that was found completely monophyletic we would obtain a F1 of 1.0. For each tree, we calculated the average precision, recall and F1 of all the ST groups.

Results

Genome variation among the isolates

The workflow of this study is shown in Fig. 1. Seven clinical isolates belonging to ST10 (28, 29, 35), 15 (30 and 33), and 16 (38 and 40) (Table 1), were processed for WGS by long-read (PacBio) and short-read (Illumina) methods. Their genomes were then assembled and compared to each other and to the reference strain CBS138/ATCC2001 (Xu ). Consistent with expectations, this analysis showed that 387 to 1 063 SNPs separated strains of the same ST, whereas strains of different STs were 57 476 to 75 811 SNPs apart.
Fig. 1

The workflow of finding hyperpolymorphic loci (HPL) in this study.

The workflow of finding hyperpolymorphic loci (HPL) in this study. We also analysed the genomes for large-scale chromosomal variations known to frequently characterize C. glabrata strains (Carreté ) (Fig. 2). Figure 2 shows the comparison between the seven newly sequenced genomes of C. glabrata and the CBS138 reference genome. Inversions, in red, were not very common. The three main inversions observed were a small inversion in chromosome C present in four of the seven strains (ATCC90030 (#35), DPK305 (#28), DPL245 (#38) and CAS08-0027 (#30)), a large (> 600 Kb) inversion in chromosome L in DPL245 (#38), and a smaller inversion in chromosome L in CAS08-0016 (#40) just adjacent to previous one. None of the inversions were specific to a given ST. For instance, DPL245 (#38) and CAS08-0016 (#40) belong to the same ST and they both have an inversion in chromosome L, but these inversions do not affect the same region. The inversion in chromosome C is shared by two isolates of clade ST10 (ATCC90030 (#35) and DPK305 (#28)), but not present in the third one (CAS08-0425 (#29)). However, this inversion is also present in DPK762 (#33) (ST15) but not in the other ST15 strains (CAS08-0027 (#30) and the reference strain CBS138). Translocations are more common, though they predominantly affect telomeric regions and therefore could be the result of either miss-assemblies or miss-alignments. Still, some of the translocations are consistent across isolates and had already previously been described (Carreté ). For instance, in four of the seven strains we see a translocation from chromosome L to chromosome D. This translocation is found in two of the three isolates of ST10 (ATCC90030 (35) and DPK305 (28) and in the two strains of ST16 (DPL245 (38) and CAS08-0016 (40)). A second translocation moved a piece of chromosome D to chromosome L, affecting two of the three strains of ST10 (ATCC90030 (35) and CAS08-0425 (29)). Three additional translocations affected DPL245 (38): pieces of chromosome D and chromosome L moved to chromosome I and a large part of chromosome I was found attached to chromosome L. Finally, pieces of chromosomes I and J were interchanged in the genome of CAS08-0027 (30). This translocation was not observed previously.
Fig. 2

Circos plots representing inversions (red) and translocations (blue) between the genomes of the C. glabrata reference strain (dark yellow on the right of each circle) and each individual strain (light yellow on the left of each circle). Chromosomes are ordered as mirror images of each other, starting with chromosome A at the top and ending with chromosome M at the bottom of the image.

Circos plots representing inversions (red) and translocations (blue) between the genomes of the C. glabrata reference strain (dark yellow on the right of each circle) and each individual strain (light yellow on the left of each circle). Chromosomes are ordered as mirror images of each other, starting with chromosome A at the top and ending with chromosome M at the bottom of the image. Although changes in genome structure were not consistent with ST assignments, ST16 is the one most distantly related to the reference (CBS 138), both in terms of chromosome structure and DNA sequence (Fig. 2).

HPL selection and GEO

To identify discriminative HPL in C. glabrata that may be resolutive for ST, we aligned individual chromosomes from the seven newly assembled genomes with those from reference strain CBS138 and strain DSY562, for which high quality WGS is available (Vale-Silva , Xu ). Highly polymorphic loci (HPL) were defined based on the presence of indels that were different between strains of different STs. To avoid overlap with previously published sets of loci, HPL associated with satellite regions, megasatellites, minisatellites, and microsatellites, were excluded (Thierry ). In total 33 HPL were identified (Table 3), which were subjected to GO analysis using FungiFun (Priebe ). Since the functions of most of the genes in C. glabrata are not characterized, their orthologs in Saccharomyces cerevisiae (https://www.yeastgenome.org) were used for GO analysis. These loci were found to be significantly enriched for stress response pathways, particularly those responding to cell surface (cell wall and plasma membrane) stress (Fig. 3), with 14 out of 33 loci containing genes involved in these processes. Our subsequent analyses focused on this subset of loci, with the exception of UBI4, which showed variation even among strains of the same ST and was therefore omitted, leaving 13 loci (Table 3).
Table 3

Polymorphic loci of interest identified through comparative whole-genome sequencing.

Polymorphic lociStandard namesSystematic names
CHS5/ CAGL0G00814gCHS5YLR330W
G00715/ CAGL0G00858g1IEA with MID2YLR332W
G00825/ CAGL0G00968g1VRP1YLR337C
SRP40/ CAGL0G02409gSRP40YKR092C
G03839/ CAGL0G03993g1N/ANot available
G06281/ CAGL0G06446g1N/ANot available
G07117/ CAGL0G07293g1USA1YML029W
SSR1/ CAGL0H06413g1CCW14YLR390W-A
H09053/ CAGL0H09152g1PTK1YKL198C
PIR2/ CAGL0I06182gSHP150YJL159W
I07645/ CAGL0I07821g1DUF1YOL087C
K01793/ CAGL0K01947gYER137CYER137C
K06501/ CAGL0K06655gNGR1YBR212W
K09845/ CAGL0K10010gPRP8YDR083W
PIR3/ CAGL0M08492g1PIR1YKL164C
M09053/ CAGL0M09086gBUD4YJR092W
M09075/ CAGL0M09108gJSN1YJR091C
GVI51_A02255/ CAGL0A02486g1SEB2YDR351W
A00825/ CAGL0A01001g1YLR326WYLR326W
A02431/ CAGL0A02651gEAF1YDR359C
C01617/ CAGL0C01859gRER1YCL001W
ARB1/ CAGL0C02343gARB1YER036C
C02321/ CAGL0C02541gBDF1YLR399C
POP2/ CAGL0C03399gPOP2YNR052C
ABP1/ CAGL0C03597gABP1YCR088W
D02871/ CAGL0D02926gBRE2YLR015W
D05049/ CAGL0D05082gUBI4YLL039C
E00341/ CAGL0E00561gTUP1YCR084C
SWI5/ CAGL0E01331gSWI5YDR146C
STP8/ CAGL0F01837gSPT8YLR055C
SLG11SLG1YOR008C
HKR1/ CGAL0F03003g1HKR1YDR420W
F03333/ CAGL0F03641gYML018CYML018C

These were the 13 polymorphic loci that were selected for MLST analysis in this study.

Fig. 3

The gene ontology enrichment to select the polymorphic loci (PL) of interest among a pool of candidate loci. Eleven loci involved in cellular integrity and stress response pathways and two loci lacking orthologs in Saccharomyces cerevisiae were selected for further analysis.

Polymorphic loci of interest identified through comparative whole-genome sequencing. These were the 13 polymorphic loci that were selected for MLST analysis in this study. Statistics for trees based on individual genes. MLST refers to the conventional MLST approach employing six loci to type C. glabrata and PLOI refers to the approach proposed in this study. The resolution of each approach and locus is evaluated by three factors, namely precision, recall, and F1. The higher the overall score, the more resolutive is the approach/locus. PLOI: Polymorphic loci of interest.

Comparison of concatenated original MLST and HPL trees

To identify the most resolutive HPL, we sequenced the 13 HPL (Table 3) as well as the six loci used in the traditional MLST scheme in 53 additional C. glabrata clinical isolates belonging to 13 STs (Table 1). We built a tree based on the concatenation of the six traditional MLST genes and mapped ST information on it in different colors (Fig. 4A). The same was done with the 13 HPL (Fig. 4B). The trees were rooted at strain 57 as it is the only member of ST76. The two trees were very different in terms of the Robinson and Foulds calculation (RF), which is 0.72, meaning that 72 % of the nodes defined in the tree were different. These differences could be due to either the re-arrangements within STs, re-arrangements among STs, or strains moving between STs.
Fig. 4

A. Tree resulting from the concatenation of six traditional MLST genes. Colours represent the ST group each strain belongs to. Bootstraps below 90 % are shown on the tree. B. Tree resulting from the concatenation of 13 hypervariable regions.

The gene ontology enrichment to select the polymorphic loci (PL) of interest among a pool of candidate loci. Eleven loci involved in cellular integrity and stress response pathways and two loci lacking orthologs in Saccharomyces cerevisiae were selected for further analysis. In order to compare the performance of the two means of generating phylogenetic trees (HPL vs traditional MLST loci), we computed the precision, recall and F1 for each ST and then calculated the average precision, recall and F1 for the whole tree. The MLST concatenated tree, which can serve as baseline, showed a precision of 1.0, a recall of 1.0 and a F1 of 1.0, whereas the concatenated of the HPL had a slightly lower precision 0.99, a recall of 0.1.0 and a F1 of 0.99. While the tree based on MLST data recovered all ST perfectly, the tree based on HPL only failed in retrieving ST03 as a monophyletic clade due to the presence of STX.

Comparison of single gene trees

We repeated the same comparison with trees built for each single gene in the analysis. As seen in Table 4, out of the six MLST genes, NMT1 was best able, on its own, to capture the variability of ST, though it did not perform as well as the concatenated assembly. Interestingly, two HPL were able to capture ST information better than NMT1: G00825 and SLG1. As seen in Fig. 5, NMT1 was unable to distinguish between ST10 and ST15, which is not surprising because the distinction between these two ST is based on the classification of LEU2 and URA3. It is also unable to separate STX from ST03, whose separation is again determined by a different gene (TRP1). In contrast, the two best HPL were able to separate ST10 from ST15 but could not separate STX from ST03 (Fig. 5).
Table 4

Statistics for trees based on individual genes. MLST refers to the conventional MLST approach employing six loci to type C. glabrata and PLOI refers to the approach proposed in this study. The resolution of each approach and locus is evaluated by three factors, namely precision, recall, and F1. The higher the overall score, the more resolutive is the approach/locus.

LociApproachPrecisionRecallF1
FKS1MLST0.960.870.88
LEU2MLST0.870.750.74
NMT1MLST0.930.930.93
TRP1MLST0.960.810.83
UGP1MLST0.790.850.77
URA3MLST0.920.780.79
A02255PLOI0.950.830.85
DUF1PLOI0.880.870.84
G00715PLOI0.830.910.86
G00825PLOI0.991.00.99
G03839PLOI0.920.820.82
G06281PLOI0.920.810.82
H09053PLOI0.920.810.83
HKR1PLOI0.970.900.91
PIR1PLOI0.900.860.85
PIR2PLOI0.940.880.90
SLG1PLOI0.991.00.99
SSR1PLOI0.960.950.95
USA1PLOI0.900.930.90

PLOI: Polymorphic loci of interest.

Fig. 5

Single gene trees belonging to A. NMT1, B. G00825 and C. SLG1. Clades are colored according to the ST group. In NMT1 ST10 and ST15 are colored in the same color as they are identical in all strains.

A. Tree resulting from the concatenation of six traditional MLST genes. Colours represent the ST group each strain belongs to. Bootstraps below 90 % are shown on the tree. B. Tree resulting from the concatenation of 13 hypervariable regions.

Comparison of concatenated gene trees using combinations of loci

STs are currently assigned based on the allelic information provided by six genes. As single genes are not able to completely recover all ST groups as monophyletic clades, we tried to find a combination of six or fewer genes that could perform better than the concatenated MLST tree. To do that, we generated concatenated alignments of combinations of 2, 3, 4, 5 and 6 HPL and checked them for the distribution of ST. We then calculated the average precision, recall and F1 (Table 5). We found that none of the combinations performed as the concatenated MLST tree. However, 21 different combinations of two HPL genes were able to recover the same ST distribution as found by the concatenation of HPL, showing that few genes have as much resolutive power as the 13 selected HPL (Table 6 and Supplementary Table 1).
Table 5

Summary statistics for each group of trees

StatisticsConcatenated tree of MLSTConcatenated tree of PLOIs1 gene2 genes3 genes4 genes5 genes6 genes
Average Precision--0.930.970.980.980.980.98
Average Recall--0.890.940.970.980.990.995
Average F1--0.890.950.970.980.980.99
Maximum F11.00.990.990.990.990.990.990.99

PLOI: Polymorphic loci of interest.

Table 6

Best combinations of two polymorphic loci.

Polymorphic loci combinationAlignment lengthPrecision
A02255;SLG11 5490.99
DUF1;G008252 1200.99
DUF1;SLG11 5730.99
G00825;G038392 2800.99
G00825;G062812 2290.99
G00825;HKR12 8740.99
G00825;SLG13 1650.99
G00825;SRR12 5190.99
G06281;SRR11 0360.99
H09053;HKR11 2520.99
H09053;SLG11 5430.99
HKR1;PIR22 0350.99
HKR1;SLG12 3270.99
HKR1;SRR11 6810.99
HKR1;USA11 5580.99
PIR2;SLG12 3260.99
PIR2;SRR11 6800.99
PIR2;USA11 5570.99
SLG1;SRR11 9720.99
SLG1;USA11 8490.99
SRR1;USA11 2030.99
Summary statistics for each group of trees PLOI: Polymorphic loci of interest. Best combinations of two polymorphic loci. Finally, we tested whether combining MLST genes with HPL would produce better results. Two combinations of TRP1 with a single HPL (either SLG1 or G00825) resulted in a tree which was as good as the concatenated tree based on the six MLST loci (Fig. 5, Fig. 6).
Fig. 6

Trees based on the concatenation of two genes. A.G00825 + TRP1. B.SLG1 + TRP1.

Single gene trees belonging to A. NMT1, B. G00825 and C. SLG1. Clades are colored according to the ST group. In NMT1 ST10 and ST15 are colored in the same color as they are identical in all strains. Trees based on the concatenation of two genes. A.G00825 + TRP1. B.SLG1 + TRP1.

Conclusions

Understanding the genetic diversity of C. glabrata isolates has broad clinical and biological implications ranging from aiding effective implementation of infection control strategies in hospitals to understanding within-host microevolution, which can help answer key questions regarding host-pathogen interaction and the development of antifungal resistance. In this study we used WGS and comparative genomic analyses to identify a new MLST scheme based not only on SNPs but also indels, and, using only two loci, offers the same resolution as that provided by the widely used six-locus MLST scheme. Using 53 clinical C. glabrata isolates originating from various geographical regions and belonging to different STs, we showed that the two-locus scheme was in 100 % agreement with the traditional MLST approach. Also, we note that except for isolate CAS08-0293 for which the ST could not be distinguished with HPL, the ST of the rest of the isolates were successfully determined when testing only SLG1 or G00825. In summary, the newly described MLST scheme can be used as a reliable approach for high throughput typing purposes in clinical settings and large-scale studies aiming to understand the genetic diversity of C. glabrata globally. Additionally, the high-quality whole genomes of seven clinical strains reported in this study can serve as references for future short read based WGS analyses of C. glabrata, helping the field develop new diagnostic tools and address fundamental biological questions.

Data availability

The project has been deposited in GenBank under the Bioproject number PRJNA718446. The raw read data are deposited under accession number SRR8146337. Read data and genome assemblies can be found under SRA codes SRR14180810 to SRR14180816 for the Illumina reads and codes SRR14163270 to SRR14163276 for PacBio reads. MLST and PL sequences were deposited in GenBank under codes MW970414 to MW971421. Accession numbers associated with the isolates whole-genome sequenced were as follows, JAGTUA000000000 (CAS08-0425), JAGTTZ000000000 (CAS80027), JAGTTY000000000 (DPK762), JAGTTX000000000 (ATCC 90030), JAGTTW000000000 (DPL245), JAGTTV000000000 (CAS08-0016), JAGTTU000000000 (DPK 305).
  58 in total

Review 1.  Molecular Typing of Candida glabrata.

Authors:  Toni Gabaldón; Emilia Gómez-Molero; Oliver Bader
Journal:  Mycopathologia       Date:  2019-10-15       Impact factor: 2.574

2.  Incidence, characteristics and outcome of ICU-acquired candidemia in India.

Authors:  Arunaloke Chakrabarti; Prashant Sood; Shivaprakash M Rudramurthy; Sharon Chen; Harsimran Kaur; Malini Capoor; Deepinder Chhina; Ratna Rao; Vandana Kalwaje Eshwara; Immaculata Xess; Anupama J Kindo; P Umabala; Jayanthi Savio; Atul Patel; Ujjwayini Ray; Sangeetha Mohan; Ranganathan Iyer; Jagdish Chander; Anita Arora; Raman Sardana; Indranil Roy; B Appalaraju; Ajanta Sharma; Anjali Shetty; Neelam Khanna; Rungmei Marak; Sanjay Biswas; Shukla Das; B N Harish; Sangeeta Joshi; Deepak Mendiratta
Journal:  Intensive Care Med       Date:  2014-12-16       Impact factor: 17.440

3.  Profiling of PDR1 and MSH2 in Candida glabrata Bloodstream Isolates from a Multicenter Study in China.

Authors:  Xin Hou; Meng Xiao; He Wang; Shu-Ying Yu; Ge Zhang; Yanan Zhao; Ying-Chun Xu
Journal:  Antimicrob Agents Chemother       Date:  2018-05-25       Impact factor: 5.191

4.  FungiFun2: a comprehensive online resource for systematic analysis of gene lists from fungal species.

Authors:  Steffen Priebe; Christian Kreisel; Fabian Horn; Reinhard Guthke; Jörg Linde
Journal:  Bioinformatics       Date:  2014-10-07       Impact factor: 6.937

5.  Comparative Genomics of Two Sequential Candida glabrata Clinical Isolates.

Authors:  Luis Vale-Silva; Emmanuel Beaudoing; Van Du T Tran; Dominique Sanglard
Journal:  G3 (Bethesda)       Date:  2017-08-07       Impact factor: 3.154

6.  Multilocus Sequence Typing (MLST) Genotypes of Candida glabrata Bloodstream Isolates in Korea: Association With Antifungal Resistance, Mutations in Mismatch Repair Gene (Msh2), and Clinical Outcomes.

Authors:  Seung A Byun; Eun Jeong Won; Mi-Na Kim; Wee Gyo Lee; Kyungwon Lee; Hye Soo Lee; Young Uh; Kelley R Healey; David S Perlin; Min Ji Choi; Soo Hyun Kim; Jong Hee Shin
Journal:  Front Microbiol       Date:  2018-07-13       Impact factor: 5.640

7.  Low Level of Antifungal Resistance in Iranian Isolates of Candida glabrata Recovered from Blood Samples in a Multicenter Study from 2015 to 2018 and Potential Prognostic Values of Genotyping and Sequencing of PDR1.

Authors:  Amir Arastehfar; Farnaz Daneshnia; Kamiar Zomorodian; Mohammad Javad Najafzadeh; Sadegh Khodavaisy; Hossein Zarrinfar; Ferry Hagen; Zahra Zare Shahrabadi; Michaela Lackner; Hossein Mirhendi; Mohammadreza Salehi; Maryam Roudbary; Weihua Pan; Markus Kostrzewa; Teun Boekhout
Journal:  Antimicrob Agents Chemother       Date:  2019-06-24       Impact factor: 5.191

8.  Genome Comparisons of Candida glabrata Serial Clinical Isolates Reveal Patterns of Genetic Variation in Infecting Clonal Populations.

Authors:  Laia Carreté; Ewa Ksiezopolska; Emilia Gómez-Molero; Adela Angoulvant; Oliver Bader; Cécile Fairhead; Toni Gabaldón
Journal:  Front Microbiol       Date:  2019-02-12       Impact factor: 5.640

9.  Epidemiology of yeast species causing bloodstream infection in Tehran, Iran (2015-2017); superiority of 21-plex PCR over the Vitek 2 system for yeast identification.

Authors:  Mohammad Kord; Mohammadreza Salehi; Sadegh Khodavaisy; Seyed Jamal Hashemi; Roshanak Daie Ghazvini; Sassan Rezaei; Ayda Maleki; Ahmad Elmimoghaddam; Neda Alijani; Alireza Abdollahi; Mahsa Doomanlou; Kazem Ahmadikia; Niloofar Rashidi; Weihua Pan; Teun Boekhout; Amir Arastehfar
Journal:  J Med Microbiol       Date:  2020-04-28       Impact factor: 2.472

10.  Fluconazole-Resistant Candida glabrata Bloodstream Isolates, South Korea, 2008-2018.

Authors:  Eun Jeong Won; Min Ji Choi; Mi-Na Kim; Dongeun Yong; Wee Gyo Lee; Young Uh; Taek Soo Kim; Seung Ah Byeon; Seung Yeob Lee; Soo Hyun Kim; Jong Hee Shin
Journal:  Emerg Infect Dis       Date:  2021-03       Impact factor: 6.883

View more
  1 in total

1.  Chromosome-level assemblies from diverse clades reveal limited structural and gene content variation in the genome of Candida glabrata.

Authors:  Marina Marcet-Houben; María Alvarado; Ewa Ksiezopolska; Ester Saus; Piet W J de Groot; Toni Gabaldón
Journal:  BMC Biol       Date:  2022-10-08       Impact factor: 7.364

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.