Soon-Hwan Oh, Klaus Schliep, Allyson Isenhower, Rubi Rodriguez-Bobadilla, Vien M Vuong, Christopher J Fields, Alvaro G Hernandez, Lois L Hoyer.
Abstract
The Candida albicans agglutinin-like sequence (ALS) family is studied because of its contribution to cell adhesion, fungal colonization, and polymicrobial biofilm formation. The goal of this work was to derive an accurate census and sequence for ALS genes in pathogenic yeasts and other closely related species, while probing the boundaries of the ALS family within the Order Saccharomycetales. Bioinformatic methods were combined with laboratory experimentation to characterize 47 novel ALS loci from 8 fungal species. AlphaFold predictions suggested the presence of a conserved N-terminal adhesive domain (NT-Als) structure in all Als proteins reported to date, as well as in S. cerevisiae alpha-agglutinin (Sag1). Lodderomyces elongisporus, Meyerozyma guilliermondii, and Scheffersomyces stipitis were notable because each species had genes with C. albicans ALS features, as well as at least one that encoded a Sag1-like protein. Detection of recombination events between the ALS family and gene families encoding other cell-surface proteins such as Iff/Hyr and Flo suggests widespread domain swapping with the potential to create cell-surface diversity among yeast species. Results from the analysis also revealed subtelomeric ALS genes, ALS pseudogenes, and the potential for yeast species to secrete their own soluble adhesion inhibitors. Information presented here supports the inclusion of SAG1 in the ALS family and yields many experimental hypotheses to pursue to further reveal the nature of the ALS family.
Keywords: ALS genes; adhesion; comparative genomics; fungi; protein structure; repeated sequences
Year: 2021 PMID: 34970511 PMCID: PMC8712946 DOI: 10.3389/fcimb.2021.794529
Source DB: PubMed Journal: Front Cell Infect Microbiol ISSN: 2235-2988 Impact factor: 6.073
Principal genome sequences used for ALS gene detection.
| Species | Strain | Assembly Name | GenBank Assembly Accession | Reference |
|---|---|---|---|---|
| Candida albicans | SC5314 | ASM18296v3 | GCA_000182965.3 | |
| Candida dubliniensis | CD36 | ASM2694v1 | GCA_000026945.1 | |
| Candida parapsilosis | CDC317 | ASM18276v2 | GCF_000182765.2 | |
| Candida orthopsilosis | Co 90-125 | ASM31587v1 | GCA_000315875.1 | |
| | | ASM433491v1 | GCA_004334915.1 | |
| Candida metapsilosis | ATCC 96143 | UIUC_Cmeta_2.0 | GCA_008904905.1 | |
| Candida tropicalis | MYA-3404 | ASM633v3 | GCA_000006335.3 | |
| | | ASM694213v1 | GCA_006942135.1 | |
| | | ASM1731540v1 | GCA_017315405.1 | |
| Meyerozyma guilliermondii | ATCC 6260 | ASM14942v1 | GCA_000149425.1 | |
| | | ASM694215v1 | GCA_006942155.1 | This work |
| Clavispora lusitaniae | ATCC 42720 | ASM383v1 | GCA_000003835.1 | |
| Scheffersomyces stipitis | CBS 6054 | ASM20916v1 | GCA_000209165.1 | |
| | | ASM694211v1 | GCA_006942115.1 | This work |
| Debaryomyces hansenii | CBS 767 | ASM644v2 | GCA_000006445.2 | |
| Lodderomyces elongisporus | NRRL YB-4239 | ASM14968v1 | GCA_000149685.1 | |
| | | ASM1362098v1 | GCA_013620985.1 | This work |
| Candida tenuis | ATCC 10573 | Candida tenuis v1.0 | GCA_000223465.1 | |
| Spathaspora passalidarum | NRRL Y-27907 | S passalidarum v2.0 | GCA_000223485.1 | |
| | | ASM1362096v1 | GCA_013620965.1 | This work |
| Pichia kudriavzevii | ATCC 6258 | ASM305444v1 | GCA_003054445.1 | |
| Candida glabrata | CBS138 | ASM254v2 | GCA_000002545.2 | |
| | BG2 | ASM1421772v1 | GCA_010111755.1 | |
| Saccharomyces cerevisiae | S288C | R64 | GCA_000146045.2 | |
| Candida auris | B8441 | Cand_auris_B8441_V2 | GCA_002759435.2 | |
Figure 1. Phylogenetic tree showing relationships between fungal species used in this study. The tree was pruned from the genome-scale phylogeny of the kingdom Fungi developed by Li et al. (2021). The phylogeny was based on 290 concatenated sequences in 1644 species. C. metapsilosis was not part of the original analysis and was added to the tree based on its close relationship with C. parapsilosis and C. orthopsilosis (Tavanti et al., 2005). All species listed were Phylum Ascomycota, Subphylum Saccharomycotina, Class Saccharomycetes, Order Saccharomycetales. Vertical bars on the right of the image indicate Family designations according to the NCBI Taxonomy Database (https://www.ncbi.nlm.nih.gov/taxonomy). Brown = Family Saccharomycetaceae, Green = Family Pichiaceae, Purple = Family Debaryomycetaceae, and Blue = Family Metschnikowiaceae.
Figure 2. Schematic of the genome region that included LeALS2716 and LeALS2721 from (A) assembly ASM14968v1 and (B) assembly ASM1362098v1. (C) shows the region as represented in the Candida Genome Order Browser (CGOB; https://cgob.ucd.ie; Maguire et al., 2013) based on ASM14968v1 data. L. elongisporus information is circled in red; ORF numbers are shown in each rectangle, and the direction of transcription is indicated by the arrow below. ORFs 02718, 02719, and 02720 featured the IFF/HYR repeated sequences that were also found in LeALS2716, suggesting that ORF 02716 was longer than initially annotated. The large number of repeated sequences complicated genome sequence assembly in this region. The predicted size of LeALS2716 in ASM1362098v1 was greater than the final PCR-amplified/Sanger-sequenced fragment that was deposited into GenBank (accession number MN893370; green arrow).
Physical location of ALS loci in the S. passalidarum NRRL Y-27907 genome.
| Scaffold (Size, Mb) | Gene Name | Approx. Location (nt)* | Tsc† | Scaffold (Size, Mb) | Gene Name | Approx. Location (nt) | Tsc† |
|---|---|---|---|---|---|---|---|
| 1 (2.64) | | 9400 | R‡ | 4 (1.81) | | 69000 | R |
| | | | | | | 180000 | R |
| | | | | | | | |
| | | 1607000 | F | | | | |
| | | | | | | 1122200 | F |
| 2 (2.12) | | 189000 | R | | | | |
| | | 254000 | F | 5 (1.65) | | 1011000 | R |
| | | 337300 | F | | | 1300300 | R |
| | | 827000 | R | | | 1306000 | R |
| | | 835000 | R | | | 1312800 | R |
| | | 940000 | R | | | | |
| | | | | 6 (1.18) | | 710700 | F |
| 3 (2.07) | | 8400 | R‡ | | | | |
| | | 486000 | R | 8 (0.85) | | | |
| | | 858000 | F | | | | |
| | | 1015000 | F | | | | |
| | | 1021200 | R | | | | |
| | | | | | | | |
| | | | | | | | |
*Approximate location of each ORF in the Spathaspora passalidarum v2.0 assembly (GCA_000223485.1). Pairs of contiguous ORFs are indicated in bold type.
†Direction of transcription is noted as F (forward) or R (reverse).
‡Subtelomeric localization.
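The ‡ flag can be illustrated with a simple distance-to-scaffold-end check. The following is a minimal Python sketch, not the authors' method: the 20 kb window and the function name are assumptions (the criterion used to call a locus subtelomeric is not stated here), and the sketch presumes that each scaffold end approximates a chromosome end. The example loci are scaffold sizes and approximate locations taken from the table.

```python
# Hypothetical cutoff; the source does not state how "subtelomeric" was defined.
SUBTELOMERIC_WINDOW_NT = 20_000


def is_subtelomeric(location_nt, scaffold_size_mb, window_nt=SUBTELOMERIC_WINDOW_NT):
    """Return True if a locus lies within window_nt of either scaffold end."""
    scaffold_size_nt = round(scaffold_size_mb * 1_000_000)
    distance_to_nearest_end = min(location_nt, scaffold_size_nt - location_nt)
    return distance_to_nearest_end <= window_nt


# (scaffold, scaffold size in Mb, approximate location in nt) from the table.
example_loci = [
    (1, 2.64, 9400),
    (3, 2.07, 8400),
    (4, 1.81, 69000),
    (2, 2.12, 189000),
]

flags = {(scaffold, loc): is_subtelomeric(loc, size) for scaffold, size, loc in example_loci}
for (scaffold, loc), flag in flags.items():
    label = "subtelomeric" if flag else "internal"
    print(f"scaffold {scaffold}, {loc} nt: {label}")
```

With this assumed window, only the two loci marked ‡ in the table (9400 nt on scaffold 1 and 8400 nt on scaffold 3) are flagged; a different window would change the outcome for loci such as the one at 69000 nt.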
Figure 3. Experimental and AlphaFold-predicted protein structures. (A) Crystallographic structure of C. albicans NT-Als3 (Protein Data Bank accession 4LE8) visualized using PyMOL. (B) C. albicans NT-Als3 structure predicted by AlphaFold from the 4LE8 amino acid sequence. AlphaFold structural predictions for the corresponding region of C. albicans NT-Als1 (C; 83% identical to NT-Als3), ClAls3274 (D; 33% identical to NT-Als3), ScSag1 (E; 25% identity), CauAls4498 (F; 31% identity), and CAGL0G04125g (G; 21% identity). An AlphaFold structural prediction was also completed for the N-terminal functional domain of C. albicans Hyr1 (H) and S. cerevisiae Flo1 (I) to demonstrate structural diversity among cell-surface proteins that contain a central domain of repeated sequences. The AlphaFold prediction for C. albicans NT-Als3 (B) recapitulated the known experimental structure (A; RMSD = 0.53 as calculated using PyMOL align). Predictions for molecules (B–G) produced the same general structure, suggesting that all should be included in the Als protein family. shows the structures of C. albicans NT-Als3 (B) and ScSag1 (E) aligned with the disulfide bonds highlighted.
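The RMSD quoted in the caption summarizes how far matched atoms sit from one another after two structures are superposed. As a minimal sketch (this is not PyMOL's `align` routine, which also performs the superposition and rejects outlier atom pairs), the Python function below computes the RMSD for two coordinate lists that are assumed to be already superposed and matched atom-for-atom. The function name and the toy coordinates are illustrative only and are not taken from the structures in the figure.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) tuples.

    Assumes the two coordinate sets are already superposed and that
    coords_a[i] corresponds to coords_b[i].
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy coordinates (hypothetical): every atom is displaced by 1 unit along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(rmsd(reference, shifted))
```

Because every toy atom is displaced by exactly one unit, the printed value is 1.0; identical coordinate sets give 0.0.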
Figure 4. Visualization of NT-Als sequence features from . Each spot represents an Als protein; color coding matches . C. albicans NT-Als3 has 8 Cys that create four disulfide bonds (Lin et al., 2014) while S. cerevisiae Sag1 has only 6 and is missing the C57-C133 disulfide bond that is present in NT-Als3 (Salgado et al., 2011). Most NT-Als proteins had 8 Cys like NT-Als3 (center column), some had 6 Cys like ScSag1 (left column), and others had varying numbers of Cys (range of 4 to 14; right column). Presence of the amyloid-forming region (AFR) in C. albicans Als proteins promotes protein aggregation (Ho et al., 2019). While many Als proteins had the expected AFR (strength and location; top row), others had a predicted weak AFR (second row) or none (bottom row). Some S. passalidarum Als proteins had a strong AFR 20-30 amino acids C-terminal to the expected location known in C. albicans (third row). It was unknown whether this alternative location contributed to aggregative potential. An invariant Lys in C. albicans NT-Als establishes a salt bridge with the C-terminal carboxylic acid of an incoming peptide ligand (Salgado et al., 2011). Arg (R) in this location may serve a similar function. Some predicted proteins did not have a positively charged amino acid in this position (X). A red asterisk indicates proteins featured in the structural predictions ().
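The column and symbol assignments described in this caption amount to two simple sequence checks: counting Cys residues in the NT-Als region and asking whether the residue aligned with the invariant Lys is positively charged. The Python sketch below illustrates that bookkeeping under stated assumptions: the function names and category labels are hypothetical, the toy sequences are made up rather than real NT-Als sequences, and the residue aligned with the invariant Lys is passed in directly because the alignment step is outside the scope of the sketch.

```python
def cys_category(nt_als_sequence):
    """Group an NT-Als sequence by its number of Cys residues."""
    n_cys = nt_als_sequence.upper().count("C")
    if n_cys == 8:
        return "8 Cys (like NT-Als3)"
    if n_cys == 6:
        return "6 Cys (like ScSag1)"
    return f"{n_cys} Cys (other)"


def ligand_contact_symbol(aligned_residue):
    """Return K or R if the aligned residue is Lys or Arg, otherwise X."""
    residue = aligned_residue.upper()
    return residue if residue in {"K", "R"} else "X"


# Toy sequences (hypothetical, for illustration only).
toy_eight_cys = "ACDCEFCGHCIKCLMCNPCQC"
toy_six_cys = "ACDCEFCGHCIKCLMC"
toy_other = "ACDCEFG"

for name, seq in [("toy8", toy_eight_cys), ("toy6", toy_six_cys), ("toyX", toy_other)]:
    print(name, cys_category(seq))
print(ligand_contact_symbol("K"), ligand_contact_symbol("r"), ligand_contact_symbol("S"))
```

Keeping the two checks separate mirrors the figure, where Cys count and the residue at the Lys position are displayed as independent features.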
Figure 5. Relationships between NT-Als sequences depicted in a tree format. Information from (number of Cys in NT-Als, presence of the invariant Lys, nature of the AFR) is included using symbols at the left of the tree. Yellow hexagons on the right of the branches depict ortholog group information from . Colored dots indicating species follow the color scheme from and . The scale bar represents substitutions per site.
Orthologous ALS loci reported on Candida Gene Order Browser (CGOB).
| Reference Gene | Reported Ortholog | Group* | Reference Gene | Reported Ortholog | Group* |
|---|---|---|---|---|---|
*Group designations were noted within yellow hexagons in .
†For clarity, gene names from were used here instead of the ORF designations on CGOB. SsALS2386 was called PICST_4539 on CGOB; CmALS4220 was called CMET_1838 on CGOB, and CmALS800 was called CMET_3970.
‡PICST_31095 belonged to the IFF/HYR family instead of the ALS family. LELG_04272 lacked an NT-Als domain but had tandem repeats and C-terminal sequence features similar to ALS genes. ScFLO1 belongs to a family of flocculins (Goossens et al., 2011). Although flocculins also have a central domain of repeated sequences, the structure of the mannose-binding N-terminal domain is clearly different from NT-Als ().
§Although not ALS genes, CaFGR23 (orf19.1616), Cd82280, and Ctr2409 were also in this ortholog pillar.
ALS gene representation within the Order Saccharomycetales*.
| Family | Genus/species | # |
|---|---|---|
| | | 8 |
| | | 7 |
| | | 13 |
| | | 5 |
| | | 3 |
| | | 4 |
| | | 5 |
| | | 1 |
| | | 4 |
| | | 3 |
| | | 29 |
| | | 1 |
| | | 1 |
| | | 3 |
| | | 1 |
| | | 1 |
| | | 0 |
| | | |
| | 9 | 0 |
| | 6 | 0 |
| | 32 | 0 |
| | 26 | 0 |
| | 1 | 0 |
| | 2 | 0 |
| | 2 | 0 |
| | 60 | 0 |
| | 9 | 1 |
| | 39 | 14 |
| | 21 | 0 |
| | 37 | 4 |
*Family designations were from lifemap-ncbi.univ-lyon1.fr, as well as the work of Shen et al. (2018) and Li et al. (2021). Taxid numbers were from the NCBI Taxonomy Database (https://www.ncbi.nlm.nih.gov/taxonomy). Numbers of ALS loci (right column; top half of table) were taken from the genome sequences examined in this study ().
Figure 6. Phylogenetic tree of Families in the Order Saccharomycetales. The tree was traced from the data of Li et al. (2021) to highlight the region of interest for the current study. Family names were those used by Li et al. (2021). placed these names into the context of Family designations used on the NCBI Taxonomy Database (Schoch et al., 2020). The red dot denotes the common ancestor of the CUG-Ser1 clade and Saccharomycetaceae, the two groups in which ALS genes were most common (indicated by larger font). ALS genes were also detected in the Phaffomycetaceae.