Sara Amorim-Vaz, Van Du T Tran, Sylvain Pradervand, Marco Pagni, Alix T Coste, Dominique Sanglard.
Abstract
In vivo transcriptional analyses of microbial pathogens are often hampered by the low proportion of pathogen biomass in host organs, which hinders coverage of the full pathogen transcriptome. We aimed to characterize the transcriptome profiles of Candida albicans, the most prevalent fungal pathogen in systemically infected immunocompromised patients, during systemic infection in different hosts. We developed a strategy for high-resolution quantitative analysis of the C. albicans transcriptome directly from early and late stages of systemic infection in two host models, the mouse and the insect Galleria mellonella. Transcriptome sequencing (RNA-seq) libraries were enriched for fungal transcripts up to 1,600-fold using biotinylated bait probes to capture C. albicans sequences. This enrichment biased the read counts of only ~3% of the genes, which can be identified and removed based on a priori criteria. This allowed an unprecedented resolution of the C. albicans transcriptome in vivo, with detection of over 86% of its genes. The transcriptional response of the fungus was surprisingly similar during infection of the two hosts and at the two time points, although some host- and time point-specific genes could be identified. Genes that were highly induced during infection were involved, for instance, in stress response, adhesion, iron acquisition, and biofilm formation. Of the in vivo-regulated genes, 10% are still of unknown function, and their future study will be of great interest. The fungal RNA enrichment procedure used here will help to better characterize the C. albicans response in infected hosts and may be applied to other microbial pathogens.
IMPORTANCE: Understanding the mechanisms used by pathogens to infect and cause disease in their hosts is crucial for rational drug development. Transcriptomic studies can help to investigate these mechanisms by determining which genes are expressed specifically during infection. This task has been difficult so far, since the proportion of microbial biomass in infected tissues is often extremely low, which limits the depth of sequencing and prevents comprehensive transcriptome analysis. Here, we adapted a technology to capture and enrich C. albicans RNA, which was then used for deep RNA sequencing directly from infected tissues of two different host organisms. The high-resolution transcriptome revealed a large number of genes not previously known to participate in infection, which will likely constitute a focus of future study. More importantly, this method may be adapted to perform transcript profiling of any other microbe during host infection or colonization.
Year: 2015 PMID: 26396240 PMCID: PMC4600103 DOI: 10.1128/mBio.00942-15
Source DB: PubMed Journal: mBio Impact factor: 7.867
Enrichment of C. albicans RNA using the SureSelect procedure
| Sample | Mouse, 16 h p.i. (K23) | | Mouse, 48 h p.i. (K29) | | | | | |
|---|---|---|---|---|---|---|---|---|
| Method | Nonenriched | Enriched | Nonenriched | Enriched | Nonenriched | Enriched | Nonenriched | Enriched |
| Total reads | 189M | 24M | 172M | 191M | 183M | 38M | 169M | 31M |
| No. aligned to C. albicans | 66K | 14M | 93K | 121M | 72K | 15M | 169K | 21M |
| % aligned reads | 0.03 | 58 | 0.05 | 63 | 0.04 | 39 | 0.1 | 69 |
| Fold enrichment | | 1670 | | 1172 | | 1003 | | 677 |
M, million; K, thousand.
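The fold-enrichment row is consistent with taking, for each sample, the fraction of aligned reads in the enriched library divided by the same fraction in the nonenriched library. The following is a minimal sketch under that assumption, using the rounded counts from the table; the function name and the tuple layout are illustrative and not taken from the original study.

```python
def fold_enrichment(non_total, non_aligned, enr_total, enr_aligned):
    """Ratio of aligned-read fractions: enriched library over nonenriched library."""
    return (enr_aligned / enr_total) / (non_aligned / non_total)


# (nonenriched total, nonenriched aligned, enriched total, enriched aligned),
# rounded values copied from the table, one tuple per sample in column order.
samples = [
    (189e6, 66e3, 24e6, 14e6),
    (172e6, 93e3, 191e6, 121e6),
    (183e6, 72e3, 38e6, 15e6),
    (169e6, 169e3, 31e6, 21e6),
]

folds = [round(fold_enrichment(*s)) for s in samples]
print(folds)
```

Because the table reports rounded read counts, a calculation of this kind can only approximate values computed from the exact counts.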
FIG 1 Correlation between enriched and nonenriched mRNA read counts of C. albicans genes. The enriched reads were recovered from a sample of host mRNA spiked with 1% of the same C. albicans mRNA. Scatter plots of log2 normalized RNA-seq counts and scatter plots of log2 fold change (log2FC) are shown. Green indicates valid genes and red indicates genes rejected via our classification with selected features. ra, Pearson correlation for all genes; rv, Pearson correlation for valid genes.
Binary features
| Property | Threshold values | No. of features |
|---|---|---|
| No. of probes per gene | >1, 2, 3, 4, 5 | 5 |
| % GC | >5, 10, 15, . . ., 95 | 57 |
| % low-complexity sequence | >5, 10, 15, . . ., 95 | 57 |
| RNA-folding energy | >−40, −35, −30, . . ., −5 | 24 |
| Redundancy | >1, 2, 3, 4, 5 | 15 |
The 158 binary features are associated with every gene and can be computed from the bait probe sequences and locations.
Minimum, maximum, or average per gene.
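The feature counts in the table can be reproduced by enumerating the thresholds. The sketch below assumes that the probe count contributes one feature per threshold, while each per-probe property (%GC, % low-complexity sequence, RNA-folding energy, redundancy) contributes one feature per threshold for each of the minimum, maximum, and average per gene. This reading is inferred from the counts (for example, 19 thresholds × 3 = 57), and the feature names are illustrative.

```python
from itertools import product

# Thresholds as listed in the table.
PROBE_COUNT = range(1, 6)      # >1, 2, 3, 4, 5
PERCENT = range(5, 100, 5)     # >5, 10, ..., 95 (19 values)
ENERGY = range(-40, 0, 5)      # >-40, -35, ..., -5 (8 values)
REDUNDANCY = range(1, 6)       # >1, 2, 3, 4, 5
AGGREGATES = ("min", "max", "avg")  # assumed per-gene summaries


def binary_feature_names():
    """Enumerate one name per (aggregate, property, threshold) combination."""
    names = [f"#probe > {t}" for t in PROBE_COUNT]
    per_probe = {
        "%GC": PERCENT,
        "%low-complexity": PERCENT,
        "folding energy": ENERGY,
        "redundancy": REDUNDANCY,
    }
    for prop, thresholds in per_probe.items():
        for agg, t in product(AGGREGATES, thresholds):
            names.append(f"{agg}({prop}) > {t}")
    return names


features = binary_feature_names()
print(len(features))
```

Under these assumptions the enumeration yields 5 + 57 + 57 + 24 + 15 = 158 features, matching the total stated in the table footnote.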
Feature combinations selected by the different feature selection methods
| Method | Selected features |
|---|---|
| Best first | #probe > 1, #probe > 3, avg(%GC) > 5, avg(%GC) > 10, max(%GC) > 25, min(%GC) > 10 |
| Greedy stepwise | #probe > 1, avg(%GC) > 5, max(%GC) > 25 |
| Linear forward selection | #probe > 1, #probe > 3, avg(%GC) > 5, avg(%GC) > 15, max(%GC) > 25, min(%GC) > 10 |
| Scatter search | #probe > 1, avg(%GC) > 5, max(%GC) > 25 |
| Subset size forward selection | #probe > 1, avg(%GC) > 5, max(%GC) > 25 |
Gene classification based on the three selected features
| No. of genes | No. of probes > 1 | Avg %GC > 5 | Max %GC > 25 | Class |
|---|---|---|---|---|
| 5806 | 1 | 1 | 1 | Valid |
| 8 | 1 | 1 | 0 | Rejected |
| 289 | 0 | 1 | 1 | Rejected |
| 18 | 0 | 1 | 0 | Rejected |
| 347 | 0 | 0 | 0 | Rejected |
No gene had the combination 101, 100, or 001 (impossible combination), which would have been classed as “acceptable.”
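The classification table can be expressed as a lookup on the three binary features. This is a minimal sketch that assumes the three 0/1 columns are, in order, #probe > 1, avg(%GC) > 5, and max(%GC) > 25; only the five combinations observed in the table are mapped, and the function name is hypothetical.

```python
from collections import Counter

# Class for each observed combination of
# (#probe > 1, avg(%GC) > 5, max(%GC) > 25), copied from the table.
CLASS_BY_COMBINATION = {
    (1, 1, 1): "Valid",
    (1, 1, 0): "Rejected",
    (0, 1, 1): "Rejected",
    (0, 1, 0): "Rejected",
    (0, 0, 0): "Rejected",
}


def classify(probe_gt1, avg_gc_gt5, max_gc_gt25):
    """Look up the class of a gene; unobserved combinations raise an error."""
    key = (int(probe_gt1), int(avg_gc_gt5), int(max_gc_gt25))
    if key not in CLASS_BY_COMBINATION:
        raise ValueError(f"combination {key} was not observed in the table")
    return CLASS_BY_COMBINATION[key]


# (number of genes, feature bits) per table row.
observed = [
    (5806, 1, 1, 1),
    (8, 1, 1, 0),
    (289, 0, 1, 1),
    (18, 0, 1, 0),
    (347, 0, 0, 0),
]

totals = Counter()
for n_genes, *bits in observed:
    totals[classify(*bits)] += n_genes
print(dict(totals))
```

The sketch deliberately raises an error for the unobserved combinations (101, 100, 001) rather than guessing how they would be handled.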
FIG 2 Hierarchical clustering (a) and principal component analysis (PCA) (b) of in vivo and in vitro samples used to characterize C. albicans responses in the two host models. Clustering and PCA were performed using voom-transformed and normalized gene counts. The 5,365 C. albicans genes with at least 1 count per million in at least one sample were used for clustering and PCA. Clustering was also performed on the 1,000 genes with the highest variance across all 10 samples. Genes that did not meet the enrichment quality criteria were excluded. Gm, G. mellonella; Mm, Mus musculus.
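The expression filter described in the caption (at least 1 count per million in at least one sample) can be sketched as follows. The sketch assumes that the library size of each sample is simply the sum of its raw counts; it does not reproduce the voom transformation or the normalization used in the study, and the gene names and counts are made up for illustration.

```python
def keep_expressed(count_matrix, min_cpm=1.0, min_samples=1):
    """Return genes reaching min_cpm in at least min_samples samples.

    count_matrix maps gene name -> list of raw counts, one per sample.
    """
    n_samples = len(next(iter(count_matrix.values())))
    # Assumption: library size is the column sum of raw counts.
    lib_sizes = [sum(row[j] for row in count_matrix.values()) for j in range(n_samples)]
    kept = []
    for gene, row in count_matrix.items():
        cpm = [c * 1e6 / lib_sizes[j] for j, c in enumerate(row)]
        if sum(v >= min_cpm for v in cpm) >= min_samples:
            kept.append(gene)
    return kept


# Hypothetical counts for two samples, each with a library size of 2,000,000.
toy_counts = {
    "geneA": [1_000_000, 1_500_000],
    "geneB": [999_998, 499_997],
    "geneC": [1, 2],   # 0.5 and 1.0 counts per million
    "geneD": [1, 1],   # 0.5 counts per million in both samples
}

kept = keep_expressed(toy_counts)
print(kept)
```

In this toy example geneC passes because it reaches the threshold in one sample, while geneD stays below it in both.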
FIG 3 Correlations between log2 fold changes (logFC). For each in vivo condition, the log fold change versus the in vitro condition was computed. (a and b) Early versus late responses in G. mellonella and mouse. Brown dots indicate significant genes (false discovery rate [FDR] < 5%) with a difference between early and late fold changes larger than 2 (see Materials and Methods for statistical analysis). (c and d) G. mellonella versus mouse responses for early and late time points. Brown dots indicate genes that are significant (FDR < 5%) in both hosts. For all plots, the identity line is indicated in black. Red lines show logFC differences of −5-, −2-, 2- and 5-fold. r, Pearson correlation coefficients with confidence interval.
C. albicans genes most upregulated during systemic infection
| orf19 name | Gene name | Description | Fold change, Gm, 2 h | Fold change, Gm, 24 h | Fold change, Mm, 16 h | Fold change, Mm, 48 h |
|---|---|---|---|---|---|---|
| orf19.7455 | orf19.7455 | Ortholog of | 11.38 | 11.24 | 11.80 | 11.60 |
| orf19.1321 | | Hyphal cell wall protein | 10.46 | 9.89 | 10.63 | 11.54 |
| orf19.6028 | | Hypha-specific G1 cyclin-related protein involved in | 9.78 | 9.55 | 10.39 | 11.05 |
| orf19.3374 | | Hypha-specific protein | 8.73 | 9.26 | 9.40 | 11.04 |
| orf19.1363 | orf19.1363 | Putative protein of unknown function | 7.85 | 7.38 | 8.53 | 10.45 |
| orf19.5636 | | GPI-linked cell wall protein | 6.80 | 9.50 | 8.43 | 9.85 |
| orf19.2060 | | Cu and Zn-containing superoxide dismutase | 8.55 | 6.97 | 8.43 | 9.59 |
| orf19.7094 | | Glucose, fructose, mannose transporter | 13.09 | 12.62 | 8.41 | 12.76 |
| orf19.1816 | | Cell wall adhesin | 8.02 | 4.20 | 8.31 | 8.59 |
| orf19.5585 | | Secreted aspartyl proteinase | 8.22 | 8.57 | 8.25 | 10.22 |
| orf19.1822 | | Zn(II)2Cys6 transcription factor | 7.86 | 7.86 | 8.11 | 8.87 |
| orf19.2457 | orf19.2457 | Protein of unknown function | 9.59 | 8.99 | 7.91 | 8.41 |
| orf19.5952 | orf19.5952 | Protein of unknown function | 3.11 | 8.13 | 7.84 | 8.73 |
| orf19.2062 | | Cu and Zn-containing superoxide dismutase | 7.50 | 9.64 | 7.71 | 6.64 |
| orf19.2061 | orf19.2061 | Ortholog of | 10.23 | 6.52 | 7.56 | 9.61 |
| orf19.4599 | | Putative phosphate permease | 6.87 | 7.54 | 7.21 | 7.20 |
| orf19.1264 | | Oxidoreductase; iron utilization | 5.46 | 8.23 | 7.08 | 6.86 |
| orf19.1930 | | Ferric reductase | 6.92 | 12.92 | 6.96 | 9.38 |
| orf19.113 | | Possible oxidoreductase | 7.88 | 7.90 | 6.82 | 8.31 |
| orf19.6148 | orf19.6148 | Homolog of nuclear distribution factor NudE, NUDEL | 4.84 | 6.29 | 6.69 | 8.66 |
Mm, M. musculus; Gm, G. mellonella.
FIG 4 Identification of C. albicans genes differentially expressed in vivo versus in vitro using a meta-analytical approach. (a) Meta-analysis strategy. Limma contrast statistics were converted to z scores (see Materials and Methods). z scores were then combined meta-analytically as illustrated. (b) Scatter plot of mouse and Galleria z scores obtained meta-analytically. Mouse and Galleria z scores are further combined into one in vivo z score. The 1,169 genes for which this combined z score is significant (Bonferroni P value ≤ 0.05) are indicated in brown. Genes for which the combined z score is not significant are indicated in blue if the mouse z score is significant or in green if the G. mellonella z score is significant. TLO1 is the only gene with significant and anti-correlated z scores in mouse and G. mellonella. r, Pearson correlation coefficient with confidence interval. (c) Heat map of the 1,169 significant genes by meta-analysis. For each sample, a log fold change versus the average in vitro expression was computed. The log fold change values were variance scaled and are represented on the heat map. Hierarchical clustering tree of the samples is indicated at the top. Gm, G. mellonella; Mm, Mus musculus.
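The caption states that z scores were combined meta-analytically but does not name the combination rule. As an illustration only, the sketch below uses the unweighted Stouffer method, which is an assumption and not necessarily the authors' procedure, followed by a two-sided normal P value and a Bonferroni correction. The z scores and the number of tests are hypothetical.

```python
import math


def stouffer(z_scores):
    """Unweighted Stouffer combination of independent z scores (assumed method)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def two_sided_p(z):
    """Two-sided P value of a standard normal z score."""
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni(p, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical z scores for one gene: early and late contrasts in each host.
mouse_z = stouffer([3.9, 4.4])
galleria_z = stouffer([4.1, 3.6])

# Host-level z scores are combined again into a single in vivo z score.
in_vivo_z = stouffer([mouse_z, galleria_z])
p_adj = bonferroni(two_sided_p(in_vivo_z), n_tests=6000)  # hypothetical test count
print(in_vivo_z, p_adj)
```

A gene would be retained in this sketch when `p_adj` is at most 0.05, mirroring the Bonferroni threshold quoted in the caption.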
FIG 5 GSEA of C. albicans genes regulated in vivo. The gene list was produced from data in File S2 in the supplemental material (“meta-analysis”, “CandidaL_exp_dataL2.gmt”), in which genes with P values of ≤0.05 (in vivo) were chosen. The genes were ranked according to their z scores. The list was then imported into the GSEA software. Analysis parameters were as follows: norm, meandiv; scoring_scheme, weighted; set_min, 15; nperm, 1000; set_max, 500. GSEA results were uploaded into Cytoscape 3.0 with the following parameters: P value cutoff, 0.01; FDR q value, 0.05. Red nodes represent enriched gene lists in upregulated genes from the GSEA. Green nodes represent enriched gene lists in downregulated genes from the GSEA. Nodes are connected by edges when overlaps exist between nodes. The size of nodes reflects the total number of genes that are connected by edges to neighboring nodes. Edge thickness reflects the level of confidence between nodes. Colored labels of nodes are defined in the text and indicate specific classes of genes.
List of C. albicans genes regulated “in vivo” and categorized according to enriched GO terms
| Group and GO term | Enrichment fraction | Log odds ratio | Adjusted P value | Gene list |
|---|---|---|---|---|
| Positive | | | | |
| Regulation of response to | 47/252 | 0.88 | 0.03 | |
| Iron ion homeostasis | 16/41 | 1.95 | 0.00 | |
| Regulation of filamentous | 18/60 | 1.57 | 0.03 | |
| Biofilm formation | 30/137 | 1.11 | 0.04 | |
| Pathogenesis (GO:0009405) | 41/225 | 0.85 | 0.09 | |
| Biological adhesion | 21/86 | 1.27 | 0.08 | |
| Negative | | | | |
| Cellular amino acid | 44/132 | 2.02 | 0.00 | |
| Glycolysis (GO:0006096) | 11/16 | 3.07 | 0.00 | |
| Induction of host defense | 9/21 | 2.39 | 0.00 | |
| Mitochondrial electron | 6/10 | 2.87 | 0.01 | |
| Translation (GO:0006412) | 95/404 | 1.52 | 0.00 | |
Log odds ratios and adjusted P values were obtained by performing GO term enrichment analysis with GOEAST (89). Only selected GO terms are listed.
Corresponding GO term numbers are given in parentheses.
The enrichment fraction is the ratio of the number of genes in the gene list to the total number of genes present in a given GO term.
FIG 6 Correlations between log2 fold change (logFC) data generated in this study and from reference 24, taking early gene expression patterns from both studies (Xu et al. [24], mouse kidneys 12 h p.i. versus in vitro stationary-phase culture; Amorim-Vaz et al. [62], mouse kidneys 16 h p.i. versus in vitro exponential-phase culture). r, Pearson correlation coefficient, calculated with Prism 6.0.