Andreas Dix, Kerstin Hünniger, Michael Weber, Reinhard Guthke, Oliver Kurzai, Jörg Linde.
Abstract
Sepsis is a clinical syndrome that can be caused by bacteria or fungi. Early knowledge of the nature of the causative agent is a prerequisite for targeted anti-microbial therapy. Besides currently used detection methods such as blood culture and PCR-based assays, analysis of the host's transcriptional response to infecting organisms holds great promise. In this study, we examine the transcriptional footprint of infections caused by the bacterial pathogens Staphylococcus aureus and Escherichia coli and the fungal pathogens Candida albicans and Aspergillus fumigatus in a human whole-blood model. Moreover, we use the expression information to build a random forest classifier that predicts whether a sample contains a bacterial, fungal, or mock infection. After normalizing the transcription intensities using stably expressed reference genes, we filtered the gene set for biomarkers of bacterial or fungal blood infections. This selection is based on differential expression and an additional gene relevance measure. In this way, we identified 38 biomarker genes, including IL6, SOCS3, and IRG1, which have already been associated with sepsis by other studies. Using these genes, we trained the classifier and assessed its performance. It yielded a 96% accuracy (sensitivities >93%, specificities >97%) in a 10-fold stratified cross-validation and a 92% accuracy (sensitivities and specificities >83%) on an additional test dataset comprising Cryptococcus neoformans infections. Furthermore, the classifier is robust to Gaussian noise, indicating correct class predictions on datasets of new species. In conclusion, this genome-wide approach demonstrates an effective feature selection process in combination with the construction of a well-performing classification model. Further analyses of genes with pathogen-dependent expression patterns can provide insights into the systemic host responses, which may lead to new anti-microbial therapeutic advances.
Keywords: decision tree based methods; feature selection; fungal pathogens; immune response; microarray; systems biology
Year: 2015 PMID: 25814982 PMCID: PMC4356159 DOI: 10.3389/fmicb.2015.00171
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
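The 10-fold stratified cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `stratified_folds`, the class sizes, and the round-robin assignment are assumptions made for illustration. The point is only that each fold keeps roughly the same proportion of bacterial, fungal, and mock-infected samples as the full dataset.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the samples of each class round-robin across the folds
        for i, idx in enumerate(members):
            folds[(offset + i) % k].append(idx)
        offset += len(members)
    return folds


# Hypothetical label vector; the class sizes are made up for illustration
labels = ["bacterial"] * 40 + ["fungal"] * 30 + ["mock"] * 20
folds = stratified_folds(labels, k=10)

for test_idx in folds:
    train_idx = [i for fold in folds if fold is not test_idx for i in fold]
    # A classifier would be fitted on train_idx and scored on test_idx here
    assert len(train_idx) + len(test_idx) == len(labels)
```

Each fold serves once as the held-out test set, so every sample is predicted exactly once by a model that did not see it during training.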
Figure 1. The workflow for biomarker identification, classifier construction and performance assessment.
Housekeeping genes and putative reference genes suggested by other studies were used as input for determining stably expressed reference genes.
The symbols in the genes .
Figure 2. The variable importance values were computed by the random forest algorithm. A gene with a larger value has a greater influence on correct class predictions. The 50 highest importance values of the measure “mean decrease in accuracy” are shown. Genes above the dashed lines were selected as biomarkers for the corresponding classes.
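The idea behind “mean decrease in accuracy” can be sketched as a permutation test: shuffle the values of one gene across samples and measure how much the prediction accuracy drops. The sketch below is a simplified, model-agnostic variant and an assumption on our part. A random forest computes the measure per tree on its out-of-bag samples, and the toy classifier, gene values, and function names here are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of samples whose predicted class equals the true class."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(labels)


def permutation_importance(predict, rows, labels, feature, repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        column = [row[feature] for row in rows]
        rng.shuffle(column)
        # Rebuild every sample with the shuffled value in place of the original
        shuffled = [row[:feature] + [value] + row[feature + 1:]
                    for row, value in zip(rows, column)]
        drops.append(baseline - accuracy(predict, shuffled, labels))
    return sum(drops) / repeats


# Hypothetical expression values: column 0 separates the classes, column 1 is noise
rows = [[0.90, 0.2], [0.80, 0.7], [0.70, 0.1], [0.95, 0.6],
        [0.10, 0.3], [0.20, 0.8], [0.30, 0.5], [0.05, 0.4]]
labels = ["fungal"] * 4 + ["mock"] * 4


def toy_predict(row):
    # Stand-in for a trained classifier; it only looks at column 0
    return "fungal" if row[0] > 0.5 else "mock"


importance_informative = permutation_importance(toy_predict, rows, labels, feature=0)
importance_noise = permutation_importance(toy_predict, rows, labels, feature=1)
```

A gene the classifier relies on produces a clear accuracy drop when shuffled, while a gene it ignores produces none, which is the ranking logic the figure visualizes.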
Figure 3. Visualization of the expression patterns of the biomarker genes. The samples are clustered according to their corresponding classes. The heatmap colors correlate with the normalized expression intensities (see key on right side). The colors of the gene symbols indicate the class for which the gene was selected as biomarker (brown = fungal class, blue = bacterial class, gray = mock-infected class).
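Sensitivities and specificities for the performance assessments.
| Assessment | Sensitivity (bacterial) | Sensitivity (fungal) | Sensitivity (mock-infected) | Specificity (bacterial) | Specificity (fungal) | Specificity (mock-infected) |
| --- | --- | --- | --- | --- | --- | --- |
| C. neoformans test dataset | – | 0.833 | 1.000 | – | 1.000 | 0.833 |
| Cross-validation | 0.950 | 0.938 | 1.000 | 0.973 | 0.976 | 1.000 |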
The C. neoformans dataset does not comprise samples of the bacterial class. Thus, no sensitivity and specificity could be calculated for this condition.
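As a worked illustration of how such per-class values arise, the following sketch computes sensitivity and specificity in a one-vs-rest fashion. The sample counts (six fungal and six mock-infected samples, with one fungal sample predicted as mock-infected) are an assumption chosen because they reproduce the values 0.833 and 1.000; the record above does not state the counts.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    # A class without any true samples has no defined sensitivity
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Assumed counts: 6 fungal and 6 mock samples, one fungal sample predicted as mock
true_labels = ["fungal"] * 6 + ["mock"] * 6
predicted_labels = ["fungal"] * 5 + ["mock"] + ["mock"] * 6

fungal_sens, fungal_spec = sensitivity_specificity(true_labels, predicted_labels, "fungal")
mock_sens, mock_spec = sensitivity_specificity(true_labels, predicted_labels, "mock")

print(round(fungal_sens, 3), round(fungal_spec, 3))  # 0.833 1.0
print(round(mock_sens, 3), round(mock_spec, 3))      # 1.0 0.833
```

Under these assumed counts the overall accuracy is 11/12, or about 92%, which agrees with the accuracy reported for the C. neoformans dataset.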
Figure 4. The MDS plot based on the . Small distances correspond to high correlation coefficients. Brown and gray circles indicate samples of the fungal and the mock-infected class, respectively. The arrow marks the fungal sample that was misclassified as a mock-infected control.
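The caption states that small distances correspond to high correlation coefficients. One common way to obtain such distances, assumed here because the caption does not give the formula, is d = 1 − r, where r is the Pearson correlation between two expression profiles. The sketch below builds only the distance matrix that an MDS routine would take as input. The profiles and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_distance_matrix(samples):
    """Pairwise distances d = 1 - r (an assumed choice of dissimilarity)."""
    return [[1.0 - pearson(a, b) for b in samples] for a in samples]


# Hypothetical expression profiles of three samples over four genes
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],  # closely follows the first profile
    [4.0, 3.0, 2.0, 1.0],  # reversed pattern
]
distances = correlation_distance_matrix(profiles)
```

With this choice, highly correlated samples lie close together and anti-correlated samples reach the maximum distance of 2, so a low-dimensional embedding of the matrix groups samples with similar expression patterns.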