
Combined Use of Three Machine Learning Modeling Methods to Develop a Ten-Gene Signature for the Diagnosis of Ventilator-Associated Pneumonia.

Yunfang Cai1, Wen Zhang1, Runze Zhang1, Xiaoying Cui1, Jun Fang1.   

Abstract

BACKGROUND This study aimed to use three modeling methods, logistic regression analysis, random forest analysis, and fully-connected neural network analysis, to develop a diagnostic gene signature for the diagnosis of ventilator-associated pneumonia (VAP). MATERIAL AND METHODS The GSE30385 dataset from the Gene Expression Omnibus (GEO) database was used to identify differentially expressed genes (DEGs) associated with VAP. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were used to characterize the functions of the DEGs. The least absolute shrinkage and selection operator (LASSO) regression analysis algorithm was used to select key genes. Three modeling methods, logistic regression analysis, random forest analysis, and fully-connected neural network analysis, also known as the feed-forward multi-layer perceptron (MLP), were used to build the diagnostic gene signature for patients with VAP. RESULTS Sixty-six DEGs were identified between patients who had VAP (VAP+) and patients who did not have VAP (VAP−). Ten essential or feature genes were identified. Upregulated genes included matrix metallopeptidase 8 (MMP8), arginase 1 (ARG1), haptoglobin (HP), interleukin 18 receptor 1 (IL18R1), and NLR family apoptosis inhibitory protein (NAIP). Down-regulated genes included complement factor D (CFD), pleckstrin homology-like domain family A member 2 (PHLDA2), plasminogen activator, urokinase (PLAU), laminin subunit beta 3 (LAMB3), and dual-specificity phosphatase 2 (DUSP2). Logistic regression, random forest, and MLP analysis showed receiver operating characteristic (ROC) curve area under the curve (AUC) values of 0.85, 0.86, and 0.87, respectively. CONCLUSIONS Logistic regression analysis, random forest analysis, and MLP analysis identified a ten-gene signature for the diagnosis of VAP.

Year:  2020        PMID: 32031163      PMCID: PMC7020762          DOI: 10.12659/MSM.919035

Source DB:  PubMed          Journal:  Med Sci Monit        ISSN: 1234-1010


Background

Ventilator-associated pneumonia (VAP) is defined as pneumonia that occurs 48 hours or more following mechanical ventilation and extubation [1]. VAP is a hospital-acquired pneumonia that occurs in 8–28% of mechanically ventilated patients. Although national surveillance data indicate a decline in its incidence, VAP remains a common hospital-acquired infection worldwide [2]. The mortality rate for patients with VAP is between 24–50%, and can reach 76% when associated with certain pathogens [1]. The mortality associated with VAP remains high, partly because there are no guidelines for prediction of patient susceptibility or risk for VAP [3]. The use of antibiotics for suspected VAP in patients is recommended in the 2005 American Thoracic Society (ATS) guidelines [4]. Prevention measures include modifying known risk factors, but the prediction, prevention, and diagnosis of VAP remain challenging [4].

Currently available bioinformatics databases, including the Gene Expression Omnibus (GEO) database, allow gene expression profiles of human diseases to be studied [5,6]. Differentially expressed genes (DEGs) for disease based on GEO data have been increasingly reported. In a previous study on gene expression profiling in VAP, Xu et al. [7] used the expression profile GSE30385 to identify 69 DEGs that included 36 down-regulated and 33 upregulated genes in patients with VAP. Upregulated genes were mainly associated with pathways and functions related to the mitogen-activated protein kinase (MAPK) signaling pathway and immune response [7]. However, this previous study used traditional bioinformatics analysis and highlighted genes including ELANE, LTF, and MAPK14 [7]. In 2012, a previously published study on VAP by Swanson et al. used a cross-validated logistic regression model to identify five predictive genes, including HCN4, ADAM8, PI3, ATP2A1, and PIK3R3 [8]. However, only one algorithm was used to establish the model in this previous study [8]. Therefore, this study aimed to use three modeling methods, logistic regression analysis, random forest analysis, and fully-connected neural network analysis, also known as the feed-forward multi-layer perceptron (MLP), to develop a diagnostic gene signature for the diagnosis and prediction of VAP.

Material and Methods

Gene Expression Omnibus (GEO) database selection

Gene expression profiles were downloaded as raw data (CEL files) from the GSE30385 dataset [8] in the GEO database [9]. The GPL201 [HG-Focus] Affymetrix Human HG-Focus Target Array served as the annotation platform. In this dataset, whole blood samples were obtained from 20 patients with serious trauma, including ten patients with ventilator-associated pneumonia (VAP) (VAP+) and ten without VAP (VAP−). A total of 40 mL of whole blood was collected and immediately stimulated with 1,000 ng/mL of lipopolysaccharide (LPS) solution.

Data processing

The raw downloaded data were processed with the affy R package [11] in Bioconductor, which performed background correction, quantile normalization, and probe summarization using the robust multi-array average (RMA) algorithm [10]. Probe IDs were then mapped to gene symbols. When multiple probes mapped to the same gene, the median probe expression value was used as the final gene expression value.
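The probe-to-gene step can be illustrated with a minimal Python sketch. It is not the authors' R code: the function name `collapse_probes_to_genes`, the probe IDs, and the toy values are hypothetical, and it only assumes that each gene's value in a sample is the median over the probes annotated to that gene.

```python
from collections import defaultdict
from statistics import median


def collapse_probes_to_genes(probe_expr, probe_to_gene):
    """Return {gene: [median expression per sample]}.

    probe_expr maps a probe ID to its list of per-sample values;
    probe_to_gene maps a probe ID to a gene symbol (hypothetical inputs).
    """
    grouped = defaultdict(list)
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # skip probes without a gene annotation
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column across a gene's probes
    return {
        gene: [median(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }


# Toy data (not from GSE30385): three probes measured in two samples
probe_expr = {"p1": [5.0, 6.0], "p2": [7.0, 8.0], "p3": [4.0, 4.5]}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
gene_expr = collapse_probes_to_genes(probe_expr, probe_to_gene)
```

Taking the median rather than the mean keeps one aberrant probe from dominating the gene-level value when three or more probes map to the same gene.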

Screening for differentially expressed genes (DEGs) for the VAP+ and VAP− patient groups

The Linear Models for Microarray Data (limma) package in R [12] was used to screen the DEGs between the VAP+ and the VAP− groups. DEGs were screened with cutoff values of p<0.05 and |fold change (FC)| ≥1.5 [7]. The eligible DEGs were classified into down-regulated and upregulated DEGs. To check that the identified DEGs separated the two specimen types, a three-dimensional principal component analysis (PCA) was performed using the ggord R package. An expression heatmap was drawn with the pheatmap R package.
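As an illustration of the screening thresholds, the following Python sketch applies the two cutoffs to per-gene statistics. It assumes that the fold-change cutoff of 1.5 is applied on the linear scale, so that it corresponds to |log2-FC| ≥ log2(1.5) ≈ 0.585; the source does not state the scale explicitly. The function name `classify_deg` and the third row are hypothetical, while the MMP8 and CFD values are taken from Table 1.

```python
import math

LOG2_FC_CUTOFF = math.log2(1.5)  # assumed: |FC| >= 1.5 on the linear scale
P_CUTOFF = 0.05


def classify_deg(log2_fc, p_value):
    """Return "Up", "Down", or None for a gene that fails either cutoff."""
    if p_value >= P_CUTOFF or abs(log2_fc) < LOG2_FC_CUTOFF:
        return None
    return "Up" if log2_fc > 0 else "Down"


rows = [
    ("MMP8", 1.871450982, 0.002938895),   # from Table 1
    ("CFD", -1.242598099, 0.013684517),   # from Table 1
    ("HYPOTHETICAL_GENE", 0.30, 0.001),   # illustrative: significant but small change
]
labels = {gene: classify_deg(fc, p) for gene, fc, p in rows}
```

Under this assumption, a gene with a small p-value but a log2-FC of 0.30 is not reported as a DEG, which is why both cutoffs are needed.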

Functional annotation and pathway enrichment analysis

Gene Ontology (GO) term enrichment analysis was performed on DEGs using the clusterProfiler R package to identify the molecular function [13]. The cellular component (CC), molecular function (MF), and biological process (BP) were selected with a cutoff false discovery rate (FDR) of <0.05. For the pathway analysis, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment was analyzed using KOBAS (version 3.0) [14]. An FDR <0.05 was considered to be statistically significant.
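To make the FDR cutoff concrete, the sketch below adjusts a list of raw enrichment p-values with the Benjamini-Hochberg procedure. This is an assumption for illustration only: the source does not state which FDR method was used, and the p-values here are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone: each is capped by the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw_p = [0.01, 0.04, 0.03, 0.005]  # illustrative values, not from the study
fdr = benjamini_hochberg(raw_p)
significant = [q < 0.05 for q in fdr]
```

Terms whose adjusted value falls below 0.05 would be retained under the cutoff described above.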

Protein–protein interaction (PPI) network construction and module analysis

The STRING (version 10.5) [15] database was used to establish a PPI network. The parameter of protein interactions was set at a medium confidence level. Cytoscape (version 3.6.1) software [16] was used for the visualization and analysis of the PPI network. The key modules of the entire network were screened using the MCODE plug-in. The ClueGO [17] and CluePedia [18] plug-ins of the Cytoscape software were used to perform GO enrichment analysis of the module, with the parameters set to default.

Data preprocessing and manifold learning before building a predictive signature

The expression values of all genes in the VAP data ranged from 0 to 13 without notable outliers, so the min–max scaling method was suitable for this type of data. For all patient samples, min–max scaling was used to transform the expression data of a given gene (i) to the range [0, 1], using the following formula:

$$x_i' = \frac{x_i - \min(x_i)}{\max(x_i) - \min(x_i)}$$

where $x_i$ is the expression value of gene i in a given sample, and the minimum and maximum are taken over all samples.

To investigate the intrinsic geometry of the VAP data and visualize it in lower dimensions, a robust projection method was needed to extract the essential structure of the data. Manifold learning algorithms, a subfield of machine learning, were developed for this type of task. In this study, the Isomap nonlinear dimensionality reduction method was chosen [19,20] to project the 66 dimensions of the data into two dimensions, which helped visualize the data geometry and make a preliminary choice of the machine learning algorithms to be used in the modeling stage.
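A minimal Python sketch of min–max scaling for a single gene follows. The function name `min_max_scale` and the toy values are hypothetical; the handling of a constant gene (all values mapped to 0) is an assumption, since the source does not discuss that case.

```python
def min_max_scale(values):
    """Scale one gene's expression values across samples to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: a constant gene carries no information
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy expression values for one gene in three samples
scaled = min_max_scale([2.0, 6.5, 11.0])
```

Because the scaling is applied gene by gene, every feature contributes on the same [0, 1] scale to the distance computations used by the projection and by the downstream models.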

Screening for feature genes

The least absolute shrinkage and selection operator (LASSO) regression analysis algorithm was used to select key genes by building a linear model between the target variable and the genes under an L1 norm constraint. This method is more effective at selecting important features than the traditional least-squares method. LASSO improves the accuracy of the linear model and avoids over-fitting by penalizing coefficients with large values. The linear model derived from LASSO reduced most of the coefficients to zero, and the features with non-zero coefficients were considered essential for predicting the target variable, or patient label.
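The way an L1 penalty drives coefficients to exactly zero can be sketched with a small coordinate-descent LASSO in plain Python. This is an illustrative solver, not the authors' implementation: the source does not name the software or the penalty strength, so `lasso_coordinate_descent`, `alpha`, and the toy data are all hypothetical. The objective assumed here is (1/2n)·‖y − Xw‖² + alpha·‖w‖₁ on centered data.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values inside [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 on centered data."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # equals yc - Xc @ w while w is all zeros
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [Xc[i][j] for i in range(n)]
            norm_sq = sum(c * c for c in col)
            if norm_sq == 0.0:
                continue  # constant feature: leave its coefficient at zero
            # Correlation of feature j with the residual that excludes feature j
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual))
            new_w = soft_threshold(rho / n, alpha) / (norm_sq / n)
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                w[j] = new_w
    return w


# Toy data: feature 0 tracks the target, feature 1 is uninformative
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y = [0.0, 1.0, 2.0, 3.0]
coefficients = lasso_coordinate_descent(X, y, alpha=0.3)
selected = [j for j, c in enumerate(coefficients) if c != 0.0]
```

In this toy case the coefficient of the uninformative feature is set exactly to zero while the informative one is kept but shrunk, which mirrors how features with non-zero coefficients are retained as feature genes.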

Establishment of a gene signature with diagnostic value for VAP using logistic regression analysis, random forest analysis, and fully-connected neural network analysis

In this study, widely used and validated algorithms, including logistic regression and random forest algorithms, were applied to construct classification models. In addition, a type of deep neural network, namely a fully connected or dense-layer network, was applied to construct a generalized model from the data.

Logistic regression has been used in many machine learning and medical fields and has previously provided good results. As a widely used statistical model in binary classification tasks, the algorithm identifies correlations between features (f) and a binary dependent target variable (labels zero and one) on a given dataset and fits a multivariate linear equation (L). The output of the linear equation is then passed to the logistic function to obtain the final probability (P) that a sample belongs to class one, the VAP+ patients. A patient with an output probability >0.5 was classified as VAP+; otherwise, the patient was classified as VAP−. In this study, the penalty coefficient of logistic regression was set to 0.02 to obtain good model generalization. The logistic regression formula used was as follows:

$$L = w_0 + \sum_{j} w_j f_j, \qquad P = \frac{1}{1 + e^{-L}}$$

where $w_0$ is the intercept and $w_j$ is the coefficient fitted for feature $f_j$.

The random forest algorithm introduces randomization to reduce the overfitting of a single decision tree and to improve model accuracy by building many decision trees from one training set.

Deep neural networks have been widely applied in a variety of fields, including natural language processing [21], and have achieved performances comparable with those of traditional machine learning algorithms. Therefore, to take full advantage of this type of powerful model, a fully connected three-layer deep neural network, also known as the feed-forward multi-layer perceptron (MLP), was constructed to investigate the intrinsic relationship between patient types and gene expression data. The MLP model used backpropagation as a supervised learning algorithm to minimize the error or loss function, a statistical measure of the distance between predicted labels and true labels, until the network converged.
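The logistic regression decision rule described in this section can be sketched in a few lines of Python. The weights, bias, and feature values below are invented for illustration and are not the fitted model; only the logistic function and the 0.5 threshold come from the text.

```python
import math


def logistic_probability(features, weights, bias):
    """Pass the linear combination L of the features through the logistic function."""
    linear = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-linear))


def classify(probability, threshold=0.5):
    """Return 1 (VAP+) when the probability exceeds the threshold, else 0 (VAP-)."""
    return 1 if probability > threshold else 0


# Hypothetical two-gene model on min-max scaled expression values
weights, bias = [2.0, -1.5], 0.1
p_high = logistic_probability([0.9, 0.1], weights, bias)  # L = 1.75
p_low = logistic_probability([0.1, 0.9], weights, bias)   # L = -1.05
calls = [classify(p_high), classify(p_low)]
```

A positive weight raises the probability of the VAP+ class as expression increases, and a negative weight lowers it, so the sign of each coefficient can be read alongside the direction of regulation of the gene.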

Cross-validation and metrics

Given that the number of patient samples in the dataset was relatively small, a leave-one-out (LOO) cross-validation procedure was used to evaluate the generalization ability of the models derived from the three algorithms used in this study, logistic regression analysis, random forest analysis, and fully-connected neural network analysis. The area under the receiver operating characteristic (ROC) curve (AUC) and accuracy were used as quantitative measurements to assess the predictive ability of the constructed models.
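The evaluation scheme can be sketched in plain Python as follows. The helper names and the toy labels and scores are hypothetical, and the sketch assumes that the held-out predictions from all folds are pooled before accuracy and AUC are computed, which the source does not state. The AUC is computed as the fraction of (positive, negative) pairs that are ranked correctly, with ties counted as one half.

```python
def leave_one_out_splits(n_samples):
    """Yield (training indices, held-out index) for each sample in turn."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    predictions = [1 if p > threshold else 0 for p in y_prob]
    return sum(int(t == q) for t, q in zip(y_true, predictions)) / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the probability that a positive outscores a negative."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# 20 samples, as in the dataset: each fold trains on 19 and tests on 1
splits = list(leave_one_out_splits(20))

# Toy pooled held-out predictions (not the study's results)
y_true = [1, 1, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.1]
toy_accuracy = accuracy(y_true, y_score)
toy_auc = roc_auc(y_true, y_score)
```

In a full pipeline, each training index list would be used to fit the model, and the score for the held-out sample would be appended to `y_score` before the metrics are computed.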

Results

Identification of ventilator-associated pneumonia (VAP) gene expression dataset

A multistep bioinformatic analysis was performed in this study to develop a ten-gene signature to predict VAP (Figure 1). The gene expression files of VAP patients were downloaded from the Gene Expression Omnibus (GEO) database. The Affy R package was used to preprocess the raw data. As shown in Figure 2, box plots of the processed and raw data distribution were prepared. The data distribution was disordered before processing but was consistent after processing and could be used for subsequent analysis.
Figure 1

Flow diagram of the study design. The study design for developing a ten-gene signature for predicting ventilator-associated pneumonia (VAP), based on a deep learning neural network and bioinformatics analysis.

Figure 2

The distribution of gene expression values for ventilator-associated pneumonia (VAP). (A) Raw data box plot. (B) Normalized data box plot. The abscissa and ordinate represent the Gene Expression Omnibus (GEO) samples and the gene expression value, respectively.

Identification of differentially expressed genes (DEGs) in VAP+ and VAP− patients

Based on the processed data of the VAP gene expression profiles, 66 significant DEGs were identified between the VAP+ and VAP− groups, including 35 down-regulated and 31 upregulated genes. The top ten upregulated and top ten down-regulated DEGs are shown in Table 1. The top upregulated genes included matrix metallopeptidase 8 (MMP8), arginase 1 (ARG1), haptoglobin (HP), interleukin 18 receptor 1 (IL18R1), and NLR family apoptosis inhibitory protein (NAIP). The top down-regulated genes included complement factor D (CFD), pleckstrin homology-like domain family A member 2 (PHLDA2), plasminogen activator, urokinase (PLAU), laminin subunit beta 3 (LAMB3), and dual-specificity phosphatase 2 (DUSP2).
Table 1

The top ten differentially expressed genes (DEGs) in ventilator-associated pneumonia (VAP).

| Gene symbol | Gene name | log2-FC | p-Value | Regulation |
|---|---|---|---|---|
| MMP8 | Matrix metallopeptidase 8 | 1.871450982 | 0.002938895 | Up |
| ARG1 | Arginase 1 | 1.488811973 | 0.003824453 | Up |
| HP | Haptoglobin | 1.411448445 | 0.007541712 | Up |
| IL18R1 | Interleukin 18 receptor 1 | 1.156128987 | 0.008158229 | Up |
| NAIP | NLR family apoptosis inhibitory protein | 1.095823284 | 0.036770186 | Up |
| LTF | Lactotransferrin | 1.051261502 | 0.014733312 | Up |
| CYP1B1 | Cytochrome P450 family 1 subfamily B member 1 | 0.982426681 | 0.041859932 | Up |
| DEFA4 | Defensin alpha 4 | 0.979954402 | 0.034299563 | Up |
| MNDA | Myeloid cell nuclear differentiation antigen | 0.959622573 | 0.017834746 | Up |
| ANXA3 | Annexin A3 | 0.848970244 | 0.019900178 | Up |
| CTSZ | Cathepsin Z | −0.862858489 | 0.002942944 | Down |
| THBD | Thrombomodulin | −0.871430229 | 0.009507344 | Down |
| PPIF | Peptidylprolyl isomerase F | −0.872320694 | 0.006172789 | Down |
| SLPI | Secretory leukocyte peptidase inhibitor | −0.890979857 | 0.001666654 | Down |
| PI3 | Peptidase inhibitor 3 | −0.928381059 | 9.68E−05 | Down |
| DUSP2 | Dual-specificity phosphatase 2 | −0.961242077 | 0.005718815 | Down |
| LAMB3 | Laminin subunit beta 3 | −1.010787763 | 0.000192664 | Down |
| PLAU | Plasminogen activator, urokinase | −1.063036567 | 0.032298508 | Down |
| PHLDA2 | Pleckstrin homology like domain family A member 2 | −1.070158334 | 0.001626576 | Down |
| CFD | Complement factor D | −1.242598099 | 0.013684517 | Down |

Three-dimensional principal component analysis (PCA) was performed using the above DEGs (Figure 3A) and showed that the VAP samples were divided into two groups. The volcano plot of the p-values and fold changes is shown in Figure 3B. The expression of all 66 DEGs is shown in the heatmap in Figure 3C.
Figure 3

Identification of differentially expressed genes (DEGs) in ventilator-associated pneumonia (VAP) between the VAP+ and VAP− groups. (A) Three-dimensional principal component analysis (PCA). The red points represent VAP+ samples, and the blue points represent VAP− samples. (B) The volcano plot of DEGs. Blue dots denote down-regulated genes, and red dots represent upregulated genes. (C) The expression heatmap of DEGs.

Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the DEGs

Functional and pathway enrichment analysis of the biological classification of the above DEGs was performed. The results of the GO term analysis are shown in Supplementary Table 1. In the biological process (BP) category, upregulated DEGs were significantly enriched in neutrophil activity, including neutrophil activation in the immune response and degranulation. In the molecular function (MF) category, upregulated genes were enriched in serine hydrolase activity, serine-type peptidase and endopeptidase activity, and glucosyltransferase activity. In the cellular component (CC) category, upregulated DEGs were associated with the cytoplasmic vesicle lumen and secretory granule lumen (Figure 4A).
Figure 4

Functional annotation and pathway enrichment analysis of genes associated with ventilator-associated pneumonia (VAP). (A) The Gene Ontology (GO) enrichment term results of the upregulated differentially expressed genes (DEGs). (B) The GO terms enrichment results of the down-regulated DEGs. (C) The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment results of the upregulated DEGs. (D) The KEGG pathway enrichment results of down-regulated DEGs.

Down-regulated genes showed significant enrichment in the upregulation of responses to external stimuli, down-regulation of hydrolase activity, negative regulation of proteolysis and endopeptidase activity, and differentiation of brown fat cells in the BP category. In the MF category, down-regulated genes were enriched in endopeptidase inhibitor activity, peptidase inhibitor activity, endopeptidase regulator activity, cyclosporin A binding, and enzyme inhibitor activity. In the CC category, down-regulated genes were enriched in the endoplasmic reticulum lumen (Figure 4B). The KEGG pathway enrichment was analyzed for the identified DEGs using the KOBAS database (Supplementary Table 2). The upregulated DEGs were significantly enriched in the hematopoietic cell lineage, interactions between cytokine receptors, metabolic pathways, amoebiasis, and the MAPK, NOD-like receptor, FoxO, neurotrophin, and tumor necrosis factor (TNF) signaling pathways (Figure 4C). The down-regulated genes showed enrichment in the PI3K-Akt signaling pathway, proteoglycans in cancer, Epstein-Barr virus (EBV) infection, focal adhesion, pathways in cancer, herpes simplex virus (HSV) infection, transcriptional dysregulation in cancer, tuberculosis, the cGMP-PKG signaling pathway, and microRNAs in cancer (Figure 4D).

Protein–protein interaction (PPI) network establishment and module analysis

The STRING database was used to create a PPI network to investigate the biological roles of the identified DEGs (Figure 5A). There were 48 nodes and 106 edges that were identified in the PPI network. Two key modules were identified from the whole network using MCODE (Figure 5B, 5C). ClueGO was used to perform GO term enrichment analysis of genes in module 1 (Figure 5D). Genes were enriched in tertiary granule lumen, specific granule, specific granule lumen, defense response to fungus, disruption of cells of other organisms, and antibacterial humoral response (Supplementary Table 3).
Figure 5

Protein–protein interaction (PPI) network construction and module analysis based on differentially expressed genes (DEGs). (A) The entire PPI network. (B) Module 1 network. (C) Module 2 network. (D) The Gene Ontology (GO) enrichment term analysis of module 1.

To select feature genes and build a gene signature with diagnostic value for VAP among the DEGs, a series of analyses was performed (Figure 6A). The data projected into two dimensions by the Isomap algorithm are shown in Figure 6B, where the binary classes of data are represented by different colors, and the annotation at the right corner of each data point shows the sample attribute. The majority of data points are mutually separated and can be distinguished by a simple decision boundary. Therefore, the machine learning algorithms could be applied without complex adjustment to fit the data and provide results.
Figure 6

Development of a ten-gene signature for predicting ventilator-associated pneumonia (VAP) using the three modeling methods of logistic regression, random forest, and fully connected neural network analysis. (A) The workflow for developing the gene signature. (B) The manifold learning algorithm was used to project the data. The points with zero labels identify VAP− patients, with the remaining being VAP+ patients. (C) Feature selection of ten genes. (D) The receiver operating characteristic (ROC) curves of the diagnostic value of the three algorithms.

Among the 66 genes, some were closely associated with the two types of patients and might be key biomarkers for identifying patients at increased risk of VAP. A feature selection process was adopted in this study to identify essential genes. After LASSO was applied to the 66 identified DEGs, ten essential genes with non-zero coefficients were extracted as feature genes (Figure 6C, Table 2).
Table 2

Details of the ten featured genes in ventilator-associated pneumonia (VAP).

| Gene symbol | Gene name | log2-FC | p-Value | Regulation |
|---|---|---|---|---|
| LTF | Lactotransferrin | 1.051261502 | 0.014733312 | Up |
| MNDA | Myeloid cell nuclear differentiation antigen | 0.959622573 | 0.017834746 | Up |
| FKBP5 | FK506 binding protein 5 | 0.787232999 | 0.013536848 | Up |
| PDGFC | Platelet derived growth factor C | 0.782415364 | 0.00563081 | Up |
| GADD45A | Growth arrest and DNA damage inducible alpha | 0.724582865 | 0.000672126 | Up |
| ARHGDIA | Rho GDP dissociation inhibitor alpha | 0.720167209 | 0.033516623 | Up |
| PPIB | Peptidylprolyl isomerase B | −0.608265883 | 0.0035643 | Down |
| RGS2 | Regulator of G protein signaling 2 | −0.745549324 | 0.000658377 | Down |
| KIF3B | Kinesin family member 3B | −0.746830664 | 0.002775964 | Down |
| CTSZ | Cathepsin Z | −0.862858489 | 0.002942944 | Down |

Building a gene signature exhibiting diagnostic value using three algorithms

To distinguish the two patient populations, robust machine learning algorithms were used to build classification models on the selected feature genes. Three widely used and validated algorithms, logistic regression, random forest, and one type of deep neural network, the feed-forward multi-layer perceptron (MLP), also known as a fully connected or dense-layer network, were used to build the predictive models based on the ten selected genes. The area under the curve (AUC) values of the logistic regression, random forest, and MLP models were 0.85, 0.86, and 0.87, respectively, and the corresponding accuracies were 0.75, 0.80, and 0.90 (Table 3). In addition to these two metrics, the ROC curves were plotted for each model (Figure 6D). Considering the two metrics simultaneously, the predictive model based on the MLP algorithm was selected, and ten key genes were identified to distinguish between the two types of patients. The predictive ability of the MLP model also indicated that the ten selected essential genes were closely associated with the patients who were diagnosed with VAP. Among these ten genes, six were upregulated, and four were down-regulated. The expression box plots are shown in Figure 7.
Table 3

The accuracy and area under the curve (AUC) of the three predicted models in ventilator-associated pneumonia (VAP).

| Model | Accuracy | AUC |
|---|---|---|
| Logistic regression | 0.75 | 0.85 |
| Random forest | 0.80 | 0.86 |
| MLP | 0.90 | 0.87 |

MLP – multi-layer perceptron.

Figure 7

The expression box plot of the ten genes associated with ventilator-associated pneumonia (VAP). * Represents a p-value <0.05. ** Represents a p-value <0.01.

Discussion

The aim of this study was to use three modeling methods, logistic regression analysis, random forest analysis, and fully-connected neural network analysis, to develop a diagnostic gene signature for the diagnosis of ventilator-associated pneumonia (VAP). A multistep bioinformatics analysis was performed to identify a ten-gene signature for the diagnosis and prediction of VAP based on the three modeling methods. The GSE30385 dataset from the Gene Expression Omnibus (GEO) database was used to identify differentially expressed genes (DEGs) associated with VAP. A total of 66 significant DEGs were identified between VAP+ and VAP− patients. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were used to characterize the functions of the DEGs. After applying the least absolute shrinkage and selection operator (LASSO) regression analysis algorithm to select key genes, ten essential genes were identified. Based on three modeling methods, including logistic regression, random forest, and fully-connected neural network methods, a ten-gene signature was identified with diagnostic value for VAP. This ten-gene signature may predict VAP in patients, and the genes could be used as potential diagnostic or predictive markers. However, these initial findings require validation with future clinical studies.

VAP is a potentially fatal hospital-acquired pneumonia that represents a global health problem [22]. Also, VAP can be caused by multidrug-resistant bacteria, which represent an emerging global problem [23]. The diagnosis of VAP remains a challenge. Bioinformatics analysis studies based on the GEO database have been increasingly reported, but only three previous studies of gene expression associated with VAP have been reported. In 2015, Xu et al. [7] used the expression profile GSE30385 to identify 69 DEGs associated with VAP, including 36 down-regulated and 33 upregulated genes, which differed from the present study in which 66 DEGs were identified. Although the results of this previous study [7] and the present study were similar, the main reason for the difference in the number of DEGs was that the annotation platform GPL201 was updated in July 2016.

In the present study, in the GO term enrichment analysis, the upregulated genes were found to be associated with the processes of the immune system, immune reaction, and kinase activity, while the down-regulated genes were associated with stronger stress response, peptidase inhibitor activity, and programmed cell death. Also, upregulated genes exhibited a primary enrichment in the neurotrophic protein signaling pathway, the MAPK signaling pathway, and the nucleotide-binding oligomerization domain (NOD)-like receptor signaling pathway. In contrast, down-regulated genes exhibited a primary enrichment in complement, the coagulation cascade, cancer, ribosomal, and other pathways. In the present study, neutrophil activities were significantly enriched among upregulated genes, including neutrophil degranulation, neutrophil activation in the immune response, and immunity mediated by neutrophils. Neutrophil proteases are significantly increased in the alveolar space in VAP and may contribute to its pathogenesis [24]. Neutrophil extracellular traps are increased in the alveoli in patients with VAP [25]. Also, in the present study, upregulated genes were enriched in the defense responses to fungi and bacteria, immune responses, and antibacterial humoral responses. The findings from this study support the important role of immune responses in the etiology of VAP [26,27].
Genes associated with cell components were closely associated with the lumen, and down-regulated genes were enriched in the regulation of the response to external stimuli, as well as the negative regulation of hydrolase activity, proteolysis, and peptidase activity. Also, in 2015, Xu et al. [28] reported the findings from a study that used the gene expression profile data of GSE30385 and compared the PPI pairs of all genes from the STRING database, followed by searching VAP-related genes in the National Center for Biotechnology Information (NCBI) to build a PPI network for these genes. They then identified the DEGs that overlapped with the genes in the PPI network and showed that the MAPK cascade and processes related to the immune system were enriched in these overlapping genes [28]. Swanson et al. [8] used a logistic regression model with cross validation to develop a gene expression model (PIK3R3, ATP2A1, PI3, ADAM8, and HCN4) for predicting VAP in trauma patients, but this previous study used only one modeling method to build the gene signature. In the present study, three modeling methods were used to build a ten-gene signature with diagnostic value in VAP. Among these ten genes, lactotransferrin (LTF) is a multifunctional protein of the transferrin family. Specific receptors on the microbial cell surface also contribute to the antibacterial actions of lactoferrin. In humans, LTF is primarily expressed in mucosal epithelial cells and immune cells [29], and is known for its antimicrobial, antiviral, anti-inflammatory, and immunomodulatory functions [30]. A previously published study included proteomic profiling of bronchoalveolar lavage (BAL) fluid in critically ill VAP patients [31], and the protein lactotransferrin was also found to be differentially expressed in VAP+ patients when compared with VAP− patients.
Myeloid cell nuclear differentiation antigen (MNDA) is involved in the activation of the innate immune response and cellular defense response and is an immunohistochemical marker used to distinguish marginal zone lymphomas from other small B-cell lymphomas [32]. Also, the role of MNDA in the proliferation, apoptosis, and migration of osteosarcoma cells has previously been studied [33]. FK506 binding protein 5 (FKBP5) is an important modulator of stress responses and affects the pathogenesis of stress-related disorders [34]. The critical roles of platelet-derived growth factor C (PDGFC) in the cardiovascular system as an angiogenic and survival factor have been demonstrated [35]. Growth arrest and DNA damage-inducible alpha (GADD45A) acts as an indicator of DNA damage and responds to environmental stresses by mediating p38/JNK pathway activation through MTK1/MEKK4 kinase, and has been studied in several human cancers [36-39]. Rho GDP dissociation inhibitor alpha (ARHGDIA) is expressed in glioma [40,41]. Peptidylprolyl isomerase B (PPIB) is expressed in both Gram-negative and Gram-positive bacteria and is an intracellular protein that controls bacterial cell division [42]. Regulator of G protein signaling 2 (RGS2) is expressed in prostate cancer [43], breast cancer [44], and ovarian cancer [45]. Inhibition of kinesin family member 3B (KIF3B) expression can inhibit hepatocellular carcinoma cell proliferation [46]. Gene polymorphisms of cathepsin Z (CTSZ) have been associated with pulmonary tuberculosis [47]. Also, by inducing epithelial-mesenchymal transition (EMT) in hepatocellular carcinoma, CTSZ overexpression is associated with tumor metastasis [48]. However, there have been no previous reports on the role of MNDA, FKBP5, PDGFC, GADD45A, ARHGDIA, PPIB, RGS2, KIF3B, and CTSZ in VAP.

This study had several limitations. The diagnostic signature identified in this study requires further validation in a larger sample of patients with VAP. Although this model identified key genes associated with increased risk of VAP, the number of genes identified was large and should be further narrowed to the most important genes that can be developed as predictive, diagnostic, or therapeutic biomarkers.

Conclusions

This study aimed to use machine learning models to develop a gene signature for the prediction of ventilator-associated pneumonia (VAP). The GSE30385 expression profile was downloaded from the Gene Expression Omnibus (GEO) database, and 66 significant differentially expressed genes (DEGs) were identified, including 35 down-regulated and 31 upregulated genes that distinguished between VAP+ and VAP− patients. In the Gene Ontology (GO) term enrichment analysis, upregulated DEGs were significantly enriched in neutrophil activity, and down-regulated DEGs were enriched in the regulation of hydrolase activity. In the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, upregulated DEGs were significantly enriched in the FoxO and MAPK signaling pathways, and down-regulated DEGs were enriched in the PI3K/Akt signaling pathway and focal adhesion. After applying the least absolute shrinkage and selection operator (LASSO) regression analysis algorithm to the 66 DEGs, ten essential genes were extracted as feature genes, and a ten-gene signature was identified to predict VAP in patients, including LTF, MNDA, FKBP5, PDGFC, GADD45A, ARHGDIA, PPIB, RGS2, KIF3B, and CTSZ. Three modeling methods, logistic regression analysis, random forest analysis, and the feed-forward multi-layer perceptron (MLP), were used to build a ten-gene diagnostic signature for VAP. The area under the curve (AUC) values of the three models were 0.85, 0.86, and 0.87, respectively. This ten-gene signature requires further clinical evaluation for the prediction of VAP in patients.
References (48 in total)

1.  LSTM recurrent networks learn simple context-free and context-sensitive languages.

Authors:  F A Gers; E Schmidhuber
Journal:  IEEE Trans Neural Netw       Date:  2001

Review 2.  New antibiotics for ventilator-associated pneumonia.

Authors:  Matteo Bassetti; Antonio Vena; Nadia Castaldo; Elda Righi; Maddalena Peghin
Journal:  Curr Opin Infect Dis       Date:  2018-04       Impact factor: 4.915

Review 3.  Lactoferrin, a Pleiotropic Protein in Health and Disease.

Authors:  Sylvain Mayeur; Schohraya Spahis; Yves Pouliot; Emile Levy
Journal:  Antioxid Redox Signal       Date:  2016-03-16       Impact factor: 8.401

Review 4.  Immunosuppressive aspects of analgesics and sedatives used in mechanically ventilated patients: an underappreciated risk factor for the development of ventilator-associated pneumonia in critically ill patients.

Authors:  Michael A Smith; Maho Hibino; Bonnie A Falcione; Katherine M Eichinger; Ravi Patel; Kerry M Empey
Journal:  Ann Pharmacother       Date:  2013-11-04       Impact factor: 3.154

5.  Association of CTSZ rs34069356 and MC3R rs6127698 gene polymorphisms with pulmonary tuberculosis.

Authors:  M Hashemi; E Eskandari-Nasab; A Moazeni-Roodi; M Naderi; B Sharifi-Mood; M Taheri
Journal:  Int J Tuberc Lung Dis       Date:  2013-07-03       Impact factor: 2.373

Review 6.  Platelet-derived growth factor-C and -D in the cardiovascular system and diseases.

Authors:  Chunsik Lee; Xuri Li
Journal:  Mol Aspects Med       Date:  2017-10-13

7.  Down-regulation of GADD45A enhances chemosensitivity in melanoma.

Authors:  Jia Liu; Guoqiang Jiang; Ping Mao; Jing Zhang; Lin Zhang; Likun Liu; Jia Wang; Lawrence Owusu; Baoyin Ren; Yawei Tang; Weiling Li
Journal:  Sci Rep       Date:  2018-03-07       Impact factor: 4.379

8.  Identification of gene biomarkers in patients with postmenopausal osteoporosis.

Authors:  Chenggang Yang; Jing Ren; Bangling Li; Chuandi Jin; Cui Ma; Cheng Cheng; Yaolan Sun; Xiaofeng Shi
Journal:  Mol Med Rep       Date:  2018-12-12       Impact factor: 2.952

9.  Effects of the myeloid cell nuclear differentiation antigen on the proliferation, apoptosis and migration of osteosarcoma cells.

Authors:  Chengliang Sun; Chuanju Liu; Jun Dong; Dong Li; Wei Li
Journal:  Oncol Lett       Date:  2014-01-17       Impact factor: 2.967

10.  Gene expression profile analysis of ventilator-associated pneumonia.

Authors:  Xiaoli Xu; Bo Yuan; Quan Liang; Huimin Huang; Xiangyi Yin; Xiaoyue Sheng; Niuyan Nie; Hongmei Fang
Journal:  Mol Med Rep       Date:  2015-09-29       Impact factor: 2.952

