Yang Li1, Hongjie Liu2, Quan Xu3, Rui Wu4, Yi Zhang1, Naizhe Li1, Xiaozhou He1, Mengjie Yang1, Mifang Liang1, Xuejun Ma1,5.
Abstract
Host response biomarkers offer a promising alternative diagnostic solution for identifying acute respiratory infection (ARI) cases involving influenza infection. However, most published panels involve multiple genes, which is problematic in clinical settings because polymerase chain reaction (PCR)-based technology, the most widely used genomic technology in these settings, can measure only a small number of targets. This study aimed to identify a single-gene biomarker with high diagnostic accuracy by using integrated bioinformatics analysis with XGBoost. The gene expression profiles in dataset GSE68310 were used to construct a co-expression network using weighted correlation network analysis (WGCNA). Fourteen hub genes related to influenza infection (blue module) that were common to both the co-expression network and the protein-protein interaction network were identified. Thereafter, a single hub gene was selected using XGBoost, with feature selection conducted using recursive feature elimination with cross-validation (RFECV). The identified biomarker was oligoadenylate synthetase-like (OASL). The robustness of this biomarker was further examined using three external datasets. The OASL expression profiles triggered by different infections differed enough to discriminate between influenza and non-influenza ARIs. Thus, this study presents a workflow for identifying a single-gene classifier across multiple datasets. Moreover, OASL was revealed as a biomarker that can distinguish influenza patients from others with flu-like ARI. OASL has great potential for improving the accuracy of influenza diagnosis in ARI patients in the clinical setting.
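The feature-selection step described in the abstract (a tree-based classifier combined with RFECV over 14 hub genes) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the data are synthetic rather than GSE68310, scikit-learn's GradientBoostingClassifier stands in for XGBoost, and the fold count and scoring metric are arbitrary choices, not the authors' settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for a samples x hub-genes expression matrix
# (14 columns, mirroring the 14 hub genes; NOT real expression data).
X, y = make_classification(
    n_samples=120, n_features=14, n_informative=3, n_redundant=2, random_state=0
)

# Stand-in gradient-boosted tree model; any estimator that exposes
# feature_importances_ could be used here instead.
estimator = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Recursive feature elimination: drop the least important feature at each
# step and score every feature count with 5-fold cross-validated accuracy.
selector = RFECV(
    estimator,
    step=1,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
    min_features_to_select=1,
)
selector.fit(X, y)

n_selected = selector.n_features_
selected_columns = [i for i, keep in enumerate(selector.support_) if keep]
print(f"{n_selected} feature(s) retained: columns {selected_columns}")
```

RFECV keeps the feature count with the best mean cross-validated score, so it does not necessarily end at a single feature; the number retained depends on the data and on the estimator.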
Keywords: OASL; WGCNA; XGBoost; host response; influenza infection
Year: 2020 PMID: 32714913 PMCID: PMC7343705 DOI: 10.3389/fbioe.2020.00729
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
FIGURE 1. General study workflow: data collection, in silico analysis, and external validation. PPI, protein–protein interaction; RFECV, recursive feature elimination with cross-validation.
FIGURE 2. Co-expression network constructed using weighted correlation network analysis (WGCNA). (A) Analysis of the scale-free fit index with a threshold of 0.90 (top) and mean connectivity (bottom) for various soft-thresholding power values. (B) Distribution of average gene significance and errors in the modules associated with influenza infection (FluA-Day0). (C) Heatmap of the correlation between module eigengenes and the clinical traits recorded in GSE68310. FluA, influenza A virus; FluB, influenza B virus; HRV, human rhinovirus; HCoV, human coronavirus.
FIGURE 3. Functional analysis of the modules of interest. (A) GO and KEGG enrichment results for the blue module. (B) Venn diagram of KEGG results for the blue and purple modules. (C) Venn diagram of GO results for the blue and purple modules. GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.
FIGURE 4. Hub gene selection. (A) Scatter plot of module eigengenes in the blue module with selection thresholds. (B) Visualization of the network connections among the most connected genes in the blue module. Circle size represents the log2 fold change. (C) Common hub genes in both the PPI and co-expression networks. (D) Classification accuracy versus number of genes, based on the combination of XGBoost and recursive feature elimination with cross-validation. (E) Evaluation of the classification performance of the selected hub gene, oligoadenylate synthetase-like (OASL), using dataset GSE68310.
FIGURE 5. Forest plot of the diagnostic performance of OASL and IFI27 on external cohorts. AUC, area under the curve. *Cases with bacterial infections were removed.
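For a single-gene classifier, the AUC is the probability that a randomly chosen influenza sample has a higher expression value than a randomly chosen non-influenza sample. The sketch below computes it from pairwise comparisons (the Mann-Whitney formulation). It assumes that higher expression indicates influenza, and the expression values are hypothetical, not taken from any cohort in this study.

```python
def auc_single_gene(expression, is_influenza):
    """AUC of one gene's expression used directly as a classifier score."""
    pos = [x for x, y in zip(expression, is_influenza) if y]
    neg = [x for x, y in zip(expression, is_influenza) if not y]
    if not pos or not neg:
        raise ValueError("need at least one sample in each class")
    # Count influenza/non-influenza pairs in which the influenza sample
    # has the higher expression; ties count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical log2 expression values for eight samples (illustration only).
oasl = [8.1, 7.9, 6.2, 7.4, 5.8, 6.0, 6.2, 5.5]
flu = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = influenza, 0 = non-influenza ARI

auc = auc_single_gene(oasl, flu)
print(f"AUC = {auc:.5f}")  # AUC = 0.96875
```

The pairwise count is quadratic in the number of samples, which is acceptable for cohorts of this size; a rank-based computation gives the same value more efficiently on large datasets.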