Xiaoxi Zeng1,2,3, Chunyang Li1,3, Yi Li4, Haopeng Yu1,3, Ping Fu2,3, Hyokyoung G Hong5, Wei Zhang1,3.
1. West China Biomedical Big Data Center, West China School of Medicine (West China Hospital), Sichuan University, Chengdu, China.
2. Division of Nephrology, Kidney Research Institute, West China Hospital, Sichuan University, Chengdu, China.
3. Medical Big Data Center, Sichuan University, Chengdu, China.
4. Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA.
5. Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, USA.
Abstract
AIMS: Intervention for end-stage kidney disease (ESKD), which is associated with adverse prognoses and major economic burdens, is challenging because of its complex pathogenesis. This study was performed to identify biomarker genes and molecular mechanisms for ESKD using a bioinformatics approach. METHODS: Using the Gene Expression Omnibus dataset GSE37171, this study identified pathways and genomic biomarkers associated with ESKD via a multi-stage knowledge discovery process: identification of gene modules by weighted gene co-expression network analysis, discovery of the important pathways involved by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses, selection of differentially expressed genes by the empirical Bayes method, and screening of biomarker genes by least absolute shrinkage and selection operator (Lasso) logistic regression. The results were validated using GSE70528, an independent testing dataset. RESULTS: Three clinically important gene modules associated with ESKD were identified by weighted gene co-expression network analysis. Within these modules, Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses revealed important biological pathways involved in ESKD, including transforming growth factor-β and Wnt signalling, RNA splicing, autophagy, and chromatin and histone modification. Furthermore, Lasso logistic regression identified five final genes, namely CNOT8, MST4, PPP2CB, PCSK7 and RBBP4, that are differentially expressed and associated with ESKD. The accuracy of the final model in distinguishing ESKD cases from controls was 96.8% in the training dataset and 91.7% in the validation dataset. CONCLUSION: Network-based variable selection approaches can identify biological pathways and biomarker genes associated with ESKD. The findings may inform more in-depth follow-up research and the development of effective therapies.
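The final screening step, Lasso logistic regression followed by a check on an independent dataset, can be illustrated with a minimal sketch. This is not the authors' code, and the abstract does not say which software they used. The data here are synthetic, and every name and setting (`make_data`, `fit_lasso_logistic`, the penalty `lam=0.1`, the step size, the choice of two informative columns) is an assumption made for illustration. The L1 penalty is handled by proximal gradient descent with soft-thresholding, one of several ways to fit such a model.

```python
import math
import random

def make_data(n, p, informative, shift, rng):
    """Synthetic 'expression' matrix; only the `informative` columns differ by class."""
    X, y = [], []
    for i in range(n):
        label = i % 2                      # balanced cases (1) and controls (0)
        sign = 1.0 if label == 1 else -1.0
        X.append([rng.gauss(sign * shift if j in informative else 0.0, 1.0)
                  for j in range(p)])
        y.append(label)
    return X, y

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of t * |v|; this is what sets coefficients exactly to zero.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def predict_proba(row, w, b):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))

def fit_lasso_logistic(X, y, lam, lr=0.1, n_iter=500):
    """L1-penalised logistic regression via proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            r = predict_proba(row, w, b) - label   # residual of the mean log-loss
            grad_b += r
            for j in range(p):
                grad_w[j] += r * row[j]
        b -= lr * grad_b / n                       # intercept is left unpenalised
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(p)]
    return w, b

def accuracy(X, y, w, b):
    hits = sum((predict_proba(row, w, b) >= 0.5) == (label == 1)
               for row, label in zip(X, y))
    return hits / len(y)

rng = random.Random(0)
p = 8
informative = {0, 1}                               # hypothetical "biomarker" columns
X_train, y_train = make_data(120, p, informative, 1.5, rng)
X_test, y_test = make_data(60, p, informative, 1.5, rng)   # independent samples

w, b = fit_lasso_logistic(X_train, y_train, lam=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
train_acc = accuracy(X_train, y_train, w, b)
test_acc = accuracy(X_test, y_test, w, b)

print("selected columns:", selected)
print("training accuracy:", round(train_acc, 3))
print("validation accuracy:", round(test_acc, 3))
```

Columns whose coefficients remain non-zero play the role of the retained biomarker genes, and the second accuracy figure corresponds to evaluation on a dataset that was not used for fitting. In a real analysis the penalty would normally be chosen by cross-validation and the expression values standardised first; both steps are omitted here for brevity.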