Md Al Mehedi Hasan, Md Maniruzzaman, Jungpil Shin.
Abstract
Immunoglobulin A nephropathy (IgAN) is a kidney disease caused by the accumulation of IgA deposits in the kidneys, which causes inflammation and damage to the kidney tissues. Various bioinformatics analysis-based approaches are widely used to predict novel candidate genes and pathways associated with IgAN. However, the molecular mechanisms and causes of IgAN development and progression have yet to be clearly explored. Therefore, the present study aimed to identify key candidate genes for IgAN using machine learning (ML) and statistics-based bioinformatics models. First, differentially expressed genes (DEGs) were identified using limma, and enrichment analysis was then performed on the DEGs using DAVID. A protein-protein interaction (PPI) network was constructed using STRING, and Cytoscape was used to determine hub genes based on connectivity and hub modules, with their associated genes, based on MCODE scores. Furthermore, ML-based algorithms, namely support vector machine (SVM), least absolute shrinkage and selection operator (LASSO), and partial least squares discriminant analysis (PLS-DA), were applied to identify the discriminative genes of IgAN from the DEGs. Finally, the key candidate genes (FOS, JUN, EGR1, FOSB, and DUSP1) were identified as the genes overlapping among the selected hub genes, the hub module genes, and the discriminative genes from SVM, LASSO, and PLS-DA, and these genes can be used for the diagnosis and treatment of IgAN.
Year: 2022 PMID: 35978028 PMCID: PMC9385868 DOI: 10.1038/s41598-022-18273-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Flowchart of data preparation, processing, analysis, and validation.
Figure 2. Identification and hierarchical clustering of DEGs for IgAN patients. (A) Volcano plot of DEGs, generated using the “ggplot2” version 3.3.6 package in R[63] (https://cran.r-project.org/package=ggplot2). Dodger blue represents down-regulated DEGs, gray represents non-significant genes, and fire brick represents up-regulated DEGs. (B) Heatmap of the DEGs for IgAN patients, generated using the “NMF” version 0.24.0 package in R[64] (https://cran.r-project.org/package=NMF). The horizontal axis shows the patients and the vertical axis shows the DEGs.
GO analysis of DEGs in biological process, cellular component, and molecular function.
| Category | GO ID | Description | Count |
|---|---|---|---|
| BP | GO:0006954 | Inflammatory response | 32 |
| BP | GO:0051591 | Response to cAMP | 10 |
| BP | GO:0019221 | Cytokine-mediated signaling pathway | 21 |
| BP | GO:0071222 | Cellular response to lipopolysaccharide | 16 |
| BP | GO:0030593 | Neutrophil chemotaxis | 11 |
| CC | GO:0070062 | Extracellular exosome | 93 |
| CC | GO:0005576 | Extracellular region | 70 |
| CC | GO:0005615 | Extracellular space | 62 |
| CC | GO:0005581 | Collagen trimer | 12 |
| CC | GO:0072562 | Blood microparticle | 14 |
| MF | GO:0005201 | Extracellular matrix structural constituent | 15 |
| MF | GO:0001228 | Transcriptional activator activity | 27 |
| MF | GO:0008270 | Zinc ion binding | 36 |
| MF | GO:0022857 | Transmembrane transporter activity | 13 |
| MF | GO:0030020 | Extracellular matrix structural constituent conferring tensile strength | 7 |
Top five items were selected based on p-value.
GO: gene ontology; BP: biological process; CC: cellular component; MF: molecular function.
KEGG pathway analysis of DEGs.
| Pathway ID | Description | Count | p-value |
|---|---|---|---|
| hsa00260 | Glycine, serine and threonine metabolism | 10 | |
| hsa04933 | AGE-RAGE signaling pathway in diabetic complications | 13 | |
| hsa04974 | Protein digestion and absorption | 12 | |
| hsa04657 | IL-17 signaling pathway | 10 | |
| hsa04380 | Osteoclast differentiation | 11 | 0.0013 |
Top five items were selected based on p-value.
Figure 3. (A) PPI network of DEGs, (B) Module 1, and (C) Module 2. These three figures were generated with Cytoscape 3.9.1[54] (www.cytoscape.org).
List of the 19 hub genes identified from the PPI network based on degree of connectivity.
| SN | Gene | Degree | Betweenness | Closeness |
|---|---|---|---|---|
| 1 | FOS | 50 | 0.113 | 0.314 |
| 2 | JUN | 44 | 0.164 | 0.326 |
| 3 | FN1 | 38 | 0.113 | 0.321 |
| 4 | ALB | 34 | 0.190 | 0.330 |
| 5 | IL1B | 32 | 0.234 | 0.337 |
| 6 | EGR1 | 32 | 0.012 | 0.280 |
| 7 | JUNB | 30 | 0.023 | 0.291 |
| 8 | CD44 | 28 | 0.074 | 0.310 |
| 9 | MMP2 | 28 | 0.033 | 0.288 |
| 10 | MYC | 26 | 0.076 | 0.315 |
| 11 | FOSB | 26 | 0.011 | 0.275 |
| 12 | COL1A2 | 24 | 0.006 | 0.264 |
| 13 | TYROBP | 22 | 0.060 | 0.223 |
| 14 | CSF1R | 22 | 0.093 | 0.248 |
| 15 | COL1A1 | 22 | 0.008 | 0.269 |
| 16 | CCL4 | 20 | 0.062 | 0.303 |
| 17 | ATF3 | 20 | 0.044 | 0.264 |
| 18 | DUSP1 | 20 | 0.036 | 0.262 |
| 19 | LUM | 20 | 0.013 | 0.250 |
Two modules selected from the PPI network.
| Cluster | Score | Nodes | Edges | Node IDs |
|---|---|---|---|---|
| 1 | 8.44 | 10 | 76 | COL5A2, POSTN, COL6A3, LUM, COL1A1, SDC1, COL3A1, MMP2, FN1, COL1A2 |
| 2 | 8.40 | 11 | 84 | DUSP1, JUN, JUNB, EGR3, MYC, FOSL2, FOSB, FOSL1, EGR1, FOS, ARC |
Score = density × no. of nodes.
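The score footnote can be checked against the module table with a few lines of arithmetic. The sketch below reads the footnote as score = density × number of nodes and takes density as the fraction of possible undirected edges that are present. It also assumes that the "Edges" column counts each undirected edge twice: 76 edges would exceed the 45 possible pairs among 10 nodes, so the counts are halved here. That halving is an inference from the table, not something stated in the source, and the function name `mcode_score` is hypothetical.

```python
from math import comb


def mcode_score(nodes: int, edges: int, edges_counted_twice: bool = True) -> float:
    """Return density * nodes, with density = present edges / possible edges.

    ASSUMPTION: the tabulated edge counts list each undirected edge twice,
    so they are halved by default before computing the density.
    """
    undirected_edges = edges / 2 if edges_counted_twice else edges
    possible_edges = comb(nodes, 2)  # n * (n - 1) / 2 node pairs
    density = undirected_edges / possible_edges
    return density * nodes


# (nodes, edges) for the two modules, copied from the table above.
modules = {1: (10, 76), 2: (11, 84)}

for cluster, (n_nodes, n_edges) in modules.items():
    print(cluster, round(mcode_score(n_nodes, n_edges), 2))
```

Under these assumptions the computed values round to the tabulated scores of 8.44 and 8.40, which supports this reading of the footnote.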
Figure 4. Classification accuracy of SVM for each gene.
Figure 5. Discriminative genes selected using the LASSO-based model with 10-fold CV: (A) A coefficient profile plot was generated against the log(λ) sequence. (B) 32 discriminative genes were selected for IgAN. (C) Contribution of the 32 discriminative genes for IgAN patients.
Figure 6. PLS-DA for DEGs: (A) Component 1 vs. Component 2. The red points indicate IgAN patients and the green points indicate healthy controls. (B) Importance of the top 20 discriminative genes for IgAN.
Figure 7. Identification and PPI analysis of key hub genes for IgAN patients. (A) Identification of key candidate genes from hub module genes and the gene sets computed by cytoHubba, SVM, LASSO, and PLS-DA. (B) PPI analysis of the five key candidate genes.
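The overlap step behind the key candidate genes is a set intersection. The following is a minimal sketch that uses only the gene lists reported in the hub gene and module tables. The SVM, LASSO, and PLS-DA gene lists are not reproduced in this section, so they are left out, and the result is only the hub/module part of the overlap, not the full five-way intersection. Variable names are illustrative.

```python
# Hub genes from the PPI network (degree-ranked table, 19 genes).
hub_genes = {
    "FOS", "JUN", "FN1", "ALB", "IL1B", "EGR1", "JUNB", "CD44", "MMP2", "MYC",
    "FOSB", "COL1A2", "TYROBP", "CSF1R", "COL1A1", "CCL4", "ATF3", "DUSP1", "LUM",
}

# Genes in the two MCODE modules (module table).
module_1 = {"COL5A2", "POSTN", "COL6A3", "LUM", "COL1A1",
            "SDC1", "COL3A1", "MMP2", "FN1", "COL1A2"}
module_2 = {"DUSP1", "JUN", "JUNB", "EGR3", "MYC", "FOSL2",
            "FOSB", "FOSL1", "EGR1", "FOS", "ARC"}

# Key candidate genes reported in the abstract.
key_candidates = {"FOS", "JUN", "EGR1", "FOSB", "DUSP1"}

# Genes that are both hub genes and members of either module.
module_genes = module_1 | module_2
hub_and_module = hub_genes & module_genes

print(sorted(hub_and_module))
print("key candidates within overlap:", key_candidates <= hub_and_module)
print("key candidates within module 2:", key_candidates <= module_2)
```

Intersecting this partial overlap with the three ML-selected gene sets, once available, would complete the procedure. All five reported candidates already sit in Module 2.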
Figure 8. Boxplots of the five key candidate genes for IgAN patients: (A) FOS, (B) JUN, (C) EGR1, (D) FOSB, (E) DUSP1, and (F) heatmap of the five key candidate genes in renal tissue samples, generated using the “NMF” version 0.24.0 package in R[64] (https://cran.r-project.org/package=NMF).
Figure 9. Validation of the five key candidate genes using ROC curves, generated with the pROC package version 1.18.0 in R[29] (https://cran.r-project.org/package=pROC), and a heatmap for the GSE116626 dataset. (A) FOS, (B) JUN, (C) EGR1, (D) FOSB, (E) DUSP1, (F) heatmap of the five key candidate genes in renal tissue samples, generated using the “NMF” version 0.24.0 package in R[64] (https://cran.r-project.org/package=NMF). CI: confidence interval.
Figure 10. Validation of the five key candidate genes using ROC curves, generated with the pROC package version 1.18.0 in R[29] (https://cran.r-project.org/package=pROC), and a heatmap for the GSE35487 dataset. (A) FOS, (B) JUN, (C) EGR1, (D) FOSB, (E) DUSP1, (F) heatmap of the five key candidate genes in renal tissue samples, generated using the “NMF” version 0.24.0 package in R[64] (https://cran.r-project.org/package=NMF).
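To show what a single-gene ROC validation measures, the sketch below computes the area under the ROC curve (AUC) in plain Python as the probability that a randomly chosen patient value exceeds a randomly chosen control value. It is not the pROC implementation used for the figures, and the expression values are hypothetical placeholders, not data from GSE116626 or GSE35487.

```python
def auc(case_values, control_values):
    """AUC as the fraction of (case, control) pairs where the case value is
    higher; ties count as half a win (the Mann-Whitney formulation)."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# HYPOTHETICAL expression values for one gene, for illustration only.
igan_expression = [6.2, 7.8, 6.9, 8.0]
control_expression = [8.1, 7.9, 8.4, 7.7]

raw_auc = auc(igan_expression, control_expression)
# For a gene that is lower in patients, the raw AUC falls below 0.5;
# flipping the direction gives the discriminative AUC.
discriminative_auc = max(raw_auc, 1.0 - raw_auc)
print(raw_auc, discriminative_auc)
```

An AUC near 1 (after choosing the direction) means the gene separates the two groups well, while a value near 0.5 means it carries little discriminative information.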