| Literature DB >> 35492683 |
David J Burks1, Soham Sengupta1, Ronika De1, Ron Mittler2,3, Rajeev K Azad1,4.
Abstract
Identifying genes that interact to confer a biological function to an organism is one of the main goals of functional genomics. High-throughput technologies for assessment and quantification of genome-wide gene expression patterns have enabled systems-level analyses to infer pathways or networks of genes involved in different functions under many different conditions. Here, we leveraged the publicly available, information-rich RNA-Seq datasets of the model plant Arabidopsis thaliana to construct a gene co-expression network, which was partitioned into clusters or modules that harbor genes correlated by expression. Gene ontology and pathway enrichment analyses were performed to assess functional terms and pathways that were enriched within the different gene modules. By interrogating the co-expression network for genes in different modules that associate with a gene of interest, diverse functional roles of the gene can be deciphered. By mapping genes differentially expressing under a certain condition in Arabidopsis onto the co-expression network, we demonstrate the ability of the network to uncover novel genes that are likely transcriptionally active but prone to be missed by standard statistical approaches due to their falling outside of the confidence zone of detection. To our knowledge, this is the first A. thaliana co-expression network constructed using the entire mRNA-Seq datasets (>20,000) available at the NCBI SRA database. The developed network can serve as a useful resource for the Arabidopsis research community to interrogate specific genes of interest within the network, retrieve the respective interactomes, decipher gene modules that are transcriptionally altered under certain condition or stage, and gain understanding of gene functions.Entities:
Year: 2022 PMID: 35492683 PMCID: PMC9039629 DOI: 10.1002/pld3.396
Source DB: PubMed Journal: Plant Direct ISSN: 2475-4455
FIGURE 1Arabidopsis weighted gene correlation network analysis (WGCNA) network. Nodes with correlation >.12 are shown. Each node (gene) is color‐coded to its respective modules. Each modules is designated to have a specific function based on gene enrichment analysis. The network topology is displayed using the Prefuse force directed layout algorithm in Cytoscape
Enriched GO terms and KEGG pathways in modules 1–54
| Module number | Biological process | Molecular function | Cellular component | Enriched KEGG pathway | KEGG fold enrichment |
|---|---|---|---|---|---|
| 1 | RNA metabolic process | Nucleic acid binding | Nucleus | Basal transcription factors | 4.47 |
| 2 | Cell cycle | Microtubule binding | Cytoskeleton | DNA replication | 10.26 |
| 3 | Secondary metabolic process | Heme binding | Extracellular region | Flavonoid biosynthesis | 6.00 |
| 4 | Photosynthesis | Oxidoreductase activity | Chloroplast | Photosynthesis—antenna proteins | 15.19 |
| 5 | RNA modification | RNA binding | Mitochondrion/nucleolus | Ribosome biogenesis in eukaryotes | 8.52 |
| 6 | Vesicle−mediated transport | Transferase activity | Endomembrane system | Various types of N‐glycan biosynthesis | 9.71 |
| 7 | Plastid organization | Catalytic activity | Chloroplast/plastid | Porphyrin and chlorophyll metabolism | 9.11 |
| 8 | Translation | Structural constituent of ribosome | Ribosome | Ribosome | 10.37 |
| 9 | Catabolic process | Catalytic activity | Endomembrane system/Phagophore assembly | Autophagy—other | 10.71 |
| 10 | Ubiquitin‐dependent protein catabolic process | Peptidase activity | Proteosome complex | Proteasome | 21.85 |
| 11 | Cytoskeleton organization | Transferase activity | Golgi apparatus/cytoskeleton | One carbon pool by folate | 13.56 |
| 12 | Defense response | Kinase activity | Plasma membrane | Plant‐pathogen interaction | 11.48 |
| 13 | Localization | Phospholipase/lipase activity | Respiratory chain/membrane protein complex | Protein export | 12.65 |
| 14 | Purine metabolism | Metal ion/cation binding | Mitchondrial membrane/envelope | Citrate cycle (TCA cycle) | 12.67 |
| 15 | Protein modification | Ubiquitin protein ligase activity | Nucleus | — | — |
| 16 | Biotic stimulus | Kinase activity | Extracellular region | Plant‐pathogen interaction | 1.05 |
| 17 | Protein catabolic process | Ubiquitin conjugating enzyme activity | Ruffle membrane | SNARE interactions in vesicular transport | 14.07 |
| 18 | Cell wall organization or biogenesis | Oxidoreductase activity | Extracellular region | — | — |
| 19 | Pollen tube development | Structural constituent of cell wall | Pollen tube | Ether lipid metabolism | 65.71 |
| 20 | Response to stress | ADP binding | Plasma membrane/SMC loading complex | Alpha‐linolenic acid metabolism | 39.73 |
| 21 | Regulation of DNA replication | Sar guanyl‐nucleotide exchange factor | Telomere cap complex/CST complex | — | — |
| 22 | ATP metabolic process | NADH dehydrogenase activity | Mitochondrion/nucleolus | Oxidative phosphorylation | 23.87 |
| 23 | mRNA splicing | Nucleic acid binding | Nucleus | Spliceosome | 13.87 |
| 24 | Pollination/development | SNARE binding | Cell projection/Pollen tube | — | — |
| 25 | Response to water/chemical | Sucrose synthase activity | Monolayer‐surrounded lipid storage body | Glyoxylate and dicarboxylate metabolism | 16.43 |
| 26 | Chloropalst/plastid organization/embryo development | DNA supercoiling activity | Chloroplast/plastid | — | — |
| 27 | Phosphorylation | Transferase/kinase activity | Cell periphery | Cyanoamino acid metabolism | 16.27 |
| 28 | Protein phosphorylation | Catalytic activity/protein kinase activity/calmodulin binding | Plasma membrane/endosome | Plant‐pathogen interaction | 8.97 |
| 29 | Stomatal development | Transferase activity | Extracellular region | Fatty acid elongation | 37.96 |
| 30 | Response to hypoxia | Transcriptional regulation | Nucleus/CCR4‐NOT complex | Plant‐pathogen interaction | 11.17 |
| 31 | Response to biotic stimulus | Ligand‐gated ion channel | Extracellular region/Apoplast | Tryptophan metabolism | 20.34 |
| 32 | Protein amino acid modification | ADP binding | Extrinsic component of plasma membrane | — | — |
| 33 | Circadian rhythm/post‐embryonic development | Phosphorelay response regulator activity | Lipid droplet/vacuole | Circadian rhythm—plant | 41.07 |
| 34 | Electron transport chain | Chlorophyll binding/cofactor binding | Thylakoid | — | — |
| 35 | Carpel development | RNA polymerase II regulatory region sequence | Nucleus | Glycerolipid metabolism | 14.12 |
| 36 | Immune system response | Calmodulin binding | Plasma membrane | — | — |
| 37 | Lipid metabolic process | Hydrolase activity | Endomembrane system | — | — |
| 38 | Callose deposition/localization | Sucrose synthase activity | Anchored component of plasma membrane | Biotin metabolism | 53.39 |
| 39 | Fatty acid biosynthesis | Fatty acid synthesis | Chloroplast/plastid | Phosphatidylinositol signaling system | 66.56 |
| 40 | Protein phosphorylation | Protein kinase activity | Plasma membrane | Lysine biosynthesis | 53.39 |
| 41 | Amino acid biosynthetic process | Coenzyme binding | Chloroplast | — | — |
| 42 | Vascular/phloem transport | DNA binding transcription | Plasma membrane | Glucosinolate biosynthesis | 106.77 |
| 43 | Glucosinolate biosynthesis | Catalytic activity | Chloroplast | Protein export | 23.73 |
| 44 | Response to endoplasmic reticulum stress | Unfolded protein binding | Endoplasmic reticulum | — | — |
| 45 | Cellulose biosynthesis | Cellulose synthase | Trans‐Golgi network | — | — |
| 46 | Pollen tube development | Microfilament motor activity | Myosin complex | Endocytosis | 32.44 |
| 47 | Regulation of pollen developement | Protein kinase activity | Cell cortex | Thiamine metabolism | 48.81 |
| 48 | Translation/peptide biosynthesis | Structural constituent of ribosome | Plastid ribosome | Mismatch repair | 46.17 |
| 49 | Lactate catabolic process/monocarboxylic acid catabolic process/thiamine diphosphate biosynthetic process | Catalytic/transporter activity | Intracellular part | Starch and sucrose metabolism | 23.98 |
| 50 | DNA repair | Damaged DNA binding/DNA insertion or deletion | Anaphase‐promoting complex | Sulfur relay system | 43.80 |
| 51 | Starch catabolic process | Starch binding | Chloroplast | Inositol phosphate metabolism | 25.95 |
| 52 | RNA modification | Zinc ion binding | Mitochondrion | — | — |
| 53 | RNA splicing/gene expression | mRNA binding | Germ plasm | — | — |
| 54 | Embryonic meristem initiation/phosphorus metabolic process | Transferase/kinase activity | Plasma membrane | — | — |
FIGURE 2Functional analysis of Module 1. (a) Module 1 derived from the main Arabidopsis network showing genes associated with photosynthesis. (b) Five major biological process Gene Ontology (GO) terms derived from module 1. The number following each GO term refers to the number of genes that were found to be significant among the annotated to that category
FIGURE 3Functional analysis of Module 12. (a) Module 12 obtained from the main Arabidopsis network showing genes associated with defense. (b) Seventeen major biological process Gene Ontology (GO) terms derived from module 12. The number following each GO term refers to the number of genes that were found to be significant among the annotated to that category
FIGURE 4Functional analysis of Module 13. (a) Module 13 procured from the main Arabidopsis network showing genes associated with localization. (b) Seventeen major biological process Gene Ontology (GO) terms derived from module 13. The number following each GO term refers to the number of genes that were found to be significant among the annotated to that category
FIGURE 5Functional analysis of Module 16. (a) Module 16 computed from the main Arabidopsis network showing genes associated with biotic stimulus. (b) Tree map representing the most statistically significantly overrepresented Biological Process (BP) GO terms. (c) Eight major biological process GO terms derived from module 16. The number following each GO term refers to the number of genes that were found to be significant among the annotated to that category
FIGURE 6Functional analysis of Module 20. (a) Module 20 constructed from the main Arabidopsis network showing genes associated with stress. (b) Thirteen major biological process Gene Ontology (GO) terms derived from module 20. The number following each GO term refers to the number of genes that were found to be significant among the annotated to that category
Response of module 20 genes to different stresses, hormones, and stimuli
| Drought | Cold | Heat | High light | NaCl | Ozone | Wounding | Incompatible bacterial pathogen | eATP | Fe‐deficiency | Fe‐overload | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Abiotic stresses | 17 (19.31%) | 29 (32.95%) | 9 (10.22%) | 77 (87.5%) | 60 (68.18%) | 45 (51.14%) | 60 (68.18%) | 19 (21.6%) | 26 (29.54%) | 14 (15.9%) | 9 (10.22%) |
Note: Top: Response of module 20 genes to different abiotic and biotic stresses. Bottom: Response of module 20 genes to different hormones, reactive oxygen species, and external ATP. (ABA, abscisic acid; ACC, 1‐aminocyclopropane‐1‐carboxylic acid; SA, salicylic acid; eATP, external ATP).
FIGURE 7Gene specific network of respiratory burst oxidase homolog D (RBOHD). (a) Network represented using the first direct neighbors of RBOHD gene. Genes are color‐coded according to module assigned. (b) Enriched Gene Ontology (GO) terms in gene set constituting the first direct neighbors or RBOHD. The number following each GO term refers to the p‐value. (c) STRING‐DB network constructed out of hypothetical genes interacting with RBOHD. (d) GO terms found to be enriched in sub‐network of hypothetical genes
FIGURE 8Gene specific network of AtNEET. (a) Network represented using the first direct neighbors of AtNEET. Genes are color‐coded according to module assigned. (b) Enriched Gene Ontology (GO) terms in gene set constituting the first direct neighbors or AtNEET. The number following each GO term refers to the p value. (c) STRING‐DB network constructed out of hypothetical genes interacting with AtNEET
FIGURE 9Gene specific network of HSFA1D. (a) Network represented using the first direct neighbors of HSFA1D. Genes are color‐coded according to module assigned. (b) Enriched Gene Ontology (GO) terms in gene set constituting the first direct neighbors or HSFA1D. The number following each GO term refers to the p value. (c) STRING‐DB network constructed out of hypothetical genes interacting with HSFA1D
Enrichment analysis of genes in network modules expressing differentially under high light (HL), heat (HS), and combination of high light and heat (HL + HS) stresses
| Stress | Module | DEG count | Up & down regulated non‐DEG count | Total DEG count | Module gene count | DEG percent in module | DEG Pval percent in module | Enrichment fold change | Enrichment fisher test |
|---|---|---|---|---|---|---|---|---|---|
| HL | 5 | 833 | 21 | 854 | 1,188 | 71.89 | 70.12 | 2.0 | 4.3E−237 |
| 8 | 455 | 1 | 456 | 584 | 78.08 | 77.91 | 2.3 | 2.4E−157 | |
| 12 | 149 | 9 | 158 | 242 | 65.29 | 61.57 | 1.8 | 3.4E−32 | |
| 16 | 71 | 14 | 85 | 124 | 68.55 | 57.26 | 1.7 | 8.6E−14 | |
| 17 | 77 | 0 | 77 | 116 | 66.38 | 66.38 | 1.9 | 3.5E−20 | |
| 20 | 36 | 19 | 55 | 89 | 61.80 | 40.45 | 1.2 | 2.2E−03 | |
| 26 | 61 | 0 | 61 | 69 | 88.41 | 88.41 | 2.6 | 6.3E−28 | |
| 27 | 56 | 2 | 58 | 68 | 85.29 | 82.35 | 2.4 | 1.6E−22 | |
| 32 | 40 | 0 | 40 | 57 | 70.18 | 70.18 | 2.0 | 2.3E−12 | |
| 41 | 35 | 0 | 35 | 41 | 85.37 | 85.37 | 2.5 | 1.6E−15 | |
| 52 | 18 | 5 | 23 | 29 | 79.31 | 62.07 | 1.8 | 3.8E−05 | |
| 53 | 16 | 0 | 16 | 25 | 64.00 | 64.00 | 1.9 | 6.0E−05 | |
| 54 | 18 | 0 | 18 | 25 | 72.00 | 72.00 | 2.1 | 1.6E−06 | |
| HS | 12 | 141 | 7 | 148 | 242 | 61.16 | 58.26 | 1.3 | 8.3E−18 |
| 15 | 97 | 0 | 97 | 144 | 67.36 | 67.36 | 1.5 | 8.7E−19 | |
| 16 | 101 | 7 | 108 | 124 | 87.10 | 81.45 | 1.8 | 1.4E−30 | |
| 20 | 46 | 15 | 61 | 89 | 68.54 | 51.69 | 1.1 | 7.9E−05 | |
| 22 | 40 | 8 | 48 | 79 | 60.76 | 50.63 | 1.1 | 3.8E−04 | |
| 23 | 53 | 0 | 53 | 78 | 67.95 | 67.95 | 1.5 | 4.2E−11 | |
| 27 | 49 | 3 | 52 | 68 | 76.47 | 72.06 | 1.6 | 6.6E−12 | |
| 32 | 37 | 0 | 37 | 57 | 64.91 | 64.91 | 1.4 | 2.1E−07 | |
| 40 | 26 | 0 | 26 | 42 | 61.90 | 61.90 | 1.4 | 7.1E−05 | |
| 44 | 30 | 0 | 30 | 35 | 85.71 | 85.71 | 1.9 | 4.2E−11 | |
| 51 | 19 | 1 | 20 | 29 | 68.97 | 65.52 | 1.4 | 1.8E−04 | |
| 53 | 15 | 0 | 15 | 25 | 60.00 | 60.00 | 1.3 | 3.9E−03 | |
| HL + HS | 4 | 886 | 12 | 898 | 1,417 | 63.37 | 62.53 | 1.2 | 1.0E−104 |
| 8 | 406 | 0 | 406 | 584 | 69.52 | 69.52 | 1.3 | 1.9E−66 | |
| 10 | 294 | 1 | 295 | 447 | 66.00 | 65.77 | 1.2 | 6.2E−41 | |
| 11 | 224 | 0 | 224 | 336 | 66.67 | 66.67 | 1.3 | 8.5E−33 | |
| 12 | 139 | 7 | 146 | 242 | 60.33 | 57.44 | 1.1 | 6.9E−13 | |
| 13 | 145 | 2 | 147 | 241 | 61.00 | 60.17 | 1.1 | 8.1E−16 | |
| 14 | 139 | 0 | 139 | 193 | 72.02 | 72.02 | 1.4 | 7.2E−26 | |
| 16 | 95 | 9 | 104 | 124 | 83.87 | 76.61 | 1.4 | 2.3E−21 | |
| 17 | 72 | 0 | 72 | 116 | 62.07 | 62.07 | 1.2 | 3.1E−09 | |
| 27 | 51 | 6 | 57 | 68 | 83.82 | 75.00 | 1.4 | 2.3E−11 | |
| 28 | 47 | 0 | 47 | 67 | 70.15 | 70.15 | 1.3 | 6.6E−09 | |
| 29 | 43 | 0 | 43 | 63 | 68.25 | 68.25 | 1.3 | 7.5E−08 | |
| 30 | 37 | 1 | 38 | 62 | 61.29 | 59.68 | 1.1 | 7.7E−05 | |
| 32 | 38 | 0 | 38 | 57 | 66.67 | 66.67 | 1.3 | 1.0E−06 | |
| 33 | 47 | 1 | 48 | 55 | 87.27 | 85.45 | 1.6 | 1.1E−14 | |
| 34 | 29 | 5 | 34 | 54 | 62.96 | 53.70 | 1.0 | 6.0E−03 | |
| 39 | 37 | 0 | 37 | 44 | 84.09 | 84.09 | 1.6 | 2.2E−11 | |
| 40 | 33 | 0 | 33 | 42 | 78.57 | 78.57 | 1.5 | 7.8E−09 | |
| 41 | 33 | 0 | 33 | 41 | 80.49 | 80.49 | 1.5 | 2.5E−09 | |
| 44 | 27 | 0 | 27 | 35 | 77.14 | 77.14 | 1.4 | 3.7E−07 | |
| 51 | 27 | 1 | 28 | 29 | 96.55 | 93.10 | 1.7 | 7.2E−11 | |
| 52 | 29 | 0 | 29 | 29 | 100.00 | 100.00 | 1.9 | 4.8E−14 | |
| 53 | 21 | 0 | 21 | 25 | 84.00 | 84.00 | 1.6 | 5.8E−07 |
Note: Differentially expressed genes (DEGs) from a previous study dataset (Balfagón et al., 2019) were mapped onto the gene modules generated using WGCNA. The gene modules that were enriched in genes expressing differentially during stress conditions were thus identified. Modules that are enriched with DEGs were determined by performing Fisher test; modules with p value ≤ .05 were deemed significantly enriched. Furthermore, modules with large majority of genes (over 60%) significantly differentially expressed and otherwise two‐fold or more upregulated or downregulated were identified. Stress: stress under consideration; Module: Modules with over 60% DEGs; DEG Count: Genes that are considered differentially expressing as per the study; Up & Down Regulated Non‐DEG Count: Genes that are 2‐fold or more up/down regulated but had p value > .05; Total DEG Count: DEG Count + Up & Down Regulated Non‐DEG Count; Module Gene Count: Number of genes in the module; DEG Percent In Module: Percentage of genes in Module that are considered as differentially expressing as per the study as well as genes that are 2‐fold or more up/down regulated but has p value > .05; DEG p value Percent In Module: Percentage of genes in Module that are considered as differentially expressing as per the study; Enrichment Fold Change: Fold enrichment of DEGs in a Module; Enrichment Fisher Test: p value generated using Fisher test to indicate the significance of DEG enrichment in a Module.