Kithmee K de Silva1, Jim M Dunwell2, Anushka M Wickramasuriya1.
Abstract
Somatic embryogenesis (SE), which occurs naturally in many plant species, serves as a model to elucidate cellular and molecular mechanisms of embryo patterning in plants. Decoding the regulatory landscape of SE is essential for its further application. Hence, the present study was aimed at employing Weighted Gene Correlation Network Analysis (WGCNA) to construct a gene coexpression network (GCN) for Arabidopsis SE and then identifying highly correlated gene modules to uncover the hub genes associated with SE that may serve as potential molecular targets. A total of 17,059 genes were filtered from a microarray dataset comprising four stages of SE, i.e., stage I (zygotic embryos), stage II (proliferating tissues at 7 days of induction), stage III (proliferating tissues at 14 days of induction), and stage IV (mature somatic embryos). This included 1,711 transcription factors and 445 EMBRYO DEFECTIVE genes. GCN analysis identified a total of 26 gene modules with the module size ranging from 35 to 3,418 genes using a dynamic cut tree algorithm. The module-trait analysis revealed that four, four, seven, and four modules were associated with stages I, II, III, and IV, respectively. Further, we identified a total of 260 hub genes based on the degree of intramodular connectivity. Validation of the hub genes using publicly available expression datasets demonstrated that at least 78 hub genes are potentially associated with embryogenesis; of these, many genes remain functionally uncharacterized thus far. In silico promoter analysis of these genes revealed the presence of cis-acting regulatory elements, "soybean embryo factor 4 (SEF4) binding site," and "E-box" of the napA storage-protein gene of Brassica napus; this suggests that these genes may play important roles in plant embryo development. The present study successfully applied WGCNA to construct a GCN for SE in Arabidopsis and identified hub genes involved in the development of somatic embryos. 
These hub genes could be used as molecular targets to further elucidate the molecular mechanisms underlying SE in plants.
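The hub-gene step in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than with the WGCNA R package, so it is not the authors' pipeline. It assumes an unsigned network in which adjacency is the absolute Pearson correlation raised to a soft-thresholding power. The power `beta=6`, the cut-off of `n=10` hubs per module, and the function names are illustrative assumptions, not values taken from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def intramodular_connectivity(expr, modules, beta=6):
    """expr: {gene: [expression per sample]}; modules: {gene: module label}.

    Returns, for each gene, the sum of soft-thresholded adjacencies
    |cor|**beta to all other genes in the same module.
    beta is an assumed value, not the power used in the study.
    """
    k_in = {}
    for g in expr:
        k_in[g] = sum(
            abs(pearson(expr[g], expr[h])) ** beta
            for h in expr
            if h != g and modules[h] == modules[g]
        )
    return k_in

def top_hubs(k_in, modules, n=10):
    """Return the n most connected genes of each module (n is an assumption)."""
    by_module = {}
    for gene, module in modules.items():
        by_module.setdefault(module, []).append(gene)
    return {
        m: sorted(genes, key=lambda g: k_in[g], reverse=True)[:n]
        for m, genes in by_module.items()
    }
```

Ranking within each module, rather than across the whole network, keeps small modules from being crowded out by genes in the largest ones.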
Year: 2022 PMID: 35837132 PMCID: PMC9274236 DOI: 10.1155/2022/7471063
Source DB: PubMed Journal: Int J Genomics ISSN: 2314-436X Impact factor: 2.758
Figure 1. Hierarchical clustering of somatic embryo transcriptomes based on their Euclidean distance using average linkage clustering (replicates of each stage are labeled as “a” and “b”). (a) Unrooted hierarchical clustering dendrogram (the length between nodes corresponds to the distance between samples). (b) Hierarchical clustering heatmap visualizing the correlations between the samples.
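The clustering named in the Figure 1 caption (Euclidean distance, average linkage) can be sketched as follows. This is a small standard-library illustration, not the software used for the figure. The sample names and expression values in the usage note are hypothetical.

```python
from itertools import combinations
from math import dist

def average_linkage(samples, n_clusters):
    """samples: {sample name: [expression values]}.

    Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise Euclidean distance until n_clusters remain.
    """
    clusters = [[name] for name in samples]

    def avg_dist(a, b):
        total = sum(dist(samples[i], samples[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters
```

With hypothetical input such as `{"I_a": [0, 0], "I_b": [0.1, 0], "IV_a": [10, 10], "IV_b": [10, 10.1]}` and `n_clusters=2`, the replicates of each stage end up in the same cluster.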
Figure 2. Overview of DEGs. (a) Distribution of DEGs between consecutive somatic embryo developmental stages. (b) Number of up- or downregulated DEGs between stages.
Figure 3. Construction of the draft GCN for SE. (a) Network topology for different soft-thresholding powers. (b) Module preservation statistics. (c) Hierarchical clustering dendrogram of MEs.
Figure 4. Stage-specific gene modules detected by WGCNA. (a) Module-trait relationship heatmap. Each row corresponds to a module, and each column corresponds to a stage. The degree of correlation is illustrated with the colour legend. The numbers in the table correspond to the p value. Modules that are significantly associated with each somatic embryo development stage (|r| > 0.8 and p value ≤0.01) are indicated by an asterisk. (b) Gene significance values of coexpression modules related to different somatic embryo developmental stages.
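The asterisk criterion in the Figure 4 caption (|r| > 0.8 and p value ≤ 0.01) amounts to a simple filter over the module-trait table. The sketch below assumes the correlations and p values have already been computed. The input layout and the function name are hypothetical.

```python
def stage_specific_modules(stats, r_min=0.8, p_max=0.01):
    """stats: {module: {stage: (r, p_value)}} -> {stage: [modules]}.

    Keeps a module for a stage only if |r| > r_min and p_value <= p_max,
    the thresholds stated in the Figure 4 caption.
    """
    selected = {}
    for module, by_stage in stats.items():
        for stage, (r, p) in by_stage.items():
            if abs(r) > r_min and p <= p_max:
                selected.setdefault(stage, []).append(module)
    return selected
```

Because the absolute value of r is used, modules that are strongly negatively correlated with a stage are retained alongside positively correlated ones.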
Figure 5. Functional enrichment analysis of the stage-specific modules. (a)–(d) represent the significantly enriched GO terms (p value <0.05) in modules specifically associated with stages I–IV, respectively.
Top 10 hub genes ordered by the degree of connectivity.
| Gene identifier | Degree of connectivity | Gene module | Gene name | Description |
|---|---|---|---|---|
| AT1G27120 | 3327 | Turquoise | | Galactosyltransferase family protein |
| AT5G52820 | 2853 | Blue | | WD-40 repeat family protein/notchless protein |
| AT5G56090 | 2348 | Brown | | Encodes a homolog of COX15 |
| AT2G43100 | 2134 | Yellow | | Isopropylmalate isomerase 2 |
| AT1G71010 | 1958 | Green | | Encodes a protein that is predicted to act as a phosphatidylinositol-3P 5-kinase but lacks a FYVE domain |
| AT2G29890 | 806 | Red | | Encodes a ubiquitously expressed villin-like protein |
| AT2G45600 | 569 | Black | | Alpha/beta-hydrolases superfamily protein |
| AT1G72400 | 321 | Magenta | | Hypothetical protein |
| AT3G53980 | 311 | Pink | | Bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin superfamily protein |
| AT5G65350 | 243 | Purple | | Histone 3 11 |
| AT5G27560 | 182 | Green-yellow | | DUF1995 domain protein, putative (DUF1995) |
| AT5G54855 | 170 | Tan | | Pollen Ole e 1 allergen and extensin family protein |
| AT1G75630 | 156 | Salmon | | Vacuolar H+-pumping ATPase 16 kD proteolipid (ava-p) mRNA |
| AT1G74450 | 107 | Cyan | | BPS1-like protein (DUF793) |
| AT2G23940 | 107 | Midnight-blue | | Transmembrane protein (DUF788) |
| AT1G30460 | 82 | Light-cyan | | Encodes AtCPSF30, the 30-KDa subunit of cleavage and polyadenylation specificity factor |
| AT1G06040 | 74 | Grey60 | | B-box zinc finger family protein that encodes a salt tolerance protein |
| AT2G11560 | 58 | Light-green | | Mutator-like transposase/similar to MURA transposase of maize |
| AT3G55050 | 50 | Dark-green | | Protein phosphatase 2C family protein |
| ATCG01070 | 48 | Dark-red | | NADH dehydrogenase ND4L |
| AT3G25950 | 48 | Royal-blue | | TRAM, LAG1, and CLN8 (TLC) lipid-sensing domain containing protein |
| AT1G65410 | 39 | Dark-turquoise | | Encodes a member of NAP subfamily of transporters |
| AT5G02310 | 38 | Light-yellow | | Encodes a component of the N-end rule pathway that targets protein degradation |
| AT3G61130 | 37 | Dark-grey | | Encodes a protein with putative galacturonosyltransferase activity |
| AT3G53350 | 34 | Dark-orange | | Encodes a microtubule-binding protein |
| AT5G43490 | 34 | Orange | | Myb-like protein X |
Conserved motifs identified in the promoter regions of hub genes using the MEME tool.
| No. | Motif | E-value | Motif width | Sites | Significant GO enriched terms |
|---|---|---|---|---|---|
| 1 | NAVAAAAAAARAAARARAAARAAAAHMAA | 2.8e-077 | 29 | 229 | (i) GO:0042023: DNA endoreduplication |
| 2 | DTTTTTKTTTTKTTY | 6.0e-020 | 15 | 245 | |
| 3 | NDRRAGDDDRRWARRRARAGAADRRDAG | 3.6e-025 | 28 | 121 | (i) GO:0009944: Polarity specification of adaxial/abaxial axis |
| 4 | CMAYCTYCTCCDTCHBCATC | 5.3e-007 | 20 | 44 | |
Figure 6. Expression patterns of two hub genes, AT1G19540 (a) and AT5G44380 (b), when viewed through the Arabidopsis eFP browser. The normalized expression value for each gene is colour-coded as indicated by the legend.
Figure 7. Transcript abundance extracted from the somatic embryo transcriptome dataset, E-MTAB-2465 for the hub genes. The hub genes significantly upregulated (log2 FC ≥ 2.0) and downregulated (log2 FC ≤ −2.0) in somatic embryonic tissues compared to leaf tissues are indicated with yellow asterisks and red diamonds, respectively. Transcript abundances are shown in fragments per kilobase of transcript per million mapped reads (FPKM).
Figure 8. Venn diagram indicating the intersection of hub genes and DEGs (|log2 FC| ≥ 2.0 and p value <0.05) obtained from E-MTAB-2465 [24] and GSE48915 [37].
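The comparison summarized in Figure 8 can be thought of as a threshold filter followed by set intersections. The sketch below applies the caption's cut-offs (|log2 FC| ≥ 2.0 and p value < 0.05) to two result tables and intersects them with a hub-gene list. The input layout, the function names, and the region labels are assumptions made for illustration.

```python
def select_degs(results, lfc_min=2.0, p_max=0.05):
    """results: {gene: (log2_fold_change, p_value)} -> set of DEG identifiers."""
    return {
        gene
        for gene, (lfc, p) in results.items()
        if abs(lfc) >= lfc_min and p < p_max
    }

def hub_overlap(hubs, degs_a, degs_b):
    """Split hub genes by whether they are DEGs in dataset A, dataset B, or both."""
    hubs = set(hubs)
    return {
        "both": hubs & degs_a & degs_b,
        "only_a": (hubs & degs_a) - degs_b,
        "only_b": (hubs & degs_b) - degs_a,
    }
```

Hub genes that are differentially expressed in neither dataset fall outside all three returned sets.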
Figure 9. The distribution of several important plant CREs present in the promoter regions of functionally uncharacterized hub genes. The number of hub genes that contain the relevant CRE in their promoter region is indicated by the x-axis. The size of the circle depicts the occurrence of the CREs within the promoter regions of the hub genes (as indicated within the circle).
Functional roles of several important CREs detected in the functionally uncharacterized hub gene promoter sequences retrieved from the PLACE database [46].
| CRE | Function∗ |
|---|---|
| ABADESI1 | “ACGT” motif; transacting factor: TAF-1; responsive to ABA and desiccation. Expressed in seeds late during embryogenesis. Induced by ABA and osmotic stress in vegetative tissues. |
| CACGTGMOTIF | “CACGTG motif”; essential for expression of beta-phaseolin gene during embryogenesis |
| CANBNNAPA | Core of “(CA)n element” in storage protein genes; embryo- and endosperm-specific transcription of napin (storage protein) gene |
| CARGNCAT | Noncanonical CArG motif (CC-Wx8-GG); A relevant |
| DPBFCOREDCDC3 | DPBF-1 and 2 (Dc3 promoter-binding factor-1 and 2) binding core sequence; Dc3 expression is embryo-specific and induced by ABA |
| DRE1COREZMRAB17 | “DRE1” core found in maize (Z.M.) rab17 gene promoter; in in vivo footprinting, “DRE1” was protected by a protein specifically in embryos, whereas in leaves it was protected when the tissue was treated with ABA and drought; rab17 is expressed during late embryogenesis and is induced by ABA |
| DRE2COREZMRAB17 | “DRE2”; core sequence in rab17 gene promoter. rab17 is expressed during late embryogenesis and is induced by ABA |
| EBOXBNNAPA | “E-box” of napA storage-protein gene |
| PYRIMIDINEBOXOSRAMY1A | Found in the promoter of alpha-amylase (Amy2/32b) gene which is induced in the aleurone layers in response to GA in embryo |
| RYREPEATVFLEB4 | RY repeat motif; quantitative seed expression; binding site of |
| SEF1MOTIF | “SEF1 (soybean embryo factor 1)” binding motif; regulates the expression of genes encoding for the beta-conglycinin seed storage proteins |
| SEF3MOTIFGM | “SEF3 binding site”; regulates the expression of genes encoding for the beta-conglycinin seed storage proteins |
| SEF4MOTIFGM7S | “SEF4 (soybean embryo factor 4)” binding motif; regulates the expression of genes encoding for the beta-conglycinin seed storage proteins |
| TATABOX2 | “TATA box”; TATA box found in the beta-phaseolin promoter, which is required for accurate transcription initiation in the embryo stage |
∗Details of PLACE entries were retrieved from the https://www.dna.affrc.go.jp/place/place_seq.shtml (accessed on 19th May 2022).
Distribution of SE markers across network modules ordered by the number of interactors.
| No. | Gene identifier | Module | Gene name | No. of interactors |
|---|---|---|---|---|
| 1 | AT3G26790 | Turquoise | | 3346 |
| 2 | AT5G13790 | Turquoise | | 3308 |
| 3 | AT1G21970 | Turquoise | | 3297 |
| 4 | AT5G45980 | Turquoise | | 3158 |
| 5 | AT3G24650 | Turquoise | | 3115 |
| 6 | AT1G78080 | Brown | | 2111 |
| 7 | AT5G57390 | Turquoise | | 752 |
| 8 | AT4G37750 | Red | | 688 |
| 9 | AT1G63470 | Red | | 522 |
| 10 | AT5G65510 | Purple | | 216 |
Figure 10. Distribution of EMB genes across gene modules. The coloured bars represent the ratio between the number of EMB genes in each module and the total number of EMB genes in the network.
Figure 11. Distribution of TFs in SE. (a) Overall distribution of TFs. The percentage is calculated as the ratio of TFs belonging to each family with respect to the total number of TFs in the network. (b) Distribution of SE-related TFs across different somatic embryo developmental stages. The percentage is calculated for each stage as the ratio of TFs present in each family with respect to the total number of TFs in the network.
Figure 12. Distribution of genes encoding epigenetic regulators across the network modules.