
Construction and validation of a gene co-expression network in grapevine (Vitis vinifera L.).

Ying-Hai Liang¹, Bin Cai², Fei Chen², Gang Wang², Min Wang², Yan Zhong², Zong-Ming Max Cheng³.

Abstract

Gene co-expression analysis has been widely used for predicting gene functions because genes within modules of a co-expression network may be involved in similar biological processes and exhibit similar biological functions. To detect gene relationships in the grapevine genome, we constructed a grapevine gene co-expression network (GGCN) by compiling a total of 374 publicly available grapevine microarray datasets. The GGCN consisted of 557 modules containing a total of 3834 nodes with 13 479 edges. The functions of the subnetwork modules were inferred by Gene Ontology (GO) enrichment analysis. In 127 of the 557 modules containing two or more GO terms, 38 modules exhibited the most significantly enriched GO terms, including 'protein catabolism process', 'photosynthesis', 'cell biosynthesis process', 'biosynthesis of plant cell wall', 'stress response' and other important biological processes. The 'response to heat' GO term was highly represented in module 17, which is composed of many heat shock proteins. To further determine the potential functions of genes in module 17, we performed a Pearson correlation coefficient test, analyzed orthologous relationships with Arabidopsis genes and established gene expression correlations with real-time quantitative reverse transcriptase PCR (qRT-PCR). Our results indicated that many genes in module 17 were upregulated during the heat shock and recovery processes and downregulated in response to low temperature. Furthermore, two putative genes, Vit_07s0185g00040 and Vit_02s0025g04060, were highly expressed in response to heat shock and recovery. This study provides insight into GGCN gene modules and offers important references for gene functions and the discovery of new genes at the module level.


Year: 2014 PMID: 26504546 PMCID: PMC4596334 DOI: 10.1038/hortres.2014.40

Source DB: PubMed Journal: Hortic Res ISSN: 2052-7276 Impact factor: 6.793

Introduction

The rapid accumulation of genome sequences and high-throughput microarray data provides rich materials for research on gene function and regulation at the system level.[1] However, integrating and exploiting these data sets has been challenging. Biological networks constructed by bioinformatic methods can help ‘put the function in genomics’[2] and allow researchers to understand how biomolecules interact with one another at the system level to perform specific biological functions in living plant cells.[3,4] The molecular interaction network is a type of biological network in which a node represents a gene, gene product or metabolite, and a link or edge refers to an interaction between them.[4] A gene co-expression network, in which nodes represent genes and links indicate their co-expression relationships, can exhibit topological properties such as small-world, hierarchically modular and scale-free organization.[5] A gene co-expression network can be divided into several substructures, including motifs, modules and pathways. Its substructure exhibits topological properties described by specific terms, such as network density, degree distribution, clustering coefficient and betweenness.[3] Co-expression network analysis is a powerful method to extract functional modules of co-expressed genes, analyze their biological meanings and identify important novel genes. In recent studies, several plant gene co-expression networks have been built and many functional modules have been inferred or identified.[6-13] For instance, Mao and colleagues[7] constructed an Arabidopsis gene-expression network and identified many functional modules associated with photosynthesis, protein biosynthesis, cell cycle, defense response and others, and these modules revealed new insights into gene function organization.
The expression of genes related to the same metabolic function may show co-expression patterns.[14] Wang and colleagues employed co-expression network analysis to identify related cell wall genes in Arabidopsis.[11] Gene modules were extracted in response to drought in rice by network-based analysis, and many hub genes clustered in some rice chromosomes have been found to significantly associate with quantitative trait loci (QTLs) for drought tolerance.[12] Microarray datasets and genome sequences provide an excellent opportunity to understand gene relationships and biological functions in the grapevine.[15,16] In this report, we constructed a GGCN by using 374 high-quality microarrays (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL1320). Qcut,[17] a graph partitioning algorithm, was applied to identify subnetwork modules from the gene co-expression network. The functions represented by the extracted modules were evaluated by GO enrichment analysis.[18] Next, we validated module 17 by examining gene expression by qRT-PCR and inferred that two putative uncharacterized proteins might be related to heat stress.

Materials and methods

Raw expression data

The grapevine microarray data set for the construction of the co-expression network was obtained from Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL1320) (platform accession number GPL1320). The platform consists of experimental samples using Affymetrix GeneChip Grapevine Genome Array. A total of 374 CEL files of samples from platform GPL1320 were used to construct the network and involved three treatment types (biotic stress, development, abiotic stress) and 13 series. The grapevine and Arabidopsis genome sequences were downloaded from Phytozome (http://www.phytozome.net).[15]

Annotation of probe sets and homolog search

A total of 16 436 probe sets from the Affymetrix Grapevine GeneChip were mapped to the grapevine gene loci in CRIBI (http://genomes.cribi.unipd.it/) using BlastN. If more than six probes from the set aligned perfectly to a gene, the probe set was assigned to that gene. Arabidopsis protein sequences and gene information were obtained from the Arabidopsis Information Resource release 10 (http://www.arabidopsis.org/). Grapevine protein sequences were used to search complete Arabidopsis protein sequences using BlastP with an e-value cutoff of 1e−4, and the best hits were selected as Arabidopsis orthologs.
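The probe set assignment rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `assign_probe_sets` and the input format (pre-parsed tuples of perfect BlastN alignments) are hypothetical, and choosing the gene hit by the most probes when several genes qualify is an assumption that the text does not specify.

```python
from collections import defaultdict

# "More than six" perfectly aligned probes are required, i.e. at least seven.
MIN_PERFECT_PROBES = 7


def assign_probe_sets(perfect_hits):
    """Assign each probe set to a gene when enough of its probes match perfectly.

    perfect_hits: iterable of (probe_set_id, probe_id, gene_id) tuples, one per
    perfect alignment (hypothetical, pre-parsed BlastN output).
    Returns a dict mapping probe_set_id -> gene_id.
    """
    probes_by_gene = defaultdict(lambda: defaultdict(set))
    for probe_set, probe, gene in perfect_hits:
        probes_by_gene[probe_set][gene].add(probe)

    assignment = {}
    for probe_set, genes in probes_by_gene.items():
        # Assumption: if several genes are hit, keep the one with most probes.
        gene, probes = max(genes.items(), key=lambda item: len(item[1]))
        if len(probes) >= MIN_PERFECT_PROBES:
            assignment[probe_set] = gene
    return assignment
```

A probe set with seven perfectly aligned probes on one gene is assigned to it, whereas a probe set with only six is left unassigned.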

Construction of GGCN

The construction of a gene co-expression network involves measuring gene expression similarity, visualizing gene expression data and identifying modular structures. To measure the similarity of gene expression, we utilized the Pearson correlation coefficient (PCC) between pairwise genes. The 374 arrays from Gene Expression Omnibus were normalized by the justRMA function in R/Bioconductor.[19] Gene co-expression data were calculated in ATTED-II and applied to the PCC calculation (http://atted.jp/help/coex_cal.shtml). To determine the PCC cutoff threshold for network construction, the numbers of probe sets and edges and the network density (ND) were calculated across a range of PCC cutoffs. The network density was calculated from m, the observed number of edges in the network, and n, the number of nodes in the network. Co-expressed genes were selected at a certain PCC cutoff threshold, and a co-expression network was constructed and visualized with Cytoscape software[20] (http://www.cytoscape.org/). The algorithm Qcut, which identifies statistically significant graph partitions in a biological network,[17] was applied to identify sub-network modules from the co-expression network (http://www.mybiosoftware.com/pathway-analysis/12211).
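As a rough illustration of the cutoff scan described above, the following sketch counts the nodes and edges retained at a given PCC cutoff and computes a network density. Two points are assumptions rather than statements from the text: only positive correlations above the cutoff are kept as edges, and the density is taken to be the usual undirected-graph definition, 2m / (n(n − 1)). The function names are hypothetical.

```python
import numpy as np


def coexpression_edges(expr, cutoff):
    """Return (n_nodes, n_edges) of the network retained at a PCC cutoff.

    expr: 2-D array, rows = probe sets, columns = normalized arrays.
    Assumption: an edge is kept when PCC > cutoff (positive correlation only).
    """
    pcc = np.corrcoef(expr)                # pairwise Pearson correlation of rows
    upper = np.triu(pcc > cutoff, k=1)     # count each unordered pair once
    n_edges = int(upper.sum())
    adjacency = upper | upper.T
    n_nodes = int(adjacency.any(axis=1).sum())  # nodes with at least one edge
    return n_nodes, n_edges


def network_density(n_nodes, n_edges):
    """Density of an undirected simple graph: 2m / (n(n - 1)) (assumed form)."""
    if n_nodes < 2:
        return 0.0
    return 2.0 * n_edges / (n_nodes * (n_nodes - 1))
```

Calling `coexpression_edges` over a grid of cutoffs and passing each result to `network_density` gives the kind of density-versus-cutoff curve used to pick a threshold.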

GO enrichment analysis of modules in GGCN

GO annotations of grapevine genes were downloaded from agriGO (http://bioinfo.cau.edu.cn/agriGO/download.php). The GO enrichment analysis was performed within each module using BiNGO 2.4.[18] The statistical significance of GO term enrichment was measured by a hypergeometric test[21] using the genes in the whole co-expression network as the background. A Bonferroni correction[22] was used to control the false-positive rate under multiple testing, and a GO term was considered significantly enriched in a given module if the family-wise error rate (FWER)-corrected p value was less than 0.05.
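The enrichment test can be illustrated with a small, dependency-free sketch. It is not BiNGO's implementation: it simply evaluates the upper tail of the hypergeometric distribution and applies a plain Bonferroni adjustment. The function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, module_size, annotated, background):
    """P(X >= k): probability of seeing at least k annotated genes in a module.

    k: genes in the module carrying the GO term
    module_size: genes in the module
    annotated: genes in the background carrying the GO term
    background: genes in the whole network used as background
    """
    upper = min(module_size, annotated)
    total = comb(background, module_size)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, module_size - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]
```

A GO term would then be called enriched in a module when its Bonferroni-adjusted p value falls below the chosen threshold.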

Validation of gene expression in module 17 by qRT-PCR

Pinot Noir PN40024 (the genotype from which the reference genome sequence was derived) was subcultured in vitro on 3/4 Murashige and Skoog medium[23] at 22 °C with a 16-h/8-h photoperiod and an illumination intensity of 150 μmol m⁻² s⁻¹ for 6 weeks. Young leaves, including second and third expanding leaves, were sampled for gene expression analysis. To analyze the response of module 17 genes to continuous heat shock stress, whole plants were treated at 40 °C for 0.5, 1, 2, 3 or 6 h in the plant growth chamber. Meanwhile, to analyze the heat shock recovery response, a fraction of the plants that were heat-shocked for 1 h was placed under the original temperature (22 °C) for 2 h and 5 h (the third hour or sixth hour from the beginning of heat shock). The plants without heat shock treatment were used as the controls and handled in an identical manner. To analyze their responses to low temperature, a set of plants was placed in a plant growth chamber at 4 °C for 1 h. All plant samples were then frozen in liquid nitrogen before total RNA extraction and first strand cDNA synthesis by the reported method.[24] We designed 29 pairs of oligonucleotide primers (Supplementary Table 1) for the genes in module 17 with Primer 5.0 (http://www.premierbiosoft.com/crm/jsp/com/pbi/crm/clientside/ProductList.jsp) according to the putative cDNA sequences of the grapevine genome. PCR amplification was carried out in a 25 μL reaction solution consisting of 20 ng template cDNA, 2.0 mM MgCl2, 2.5 μL 10× PCR buffer, 200 μM dNTP, 0.2 pM of each primer and 0.25 U Taq DNA polymerase. To validate the specificity of PCR products, the amplicons were cloned into a pMD19-T vector (Takara, Dalian, China), sequenced at Shanghai Invitrogen Biotechnology Co., Ltd (2715 Longwu Road, Shanghai 200231, China) according to the protocol[24] and aligned onto the grapevine reference genome.
The qRT-PCR oligonucleotide primers (Table 1) targeting the expressed grapevine genes in module 17 (response to environmental stress) were designed with Beacon Designer 7.0 (http://www.premierbiosoft.com/molecular_beacons/). Because of high homology and some unknown gene information, all primers were blasted against the grapevine reference genome sequences. Each primer differs from non-target genes by at least three nucleotides, and at least one nucleotide at the 3′-end.[25]

Table 1

qRT-PCR primer sequences of genes in module 17

Gene number	Grapevine gene	Forward primers (5′ to 3′)	Reverse primers (5′ to 3′)
1	Vit_10s0003g00260	TCAACATCAAGTTTCCAACAAGG	ACAGTCGCACATCATTAGCC
2	Vit_07s0185g00040	AGGATGCGAGAGGATGAGAC	ACAAGAGAAACACCAGACAAGG
3	Vit_13s0019g03160	AGTTCCTTCGTCGGTTCAG	GCCTTCACCTCAGCCTTC
4	Vit_18s0041g01230	GTCAACAACCCAAACTATCAAGG	GCACCATCATATCATATACACTCC
5	Vit_02s0025g04060	TTGATAGTATGTCTGAGTTATGGAG	CCTTGGGTGTGAAACAAATGG
6	Vit_04s0008g01590	TTGAGGTGAAGGTTGCTTGAG	CATACTGACTTGGGAGACATCG
7	Vit_06s0004g04470	CATAAGAAGGATATTAGCGGAAGT	GTTGTGTAGAAATCAATACCATCGA
9	Vit_16s0050g01150	GACCTTGTGATGCTCCTATATG	ATCTTGCTCTCCTCATTGCC
11	Vit_01s0010g02290	GTATGACCAAGGATGATGTGAAG	ACTCCATCTTTGACCTCTGC
12	Vit_16s0098g01060	TGGAGGATGACTTGCTTGTG	CTCTACCTTGGTCTTAGGAATGG
13	Vit_11s0016g04080	GTGAACAAGGCTATCCGGTC	TCATCTTCTTCTCCAACCTCG
14	Vit_07s0005g01980	GGGGTTTGTCACGGTTAG	GTATGACTGGAAGTAATTTGCC
15	Vit_17s0000g07190	TAGATGCGGGAGTGTCAGG	CCTCTTCGTCTTCTATTTCTTCG
19	Vit_19s0085g01050	GAGTTCAAGAGTCAAGACACAG	ACCTCCAGTTTCACCTCATTC
20	Vit_06s0004g06010	GCTATTATAGAAGGCGGCATTAC	GACCCAGGAGTGAGAGACC
22	Vit_13s0019g00860	AAGGTGGAGATAGAAGATGGAAAC	TGGAACAACGATGGTGAGAAC
23	Vit_08s0007g00130	GATTGAGGATGCCATTGAGC	TCTTTGCTATGATGGGGTTG
24	Vit_16s0022g00510	AGATACAGCAGCAGAATTGATTTG	TCAGTCCTCTCCTCTTCCTTCAG
26	Vit_06s0004g05770	GTTCTTACTGTTACTGTTCCTAAGAAG	CGCTGATATATGATATGATGGTCTC

There were 41 nodes (probes) in module 17. Among them, 29 probes were matched with grapevine genes annotated by CRIBI Genomics, University of Padua (http://genomes.cribi.unipd.it/). However, the genes numbered 8, 10, 16, 17, 18, 21, 25, 27, 28 and 29 in module 17 were not expressed in response to heat shock or cold treatment and were therefore not cloned; they are not listed in Table 1.

The qRT-PCR reaction was carried out in a 20 μL reaction solution consisting of 10 μL SYBR (Takara), 8.7 μL ddH2O, 1 μL cDNA diluted 10-fold and 0.15 μL of each specific primer. qRT-PCR amplifications were performed with the following procedure: 94 °C for 4 min and 40 cycles of 94 °C for 20 s, 60 °C for 20 s and 72 °C for 43 s. The qRT-PCR data were analyzed as previously described.[25] Each treatment data point represents three biological replicates (individual plants) with three technical replicates each. The actin-101-like gene (VIT_12S0178g00200) was used as an internal reference. The expression ratio was calculated by the formula $2^{-\Delta\Delta Ct}$, as previously described.[16,25]
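For readers unfamiliar with the calculation, the sketch below shows how an expression ratio is commonly derived from Ct values. It assumes the widely used 2^(−ΔΔCt) convention (the figure axes in this article report −ΔΔCt) and roughly 100% amplification efficiency; the function name and arguments are hypothetical, and the Ct values in the usage note are invented for illustration.

```python
def expression_ratio(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) convention (assumed here).

    Each argument is a mean Ct value: target gene or internal reference gene,
    in the treated sample or the untreated control.
    """
    delta_treated = ct_target_treated - ct_ref_treated   # dCt, treated
    delta_control = ct_target_control - ct_ref_control   # dCt, control
    ddct = delta_treated - delta_control                 # ddCt
    return 2.0 ** (-ddct)
```

With made-up values, a target whose Ct drops from 25 to 22 while the reference stays at 18 gives ΔΔCt = −3, which corresponds to an 8-fold increase.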

Goodness of fit test of gene expression in module 17

To test the goodness of fit of all gene expression values between each pair of time points under heat shock and recovery, we employed ‘LOESS’, locally weighted scatterplot smoothing,[26] and ‘Linear’, a simple linear regression, to add a fit line and calculate R², the coefficient of determination,[27] with SPSS 19.0 software.[28] First, a matrix scatterplot was created between the variables ‘gene expression value’ and ‘treatment time point’ following the steps Graphs→Legacy Dialogs→Scatter/Dot→Matrix Scatter. Next, a fit line was added to the matrix scatterplot by ‘LOESS’ with the parameters 95% individual confidence intervals, 30% of points to fit and the Epanechnikov kernel function. Finally, ‘Linear’ was performed with 95% individual confidence intervals following the steps Graphs→Legacy Dialogs→Scatter/Dot→Matrix Scatter→Linear. The R² values between the dependent and independent variables ‘gene expression value’ and ‘treatment time point’ in the linear regression were obtained for goodness-of-fit analysis.[27,28]
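The linear part of this analysis can be approximated outside SPSS. The sketch below is an assumption about how the pairwise comparison works: it regresses the expression values of the genes at one time point on their values at another time point and reports the coefficient of determination. The function name is hypothetical, and the LOESS step is not reproduced.

```python
import numpy as np


def r_squared(x, y):
    """R^2 of the least-squares line y = a*x + b.

    x, y: expression values of the same genes at two time points
    (assumed pairing; one value per gene).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)       # simple linear regression
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))       # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # total variation
    return 1.0 - ss_res / ss_tot
```

For a simple linear regression this value equals the squared Pearson correlation, so the result is the same whichever time point is treated as the independent variable, consistent with a symmetric pairwise table.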

Results

The raw microarray data could be divided into the following three categories: biotic stress, development and abiotic stress. The array accession and the experiment conditions are listed in Table 2. After normalization of gene expression values, the PCC was calculated between each pair within the 16 436 genes. An appropriate PCC cutoff value is necessary to construct a co-expression network. Figure 1 reveals a negative correlation between the network density and PCC cutoff values. At approximately 0.78, the network density approached the minimal value and then increased gradually. The PCC cutoff value of 0.78 was then chosen to screen significant co-expression correlation from a large-scale expression data set (Figure 1). At the PCC cutoff value of 0.78, the network contained 3834 nodes (probe sets) with 13 479 edges (Figure 2 and Supplementary Table 2) and a network density of 0.001856078. The GGCN view was created by the Cytoscape software package.[20]

Table 2

Microarray data used to construct the grapevine co-expression network

Condition	Series ID	Number of gene chips	Experimental conditions
Biotic stress	GSE6404	72	Erysiphe necator conidiospores infection
	GSE11857	12	Downy mildew infection
	GSE12842	10	Bois noir infection
	GSE31660	14	Viral diseases in berry
Development	GSE31674	27	Berry transcriptome during ripening
	GSE31664	12	Skin transcriptome in the berries
	GSE31662	8	Grape skin transcriptome in the berries
	GSE11406	32	Berries during ripening initiation
	GSE17502	84	Photoperiod regulation of bud dormancy
Abiotic stress	GSE31677	39	Salt and water stress
	GSE31675	12	High temperature
	GSE31594	48	Short term abiotic stress
	GSE27180	4	Micropropagated plants were transferred to ex vitro conditions

Figure 1

Relationship between network densities and PCC cutoff values.

Figure 2

The co-expression network of grapevine genes. A red dot represents a node, and a blue line connecting two nodes represents an edge.

Modules in GGCN

A partitioning analysis of the 3834 nodes detected 557 modules with a Q value of 0.78, demonstrating a strong modular structure. The modular structure, one of the important features of a biological network, indicates the interaction of biomolecules at the system level. All modules in the GGCN were completely independent of one another and differed in size (Figure 2 and Supplementary Table 2). For instance, the two largest modules, module 1 and module 2, each contained 312 nodes, but with 1521 and 2284 edges, respectively, and the smallest modules had only two nodes (Supplementary Table 2). BiNGO 2.4,[18] a Cytoscape plugin, was used to perform GO term enrichment analysis of biological processes. A total of 127 modules that contained more than two nodes were analyzed using the 1256 probes with a biological process GO term as the custom reference set. As a result, 15 modules were identified with significantly over-represented GO terms with a FWER-adjusted p<0.01 from the hypergeometric test.[21] Table 3 lists the most significantly enriched functional categories and the GO term number in a module and in the grapevine gene co-expression network. Because the biotic or abiotic stress response and its regulation are important biological processes in plants, we highlight the details of one interesting module here, module 17, which responds to environmental stresses (Figure 3 and Table 4).
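To make the Q value concrete, the following sketch computes a modularity score for a given partition of a small undirected graph. It assumes that Q is the standard Newman-Girvan modularity; it only evaluates a partition and does not reproduce the Qcut search itself. The function name and input format are hypothetical.

```python
def modularity(edges, membership):
    """Newman-Girvan modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once
    membership: dict mapping node -> module label
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = {}
    intra_edges = {}                  # edges with both ends in the same module
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if membership[u] == membership[v]:
            label = membership[u]
            intra_edges[label] = intra_edges.get(label, 0) + 1
    degree_sum = {}                   # total degree of the nodes in each module
    for node, d in degree.items():
        label = membership[node]
        degree_sum[label] = degree_sum.get(label, 0) + d
    # Q = sum over modules of (fraction of intra-module edges)
    #     minus (expected fraction under random wiring with the same degrees).
    return sum(
        intra_edges.get(label, 0) / m - (total / (2 * m)) ** 2
        for label, total in degree_sum.items()
    )
```

Two disconnected triangles placed in separate modules give Q = 0.5, while putting all six nodes in a single module gives Q = 0; values closer to 1 indicate modules that are dense inside and sparsely connected to each other.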

Table 3

Significantly enriched GO terms in 38 modules

Module	GO term description	GO term	p value
1	Protein catabolic process	13/30	2.1×10⁻⁵
2	Ribonucleoprotein complex biogenesis	152/207	3.0×10⁻⁹⁰
3	Photosynthesis	54/69	1.0×10⁻⁴⁰
4	Cellular amine metabolic process	18/82	2.6×10⁻²
5	Response to salicylic acid stimulus	5/8	2.1×10⁻⁴
7	Carbohydrate metabolic process	18/102	2.4×10⁻⁵
11	DNA metabolic process	21/40	5.7×10⁻¹⁹
12	ATP synthesis coupled electron transport	9/16	1.5×10⁻⁸
15	Cellular biosynthetic process	34/408	4.4×10⁻⁷
17	Response to heat	11/31	3.5×10⁻¹⁰
20	Plant-type cell wall biogenesis	6/7	1.5×10⁻⁹
24	Response to auxin stimulus	3/10	2.8×10⁻²
25	Phenylpropanoid biosynthetic process	9/28	6.7×10⁻¹¹
26	ATP metabolic process	5/14	1.6×10⁻⁵
29	Protein folding	6/57	1.0×10⁻⁵
30	Lipid transport	3/14	2.1×10⁻²
31	Flavonoid biosynthetic process	6/8	6.2×10⁻¹¹
34	Response to wounding	3/10	3.5×10⁻⁵
35	Carboxylic acid metabolic process	6/141	3.4×10⁻⁴
36	Response to biotic stimulus	5/37	6.1×10⁻⁶
37	Protein ubiquitination	2/14	5.9×10⁻³
38	Acyl-carrier-protein biosynthetic process	4/25	1.1×10⁻⁴
42	Metal ion transport	3/18	9.9×10⁻⁵
48	Modification-dependent protein catabolic process	4/24	2.1×10⁻⁶
51	Nucleic acid metabolic process	4/96	2.5×10⁻³
57	Cell redox homeostasis	3/15	1.3×10⁻⁴
75	Fatty acid biosynthetic process	3/21	8.9×10⁻⁵
79	Water homeostasis	1/1	2.1×10⁻²
83	One-carbon metabolic process	3/9	7.9×10⁻⁶
87	Xylulose metabolic process	1/1	3.6×10⁻²
96	Regulation of cell cycle	2/6	1.6×10⁻³
101	Nucleosome assembly	2/25	4.6×10⁻²
105	D-xylose metabolic process	3/3	9.1×10⁻⁸
107	Oligosaccharide metabolic process	2/29	3.4×10⁻²
112	Ketone biosynthetic process	3/13	3.1×10⁻⁵
115	Chitin catabolic process	3/9	5.1×10⁻⁶
124	Lipid transport	3/14	1.8×10⁻⁵
139	Response to chlorate	3/3	5.5×10⁻⁸

In the ‘GO term’ column, each entry gives the number of occurrences of the GO term in the module relative to the number in the whole grapevine gene co-expression network.

Figure 3

The fraction of module 17 enriched with the GO term ‘in response to heat stress’. Red circles represent nodes, the blue lines represent edges, and the numbers in the red circles represent gene chip probes.

Table 4

Gene ontology enrichment analysis in module 17

GO ID	p value (FWER corrected)	Number of GO terms in module 17/in GGCN	Description
6950	4.0537×10⁻¹⁸	26/183	Response to stress
50896	1.0848×10⁻¹³	26/267	Response to stimulus
9408	3.5017×10⁻¹⁰	11/31	Response to heat
9266	4.5005×10⁻⁸	11/46	Response to temperature stimulus
9644	3.2480×10⁻⁷	6/9	Response to high light intensity
9642	3.4062×10⁻⁶	6/12	Response to light intensity
9628	9.9960×10⁻⁶	12/92	Response to abiotic stimulus
42542	1.7589×10⁻⁵	6/15	Response to hydrogen peroxide
10035	2.7093×10⁻⁵	7/25	Response to inorganic substance
302	1.2576×10⁻⁴	20/29	Response to reactive oxygen species
6979	3.4874×10⁻³	6/34	Response to oxidative stress
9416	6.7133×10⁻³	6/38	Response to light stimulus
9314	6.7133×10⁻³	6/38	Response to radiation
6986	2.3696×10⁻²	2/2	Response to unfolded protein
43335	2.3696×10⁻²	2/2	Protein unfolding
35966	2.3696×10⁻²	2/2	Response to topologically incorrect protein

Module 17, a module in response to environmental stresses

We examined module 17 in detail because we are interested in stress responses and this module was enriched with GO terms relating to environmental stresses. Module 17 contained 41 nodes (probes) and 89 edges and was significantly enriched with 16 GO terms (p≤2.3696×10⁻²) (Figure 3 and Table 4). The over-represented GO terms include ‘response to stimulus’, ‘response to high light intensity’, ‘response to abiotic stimulus’, ‘response to oxidative stress’, ‘response to hydrogen peroxide’ and particularly ‘response to heat’ (GO:0009408) (p=3.5017×10⁻¹⁰). A total of 19 genes in module 17 encode heat shock proteins (HSPs), including members of the HSP20, HSP40, HSP70, HSP90 and HSP100 families (Table 5).

Table 5

Homologous genes between 29 grapevine genes in module 17 and those in Arabidopsis thaliana

Gene number	Grapevine gene	Probe number	Homologs in Arabidopsis thaliana	Information of gene classification and function
1	Vit_10s0003g00260	1616811_at	AT2G20560	DNAJ heat shock protein
2	Vit_07s0185g00040	1621759_s_at	AT3G07150	Unknown protein
3	Vit_13s0019g03160	1616145_a_at	AT1G53540	HSP17.6C-CI
4	Vit_18s0041g01230	1616369_at	AT5G49910	Chloroplast HSP70-2; ATP binding
5	Vit_02s0025g04060	1611927_at	AT4G11740	Unknown protein
6	Vit_04s0008g01590	1611192_at	AT5G12020	HSP17.6II
7	Vit_06s0004g04470	1621357_s_at	AT5G02500	HSC70-1; ATP binding
8	Vit_04s0008g01490	1614330_at	AT5G12020	HSP17.6II
9	Vit_16s0050g01150	1618066_a_at	AT5G52640	HSP90.1; ATP binding
10	Vit_08s0007g00740	1613948_at	AT3G09350	Armadillo/beta-catenin repeat family protein
11	Vit_01s0010g02290	1608828_at	AT4G27670	HSP21
12	Vit_16s0098g01060	1620985_at	AT4G27670	HSP21
13	Vit_11s0016g04080	1621552_at	AT3G24500	MBF1C
14	Vit_07s0005g01980	1609808_at	AT2G47180	GolS1
15	Vit_17s0000g07190	1615503_at	AT1G74310	HSP101; ATP binding
16	Vit_17s0000g00070	1611931_at	AT5G07330	Unknown protein
17	Vit_13s0047g00110	1606746_a_at	AT4G02450	Glycine-rich protein
18	Vit_11s0078g00260	1608348_a_at	AT5G35320	Unknown protein
19	Vit_19s0085g01050	1616538_at	AT1G53540	HSP17.6C-CI
20	Vit_06s0004g06010	1615761_at	AT1G07350	Arginine-rich ribonucleoprotein
21	Vit_05s0020g03330	1621709_at	AT2G32120	HSP70T-2; ATP binding
22	Vit_13s0019g00860	1622489_at	AT5G37670	HSP15.7-CI
23	Vit_08s0007g00130	1609949_at	AT3G12580	HSP70; ATP binding
24	Vit_16s0022g00510	1616889_at	AT4G25200	Mitochondrion-localized HSP23.6
25	Vit_08s0217g00090	1611195_at	AT3G08970	Endoplasmic reticulum-localized J protein
26	Vit_06s0004g05770	1621652_at	AT1G07400	HSP17.8-CI
27	Vit_02s0154g00480	1620348_at	AT4G25200	Mitochondrion-localized HSP23.6
28	Vit_12s0035g01910	1613858_at	AT4G10250	HSP22.0
29	Vit_18s0089g01270	1609222_at	AT4G10250	HSP22.0

Module 17 contains 41 nodes (probes). Among them, 12 probe sets were not matched with grapevine genes annotated by CRIBI Genomics, University of Padua (http://genomes.cribi.unipd.it/) (listed in Supplementary Table 2). These probe sets were 1609554_at, 1615503_at, 1607291_at, 1610779_at, 1613154_at, 1622489_at, 1616706_at, 1611195_at, 1621902_at, 1610122_at, 1616049_at and 1618545_a_at. Therefore, 29 grapevine genes are listed in this table.

Plants respond to various stresses in a similar manner, by producing HSPs that protect cells against many stresses.[29] The accumulation of HSPs plays a key role in acquired heat tolerance during heat stress.[30] MBF1C (Vit_11s0016g04080) is an important transcription factor that responds to stresses,[31] and as a key regulator of heat tolerance in Arabidopsis thaliana, the MBF1C protein accumulates rapidly during heat stress. The inositol galactoside (GolS2) enzyme (Vit_07s0005g01980) is a key synthase that regulates the drought and cold responses.[32] Liu et al.[33] inferred that galactinol synthase may be important for grapevine heat tolerance. The endoplasmic reticulum-localized J protein Vit_08s0217g00090 is an important molecular chaperone of HSP70.[34] In addition, four putative uncharacterized proteins in module 17, Vit_07s0185g00040, Vit_02s0025g04060, Vit_17s0000g00070 and Vit_11s0078g00260, are clearly connected to other nodes involved in the stress response, but no information about their domains or homologous alignments is available. Therefore, we considered these four putative genes to have unknown functions in the stress response.

Expression patterns of genes in module 17 at different time points after heat shock and recovery

We tested the response of module 17 to heat shock, one environmental stress. When grapevine plants were treated with heat shock at 40 °C for 6 h, 19 of 29 genes in module 17 were upregulated, and their expression levels varied from low to high, ranging from 1.86- to 11.63-fold (Figure 4a–4e). However, the expression of some genes remained at a high level from 0.5 h to 6 h, ranging from 6.85- to 11.63-fold (p<0.01). These included Vit_13s0019g03160, Vit_04s0008g01590, Vit_16s0098g01060, Vit_07s0005g01980 and Vit_19s0085g01050, which encode HSP17.6, HSP17.6, HSP21, galactinol synthase 1 and HSP17.6, respectively; galactinol synthase 1 (GolS1) is a heat shock factor target gene responsible for the heat-induced synthesis of the raffinose family of oligosaccharides in Arabidopsis.[35]

Figure 4

Gene expression patterns in module 17 treated with heat shock and recovery at different time points. a–e: heat shock for 0.5, 1, 2, 3 and 6 h, respectively. f–g: heat shock recovery for 2 and 5 h after plants were treated at 40 °C for 1 h, respectively. The value in the Y-axis is −ΔΔCt. The expression ratio of a gene was considered significant if *p<0.05. Expression ratio of genes was significant if **p<0.01. The numbers from 1 to 26 on the X-axis represent the grapevine genes listed under ‘gene number and grapevine gene’ in Table 1.

Moreover, 12 of 19 genes were still upregulated significantly (p<0.01) after 2 h and 5 h of recovery. After 2 h of recovery, 6 of 19 genes were downregulated significantly, by up to 3.02-fold (p<0.01) (Figure 4f), including Vit_08s0007g00130, Vit_16s0022g00510 and Vit_11s0016g04080. After 5 h of recovery, only two genes among them were downregulated significantly (p<0.01) (Figure 4g), and the other four genes recovered from their downregulated states. However, 3 of the 19 genes, Vit_04s0008g01590, Vit_16s0098g01060 and Vit_19s0085g01050, which were highly expressed at 40 °C for 6 h, still maintained high-level expression after 2 h and 5 h of recovery, ranging from 4.49- to 8.49-fold (p<0.01). Therefore, our results indicate that genes in module 17 have different functions, and their mechanisms during heat shock and transient states may be complex. The expression of two putative uncharacterized genes, Vit_07s0185g00040 (ranging from 1.12- to 4.72-fold) and Vit_02s0025g04060 (ranging from 0.47- to 5.66-fold), was also detected during heat shock and recovery. Based on the GGCN analysis, no homologous alignment or annotation information is available about their sequences, domains or gene expression in NCBI (http://www.ncbi.nlm.nih.gov/cdd) or in CRIBI Genomics, University of Padua (http://genomes.cribi.unipd.it/). Expression values in response to heat shock and recovery between each pair of time points were plotted together for the 19 genes in module 17 using the SPSS program[28] and treated with LOESS[26] (Figure 5). The best goodness-of-fit values were those at adjacent time points. Moreover, most R² values between the dependent and independent variables ‘gene expression value’ and ‘treatment time point’ were close to 1.0 at adjacent time points[36] (Table 6), which indicated a strong linear relationship between the compared variables.
The goodness-of-fit analysis indicated that under the same tempospatial conditions, as a whole network, these genes display a clear co-expression relationship.

Figure 5

The goodness of fit test of 19 gene expression values in module 17 between each two time points treated with heat shock and subsequent recovery. The fit lines were added by using LOESS in the matrix scatterplot. ‘HS’ represents heat shock treatment. ‘HS_R’ represents recovery after heat shock treatment.

Table 6

‘Goodness-of-fit’ test of 19 gene expression values in module 17 between each ‘two time points’ treated with heat shock and recovery

R²	HS_0.5 h	HS_1 h	HS_2 h	HS_3 h	HS_6 h	HS_R_2 h	HS_R_5 h
HS_0.5 h		0.961	0.880	0.825	0.829	0.659	0.591
HS_1 h	0.961		0.944	0.882	0.849	0.679	0.597
HS_2 h	0.880	0.944		0.916	0.925	0.809	0.725
HS_3 h	0.825	0.882	0.916		0.905	0.754	0.727
HS_6 h	0.829	0.849	0.925	0.905		0.799	0.838
HS_R_2 h	0.659	0.679	0.809	0.754	0.799		0.835
HS_R_5 h	0.591	0.597	0.725	0.727	0.838	0.835

R² represents the coefficient of determination between the dependent and independent variables ‘gene expression value’ and ‘treatment time point’ in the linear regression. ‘HS’ represents heat shock treatment. ‘HS_R’ represents recovery after heat shock treatment.

The PCC values of gene expression were significantly greater than 0.78 (Supplementary Table 3). Similarly, during the different time points of heat shock and the recovery process, most PCC values were also greater than 0.78, which indicates that most genes were significantly co-expressed (Supplementary Table 3). Therefore, gene co-expression ‘in response to heat’ represented by module 17 was validated experimentally by qRT-PCR and by PCC analysis of gene expression, given that most genes were upregulated together very significantly (p<0.01) and most PCC values were greater than the PCC cutoff value, 0.78, which was used to screen significant co-expression correlation from a large-scale expression data set. Among the 29 genes in module 17 that corresponded to ‘responses to heat stress’, 10 genes showed no response to heat shock, which could suggest that these genes may be co-expressed under other tempospatial conditions of heat stress or in response to other environmental stresses, such as ‘response to high light intensity’, ‘response to oxidative stress’ or ‘response to hydrogen peroxide’, because expression of these genes might be regulated depending on time, space and environmental conditions.[37] This process may include many levels, such as chromatin structure, transcription, transcript stability or localization, and translation. The homologous gene comparison for ‘response to heat’ matched quite well between module 17 grapevine genes and those involved in the heat stress response in A. thaliana (Table 5).

Expression patterns of genes in module 17 after low temperature treatment

In contrast to their upregulation under heat shock, most of the 19 genes were downregulated in response to low temperature (4 °C) treatment, by 1.05- to 4.55-fold (Figure 6). To further test the co-expression relationship between these genes, the PCC values of the 19 gene expression values were calculated. Supplementary Table 4 shows that 45.91% of them were greater than 0.78; thus, judging from the PCC values, the co-expression relationship of these genes was less pronounced than after heat shock treatment.
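A percentage of this kind can be obtained by computing all pairwise PCC values and counting how many exceed the cutoff. The sketch below is a hedged illustration only: the function name is hypothetical, and it assumes each gene is described by a vector of expression values (for example, across replicates or time points), which the text does not spell out for the low temperature experiment.

```python
import numpy as np


def fraction_above_cutoff(expr, cutoff=0.78):
    """Fraction of gene pairs whose Pearson correlation exceeds the cutoff.

    expr: 2-D array, rows = genes, columns = measurements (assumed layout).
    """
    pcc = np.corrcoef(expr)                      # gene-by-gene PCC matrix
    i, j = np.triu_indices(pcc.shape[0], k=1)    # each unordered pair once
    return float(np.mean(pcc[i, j] > cutoff))
```

For 19 genes there are 171 gene pairs, so the function returns the share of those 171 PCC values that lie above 0.78.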

Figure 6

Gene expression patterns in module 17 after treatment with low temperature at 4 °C for 1 h. The value on the Y-axis is −ΔΔCt. Expression ratio of genes was considered significant if **p<0.01.
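
The caption reports −ΔΔCt on the Y-axis, while the text quotes fold changes. The sketch below shows how the two relate, assuming the widely used 2^(−ΔΔCt) relative quantification convention; the source does not state which conversion was applied, so the function names and the convention itself are assumptions.

```python
def fold_change(neg_ddct):
    """Relative expression ratio, assuming the 2^(-ddCt) convention."""
    return 2.0 ** neg_ddct


def describe(neg_ddct):
    """Return the direction of change and the fold magnitude (>= 1)."""
    ratio = fold_change(neg_ddct)
    if ratio >= 1.0:
        return ("up", ratio)
    # A ratio below 1 is reported as the reciprocal fold decrease.
    return ("down", 1.0 / ratio)


# Hypothetical -ddCt readings (not values from Figure 6).
for value in (1.0, 0.0, -1.0):
    print(value, describe(value))
```

Under this convention, a negative bar in the figure corresponds to downregulation, and each unit of −ΔΔCt corresponds to a twofold change.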

Discussion

Plant growth, development and adaptation to the environment are complex, yet highly coordinated, processes. One way to understand these processes is to establish gene co-expression networks, from which the putative functions of genes can be predicted, because genes sharing a module in a co-expression network are likely involved in similar biological processes.[3,7] In this study, we constructed a genome-wide GGCN from publicly available microarray data using the efficient heuristic algorithm Qcut, which optimizes a modularity function (Q) by combining spectral graph partitioning with local search.[17] Within each subnetwork module, nodes were densely linked with each other, whereas connections between modules were sparse or absent. The gene-to-gene PCCs derived from gene expression data in Gene Expression Omnibus allowed us to partition these co-expressed genes into network modules across various experimental conditions. The goodness of fit, coefficient of determination and PCC statistical tests of module 17 confirmed that genes in the same module show co-expression relationships under the same spatiotemporal conditions, which may be associated with the same biological function, one of the important features of a co-expression network.[38,39] The homologous gene comparison of ‘response to heat’ between module 17 in grapevine and A. thaliana also demonstrated that partitioning genes into modules from the co-expression network was reliable.
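To make the notion of a modularity function concrete, the following sketch evaluates Q for a given partition of a small undirected graph. It assumes the standard Newman-Girvan definition (the fraction of edges inside modules minus the fraction expected at random) and does not reproduce Qcut's spectral partitioning or local search. The toy graph and module labels are hypothetical.

```python
from collections import defaultdict


def modularity(edges, module_of):
    """Newman-Girvan modularity Q of a partition (undirected, unweighted)."""
    m = len(edges)
    intra = defaultdict(int)   # edges with both ends in the same module
    degree = defaultdict(int)  # summed node degree per module
    for u, v in edges:
        degree[module_of[u]] += 1
        degree[module_of[v]] += 1
        if module_of[u] == module_of[v]:
            intra[module_of[u]] += 1
    # Q = sum over modules of (intra-module edge share - expected share).
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical graph: two triangles joined by a single edge.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_modules = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_module = {node: 0 for node in two_modules}

print(modularity(toy_edges, two_modules))
print(modularity(toy_edges, one_module))
```

A partition that groups densely connected nodes together scores a higher Q than one that lumps all nodes into a single module, which is the property a modularity-optimizing algorithm exploits.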
HSPs and chaperones are crucial components of the heat shock regulatory network in plants[40] and play a crucial role in the response to multiple environmental insults.[41,42] These HSPs are also involved in the response to cold[43] and to non-thermal stresses, such as salinity,[44] drought,[45,46] high light stress,[47] oxidative stress[48] and heavy metal stress.[49] Therefore, the biological functions represented by module 17, a module that responds to environmental stresses, could be tested under multiple stresses in the future. The reliability and biological relevance of the network were further verified experimentally. The same set of genes in module 17 of the co-expression network exhibited two co-expression patterns: upregulation in response to heat shock and downregulation in response to cold. The differential response patterns between the heat shock and low temperature treatments suggest that other regulatory factors may be involved, which requires additional investigation. These covarying patterns could also reflect the complexity of cellular transcriptional activities.[14] The co-expression network and its partition into modules may also help to identify new genes that are putatively involved in certain biological processes.[3] In this study, two putative uncharacterized genes lacking any gene function information, gene annotation, expressed sequence tag (EST), transcriptome data or protein domain prediction were found to respond to heat shock. These genes are worthy of further investigation. Overall, this study provides new insight into the modular organization of grapevine gene functions, which should facilitate module-level research on gene function and the discovery of new genes.

38 in total

Review 1. Gene networks: how to put the function in genomics.

Authors: Paul Brazhnik; Alberto de la Fuente; Pedro Mendes
Journal: Trends Biotechnol Date: 2002-11 Impact factor: 19.536

2. Cytoscape: a software environment for integrated models of biomolecular interaction networks.

Authors: Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker
Journal: Genome Res Date: 2003-11 Impact factor: 9.043

Review 3. How do plants feel the heat?

Authors: Ron Mittler; Andrija Finka; Pierre Goloubinoff
Journal: Trends Biochem Sci Date: 2012-01-09 Impact factor: 13.807

4. Comparative transcriptomic profiling of Vitis vinifera under high light using a custom-made array and the Affymetrix GeneChip.

Authors: Luísa C Carvalho; Belmiro J Vilela; Phil M Mullineaux; Sara Amâncio
Journal: Mol Plant Date: 2011-04-15 Impact factor: 13.164

Review 5. Role of the major heat shock proteins as molecular chaperones.

Authors: C Georgopoulos; W J Welch
Journal: Annu Rev Cell Biol Date: 1993

6. Galactinol synthase1. A novel heat shock factor target gene responsible for heat-induced synthesis of raffinose family oligosaccharides in Arabidopsis.

Authors: Tressa Jacob Panikulangara; Gabriele Eggers-Schumacher; Markus Wunderlich; Harald Stransky; Fritz Schöffl
Journal: Plant Physiol Date: 2004-10-01 Impact factor: 8.340

7. Genome-scale identification of cell-wall related genes in Arabidopsis based on co-expression network analysis.

Authors: Shan Wang; Yanbin Yin; Qin Ma; Xiaojia Tang; Dongyun Hao; Ying Xu
Journal: BMC Plant Biol Date: 2012-08-09 Impact factor: 4.215

8. Expression of pathogenesis related genes in response to salicylic acid, methyl jasmonate and 1-aminocyclopropane-1-carboxylic acid in Malus hupehensis (Pamp.) Rehd.

Authors: Jiyu Zhang; Xiaoli Du; Qingju Wang; Xiukong Chen; Dong Lv; Kuanyong Xu; Shenchun Qu; Zhen Zhang
Journal: BMC Res Notes Date: 2010-07-27

9. Arabidopsis gene co-expression network and its functional modules.

Authors: Linyong Mao; John L Van Hemert; Sudhansu Dash; Julie A Dickerson
Journal: BMC Bioinformatics Date: 2009-10-21 Impact factor: 3.169

10. Computational discovery of regulatory elements in a continuous expression space.

Authors: Mathieu Lajoie; Olivier Gascuel; Vincent Lefort; Laurent Bréhélin
Journal: Genome Biol Date: 2012-11-27 Impact factor: 13.583
