Literature DB >> 24651851

Discovery of a strongly-interrelated gene network in corals under constant darkness by correlation analysis after wavelet transform on complex network model.

Longlong Liu1, Jieqiong Qu1, Xilong Zhou1, Xuefeng Liu1, Zhaobao Zhang2, Xumin Wang3, Tao Liu1, Guiming Liu3.   

Abstract

Coral reefs occupy a relatively small portion of sea area, yet serve as a crucial source of biodiversity by establishing harmonious ecosystems with marine plants and animals. Previous researches mainly focused on screening several key genes induced by stress. Here we proposed a novel method--correlation analysis after wavelet transform of complex network model, to explore the effect of light on gene expression in the coral Acropora millepora based on microarray data. In this method, wavelet transform and the conception of complex network were adopted, and 50 key genes with large differences were finally captured, including both annotated genes and novel genes without accurate annotation. These results shed light on our understanding of coral's response toward light changes and the genome-wide interaction among genes under the control of biorhythm, and hence help us to better protect the coral reef ecosystems. Further studies are needed to explore how functional connections are related to structural connections, and how connectivity arises from the interactions within and between different systems. The method introduced in this study for analyzing microarray data will allow researchers to explore genome-wide interaction network with their own dataset and understand the relevant biological processes.

Entities:  

Mesh:

Substances:

Year:  2014        PMID: 24651851      PMCID: PMC3961355          DOI: 10.1371/journal.pone.0092434

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Coral reefs, by definition, are mainly three-dimensional shallow-water structures dominated by scleractinian corals [1]. Coral reefs occupy only a relatively small part of ocean acreages, yet harbor a considerably disproportionate amount of biodiversity, providing communities with ecosystem goods and services [2]. However, contemporary coral reefs constantly suffer from disease outbreak, overfishing, eutrophication, and ocean acidification. As a result, they increasingly fail to reassemble, leading to mass bleaching events [3], which marks the breakdown in the symbiosis with zooxanthellae [4]. The elevated coral mortality and resilience failure have caused persistent shifts from the original coral dominance to algal preponderance [1], [5], [6]. Numerous efforts have been made, not only by local residents but also by scientists, to achieve the overall goal of management of coral reefs, on which the human welfare depends [7]. Accompanied by increasingly faster sequencing technology and lower cost, molecular level studies will be beneficial to coral reef conservation and management, such as barcoding [8], microarrays [9]–[13] and comparative genomics [14], [15]. Levy et al. [13] performed microarray analysis of coral genes with A. millepora, and indicated that many genes involved in stress responses and protection were coupled to the circadian clock. In their study, signals from time domain were transformed into frequency domain signals using the FFT (fast Fourier transform) method. This study analyzed a published data set from Levy et al. [13] with a novel method—correlation analysis after wavelet transform on complex network model, to explore the effect of light on gene expression in A. millepora. Wavelet transform, of which the time-frequency resolution depends on the signal frequency, can be used for multi-scale analysis of signal through dilation and translation, so it can effectively extract time–frequency features [16]. At high frequencies, the wavelet reaches at high time resolution but low frequency resolution; whereas, at low frequencies, low time resolution and high frequency resolution can be obtained [17]. By the advantage of multi-resolution, we can locate time and frequency more accurately. Firstly, dimension reduction was achieved by Euclidean distance clustering. Secondly, wavelet transform of 300 probes sampled through time axle was conducted. Finally, 50 key genes with large differences were screened, annotated and analyzed. In this way, we identified important genes including both previously-published genes with experiment support and novel genes without accurate annotation, which provides useful information for future research. Moreover, network characterization of gene connectivity under constant darkness shed light on our understanding of the coral's response toward light changes and an orchestrated correlation among genes under the control of biorhythm. Our method has shown advantages in dealing with extremely-complicated time serial data, which would be of wide use for similar data analyses.

Materials and Methods

Data generation

The data set used in this study was published by Levy et al. [13], available online in the NCBI database (GEO: GSE21658, Platform ID: GPL6941). The original aim of this data set was to better understand how corals tune their circadian machinery to respond to external and internal cues. According to the experimental conditions, 2 groups were divided as LD group (alternate light and dark) and DD group (continuous dark). Three coral colonies were sampled at 4 hr intervals during 2 consecutive days under an ambient LD cycle or under DD. A total of 72 arrays (GSM540430-GSM540501) were hybridized (n = 3) and the number of probes was 18432 in each array. In our study, the average of 3 biological replicates' was used, obtaining a total of 24 groups of data. For each probe, there was a 12-dimensional vector composed of its gene expression data taken at 12 time points under the same experimental condition, populating original databases (Table S1). The number of records in original databases was reduced from 18432 to 17198 after deleting those with no corresponding genes (324) and averaging those multiple probes for a single gene. To avoid the immoderate impact of the differences on the model, each record was normalized to [−1, 1] using the following formulae: If x min ≠ x max, thenif x min  =  x max, thenwhere x and x′ represent vector before and after normalization; x min and x max represent the minimum and maximum of the vectors' elements before normalization, respectively. Working databases were established via data preprocessing (Table S2).

Capture of important genes by Euclidean distance clustering analysis

To explore the effect of light on gene expression, genes with large expression differences between the two groups were considered key genes significantly affected by light. Expression differences were measured by Euclidean distance, using the following formula:where x and y represented the values of element j of record i in DD and LD groups respectively. Figure 1 shows the Euclidean distances of the 17198 probes between two groups. After smoothing, an inflection point (16898, 3.996) was found using MATLAB. For further analyses, the 300 probes were selected as important genes (Table S3).
Figure 1

Scatter diagrams of the Euclidean distances of the probes' expression data between the 2 groups.

Wavelet transform

Expression data at discrete time points are non-stationary signals containing a variety of frequency components. A discrete wavelet transform was adopted to process the expression data, and was performed according to the following formulae:where α 0 = 2, t 0 = 2, j = 0,1;2,…, k = N, N denoted the set of natural numbers, represented the wavelet basis function, and was the conjugate function of . Here, was defined as the wavelet function. The Mallat algorithm in the MATLAB toolbox was used to achieve wavelet transform for each probe record X = (x,…,x,…,x), where i = 1,…,300; j = 1,…, 12. The approximate signals C = (c,…,c,…,c) and the detailed signals D = (d,…,d,…,d) of each probe record were acquired according to the following formulae:where i = 1,…,300, j = 1,…,6. The frequency vector of each probe was composed of an approximate signal, while the detailed signals constituted a 12–dimensional vector. The frequency vectors of probe i were and in LD and DD groups respectively, where the superscripts LA and LH represented the low- and high-frequency coefficients of the probe expression data in the LD group, and the superscripts DA and DH represented the low- and high-frequency coefficients of the expression data in the DD group (Table S4).

Correlation analysis in the frequency domain

A correlation analysis quantitatively analyzed the similarity and interdependence of 300 probes in the frequency domain between the two groups. The specific algorithm was as follows: (1) Covariance of the frequency vectors X of probe i and X of probe j. (2) Correlation coefficient between the frequency vectors X and X. The correlation matrix R = (r)300×300 was thus obtained, which was a real symmetric matrix. We then defined the connectivity of a probe to measure the number of highly-relevant probes (0.7≤|r|≤1). The connectivity of probe i was defined as the number of probes that were highly relevant to probe i.

The establishment of complex gene network

The connectivity of a probe (gene) to certain extent reflected its importance in the whole gene network, i.e., the larger the connectivity, the higher the importance. Through comparing the differences in connectivity between the 2 networks (LD and DD groups), we then identified those probes with remarkable changes in connectivity. The algorithm to calculate the connectivity was as follows: (1) Defining a connectivity matrix L = (l)300×300, where l represented the connectivity between probe i and probe j. (2) In the matrix R = (r)300×300, if 0.7≤|r|≤1, then l = 1; else, l = 0. (3) Ignoring autocorrelation, i.e., l = 0. (4) Calculating the connectivity l of probe i,, i = 1,…,300, where N = 300. (5) The difference of the connectivity of probe i between LD and DD groups was obtained by subtracting the connectivity in the DD group from the connectivity in the LD group: The detailed information about M is listed in Table S5, and an example of calculating the connectivity (probe B015-C2) is illustrated in Figure S1. In this way, we discovered 168 probes whose connectivity was greater in the DD group than in the LD group, whereas there were 119 probes whose connectivity was greater in the LD group than in the DD group. The square ratio of negative or positive difference was defined as follows: Here, i and j represent probes with negative and positive differences respectively. The curve in Figure 2A was smoothed, and its inflection point (32, −19.88) in Fig. 2B was obtained using MATLAB. In the same way, another inflection point (101, 10) was also identified (Figure 2B). As shown in Table S6, a total of 18 probes with large positive differences and 32 probes with large negative differences were selected from inflection points.
Figure 2

Difference of connectivity between LD group and DD group.

A: Negative difference obtained by subtracting degree numbers of probes in DD group from degree numbers of corresponding probes in LD group. B: Positive difference gained throughout subtracting degree numbers of probes in DD group from degree numbers of corresponding probes in LD group. C: Negative difference curve by smoothing scatter diagram of negative difference in A. D: Positive difference curve by smoothing scatter diagram of positive difference in B.

Difference of connectivity between LD group and DD group.

A: Negative difference obtained by subtracting degree numbers of probes in DD group from degree numbers of corresponding probes in LD group. B: Positive difference gained throughout subtracting degree numbers of probes in DD group from degree numbers of corresponding probes in LD group. C: Negative difference curve by smoothing scatter diagram of negative difference in A. D: Positive difference curve by smoothing scatter diagram of positive difference in B.

Probe annotation and analysis

All the 50 probes were believed to be key genes and annotated in GO database (http://amigo.geneontology.org/cgi-bin/amigo/blast.cgi?). After filtering the P value of 1.0e-5, 11 genes (61.11%) with large positive differences (Table 1) and 19 genes (59.38%) with large negative differences (Table 2) were finally obtained.
Table 1

Annotated differentially-expressed probes with positive difference.

GenBank acc.Annotation by GO BLAST Search P-value
DY582337NADH dehydrogenase (ubiquinone) Fe-S protein 6 [18] 2.0e-26
DY584986protein tyrosine phosphatase, receptor type, s, a5.6e-13
DY584757ATP synthase subunit alpha3.1e-91
DY577937actin-related protein4.0e-69
GO004138Peptidyl-prolyl cis-trans isomerase7.0e-47
DY582070ubiquitin B3.8e-137
GO002856seryl-tRNA synthetase3.6e-91
GO003374Transcriptional regulator ERG3.7e-26
DY579678phosphatidylinositol-specific phospholipase C [19] 3.3e-33
DY582316Mediator of RNA polymerase II transcription subunit 301.9e-09
DY579711Rae1/Gle25.8e-24
Table 2

Annotated differentially-expressed probes with negative difference.

GenBank acc.Annotation by GO BLAST Search P-value
DY582598G2/mitotic-specific cyclin-B2 [20] 2.0e-60
DY580293RNA binding motif protein 4B6.9e-16
GO003570U2 snRNP-associated SURP motif-containing protein2.5e-15
DY582400cyclin B2 [21] 3.0e-41
DY580978tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein [22], [23] 3.2e-63
DY580776Epsin-21.1e-73
DY580431phosphatidylinositol transfer protein, beta1.5e-63
GO000385Thymosin beta-107.5e-11
DY578730WD repeat-containing protein 1/actin interacting protein 11.4e-07
DY582473WW domain-binding protein 43.7e-34
DY580643monooxygenase, DBH-like 19.4e-11
DY578693Animal heme peroxidases and related proteins2.7e-18
DY577894Transcriptional repressor protein YY11.2e-55
DY578322tetraspanin 111.5e-22
DY58611260S acidic ribosomal protein P16.1e-18
DY577919Membrane magnesium transporter 13.4e-15
DY584166heme binding protein 2 [24], [25] 4.7e-36
DY583753proteasome subunit, alpha type, 41.7e-105
DY582625Pre-mRNA-splicing factor 38B5.0e-74

Results

In this study, 18 probes with large positive differences and 32 probes with large negative differences were selected (Table S6). After gene annotation, we finally obtained 11 genes (61.11%) with large positive differences and 19 genes (59.38%) with large negative differences. The square ratio of negative differences (81.65%) was much larger than the square ratio of positive differences (18.35%), which can be explained by that more genes were upregulated coincidently in corals under LD cycles to maintain normal life activities; when under DD, the previous gene network was weakened, and the corresponding genes were suppressed for the life activities of the coral-algal symbiotic system. Within the positive group (Table 1), probes with fundamental functions were listed, with some believed to be house-keeping genes. Ubiquinone (DY582337) functions as a component of the mitochondrial respiratory chain, while protein tyrosine phosphatase (DY584986) functions in a coordinated manner with protein tyrosine kinases to control signaling pathways. ATP synthase (DY584757) synthesizes cellular ATP from ADP and inorganic phosphate, while seryl-tRNA synthetase (GO002856) belongs to the category of tRNA synthetase. Besides, actin (DY577937) and ubiquitin (DY582070) are usually regarded as reference genes, whose expressions remain stable under different conditions. Our results indicated that these key genes worked synchronically and formed a closer network to maintain homeostasis under an ambient LD cycle, whereas this network became looser and disordered under DD, which is partly in agreement with Levy et al. [13] who suggested that coordinated chaperone expression probably reflected the diel pattern of stress experienced by symbiotic corals. In short, these genes formed a synchronous alliance in basal metabolism, keeping the whole organism functioning well under normal conditions. For the negative group listed in Table 2, some genes are proven to be related to light response or cell cycle. DY584166 was previously annotated as heme binding protein 2, which was certified to be light-regulated gene in zebrafish [25]. DY582598, DY582400 and DY577894 were annotated as cyclin B2 or related proteins, which regulate the G2-M transition [21]. With the absence of light in DD group, circadian clocks might be disturbed, resulting in adjustment of cell cycle, which was the case for plants [26]. Although such assumption lacks support from existing document and experiment, we believe that it is reasonable, because rhythm control may work as a whole, with related genes interacting with each other.

Discussion

Some researchers believed that environmental stimuli could not result in so complex nonlinear dynamical systems such as heart rate and blood pressure [27], [28]. However, a growing body of evidences has shed light on the complicated nature of organism's reaction towards ecological stimuli—complex gene behavior, including upregulated or downregulated expression, and sometimes gene silencing. Due to the combinatorial complexity of gene interaction networks, it is difficult to identify key genes or coordinated gene pairs, and numerous investigative methods have been developed and used, e.g. weighted network analysis, hierarchical model-based analysis and nonlinear multivariate prediction [29]. However, few methods were applied in reef-building coral related researches. In the era of big data, data mining of complex gene network is crucial to understand complicated behavior of an organism. The thought of complex network has been widely used in brain connectivity studies and has been proven to be reliable [30], easily computable [31], [32], and useful for exploring structural–functional connectivity relationships [33]. Here we intended to explore the effect of light on gene expression in A. millepora by correlation analysis after wavelet transform on complex network model. Euclidean distance clustering analysis and the ‘hard’ threshold strategy were used after normalization of the original data. During the whole process of data analysis, wavelet transform played a decisive role. Previously, wavelet transform was mainly applied in image coding, multisensor image fusion and texture classification. As there were no precedents in applying wavelet transform in microarray analysis, the present study for the first time used and proved this method to be effective and accurate in analysis of microarray data. Wavelet transform has advantage over other methods, for it providing an adjustable time-frequency window that can automatically narrow to improve the time resolution in the high-frequency region and broaden to improve the frequency resolution in the low-frequency region [17]; therefore, detailed signals with several closely-related genes and approximate signals with hundreds of genes were obtained. In this way, certain inconspicuous characters of genes can be made distinctive and captured. We tried direct data processing without wavelet transform and only obtained less than 10 key genes with significant differences (data not shown). Comparing with other studies, the present work showed a high efficiency, with 60% (30/50) obtained differentially-expressed genes successfully annotated. For example, only 22% of the 309 differentially-expressed genes identified using maanova method have annotations [12]. A microarray is a complex biochemical-optical system whose purpose is to simultaneously measure the expression of thousands of genes, and thus can be regarded as a complex gene network. The topology of a microarray can be determined using different statistical or biological criteria. A few nodes can play a vital role in maintaining the network's connectivity [34]. When it comes to the gene network, these nodes (genes) may share the same co-expression pattern or cooperate tightly in cellular processes that lay a foundation for subsequent activities. It is thought that the growth of a network in which new nodes are preferentially attached to already established nodes may also be used to characterize the evolution of biological systems [35], [36]. Under DD, reef-building corals in this study tended to build a tighter gene network, as evidenced by probes with large negative differences. In other words, these genes were tightly correlated to fight against abnormal condition (i.e., DD). Key genes are important but they often demonstrate distinctively different patterns of expression, which is hard to apply to overall management. Viewing the gene activities from the aspect of complex network, the present study provided a genome-scale overview of coral's response toward light changes and revealed an orchestrated collaboration among genes under the control of biorhythm. Network characterization of gene connectivity is useful not only for exploring gene co-expression, but also for revealing abnormality in physiological disorders. Therefore, the present data provide us a theoretical basis to better protect the coral reef ecosystems. This network-scale analysis of coral microarray data demonstrates a possibility to investigate gene interaction in corals under different light conditions. A group of genes are screened, showing closer relationship in fighting against continuous darkness. These genes are the key nodes in gene network for coral activity towards light. Our next study will explore how functional connections are related to structural connections, how connectivity arises from interactions within and between different systems and the underlying principles. We believe that the method introduced in this study will be of wide use for researchers to further explore their own datasets and better understand the underlying biological mechanisms. An example of calculating the connectivity. There were only 7 probes which were highly relevant to the probe B015-C2 in the LD group (a), while there were 37 probes highly relevant to B015-C2 in the DD group (b). This implied that B015-C2 was more important in the DD group than in the LD group. Later probe annotation indicated that B015-C2 was related to cell mitosis. (JPG) Click here for additional data file. Two original databases of all probes in the DD and LD groups. (XLS) Click here for additional data file. Working databases established via data preprocessing original databases. (XLS) Click here for additional data file. Expression data of 300 important probes with large differences of expression. (XLS) Click here for additional data file. Frequency coefficients of 300 important probes expression data in the DD and LD groups. (XLS) Click here for additional data file. The results of correlation analysis in frequency domain for 300 important probes. (XLS) Click here for additional data file. Detailed information of 18 probes with large positive degree differences and 32 probes with large negative degree differences. (XLS) Click here for additional data file.
  29 in total

1.  Emergence of scaling in random networks

Authors: 
Journal:  Science       Date:  1999-10-15       Impact factor: 47.728

Review 2.  Rising to the challenge of sustaining coral reef resilience.

Authors:  Terry P Hughes; Nicholas A J Graham; Jeremy B C Jackson; Peter J Mumby; Robert S Steneck
Journal:  Trends Ecol Evol       Date:  2010-08-26       Impact factor: 17.712

3.  Development of a coral cDNA array to examine gene expression profiles in Montastraea faveolata exposed to environmental stress.

Authors:  Sara E Edge; Michael B Morgan; Daniel F Gleason; Terry W Snell
Journal:  Mar Pollut Bull       Date:  2005-08-22       Impact factor: 5.553

4.  Endoglin and Alk5 regulate epithelial-mesenchymal transformation during cardiac valve formation.

Authors:  Melania E Mercado-Pimentel; Antony D Hubbard; Raymond B Runyan
Journal:  Dev Biol       Date:  2006-12-23       Impact factor: 3.582

5.  Evolution of receptors for growth hormone and somatolactin in fish and land vertebrates: lessons from the lungfish and sturgeon orthologues.

Authors:  Shoji Fukamachi; Axel Meyer
Journal:  J Mol Evol       Date:  2007-10-05       Impact factor: 2.395

6.  Reproducibility of graph metrics of human brain functional networks.

Authors:  Lorena Deuker; Edward T Bullmore; Marie Smith; Soren Christensen; Pradeep J Nathan; Brigitte Rockstroh; Danielle S Bassett
Journal:  Neuroimage       Date:  2009-05-20       Impact factor: 6.556

Review 7.  Projecting coral reef futures under global warming and ocean acidification.

Authors:  John M Pandolfi; Sean R Connolly; Dustin J Marshall; Anne L Cohen
Journal:  Science       Date:  2011-07-22       Impact factor: 47.728

8.  Regulation of the Chlamydomonas cell cycle by light and dark.

Authors:  J L Spudich; R Sager
Journal:  J Cell Biol       Date:  1980-04       Impact factor: 10.539

9.  Spatio-temporal regulation of Wnt and retinoic acid signaling by tbx16/spadetail during zebrafish mesoderm differentiation.

Authors:  Rachel Lockridge Mueller; Cheng Huang; Robert K Ho
Journal:  BMC Genomics       Date:  2010-09-09       Impact factor: 3.969

10.  The light responsive transcriptome of the zebrafish: function and regulation.

Authors:  Benjamin D Weger; Meltem Sahinbas; Georg W Otto; Philipp Mracek; Olivier Armant; Dirk Dolle; Kajori Lahiri; Daniela Vallone; Laurence Ettwiller; Robert Geisler; Nicholas S Foulkes; Thomas Dickmeis
Journal:  PLoS One       Date:  2011-02-15       Impact factor: 3.240

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.