Ritesh Krishna, Chang-Tsun Li, Vicky Buchanan-Wollaston.
Abstract
BACKGROUND: Time-course microarray experiments can produce useful data that help in understanding the underlying dynamics of the system. Clustering is an important stage in microarray data analysis, in which the data are grouped according to certain characteristics. Most clustering techniques are based on distance or visual similarity measures, which may not be suitable for clustering temporal microarray data, where the sequential nature of time is important. We present a Granger causality based technique to cluster temporal microarray gene expression data, which measures the interdependence between two time series by statistically testing whether one time series can be used to forecast the other.
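The abstract describes the core test only in words. The sketch below shows one common form of a pairwise Granger test, assuming a bivariate linear autoregression with a fixed lag and an F-statistic that compares a restricted model (past values of y only) with an unrestricted one (past values of y and x). The function names `granger_f` and `_ols_rss` are hypothetical, and the lag order, model form and significance threshold used by the authors are not given in this excerpt. Turning the F-statistic into a p-value would use the F(lag, dof) distribution (for example `scipy.stats.f.sf`), which is left out to keep the sketch dependency-free.

```python
from typing import List, Sequence


def _ols_rss(X: List[List[float]], y: List[float]) -> float:
    """Residual sum of squares of an ordinary least-squares fit of y on X.

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination
    with partial pivoting, which is adequate for the few regressors used here.
    """
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * xi for b, xi in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f(x: Sequence[float], y: Sequence[float], lag: int = 1) -> float:
    """F-statistic for the null hypothesis "x does not Granger-cause y".

    Restricted model:   y_t ~ 1 + y_{t-1}, ..., y_{t-lag}
    Unrestricted model: y_t ~ 1 + y_{t-1}, ..., y_{t-lag} + x_{t-1}, ..., x_{t-lag}
    A large F means past values of x improve the forecast of y.
    """
    times = range(lag, len(y))
    target = [y[t] for t in times]
    restricted = [[1.0] + [y[t - i] for i in range(1, lag + 1)] for t in times]
    unrestricted = [row + [x[t - i] for i in range(1, lag + 1)]
                    for row, t in zip(restricted, times)]
    rss_r = _ols_rss(restricted, target)
    rss_u = _ols_rss(unrestricted, target)
    dof = len(target) - len(unrestricted[0])  # residual degrees of freedom
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


if __name__ == "__main__":
    import random

    random.seed(0)
    x = [random.gauss(0, 1) for _ in range(60)]
    # y follows x with a one-step delay plus a little noise.
    y = [0.0] + [0.9 * x[t - 1] + 0.1 * random.gauss(0, 1) for t in range(1, 60)]
    print("x -> y:", granger_f(x, y))
    print("y -> x:", granger_f(y, x))
```

The test is directional: `granger_f(x, y)` and `granger_f(y, x)` generally differ, which is what allows a directed network to be inferred from pairwise tests.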
Year: 2010 PMID: 20113513 PMCID: PMC2841598 DOI: 10.1186/1471-2105-11-68
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Inferred network for Dataset 1. The network structure inferred after applying the Granger causality test to synthetic dataset 1.
Figure 2. Inferred network for Dataset 2. The network structure inferred after applying the Granger causality test to synthetic dataset 2.
Figure 3. Inferred network for Dataset 3. The network structure inferred after applying the Granger causality test to synthetic dataset 3.
Figure 4. Simulation results with Datasets 1, 2 and 3 integrated into one system. The association graph obtained after applying the Granger causality test to the combined dataset is represented in the form of an association matrix. Three distinct island-like modules are visible in the graph, each corresponding to one dataset.
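Figure 4 reads the combined association matrix as three island-like modules. One simple way to recover such islands programmatically is to take the connected components of the association graph, sketched below with breadth-first search. This is an illustrative assumption, not necessarily how the authors delineated modules, and `modules` is a hypothetical name.

```python
from collections import deque
from typing import List, Sequence


def modules(adj: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return the connected components of a 0/1 association matrix.

    Nodes i and j are treated as linked if adj[i][j] or adj[j][i] is non-zero,
    so a directed (Granger) graph is handled as undirected here.
    """
    n = len(adj)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue, component = deque([start]), []
        while queue:  # breadth-first search from an unvisited node
            u = queue.popleft()
            component.append(u)
            for v in range(n):
                if not seen[v] and (adj[u][v] or adj[v][u]):
                    seen[v] = True
                    queue.append(v)
        components.append(sorted(component))
    return components


if __name__ == "__main__":
    # Toy matrix: nodes 0-1 linked, nodes 2-3-4 chained, node 5 isolated.
    adj = [[0] * 6 for _ in range(6)]
    adj[0][1] = adj[2][3] = adj[4][3] = 1
    print(modules(adj))
```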
Figure 5. Temporal profiles of the genes selected for the smaller Arabidopsis dataset. The temporal profiles of the genes selected to constitute the smaller Arabidopsis dataset are shown: (A) genes annotated for circadian activity, (B) genes annotated for death and (C) genes annotated for ageing.
Figure 6. Degree-sorted network structure. The association graph obtained after applying the Granger causality test, displayed with its nodes sorted by degree.
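A degree-sorted display like Figure 6 can be produced by reordering the rows and columns of the association matrix by node degree. The sketch below assumes an undirected count of partners and breaks ties by node index; both choices, and the name `degree_sorted`, are illustrative and not specified in the source.

```python
from typing import List, Sequence, Tuple


def degree_sorted(adj: Sequence[Sequence[int]]) -> Tuple[List[int], List[List[int]]]:
    """Order nodes by decreasing degree and permute the matrix accordingly."""
    n = len(adj)
    # Undirected degree: count partners linked in either direction.
    degree = [sum(1 for j in range(n) if j != i and (adj[i][j] or adj[j][i]))
              for i in range(n)]
    order = sorted(range(n), key=lambda i: (-degree[i], i))
    # Apply the same permutation to rows and columns.
    permuted = [[adj[i][j] for j in order] for i in order]
    return order, permuted


if __name__ == "__main__":
    # Node 2 is a hub linked to nodes 0 and 1.
    adj = [[0, 0, 0], [0, 0, 0], [1, 1, 0]]
    print(degree_sorted(adj))
```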
Figure 7. Extracted subgraphs indicating potential modules of interest in the smaller dataset. The biological functions performed by the modules are (A) circadian rhythm, (B) immune and defense response, (C) circadian rhythm and (D) aging. The GO annotations for the genes are given in Table 1.
Table 1. Gene ontology details for the networks shown in Figure 7
| GO-ID | p-value | Corrected p-value | Known/Total | Functional Description | Gene Names |
|---|---|---|---|---|---|
| Figure 7(A) | | | | | |
| 48511 | 1.3744E-11 | 4.1921E-10 | 4/6 | Rhythmic process | AT5G02810, AT2G46830, AT1G68830, AT2G25930 |
| 7623 | 1.3744E-11 | 4.1921E-10 | 4/6 | Circadian rhythm | AT5G02810, AT2G46830, AT1G68830, AT2G25930 |
| Figure 7(B) | | | | | |
| 9814 | 4.5406E-11 | 7.6281E-9 | 5/8 | Defense response | AT1G55490, AT2G34690, AT5G03280, AT1G61560, AT4G14400 |
| 45087 | 2.5439E-10 | 2.1369E-8 | 5/8 | Innate immune response | AT1G55490, AT2G34690, AT5G03280, AT1G61560, AT4G14400 |
| 6955 | 3.8828E-10 | 2.1743E-8 | 5/8 | Immune response | AT1G55490, AT2G34690, AT5G03280, AT1G61560, AT4G14400 |
| 2376 | 5.7329E-10 | 2.4078E-8 | 5/8 | Immune system process | AT1G55490, AT2G34690, AT5G03280, AT1G61560, AT4G14400 |
| 8219 | 3.9627E-9 | 1.1096E-7 | 4/8 | Cell death | AT1G55490, AT2G34690, AT5G03280, AT4G14400 |
| 16265 | 3.9627E-9 | 1.1096E-7 | 4/8 | Death | AT1G55490, AT2G34690, AT5G03280, AT4G14400 |
| Figure 7(C) | | | | | |
| 7623 | 1.6563E-14 | 9.8551E-13 | 5/7 | Circadian rhythm | AT5G57360, AT2G46790, AT1G22770, AT5G61380, AT4G08920 |
| 48511 | 1.6563E-14 | 9.8551E-13 | 5/7 | Rhythmic process | AT5G57360, AT2G46790, AT1G22770, AT5G61380, AT4G08920 |
| Figure 7(D) | | | | | |
| 16280 | 3.0760E-13 | 1.4149E-11 | 5/6 | Aging | AT3G12090, AT4G23410, AT5G14930, AT2G19580, AT2G21045 |
| 32502 | 1.1218E-8 | 2.5802E-7 | 6/6 | Developmental process | AT3G12090, AT4G23410, AT3G44880, AT5G14930, AT2G19580, AT2G21045 |
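Each row reports how many genes in a module carry a GO term (presumably the "Known" count out of the module's "Total") together with a p-value. A common way such uncorrected p-values are computed is the upper-tail hypergeometric over-representation test, sketched below. The source does not name its enrichment tool or test, and the reference-set sizes (K annotated genes out of N) are not reported in this excerpt, so the example uses toy numbers only; `enrichment_p` is a hypothetical name.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= k).

    k: genes in the module annotated to the GO term
    n: genes in the module
    K: genes in the reference set annotated to the term
    N: genes in the reference set
    """
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Toy case: 2 of 4 module genes annotated, 3 of 10 reference genes annotated.
    print(enrichment_p(2, 4, 3, 10))
```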
Figure 8. Extracted subgraph indicating a potential module of interest in the bigger dataset (Set 1). Genes belonging to the "Response to stress" category are highlighted in yellow. The GO annotations of the highlighted genes are presented in Table 2.
Figure 9. Extracted subgraph indicating a potential module of interest in the bigger dataset (Set 2). Genes belonging to the "Cytoplasmic part" category are highlighted in yellow. The GO annotations of the highlighted genes are presented in Table 2.
Figure 10. Extracted subgraph indicating a potential module of interest in the bigger dataset (Set 3). Genes belonging to the "Response to stimulus" category are highlighted in yellow. The GO annotations of the highlighted genes are presented in Table 2.
Figure 11. Extracted subgraph indicating a potential module of interest in the bigger dataset (Set 4). Genes belonging to the "Response to abiotic stimulus" category are highlighted in yellow. The GO annotations of the highlighted genes are presented in Table 2.
Figure 12. Extracted subgraph indicating a potential module of interest in the bigger dataset (Set 5). Genes belonging to the "Catalytic activity" category are highlighted in yellow. The GO annotations of the highlighted genes are presented in Table 2.
Figure 13. Extracted subgraph indicating a potential module of interest in the bigger dataset (Set 6). Genes belonging to the "Response to stress" category are highlighted in yellow. The GO annotations of the highlighted genes are presented in Table 2.
Figure 14. Extracted subgraph indicating a potential module of interest in the bigger dataset (Set 7). Genes belonging to the "Cell part" category are highlighted in yellow. The GO annotations of the highlighted genes are presented in Table 2.
Table 2. GO annotations for the highlighted genes shown in Figures 8-14
| GO-ID | p-value | Corrected p-value | Known/Total | Functional Description | Gene Names |
|---|---|---|---|---|---|
| Figure 8 | | | | | |
| 6950 | 5.1715E-13 | 5.6542E-11 | 18/38 | Response to stress | AT3G08730, AT5G27600, AT4G33030, AT1G53670, AT2G37220, AT4G34710, AT4G31550, AT5G54810, AT4G09650, AT4G29040, AT5G24770, AT2G14610, AT3G51780, AT3G53990, AT4G04020, AT1G16880, AT5G25610, AT5G02500 |
| Figure 9 | | | | | |
| 44444 | 3.5066E-4 | 2.0147E-2 | 7/11 | Cytoplasmic part | AT5G42020, AT3G62030, AT1G27450, AT4G37910, AT2G45030, AT5G50950, AT1G69370 |
| Figure 10 | | | | | |
| 51869 | 5.5701E-12 | 2.3450E-9 | 20/41 | Response to stimulus | AT5G20850, AT5G55120, AT3G08720, AT4G37680, AT5G26870, AT1G33560, AT2G47180, AT2G05520, AT1G48030, AT4G01060, AT5G37780, AT1G63840, AT2G14580, AT1G58220, AT3G26790, AT3G54320, AT5G10450, AT1G74310, AT5G45340, AT5G40350 |
| Figure 11 | | | | | |
| 9628 | 1.1048E-5 | 1.3147E-3 | 4/7 | Response to abiotic stimulus | AT5G52310, AT3G17020, AT5G67030, AT5G63890 |
| Figure 12 | | | | | |
| 3824 | 9.0400E-4 | 4.2857E-2 | 10/15 | Catalytic activity | AT2G17420, AT3G15020, AT5G04590, AT3G13235, AT1G23190, AT3G53160, AT3G48090, AT4G23600, AT4G08790, AT1G51680 |
| Figure 13 | | | | | |
| 6950 | 7.0271E-10 | 6.6055E-8 | 14/37 | Response to stress | AT5G20230, AT5G61900, AT4G16845, AT3G22370, AT2G04030, AT1G55490, AT3G11820, AT4G12400, AT4G34990, AT4G23100, AT4G20260, AT3G49910, AT5G09810, AT5G05410 |
| Figure 14 | | | | | |
| 44464 | 2.6470E-3 | 1.8771E-2 | 25/36 | Cell part | AT4G27670, AT5G59220, AT4G25100, AT3G58810, AT4G14630, AT3G53620, AT5G11520, AT3G27300, AT1G42970, AT5G43280, AT4G27430, AT1G49300, AT2G39460, AT2G37040, AT3G01480, AT5G24550, AT1G72140, AT5G62790, AT1G25540, AT1G02860, AT4G38970, AT2G43130, AT3G52960, AT3G01220, AT2G43750 |
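The corrected p-value column adjusts for testing many GO terms at once; this excerpt does not say which correction was applied. Benjamini-Hochberg false-discovery-rate adjustment is one widely used option and is sketched below purely as an illustration; `benjamini_hochberg` is a hypothetical name, and the sketch is not claimed to reproduce the values in the tables.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, keeping the adjustment monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```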
Figure 15. Structural properties of the association network obtained for the bigger dataset. (A) Node degree distribution, which follows an approximately power-law-like distribution. (B) Distribution of the number of partners shared between pairs of nodes. (C) Closeness centrality of all nodes. (D) Topological coefficients.
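The four panels of Figure 15 are standard network statistics. The sketch below computes each of them for an undirected graph stored as adjacency sets: the degree distribution, the number of partners shared by a pair of nodes, closeness centrality as the reciprocal of the mean shortest-path length to reachable nodes, and the topological coefficient as defined in Cytoscape's NetworkAnalyzer plugin. The excerpt does not state which tool or exact definitions the authors used, so these definitions, the `Graph` alias and all function names are assumptions.

```python
from collections import Counter, deque
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def degree_distribution(graph: Graph) -> Dict[int, int]:
    """Map each degree k to the number of nodes with that degree (panel A)."""
    return dict(Counter(len(nbrs) for nbrs in graph.values()))


def shared_partners(graph: Graph, a: int, b: int) -> int:
    """Number of neighbours common to nodes a and b (panel B)."""
    return len(graph[a] & graph[b])


def closeness(graph: Graph, node: int) -> float:
    """Reciprocal of the mean shortest-path length to reachable nodes (panel C)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search gives hop distances
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    lengths = [d for v, d in dist.items() if v != node]
    return len(lengths) / sum(lengths) if lengths else 0.0


def topological_coefficient(graph: Graph, n: int) -> float:
    """Mean J(n, m) over nodes m sharing a neighbour with n, divided by k_n (panel D).

    J(n, m) counts the neighbours shared by n and m, plus 1 if n and m are linked.
    """
    k = len(graph[n])
    partners = {m for nb in graph[n] for m in graph[nb] if m != n}
    if k == 0 or not partners:
        return 0.0
    j = [shared_partners(graph, n, m) + int(m in graph[n]) for m in partners]
    return sum(j) / len(j) / k


if __name__ == "__main__":
    square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}  # 4-cycle
    print(degree_distribution(square), closeness(square, 0),
          topological_coefficient(square, 0))
```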
The correlation matrix for synthetic datasets 1, 2 and 3.
Dataset 1

| Node | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 1.0000 | 0.2613 | -0.2309 | -0.2500 | 0.0871 |
| 2 | 0.2613 | 1.0000 | -0.7114 | -0.7515 | 0.1351 |
| 3 | -0.2309 | -0.7114 | 1.0000 | 0.7654 | -0.1283 |
| 4 | -0.2500 | -0.7515 | 0.7654 | 1.0000 | -0.3125 |
| 5 | 0.0871 | 0.1351 | -0.1283 | -0.3125 | 1.0000 |

Dataset 2

| Node | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 1.0000 | -0.0944 | 0.0621 | -0.1088 |
| 2 | -0.0944 | 1.0000 | -0.0940 | 0.6040 |
| 3 | 0.0621 | -0.0940 | 1.0000 | 0.0024 |
| 4 | -0.1088 | 0.6040 | 0.0024 | 1.0000 |

Dataset 3

| Node | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 1.0000 | 0.1872 | 0.0449 | 0.0329 | 0.1118 | 0.1531 |
| 2 | 0.1872 | 1.0000 | 0.1105 | 0.0292 | 0.0748 | 0.3101 |
| 3 | 0.0449 | 0.1105 | 1.0000 | -0.0001 | 0.2516 | 0.0665 |
| 4 | 0.0329 | 0.0292 | -0.0001 | 1.0000 | 0.0821 | 0.2282 |
| 5 | 0.1118 | 0.0748 | 0.2516 | 0.0821 | 1.0000 | 0.0907 |
| 6 | 0.1531 | 0.3101 | 0.0665 | 0.2282 | 0.0907 | 1.0000 |
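These correlation matrices provide an alternative association measure to the Granger test. A minimal sketch of computing a Pearson correlation matrix from a set of equal-length expression time series, one row per node or gene, follows; the synthetic series themselves are not reproduced in this excerpt, so the example uses toy data, and `pearson_matrix` is a hypothetical name.

```python
from math import sqrt
from typing import List, Sequence


def pearson_matrix(series: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise Pearson correlation between equal-length time series (rows).

    Undefined (raises ZeroDivisionError) if any series is constant.
    """
    def corr(a: Sequence[float], b: Sequence[float]) -> float:
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a)
        vb = sum((y - mb) ** 2 for y in b)
        return cov / sqrt(va * vb)

    return [[corr(a, b) for b in series] for a in series]


if __name__ == "__main__":
    print(pearson_matrix([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]))
```

Like the Euclidean distance, this measure compares the time points pairwise and does not test whether one series helps forecast the other.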
The Euclidean distance matrix for synthetic datasets 1, 2 and 3.
Dataset 1

| Node | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0 | 50.6180 | 60.5454 | 66.3305 | 53.7858 |
| 2 | 50.6180 | 0 | 49.0080 | 57.3540 | 35.0406 |
| 3 | 60.5454 | 49.0080 | 0 | 21.1572 | 36.9004 |
| 4 | 66.3305 | 57.3540 | 21.1572 | 0 | 46.7355 |
| 5 | 53.7858 | 35.0406 | 36.9004 | 46.7355 | 0 |

Dataset 2

| Node | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 0 | 57.2707 | 49.4072 | 54.0319 |
| 2 | 57.2707 | 0 | 35.4161 | 23.3695 |
| 3 | 49.4072 | 35.4161 | 0 | 28.6682 |
| 4 | 54.0319 | 23.3695 | 28.6682 | 0 |

Dataset 3

| Node | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 0 | 32.1456 | 32.0493 | 31.7813 | 33.2172 | 34.6842 |
| 2 | 32.1456 | 0 | 25.0916 | 25.6146 | 28.3732 | 27.0407 |
| 3 | 32.0493 | 25.0916 | 0 | 21.9953 | 22.6557 | 28.4800 |
| 4 | 31.7813 | 25.6146 | 21.9953 | 0 | 24.4613 | 25.7756 |
| 5 | 33.2172 | 28.3732 | 22.6557 | 24.4613 | 0 | 30.6190 |
| 6 | 34.6842 | 27.0407 | 28.4800 | 25.7756 | 30.6190 | 0 |
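A Euclidean distance matrix of this kind can be computed directly with `math.dist` (Python 3.8+). The sketch below, with the hypothetical name `euclidean_matrix`, also illustrates the abstract's point that distance measures may not suit temporal data: reordering the time points identically in both series leaves the distance unchanged, so the measure ignores the sequential order of time.

```python
from math import dist
from typing import List, Sequence


def euclidean_matrix(series: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise Euclidean distance between equal-length time series (rows)."""
    return [[dist(a, b) for b in series] for a in series]


if __name__ == "__main__":
    a, b = [0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 2.0, 5.0]
    order = [3, 0, 2, 1]  # arbitrary shuffle of the time points
    shuffled_a = [a[i] for i in order]
    shuffled_b = [b[i] for i in order]
    # Both distances are equal: time order plays no role.
    print(dist(a, b), dist(shuffled_a, shuffled_b))
```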
Figure 16. Correlation matrix for the smaller Arabidopsis dataset. The association matrix obtained using Pearson correlation for the smaller Arabidopsis dataset is shown. The strengths of interactions between genes are quantified according to the color map presented in the figure.
Figure 17. Distance matrix for the smaller Arabidopsis dataset. The association matrix obtained using Euclidean distance for the smaller Arabidopsis dataset is shown. The strengths of interactions between genes are quantified according to the color map presented in the figure.
Figure 18. Subgraphs obtained by using correlation as a measure of association in the smaller Arabidopsis dataset. Two subgraphs of potential interest were detected when the correlation coefficient was used to establish associations between genes in the smaller Arabidopsis dataset. The GO annotations of the recognised genes are presented in Table 5.
Table 5. GO annotations for clusters found in the smaller Arabidopsis dataset using correlation as the measure of association between genes
| GO-ID | p-value | Corrected p-value | Known/Total | Functional Description | Gene Names |
|---|---|---|---|---|---|
| Figure 18(A) | | | | | |
| 48511 | 4.0370E-9 | 8.2759E-8 | 3/4 | Rhythmic process | AT5G24470, AT2G46790, AT5G61380 |
| Figure 18(B) | | | | | |
| 16280 | 2.8611E-12 | 2.0600E-10 | 5/8 | Aging | AT5G45900, AT2G29350, AT5G35630, AT3G10985, AT4G28050 |