Nadim W Alkharouf, D Curtis Jamison, Benjamin F Matthews.
Abstract
Gene expression databases contain a wealth of information, but current data mining tools are limited in their speed and effectiveness in extracting meaningful biological knowledge from them. Online analytical processing (OLAP) can be used as a supplement to cluster analysis for fast and effective data mining of gene expression databases. We used Analysis Services 2000, a product that ships with SQL Server 2000, to construct an OLAP cube that was used to mine a time series experiment designed to identify genes associated with resistance of soybean to the soybean cyst nematode, a devastating pest of soybean. The data for these experiments are stored in the soybean genomics and microarray database (SGMD). A number of candidate resistance genes and pathways were found. Compared to traditional cluster analysis of gene expression data, OLAP was more effective and faster in finding biologically meaningful information. OLAP is available from a number of vendors and can work with any relational database management system through OLE DB.
Year: 2005 PMID: 16046824 PMCID: PMC1184053 DOI: 10.1155/JBB.2005.181
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Figure 1. OLAP, cubes, and where they fit in a data warehousing solution. OLAP provides efficient, easy-to-use reporting tools and a graphical interface that enable users to mine a data warehouse for hidden information.
The organization of a cube with two dimensions. In this example, probe combination and genes are dimensions; P+/P−, K+/K−, biosample 1, biosample 2, A01A10, SSH1B07, D09H12, and B03C02 are levels of their respective dimensions. The cells containing figures are facts, and the individual data in the fact cells are the values of the measures. This example uses two measures: the fold induction and the result of the t test (1, significantly induced; −1, significantly suppressed; 0, unchanged).
| Genes | Measure | Probe combination | | |
|---|---|---|---|---|
| | | Biosample 1 | Biosample 2 | |
| A01A10 | Fold induction | 1.2 | 1.5 | 0.76 |
| | t test | 1 | −1 | 1 |
| SSH1B07 | Fold induction | 0.34 | 2.3 | −0.98 |
| | t test | 0 | 1 | −1 |
| D09H12 | Fold induction | −1.6 | 1.4 | 0.03 |
| | t test | −1 | 1 | 0 |
| B03C02 | Fold induction | 2 | 1.8 | −2.1 |
| | t test | 1 | 1 | −1 |
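
The cube layout described in the caption can be sketched in a few lines of Python. This is only an illustration of dimensions, facts, and measures, not the Analysis Services implementation. The column keys `c1` to `c3` are placeholder names for the three probe-combination columns, and `Fact`, `slice_gene`, and `induced_in` are hypothetical helpers. The values are the ones shown in the table.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    fold_induction: float  # measure 1
    t_test: int            # measure 2: 1 induced, -1 suppressed, 0 unchanged


# Placeholder names for the three probe-combination columns (assumption).
COLUMNS = ("c1", "c2", "c3")

# One (fold induction, t test) pair per column, copied from the table.
ROWS = {
    "A01A10": [(1.2, 1), (1.5, -1), (0.76, 1)],
    "SSH1B07": [(0.34, 0), (2.3, 1), (-0.98, -1)],
    "D09H12": [(-1.6, -1), (1.4, 1), (0.03, 0)],
    "B03C02": [(2.0, 1), (1.8, 1), (-2.1, -1)],
}

# The cube: each cell is addressed by one member from each dimension.
cube = {
    (gene, col): Fact(*cell)
    for gene, cells in ROWS.items()
    for col, cell in zip(COLUMNS, cells)
}


def slice_gene(gene):
    """Slice the cube by fixing one member of the gene dimension."""
    return {col: cube[(gene, col)] for col in COLUMNS}


def induced_in(col):
    """Return genes whose t test flag is 1 in the given column."""
    return sorted(g for (g, c), fact in cube.items()
                  if c == col and fact.t_test == 1)
```

Calling `induced_in("c1")` filters on the t test measure for a single column, while `slice_gene("D09H12")` returns every fact recorded for one gene.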
Figure 2. A snapshot of a multidimensional cube of gene expression data constructed in Microsoft's Analysis Services 2000 (shipped with SQL Server 2000). (A) shows the dimensions of the cube and their associated levels, (B) is the fact table, and (C) shows the dimension tables.
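
The fact table and dimension tables named in Figure 2 follow the general pattern of a star schema. A rough sketch of that pattern with Python's built-in `sqlite3` module is given here; it is not the cube built in the study. The table names (`dim_gene`, `dim_time`, `fact_expression`), the column names, the gene names, and all values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two dimension tables and one fact table that references both.
conn.executescript("""
CREATE TABLE dim_gene (gene_id INTEGER PRIMARY KEY, gene_name TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, phase TEXT);
CREATE TABLE fact_expression (
    gene_id INTEGER REFERENCES dim_gene(gene_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    fold_induction REAL,   -- measure 1
    t_test INTEGER         -- measure 2: 1, -1, or 0
);
""")

# Made-up rows for illustration only.
conn.executemany("INSERT INTO dim_gene VALUES (?, ?)",
                 [(1, "gene_a"), (2, "gene_b")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?)",
                 [(1, "early"), (2, "mid"), (3, "late")])
conn.executemany("INSERT INTO fact_expression VALUES (?, ?, ?, ?)", [
    (1, 1, 2.1, 1), (1, 2, 1.9, 1), (1, 3, 2.4, 1),
    (2, 1, 1.8, 1), (2, 2, 1.0, 0), (2, 3, 0.9, 0),
])


def induced_phases():
    """Join facts to both dimensions and list the phases where each gene is induced."""
    rows = conn.execute("""
        SELECT g.gene_name, t.phase
        FROM fact_expression AS f
        JOIN dim_gene AS g ON g.gene_id = f.gene_id
        JOIN dim_time AS t ON t.time_id = f.time_id
        WHERE f.t_test = 1
        ORDER BY g.gene_name, t.time_id
    """).fetchall()
    result = {}
    for gene, phase in rows:
        result.setdefault(gene, []).append(phase)
    return result
```

Keeping the measures in one narrow fact table and the descriptive attributes in dimension tables is what lets a query filter or group along any dimension with a simple join.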
Genes found to be induced at different time intervals using OLAP, k-means, and SOM clustering. Many of the key candidate resistance genes were identified by OLAP but not by cluster analysis, in particular genes induced at specific time intervals and not others. Cluster analysis did not reveal any genes that OLAP had missed.
| Time | GeneID | GeneName | OLAP | k-means | SOM | Comments |
|---|---|---|---|---|---|---|
| Induced at all time points | BM139889 | Proline-rich glycoproteins | ✓ | — | — | Cell wall proteins that are found activated during pathogen attack |
| | BM107775 | Peroxidase | ✓ | ✓ | ✓ | Involved in detoxification and is activated during the hypersensitive response in plants against pathogen attack |
| | BM139591 | Cytochrome P450 monooxygenase | ✓ | — | — | Photosynthesis-related gene |
| | BM107779 | Photosystem II core proteins | ✓ | ✓ | ✓ | Involved in plant photosynthesis and energy production |
| | BM107798 | 4-coumarate-CoA ligase | ✓ | ✓ | ✓ | Involved in phenylpropanoid metabolism and the synthesis of secondary metabolites that are known to be involved in plant defense |
| | BM108156 | Transcription factor WRKY6 | ✓ | ✓ | ✓ | Believed to suppress PR-1 genes, thereby conferring susceptibility to pathogen attack in plant species |
| Induced at the early time points only | CA850582 | Trypsin inhibitor proteins | ✓ | — | — | Proteinase inhibitors |
| | BM107847 | Germin-like protein | ✓ | — | — | Known to have antimicrobial activity, activated in plants during pathogen infection |
| | CA851099 | Pathogenesis-related protein PR-6 | ✓ | — | — | Proteinase inhibitors known to be induced by jasmonic acid |
| Induced at the mid time points only | DUP21F10 | Trehalose-6-phosphate synthase (TPS) | ✓ | — | — | Synthesizes trehalose, is thought to be an important regulator of sugar metabolism |
| | BM108164 | Pyrophosphatase | ✓ | — | — | Metabolism-related gene |
| | BM108095 | Sali3-2 protein | ✓ | — | — | Induced by aluminum in soybean roots |
| | BM107806 | Chalcone synthase | ✓ | — | — | Induced by the jasmonic acid signaling pathway |
| Induced at the late time points only | BM108193 | Glutamate dehydrogenase | ✓ | — | — | Metabolism-related gene |
| | CA853854 | Geranylgeranyl hydrogenase | ✓ | — | — | Metabolism-related gene |
| | BM107804 | Tyrosine-phosphatase | ✓ | — | — | Metabolism-related gene |
| Commonly induced at the early and mid time points | CA850882 | Stress-induced gene SAM-22 | ✓ | — | — | A stress-induced PR-10 protein, which is a ribonuclease protein found activated in plants after viral infection |
| | BM107930 | Heat shock protein 70 | ✓ | — | — | Helps new or distorted proteins fold into shape, found induced in a number of plant species after pathogen infection |
| | BM107821 | Lectin-chitin | ✓ | ✓ | ✓ | Cell wall protein |
| Commonly induced at the early and late time points | BM107803 | Beta-glucosidase | ✓ | — | — | Metabolism-related gene |
| Commonly induced at the mid and late time points | CA852009 | Fructose-biphosphate aldolase | ✓ | — | — | Metabolism-related gene |
| | BM107809 | Sucrose synthase | ✓ | ✓ | ✓ | Metabolism-related gene |
| | BM108104 | ATP-synthase | ✓ | | | Metabolism-related gene |
| | BM108223 | Lipoxygenase | ✓ | ✓ | ✓ | Involved in jasmonic acid synthesis and is implicated in plant responses against pathogens |
| | BM108233 | Ubiquitin | ✓ | — | — | Plays an important role in marking proteins for proteolytic degradation, one of the key events in the systematic defense mechanism of a plant against pathogen invasion |
| | CA853086 | Metallothionein | ✓ | ✓ | ✓ | A member of the aquaporin (AQP) water channel family, induced in rice upon infection with |
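
The time categories in the table amount to grouping genes by the set of time intervals in which they test as induced. A small sketch of that grouping follows, assuming three intervals named `early`, `mid`, and `late` and a t test flag of 1 for induction. The function `time_pattern`, the gene names, and the flags are made up for illustration and are not data from the study.

```python
PHASES = ("early", "mid", "late")


def time_pattern(flags):
    """Map per-phase t test flags (1 = induced) to a category label."""
    induced = [p for p in PHASES if flags.get(p, 0) == 1]
    if len(induced) == 3:
        return "Induced at all time points"
    if len(induced) == 2:
        return f"Commonly induced at the {induced[0]} and {induced[1]} time points"
    if len(induced) == 1:
        return f"Induced at the {induced[0]} time points only"
    return "Not induced"


# Hypothetical flags for made-up genes.
toy = {
    "gene_a": {"early": 1, "mid": 1, "late": 1},
    "gene_b": {"early": 1, "mid": 0, "late": 0},
    "gene_c": {"early": 0, "mid": 1, "late": 1},
}

patterns = {gene: time_pattern(flags) for gene, flags in toy.items()}
```

A gene such as `gene_b`, induced in one interval only, is labeled directly by this kind of per-interval filter, which is the sort of pattern the caption reports OLAP finding and cluster analysis missing.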