Jussi Paananen, Markus Storvik, Garry Wong.
Abstract
BACKGROUND: Current genomic research methods provide researchers with enormous amounts of data. Combining data from the different high-throughput research technologies commonly available in biological databases can lead to novel findings and increase research efficiency. However, combining data from heterogeneous sources is often a very arduous task. These sources can be different microarray technology platforms, genomic databases, or experiments performed on various species. Our aim was to develop a software program that facilitates the combining of data from heterogeneous sources, and thus allows researchers to perform genomic cross-platform/cross-species studies and to use existing experimental data for compendium studies.
Year: 2006 PMID: 16995941 PMCID: PMC1592126 DOI: 10.1186/1471-2105-7-418
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Concept of metagenes. A metagene is a common identifier that groups together gene and gene product identifiers originating from a single gene and orthologous genes in other species. Different identifiers can be cross-linked to each other using a common metagene identifier.
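The metagene concept can be illustrated with a small lookup table. The following is a minimal sketch, not CROPPER's actual implementation: the dictionary layout and the function names `group_by_metagene` and `cross_link` are hypothetical, and the identifiers are copied from the example table in this document.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific identifiers to metagene IDs.
# The identifiers come from the example table; the layout is an assumption.
ID_TO_METAGENE = {
    "201533_at": "MGDH22681",    # probe set identifier
    "Q02248": "MGDH22681",       # protein accession
    "K05C4.6": "MGDH22681",      # gene identifier from another species
    "207614_s_at": "MGDH2988",
    "Q9WTX6": "MGDH2988",
    "D2045.6": "MGDH2988",
}


def group_by_metagene(id_to_metagene):
    """Invert the mapping: metagene ID -> set of identifiers it groups."""
    groups = defaultdict(set)
    for identifier, metagene in id_to_metagene.items():
        groups[metagene].add(identifier)
    return dict(groups)


def cross_link(identifier, id_to_metagene):
    """Return the other identifiers that share a metagene with `identifier`."""
    metagene = id_to_metagene.get(identifier)
    if metagene is None:
        return set()
    return {
        other
        for other, mg in id_to_metagene.items()
        if mg == metagene and other != identifier
    }


groups = group_by_metagene(ID_TO_METAGENE)
linked = cross_link("201533_at", ID_TO_METAGENE)
```

Here `linked` holds the identifiers reachable from the probe set `201533_at` through its metagene, which is the cross-linking the caption describes.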
Figure 2. Process flow of using CROPPER. Datasets are first processed individually. This processing adds a metagene identifier to each data row. After both datasets have been processed, they can be combined using the metagene identifiers. A result file containing the combined data rows is produced. Additional datasets can be combined with the result file by repeating the process and using the combined result file as the second dataset.
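The process flow above can be sketched as two steps: annotate each dataset with metagene identifiers, then join on them. This is a hedged sketch under stated assumptions, not CROPPER's code: the function names `annotate` and `combine` are hypothetical, dropping unmapped rows and using an inner join are choices made for this example, and the numeric values are invented for illustration rather than taken from the paper.

```python
def annotate(rows, id_to_metagene, id_column=0):
    """Prefix each row with its metagene ID.

    Assumption: rows whose identifier has no metagene are dropped.
    """
    annotated = []
    for row in rows:
        metagene = id_to_metagene.get(row[id_column])
        if metagene is not None:
            annotated.append([metagene, *row])
    return annotated


def combine(first, second):
    """Inner-join two annotated datasets on the metagene ID in column 0."""
    by_metagene = {}
    for row in second:
        by_metagene.setdefault(row[0], []).append(row[1:])
    combined = []
    for row in first:
        for match in by_metagene.get(row[0], []):
            combined.append([*row, *match])
    return combined


# Identifiers are from the example table; the expression values are made up.
mapping = {
    "201533_at": "MGDH22681",
    "K05C4.6": "MGDH22681",
    "Q02248": "MGDH22681",
    "207614_s_at": "MGDH2988",
    "D2045.6": "MGDH2988",
}
dataset_a = [["201533_at", 1.2], ["207614_s_at", -0.4], ["unmapped_at", 0.0]]
dataset_b = [["K05C4.6", 0.8], ["D2045.6", -1.1]]
dataset_c = [["Q02248", 0.3]]

result = combine(annotate(dataset_a, mapping), annotate(dataset_b, mapping))
# A further dataset is added by reusing the combined result as one input.
result_abc = combine(result, annotate(dataset_c, mapping))
```

Because the combined rows keep the metagene identifier in the first column, the result can be fed back into `combine` unchanged, which mirrors the repeat step in the caption.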
Example of combined result dataset
| MGD59E612 | 216248_s_at | Orphan nuclear receptor NR4A2 (Orphan nuclear receptor NURR1) (Immediate-early response protein NOT) (Transcriptionally-inducible nuclear receptor). | Q06219 | nuclear receptor subfamily 4, group A, member 2 | C48D5.1 | Nuclear hormone receptor family member nhr-6 (Cnr8). | 216248_s_at | Orphan nuclear receptor NR4A2 (Orphan nuclear receptor NURR1) (Immediate-early response protein NOT) (Transcriptionally-inducible nuclear receptor). |
| MGDH20656 | 200746_s_at | | P04901 | | F13D12.7 | Guanine nucleotide-binding protein beta subunit 1. | 200746_s_at | Guanine nucleotide-binding protein G(I)/G(S)/G(T) beta subunit 1 (Transducin beta chain 1). |
| MGDH22681 | 201533_at | Beta-catenin. | Q02248 | catenin (cadherin associated protein), beta 1 | K05C4.6 | HuMPback (dorsal lumps) family member (hmp-2) | 201533_at | Beta-catenin. |
| MGDH22861 | 203333_at | Kinesin-associated protein 3 (Smg GDS-associated protein). | P70188 | kinesin-associated protein 3 | F56C9.1 | Putative serine/threonine protein phosphatase F56C9.1 in chromosome III (EC 3.1.3.16). | 203333_at | Kinesin-associated protein 3 (Smg GDS-associated protein). |
| MGDH22922 | 200075_s_at | Guanylate kinase (EC 2.7.4.8) (GMP kinase). | Q64520 | guanylate kinase 1 | T03F1.8 | | 200075_s_at | Guanylate kinase (EC 2.7.4.8) (GMP kinase). |
| MGDH23987 | 217746_s_at | Programmed cell death 6-interacting protein (PDCD6-interacting protein) (ALG-2-interacting protein 1) (Hp95). | O88695 | | R10E12.1 | Apoptosis-linked gene 2 interacting protein X 1 (Protein pqn-58) (Protein YNK1). | 217746_s_at | Programmed cell death 6-interacting protein (PDCD6-interacting protein) (ALG-2-interacting protein 1) (Hp95). |
| MGDH2564 | 203087_s_at | Kinesin-like protein KIF2 (Kinesin-2) (HK2). | P28740 | kinesin family member 2A | K11D9.1 | Kinesin-Like Protein family member (klp-7) | 213598_at | Kinesin-like protein KIF2 (Kinesin-2) (HK2). |
| MGDH2591 | 209503_s_at | 26S protease regulatory subunit 8 (Proteasome subunit p45) (p45/SUG) (Proteasome 26S subunit ATPase 5) (Thyroid hormone receptor-interacting protein 1) (TRIP1). | P47210 | | Y49E10.1 | proteasome Regulatory Particle, ATPase-like family member (rpt-6) | 209503_s_at | 26S protease regulatory subunit 8 (Proteasome subunit p45) (p45/SUG) (Proteasome 26S subunit ATPase 5) (Thyroid hormone receptor-interacting protein 1) (TRIP1). |
| MGDH26335 | 201390_s_at | Casein kinase II subunit beta (CK II beta) (Phosvitin) (G5a). | P13862 | | T01G9.6 | Casein kinase II beta subunit (CK II beta). | 201390_s_at | Casein kinase II subunit beta (CK II beta) (Phosvitin) (G5a). |
| MGDH2988 | 207614_s_at | Cullin-1 (CUL-1). | Q9WTX6 | cullin 1 | D2045.6 | Cullin-1 (Abnormal cell lineage 19 protein). | 207614_s_at | Cullin-1 (CUL-1). |
| MGDH6882 | 200864_s_at | | P24410 | | F53G12.1 | RAB family member (rab-11.1) | 200864_s_at | Ras-related protein Rab-11A (Rab-11) (YL8). |
| MGDH8694 | 201220_x_at | | P56546 | C-terminal binding protein 2 | F49E10.5 | | 210835_s_at | C-terminal-binding protein 2 (CtBP2). |
CROPPER was used to combine datasets from four distinct experiments. Metagene identifiers were assigned to each data row. Inspection of the gene descriptions suggests that the combining was successful and that metagene identifiers were assigned to a common group of genes and gene products across the different species and technology platforms.
Figure 3. SOM clustering of data combined using CROPPER. Four different Parkinson's disease datasets were combined by aligning the metagenes with CROPPER. The data consisted of a total of 9 conditions originating from the datasets. The conditions are shown on the x-axis and their z-transformed values on the y-axis. For normalization, the data values were z-transformed to remove differences between the data value distributions. This was followed by calculation of the z-ratio, in which the z-values of the treated samples were subtracted from the z-values of the controls (for method details, see Cheadle et al. 2002 [7]). The limit for significant alteration was a z-ratio of ± 1, that is, a difference of more than one standard deviation between the z-values of the control and treatment data points. The human gene expression data represent 4055 metagenes. These were clustered into 16 clusters using a self-organizing map (SOM). The expression profiles of the metagenes with altered expression in both the human and animal datasets are coloured green. These 247 "green" genes were considered to be candidate genes for human neurodegenerative diseases. The genes with altered expression only in the animal experiment datasets, but not in the human datasets, are coloured red. The 225 "red" genes may suggest mechanisms in animal neurodegeneration models, but not in human Parkinson's disease. The separation of red and green genes into peripheral clusters indicates good clustering resolution. The lists of "red" and "green" metagenes were further analyzed for enriched human KEGG and GO terms based on the human gene identifiers corresponding to the assigned metagenes.
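The normalization and colouring rules in this caption can be sketched as follows. This is an illustrative sketch only, not the authors' analysis code. It assumes the z-ratio is the difference between treated and control z-values divided by the standard deviation of those differences across all genes, which is one common reading of the Cheadle et al. method. The sign convention (treated minus control here) is also an assumption; it does not affect the ± 1 threshold. The function names are hypothetical and the SOM clustering step is not shown.

```python
from statistics import mean, stdev


def z_transform(values):
    """Scale one condition's values to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]


def z_ratios(control, treated):
    """Assumed z-ratio: z-value difference divided by the SD of all differences."""
    z_control = z_transform(control)
    z_treated = z_transform(treated)
    diffs = [t - c for t, c in zip(z_treated, z_control)]
    sd = stdev(diffs)
    return [d / sd for d in diffs]


def altered(ratios, limit=1.0):
    """Flag genes whose z-ratio lies beyond +/- limit."""
    return [abs(r) > limit for r in ratios]


def colour(altered_in_human, altered_in_animal):
    """Apply the caption's colouring rule to one metagene."""
    if altered_in_human and altered_in_animal:
        return "green"   # altered in both human and animal datasets
    if altered_in_animal:
        return "red"     # altered only in the animal datasets
    return None          # not highlighted


# Invented example values, one entry per metagene.
control = [1.0, 2.0, 3.0, 4.0, 5.0]
treated = [1.0, 2.0, 3.0, 4.0, 50.0]
flags = altered(z_ratios(control, treated))
```

Applying `colour` to the per-metagene flags from the human and animal datasets gives the two gene lists that the caption describes as "green" and "red".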