Junbai Wang, Jan Delabie, Hans Aasheim, Erlend Smeland, Ola Myklebost.
Abstract
BACKGROUND: A method for evaluating and analyzing the massive data generated by series of microarray experiments is of utmost importance for revealing hidden patterns of gene expression. Because microarray gene expression profiles are complex and high-dimensional, reducing the dimensionality of the raw expression data and selecting the features needed for tasks such as classification of disease samples remain a challenge. To address this problem we propose a two-level analysis. First, a self-organizing map (SOM) is used. The SOM is a vector quantization method that simplifies and reduces the dimensionality of the original measurements and visualizes each individual tumor sample in a SOM component plane. Next, hierarchical clustering and K-means clustering are used to identify patterns of gene expression useful for classification of samples.
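The two-level analysis described above can be sketched roughly as follows. This is a minimal, standard-library illustration and not the authors' implementation: the toy data, the 4 × 3 map size, the learning-rate and neighbourhood schedules, and all function names (`train_som`, `kmeans`, `component_plane`) are assumptions made for the example. Genes are treated as input vectors with one value per sample, K-means is run on the SOM node vectors, and each gene inherits the cluster of its best-matching node.

```python
import math
import random

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def best_matching_unit(codebook, x):
    """Index of the map node whose weight vector is closest to x."""
    return min(range(len(codebook)), key=lambda n: sqdist(codebook[n], x))

def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Level 1: train a small rectangular SOM (online updates, Gaussian neighbourhood)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise every node with a randomly chosen input vector (an assumption).
    codebook = [list(rng.choice(data)) for _ in range(rows * cols)]
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            br, bc = divmod(best_matching_unit(codebook, x), cols)
            for node, w in enumerate(codebook):
                r, c = divmod(node, cols)
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return codebook

def kmeans(points, k, iters=50, seed=0):
    """Level 2: plain K-means on the SOM node vectors; returns one label per node."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(centers[c], p)) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def component_plane(codebook, rows, cols, sample):
    """Node weights for one sample, laid out on the map grid."""
    return [[codebook[r * cols + c][sample] for c in range(cols)] for r in range(rows)]

# Toy data (hypothetical): 60 genes x 4 samples with two opposite expression patterns.
rng = random.Random(1)
genes = [[rng.gauss(2.0, 0.3) if (g < 30) == (s < 2) else rng.gauss(-2.0, 0.3)
          for s in range(4)] for g in range(60)]

rows, cols = 4, 3
codebook = train_som(genes, rows, cols)
node_cluster = kmeans(codebook, k=2)
gene_cluster = [node_cluster[best_matching_unit(codebook, g)] for g in genes]
plane0 = component_plane(codebook, rows, cols, sample=0)
```

Clustering the node vectors rather than the raw genes is what makes the second level cheap: K-means sees only `rows * cols` points, however many genes were measured.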
Year: 2002 PMID: 12445336 PMCID: PMC138792 DOI: 10.1186/1471-2105-3-36
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Classification of samples by SOM analysis and K-means clustering. a) SOM component planes for 42 DLBCL samples and three DLBCL cell lines (OCILy3, OCILy10 and OCILy1). The SOM map size is 22 × 14, and the color scale of each SOM component plane represents the mean ratio in each map node: red indicates high expression and blue indicates low expression. See supplementary information for full data. b) K-means clustering of the SOM, with mean SOM component planes for DLBCL, FL and CLL. The cluster numbers are given, and the genes contained within each SOM node and K-means cluster are listed in the web supplement [13]. Selected genes from clusters 10 and 11 and from clusters 1, 7 and 9 are listed in Table 1.
Table 1. Selected genes grouped into clusters 1, 7, 9, 10 and 11 of the K-means clustering of the SOM. The full list can be found in the web supplement [13].
| Cluster No. | Clone ID | Gene Description |
|---|---|---|
| Cluster 1 | 100 | Ki67 (long type) |
| | 1287099 | Survivin = apoptosis inhibitor = effector cell protease EPR-1 |
| | 108294, 1287528 | XRCC9 = DNA repair protein |
| | 950690, 824709 | Cyclin A |
| | 563130, 824060 | Cyclin B1 |
| | 1288839, 325880 | Tubulin-beta |
| | 1240822, 588637 | Actin = cytoskeletal gamma-actin |
| | 683084 | Cyclin E2 |
| | 1356512 | Similar to MCM2 = DNA replication licensing factor |
| | 703757 | MPP1 = Putative M phase phosphoprotein 1 |
| | 1240595 | Tubulin-alpha |
| | 1341540, 781047 | BUB1 = putative mitotic checkpoint protein ser/thr kinase |
| Cluster 7 | 789182 | PCNA = proliferating cell nuclear antigen |
| | 1288183, 235938 | BAK = BCL-2 family member |
| | 80592 | Syndecan-1 |
| | 469256, 1322301 | Bag-1 = Bcl-2 interacting anti-apoptotic protein = RAP46 = Glucocorticoid receptor-associated protein |
| | 525540 | BCL-3 |
| | 1338456, 364941 | C-myc binding protein |
| | 784012 | 40S ribosomal protein S21 |
| | 324144 | Ribosomal protein S29 |
| | 1087015, 1240788 | Ribosomal protein S9 |
| | 510395 | Ribosomal protein S16 |
| | 272185 | Ribosomal protein L27 |
| | 1335421 | Similar to ribosomal protein L37a |
| | 1368302 | Ribosomal protein L32 |
| Cluster 9 | 46778 | BCL-XL |
| | 814478, 1353675 | A1 = Bfl-1 = GRs = Bcl-2 related protein |
| | 270770, 1272196 | IRF-4 = LSIRF = Mum1 = homologue of Pip = Lymphoid-specific interferon regulatory factor = Multiple myeloma oncogene 1 |
| | 1290353 | Similar to TREB and X box binding protein 1 |
| | 145093 | MCL1 = myeloid cell differentiation protein |
| Cluster 10 | 701606, 1286850, 200814 | CD10 = CALLA = Neprilysin = enkephalinase |
| | 1337241, 306139 | BCL-7A |
| | 1340526, 712395 | BCL-6 |
| | 824476, 95093, 1350545 | Spi-B transcription factor |
| | 1335782, 13194072, 1338245 | Oct-2 = lymphoid-specific octamer binding transcription factor = POU |
| | 278808 | Spi-1 = PU.1 = ets family transcription factor |
| | 50214 | CD86 = B7-2 = CD28 and CTLA-4 counter-receptor 2 |
| Cluster 11 | 753794 | BLC = BCA-1 = B lymphocyte chemoattractant BLC = CXC chemokine |
| | 1326652 | CD2 |
| | 245959 | SDF-1 = Stromal cell-derived factor 1 = chemokine |
| | 159946 | CD14 = monocyte differentiation antigen |
| | 1130062 | CD3E antigen, epsilon polypeptide |
| | 258802, 470615 | CD64 = high affinity immunoglobulin gamma FC receptor I A form precursor = FC-gamma |
| | 377560 | CD3 delta = T cell surface glycoprotein |
| | 505569 | T cell receptor beta chain |
| | 23435, 1306024 | CD11C = leukocyte adhesion protein p150,95 alpha subunit = integrin alpha-X |
| | 1219244, 57, 1071581 | RANTES = chemokine |
| | 472180 | S100 calcium binding protein A4 = Placental calcium binding protein = Calvasculin |
| | 701290 | C-C chemokine receptor 5 = CC CK5 |
| | 47509 | Major histocompatibility complex, class II, DN alpha |
Figure 2. Clinically distinct DLBCL subgroups defined by gene expression profiling. a) Kaplan-Meier plot of overall survival of DLBCL patients grouped on the basis of gene expression profiling in K-means cluster 10. b) Kaplan-Meier plot of overall survival of DLBCL patients grouped on the basis of gene expression profiling in K-means cluster 11. c) Kaplan-Meier plot of overall survival of DLBCL patients grouped on the basis of gene expression profiling in K-means cluster (1,7,9). d) Kaplan-Meier plot of overall survival of DLBCL patients grouped on the basis of gene expression profiling in K-means cluster 10 and cluster (1,7,9).
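For readers unfamiliar with how the curves in such a plot are computed, the following is a minimal sketch of the Kaplan-Meier product-limit estimator. The follow-up times, event flags, cluster scores and the median split used to form two patient groups are all made up for illustration; the record does not state how patients were actually divided.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve

# Hypothetical patients: follow-up (years), death flag, mean expression in one cluster.
times = [2, 3, 3, 5, 8, 1, 4, 6, 7, 9]
events = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
scores = [0.9, 1.2, 0.8, 1.5, 1.1, -0.7, -0.2, -1.0, -0.4, -0.9]

# Assumed grouping rule: split at the median cluster score.
cutoff = sorted(scores)[len(scores) // 2]
high = [i for i, s in enumerate(scores) if s >= cutoff]
low = [i for i, s in enumerate(scores) if s < cutoff]

km_high = kaplan_meier([times[i] for i in high], [events[i] for i in high])
km_low = kaplan_meier([times[i] for i in low], [events[i] for i in low])
```

Each group yields its own step function; comparing the two curves is what a panel of the figure shows. A significance test such as the log-rank test would be a separate step and is not sketched here.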
Figure 3. Selected genes from K-means clusters. Hierarchical clustering of 72 selected genes from K-means clusters 1, 7 and 9. Depicted are the measurements of gene expression from DLBCL, FL and CLL samples. The dendrogram is colour coded according to the category of sample studied (see upper right key). Each row represents a separate cDNA clone on the microarray and each column a separate mRNA sample. Each square represents the ratio of hybridisation of fluorescent cDNA probes prepared from the experimental mRNA sample to that of probes prepared from the reference mRNA sample. These ratios are a measure of relative gene expression: red indicates high expression, green indicates low expression and grey indicates missing or excluded data. See supplementary information for full data [13].
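The quantities in this caption can be illustrated with a short sketch: a log ratio of sample to reference signal, with `None` standing in for missing or excluded spots, followed by a naive average-linkage hierarchical clustering of the resulting profiles. The log2 transform, the distance measure (root mean squared difference over columns where both profiles have data) and the linkage rule are assumptions chosen for the example; the record does not specify which were used.

```python
import math

def log2_ratio(sample, reference):
    """log2(sample / reference); None marks a missing or excluded measurement."""
    if sample is None or reference is None or sample <= 0 or reference <= 0:
        return None
    return math.log2(sample / reference)

def profile_distance(a, b):
    """Root mean squared difference over columns where both profiles have data."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def average_linkage(profiles):
    """Naive agglomerative clustering.

    Returns the merges in order as (members_a, members_b, distance),
    where members are row indices into `profiles`.
    """
    clusters = [[i] for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(profile_distance(profiles[p], profiles[q])
                        for p in clusters[i] for q in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical log-ratio profiles: three clones (rows) across three samples (columns).
profiles = [
    [1.0, 1.1, -0.9],
    [0.9, 1.0, -1.0],
    [-1.0, None, 1.2],  # one missing spot, shown grey in a heat map
]
merges = average_linkage(profiles)
```

The order of the merges defines the dendrogram: here clones 0 and 1 join first because their profiles are nearly identical, and clone 2 joins last.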