Aedín C Culhane, Guy Perrière, Desmond G Higgins.
Abstract
BACKGROUND: Rapid development of DNA microarray technology has led different laboratories to adopt numerous protocols and technological platforms, which has severely reduced the comparability of array data. Current cross-platform comparisons of microarray gene expression data are usually based on cross-referencing the annotation of each gene transcript represented on the arrays, extracting a list of genes common to all arrays and comparing expression data of this gene subset. Unfortunately, filtering genes to a subset represented across all arrays often excludes many thousands of genes, because different subsets of genes from the genome are represented on different arrays. We describe the application of a powerful yet simple method for cross-platform comparison of gene expression data. Co-inertia analysis (CIA) is a multivariate method that identifies trends or co-relationships in multiple datasets which contain the same samples. CIA simultaneously finds ordinations (dimension reduction diagrams) from the datasets that are most similar. It does this by finding successive axes from the two datasets with maximum covariance. CIA can be applied to datasets where the number of variables (genes) far exceeds the number of samples (arrays), as is the case with microarray analyses.
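The idea of "axes with maximum covariance" can be illustrated with a minimal sketch. It assumes plain column-centring of each dataset, not the correspondence-analysis weighting used in the study, and finds only the first pair of axes by power iteration on the gene-by-gene cross-covariance matrix. The function names and the toy numbers are hypothetical and chosen purely for illustration.

```python
import math

def centre_columns(matrix):
    """Subtract each column (gene) mean; rows are samples (arrays)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]

def normalise(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def first_coinertia_axes(x, y, iterations=100):
    """Return (u, v, covariance): unit gene-weight vectors for x and y whose
    sample scores have maximum covariance (the leading singular pair)."""
    xc, yc = centre_columns(x), centre_columns(y)
    n = len(xc)
    x_genes, y_genes = list(zip(*xc)), list(zip(*yc))
    # Cross-covariance between every gene of x and every gene of y.
    c = [[sum(a * b for a, b in zip(gx, gy)) / n for gy in y_genes]
         for gx in x_genes]
    # Arbitrary non-uniform starting vector (an assumption of this sketch).
    v = normalise([float(j + 1) for j in range(len(y_genes))])
    for _ in range(iterations):
        # u = C v, then v = C^T u, renormalising each time.
        u = normalise([sum(cij * vj for cij, vj in zip(row, v)) for row in c])
        v = normalise([sum(c[i][j] * u[i] for i in range(len(u)))
                       for j in range(len(v))])
    covariance = sum(u[i] * c[i][j] * v[j]
                     for i in range(len(u)) for j in range(len(v)))
    return u, v, covariance

# Toy data: the same 4 samples measured on two "platforms" with
# different numbers of genes (3 and 2). No gene matching is needed.
platform_a = [[1, 0, 2], [2, 1, 2], [3, 0, 2], [4, 1, 2]]
platform_b = [[2, 5], [4, 5], [6, 4], [8, 4]]
u, v, cov = first_coinertia_axes(platform_a, platform_b)
```

Because only the samples have to match, the two datasets may contain entirely different gene sets, which is the property the abstract highlights. Further axes would be obtained from the subsequent singular pairs of the same matrix.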
Year: 2003 PMID: 14633289 PMCID: PMC317282 DOI: 10.1186/1471-2105-4-59
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Analysis of very similar and unrelated gene expression datasets using CIA. The first two axes of control CIA studies of very similar (A) and unrelated (B) profiles of Ross spotted cDNA gene expression data of the NCI60 panel of cell lines are shown. The figure shows results from CIA of A) two random gene subsets of the 1375 gene dataset and B) two unrelated datasets composed of 1375 genes, where the 60 cell line dataset was duplicated and the arrays in one dataset were randomly permuted. Circles and arrows represent the projected co-ordinates of each dataset, and these are joined by a line, where the length of the line is proportional to the divergence between the datasets. The colours represent the eight NCI60 cell line classes as defined by Blower et al. [21].
Figure 2. Cross-platform comparison of Affymetrix and spotted cDNA expression profiles using CIA. The first two axes of a CIA of gene expression profiles of the complete gene set from the Ross spotted cDNA array dataset (closed circles) and 1517 genes from the Staunton Affymetrix dataset (arrows) are shown. Circles and arrows represent the projected co-ordinates of each dataset, and these are joined by a line, where the length of the line is proportional to the divergence between the different gene expression profiles. The cell lines are coloured as in Figure 1. The cell lines are derived from breast (BR), melanoma (ME), colon (CO), ovarian (OV), renal (RE), lung (LC), central nervous system (CNS, glioblastoma) and prostate (PR) cancers and leukaemia (LE). Colon and leukaemia cells were separated from those with mesenchymal or stromal features (glioblastoma and renal tumour cell lines) on the first axis (F1, horizontal), and melanoma cell lines were distinguished from the other cell lines on the second axis (F2, vertical). A histogram of the main factors which explain the total variability of this CIA is superimposed on the top right corner. The first three axes represented 42%, 21% and 8% of the inertia.
Figure 4. Detecting genes defining major trends identified using CIA. The central panel (B) is the CIA from Figure 2. The co-ordinates of the genes in each ordination are shown in the side panels A) Ross cDNA and C) Staunton Affymetrix. The top ten genes at the end of axes F1 and F2 are labelled, where red gene labels indicate genes that were present in both datasets. Gene labels in bold indicate genes that were replicated on the microarray. Genes labelled in blue were not among the top ten genes but were in the top thirty genes at the end of each axis and are of biological interest.
Results of CIA of different subsets of gene expression datasets
| cDNA genes* | Affymetrix genes** | Probes matched± | RV coefficient | Accumulated inertia F1 (%) | Accumulated inertia F1+F2 (%) | Correlation F1 | Correlation F2 |
|---|---|---|---|---|---|---|---|
| 5643 | 3144 | 1416 | 0.85 | 40 | 61 | 0.96 | 0.97 |
| | 2455 | 1169 | 0.86 | 40 | 61 | 0.96 | 0.97 |
| | 1517 | 776 | 0.88 | 42 | 63 | 0.96 | 0.98 |
| 3748 | 3144 | 786 | 0.86 | 30 | 49 | 0.96 | 0.97 |
| | 2455 | 625 | 0.87 | 31 | 50 | 0.97 | 0.97 |
| | 1517 | 388 | 0.86 | 32 | 51 | 0.97 | 0.97 |
| 1415 | 3144 | - | 0.83 | 38 | 62 | 0.95 | 0.96 |
| | 2455 | - | 0.85 | 38 | 62 | 0.95 | 0.97 |
| | 1517 | - | 0.86 | 40 | 64 | 0.95 | 0.97 |
| 1375 | 3144 | 433 | 0.83 | 38 | 62 | 0.95 | 0.96 |
| | 2455 | 370 | 0.84 | 37 | 62 | 0.95 | 0.97 |
| | 1517 | 269 | 0.86 | 40 | 64 | 0.95 | 0.97 |
Gene expression data subsets from *spotted cDNA [16] and **Affymetrix [15] datasets were subjected to CIA, where correspondence analysis (COA) was performed on the Affymetrix dataset and row-weighted COA on the spotted cDNA array dataset. Results of the co-inertia analysis show the RV coefficient, accumulated inertia (% of total sum of eigenvalues of the co-inertia analysis), and correlation between the coordinates on the first pair (F1) and second pair (F2) of axes. ± Probes (sequence spots on each array) were matched using MatchMiner [22]. The 1415 cDNA subset contained the 1375 cDNA gene set and 40 extra genes for which no image identifier was given; MatchMiner counts for these 40 extra genes therefore could not be determined, but the number of probes matched should be similar to that for the 1375 cDNA gene set.
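For readers unfamiliar with the RV coefficient reported in the table, the following is a minimal sketch of its standard definition, RV = tr(S_X S_Y) / sqrt(tr(S_X S_X) tr(S_Y S_Y)), where S_X and S_Y are the sample-by-sample cross-product matrices of the two datasets. It assumes simple column-centring with uniform weights, which differs from the COA-based weighting described above, so it would not reproduce the tabulated values. The function names are hypothetical.

```python
import math

def centre_columns(matrix):
    """Subtract each column (gene) mean; rows are samples (arrays)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]

def sample_cross_products(matrix):
    """Return the n x n matrix M M^T of inner products between samples."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in matrix]
            for r1 in matrix]

def trace_of_product(a, b):
    """Return tr(A B) for two square matrices of equal size."""
    size = len(a)
    return sum(a[i][j] * b[j][i] for i in range(size) for j in range(size))

def rv_coefficient(x, y):
    """RV coefficient between two datasets that share the same samples (rows).
    Ranges from 0 (no shared structure) to 1 (identical sample configuration)."""
    sx = sample_cross_products(centre_columns(x))
    sy = sample_cross_products(centre_columns(y))
    return trace_of_product(sx, sy) / math.sqrt(
        trace_of_product(sx, sx) * trace_of_product(sy, sy))

# Toy data: 4 shared samples, 3 genes on one platform and 2 on the other.
platform_a = [[1, 0, 2], [2, 1, 2], [3, 0, 2], [4, 1, 2]]
platform_b = [[2, 5], [4, 5], [6, 4], [8, 4]]
rv = rv_coefficient(platform_a, platform_b)
```

The coefficient depends only on how the samples are arranged relative to one another in each dataset, so it can be computed even when the two datasets have no genes in common.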
Figure 3. Hierarchical clustering of Affymetrix and spotted cDNA expression profiles of 60 cell lines. Dendrograms showing average linkage hierarchical clustering of NCI60 human cancer cell lines using Spearman rank correlations. Cluster analyses of the 60 cell lines based on A) gene expression profiles of 1415 genes from the Ross spotted cDNA array dataset and B) 1517 genes from the Staunton Affymetrix dataset are shown. The cell lines are coloured as in Figure 1. The colon tumour cell line HT29 and a cluster of colon tumour cell lines are highlighted by a green arrow and bar respectively.
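The clustering procedure named in this caption can be sketched as follows. The sketch assumes that the distance between two cell lines is 1 minus their Spearman rank correlation, which the caption does not state explicitly, and it uses invented toy profiles with hypothetical sample names in place of the NCI60 data.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def average_linkage(profiles):
    """Agglomerative clustering with average linkage.
    profiles maps sample name -> expression values.
    Returns the merges in order as (cluster_a, cluster_b, distance)."""
    names = list(profiles)
    dist = {(a, b): 1 - spearman(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Mean of all pairwise distances between the two clusters.
                d = sum(dist[a, b] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Invented toy profiles (5 genes each); names are hypothetical.
toy = {
    "colon_1": [1, 2, 3, 4, 5],
    "colon_2": [2, 3, 5, 8, 13],
    "melanoma_1": [5, 4, 3, 2, 1],
    "melanoma_2": [9, 7, 6, 3, 2],
}
merge_order = average_linkage(toy)
```

Each platform is clustered on its own in this approach, so agreement between the two dendrograms has to be judged by eye, whereas CIA places both datasets in a single ordination.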
Selection of genes identified using CIA
| F1 (mesenchymal) | All CNS, Renal cells and the breast cancer cell line BR-Hs578T | Collagen marker | - | + |
| | | Collagen marker | + | - |
| | | Collagen marker | - | + |
| | | Muscle marker | - | + |
| | | Vimentin | - | + |
| | | Fibronectin 1 | + | + |
| | | Inducer of EMT | - | + |
| | | N-cadherin | + | - |
| | | Metallothionein A2, associated with invasive breast cancer | - | + |
| F1 (epithelial) | All colon cells and the breast cancer cells MCF-7 and T47D | E-cadherin, primary epithelial marker | + | - |
| | | Serine protease inhibitor, Kunitz type, 2; an inhibitor of hepatocyte growth factor | + | + |
| | | Keratin 8, epithelial marker | + | + |
| | | Keratin 18, epithelial marker | - | + |
| | | Keratin 19, epithelial marker | - | + |
| | | Desmoplakin I, epithelial marker | + | - |
| | | Loss of S100A2 is an early event in melanoma development | - | + |
| F1 (colon cell markers) | | Ep-Cam. Target antigen in colorectal carcinoma | + | + |
| | | Target antigen in colorectal carcinoma | + | - |
| F1 (Leukaemia) | All leukaemia cell lines | A lymphoid-specific guanosine diphosphate dissociation inhibitor | + | + |
| | | Lymphocyte cytosolic protein 1, L-plastin | + | - |
| | | An interferon-induced transmembrane protein | - | + |
| F2 (Melanoma) | All melanoma cells and the breast cancer cells BR_MDA and BR_MDAMB435 | Microphthalmia-associated transcription factor | - | + |
| | | Tyrosinase | - | + |
| | | Dopachrome tautomerase | - | + |
| | | Tyrosinase-related protein 1 | - | + |
| | | Ras-associated protein 7 | - | + |
| | | Melanoma inhibitory activity | - | + |
| | | MUC18, melanoma cell adhesion molecule MCAM | + | - |
| | | Melanoma-associated antigen 3 | - | + |
| | | Melanoma-associated antigen 12 | - | + |
| | | Glycomembrane protein nmb | - | + |
| | | Tissue inhibitor of metalloproteinase 2 | - | + |
| | | Tissue inhibitor of metalloproteinase 3 | + | - |
Genes identified on the first (F1) and second (F2) axes, where + or - indicates whether a gene was detected or not detected within the top 30 genes at the ends of each of these axes in CIA of Affymetrix and spotted cDNA array gene expression profiles of the NCI60 cell lines. These genes are presented graphically in Figure 4, and further details on them are available in the Results section. *Official gene symbol names are used for each gene.