Juan Jose Lozano, Marta Soler, Raquel Bermudo, David Abia, Pedro L Fernandez, Timothy M Thomson, Angel R Ortiz.
Abstract
BACKGROUND: We use an approach based on Factor Analysis to analyze datasets generated for transcriptional profiling. The method groups samples into biologically relevant categories and identifies the genes and pathways most significantly associated with each phenotypic group, while allowing a given gene to participate in more than one cluster. The genes assigned to each cluster are used to detect the pathways predominantly activated in that cluster, by finding GO terms that are significantly associated with those genes. We tested the approach on a published dataset of yeast microarray experiments and, after validating it there, applied the technique to a prostate cancer dataset.
Year: 2005 PMID: 16107210 PMCID: PMC1239914 DOI: 10.1186/1471-2164-6-109
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
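The abstract describes finding GO terms significantly associated with the genes of each cluster, and the footnotes of the yeast results table state that these P-values follow the hypergeometric distribution. The following is a minimal sketch of a one-sided (over-representation) hypergeometric test, not the authors' code: the function name `go_term_pvalue` is hypothetical, and the total number of genes on the chip (N) and the cluster size (n) are assumed inputs, since the tables report only the term size and the overlap.

```python
from math import comb


def go_term_pvalue(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail hypergeometric P-value, P(X >= k). Hypothetical helper.

    N: genes on the chip (population size, assumed known)
    K: genes annotated with the GO term
    n: genes assigned to the cluster
    k: annotated genes observed in the cluster
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of every overlap at least as large as k.
    # math.comb returns 0 when asked to choose more items than exist.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / total


# Toy counts (hypothetical): 10 genes, 4 annotated, cluster of 3, overlap of 2.
print(go_term_pvalue(10, 4, 3, 2))
```

For this toy case the favorable draws are C(4,2)·C(6,1) + C(4,3)·C(6,0) = 40 out of C(10,3) = 120, so the call returns about 0.333.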
Figure 1. Graphical overview of Q-mode Factor Analysis (FA) [9]. Each sample $i$ is described by a vector $\mathbf{x}_i$ containing the expression levels of all genes on the chip. The complete expression data for all samples are contained in the matrix $X = \{\mathbf{x}_i\}$. The expression levels of each sample are assumed to be generated by a linear combination of a small number of underlying transcriptional programs, the latent (non-observable) variables, contained in the set of vectors $\{\mathbf{f}_j\}$ that form the matrix $F$. The relative contribution of each program is given by the thickness of the arrows connecting factors and samples, stored in the variables $l_{ij}$, which together form the loading matrix $L$. Each $l_{ij}$ can be understood as the correlation coefficient between the expression levels of sample $i$ and latent variable $j$. Residuals are kept in the vectors $\{\boldsymbol{\varepsilon}_i\}$, which form the matrix $E$. Note that small loadings connecting a given sample (i.e., $\mathbf{x}_i$) with the factor model imply large residuals.
Results of the analysis of the yeast dataset [13]. The clusters found by FADA are shown together with the significant GO terms associated with them and the samples belonging to each cluster. The first column shows the cluster number; the second shows the conditions associated with that cluster; columns 3 to 5 show the Z-scores of the GO terms associated with the cluster (see Methods) at the Cellular Component (CC), Biological Process (BP) and Molecular Function (MF) levels; columns 6 to 8 show the corresponding GO terms.
| Conditions | Z(CC) | Z(BP) | Z(MF) | GO terms (CC) | GO terms (BP) | GO terms (MF) |
|---|---|---|---|---|---|---|
| Heat_Shock_05_minutes_hs.1, Heat_Shock_10_minutes_hs.1 | 3.25 | 5.39 | 1.16 | nucleolus (325/88; 0.81E-006) | ribosome biogenesis and assembly (271/75; 0.57E-007) | |
| constant_0.32_mM_H2O2_.10_min._redo | 0.39 | 9.71 | 11.40 | | protein catabolism (114/28; 0.74E-017); cell homeostasis (54/8; 0.10E-003) | |
| 2.5 mM_DTT_045_min_dtt.1, 2.5 mM_DTT_060_min_dtt.1 | 6.55 | 2.53 | 1.86 | endoplasmic reticulum (353/27; 0.11E-008) | | |
| constant_0.32_mM_H2O2_.80_min._redo | 1.25 | 0.63 | 1.58 | | | |
| 37_deg_growth_ct.1 | NA | NA | NA | | | |
| Nitrogen_Depletion_8_h, Nitrogen_Depletion_12_h | 7.11 | 5.30 | 0.61 | plasma membrane (197/16; 0.86E-005) | | |
| diauxic_shift_timecourse_18.5_h | 9.91 | #### | 5.51 | ribosome (368/126; 0.20E-010) | protein biosynthesis (493/168; 0.12E-012) | structural molecule activity (359/119; 0.13E-006) |
| aa_starv_0.5_h, aa_starv_1_h, aa_starv_2_h, aa_starv_4_h | 3.60 | #### | 4.66 | peroxisome (52/6; 0.14E-003) | | transporter activity (343/27; 0.12E-004) |
| 33C_vs._30C_._90_minutes, dtt_480_min_dtt.2 | 0.12 | -1.42 | -0.24 | | | |
| dtt_060_min_dtt.2, YP_galactose_vs_reference_pool_car.2 | 1.43 | 0.12 | -1.53 | | | |
| Diauxic_Shift_Timecourse_._0_h | 2.86 | 1.16 | 0.44 | vacuole (140/6; 0.63E-003) | | |
| YPD_stationary_phase_2_h_ypd.1 | -0.39 | 3.11 | 2.06 | | electron transport (14/1; 0.91E-003) | |
| diauxic_shift_timecourse_13.5_h | -0.51 | -0.27 | -0.20 | | | |
| 1M_sorbitol_._60_min, 1M_sorbitol_._90_min | 1.21 | 2.46 | 1.33 | | cell cycle (115/4; 0.11E-003) | |
| YPD_2_h_ypd.2, YPD_4_h_ypd.2, YAP1_overexpression | -0.34 | 0.20 | 0.56 | | | |
| 1_mM_Menadione_.10_min.redo | 4.71 | 6.25 | 2.67 | mitochondrion (732/88; 0.49E-003) | vesicle-mediated transport (190/31; 0.84E-004) | |
| Heat_Shock_000_minutes_hs.2 | 5.33 | #### | 5.49 | ribosome (368/145; 0.26E-007); nucleolus (325/138; 0.15E-009) | ribosome biogenesis and assembly (271/121; 0.87E-011) | structural molecule activity (359/123; 0.11E-003) |
| YP_fructose_vs_reference_pool_car.2 | 2.12 | 0.31 | 0.32 | mitochondrial membrane (136/7; 0.59E-004) | | |
| Hypo.osmotic_shock_._15_min | 3.08 | 5.06 | 3.67 | bud (59/6; 0.40E-003) | ribosome biogenesis and assembly (271/24; 0.12E-007) | |
| Heat_Shock_060_minutes_hs.2 | NA | NA | NA | | | |
| 17_deg_growth_ct.1, 21_deg_growth_ct.1, 25_deg_growth_ct.1 | 1.67 | 1.54 | -1.34 | | | |
| steady_state_15_dec_C_ct.2, steady_state_17_dec_C_ct.2 | 0.51 | -1.14 | 0.48 | | | |
| 2.5mM_DTT_005_min_dtt.1, 2.5mM_DTT_015_min_dtt.1 | 0.03 | 1.28 | 0.32 | | | |
| galactose_vs._reference_pool_car.1 | 8.69 | 5.93 | 6.57 | cell cortex (39/7; 0.97E-003) | cytokinesis (52/10; 0.25E-003) | helicase activity (71/13; 0.21E-004) |
| 29C_to_33C_._30_minutes | 0.39 | 2.96 | 3.27 | | DNA metabolism (221/8; 0.78E-003) | DNA binding (146/7; 0.21E-003) |
| Heat_Shock_005_minutes_hs.2 | NA | NA | NA | | | |
(a) GO terms discussed in the text are shown in bold. Each GO term is followed by the number of genes annotated with that term, the number of those genes found in the cluster, and the corresponding P-value according to the hypergeometric distribution.
(b) NA: data not available; no significant genes could be found at the q-value cutoff.
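Footnote (b) applies a q-value cutoff to decide which genes are significant. This section does not say how the q-values were computed, so the sketch below substitutes the Benjamini-Hochberg adjusted P-value, a common false-discovery-rate estimate that is related to, but not identical with, Storey's q-value. The function name `bh_adjust`, the input P-values and the 0.05 cutoff are all hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order.

    Hypothetical helper used here as a stand-in for the q-values in the text.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.001, 0.008, 0.039, 0.041, 0.20]  # hypothetical gene-level P-values
q = bh_adjust(pvals)
significant = [i for i, v in enumerate(q) if v <= 0.05]  # assumed cutoff
print(q, significant)
```

Here the adjusted values are about 0.005, 0.02, 0.05125, 0.05125 and 0.2, so only the first two genes pass the cutoff even though four raw P-values are below 0.05.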
Figure 2. Dendrogram for the Welsh dataset [14]. The dashed line indicates the threshold used to define the clusters.
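The dashed line in Figure 2 cuts the dendrogram at a fixed height to obtain flat clusters. As a minimal sketch, assuming single linkage and Euclidean distance (neither is specified in this section), a cut at height h is equivalent to taking the connected components of the graph that links every pair of samples at distance h or less, which a union-find structure computes directly. The function `cut_single_linkage` and the toy 2-D points are hypothetical stand-ins for the samples' profiles.

```python
from math import dist  # Python 3.8+


def cut_single_linkage(points, h):
    """Flat clusters from a single-linkage dendrogram cut at height h (hypothetical helper)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Any pair closer than the cut height ends up in the same cluster.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= h:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive cluster ids in order of first appearance.
    labels, ids = [], {}
    for i in range(len(points)):
        labels.append(ids.setdefault(find(i), len(ids)))
    return labels


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]  # toy samples
labels = cut_single_linkage(points, h=1.5)
print(labels)
```

With `h = 1.5` the two nearby pairs merge and the isolated point stays alone, giving `[0, 0, 1, 1, 2]`; lowering the cut splits every sample into its own cluster, and raising it far enough merges them all.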
Results of the analysis of the Welsh dataset for up-regulated genes. The sample clusters found by FADA are shown together with the significant GO and GenMAPP terms associated with them. The first column shows the cluster number; the second shows the samples associated with that cluster; columns 3 and 4 show the Z-scores of the GenMAPP and GO terms associated with the cluster (see Methods); columns 5 to 8 show the corresponding GenMAPP and GO terms selected.
| Samples | Z(GM) | Z(GO) | GenMAPP term | GO term (MF) | GO term (BP) | GO term (CC) |
|---|---|---|---|---|---|---|
| LNCaP_A, LNCaP_B, LNCaP_+_DHT | 7.22 | 5.09 | RNA_transcription_React. (2.40e-03) | oxidoreductase activity (1.58e-04) | | intracellular organelle (5.48e-04) |
| CAF_1598, BPHF_1598 | 3.83 | 4.66 | Hypertrophy_model (8.97e-03) | enzyme inhibitor activity (4.96e-04) | regulation of cellular process (4.31e-03) | signalosome complex (7.27e-03) |
| B_CELLS_A, B_CELLS_B, B_CELLS_C, MOLT4, HL60 | 4.27 | 0.42 | mRNA_processing_React. (2.10e-03) | | | |
| T4, T7, T3, T5, T1, T27, T10, T9, T13A, T13B, T22, T12, T29, T8, T31, T30, T26, T19, T16, T23, T6, T24, T21, T11, T17 | 2.06 | 1.93 | Fatty_Acid_Degradation (8.48e-03) | steroid binding (7.57e-03) | | |
| N2, N1, N5, N3, N9, N8, N7, N10, N4 | 6.86 | 2.52 | Smooth_muscle_contraction (6.55e-03) | channel or pore class transporter activity (1.75e-03) | | |
(a) The GO and GenMAPP terms are discussed in the text. Each term is followed by its P-value according to the hypergeometric distribution (see Methods).
(b) The Z(GO) column shows the Z-scores corresponding to Molecular Function (MF), Biological Process (BP) and Cellular Component (CC), respectively, obtained for each cluster. The Z(GM) column shows the Z-score corresponding to the GenMAPP terms.
Results of the analysis of the Welsh dataset for down-regulated genes. The sample clusters found by FADA are shown together with the significant GO and GenMAPP terms associated with them. The first column shows the cluster number; the second shows the samples associated with that cluster; columns 3 and 4 show the Z-scores of the GenMAPP and GO terms associated with the cluster (see Methods); columns 5 to 8 show the corresponding GenMAPP and GO terms selected.
| Samples | Z(GM) | Z(GO) | GenMAPP term | GO term (MF) | GO term (BP) | GO term (CC) |
|---|---|---|---|---|---|---|
| LNCaP_A, LNCaP_B, LNCaP_+_DHT | 0.36 | -0.10 | | | | extracellular space (3.74e-04) |
| CAF_1598, BPHF_1598, CAF_1303, CAF_1852, PrSC_A, PrSC_B, CAF_2585, Du145, PC3, HUVEC_A, HUVEC_B, hPr1, PrEC | 2.33 | 2.43 | Hs_GPCRDB_Other (1.96e-03) | structural constituent of ribosome (2.57e-05) | | |
| B_CELLS_A, B_CELLS_B, B_CELLS_C, MOLT4, HL60 | 0.91 | 0.85 | | | | |
| T4, T7, T3, T5, T1, T27, T10, T9, T13A, T13B, T22, T12, T29, T8, T31, T30, T26, T19, T16, T23, T6, T24, T21, T11, T17 | 4.30 | 1.04 | G1_to_S_cell_cycle_React (7.76e-04) | | | signalosome complex (5.23e-04) |
| N2, N1, N5, N3, N9, N8, N7, N10, N4 | -0.21 | -0.65 | | | metabolism (3.90e-04) | |
(a) The GO and GenMAPP terms are discussed in the text. Each term is followed by its P-value according to the hypergeometric distribution (see Methods).
(b) The Z(GO) column shows the Z-scores corresponding to Molecular Function (MF), Biological Process (BP), and Cellular Component (CC), respectively, obtained for each cluster. The Z(GM) column refers to the Z-score corresponding to the GenMAPP terms.
Figure 3. Expression levels of the 20 most relevant genes selected in each cluster for the Welsh dataset. Gene descriptions can be found in Table 2 of the Supporting Information. (A) Up-regulated genes (Figure 3); (B) down-regulated genes (Figure 4).
Figure 4. Down-regulated genes (panel B of Figure 3).
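Figure 3 shows the 20 most relevant genes per cluster. The scoring rule behind "most relevant" is not given in this section, so the sketch below simply assumes one numeric score per gene and cluster (the hypothetical `scores` mapping) and keeps the top k of each cluster with `heapq.nlargest`. Because each cluster is ranked independently, the same gene can appear in several lists, which matches the method's allowance for a gene to participate in more than one cluster.

```python
import heapq


def top_genes(scores, k=20):
    """Return, for each cluster, the k genes with the highest score (hypothetical helper).

    scores: {cluster_id: {gene: score}}; the scoring rule itself is an assumption.
    """
    return {
        cluster: [gene for gene, _ in heapq.nlargest(k, genes.items(), key=lambda kv: kv[1])]
        for cluster, genes in scores.items()
    }


# Toy scores with placeholder gene names; geneA and geneB rank high in both clusters.
scores = {
    "cluster_1": {"geneA": 4.1, "geneB": 3.2, "geneC": 2.7, "geneD": 0.1},
    "cluster_2": {"geneB": 5.0, "geneA": 2.2, "geneD": 0.3},
}
print(top_genes(scores, k=2))
```

With `k=2` this prints `{'cluster_1': ['geneA', 'geneB'], 'cluster_2': ['geneB', 'geneA']}`; with the default `k=20` every gene is kept, sorted by descending score.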
Figure 5. Validation of genes selected by FADA from the Welsh et al. dataset [14] as overexpressed in prostate cancer. (A) RT-PCR was applied to 14 paired tumor and normal prostate samples to determine the expression levels of a selection of genes shown by FADA as significantly overrepresented in prostate cancer (HPN, KLK3, IQGAP2, POR1 and HER3), and of additional genes relevant to this tumor (the receptor tyrosine kinase genes EGFR, HER2 and HER4, and the steroid hormone receptor genes AR, ERα and ERβ). The expression values for each gene, first normalized to the S14r expression level in each sample, are shown as ratios of the normalized values in prostate cancer to those in the matching normal prostate tissue. Desmin expression levels were quantified to assess the contribution of stromal components to the samples analyzed. Values of 100-fold or above are shown as 100. (B) Heatmap representation of the same data (color scale as shown below). (C) Real-time PCR analysis of HER3 transcript levels in laser-microdissected tumor and normal samples, compared with relative transcript levels in enriched (non-microdissected) tissues from the same cases. (D) Immunohistochemical analysis of HER3 on paraffin-embedded prostate tissue sections arranged in tissue microarrays (see Methods). Left, low-magnification image (×100) of one case, with weak HER3 staining in normal glands (n) and strong staining in tumor epithelial cells (t). Right, higher magnification (×400) of a second case.
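Panel A reports, for each gene, the tumor-to-normal ratio of expression values after normalization to S14r within each sample, with ratios of 100-fold or more displayed as 100. A minimal sketch of that arithmetic follows; the function name and the input values are hypothetical, and the inputs are assumed to be linear-scale expression levels rather than raw Ct values.

```python
def tumor_normal_ratio(gene_t, ref_t, gene_n, ref_n, cap=100.0):
    """Ratio of S14r-normalized expression, tumor vs. matched normal (hypothetical helper).

    gene_*: expression of the gene of interest; ref_*: S14r expression in the
    same sample (t = tumor, n = normal). Linear-scale inputs are assumed.
    """
    normalized_tumor = gene_t / ref_t    # correct for input amount in the tumor sample
    normalized_normal = gene_n / ref_n   # same correction for the matched normal sample
    ratio = normalized_tumor / normalized_normal
    return min(ratio, cap)  # ratios at or above the cap are displayed as the cap


print(tumor_normal_ratio(gene_t=12.0, ref_t=2.0, gene_n=3.0, ref_n=1.5))  # 3.0
```

Normalizing within each sample before taking the ratio cancels differences in RNA input between the tumor and normal specimens; the cap affects only the displayed value, and ratios below 1 indicate lower expression in the tumor.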