Kamila Naxerova, Carol J Bult, Anne Peaston, Karen Fancher, Barbara B Knowles, Simon Kasif, Isaac S Kohane.
Abstract
BACKGROUND: In recent years, the molecular underpinnings of the long-observed resemblance between neoplastic and immature tissue have begun to emerge. Genome-wide transcriptional profiling has revealed similar gene expression signatures in several tumor types and early developmental stages of their tissue of origin. However, it remains unclear whether such a relationship is a universal feature of malignancy, whether heterogeneities exist in the developmental component of different tumor types and to which degree the resemblance between cancer and development is a tissue-specific phenomenon.
Year: 2008 PMID: 18611264 PMCID: PMC2530866 DOI: 10.1186/gb-2008-9-7-r108
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Approach to data analysis. A developmental timeline (DT), which is a linear number ray on which each of 5,166 genes has a definite position, is constructed from a time course of gene expression during development (top left panel), positioning genes that are expressed in early development on the left end, genes that are upregulated in late development on the right end and neutral genes in the middle. The DT is integrated with genes that are deregulated in a population of tumors versus corresponding normal tissues (top right panel). (a) Frequency plot showing a histogram-like representation of the frequency of upregulated (red) and downregulated (green) cancer genes in different portions of the DT. The height of each bar indicates how many deregulated genes map to one of 13 equally sized segments of the DT. Each segment corresponds to approximately 400 genes. Up- and downregulated genes are depicted on separate DTs, that is, the first red bar refers to the same DT segment as the first green bar. Stated differently, the height of the first red bar signifies the number of upregulated cancer genes that map to the first 400 developmental genes and the height of the first green bar signifies the number of downregulated cancer genes that map to the same set of 400 developmental genes. (b) Probability density plot showing P(DEV[1,2,3...i] | cancer) for i = 2,3...5,166 for upregulated and downregulated cancer genes. The probability of being among the first i genes on the DT (genes are numbered 1-5,166 from left/early to right/late) if deregulated in cancer directly reflects the preference of cancer genes for different segments of the DT. The shape of each probability distribution is summarized by two linear functions that are fitted to its early and late portions (blue lines). The slopes of these functions are subsequently used as a quantification of the developmental profile of a cancer.
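The two representations described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that deregulated genes are supplied as 1-based DT ranks, that the "early" and "late" portions are the first and last quarter of the DT (the hypothetical `fraction` parameter; the caption does not state the boundaries), and that DT positions are rescaled to the interval (0, 1] so that a uniform distribution yields a slope of 1. All function names are illustrative.

```python
from bisect import bisect_right


def segment_counts(positions, n_genes=5166, n_segments=13):
    """Count deregulated genes in each equally sized DT segment (panel a)."""
    counts = [0] * n_segments
    for p in positions:  # p is a 1-based rank on the DT
        idx = min((p - 1) * n_segments // n_genes, n_segments - 1)
        counts[idx] += 1
    return counts


def cumulative_probability(positions, n_genes=5166):
    """Return P(DEV[1..i] | cancer) for i = 1..n_genes (panel b)."""
    ranks = sorted(positions)
    total = len(ranks)
    return [bisect_right(ranks, i) / total for i in range(1, n_genes + 1)]


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def early_late_slopes(cdf, fraction=0.25):
    """Fit one line to the early and one to the late part of the curve.

    `fraction` is an assumption of this sketch, not a value from the paper.
    """
    n = len(cdf)
    k = max(2, int(n * fraction))
    xs = [i / n for i in range(1, n + 1)]  # rescaled DT position
    early = fit_slope(xs[:k], cdf[:k])
    late = fit_slope(xs[-k:], cdf[-k:])
    return early, late


if __name__ == "__main__":
    # Toy example: every deregulated gene sits in the early half of a 1,000-gene DT.
    toy = list(range(1, 501))
    print(segment_counts(toy, n_genes=1000, n_segments=4))
    print(early_late_slopes(cumulative_probability(toy, n_genes=1000)))
```

Under this normalization, a steep early slope combined with a flat late slope indicates that the deregulated genes concentrate among early developmental genes.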
Figure 2. Frequency plots and probability distributions for (a) lung adenocarcinoma, (b) Wilms' tumor, (c) glioblastoma, (d) clear cell ovarian cancer and (e) liver cirrhosis. These cases were selected because they are representative of most tumors in our database.
Figure 3. Heatmap of probability distribution slopes. Thirty-two expression data sets of neoplasia versus corresponding normal tissue (and liver cirrhosis versus normal liver, dysplastic liver versus normal liver and ulcerative colitis versus non-inflamed colon) are compared against all 10 DTs. Each comparison is characterized by a four-dimensional vector of slopes derived from the probability distributions (example in top left corner). Two slope values stem from the distribution of upregulated genes on the DT, two are derived from the distribution of downregulated genes (Figure 1). UpE = slope for upregulated genes in the early part of the DT; UpL = slope for upregulated genes in the late part of the DT; DownE = slope for downregulated genes in the early part of the DT; DownL = slope for downregulated genes in the late part of the DT. Red indicates a steep slope (high specificity of up- or downregulated genes for that segment of the DT), green indicates a flat slope (depletion of up- or downregulated genes in that segment).
Figure 4. Effects of CC subtraction. Frequency plots of selected cancer types on the backdrop of lung development (left panel) and ES cell differentiation (middle panel) are depicted before and after the dismissal of hundreds of CC regulated genes. The corresponding probability distributions can be viewed in Additional data files 9 and 10. The right panel shows the effects of this CC subtraction on all data sets, quantified as the difference of the early probability distribution slope value (UpE) before and after elimination of CC regulated genes. PEN versus ESEN = proliferating endometrium versus early secretory endometrium; PEN versus MSEN = proliferating endometrium versus mid secretory endometrium.
GO category enrichment
| | BP - overrepresented | BP - underrepresented | CC - overrepresented |
| eDEV500 | DNA replication | Multicellular organismal process | Intracellular |
| lDEV500 | Immune response | Biopolymer metabolic process | Membrane |
| Group 1 (16) | | | |
| Up | DNA repair (15) | Multicellular organismal process (16) | Intracellular (16) |
| Down | Multicellular organismal process (15) | Primary metabolic process (14) | Plasma membrane (16) |
| Group 2 (6) | | | |
| Up | Cell cycle (6) | Multicellular organismal development (5) | Chromosome (6) |
| Down | Monovalent inorganic cation transport (5) | DNA recombination (6) | Proton-transporting two-sector ATPase complex (5) |
| Group 3 (13) | | | |
| Up | Immune response (10) | Cellular metabolic process (10) | Plasma membrane (10) |
| Down | Cellular metabolic process (10) | Multicellular organismal process (10) | Cytoplasm (10) |
| PEN versus ESEN | | | |
| Up | DNA replication | Biosynthetic process | Chromosome |
| Down | Lipid metabolic process | Macromolecule metabolic process | Desmosome |
Next to the most significant GO categories for eDEV500, lDEV500 and PEN versus ESEN, the GO categories that are most frequently enriched in the up- and downregulated genes of group 1, 2 and 3 data sets are listed with the number of occurrences in parentheses. BP, biological process; CC, cellular component. For example, DNA repair is enriched in the upregulated genes of 15 out of 16 data sets belonging to group 1.
MSigDB C2 gene sets most significantly enriched in groups 1-3
| Upregulated genes | Downregulated genes |
| Group 1 | |
| eDEV500 | lDEV500 |
| STEMCELL_NEURAL_UP | SANSOM_APC_5_DN |
| eDEV200 | NAKAJIMA_MCS_UP |
| TARTE_PLASMA_BLASTIC | TARTE_MATURE_PC |
| STEMCELL_EMBRYONIC_UP | CALCIUM_REGULATION_IN_CARDIAC_CELLS |
| PRMT5_KD_UP | LEE_DENA_DN |
| CANCER_NEOPLASTIC_META_UP | SMOOTH_MUSCLE_CONTRACTION |
| LI_FETAL_VS_WT_KIDNEY_DN | YAO_P4_KO_VS_WT_UP |
| eDEV100 | lDEV200 |
| MOREAUX_TACI_HI_IN_PPC_UP | LEE_ACOX1_DN |
| Group 2 | |
| HOFFMANN_BIVSBII_BI_TABLE2 | FLECHNER_KIDNEY_TRANSPL_REJ_DN |
| YU_CMYC_UP | AGEING_KIDNEY_SPECIFIC_DN |
| DNA_REPLICATION_REACTOME | CHANG_SERUM_RESPONSE_DN |
| eDEV500 | LE_MYELIN_DN |
| SERUM_FIBROBLAST_CORE_UP | AGEING_KIDNEY_DN |
| CMV_IE86_UP | VENTRICLES_UP |
| CHANG_SERUM_RESPONSE_UP | CARIES_PULP_DN |
| G1_TO_S_CELL_CYCLE_REACTOME | UVB_NHEK1_UP |
| PEART_HISTONE_DN | SMOOTH_MUSCLE_CONTRACTION |
| GENOTOXINS_ALL_4HRS_REG | BRCA_ER_POS |
| Group 3 | |
| SERUM_FIBROBLAST_CELLCYCLE | FLECHNER_KIDNEY_TRANSPL_REJ_DN |
| BRCA_ER_NEG | AGEING_KIDNEY_DN |
| TARTE_MATURE_PC | IDX_TSA_UP_CLUSTER6 |
| DAC_PANC_UP | AGEING_KIDNEY_SPECIFIC_DN |
| SANSOM_APC_5_DN | DIAB_NEPH_DN |
| NAKAJIMA_MCS_UP | CARIES_PULP_DN |
| CANCER_UNDIFFERENTIATED_META_UP | MITOCHONDRIA |
| HIF1_TARGETS | BRCA_ER_POS |
| LEE_TCELLS3_UP | VENTRICLES_UP |
| GENOTOXINS_ALL_4HRS_REG | HEARTFAILURE_ATRIA_DN |
Figure 5. Heatmap of enrichment p-values. The p-values for gene sets that ranked among the 20 most enriched in the upregulated genes of either group 1, 2 or 3 are shown for all data sets. Red indicates low p-values, green high p-values.
Figure 6. Heatmap of probability distribution slopes for all data sets with respect to the lung development validation time series. Abbreviations and colors are the same as in Figure 3.
Figure 7. Average expression level of consensus gene sets in the lung development validation time series. Consensus group 1 = genes overexpressed in 11/16 data sets belonging to group 1; consensus group 2 = genes overexpressed in 5/6 data sets belonging to group 2; consensus group 3 = genes overexpressed in 8/13 data sets belonging to group 3.
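The consensus definition in the Figure 7 caption amounts to counting, for each gene, the number of data sets in a group in which it is overexpressed. The sketch below is a hypothetical illustration rather than the authors' procedure. It reads "11/16" as "at least 11 of 16", which is an assumption, and the function and variable names are invented for this example.

```python
from collections import Counter


def consensus_genes(upregulated_by_dataset, min_datasets):
    """Return genes overexpressed in at least `min_datasets` data sets.

    upregulated_by_dataset maps a data set name to an iterable of gene IDs.
    """
    counts = Counter(
        gene
        for genes in upregulated_by_dataset.values()
        for gene in set(genes)  # count each gene once per data set
    )
    return {gene for gene, n in counts.items() if n >= min_datasets}


if __name__ == "__main__":
    # Toy data with placeholder identifiers, not values from the paper.
    toy_group = {
        "dataset_a": ["gene1", "gene2"],
        "dataset_b": ["gene1"],
        "dataset_c": ["gene1", "gene2", "gene3"],
    }
    print(sorted(consensus_genes(toy_group, min_datasets=2)))
```

For group 1, this would correspond to calling the function with the 16 group 1 data sets and `min_datasets=11`.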
Example genes from the consensus sets of groups 1-3 ordered by their average rank across all DTs
| | ProbeID | Average rank | Gene symbol | Description |
| Consensus group 1 | | | | |
| Early | 201577_at | 627.1 | NME1* | Non-metastatic cells 1, protein (NM23A) |
| | 200812_at | 633.8 | | Chaperonin containing TCP1, subunit 7 (eta) |
| | 201202_at | 875.7 | | Proliferating cell nuclear antigen |
| | 205436_s_at | 890.0 | | H2A histone family, member X |
| | 201476_s_at | 904.6 | | Ribonucleotide reductase M1 polypeptide |
| | 200910_at | 922.9 | | Chaperonin containing TCP1, subunit 3 (gamma) |
| | 202330_s_at | 924.1 | | Uracil-DNA glycosylase |
| | 202503_s_at | 1060.0 | KIAA0101* | KIAA0101 |
| | 222077_s_at | 1146.2 | | Rac GTPase activating protein 1 |
| | 204170_s_at | 1188.5 | | CDC28 protein kinase regulatory subunit 2 |
| Consensus group 2 | | | | |
| Early | 204766_s_at | 530.6 | | Nudix (nucleoside diphosphate linked moiety X)-type motif 1 |
| | 204825_at | 629.0 | | Maternal embryonic leucine zipper kinase |
| | 203270_at | 636.6 | | Deoxythymidylate kinase (thymidylate kinase) |
| | 218585_s_at | 768.5 | DTL | Denticleless homolog ( |
| | 220085_at | 772.6 | | Helicase, lymphoid-specific |
| Late | 205741_s_at | 3587.9 | DTNA | Dystrobrevin, alpha |
| | 204279_at | 3816.5 | | Proteasome (prosome, macropain) subunit, beta type, 9 (large multifunctional peptidase 2) |
| | 204416_x_at | 3944.6 | APOC1 | Apolipoprotein C-I |
| | 202307_s_at | 3987.2 | | Transporter 1, ATP-binding cassette, sub-family B (MDR/TAP) |
| | 209040_s_at | 4243.6 | | Proteasome (prosome, macropain) subunit, beta type, 8 (large multifunctional peptidase 7) |
| Consensus group 3 | | | | |
| Early | 204825_at | 629.0 | | Maternal embryonic leucine zipper kinase |
| | 202503_s_at | 1060.0 | KIAA0101* | KIAA0101 |
| | 202705_at | 1095.9 | | Cyclin B2 |
| | 222077_s_at | 1146.2 | | Rac GTPase activating protein 1 |
| | 204170_s_at | 1188.5 | | CDC28 protein kinase regulatory subunit 2 |
| Late | 208997_s_at | 3296.1 | | Uncoupling protein 2 (mitochondrial, proton carrier) |
| | 205798_at | 3499.9 | | Matrix metallopeptidase 7 (matrilysin, uterine) |
| | 202307_s_at | 3724.7 | | Protective protein for beta-galactosidase (galactosialidosis) |
| | 209166_s_at | 3946.3 | | Interleukin 7 receptor |
| | 206707_x_at | 3987.2 | | Transporter 1, ATP-binding cassette, sub-family B (MDR/TAP) |
| | 208812_x_at | 4485.7 | | Major histocompatibility complex, class I, C |
To identify only the most relevant genes, the definition of 'consensus gene set' was tightened from the definition employed in Figure 7. Consensus group 1 = 20 genes overexpressed in at least 15/16 data sets belonging to group 1; consensus group 2 = 58 genes overexpressed in all data sets (6/6) belonging to group 2; consensus group 3 = 29 genes overexpressed in 9/13 data sets belonging to group 3. Bold entries are those expressed more than three times above median in at least one of murine E6, E7, E8, E9, E10 (Symatlas). Italicized entries are those expressed more than three times above median in at least one of the following human cell types: CD4+ T cells, CD8+ T cells, CD19+ B cells (peripheral blood), BDCA4+ dendritic cells, B lymphoblasts (peripheral blood). *Expression data not available in Symatlas.
Figure 8. Cancer core program genes before and after cell cycle subtraction. Genes overexpressed in >20/32 data sets and with an average DT rank <1,000 are marked in red and their names are listed below the table (left panel). The right panel is constructed analogously, with the threshold relaxed to overexpression in >15/32 data sets to account for the reduced number of genes after elimination of CC genes. Genes belonging to the GO category 'cell cycle' are marked with orange asterisks (and with orange boxes in the right panel) to allow a better assessment of the effects of CC subtraction.
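The selection rule in the Figure 8 caption combines two thresholds: the number of data sets in which a gene is overexpressed and its average rank across DTs. A minimal sketch under stated assumptions follows. The function name, argument names and toy identifiers are hypothetical, and only the threshold values (>20 or >15 of 32 data sets, average DT rank <1,000) come from the caption.

```python
def core_program_genes(overexpression_counts, average_dt_rank,
                       min_count=20, max_rank=1000):
    """Select genes overexpressed in more than `min_count` data sets
    whose average DT rank is below `max_rank`.

    overexpression_counts: gene ID -> number of data sets with overexpression
    average_dt_rank: gene ID -> average rank across DTs
    Genes without a DT rank are skipped.
    """
    return sorted(
        gene
        for gene, count in overexpression_counts.items()
        if count > min_count
        and gene in average_dt_rank
        and average_dt_rank[gene] < max_rank
    )


if __name__ == "__main__":
    # Placeholder values for illustration only, not data from the paper.
    counts = {"geneA": 25, "geneB": 18, "geneC": 30, "geneD": 22}
    ranks = {"geneA": 640.0, "geneB": 700.0, "geneC": 3200.0}
    print(core_program_genes(counts, ranks))                # strict threshold
    print(core_program_genes(counts, ranks, min_count=15))  # relaxed threshold
```

Relaxing `min_count` from 20 to 15 mirrors the adjustment the caption describes for the right panel, where fewer genes remain after CC subtraction.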