Nabila Bennani-Baiti, Idriss M Bennani-Baiti.
Abstract
Whole-genome analyses have revealed that most cancer-relevant genes cluster into 12 signaling pathways. Knowledge of these signaling pathways and their associated gene signatures not only allows us to understand the mechanisms of oncogenesis inherent to specific cancers but also provides us with drug targets, molecular diagnostic and prognostic factors, and biomarkers for patient risk stratification and treatment. Publicly available genomic data sets constitute a wealth of gene mining opportunities for hypothesis generation and testing. However, the increasingly recognized genetic and epigenetic inter- and intratumor heterogeneity, combined with the preponderance of small-size cohorts, hampers reliable analysis and discovery. Here, we review two methods that are used to infer meaningful biological events from small-size data sets and discuss some of their applications and limitations.
Keywords: cohort size; expression profiling; gene data set; intertumor heterogeneity; intratumor heterogeneity; low-incidence cancers
Year: 2015 PMID: 26568679 PMCID: PMC4631160 DOI: 10.4137/CIN.S32696
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. Probability density distribution of all cancer gene data sets in Gene Expression Omnibus (GEO). All cancer data sets were retrieved from GEO (query performed on August 15, 2015) and plotted against sample size (x axis). Data set size refers to the number of tumor samples per data set. The analysis included 368 data sets and 9,845 tumor samples. Only data sets limited to tumor samples were retrieved; those solely listing data on tumor stroma or normal peripheral blood lymphocytes in cancer patients, and those that combined several cancer types, were omitted from the analysis. There were no other exclusion criteria.
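As a rough illustration of how a density curve like the one in Figure 1 can be obtained, the sketch below evaluates a one-dimensional Gaussian kernel density estimate over a list of data set sizes. The caption does not state which density estimator or bandwidth was used, so the kernel and the normal-reference bandwidth rule are assumptions. The `sizes` array is hypothetical and is not the GEO data. The only figure taken from the caption is the pair of totals, which implies a mean of about 27 tumor samples per data set.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference rule of thumb. This is an assumption: the caption
        # does not say how (or whether) a bandwidth was chosen.
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, evaluated at every grid point.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)

# Totals reported in the caption: 9,845 tumor samples across 368 data sets.
mean_size = 9845 / 368  # about 26.75 samples per data set

# Hypothetical data set sizes for illustration only (NOT the GEO data).
sizes = np.array([6, 8, 10, 12, 15, 18, 20, 24, 27, 30, 45, 60, 120, 250])

# A wide grid so that the tails of every kernel are covered.
grid = np.linspace(-600.0, 900.0, 3001)
density = gaussian_kde_1d(sizes, grid)
dx = grid[1] - grid[0]
area = float(density.sum() * dx)  # should be close to 1 for a valid density
```

A Gaussian kernel places some probability mass below zero, which is not meaningful for sample counts. A histogram or a boundary-corrected estimator would avoid this, at the cost of a less smooth curve.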
Figure 2. Probability density distribution of cancer gene data sets of high-incidence adult cancers. All cancer data sets were retrieved from GEO (query performed on August 15, 2015) and plotted against sample size (x axis). These included (A) breast cancer (number of data sets n = 110), (B) prostate cancer (n = 43), (C) lung cancer (n = 25), and (D) colorectal cancer (n = 37).
Figure 3. Bivariate kernel density estimates of gene expression consistency across small-size cohorts. Genes tested were LSD1 (A), EZH2 (B), and CXCR4 (C). x and y axes represent log2 expression values of given genes (x or y) across Ewing’s sarcoma data sets either in GDS# GSE7007 (x axis; n = 27) or in ArrayExpress data set E-MEXP-1142 (y axis; n = 27). Lines across graphs depict regression curves as computed in a multiple regression model based on the ordinary least squares method as defined by the equation .
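The regression equation referred to in the Figure 3 caption is not present in this record, and the sketch below does not attempt to reproduce it. It only illustrates, under stated assumptions, what an ordinary least squares fit of one set of log2 expression values against another looks like. The function name `ols_fit`, the single-predictor design, and the simulated values are all hypothetical. The only detail borrowed from the caption is the cohort size of 27.

```python
import numpy as np

def ols_fit(x, y):
    """Return [intercept, slope] minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with a column of ones for the intercept. Further
    # predictor columns could be appended for a multiple regression.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated log2 expression values for illustration only (NOT the
# GSE7007 or E-MEXP-1142 measurements); n = 27 matches the caption.
rng = np.random.default_rng(0)
x = rng.normal(loc=8.0, scale=1.0, size=27)
y = 1.0 + 0.9 * x + rng.normal(loc=0.0, scale=0.3, size=27)

intercept, slope = ols_fit(x, y)
fitted = intercept + slope * x   # points on the regression line
residuals = y - fitted           # what the fit leaves unexplained
```

With cohorts this small, the fitted slope can shift noticeably when a few samples are added or removed, so any such fit is best read alongside the spread of the residuals.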