Ivan P Gorlov, Ji-Yeon Yang, Jinyoung Byun, Christopher Logothetis, Olga Y Gorlova, Kim-Anh Do, Christopher Amos.
Abstract
BACKGROUND: Whole-genome profiling of gene expression is a powerful tool for identifying cancer-associated genes. Genes differentially expressed between normal and tumorous tissues are usually considered to be cancer associated. We recently demonstrated that the analysis of interindividual variation in gene expression can be useful for identifying cancer-associated genes. The goal of this study was to identify the best microarray data-derived predictor of known cancer-associated genes.
Year: 2014 PMID: 24656147 PMCID: PMC3997969 DOI: 10.1186/1471-2164-15-223
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. An outline of the study design.
Brief description of the datasets used
| Cancer type | GEO dataset | Platform | No. of probes* | AN | T |
|---|---|---|---|---|---|
| Breast | GSE10780 | Affymetrix HG-U133_Plus_2 | 35764 | 142 | 42 |
| Breast | GDS3716 | Affymetrix HuEx-1_0-st | 21169 | 24 | 18 |
| Colorectal | GSE31737 | Affymetrix HG-U133_Plus_2 | 17528 | 40 | 40 |
| Lung | GSE19188 | Affymetrix HG-U133_Plus_2 | 38597 | 65 | 91 |
| Lung | GSE18842 | Affymetrix HG-U133A | 38578 | 45 | 45 |
| Prostate | GSE21034 | Affymetrix HuEx-1_0-st | 27090 | 29 | 29 |
| Prostate | GSE6919 | Affymetrix HG_U95Av2 | 27964 | 63 | 63 |
*Only probes linked to a single gene were used in the analysis.
AN, adjacent normal tissue; T, tumor tissue.
Figure 2. The “shifting means model” (upper panel) and the “outliers model” (lower panel) of gene expression in tumors. In the shifting means model, all tumors are similar in terms of gene expression. In the outliers model, the tumors are heterogeneous: a specific cancer gene is extremely upregulated or downregulated only in the small fraction of tumors in which this gene is a driver of tumorigenesis.
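The contrast between the two models can be illustrated with a small numerical sketch. The expression values below are invented for illustration and are not taken from any of the datasets, and the helper name `summarize` is hypothetical. The sketch only shows why the standard deviation in tumors, sd(T), might respond to the outliers model more strongly than a difference in means does.

```python
from statistics import mean, stdev

# Hypothetical log-scale expression of one gene in 10 adjacent normal samples.
normal = [5.9, 6.0, 6.1, 6.0, 5.9, 6.1, 6.0, 6.0, 5.9, 6.1]

# Shifting means model: every tumor is upregulated by the same amount.
tumor_shift = [x + 1.0 for x in normal]

# Outliers model: only one tumor in ten is strongly upregulated.
tumor_outlier = list(normal)
tumor_outlier[0] += 5.0


def summarize(tumor, normal):
    """Return (difference in means, SD across tumors) for one gene."""
    return mean(tumor) - mean(normal), stdev(tumor)


shift_diff, shift_sd = summarize(tumor_shift, normal)
outlier_diff, outlier_sd = summarize(tumor_outlier, normal)

print(f"shifting means: mean difference={shift_diff:.2f}, sd(T)={shift_sd:.2f}")
print(f"outliers:       mean difference={outlier_diff:.2f}, sd(T)={outlier_sd:.2f}")
```

In this toy example the single outlier produces a smaller mean difference than the uniform shift, yet it inflates sd(T) many times over, while the uniform shift leaves sd(T) unchanged.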
Differences between cancer and all other genes for 4 cancer gene predictors in 7 datasets
| Dataset | Variable | CG | OG | Z | P | Rank |
|---|---|---|---|---|---|---|
| BC_GSE10780 | -LOG(P) | 6.09 | 5.84 | 2.89 | 0.003852 | 6 |
| | m(AN) | 6.32 | 6.16 | 3.68 | 0.000233 | 4 |
| | m(T) | 6.51 | 6.18 | 3.3 | 0.000967 | 5 |
| | FC | 1.33 | 1.19 | 4.65 | 3.32E-06 | 3 |
| | sd(AN) | 0.42 | 0.33 | 5.61 | 2.02E-08 | 1 |
| | sd(T) | 0.62 | 0.45 | 5.37 | 7.87E-08 | 2 |
| BC_GSE3716 | -LOG(P) | 0.62 | 0.64 | 1.04 | 0.29834 | 3 |
| | m(AN) | 7.34 | 7.42 | 0.37 | 0.711382 | 6 |
| | m(T) | 7.41 | 7.51 | 0.75 | 0.453255 | 4 |
| | FC | 1.21 | 1.21 | 0.58 | 0.561915 | 5 |
| | sd(AN) | 0.84 | 0.79 | 2.79 | 0.005271 | 2 |
| | sd(T) | 0.89 | 0.81 | 3.55 | 0.000385 | 1 |
| CC_GSE31737 | -LOG(P) | 3.51 | 2.53 | 4.64 | 3.48E-06 | 6 |
| | m(AN) | 5.74 | 4.77 | 5.54 | 3.02E-08 | 5 |
| | m(T) | 5.89 | 4.77 | 6.19 | 6.02E-10 | 4 |
| | FC | 1.40 | 1.19 | 6.58 | 4.7E-11 | 2 |
| | sd(AN) | 0.38 | 0.22 | 6.52 | 7.03E-11 | 3 |
| | sd(T) | 0.56 | 0.24 | 7.53 | 5.07E-14 | 1 |
| LC_GSE19188 | -LOG(P) | 6.55 | 5.37 | 1.15 | 0.250144 | 6 |
| | m(AN) | 6.49 | 6.06 | 4.74 | 2.14E-06 | 4 |
| | m(T) | 6.61 | 6.12 | 4.47 | 7.82E-06 | 5 |
| | FC | 1.58 | 1.29 | 5.62 | 1.91E-08 | 3 |
| | sd(AN) | 0.34 | 0.17 | 7.86 | 3.84E-15 | 2 |
| | sd(T) | 0.93 | 0.56 | 10.77 | 4.77E-27 | 1 |
| LC_GSE18842 | -LOG(P) | 5.98 | 4.31 | 3.08 | 0.00207 | 6 |
| | m(AN) | 6.24 | 5.75 | 6.26 | 3.85E-10 | 4 |
| | m(T) | 6.47 | 5.75 | 6.56 | 5.38E-11 | 3 |
| | FC | 1.66 | 1.31 | 6.1 | 1.06E-09 | 5 |
| | sd(AN) | 0.43 | 0.33 | 8.21 | 2.21E-16 | 2 |
| | sd(T) | 0.75 | 0.47 | 8.72 | 2.78E-18 | 1 |
| PC_GSE6919 | -LOG(P) | 2.28 | 1.75 | 3.62 | 0.000295 | 4 |
| | m(AN) | 7.41 | 7.01 | 1.87 | 0.061484 | 5 |
| | m(T) | 7.42 | 6.99 | 1.86 | 0.062886 | 6 |
| | FC | 1.32 | 1.21 | 4.57 | 4.88E-06 | 2 |
| | sd(AN) | 0.61 | 0.59 | 3.87 | 0.000109 | 3 |
| | sd(T) | 0.74 | 0.64 | 4.71 | 2.48E-06 | 1 |
| PC_GSE21034 | -LOG(P) | 2.55 | 1.68 | 3.97 | 7.19E-05 | 6 |
| | m(AN) | 8.65 | 7.86 | 6.82 | 9.1E-12 | 4 |
| | m(T) | 8.59 | 7.82 | 5.87 | 4.36E-09 | 5 |
| | FC | 1.21 | 1.12 | 6.78 | 1.2E-11 | 3 |
| | sd(AN) | 0.31 | 0.27 | 6.3 | 2.98E-10 | 3 |
| | sd(T) | 0.37 | 0.28 | 6.83 | 8.49E-12 | 1 |
CG, cancer genes; OG, other genes. Statistics are from the nonparametric Mann-Whitney test; rank, rank of the variable for a given dataset based on Z score; m(AN), mean expression in adjacent normal tissue; m(T), mean expression in tumor tissue; FC, fold change; sd(AN), standard deviation of the gene expression values in adjacent normal tissue; sd(T), standard deviation of the gene expression values in tumor tissue.
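As a rough sketch of how a Z score and a per-dataset rank of this kind could be obtained, the code below computes a Mann-Whitney Z with the normal approximation and then ranks predictors by descending Z. It assumes no tie correction in the variance and ranking by absolute Z, neither of which is stated in the source; the function names are hypothetical. The Z values in the usage part are those listed above for BC_GSE10780.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks of the values, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def mann_whitney_z(group_a, group_b):
    """Z score for group_a versus group_b (normal approximation, no tie correction)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u_a - mean_u) / sd_u


def rank_predictors(z_by_variable):
    """Rank variables from 1 (largest |Z|) downward."""
    ordered = sorted(z_by_variable, key=lambda name: abs(z_by_variable[name]), reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


# Z scores listed above for BC_GSE10780.
bc_gse10780 = {"-LOG(P)": 2.89, "m(AN)": 3.68, "m(T)": 3.3,
               "FC": 4.65, "sd(AN)": 5.61, "sd(T)": 5.37}
print(rank_predictors(bc_gse10780))
```

For this dataset the resulting order (sd(AN), sd(T), FC, m(AN), m(T), -LOG(P)) agrees with the rank column of the table.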
Figure 3. The enrichment factor (EF) for known cancer genes among the top 5% of the probes ranked on the basis of the predicting variables. The horizontal lines show the expected proportion of cancer genes under the null hypothesis. The left panel shows individual studies; the right panel shows averages across the studies.
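The caption does not give a formula for EF. A minimal sketch, assuming EF is the proportion of known cancer genes among the top-ranked probes divided by their proportion among all probes (the null expectation), might look as follows. The function name, argument names, and toy data are hypothetical.

```python
def enrichment_factor(scores, is_cancer_gene, top_fraction=0.05):
    """Observed / expected proportion of cancer genes among top-scoring probes.

    scores: one predictor value per probe (higher means stronger candidate).
    is_cancer_gene: parallel list of booleans marking known cancer genes.
    """
    n = len(scores)
    n_top = max(1, round(n * top_fraction))
    # Indices of probes sorted from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    observed = sum(is_cancer_gene[i] for i in order[:n_top]) / n_top
    expected = sum(is_cancer_gene) / n
    return observed / expected


# Toy data: 100 probes scored 0..99, with 4 known cancer genes.
scores = list(range(100))
cancer_idx = {99, 98, 50, 10}
labels = [i in cancer_idx for i in range(100)]
ef = enrichment_factor(scores, labels)
print(f"EF = {ef:.1f}")
```

Here two of the five top-ranked probes are cancer genes (40%) against a background of 4%, so the predictor enriches for cancer genes about tenfold; an EF near 1 would indicate no enrichment.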
Percentage of outliers in tumor samples
| GSE3716_BC | Paired controls | 7.01 | 439 | 0.38 | | |
| | Cancer genes | 7.65 | 226 | 0.54 | 1.08 | 0.22 |
| GSE10780_BC | Paired controls | 4.36 | 457 | 0.38 | | |
| | Cancer genes | 4.59 | 469 | 0.41 | 0.84 | 0.61 |
| GSE31737_CC | Paired controls | | | | | |
| | Cancer genes | | | | | |
| GSE18842_LC | Paired controls | | | | | |
| | Cancer genes | | | | | |
| GSE19188_LC | Paired controls | | | | | |
| | Cancer genes | | | | | |
| GSE6919_PC | Paired controls | 0.81 | 457 | 0.08 | | |
| | Cancer genes | 0.98 | 230 | 0.11 | 0.56 | 0.74 |
| GSE21034_PC | Paired controls | 1.72 | 343 | 0.34 | | |
| | Cancer genes | 2.51 | 182 | 0.52 | 1.03 | 0.11 |
Data in bold face are statistically significant between paired controls and cancer genes.
Differences between recently reported cancer genes and all other genes in the human genome
| Dataset | Variable | RRCG | OG | Z | P | Rank |
|---|---|---|---|---|---|---|
| BC_GSE10780 | -LOG(P) | 5.81 | 4.36 | 2.33 | 0.019828 | 6 |
| | m(AN) | 6.65 | 6.16 | 3.09 | 0.002 | 5 |
| | m(T) | 6.77 | 6.18 | 3.61 | 0.0003 | 3 |
| | FC | 0.37 | 0.25 | 3.43081 | 0.000602 | 4 |
| | sd(AN) | 0.31 | 0.41 | 3.88355 | 0.000103 | 2 |
| | sd(T) | 0.62 | 0.45 | 4.52874 | 0.000006 | 1 |
| BC_GSE3716 | -LOG(P) | 0.53 | 0.64 | 1.27891 | 0.200931 | 2 |
| | m(AN) | 7.74 | 7.43 | 0.92 | 0.36 | 4 |
| | m(T) | 7.79 | 7.50 | 0.79 | 0.43 | 5 |
| | FC | 0.26 | 0.27 | 0.26448 | 0.791411 | 6 |
| | sd(AN) | 0.82 | 0.79 | 1.2498 | 0.211373 | 3 |
| | sd(T) | 0.89 | 0.80 | 2.71994 | 0.00653 | 1 |
| CC_GSE31737 | -LOG(P) | 2.99 | 2.38 | 1.41345 | 0.157523 | 6 |
| | m(AN) | 5.24 | 4.78 | 2.11 | 0.04 | 4 |
| | m(T) | 5.24 | 4.78 | 2.06 | 0.04 | 5 |
| | FC | 0.40 | 0.25 | 2.78341 | 0.005379 | 3 |
| | sd(AN) | 0.52 | 0.40 | 3.52082 | 0.00043 | 2 |
| | sd(T) | 0.57 | 0.43 | 4.12776 | 0.000037 | 1 |
| LC_GSE19188 | -LOG(P) | 6.47 | 5.38 | 1.91017 | 0.056112 | 6 |
| | m(AN) | 6.55 | 6.24 | 2.01 | 0.04 | 5 |
| | m(T) | 6.71 | 6.57 | 2.11 | 0.04 | 4 |
| | FC | 0.61 | 0.38 | 4.50134 | 0.000007 | 3 |
| | sd(AN) | 0.52 | 0.35 | 5.09778 | <10^-6 | 2 |
| | sd(T) | 0.84 | 0.56 | 7.51503 | <10^-6 | 1 |
| LC_GSE18842 | -LOG(P) | 6.01 | 4.32 | 3.59092 | 0.00033 | 6 |
| | m(AN) | 6.67 | 5.75 | 3.71 | 0.0002 | 5 |
| | m(T) | 6.81 | 5.75 | 4.91 | 0.000001 | 4 |
| | FC | 0.78 | 0.39 | 5.99594 | <10^-6 | 2 |
| | sd(AN) | 0.51 | 0.33 | 5.1629 | <10^-6 | 3 |
| | sd(T) | 0.82 | 0.47 | 8.26221 | <10^-6 | 1 |
| PC_GSE6919 | -LOG(P) | 0.94 | 0.87 | 0.74248 | 0.457796 | 6 |
| | m(AN) | 7.16 | 7.01 | 0.92 | 0.36 | 5 |
| | m(T) | 7.18 | 6.99 | 1.04 | 0.3 | 4 |
| | FC | 0.17 | 0.14 | 1.47057 | 0.141409 | 3 |
| | sd(AN) | 0.63 | 0.59 | 1.84723 | 0.064715 | 2 |
| | sd(T) | 0.69 | 0.64 | 2.15976 | 0.030792 | 1 |
| PC_GSE21034 | -LOG(P) | 1.69 | 1.69 | 0.74953 | 0.453541 | 6 |
| | m(AN) | 8.09 | 7.87 | 2.16 | 0.03 | 4 |
| | m(T) | 8.15 | 7.83 | 2.25 | 0.02 | 3 |
| | FC | 0.24 | 0.16 | 1.98956 | 0.04664 | 5 |
| | sd(AN) | 0.32 | 0.27 | 2.66829 | 0.007624 | 2 |
| | sd(T) | 0.34 | 0.29 | 2.99909 | 0.002708 | 1 |
RRCG, recently reported cancer genes; OG, other genes.
Figure 4. The proportions of correctly predicted cancer genes for the shifting means (left panel) and outliers (right panel) models. The prediction based on -LOG(P) is shown in blue, and that based on SD(T) is shown in red.
Results of applying the binary logistic regression model to the 7 datasets
| | | ||||||
|---|---|---|---|---|---|---|---|
| Breast | GDS3716 | ns | ns | ns | ns | ns | 10.5(0.001) |
| Breast | GSE10780 | ns | ns | ns | ns | ns | 76.1(<10^-6) |
| Colorectal | GSE31737 | 5.2(0.02) | ns | 19.6(<10^-6)1.5 E-82 | ns | ns | 27.8(<10^-6) |
| Lung | GSE18842 | ns | ns | 13.1(<10^-6)3.3 E-39 | 15.2(3.5 E-52) | 6.5(0.01) | 41.5(<10^-6) |
| Lung | GSE19188 | ns | ns | ns | ns | ns | 220.1(<10^-6) |
| Prostate | GSE6919 | 7.9(0.005) | | 7.1(0.007) | ns | ns | 74.9(<10^-6) |
| Prostate | GSE21034 | 22.3(<10^-6) | 4.8(0.04) | 18.9 | 8.1(0.004) | 48.8(<10^-6) | |
ns – the variable is not significant; numbers are Wald statistics for the variables in the model; significance is shown in parentheses.
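To make the relation between a Wald statistic and its significance concrete, the sketch below assumes the usual single-coefficient Wald test, in which the squared ratio of a coefficient to its standard error is compared with a chi-square distribution with 1 degree of freedom. The source does not state the degrees of freedom or the software used, so this is an assumption, and the function names are hypothetical.

```python
from math import erfc, sqrt


def wald_statistic(coefficient, standard_error):
    """Wald statistic for a single logistic regression coefficient."""
    return (coefficient / standard_error) ** 2


def wald_p_value(wald):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 df, P(X > w) equals erfc(sqrt(w / 2)), so no external library is needed.
    """
    return erfc(sqrt(wald / 2))


# Wald statistics taken from the table above.
for wald in (5.2, 7.9, 10.5):
    print(f"Wald = {wald:>4}: p = {wald_p_value(wald):.4f}")
```

Under this assumption the three Wald statistics give p-values that round to the 0.02, 0.005, and 0.001 reported in parentheses in the table.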