Joseph Irgon, C Chris Huang, Yi Zhang, Dmitri Talantov, Gyan Bhanot, Sándor Szalma.
Abstract
BACKGROUND: We have identified a set of genes whose relative mRNA expression levels in various solid tumors can be used to robustly distinguish cancer from matching normal tissue. Our current feature set consists of 113 gene probes for 104 unique genes, originally identified as differentially expressed in solid primary tumors in microarray data on the Affymetrix HG-U133A platform in five tissue types: breast, colon, lung, prostate and ovary. For each dataset, we first identified a set of genes significantly differentially expressed in tumor vs. normal tissue at a p-value threshold of 0.05 using an experimentally derived error model. Our common cancer gene panel is the intersection of these sets of significantly dysregulated genes and can distinguish tumors from normal tissue in all five of these tissue types.
Year: 2010 PMID: 20569444 PMCID: PMC2906482 DOI: 10.1186/1471-2407-10-319
Source DB: PubMed Journal: BMC Cancer ISSN: 1471-2407 Impact factor: 4.430
Sample distribution for each tissue, and number of differentially expressed genes as calculated by Genes@Work at p < 0.05 [5]
| Tissue type | Sample distribution | # of Differentially Expressed Gene probes |
|---|---|---|
| Prostate | 7 Benign, 2 Normal, 10 Cancer | 2035 |
| Lung | 37 Normal, 29 Cancer | 1961 |
| Ovarian | 24 Normal, 22 Cancer | 2717 |
| Colon | 4 Benign, 4 Normal, 33 Cancer | 4159 |
| Breast | 6 Benign, 10 Normal, 31 Cancer | 2704 |
| Intersection | 77 Normal, 125 Cancer | 113 |
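The "Intersection" row can be read as a set intersection over the per-tissue probe lists, with the sample counts summed across tissues. The sketch below is illustrative only: the sample counts are copied from the table above, while the probe sets (`toy`) and the helper names `totals` and `common_panel` are hypothetical and are not the study's data or code.

```python
from functools import reduce

# Sample counts copied from the table above: (benign, normal, cancer).
SAMPLES = {
    "Prostate": (7, 2, 10),
    "Lung": (0, 37, 29),
    "Ovarian": (0, 24, 22),
    "Colon": (4, 4, 33),
    "Breast": (6, 10, 31),
}

def totals(samples):
    """Sum normal and cancer counts across tissues (benign samples are not counted)."""
    normal = sum(n for _, n, _ in samples.values())
    cancer = sum(c for _, _, c in samples.values())
    return normal, cancer

def common_panel(per_tissue_probes):
    """Intersect the per-tissue sets of differentially expressed probes."""
    return reduce(set.intersection, (set(p) for p in per_tissue_probes.values()))

# Hypothetical toy probe sets, NOT the study's real per-tissue lists.
toy = {
    "Lung": {"p1", "p2", "p3", "p4"},
    "Colon": {"p2", "p3", "p5"},
    "Breast": {"p2", "p3", "p4", "p6"},
}

print(totals(SAMPLES))            # (77, 125), matching the Intersection row
print(sorted(common_panel(toy)))  # ['p2', 'p3']
```

The totals reproduce the 77 normal and 125 cancer samples reported in the last row when benign samples are left out of the count.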
Leave-one-out and m-fold CV accuracies for normal-tumor classification in our training data using our panel and a linear SVM classifier
| Tissue Type | Training LOO Accuracy | M-fold CV Accuracy |
|---|---|---|
| Breast | 94.73% | 92.7% |
| Colon | 96.41% | 95.5% |
| Prostate | 100.00% | 99.9% |
| Lung | 95.80% | 97.3% |
| Ovary | 96.20% | 98.2% |
| Global | 91.4% | 90.9% |
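As a rough illustration of how leave-one-out (LOO) and m-fold cross-validation accuracies like those above are computed, the following sketch uses only the Python standard library. To keep it self-contained it swaps in a nearest-centroid classifier for the linear SVM used in the study, and the data are synthetic; all function names are assumptions made for this example.

```python
import random

def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_nearest_centroid(X, y):
    """Stand-in for the linear SVM: one mean expression vector per class."""
    return {label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)}

def predict(model, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: sq_dist(model[label]))

def cv_accuracy(X, y, folds):
    """Fraction of samples predicted correctly when their fold is held out."""
    correct = 0
    for test_idx in folds:
        held = set(test_idx)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_nearest_centroid([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)

def loo_folds(n):
    """Leave-one-out: every sample is its own test fold."""
    return [[i] for i in range(n)]

def m_folds(n, m, seed=0):
    """Shuffle the indices and split them into m disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::m] for k in range(m)]

# Synthetic, well-separated two-gene "expression" data (not from the study).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = ["normal"] * 4 + ["tumor"] * 4

print(cv_accuracy(X, y, loo_folds(len(X))))   # 1.0
print(cv_accuracy(X, y, m_folds(len(X), 4)))  # 1.0
```

On real expression data the two estimates generally differ somewhat, as they do in the table, because the m-fold estimate depends on how the samples are partitioned.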
Figure 1. Example of Genes@Work feature selection. Genes@Work [5] output for a cancer and matched normal sample from the lung cancer dataset. Normalized gene expression values for each gene from the cancer and normal samples are plotted on the X and Y axes, respectively. The solid blue lines define statistical significance boundaries at p = 0.05 from the experimental error model. Red points lying above or below the solid lines are genes whose expression values differ significantly (p < 0.05) between the cancer and normal samples.
Figure 2. Expression and signature trends of the training data. (2a) Signature trends of each tissue type. Red designates genes that are, on average, over-expressed compared with their tissue's average normal expression; black designates under-expression versus normal. (2b) Combined training data for lung, breast, ovarian, prostate and colon tissue samples, grouped by phenotype and ranked by the ratio of differential gene expression. Tumors are on the left and normals are on the right.
Gene Names, Affymetrix Probe Ids and their corresponding signatures in each tissue type
| Genes | Affy ID | Global | Lung | Colon | Breast | Prostate | Ovary |
|---|---|---|---|---|---|---|---|
| ABAT | 209459_s_at | ||||||
| ACTA2 | 200974_at | ||||||
| AMIGO2 | 222108_at | ||||||
| ANK2 | 202920_at | ||||||
| BHLHB3 | 221530_s_at | ||||||
| C17orf91 | 214696_at | ||||||
| C1orf115 | 218546_at | ||||||
| CAV1 | 203065_s_at | ||||||
| CAV1 | 203065_s_at | ||||||
| CUGBP2 | 202158_s_at | ||||||
| CXCL2 | 209774_x_at | ||||||
| CYLD | 39582_at | ||||||
| DCN | 201893_x_at | ||||||
| DKK3 | 214247_s_at | ||||||
| DMN | 212730_at | ||||||
| DPYSL2 | 200762_at | ||||||
| DST | 212254_s_at | ||||||
| DST | 212254_s_at | ||||||
| EDNRB | 204273_at | ||||||
| EFEMP1 | 201842_s_at | ||||||
| EGR1 | 201694_s_at | ||||||
| EPB41L2 | 201719_s_at | ||||||
| FBLN1 | 202995_s_at | ||||||
| FERMT2 | 209210_s_at | ||||||
| FHL1 | 201540_at | ||||||
| GSN | 200696_s_at | ||||||
| GSTM5 | 205752_s_at | ||||||
| H3F3A///H3F3B | 211998_at | ||||||
| HBA1///HBA2 | 209458_x_at | ||||||
| HBA1///HBA2 | 209458_x_at | ||||||
| HBA1///HBA2 | 209458_x_at | ||||||
| HBB | 209116_x_at | ||||||
| HBB | 209116_x_at | ||||||
| HBB | 209116_x_at | ||||||
| JAM3///LOC100133502 | 212813_at | ||||||
| JUN | 201466_s_at | ||||||
| LAMA4 | 202202_s_at | ||||||
| MAOA | 212741_at | ||||||
| MBNL1 | 201153_s_at | ||||||
| ME1 | 204059_s_at | ||||||
| MEIS1 | 204069_at | ||||||
| METTL7A | 207761_s_at | ||||||
| MT1P2 | 211456_x_at | ||||||
| MYLK | 202555_s_at | ||||||
| NR3C1 | 211671_s_at | ||||||
| OPTN | 202073_at | ||||||
| PAM | 202336_s_at | ||||||
| PDGFRA | 203131_at | ||||||
| PDLIM3 | 209621_s_at | ||||||
| PLSCR4 | 218901_at | ||||||
| PPAP2A | 210946_at | ||||||
| PPAP2B | 212230_at | ||||||
| RAB31 | 217762_s_at | ||||||
| RBPMS | 209487_at | ||||||
| RCAN2 | 203498_at | ||||||
| RNASE4 | 205158_at | ||||||
| SEPP1 | 201427_s_at | ||||||
| SERPINA1 | 211429_s_at | ||||||
| SORBS1 | 218087_s_at | ||||||
| SPARCL1 | 200795_at | ||||||
| SRPX | 204955_at | ||||||
| STOM | 201060_x_at | ||||||
| SYNPO | 202796_at | ||||||
| TMEM140 | 218999_at | ||||||
| TMEM47 | 209656_s_at | ||||||
| TNS1 | 221748_s_at | ||||||
| TSC22D3 | 208763_s_at | ||||||
| TUBA1A | 209118_s_at | ||||||
| WWTR1 | 202133_at | ||||||
| ZFP36L2 | 201368_at | ||||||
| ACTR3 | 200996_at | ||||||
| C7orf24 | 215380_s_at | ||||||
| CALR | 214315_x_at | ||||||
| CKS2 | 204170_s_at | ||||||
| COL6A2 | 209156_s_at | ||||||
| CXCL1 | 204470_at | ||||||
| DHCR24 | 200862_at | ||||||
| FABP5///FABP5L2///FABP5L7 | 202345_s_at | ||||||
| GALNT7 | 218313_s_at | ||||||
| GAPDH | M33197_5_at | ||||||
| GAPDH | M33197_5_at | ||||||
| HDGF | 200896_x_at | ||||||
| HIST1H2AC | 215071_s_at | ||||||
| HIST1H2BD | 209911_x_at | ||||||
| HIST1H2BK | 209806_at | ||||||
| HIST2H2AA3///HIST2H2AA4 | 214290_s_at | ||||||
| HN1L | 212115_at | ||||||
| KIAA0101 | 202503_s_at | ||||||
| KRT19 | 201650_at | ||||||
| KRT8 | 209008_x_at | ||||||
| LAPTM4B | 214039_s_at | ||||||
| LOC100130414///SPINT2 | 210715_s_at | ||||||
| LRRFIP1 | 201861_s_at | ||||||
| LSR | 208190_s_at | ||||||
| MCM2 | 202107_s_at | ||||||
| MIF | 217871_s_at | ||||||
| MLF1IP | 218883_s_at | ||||||
| MLPH | 218211_s_at | ||||||
| NME1///NME2 | 201577_at | ||||||
| NOLA2 | 209104_s_at | ||||||
| NPM1 | 221923_s_at | ||||||
| NUSAP1 | 218039_at | ||||||
| P4HB | 200654_at | ||||||
| PABPC3 | 208113_x_at | ||||||
| PDIA4 | 211048_s_at | ||||||
| PMM2 | 203201_at | ||||||
| RNASE2 | 215193_x_at | ||||||
| RRBP1 | 201204_s_at | ||||||
| SORD | 201563_at | ||||||
| TACSTD1 | 201839_s_at | ||||||
| TMED3 | 208837_at | ||||||
| YWHAZ | 200641_s_at | ||||||
| ZWINT | 204026_s_at |
A value of 1 (blue) denotes upregulation with respect to normal; -1 (red) denotes downregulation.
Leave-one-out and validation accuracies for normal-tumor classification in various tissue types using our panel and a linear SVM classifier
| Data Set | Accuracy |
|---|---|
| Multi-tissue SVM on Wang Data | 100% |
| Breast Cancer Metastasis Training | 88.80% |
| Multi-tissue SVM on Ovarian Cancer GCOD (2 studies) | 100% |
| Multi-tissue SVM on Colon Cancer GCOD (2 studies) | 100% |
| Other tissue validation sets (Bladder, Melanoma, ccRCC) | 95.4% |
Figure 3. Distribution of LOO accuracies. Distribution of LOO accuracies obtained using randomly selected gene lists to classify lung cancer in the dataset of Spira et al. [6], overlaid with the accuracy obtained on the same dataset [6] using our gene panel.
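The comparison in Figure 3 amounts to building a null distribution from random gene lists and asking how often they match the panel. A minimal sketch follows, assuming the random lists have the same size as the panel and using an add-one empirical p-value; the scoring function here is a toy stand-in for the LOO accuracy, and all names and data are hypothetical.

```python
import random

def random_list_scores(all_genes, panel_size, score, n_draws=1000, seed=0):
    """Score n_draws random gene lists, each the same size as the panel."""
    rng = random.Random(seed)
    return [score(rng.sample(all_genes, panel_size)) for _ in range(n_draws)]

def empirical_p(panel_score, null_scores):
    """Share of random lists scoring at least as well as the panel (add-one smoothed)."""
    hits = sum(s >= panel_score for s in null_scores)
    return (hits + 1) / (len(null_scores) + 1)

# Toy setup (hypothetical): 200 genes, of which the first 20 are "informative".
all_genes = [f"g{i}" for i in range(200)]
informative = set(all_genes[:20])

def toy_score(gene_list):
    """Stand-in for LOO accuracy: fraction of the list that is informative."""
    return sum(g in informative for g in gene_list) / len(gene_list)

panel = all_genes[:10]  # a panel drawn entirely from the informative genes
null = random_list_scores(all_genes, len(panel), toy_score)
print(empirical_p(toy_score(panel), null))
```

In a real analysis, `toy_score` would be replaced by a function that trains and cross-validates the classifier on the given gene list.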
Comparison with the Rhodes and Xu signatures on the same independent data
| Dataset | GSE# | Rhodes signature accuracy (%) | Rhodes signature p-value | Xu signature accuracy (%) | Xu signature p-value | Our signature accuracy (%) | Our signature p-value |
|---|---|---|---|---|---|---|---|
| Gordon_Lung | GSE2549 | 91.8 | 3.48E-07 | 95.9 | 1.75E-07 | 97.8 | 3.50E-09 |
| Hoffman_Myometrium | GSE593 | 80 | 2.06E-01 | 80 | 8.33E-02 | 90 | 8.00E-04 |
| Lenburg_Kidney | GSE781 | 76.5 | 1.01E-01 | 76.5 | 1.07E-01 | 76.5 / 100* | 1.00E-10 |
| Talantov_Skin | GSE3189 | 94.2 | 8.97E-07 | 98.1 | 3.44E-07 | 94.2 | 5.11E-04 |
| Wachi_Lung | GSE3268 | 100 | 3.97E-03 | 100 | 3.97E-03 | 100 | 3.97E-03 |
| Yoon_Soft_Tissue | GSE2719 | 85.2 | 5.67E-08 | 96.3 | 6.76E-11 | 94.3 | 9.50E-07 |
| Overall | | 89.1 | 9.28E-26 | 94.3 | 9.74E-30 | 96 | 1.04E-30 |

*100% accuracy is achieved after the sample labelling is corrected.