N Ancona, R Maglietta, A Piepoli, A D'Addabbo, R Cotugno, M Savino, S Liuni, M Carella, G Pesole, F Perri.
Abstract
BACKGROUND: In this paper we present a method for the statistical assessment of cancer predictors that make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected at Casa Sollievo della Sofferenza Hospital, Foggia, Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We address several questions that are relevant to the automatic diagnosis of cancer: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways does accuracy depend on the adopted classification scheme? How many genes are correlated with the pathology, and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions while avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data.
Year: 2006 PMID: 16919171 PMCID: PMC1564153 DOI: 10.1186/1471-2105-7-387
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Error rate e and p-value p for different training set sizes.
| Training set size | WVA e | WVA p | RLS e | RLS p | SVM e | SVM p |
| 10 | 25% | 0.078 | 21% | 0.048 | 21% | 0.053 |
| 15 | 24% | 0.056 | 19% | 0.035 | 18% | 0.037 |
| 20 | 23% | 0.066 | 16% | 0.028 | 15% | 0.026 |
| 25 | 21% | 0.045 | 16% | 0.028 | 14% | 0.022 |
| 30 | 21% | 0.050 | 15% | 0.027 | 13% | 0.017 |
| 35 | 19% | 0.069 | 14% | 0.027 | 11% | 0.019 |
| 40 | 21% | 0.102 | 15% | 0.109 | 12% | 0.022 |
| 46 | 21% | 0.493 | 14% | 0.489 | 11% | 0.495 |
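One common way to obtain an error rate and a p-value like those tabulated above is to average the test error over repeated random train/test splits at a fixed training set size, then repeat the whole procedure on data with randomly permuted class labels. The sketch below illustrates that idea only; the text above does not state the authors' exact procedure. A nearest-centroid rule stands in for WVA, RLS and SVM, and the function names, the number of splits and the `(hits + 1) / (n_perm + 1)` estimator are all assumptions made for illustration.

```python
import random

def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def split_error(X, y, n_train, n_splits, rng):
    """Mean test error of a nearest-centroid rule over random splits.

    Labels are +1 / -1. The classifier is a stand-in, not WVA, RLS or SVM.
    """
    idx = list(range(len(X)))
    errors = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        pos = [X[i] for i in train if y[i] == 1]
        neg = [X[i] for i in train if y[i] == -1]
        if not pos or not neg:
            continue  # skip splits whose training set misses a class
        c_pos, c_neg = centroid(pos), centroid(neg)
        wrong = 0
        for i in test:
            d_pos = sum((a - b) ** 2 for a, b in zip(X[i], c_pos))
            d_neg = sum((a - b) ** 2 for a, b in zip(X[i], c_neg))
            pred = 1 if d_pos < d_neg else -1
            wrong += pred != y[i]
        errors.append(wrong / len(test))
    # Fall back to chance level if every split was degenerate.
    return sum(errors) / len(errors) if errors else 0.5

def permutation_p_value(X, y, n_train, n_splits, n_perm, seed=0):
    """Observed error and the fraction of label permutations doing as well."""
    rng = random.Random(seed)
    observed = split_error(X, y, n_train, n_splits, rng)
    hits = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # break any link between profiles and labels
        if split_error(X, y_perm, n_train, n_splits, rng) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `permutation_p_value(X, y, n_train=30, n_splits=100, n_perm=999)` on a list of expression vectors `X` and labels `y` would return one (error, p-value) pair, comparable in spirit to one cell pair of the table.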
Figure 1. Error rate of a) WVA, b) RLS and c) SVM classifiers varying the training set size.
Figure 2. Estimated statistical significance for different training set sizes using WVA, RLS and SVM classifiers.
Figure 3. Number of genes more highly expressed in a) normal and b) tumor tissues determined in the actual data set (observed curve) and in data sets with randomly permuted class labels (1% and 5% curves) for different values of the T statistic.
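The comparison described in the Figure 3 caption can be sketched as follows: count the genes whose T statistic exceeds a threshold in the real data, then compare that count with the counts obtained after randomly permuting the class labels. This is a minimal illustration under stated assumptions, not the authors' code. The Welch-style T statistic, the label coding (0 for normal, 1 for tumor), the function names and the reading of the "1% and 5% curves" as upper empirical quantiles of the permuted counts are all assumptions.

```python
import random
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """Welch-style T statistic; positive when mean(a) > mean(b)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0  # zero-variance guard

def count_genes_above(X, y, threshold):
    """Number of genes with T(normal vs tumor) above the threshold.

    X is a list of samples (one expression vector each); y holds 0 or 1.
    """
    count = 0
    for gene in zip(*X):  # iterate over columns, i.e. genes
        normal = [v for v, lab in zip(gene, y) if lab == 0]
        tumor = [v for v, lab in zip(gene, y) if lab == 1]
        if t_statistic(normal, tumor) > threshold:
            count += 1
    return count

def permuted_count_quantile(X, y, threshold, n_perm, level, seed=0):
    """Approximate count exceeded in only a fraction `level` of permutations."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        counts.append(count_genes_above(X, y_perm, threshold))
    counts.sort()
    k = min(n_perm - 1, int((1 - level) * n_perm))
    return counts[k]
```

An observed count well above `permuted_count_quantile(..., level=0.01)` at a given threshold would suggest that more genes separate the classes than label noise alone would produce.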
Error rate e and p-value p of classifiers trained with a fixed number of examples and a different number of genes.
| Number of genes | WVA e | WVA p | RLS e | RLS p | SVM e | SVM p |
| 22283 | 21% | 0.045 | 14% | 0.027 | 11% | 0.019 |
| 16384 | 20% | 0.065 | 14% | 0.021 | 11% | 0.025 |
| 8192 | 18% | 0.073 | 14% | 0.034 | 14% | 0.039 |
| 4096 | 16% | 0.116 | 14% | 0.021 | 14% | 0.039 |
| 2048 | 15% | 0.168 | 14% | 0.034 | 14% | 0.033 |
| 1024 | 14% | 0.216 | 13% | 0.024 | 13% | 0.040 |
| 512 | 13% | 0.118 | 13% | 0.028 | 14% | 0.033 |
| 256 | 13% | 0.127 | 13% | 0.040 | 14% | 0.025 |
| 128 | 13% | 0.139 | 13% | 0.036 | 14% | 0.013 |
| 64 | 13% | 0.142 | 13% | 0.036 | 14% | 0.022 |
| 32 | 13% | 0.131 | 13% | 0.022 | 14% | 0.031 |
| 16 | 14% | 0.242 | 13% | 0.030 | 14% | 0.040 |
| 8 | 15% | 0.202 | 14% | 0.029 | 14% | 0.041 |
| 4 | 16% | 0.165 | 14% | 0.041 | 16% | 0.031 |
| 2 | 19% | 0.213 | 16% | 0.046 | 16% | 0.041 |
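The gene counts in the table above run from the full set of 22283 down through decreasing powers of two. A sketch of how such a schedule and a top-g gene selection might be produced is given below. It is illustrative only: ranking by the absolute value of a Welch-style T statistic, the 0/1 label coding and all function names are assumptions, not the authors' stated method. One point worth keeping in any such scheme is that the ranking should be computed on the training examples alone, since ranking on the full data set leaks test information into the error estimate.

```python
from math import sqrt
from statistics import mean, variance

def gene_count_schedule(n_genes):
    """Full gene count, then decreasing powers of two down to 2."""
    sizes = [n_genes]
    g = 1
    while g * 2 < n_genes:  # largest power of two strictly below n_genes
        g *= 2
    while g >= 2:
        sizes.append(g)
        g //= 2
    return sizes

def rank_genes(X_train, y_train):
    """Gene indices sorted by decreasing |T| computed on training data only."""
    scores = []
    for j, gene in enumerate(zip(*X_train)):
        a = [v for v, lab in zip(gene, y_train) if lab == 0]
        b = [v for v, lab in zip(gene, y_train) if lab == 1]
        se = sqrt(variance(a) / len(a) + variance(b) / len(b))
        score = abs(mean(a) - mean(b)) / se if se > 0 else 0.0
        scores.append((score, j))
    scores.sort(key=lambda s: -s[0])  # stable sort keeps ties in index order
    return [j for _, j in scores]

def select_top_genes(X, order, g):
    """Restrict every sample in X to the g best-ranked genes."""
    keep = order[:g]
    return [[row[j] for j in keep] for row in X]
```

For each `g` in `gene_count_schedule(22283)`, a classifier would be trained on `select_top_genes(X_train, order, g)` and tested on the same columns of the held-out samples.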
Figure 4. Frequency analysis of the genes selected. a) Frequencies of all the genes in the top g = 2048 positions in the sorted gene list. The frequencies of the highly expressed genes in normal and tumor specimens are indicated with HN and HT respectively. b) Number of genes with frequency ≥ 80% and c) the number of genes with a given p-value.
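The frequency analysis in the Figure 4 caption can be illustrated with a short sketch: given one sorted gene list per resampling of the data, compute how often each gene lands in the top g positions and keep those at or above a frequency cut-off such as 80%. The function names and the input layout (a list of rankings, each a list of gene identifiers without duplicates) are assumptions made for illustration.

```python
from collections import Counter

def selection_frequencies(rankings, g):
    """Fraction of rankings in which each gene appears in the top g positions."""
    counts = Counter()
    for ranking in rankings:
        counts.update(ranking[:g])  # assumes no duplicate genes in a ranking
    n = len(rankings)
    return {gene: c / n for gene, c in counts.items()}

def stable_genes(rankings, g, min_freq=0.8):
    """Genes selected in at least `min_freq` of the rankings, sorted by name."""
    freqs = selection_frequencies(rankings, g)
    return sorted(gene for gene, f in freqs.items() if f >= min_freq)
```

With g = 2048 and one ranking per random training set, `stable_genes` would return the genes that are consistently ranked near the top regardless of which specimens were drawn.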
45 genes up-regulated in tumoral tissue, comparing normal mucosa to matched tumor colon tissue.
| Function | Gene | OMIM | Accession no. | p-value | Gene description |
| Cell cycle: mitosis (spindle checkpoint) | TTK | 604092 | | 0.029 | Threonine-tyrosine kinase |
| | BUB1 | 602452 | | 0.035 | Budding uninhibited by benzimidazoles 1 homolog (yeast) |
| | BUB3 | 603719 | | 0.037 | Budding uninhibited by benzimidazoles 3 homolog (yeast) |
| | CDC20 | 603618 | | 0.044 | Cell division cycle 20 |
| | MAD2L1 | 602686 | | 0.049 | MAD2 (mitotic arrest deficient, yeast, homolog) like-1 |
| | BUB1B | 602860 | | 0.050 | Budding uninhibited by benzimidazoles 1 homolog beta (yeast) |
| Cell cycle: G0/G1 transition | INSIG1 | 602055 | | 0.039 | Insulin induced gene 1 (cell division cycle, G0 to G1) |
| Cell cycle: mitosis (G1/S checkpoint) | CKS2 | 116901 | | 0.047 | CDC28 protein kinase regulatory subunit 2 |
| | CKS1B | 116900 | | 0.046 | CDC28 protein kinase regulatory subunit 1B |
| | SKP2 | 601436 | | 0.050 | S-phase kinase-associated protein 2 (p45) |
| | FOXM1 | 602341 | | 0.045 | Forkhead box M1 |
| | MCM4 | 602638 | | 0.036 | Minichromosome maintenance deficient (S. cerevisiae) 4 |
| | MCM3 | 602693 | | 0.048 | Minichromosome maintenance deficient (S. cerevisiae) 3 |
| | MCM7 | 600592 | | 0.048 | Minichromosome maintenance deficient 7 (S. cerevisiae) |
| | MCM2 | 116945 | | 0.049 | Minichromosome maintenance deficient (S. cerevisiae) 2 |
| | MCM6 | 601806 | | 0.050 | Minichromosome maintenance deficient (S. pombe) 6 |
| Cell cycle: mitosis (G1/S and G2/M checkpoints) | CRKRS | | | 0.039 | Cdc2-related kinase, arginine/serine-rich |
| | CDC2/CDK1 | 116940 | | 0.044 | Cell division cycle 2, G1 to S and G2 to M |
| | CDC25A | 116947 | | 0.050 | Cell division cycle 25A |
| | CDC25B | 116949 | | 0.050 | Cell division cycle 25B |
| | CCNA2 | 123835 | | 0.050 | Cyclin A2 |
| Cell cycle: mitosis (G2/M checkpoint) | CCNB1 | 123836 | | 0.047 | Cyclin B1 (cell division cycle, G2 to M) |
| | CCNB2 | 602755 | | 0.047 | Cyclin B2 (cell division cycle, G2 to M) |
| | NEK2 | 604043 | | 0.037 | NIMA (never in mitosis gene a)-related kinase 2 |
| Cell cycle: mitosis | STK15 | 602687 | | 0.039 | Serine/threonine kinase 6 (chr segregation) |
| | SRPK1 | 601939 | | 0.046 | SFRS protein kinase 1 (chr segregation) |
| | TOP2A | 126430 | | 0.050 | Topoisomerase (DNA) II alpha (170 kD) (chr segregation) |
| | KIF4A | 300521 | | 0.035 | Kinesin family member 4A (spindle formation/chr condensation) |
| | CNAP1 | 609689 | | 0.046 | Chromosome condensation-related SMC-associated protein 1 |
| | SMC4L1 | | | 0.048 | SMC4 structural maintenance of chromosomes 4-like 1 (yeast) |
| | HCAP-G | 606280 | | 0.042 | Chromosome condensation protein G (chr condensation) |
| Signal transduction | TDGF1 | 187395 | | 0.048 | Teratocarcinoma-derived growth factor 1 (EGF signaling) |
| | ENC1 | 605173 | | 0.048 | Pig 10, ectodermal-neural cortex (WNT/beta-catenin pathway) |
| Transcription | SOX9 | 608160 | | 0.045 | Sex determining region Y-box 9 |
| | MYC | 190080 | | 0.047 | V-myc avian myelocytomatosis viral oncogene homolog |
| | HGFR/MET | 164860 | | 0.047 | Met proto-oncogene |
| Transport: intracellular | NUP62 | 605815 | | 0.039 | Nucleoporin 62 kD |
| | NUPL1 | 607615 | | 0.050 | Nucleoporin-like 1 |
| | NUP155 | 606694 | | 0.045 | Nucleoporin 155 kD (NUP155) |
| | KPNA2 | 600685 | | 0.045 | Karyopherin alpha 2 (RAG cohort 1, importin alpha 1) |
| | RANBP5 | 602008 | | 0.050 | RAN binding protein 5 or karyopherin (importin) beta 3 |
| | CSE1L/CAS | 601342 | | 0.050 | CSE1 chromosome segregation 1-like (yeast) |
| | NXT1 | 605811 | | 0.050 | Nuclear transport factor 2 (NTF2) |
| | RANBP1 | 601180 | | 0.048 | RAN binding protein 1 |
| Transport | SLCO4A1 | 605495 | | 0.048 | Solute carrier family 21 (organic anion transporter) |
47 genes down-regulated in tumoral tissue, comparing normal mucosa to matched tumor colon tissue.
| Function | Gene | OMIM | Accession no. | p-value | Gene description |
| Apoptosis | PDCD4 | 608610 | | 0.032 | Programmed cell death 4 (neoplastic transformation inhibitor) |
| | FAS | 604306 | | 0.044 | Fas (TNF receptor superfamily, member 6) |
| | CASP7 | 601761 | | 0.050 | Caspase 7, apoptosis-related cysteine protease |
| Transport | SLC30A10 | | | 0.036 | Solute carrier family 30, member 10 (zinc transport?) |
| | SLC9A2 | 600530 | | 0.041 | Solute carrier family 9 (sodium/hydrogen exchanger), member 2 |
| | SLC4A4 | 603345 | | 0.041 | Solute carrier family 4, sodium bicarbonate cotransporter, member 4 |
| | SLC26A3 | 126650 | | 0.044 | Solute carrier family 26, member 3 |
| | SLC26A2 | 606718 | | 0.044 | Solute carrier family 26 (sulfate transporter), member 2 |
| | SGK2 | 607589 | | 0.038 | Serum glucocorticoid regul. kinase 2 (potassium channel activation) |
| | KIF5C | 604593 | | 0.040 | Kinesin family member 5C (intracellular transport) |
| | KIF13B | 607350 | | 0.046 | Kinesin family member 13B (intracellular transport) |
| | VAPA | 605703 | | 0.047 | VAMP (vesicle-associated membrane protein)-assoc. protein A, 33 kDa |
| Signalling | MAP2K4 | 601335 | | 0.033 | Mitogen-activated protein kinase kinase 4 (MAPK signaling pathway) |
| | RPS6KA5 | 603608 | | 0.040 | Ribos. prot. S6 kinase, 90 kDa, polyp. 5 (MAPK signalling pathway) |
| | MEF2C | 600662 | | 0.033 | MADS box transcr. enhancer factor 2 (MAPK signalling pathway) |
| | PPP2R3A | 604944 | | 0.037 | Protein phosphatase 2, regulatory subunit B, alpha (Wnt signalling) |
| | PDE9A | 602973 | | 0.040 | Phosphodiesterase 9A (signal transduction) |
| | PPAP2A | 607124 | | 0.042 | Phosphatidic acid phosphatase type 2A (signal transduction) |
| | MUC4 | 158372 | | 0.044 | Mucin 4 (Erb2 signalling pathway) |
| | DSCR1 | 602917 | | 0.045 | Down syndrome critical region gene 1 (signal transduction) |
| | SHOC2 | 602775 | | 0.046 | Soc-2 suppressor of clear homolog (MAPK signaling pathway) |
| | SOCS2 | 605117 | | 0.049 | Suppressor of cytokine signaling 2 (GH/IGF1 signaling pathway) |
| | SMAD2 | 601366 | | 0.049 | SMAD, homolog 2 (Drosophila) (TGF-beta signaling) |
| Cell-surface signalling | TSPAN7 | 300096 | | 0.036 | Tetraspanin 7 |
| | EDG2 | 602282 | | 0.041 | Lysophosphatidic acid G-protein-coupled receptor, 2 |
| | TMPRSS2 | 602060 | | 0.046 | Transmembrane protease, serine 2 |
| | CEACAM7 | | | 0.047 | Carcinoembryonic antigen-related cell adhesion molecule 7 |
| Cell adhesion | DSC2 | 125645 | | 0.045 | Desmocollin 2 |
| Cell differentiation | NDRG2 | 605272 | | 0.038 | NDRG family member 2 |
| | EPB41L3 | 605331 | | 0.044 | Erythrocyte membrane protein band 4.1-like 3 (suppressor gene?) |
| | MTUS1 | 609589 | | 0.045 | Mitochondrial tumor suppressor 1 |
| Metabolism | HMGCL | 246450 | | 0.040 | 3-hydroxymethyl-3-methylglutaryl-Coenzyme A lyase |
| | UGDH | 603370 | | 0.041 | UDP-glucose dehydrogenase |
| | CA12 | 603263 | | 0.044 | Carbonic anhydrase XII |
| | CA2 | 259730 | | 0.049 | Carbonic anhydrase II |
| | CA4 | 114760 | | 0.050 | Carbonic anhydrase IV |
| | CA1 | 114800 | | 0.050 | Carbonic anhydrase I |
| | CA7 | 114770 | | 0.050 | Carbonic anhydrase VII |
| | HPGD | 601688 | | 0.046 | Hydroxyprostaglandin dehydrogenase 15-(NAD) |
| | FUCA1 | 230000 | | 0.047 | Fucosidase, alpha-L-1, tissue |
| | ACAT1 | 607809 | | 0.048 | Acetyl-Coenzyme A acetyltransferase 1 |
| | ADH1C | 103730 | | 0.048 | Alcohol dehydrogenase 3 (class I), gamma polypeptide |
| | AQP8 | 603750 | | 0.050 | Aquaporin 8 |
| Cell growth | FAM107A | 608295 | | 0.040 | Family with sequence similarity 107, member A (TU3A) |
| | EMP1 | 602333 | | 0.047 | Epithelial membrane protein 1 (growth arrest) |
| | BTG1 | 109580 | | 0.050 | B-cell translocation gene 1, anti-proliferative |
| | KLF4 | 602253 | | 0.050 | Kruppel-like factor 4 (gut) |