Chi-Ming Chu, Chung-Tay Yao, Yu-Tien Chang, Hsiu-Ling Chou, Yu-Ching Chou, Kang-Hua Chen, Harn-Jing Terng, Chi-Shuan Huang, Chia-Cheng Lee, Sui-Lun Su, Yao-Chi Liu, Fu-Gong Lin, Thomas Wetter, Chi-Wen Chang.
Abstract
BACKGROUND: Microarray technology shows great potential, but previous studies in colorectal cancer (CRC) research were limited by small sample sizes. This study aims to investigate the gene expression profile of CRCs by pooling cDNA microarray datasets and analyzing them with PAM, ANN, and decision trees (CART and C5.0).
Year: 2014 PMID: 24959000 PMCID: PMC4055246 DOI: 10.1155/2014/634123
Source DB: PubMed Journal: Dis Markers ISSN: 0278-0240 Impact factor: 3.434
Figure 1: (a) Flow chart of the research process for the datasets and their analysis. (b) Diagram of the methods used to identify candidate genes and establish prediction models.
Descriptive statistics of study samples.
| Variables | Normal mucosa (n) | Normal mucosa (%) | Colorectal tumors (n) | Colorectal tumors (%) | Logistic regression OR | Logistic regression P |
|---|---|---|---|---|---|---|
| Gender | | | | | | |
| Female | 10 | 48 | 331 | 45 | 1 | |
| Male | 11 | 52 | 407 | 55 | 1.12 | 0.8 |
| Age | | | | | | |
| ≤60 | 17 | 81 | 240 | 33 | 1 | |
| >60 | 4 | 19 | 461 | 62 | 8.16 | *** |
| Race | | | | | | |
| European | 14 | 16 | 321 | 27 | 1 | |
| Han | 38 | 43 | 177 | 15 | 0.21 | *** |
| Australia | 32 | 36 | 389 | 33 | 0.54 | 0.05 |
| USA | 4 | 5 | 299 | 25 | 3.26 | * |
| Location of tissues | | | | | | |
| Proximal | 5 | 11 | 32 | 14 | 1 | |
| Distal | 42 | 89 | 193 | 86 | 0.72 | 0.52 |
OR: odds ratio; *P < 0.05; ***P < 0.001. Proximal position: cecum, ascending colon, hepatic flexure, transverse colon, and splenic flexure. Distal position: descending colon, sigmoid colon, and rectum.
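As a rough check on the table above, the sketch below recomputes two of the odds ratios directly from the listed counts. It assumes the reported ORs for gender and age are unadjusted, in which case a logistic regression with a single binary predictor gives the same value as the cross-product ratio. The function name `odds_ratio` and its parameter names are illustrative, not taken from the paper.

```python
def odds_ratio(ref_normal, ref_tumor, grp_normal, grp_tumor):
    """Unadjusted odds ratio of tumor vs. normal for a group,
    relative to a reference group (assumption: no covariate adjustment)."""
    group_odds = grp_tumor / grp_normal
    reference_odds = ref_tumor / ref_normal
    return group_odds / reference_odds


# Counts copied from the table: reference rows are Female and <=60.
gender_or = odds_ratio(ref_normal=10, ref_tumor=331, grp_normal=11, grp_tumor=407)
age_or = odds_ratio(ref_normal=17, ref_tumor=240, grp_normal=4, grp_tumor=461)

print(round(gender_or, 2))  # 1.12
print(round(age_or, 2))     # 8.16
```

Both values agree with the OR column for Male and >60. Other rows may not reproduce exactly if the original model was adjusted or rounded differently.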
The 55 differentially expressed genes in colorectal tumors identified in the primary screening.
| Accession number | Gene symbol | Gene name | Chromosome | Fold change* |
|---|---|---|---|---|
| HGNC: 74 | | ATP-binding cassette, subfamily G (white), member 2 | 4q22 | −6.26 |
| HGNC: 22204 | | Adenosylhomocysteinase-like 2 | 7q32.1 | −5.82 |
| HGNC: 642 | | Aquaporin 8 | 16p12 | −6.56 |
| HGNC: 17107 | | Bestrophin 2 | 19p13.2 | −4.52 |
| HGNC: 1143 | | Butyrophilin-like 3 | 5q35.3 | −6.03 |
| HGNC: 21214 | | Chromosome 6 open reading frame 105 | 6p24.1 | −7.31 |
| HGNC: 28180 | | Chromosome 6 open reading frame 105 | 9q31.1 | −4.23 |
| HGNC: 1368 | | Carbonic anhydrase I | 8q21.2 | −6.22 |
| HGNC: 1375 | | Carbonic anhydrase IV | 17q23 | −3.95 |
| HGNC: 1381 | | Carbonic anhydrase VII | 16q22.1 | −5.39 |
| HGNC: 30072 | | CD177 molecule | 19q13.2 | −4.65 |
| HGNC: 1762 | | Cadherin 3, type 1, | 16q22.1 | 5.49 |
| ENSG00000166869 | | Calcineurin B homologous protein 2 | 16p12.2 | −6.23 |
| HGNC: 1973 | | Carbohydrate ( | 16q22.3 | −3.83 |
| HGNC: 2015 | | Chloride channel accessory 1 | 1p22.3 | −5.68 |
| HGNC: 2018 | | Chloride channel accessory 4 | 1p31-p22 | −4.08 |
| HGNC: 2032 | | Claudin 1 | 3q28-q29 | 5.36 |
| HGNC: 2050 | | Claudin 8 | 21q22.11 | −3.69 |
| HGNC: 2311 | | Carboxypeptidase M | 12q14.3 | −3.64 |
| HGNC: 26133 | | Cell wall biogenesis 43 C-terminal homolog ( | 4p11 | −4.59 |
| HGNC: 2765 | | Defensin, alpha 6, Paneth cell-specific | 8p23.1 | 5.95 |
| HGNC: 3178 | | Endothelin 3 | 20q13.2-q13.3 | −4.27 |
| HGNC: 23117 | | Family with sequence similarity 55, member D | 11q23.2 | −6.18 |
| HGNC: 13572 | | Fc fragment of IgG binding protein | 19q13.1 | −5.52 |
| HGNC: 4128 | | UDP- | 12q13 | 3.56 |
| HGNC: 4191 | | Glucagon | 2q36-q37 | −6.1 |
| HGNC: 4682 | | Guanylate cyclase activator 2A (guanylin) | 1p35-p34 | −4.32 |
| HGNC: 4683 | | Guanylate cyclase activator 2B (uroguanylin) | 1p34-p33 | −6.62 |
| HGNC: 4764 | | H3 histone, family 3A | 1q41 | −3.86 |
| HGNC: 5141 | | Haptoglobin | 16q22.1 | 11.72 |
| HGNC: 6019 | | Interleukin 6 receptor | 1q21 | −3.46 |
| HGNC: 29213 | | KIAA1199 | 15q24 | 4.43 |
| HGNC: 6359 | | Kallikrein-related peptidase 11 | 19q13.33 | 4.65 |
| HGNC: 7174 | | Matrix metallopeptidase 7 (matrilysin, uterine) | 11q21-q22 | 6.75 |
| HGNC: 13370 | | Membrane-spanning 4-domains, subfamily A, member 12 | 11q12 | −10.66 |
| HGNC: 14296 | | Metallothionein 1M | 16q13 | −3.94 |
| HGNC: 7512 | | Mucin 2, oligomeric mucus/gel-forming | 11p15.5 | −7.23 |
| HGNC: 7783 | | Nuclear factor (erythroid-derived 2)-like 3 | 7p15.2 | 3.02 |
| HGNC: 7978 | | Nuclear receptor subfamily 3, group C, member 1 (glucocorticoid receptor) | 5q31.3 | −3.2 |
| HGNC: 7979 | | Nuclear receptor subfamily 3, group C, member 2 | 4q31.1 | −5.08 |
| HGNC: 8062 | | Nucleoporin 153kDa | 6p22.3 | 5.9 |
| HGNC: 9748 | | Peptide YY | 17q21.1 | −4.35 |
| HGNC: 10600 | | Sodium channel, nonvoltage-gated 1, beta | 16p12.2-p12.1 | −3.28 |
| HGNC: 3018 | | Solute carrier family 26, member 3 | 7q31 | −5.59 |
| HGNC: 25355 | | Solute carrier family 30, member 10 | 1q41 | −5.69 |
| HGNC: 11030 | | Solute carrier family 4, sodium bicarbonate cotransporter, member 4 | 4q21 | −5.14 |
| HGNC: 11063 | | Solute carrier family 7 (cationic amino acid transporter, y+ system), member 5 | 16q24.3 | 4.05 |
| HGNC: 11242 | | Spi-B transcription factor (Spi-1/PU.1 related) | 19q13.3-q13.4 | −4.24 |
| HGNC: 15464 | | Serine peptidase inhibitor, Kazal type 5 | 5q32 | −4.31 |
| HGNC: 11255 | | Secreted phosphoprotein 1 | 4q22.1 | 9.69 |
| HGNC: 11329 | | Somatostatin | 3q28 | −5.84 |
| HGNC: 11652 | | Transcobalamin I (vitamin B12 binding protein, R binder family) | 11q11-q12 | 8.66 |
| HGNC: 11799 | | Thyroid hormone receptor, beta (erythroblastic leukemia viral (v-erb-a) oncogene homolog 2, avian) | 3q24.2 | −4.25 |
| HGNC: 17995 | | Transient receptor potential cation channel, subfamily M, member 6 | 9q21.13 | −4.85 |
| HGNC: 30961 | | Zymogen granule protein 16 homolog (rat) | 16q11.2 | −3.94 |
*Fold change $= \log_2(g_{\mathrm{crc}} / g_{\mathrm{nm}})$, where $g_{\mathrm{crc}}$ is the average gene expression in colorectal tumors and $g_{\mathrm{nm}}$ is the average gene expression in normal mucosal tissues.
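The fold-change definition in the footnote can be illustrated with a minimal sketch. The expression values below are made up for the example, and the code assumes they are positive and on a linear (not already log-transformed) scale; the function name `log2_fold_change` is hypothetical.

```python
import math


def log2_fold_change(tumor_values, normal_values):
    """log2 of (mean tumor expression / mean normal expression)."""
    g_crc = sum(tumor_values) / len(tumor_values)    # average in colorectal tumors
    g_nm = sum(normal_values) / len(normal_values)   # average in normal mucosa
    return math.log2(g_crc / g_nm)


# Hypothetical expression values, not data from the study.
up_gene = log2_fold_change([40.0, 60.0, 92.0], [6.0, 8.0, 10.0])   # means 64 and 8
down_gene = log2_fold_change([1.0, 3.0], [14.0, 18.0])             # means 2 and 16

print(up_gene)    # 3.0  (8-fold higher in tumors)
print(down_gene)  # -3.0 (8-fold lower in tumors)
```

Under this definition a negative value in the table means lower average expression in tumors than in normal mucosa, and each unit corresponds to a factor of two.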
Figure 2: Test accuracy rates of the 4 approaches over 1,000 bootstrapping rounds for classifying colorectal tumors and normal mucosal tissues.
Spearman's correlations between the rank orders of the 55 significant genes produced by the PAM, ANN, CART, and C5.0 methods.
| Spearman's correlation | ANN | CARTΔ | C5.0Δ | C5.0_importance | PAM_centroid | PAMΔ |
|---|---|---|---|---|---|---|
| ANN | 1 | | | | | |
| CARTΔ | 0.42** | 1 | | | | |
| C5.0Δ | 0.48*** | 0.75*** | 1 | | | |
| C5.0_importance | 0.24 | 0.62*** | 0.73*** | 1 | | |
| PAM_centroid | 0.09 | 0.01 | 0.18 | 0.1 | 1 | |
| PAMΔ | 0.09 | 0.01 | 0.18 | 0.1 | 1.00*** | 1 |
*P value < 0.05, **P value < 0.01, ***P value < 0.001.
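To make the table concrete, here is a small sketch of how a Spearman correlation between two gene rankings can be computed: rank each method's scores, then take the Pearson correlation of the ranks. The scores are invented for illustration and the helper names are hypothetical; the paper does not state which software produced its values.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical importance scores for five genes under two methods.
method_a = [0.9, 0.7, 0.5, 0.3, 0.1]
method_b = [0.8, 0.9, 0.4, 0.2, 0.1]
rho = spearman(method_a, method_b)
print(round(rho, 2))  # 0.9: only the top two genes swap places
```

Because only ranks matter, any strictly increasing transformation of one method's scores leaves the coefficient unchanged, which is consistent with two score types from the same method correlating at 1.00.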
Figure 3: Stacked bar chart of the gene importance of the 55 genes. The percentile ranks of scores (RS%) of the 55 genes were derived from the PAM, ANN, CART, and C5.0 methods. A higher RS% indicates that the gene was more important for classifying colorectal tumors and normal mucosal tissues. CV: coefficient of variation. The RS% of genes under each method was calculated in one of four ways, marked by the suffixes Δ, importance, centroid, and #. Δ: the number of times each gene was significant across 1,000 bootstrapping rounds. Importance: the node location of the gene in the decision trees. Centroid: the centroid value of each gene in PAM. #: the relative importance value in ANN.
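The caption does not give the exact formula for RS% or CV, so the following is only a sketch under stated assumptions: the percentile rank is taken as the percentage of genes whose score is less than or equal to the given gene's score, and CV is the sample standard deviation divided by the mean, as a percentage. All scores and names below are hypothetical.

```python
import statistics


def percentile_ranks(scores):
    """Percent of scores that are <= each score (one common definition; an assumption here)."""
    n = len(scores)
    return [100.0 * sum(1 for other in scores if other <= s) / n for s in scores]


def coefficient_of_variation(values):
    """Sample standard deviation over the mean, as a percentage."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical raw scores for four genes under two methods on different scales.
scores_by_method = {
    "method_a": [10, 40, 20, 30],
    "method_b": [0.4, 0.3, 0.1, 0.2],
}

# Converting to RS% puts both methods on a common 0-100 scale.
rs = {name: percentile_ranks(s) for name, s in scores_by_method.items()}
print(rs["method_a"])  # [25.0, 100.0, 50.0, 75.0]
print(rs["method_b"])  # [100.0, 75.0, 25.0, 50.0]

# CV of the first gene's RS% across methods: large when methods disagree.
cv_gene0 = coefficient_of_variation([rs["method_a"][0], rs["method_b"][0]])
```

Rank-based scaling is what allows scores as different as bootstrap counts, tree node positions, and centroid values to be stacked in a single bar per gene.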