Wen-Kuei Chien, Chuhsing Kate Hsiao.
Abstract
Recent advances in microarray technologies have led to the collection of an enormous number of genetic markers in disease association studies, yet scientists are interested in selecting a smaller set of genes to explore the relation between genes and disease. Current approaches either adopt a single-marker test, which ignores possible interactions among genes, or use a multistage procedure that reduces the large number of genes before the association is evaluated. Among the latter, Bayesian analysis can further accommodate the correlation between genes through the specification of a multivariate prior distribution and can estimate the probabilities of association through latent variables. The covariance matrix, however, depends on an unknown parameter. In this research, we suggested a reference hyperprior distribution for this uncertainty, outlined the implementation of its computation, and illustrated this fully Bayesian approach with a colon cancer study and a leukemia study. A comparison with other existing methods was also conducted. The classification accuracy of our proposed model is higher with a smaller set of selected genes. The results not only replicated findings in several earlier studies but also quantified the strength of association with posterior probabilities.
Year: 2013 PMID: 24382981 PMCID: PMC3870637 DOI: 10.1155/2013/420412
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
The posterior inclusion probability and description of the leading 20 genes for the colon cancer study. Genes identified in other studies were also noted.
| Gene | Probability | Description |
|---|---|---|
| Z50753 | 0.1519 | |
| D14812 | 0.1303 | Human mRNA for ORF, complete cds<sup>b,c</sup> |
| H06524 | 0.1163 | Gelsolin precursor, plasma ( |
| R87126 | 0.1081 | Myosin heavy chain, nonmuscle ( |
| H08393 | 0.1012 | Collagen alpha-2(XI) chain ( |
| T62947 | 0.0987 | 60S ribosomal protein L24 ( |
| T57882 | 0.0881 | Myosin heavy chain, nonmuscle type A ( |
| R88740 | 0.0594 | ATP synthase coupling factor 6, mitochondrial precursor ( |
| J02854 | 0.0527 | Myosin regulatory light chain 2, smooth muscle isoform ( |
| T94579 | 0.0494 | Human chitotriosidase precursor mRNA, complete cds<sup>b</sup> |
| H64807 | 0.0490 | Placental folate transporter ( |
| M59040 | 0.0439 | Human cell adhesion molecule (CD44) mRNA, complete cds<sup>c</sup> |
| R55310 | 0.0437 | S36390 mitochondrial processing peptidase<sup>c</sup> |
| M82919 | 0.0333 | Human gamma aminobutyric acid (GABAA) receptor beta-3 subunit mRNA, complete cds<sup>b,c</sup> |
| H20709 | 0.0330 | Myosin light chain alkali, smooth-muscle isoform ( |
| T92451 | 0.0319 | Tropomyosin, fibroblast, and epithelial muscle-type ( |
| R33481 | 0.0312 | Transcription factors ATF-A and ATF-A-DELTA ( |
| L06175 | 0.0309 | |
| T64012 | 0.0309 | Acetylcholine receptor protein, delta chain precursor ( |
| H09719 | 0.0300 | Tubulin alpha-6 chain ( |

<sup>a</sup> Gene also identified in Ben-Dor et al. [38].
<sup>b</sup> Gene also identified in Furlanello et al. [39].
<sup>c</sup> Gene also identified in Chu et al. [40].
Figure 1. The largest 100 posterior probabilities of the genes for the colon cancer study.
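The tables and figures rank genes by posterior inclusion probability. As a rough illustration only, the sketch below assumes that inclusion is tracked by a binary indicator per gene sampled over MCMC iterations, so that each probability is estimated as the fraction of draws in which the indicator equals 1. This excerpt does not spell out the sampler, so the function names (`inclusion_probabilities`, `top_genes`) and the toy draws are hypothetical.

```python
def inclusion_probabilities(draws):
    """Estimate per-gene inclusion probabilities.

    draws: list of MCMC draws, each a list of 0/1 indicators (one per gene).
    Returns the fraction of draws in which each gene's indicator is 1.
    """
    n_draws = len(draws)
    n_genes = len(draws[0])
    return [sum(row[j] for row in draws) / n_draws for j in range(n_genes)]


def top_genes(gene_ids, probs, k):
    """Return the k genes with the largest probabilities, largest first."""
    ranked = sorted(zip(gene_ids, probs), key=lambda pair: -pair[1])
    return ranked[:k]


# Hypothetical toy data: 4 draws for 3 genes (not taken from the study).
toy_draws = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]
toy_probs = inclusion_probabilities(toy_draws)
toy_ranked = top_genes(["g1", "g2", "g3"], toy_probs, 2)
```

Sorting by the estimated probability and keeping the leading entries mirrors how a "leading 20 genes" table could be assembled from such output.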
Performance comparison of different procedures with LOOCV for the colon cancer study.
| Methods | No. of genes | LOOCV error rate | LOOCV accuracy |
|---|---|---|---|
| Bayesian | 6 | 0.1452 (9/62) | 0.8548 (53/62) |
| Bayesian | 10 | 0.1452 (9/62) | 0.8548 (53/62) |
| Bayesian | 14 | | |
| SVM<sup>a</sup> | 1000 | | |
| Classification tree<sup>b</sup> | 200 | 0.1452 (9/62) | 0.8548 (53/62) |
| 1-Nearest-neighbor<sup>b</sup> | 25 | 0.1452 (9/62) | 0.8548 (53/62) |
| LogitBoost, estimated<sup>b</sup> | 25 | 0.1935 (12/62) | 0.8065 (50/62) |
| LogitBoost, 100 iterations<sup>b</sup> | 10 | 0.1452 (9/62) | 0.8548 (53/62) |
| AdaBoost, 100 iterations<sup>b</sup> | 10 | 0.1613 (10/62) | 0.8387 (52/62) |
| MAVE-LD<sup>c</sup> | 50 | 0.1613 (10/62) | 0.8387 (52/62) |
| IRWPLS<sup>d</sup> | 20 | | |
| SGLasso<sup>e</sup> | 19 | 0.1290 (8/62) | 0.8710 (54/62) |
| MRMS + SVM + D1<sup>f</sup> | 5 | 0.1290 (8/62) | 0.8710 (54/62) |
| MRMS + SVM + D2<sup>f</sup> | 33 | 0.1452 (9/62) | 0.8548 (53/62) |
| | 6 | 0.1452 (9/62) | 0.8548 (53/62) |
| | 10 | 0.1774 (11/62) | 0.8226 (51/62) |
| | 14 | 0.2258 (14/62) | 0.7742 (48/62) |

<sup>a</sup> Proposed by Furey et al. [41].
<sup>b</sup> Proposed by Dettling and Bühlmann [42].
<sup>c</sup> Proposed by Antoniadis et al. [43].
<sup>d</sup> Proposed by Ding and Gentleman [44].
<sup>e</sup> Proposed by Ma et al. [45].
<sup>f</sup> Proposed by Maji and Paul [46].
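The error rates in the table are counts of misclassified samples divided by the 62 samples, each held out once. The sketch below shows that arithmetic together with a generic leave-one-out loop. The 1-nearest-neighbor rule is only a stand-in classifier chosen for brevity, not the Bayesian procedure evaluated in the table, and all function names and toy data are hypothetical.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    """Predict the label of x as the label of its closest training point."""
    closest = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[closest]


def loocv_errors(xs, ys, predict):
    """Count misclassifications when each sample is held out in turn."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors


def rates(errors, n_samples):
    """Return (error rate, accuracy), rounded to four decimals as in the table."""
    return round(errors / n_samples, 4), round((n_samples - errors) / n_samples, 4)


# Hypothetical one-feature toy data; the last point sits nearer the other class.
toy_x = [[0.0], [0.1], [1.0], [1.1], [0.6]]
toy_y = [0, 0, 1, 1, 0]
toy_errors = loocv_errors(toy_x, toy_y, nearest_neighbor_predict)

# Reproduces a table entry: 9 errors out of 62 samples.
table_row = rates(9, 62)
```

Here `rates(9, 62)` gives the pair reported as 0.1452 (9/62) and 0.8548 (53/62).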
The posterior inclusion probability and description of the leading 20 genes for the leukemia study. Genes identified in other studies were also noted.
| Gene | Probability | Description |
|---|---|---|
| X95735 | 0.0691 | Zyxin<sup>a,b,c</sup> |
| M27891 | 0.0519 | CST3 cystatin C (amyloid angiopathy and cerebral hemorrhage)<sup>a,b,c</sup> |
| M23197 | 0.0302 | CD33 CD33 antigen (differentiation antigen)<sup>a,b,c</sup> |
| Y12670 | 0.0251 | LEPR leptin receptor<sup>a</sup> |
| X85116 | 0.0226 | Epb72 gene exon 1<sup>a,b</sup> |
| D88422 | 0.0196 | CYSTATIN A<sup>b,c</sup> |
| X62654 | 0.0196 | ME491 gene extracted from |
| X04085 | 0.0195 | Catalase (EC 1.11.1.6) 5′ flank and exon 1 mapping to chromosome 11, band p13 (and joined CDS)<sup>a</sup> |
| L09209 | 0.0195 | APLP2 amyloid beta (A4) precursor-like protein 2<sup>b,c</sup> |
| HG1612-HT1612 | 0.0186 | Macmarcks<sup>b,c</sup> |
| M16038 | 0.0186 | LYN V-yes-1 Yamaguchi sarcoma viral related oncogene homolog<sup>a,b,c</sup> |
| U50136 | 0.0181 | Leukotriene C4 synthase (LTC4S) gene<sup>a,b</sup> |
| M55150 | 0.0172 | FAH fumarylacetoacetate<sup>a,b</sup> |
| M92287 | 0.0172 | CCND3 cyclin D3<sup>b,c</sup> |
| M22960 | 0.0168 | PPGB protective protein for beta-galactosidase (galactosialidosis)<sup>b,c</sup> |
| X70297 | 0.0168 | CHRNA7 cholinergic receptor, nicotinic, and alpha polypeptide 7<sup>b</sup> |
| X51521 | 0.0163 | VIL2 Villin 2 (ezrin)<sup>b</sup> |
| M63138 | 0.0154 | CTSD cathepsin D (lysosomal aspartyl protease)<sup>a,b</sup> |
| M27783 | 0.0154 | ELA2 elastase 2, neutrophil<sup>c</sup> |
| U81554 | 0.0137 | CaM kinase II isoform mRNA |

<sup>a</sup> Gene also identified in Golub et al. [36].
<sup>b</sup> Gene also identified in Ben-Dor et al. [38].
<sup>c</sup> Gene also identified in Lee et al. [22].
Figure 2. The largest 100 posterior probabilities of the genes for the leukemia study.
Performance comparison of different procedures for the leukemia study.
| Methods | No. of genes | Testing error rate | Testing accuracy |
|---|---|---|---|
| Bayesian | 6 | | |
| Bayesian | 10 | 0.0588 (2/34) | 0.9412 (32/34) |
| Bayesian | 14 | 0.0588 (2/34) | 0.9412 (32/34) |
| Weighted voting machine<sup>a</sup> | 50 | 0.1471 (5/34) | 0.8529 (29/34) |
| MAVE-LD<sup>b</sup> | 50 | | |
| Two-step EBM<sup>c</sup> | 32 | 0.1471 (5/34) | 0.8529 (29/34) |
| Two-step EBM<sup>c</sup> | 256 | 0.0588 (2/34) | 0.9412 (32/34) |
| Two-step EBM<sup>c</sup> | 512 | | |
| KIGP + PK<sup>d</sup> | 20 | 0.0588 (2/34) | 0.9412 (32/34) |
| | 6 | 0.1765 (6/34) | 0.8235 (28/34) |
| | 10 | 0.0882 (3/34) | 0.9118 (31/34) |
| | 14 | 0.1176 (4/34) | 0.8824 (30/34) |

<sup>a</sup> Proposed by Golub et al. [36].
<sup>b</sup> Proposed by Antoniadis et al. [43].
<sup>c</sup> Proposed by Ji et al. [47].
<sup>d</sup> Proposed by Zhao and Cheung [48].
Performance comparison of different procedures with LOOCV for the colon cancer study.
| Methods | No. of genes | LOOCV error rate | LOOCV accuracy |
|---|---|---|---|
| Bayesian | 5 | 0.0390 (3/77) | 0.9610 (74/77) |
| Bayesian | 6 | 0.0519 (4/77) | 0.9481 (73/77) |
| Bayesian | 10 | 0.0649 (5/77) | 0.9351 (72/77) |
| Bayesian | 14 | 0.0779 (6/77) | 0.9221 (71/77) |
| Bayesian | 20 | 0.0779 (6/77) | 0.9221 (71/77) |
| HBE | 6 | 0.0390 (3/77) | 0.9610 (74/77) |
| | 6 | 0.1169 (9/77) | 0.8831 (68/77) |
| | 10 | 0.1558 (12/77) | 0.8442 (65/77) |
| | 14 | 0.2208 (17/77) | 0.7792 (60/77) |
Figure 3. The accuracy of the proposed procedure at different numbers (p* = 5, …, 20) of selected genes with c following the generalized g-prior (pink line) or fixed at constant 5 (red line), 10 (blue), or 20 (black) for the colon cancer study.
Figure 4. The accuracy of the proposed procedure at different numbers (p* = 5, …, 20) of selected genes with c following the generalized g-prior (pink line) or fixed at constant 5 (red line), 10 (blue), or 20 (black) for the leukemia study.
Figure 5. Average accuracy when the number of genes ranges from 1 to 15 under the mixtures of g-priors on c (pink line), c fixed at 5 (black), c at 50 (red), and c at 500 (blue).