Xixi Xiang1, Yu-Ping Wang2, Hongbao Cao3,4, Xi Zhang1.
Abstract
Objective: To investigate whether previously curated chronic lymphocytic leukemia (CLL) risk genes could be leveraged in gene marker selection for the diagnosis and prediction of CLL.

Methods: A CLL genetic database (CLL_042017) was developed through a comprehensive CLL-gene relation data analysis, in which 753 CLL target genes were curated. Expression values for these genes were used for case-control classification of four CLL datasets, with a sparse representation-based variable selection (SRVS) approach employed for feature (gene) selection. Results were compared with outcomes obtained by using analysis of variance (ANOVA)-based gene selection approaches.

Results: For each of the four datasets, SRVS selected a subset of genes from the 753 CLL target genes, resulting in significantly higher classification accuracy, compared with randomly selected genes (100%, 100%, 93.94%, 89.39%). The SRVS method outperformed ANOVA in terms of classification accuracy.

Conclusion: Gene markers selected from the 753 CLL genes could enable significantly greater accuracy in the prediction of CLL. SRVS provides an effective method for gene marker selection.
Keywords: Chronic lymphocytic leukemia (CLL); case-control classification; disease prediction; gene markers; genetic databases; sparse representation; variable selection
Year: 2018 PMID: 29996709 PMCID: PMC6134680 DOI: 10.1177/0300060518783072
Source DB: PubMed Journal: J Int Med Res ISSN: 0300-0605 Impact factor: 1.671
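The abstract uses ANOVA-based gene selection as the baseline for SRVS. As a rough illustration of that kind of baseline, the sketch below computes a one-way ANOVA F statistic per gene for a case/control split and ranks genes by it. For fixed group sizes, a larger F corresponds to a smaller p-value, so sorting by decreasing F gives the same order as sorting by increasing p-value. The function names, gene names, and expression values are invented for this example and are not taken from the paper.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of floats)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        # Degenerate case: no variation inside the groups.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes_by_f(expr, labels):
    """Rank genes by decreasing F statistic.

    expr:   {gene_name: [expression value per sample]}
    labels: list of 1 (case) / 0 (control), one per sample
    """
    scores = {}
    for gene, values in expr.items():
        cases = [v for v, y in zip(values, labels) if y == 1]
        controls = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = anova_f([cases, controls])
    return sorted(scores, key=scores.get, reverse=True)


# Toy data, invented for illustration only.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "GENE_A": [5.1, 4.9, 5.0, 1.0, 1.2, 0.9],  # separates cases from controls
    "GENE_B": [2.0, 3.0, 2.5, 2.4, 2.9, 2.1],  # similar in both groups
}
ranking = rank_genes_by_f(expr, labels)
```

Here `GENE_A` ranks ahead of `GENE_B` because its between-group variance dominates its within-group variance.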
Figure 1. Chronic lymphocytic leukemia (CLL) genetic database schematic.
Statistics of four gene expression datasets.
| NCBI GEO ID | GSE2466 | GSE19147 | GSE50006 | GSE8835 |
|---|---|---|---|---|
| #CLL case/control | 72/11 | 25/8 | 188/32 | 42/24 |
| #genes from CLL_042017 | 564 | 624 | 685 | 624 |
| Sample source | Peripheral blood lymphocytes | Peripheral blood CD3+ T cells | Leukemia cells | Peripheral blood CD4 T cells and CD8 T cells |
| Sample population | Austria | Germany | USA | USA |
Figure 2. Comparison of different metrics through leave-one-out (LOO) cross-validation. Genes were ranked in ascending order according to SRVSScore or PValueScore, for sparse representation-based variable selection (SRVS) or analysis of variance (ANOVA), respectively. (a) GSE2466, (b) GSE19147, (c) GSE50006 and (d) GSE8835.
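The procedure behind Figure 2, LOO cross-validation over a growing list of ranked genes, can be sketched as follows. This is a minimal illustration under stated assumptions: the nearest-centroid classifier is a stand-in chosen for brevity (the classifier used in the paper is not given in this excerpt), the gene ranking is taken as given rather than recomputed inside each fold, and the toy data are invented.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y):
    """Leave-one-out accuracy in percent.

    Each sample is held out once and predicted from the remaining samples.
    """
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return 100.0 * correct / len(X)


def accuracy_curve(X, y, ranked_columns):
    """LOO accuracy using the top-1, top-2, ..., top-k ranked gene columns."""
    curve = []
    for k in range(1, len(ranked_columns) + 1):
        cols = ranked_columns[:k]
        X_k = [[row[c] for c in cols] for row in X]
        curve.append(loo_accuracy(X_k, y))
    return curve


# Toy data, invented for illustration: column 0 separates the classes,
# column 1 is noise.
X = [[5.0, 2.0], [5.2, 3.0], [4.8, 2.5], [1.0, 2.4], [1.1, 2.9], [0.9, 2.1]]
y = [1, 1, 1, 0, 0, 0]
curve = accuracy_curve(X, y, ranked_columns=[0, 1])
```

The maximum of such a curve, and the number of genes at which it is reached, would correspond to the kind of per-dataset summary reported alongside the LOO results.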
LOO cross-validation and permutation results
| Metric | GSE2466 SRVS | GSE2466 ANOVA | GSE19147 SRVS | GSE19147 ANOVA | GSE50006 SRVS | GSE50006 ANOVA | GSE8835 SRVS | GSE8835 ANOVA |
|---|---|---|---|---|---|---|---|---|
| Case/control | 72/11 | 72/11 | 25/8 | 25/8 | 188/32 | 188/32 | 42/24 | 42/24 |
| MaxCRs | 100.00 | 100.00 | 100.00 | 93.94 | 98.64 | 98.18 | 89.39 | 84.85 |
| # Selected genes | 4 | 3 | 65 | 3 | 131 | 20 | 101 | 345 |
| p-value | 0.001 | 0.0002 | ∼0 | 0.0016 | 0.0014 | 0.0012 | ∼0 | ∼0 |
| Unique genes from all datasets (%) | 25% (1/4) | 66.67% (2/3) | 52.31% (34/65) | 33.33% (1/3) | 75.57% (99/131) | 40% (8/20) | 97.03% (98/101) | 95.94% (331/345) |
| Overlap genes of two methods (%) | 0% (0/4) | 0% (0/3) | 3.08% (2/65) | 66.67% (2/3) | 15.27% (20/131) | 100% (20/20) | 65.35% (66/101) | 19.13% (66/345) |
SRVS, sparse representation-based variable selection; ANOVA, analysis of variance.
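The p-values in the table come from a permutation analysis, and the abstract describes the comparison as being against randomly selected genes. One common way to obtain such a p-value is to score many random gene subsets of the same size and count how often they match or beat the selected subset. The sketch below shows that generic recipe; it is an assumption about the general approach, not the paper's exact procedure, and the scoring function, feature counts, and seed are hypothetical.

```python
import random


def empirical_p_value(observed, null_scores):
    """Fraction of null scores at least as large as the observed score.

    The +1 in numerator and denominator keeps the estimate above zero.
    """
    hits = sum(1 for s in null_scores if s >= observed)
    return (hits + 1) / (len(null_scores) + 1)


def random_subset_null(score_fn, n_features, subset_size, n_perm, seed=0):
    """Score n_perm random feature subsets of the given size."""
    rng = random.Random(seed)
    return [
        score_fn(rng.sample(range(n_features), subset_size))
        for _ in range(n_perm)
    ]


def toy_score(cols):
    """Hypothetical accuracy: only features 0, 1 and 2 are informative."""
    return 50.0 + 15.0 * sum(1 for c in cols if c < 3)


# Null distribution from 999 random 3-gene subsets out of 100 genes.
null = random_subset_null(toy_score, n_features=100, subset_size=3, n_perm=999)
p = empirical_p_value(toy_score([0, 1, 2]), null)
```

In practice `toy_score` would be replaced by a cross-validated accuracy computed on the chosen gene columns, and the number of random subsets bounds the smallest p-value that can be reported.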