Shuangge Ma, Xiao Song, Jian Huang.
Abstract
BACKGROUND: An important application of microarrays is to discover, among the tens of thousands of genes assayed, genomic biomarkers for disease diagnosis and prognosis. Thus it is of interest to develop efficient statistical methods that can simultaneously identify important biomarkers from such high-throughput genomic data and construct appropriate classification rules. It is also of interest to develop methods for evaluating classification performance and ranking the identified biomarkers.
Year: 2006 PMID: 16684357 PMCID: PMC1513612 DOI: 10.1186/1471-2105-7-253
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Simulation study I. Means of AUC and classification error (with their standard errors in parentheses) for small, moderate and large mean differences.
| (n1, n2) | Measure | small | moderate | large | small | moderate | large | small | moderate | large |
|---|---|---|---|---|---|---|---|---|---|---|
| (15, 15) | AUC | 0.63(0.18) | 0.88(0.10) | 0.99(0.01) | 0.64(0.17) | 0.92(0.09) | 1.00(0.00) | 0.71(0.17) | 0.95(0.07) | 1.00(0.00) |
| | Error | 0.37(0.16) | 0.16(0.16) | 0.00(0.00) | 0.37(0.17) | 0.12(0.13) | 0.00(0.00) | 0.32(0.15) | 0.07(0.10) | 0.00(0.00) |
| (20, 10) | AUC | 0.58(0.18) | 0.86(0.12) | 0.99(0.01) | 0.65(0.18) | 0.92(0.08) | 1.00(0.00) | 0.66(0.18) | 0.95(0.06) | 1.00(0.00) |
| | Error | 0.37(0.17) | 0.17(0.14) | 0.00(0.00) | 0.34(0.15) | 0.11(0.13) | 0.00(0.00) | 0.32(0.17) | 0.07(0.10) | 0.00(0.00) |
| (50, 50) | AUC | 0.66(0.09) | 0.96(0.03) | 1.00(0.00) | 0.72(0.09) | 0.99(0.01) | 1.00(0.00) | 0.77(0.09) | 0.99(0.01) | 1.00(0.00) |
| | Error | 0.36(0.09) | 0.09(0.06) | 0.00(0.00) | 0.33(0.08) | 0.04(0.04) | 0.00(0.00) | 0.29(0.08) | 0.04(0.04) | 0.00(0.00) |
| (70, 30) | AUC | 0.66(0.12) | 0.95(0.03) | 1.00(0.00) | 0.70(0.10) | 0.98(0.02) | 1.00(0.00) | 0.77(0.09) | 0.99(0.01) | 1.00(0.00) |
| | Error | 0.32(0.09) | 0.09(0.05) | 0.00(0.00) | 0.29(0.08) | 0.05(0.04) | 0.00(0.00) | 0.24(0.08) | 0.03(0.04) | 0.00(0.00) |
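The tables summarize classifiers by AUC and classification error. As a minimal sketch of how these two quantities can be computed from a classifier's scores on test samples, the code below assumes the empirical (Mann-Whitney) AUC and a fixed score cutoff. The function names and the 0.5 cutoff are hypothetical, and the estimator used for the simulation summaries in the paper may differ (for example, a binormal AUC).

```python
def empirical_auc(scores, labels):
    """Mann-Whitney estimate of AUC: the fraction of (case, control) pairs
    in which the case has the higher score, counting ties as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def classification_error(scores, labels, cutoff=0.5):
    """Fraction of samples misclassified when score > cutoff predicts class 1.
    The cutoff is a hypothetical choice."""
    wrong = sum((s > cutoff) != (y == 1) for s, y in zip(scores, labels))
    return wrong / len(labels)


# Hypothetical predicted probabilities for six test samples.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(empirical_auc(scores, labels))         # 8 of 9 pairs correctly ordered
print(classification_error(scores, labels))  # 2 of 6 samples misclassified
```

The pairwise AUC does not depend on any cutoff, while the error does; this is one reason the two summaries can rank methods differently.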
Simulation study II. Means of AUC and classification error (with their standard errors in parentheses) for π = 0.05 and moderate mean differences. Marginal distributions: Normal(0, 1), Uniform[−√3, √3], Gamma(1/4, 1/2) and 2Beta(0.5, 0.5).
| (n1, n2) | Measure | Normal | Uniform | Gamma | Beta |
|---|---|---|---|---|---|
| (15, 15) | AUC | 0.92(0.09) | 0.92(0.08) | 0.89(0.12) | 0.92(0.08) |
| | Error | 0.12(0.13) | 0.12(0.13) | 0.10(0.12) | 0.13(0.14) |
| (20, 10) | AUC | 0.92(0.08) | 0.92(0.08) | 0.89(0.12) | 0.92(0.08) |
| | Error | 0.11(0.13) | 0.13(0.12) | 0.13(0.13) | 0.13(0.13) |
| (50, 50) | AUC | 0.99(0.01) | 0.99(0.01) | 0.95(0.06) | 0.99(0.01) |
| | Error | 0.04(0.04) | 0.04(0.04) | 0.06(0.06) | 0.03(0.04) |
| (70, 30) | AUC | 0.98(0.02) | 0.98(0.01) | 0.96(0.06) | 0.99(0.01) |
| | Error | 0.05(0.04) | 0.04(0.04) | 0.07(0.06) | 0.03(0.04) |
Simulation study III. Means of AUC and classification error (with their standard errors in parentheses) for π = 0.05 and moderate mean differences. Marginal distributions: Uniform[−√3, √3] and Normal(0, 1). Independent, weakly correlated and strongly correlated genes.
| (n1, n2) | Measure | Uniform, independent | Uniform, weak | Uniform, strong | Normal, independent | Normal, weak | Normal, strong |
|---|---|---|---|---|---|---|---|
| (15, 15) | AUC | 0.92(0.08) | 0.91(0.08) | 0.91(0.09) | 0.92(0.09) | 0.91(0.09) | 0.90(0.10) |
| | Error | 0.12(0.13) | 0.14(0.13) | 0.14(0.13) | 0.12(0.13) | 0.12(0.12) | 0.14(0.12) |
| (20, 10) | AUC | 0.92(0.08) | 0.91(0.10) | 0.91(0.08) | 0.92(0.08) | 0.91(0.09) | 0.89(0.10) |
| | Error | 0.13(0.12) | 0.12(0.13) | 0.13(0.13) | 0.11(0.13) | 0.13(0.13) | 0.13(0.12) |
| (50, 50) | AUC | 0.99(0.01) | 0.98(0.01) | 0.98(0.02) | 0.99(0.01) | 0.98(0.02) | 0.98(0.02) |
| | Error | 0.04(0.04) | 0.05(0.04) | 0.06(0.04) | 0.04(0.04) | 0.05(0.05) | 0.07(0.05) |
| (70, 30) | AUC | 0.98(0.01) | 0.98(0.02) | 0.97(0.02) | 0.98(0.02) | 0.98(0.02) | 0.97(0.03) |
| | Error | 0.04(0.04) | 0.04(0.04) | 0.06(0.05) | 0.05(0.04) | 0.05(0.05) | 0.07(0.05) |
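Simulation study III compares independent, weakly correlated and strongly correlated genes. This excerpt does not say how the correlation was induced, so the following is only a sketch under assumed settings: an exchangeable (equicorrelated) structure built from one latent factor shared by all genes of a sample, a fraction π of informative genes shifted by a mean difference `delta` in class 1, and Normal(0, 1) marginals. The function name and the values of `p`, `delta` and `rho` are hypothetical.

```python
import math
import random


def simulate_two_class(n1, n2, p, pi, delta, rho, seed=0):
    """Hypothetical two-class generator (an assumption, not the paper's design).

    Each gene value is sqrt(rho) * z + sqrt(1 - rho) * e, where z is shared by
    all genes of one sample and e is gene-specific noise, so every gene has
    unit variance and any two genes have correlation rho (rho = 0 gives
    independent genes). The first round(pi * p) genes are shifted by delta
    in class 1.
    """
    rng = random.Random(seed)
    n_informative = round(pi * p)
    X, y = [], []
    for label, n in ((1, n1), (0, n2)):
        for _ in range(n):
            z = rng.gauss(0.0, 1.0)  # latent factor shared across genes
            row = []
            for j in range(p):
                value = math.sqrt(rho) * z + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                if label == 1 and j < n_informative:
                    value += delta
                row.append(value)
            X.append(row)
            y.append(label)
    return X, y


# (n1, n2) = (20, 10), 5% informative genes, hypothetical p, delta and rho.
X, y = simulate_two_class(20, 10, p=500, pi=0.05, delta=1.0, rho=0.3)
print(len(X), len(X[0]), sum(y))  # 30 500 20
```

Increasing `rho` makes the genes less independent while leaving each gene's marginal Normal(0, 1) distribution (before the class shift) unchanged, which isolates the effect of correlation.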
Colon and estrogen data. Model features for different τ. Variable: number of genes with nonzero coefficients.
| τ | Colon: variable | Colon: CV | Estrogen: variable | Estrogen: CV |
|---|---|---|---|---|
| 1.0 | 18 | 2.64 | 20 | 2.93 |
| 0.9 | 26 | 2.61 | 27 | 2.93 |
| 0.8 | 73 | 2.63 | 79 | 2.93 |
| 0.7 | 278 | 2.70 | 139 | 2.93 |
| 0.6 | 453 | 2.74 | 307 | 2.94 |
| 0.5 | 481 | 2.76 | 422 | 2.94 |
| 0.4 | 492 | 2.79 | 490 | 2.96 |
| 0.3 | 498 | 2.81 | 499 | 2.97 |
| 0.2 | 500 | 2.76 | 500 | 2.97 |
| 0.1 | 500 | 2.74 | 500 | 2.97 |
| 0.0 | 500 | 2.74 | 500 | 2.97 |
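In threshold gradient directed regularization (TGDR), each gradient step updates only the coefficients whose gradient magnitude is at least τ times the largest gradient magnitude. τ = 1 therefore gives the sparsest fits and τ = 0 updates every coefficient, which matches the table: the number of genes with nonzero coefficients grows from 18 (colon) and 20 (estrogen) at τ = 1.0 to all 500 at small τ. The sketch below illustrates the thresholded update under a stated assumption: it uses a logistic log-likelihood as a stand-in objective, since the objective actually optimized in the paper is not shown in this excerpt. The function name, step size and number of steps are hypothetical.

```python
import math


def tgdr_logistic(X, y, tau, step=0.01, n_steps=200):
    """Threshold gradient directed regularization (sketch).

    Objective (an assumed stand-in): mean logistic log-likelihood.
    At each step, compute its gradient g and move only the coefficients with
    |g_j| >= tau * max_k |g_k| by step * g_j. The number of steps is the
    other tuning parameter.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_steps):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, xi))
            residual = yi - 1.0 / (1.0 + math.exp(-eta))
            for j in range(p):
                grad[j] += residual * xi[j] / n
        g_max = max(abs(g) for g in grad)
        if g_max == 0.0:
            break  # stationary point: nothing left to update
        for j in range(p):
            if abs(grad[j]) >= tau * g_max:
                beta[j] += step * grad[j]
    return beta


# Toy data: gene 0 separates the classes, genes 1 and 2 are weak.
X = [[1.0, 0.5, 0.2], [1.0, -0.1, 0.3], [-1.0, 0.2, -0.1], [-1.0, -0.3, 0.1]]
y = [1, 1, 0, 0]
for tau in (1.0, 0.0):
    beta = tgdr_logistic(X, y, tau, n_steps=1)
    print(tau, sum(b != 0.0 for b in beta))  # tau=1.0: 1 gene, tau=0.0: 3 genes
```

After one step, τ = 1 has moved only the gene with the largest gradient, while τ = 0 has moved all three. This mirrors the sparse-to-dense progression down the τ column.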
Colon data: genes with nonzero coefficients.
| GeneID | Coefficient | Gene description |
|---|---|---|
| Hsa.949 | 0.15 | M59807, NATURAL KILLER CELLS PROTEIN 4 PRECURSOR. |
| Hsa.8219 | -1.01 | R46753, CYCLIN-DEPENDENT KINASE INHIBITOR 1 (Homo sapiens). |
| Hsa.10047 | 0.10 | T51849, TYROSINE-PROTEIN KINASE RECEPTOR ELK PRECURSOR. |
| Hsa.8214 | 0.40 | R62549, PUTATIVE SERINE/THREONINE-PROTEIN KINASE B0464.5. |
| Hsa.8175 | -0.31 | H49870, MAD PROTEIN (Homo sapiens). |
| Hsa.2483 | -0.10 | D14665, Human mRNA for ORF, complete cds. |
| Hsa.3016 | 1.33 | T47377, S-100P PROTEIN (HUMAN). |
| Hsa.5392 | 0.15 | T62947, 60S RIBOSOMAL PROTEIN L24 (Arabidopsis thaliana). |
| Hsa.341 | -0.71 | M26683, Human interferon gamma treatment inducible mRNA. |
| Hsa.1410 | 1.00 | R54097, TRANSLATIONAL INITIATION FACTOR 2 BETA SUBUNIT (HUMAN). |
| Hsa.2928 | 0.54 | X63629, H.sapiens mRNA for p cadherin. |
| Hsa.9246 | -0.40 | T47383, ALKALINE PHOSPHATASE, PLACENTAL TYPE 1 PRECURSOR. |
| Hsa.1240 | -0.30 | M31994, Human cytosolic aldehyde dehydrogenase (ALDH1) gene, exon 13. |
| Hsa.1454 | -0.96 | M82919, Human gamma amino butyric acid receptor beta-3 subunit mRNA. |
| Hsa.627 | 1.00 | M26383, Human monocyte-derived neutrophil-activating protein (MONAP) mRNA. |
| Hsa.2688 | 0.25 | X60489, Human mRNA for elongation factor-1-beta. |
| Hsa.6814 | 0.51 | H08393, COLLAGEN ALPHA 2(XI) CHAIN (Homo sapiens). |
| Hsa.1491 | 0.94 | M35531, Human GDP-L-fucose:beta-D-galactoside 2-alpha-l-fucosyltransferase mRNA. |
Estrogen data: genes with nonzero coefficients.
| GeneID | Coefficient | Gene description |
|---|---|---|
| AB002365_at | 0.06 | AB002365, Human mRNA for KIAA0367 gene, partial cds. |
| D43772_at | -0.10 | D43772, Human squamous cell carcinoma of esophagus mRNA for GRB-7 SH2 domain protein. |
| D87468_at | -0.61 | D87468, Human mRNA for KIAA0278 gene, partial cds. |
| HG2755-HT2862_at | 0.10 | T-Plastin. |
| J02871_s_at | 0.35 | J02871, Human lung cytochrome P450 (IV subfamily) BI protein, complete cds. |
| K02054_at | 0.55 | K02054, Human gastrin-releasing peptide mRNA, complete cds. |
| K03460_at | -0.06 | K03460, Human alpha-tubulin isotype H2-alpha gene, last exon. |
| M11718_at | 0.05 | M11718, Human alpha-2 type V collagen gene, 3' end. |
| M24069_at | 0.25 | M24069, Human DNA-binding protein A (dbpA) gene, 3' end. |
| M32053_at | 0.45 | M32053, Human H19 RNA gene, complete cds (spliced in silico). |
| M81758_at | -0.10 | M81758, Homo sapiens skeletal muscle voltage-dependent sodium channel alpha subunit (SkM1) mRNA. |
| M83186_at | 0.05 | M83186, Human cytochrome c oxidase subunit VIIa (COX7A) muscle isoform mRNA, complete cds. |
| U01062_at | -0.20 | U01062, Human type 3 inositol 1,4,5-trisphosphate receptor (ITPR3) mRNA, complete cds. |
| U03057_at | 0.15 | U03057, Human actin bundling protein (HSN) mRNA, complete cds. |
| U28386_at | 0.05 | U28386, Human nuclear localization sequence receptor hSRP1alpha mRNA, complete cds. |
| U60115_at | 0.05 | U60115, Human skeletal muscle LIM-protein SLIM1 mRNA, complete cds. |
| U82169_at | -0.40 | U82169, Human frizzled homolog (FZD3) mRNA, complete cds. |
| X03635_at | 1.00 | X03635, Human mRNA for oestrogen receptor. |
| X56667_at | -0.20 | X56667, Human mRNA for calretinin. |
| X86693_at | 0.10 | X86693, H.sapiens mRNA for hevin like protein. |
Figure 1. Colon data. X axis: natural order of genes. Left-upper panel: occurrence index; red "+": genes identified with the TGDR. Right-upper panel: kernel density estimates of the OPD (solid line) and PPD (dashed line) of AUC. Left-lower panel: binormal AUC of every gene. Right-lower panel: t-statistic of every gene.
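The right-upper panels show kernel density estimates of distributions of AUC values. The kernel and bandwidth are not stated in this excerpt, so the sketch below assumes a Gaussian kernel with the normal-reference bandwidth h = 1.06 · s · n^(-1/5), where s is the sample standard deviation. The function name and the sample AUC values are hypothetical.

```python
import math
import statistics


def gaussian_kde(sample, bandwidth=None):
    """Return a function estimating the density of `sample` with a Gaussian kernel.

    Without an explicit bandwidth, the normal-reference rule
    h = 1.06 * sd * n**(-1/5) is used (an assumed choice).
    """
    n = len(sample)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps of width `bandwidth` centred at each point.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density


# Hypothetical AUC values, for example one per gene.
aucs = [0.52, 0.55, 0.58, 0.61, 0.63, 0.70, 0.74, 0.81]
f = gaussian_kde(aucs)
grid = [i / 100 for i in range(101)]
print(max(grid, key=f))  # location of the estimated mode on [0, 1]
```

Evaluating two such estimates on a common grid gives curves that can be overlaid as solid and dashed lines, as in the figure.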
Figure 2. Estrogen data. X axis: natural order of genes. Left-upper panel: occurrence index; red "+": genes identified with the TGDR. Right-upper panel: kernel density estimates of the OPD (solid line) and PPD (dashed line) of AUC. Left-lower panel: binormal AUC of every gene. Right-lower panel: t-statistic of every gene.
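The lower panels of Figures 1 and 2 plot a binormal AUC and a t-statistic for every gene. One plausible reading, sketched here as an assumption, is that each gene's expression is modelled as normal within each class, with the means and variances estimated by sample moments. Under that model the AUC is Φ((m1 − m0) / √(v0 + v1)). The captions do not say which t-statistic was used; the sketch uses Welch's unpooled form. All function names are hypothetical.

```python
import math
import statistics


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binormal_auc(cases, controls):
    """Binormal AUC of one gene: Phi((m1 - m0) / sqrt(v0 + v1)),
    with class means and variances estimated by sample moments."""
    m1, m0 = statistics.mean(cases), statistics.mean(controls)
    v1, v0 = statistics.variance(cases), statistics.variance(controls)
    return normal_cdf((m1 - m0) / math.sqrt(v0 + v1))


def welch_t(cases, controls):
    """Two-sample t-statistic with unpooled (Welch) variances (an assumed choice)."""
    m1, m0 = statistics.mean(cases), statistics.mean(controls)
    se = math.sqrt(statistics.variance(cases) / len(cases)
                   + statistics.variance(controls) / len(controls))
    return (m1 - m0) / se


def per_gene_summaries(X, labels):
    """(binormal AUC, t-statistic) for each column (gene) of X."""
    out = []
    for j in range(len(X[0])):
        cases = [row[j] for row, lab in zip(X, labels) if lab == 1]
        controls = [row[j] for row, lab in zip(X, labels) if lab == 0]
        out.append((binormal_auc(cases, controls), welch_t(cases, controls)))
    return out


# Hypothetical data: gene 0 is shifted in class 1, gene 1 is not.
X = [[10, 1], [11, 3], [12, 2], [1, 2], [2, 1], [3, 3]]
labels = [1, 1, 1, 0, 0, 0]
print(per_gene_summaries(X, labels))
```

Both summaries increase with the standardized mean difference, which is why the two lower panels tend to flag the same genes.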
Figure 3. Binormal ROC plot. Left panel: training set. Right panel: testing set.
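A binormal ROC curve assumes that the scores are N(μ0, σ0²) in controls and N(μ1, σ1²) in cases. Under that assumption, the true positive rate at false positive rate t is Φ(a + b Φ⁻¹(t)), with a = (μ1 − μ0)/σ1 and b = σ0/σ1, and the area under the curve is Φ(a / √(1 + b²)). The sketch below checks the closed form numerically with Python's `statistics.NormalDist`. The function names and the class parameters are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def binormal_roc(fpr, a, b):
    """Binormal ROC curve: TPR = Phi(a + b * Phi^{-1}(FPR)), with endpoints fixed."""
    if fpr <= 0.0:
        return 0.0
    if fpr >= 1.0:
        return 1.0
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(fpr))


def binormal_auc_closed_form(a, b):
    """Area under the binormal ROC curve: Phi(a / sqrt(1 + b**2))."""
    return STD_NORMAL.cdf(a / (1.0 + b * b) ** 0.5)


# Hypothetical classes: controls N(0, 1), cases N(1, 1), so a = 1 and b = 1.
mu0, sigma0, mu1, sigma1 = 0.0, 1.0, 1.0, 1.0
a, b = (mu1 - mu0) / sigma1, sigma0 / sigma1
grid = [i / 10000 for i in range(10001)]
tpr = [binormal_roc(t, a, b) for t in grid]
# Trapezoidal area under the plotted curve.
area = sum((grid[i + 1] - grid[i]) * (tpr[i] + tpr[i + 1]) / 2 for i in range(len(grid) - 1))
print(area, binormal_auc_closed_form(a, b))  # the two values agree closely
```

Because a / √(1 + b²) equals (μ1 − μ0) / √(σ0² + σ1²), the closed form is the same per-gene binormal AUC formula expressed in ROC-curve parameters.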