Shuhei Kaneko, Akihiro Hirakawa, Chikuma Hamada.
Abstract
Mining gene expression data to identify genes associated with patient survival is an ongoing problem in cancer prognostic studies using microarrays; such genes can then be used to achieve more accurate prognoses. The least absolute shrinkage and selection operator (lasso) is often used for gene selection and parameter estimation in high-dimensional microarray data. The lasso shrinks some of the coefficients to zero, and the amount of shrinkage is determined by a tuning parameter that is often chosen by cross-validation. The model chosen by cross-validation contains many false positives, that is, selected genes whose true coefficients are zero. We propose a method for estimating the false positive rate (FPR) for lasso estimates in a high-dimensional Cox model. We performed a simulation study to examine the precision of the FPR estimate obtained by the proposed method. We also applied the proposed method to real data and illustrated the identification of false positive genes.
Keywords: cancer prognostic; false positive rate; gene selection; high-dimensional regression; microarray data; survival analysis
Year: 2012 PMID: 22442625 PMCID: PMC3298378 DOI: 10.4137/CIN.S9048
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
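The shrinkage behaviour described in the abstract can be illustrated with the soft-thresholding operator, which is the closed-form lasso solution for a single coefficient in the simplified case of linear regression with an orthonormal design. This is only a sketch of the general mechanism, not the Cox-model fitting procedure used in the paper; the coefficient values and the name `soft_threshold` are hypothetical.

```python
def soft_threshold(z, lam):
    """Lasso estimate for one coefficient under an orthonormal design.

    z   : unpenalized estimate
    lam : tuning parameter (larger values shrink more)
    """
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # estimates within [-lam, lam] are set exactly to zero


# Hypothetical unpenalized estimates for five genes (illustrative values only).
unpenalized = [0.90, -0.40, 0.15, -0.05, 0.02]

for lam in (0.0, 0.1, 0.5):
    shrunk = [soft_threshold(z, lam) for z in unpenalized]
    n_selected = sum(1 for b in shrunk if b != 0.0)
    print(f"lambda={lam}: {n_selected} nonzero coefficients")
```

Increasing the tuning parameter reduces the number of nonzero coefficients, which is why the choice of this parameter drives how many genes are selected.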
Accuracy of the FPR estimated by the proposed method in the simulation studies.
| | | | | | | | |
|---|---|---|---|---|---|---|---|
| 0 | 5 | 10 | 126.2 | 96.0 | 96.0 | 5.0 | 121.1 |
| | | 15 | 69.2 | 92.7 | 92.6 | 5.1 | 64.1 |
| | | 20 | 31.0 | 83.5 | 82.9 | 5.2 | 25.8 |
| | | 25 | 12.4 | 57.4 | 53.4 | 5.5 | 6.9 |
| | | 30 | 6.5 | 19.8 | 14.6 | 5.4 | 1.1 |
| | 30 | 10 | 106.7 | 71.7 | 71.4 | 30.3 | 76.5 |
| | | 15 | 72.1 | 57.7 | 56.8 | 30.6 | 41.5 |
| | | 20 | 57.8 | 50.0 | 44.0 | 32.0 | 25.8 |
| | | 25 | 42.5 | 44.4 | 32.5 | 28.5 | 14.1 |
| | | 30 | 28.2 | 36.9 | 27.9 | 20.2 | 8.0 |
| 0.2 | 5 | 10 | 122.7 | 95.9 | 95.9 | 5.0 | 117.6 |
| | | 15 | 65.4 | 92.3 | 92.2 | 5.1 | 60.3 |
| | | 20 | 27.8 | 81.5 | 80.7 | 5.2 | 22.5 |
| | | 25 | 10.3 | 48.6 | 43.8 | 5.5 | 4.8 |
| | | 30 | 5.7 | 10.1 | 6.8 | 5.2 | 0.5 |
| | 30 | 10 | 64.1 | 52.8 | 52.0 | 30.5 | 33.6 |
| | | 15 | 32.1 | 6.4 | 5.0 | 30.4 | 1.7 |
| | | 20 | 30.0 | 0.1 | 0.1 | 30.0 | 0.0 |
| | | 25 | 30.0 | 0.0 | 0.0 | 30.0 | 0.0 |
| | | 30 | 30.0 | 0.0 | 0.0 | 30.0 | 0.0 |
| 0.5 | 5 | 10 | 119.3 | 95.8 | 95.8 | 5.0 | 114.2 |
| | | 15 | 62.5 | 91.9 | 91.8 | 5.1 | 57.4 |
| | | 20 | 25.4 | 79.7 | 78.8 | 5.2 | 20.2 |
| | | 25 | 9.2 | 42.6 | 36.4 | 5.5 | 3.6 |
| | | 30 | 5.4 | 6.5 | 3.3 | 5.2 | 0.2 |
| | 30 | 10 | 59.8 | 49.5 | 48.5 | 30.5 | 29.3 |
| | | 15 | 31.1 | 3.4 | 2.1 | 30.4 | 0.7 |
| | | 20 | 30.0 | 0.0 | 0.0 | 30.0 | 0.0 |
| | | 25 | 30.0 | 0.0 | 0.0 | 30.0 | 0.0 |
| | | 30 | 30.0 | 0.0 | 0.0 | 30.0 | 0.0 |
The GenBank accession numbers, descriptions, and coefficient estimates of 12 genes selected by the lasso.
| GenBank accession number | Description | Coefficient estimate |
|---|---|---|
| AA805575 | Thyroxine-binding globulin precursor | −0.1039 |
| X00452 | Major histocompatibility complex, class II, DQ alpha 1 | −0.1026 |
| LC_29222 | – | −0.0927 |
| AF044323 | COX15 homolog, cytochrome c oxidase assembly protein (yeast) | 0.0167 |
| L19872 | Hydrocarbon receptor | −0.0078 |
| M20430 | Major histocompatibility complex, class II, DR beta 5 | −0.0076 |
| K01171 | Major histocompatibility complex, class II, DR alpha | −0.0067 |
| X59812 (R92015) | Cytochrome P450, subfamily XXVIIA polypeptide | −0.0028 |
| M63438 | Immunoglobulin kappa constant | 0.0028 |
| X82240 (AA729003) | T-cell leukemia/lymphoma 1A | −0.0017 |
| X82240 (R97095) | T-cell leukemia/lymphoma 1A | −0.0010 |
| X59812 (H98765) | Cytochrome P450, subfamily XXVIIA polypeptide | −0.0002 |
Figure 1. The mixture distribution estimated for the lasso estimates in the DLBCL data; the two component densities are the probability density functions of the Laplace and normal distributions. β̂ is the estimate by the lasso and f(β̂) is the probability density of β̂.
Note: A magnified image of the distribution between the β̂ values −0.3 and 0.1 is inserted.
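A two-component mixture of a Laplace and a normal density, as named in the Figure 1 caption, can be sketched as follows. The caption does not give the fitted parameters, the mixing weight, or which component represents which group of genes, so every value in `PARAMS` and the helper names are hypothetical; the sketch only shows how such a mixture density and the share of each component at a given β̂ could be evaluated.

```python
import math


def laplace_pdf(x, loc, scale):
    """Density of a Laplace distribution with location `loc` and scale `scale`."""
    return math.exp(-abs(x - loc) / scale) / (2.0 * scale)


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, w, loc, scale, mean, sd):
    """f(x) = w * Laplace(x) + (1 - w) * Normal(x)."""
    return w * laplace_pdf(x, loc, scale) + (1.0 - w) * normal_pdf(x, mean, sd)


def laplace_share(x, w, loc, scale, mean, sd):
    """Posterior probability that a value x came from the Laplace component."""
    return w * laplace_pdf(x, loc, scale) / mixture_pdf(x, w, loc, scale, mean, sd)


# Hypothetical parameters, NOT the values fitted in the paper.
PARAMS = dict(w=0.7, loc=0.0, scale=0.005, mean=-0.1, sd=0.02)

for beta_hat in (-0.10, -0.01, -0.001):
    print(beta_hat, round(laplace_share(beta_hat, **PARAMS), 3))
```

With these illustrative values, estimates close to zero are attributed almost entirely to the Laplace component and estimates near −0.1 to the normal component.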
The estimated numbers of TP and FP genes and the estimated FPR for the cut-off values from 0.0001 to 0.05.
| Cut-off value | Number of selected genes | Estimated FP | Estimated TP | Estimated FPR (%) |
|---|---|---|---|---|
| 0.0001 | 12 | 8.96 | 3.04 | 74.6 |
| 0.0005 | 11 | 8.05 | 2.95 | 73.2 |
| 0.001 | 10 | 7.13 | 2.87 | 71.3 |
| 0.005 | 7 | 3.76 | 3.24 | 53.7 |
| 0.01 | 4 | 1.24 | 2.76 | 30.9 |
| 0.02 | 3 | 0.19 | 2.81 | 6.3 |
| 0.03 | 3 | 0.03 | 2.97 | 1.0 |
| 0.04 | 3 | 0.00 | 3.00 | 0.0 |
| 0.05 | 3 | 0.00 | 3.00 | 0.0 |
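The figures in this table are consistent with the FPR being the estimated number of false positives divided by the number of selected genes. The sketch below rechecks that arithmetic on the rows as printed, assuming the third column is the estimated FP count and the fourth the estimated TP count; that column reading is inferred from the numbers rather than stated in the caption.

```python
# Rows copied from the table: (cut-off, genes selected, third column,
# fourth column, reported FPR in %). The third column is ASSUMED to be the
# estimated number of false positives, the fourth the estimated true positives.
ROWS = [
    (0.0001, 12, 8.96, 3.04, 74.6),
    (0.0005, 11, 8.05, 2.95, 73.2),
    (0.001, 10, 7.13, 2.87, 71.3),
    (0.005, 7, 3.76, 3.24, 53.7),
    (0.01, 4, 1.24, 2.76, 30.9),
    (0.02, 3, 0.19, 2.81, 6.3),
    (0.03, 3, 0.03, 2.97, 1.0),
    (0.04, 3, 0.00, 3.00, 0.0),
    (0.05, 3, 0.00, 3.00, 0.0),
]


def fpr_percent(n_false_positive, n_selected):
    """False positives as a percentage of all selected genes."""
    return 100.0 * n_false_positive / n_selected


for cutoff, n_selected, n_fp, n_tp, reported in ROWS:
    computed = fpr_percent(n_fp, n_selected)
    print(f"cut-off {cutoff}: computed {computed:.1f}%, reported {reported}%")
```

Small differences in the last digit are expected because the table entries are themselves rounded.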
Gene sets with FDR < 0.5 in the GSEA.
| Gene set | P-value | FDR | Gene(s) |
|---|---|---|---|
| Biosynthetic process | <0.001 | <0.001 | AF044323 |
| Cellular biosynthetic process | <0.001 | <0.001 | AF044323 |
| Mitochondrial part | 0.002 | 0.035 | AF044323 |
| Mitochondrion | 0.005 | 0.066 | AF044323 |
| Mitochondrial envelope | 0.008 | 0.085 | AF044323 |
| Cytoplasmic part | 0.014 | 0.093 | AF044323, K01171 |
| Lytic vacuole | 0.014 | 0.093 | K01171 |
| Lysosome | 0.014 | 0.093 | K01171 |
| Vacuole | 0.022 | 0.103 | K01171 |
| Cellular component assembly | 0.025 | 0.103 | AF044323 |
| Protein metabolic process | 0.028 | 0.103 | AF044323 |
| Cellular macromolecule metabolic process | 0.028 | 0.103 | AF044323 |
| Secondary metabolic process | 0.029 | 0.103 | AF044323 |
| Pigment biosynthetic process | 0.029 | 0.103 | AF044323 |
| Pigment metabolic process | 0.029 | 0.103 | AF044323 |
| Cellular protein metabolic process | 0.034 | 0.109 | AF044323 |
| Mitochondrial membrane | 0.035 | 0.109 | AF044323 |
| Cytoplasm | 0.039 | 0.115 | AF044323, K01171 |
| Heme biosynthetic process | 0.047 | 0.125 | AF044323 |
| Heme metabolic process | 0.047 | 0.125 | AF044323 |
| Heterocycle metabolic process | 0.067 | 0.169 | AF044323 |
| Macromolecular complex assembly | 0.082 | 0.198 | AF044323 |
| Cofactor biosynthetic process | 0.106 | 0.244 | AF044323 |
| Protein complex assembly | 0.111 | 0.245 | AF044323 |
| Cofactor metabolic process | 0.134 | 0.284 | AF044323 |
| Mitochondrial inner membrane | 0.143 | 0.292 | AF044323 |
| Receptor activity | 0.184 | 0.349 | X00452 |
| Multicellular organismal development | 0.191 | 0.349 | X82240 |
| Transmembrane receptor activity | 0.191 | 0.349 | X00452 |
| Organelle inner membrane | 0.200 | 0.349 | AF044323 |
| Cellular protein complex assembly | 0.209 | 0.349 | AF044323 |
| Envelope | 0.217 | 0.349 | AF044323 |
| Organelle envelope | 0.217 | 0.349 | AF044323 |
| Organelle part | 0.324 | 0.467 | AF044323 |
| Intracellular organelle part | 0.324 | 0.467 | AF044323 |
| Inorganic cation transmembrane transporter activity | 0.324 | 0.467 | AF044323 |
| Mitochondrial membrane part | 0.326 | 0.467 | AF044323 |
| Cytochrome c oxidase activity | 0.356 | 0.497 | AF044323 |
Three criteria for model evaluation.
| | | | |
|---|---|---|---|
| | 0.002 | 0.007 | 0.246 |
| | 0.002 | 0.002 | 0.120 |
| Deviance | −8.942 | −9.072 | −1.967 |
Figure 2. Kaplan-Meier curves of overall survival for the 2 groups: (A) in the model with the 3 genes identified by the proposed method, (B) in the model with the 12 genes identified by the lasso, and (C) in the model with the 2 genes identified by the GSEA.
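For readers unfamiliar with how a Kaplan-Meier curve such as those in Figure 2 is computed, the following is a minimal sketch of the product-limit estimator for a single group. The follow-up times and event indicators are hypothetical and unrelated to the DLBCL data, and the function name `kaplan_meier` is illustrative.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Hypothetical follow-up data for one group of seven patients.
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]

for t, s in kaplan_meier(times, events):
    print(f"t={t}: S(t)={s:.3f}")
```

In a two-group comparison like Figure 2, the same estimator would be applied to each group separately and the resulting step functions plotted together.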