Pei-Fang Su, Xi Chen, Heidi Chen, Yu Shyr.
Abstract
BACKGROUND: Dealing with high-dimensional markers, such as gene expression data obtained from microarray chip technology or genomics studies, is a key challenge because the number of features greatly exceeds the number of biological samples. After biologically relevant genes have been selected, how to summarize their expression and then build a prediction model is an important issue in medical applications. One intuitive approach assigns different weights to different features and then combines this information into a single score, called the compound covariate. Investigators commonly use this score to assess whether an association exists between the compound covariate and clinical outcomes, adjusted for baseline covariates. However, we found that some clinical papers concerned with such analyses report biased p-values based on a flawed compound covariate in their training data set.
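The weighted-sum idea behind the compound covariate can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function name `compound_covariate`, the toy expression values, and the toy weights are all assumptions made for the example, and the abstract does not say how the weights are chosen.

```python
from typing import List, Sequence


def compound_covariate(
    expression: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Return one weighted-sum score per sample.

    expression[i][j] is the expression of gene j in sample i;
    weights[j] is the weight assigned to gene j.
    """
    scores = []
    for row in expression:
        if len(row) != len(weights):
            raise ValueError("each sample needs one value per weight")
        scores.append(sum(w * x for w, x in zip(weights, row)))
    return scores


# Hypothetical toy data: 3 samples, 2 selected genes.
toy_expression = [[1.0, 2.0], [0.5, -1.0], [0.0, 3.0]]
toy_weights = [2.0, -1.0]  # assumed weights, chosen only for illustration
toy_scores = compound_covariate(toy_expression, toy_weights)
```

The resulting score would then enter a regression model as a single covariate. The bias described above concerns p-values reported for such a score on the training data set itself.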
Year: 2012 PMID: 23281681 PMCID: PMC3524312 DOI: 10.1186/1752-0509-6-S3-S11
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. P-value distribution under the null hypothesis with nominal level 0.05.
Empirical type I error rates
| Method | Sample size | Censoring | 10 genes | 30 genes | 50 genes | 70 genes |
|---|---|---|---|---|---|---|
| | 50 | 10% | 0.052 | 0.057 | 0.051 | 0.048 |
| | 50 | 40% | 0.041 | 0.047 | 0.045 | 0.046 |
| | 75 | 10% | 0.052 | 0.048 | 0.045 | 0.046 |
| | 75 | 40% | 0.044 | 0.046 | 0.050 | 0.046 |
| | 100 | 10% | 0.056 | 0.049 | 0.052 | 0.052 |
| | 100 | 40% | 0.045 | 0.044 | 0.048 | 0.050 |
| | 50 | 10% | 0.058 | 0.046 | 0.052 | 0.050 |
| | 50 | 40% | 0.034 | 0.046 | 0.036 | 0.043 |
| | 75 | 10% | 0.046 | 0.042 | 0.051 | 0.051 |
| | 75 | 40% | 0.044 | 0.038 | 0.044 | 0.040 |
| | 100 | 10% | 0.051 | 0.046 | 0.048 | 0.060 |
| | 100 | 40% | 0.044 | 0.041 | 0.046 | 0.048 |
| | 50 | 10% | 0.937 | 1.000 | 1.000 | 1.000 |
| | 50 | 40% | 0.910 | 1.000 | 1.000 | 1.000 |
| | 75 | 10% | 0.944 | 1.000 | 1.000 | 1.000 |
| | 75 | 40% | 0.946 | 1.000 | 1.000 | 1.000 |
| | 100 | 10% | 0.957 | 1.000 | 1.000 | 1.000 |
| | 100 | 40% | 0.952 | 1.000 | 1.000 | 1.000 |
| | 50 | 10% | 0.926 | 1.000 | 1.000 | 1.000 |
| | 50 | 40% | 0.916 | 1.000 | 1.000 | 1.000 |
| | 75 | 10% | 0.920 | 1.000 | 1.000 | 1.000 |
| | 75 | 40% | 0.936 | 1.000 | 1.000 | 1.000 |
| | 100 | 10% | 0.929 | 1.000 | 1.000 | 1.000 |
| | 100 | 40% | 0.933 | 1.000 | 1.000 | 1.000 |
Empirical type I error rates for comparing the SRC and DC methods. The value of β was set to 0.
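An empirical type I error rate of this kind is the fraction of simulated null data sets (β = 0) whose p-value falls at or below the nominal level. The sketch below shows that calculation together with a rough Monte Carlo band for judging whether a rate is compatible with 0.05. The function names are hypothetical, and the number of simulation replicates is an assumption, since it is not stated here.

```python
import math
from typing import Sequence, Tuple


def empirical_type1_error(p_values: Sequence[float], alpha: float = 0.05) -> float:
    """Fraction of null-simulation p-values at or below the nominal level."""
    if not p_values:
        raise ValueError("need at least one p-value")
    return sum(p <= alpha for p in p_values) / len(p_values)


def monte_carlo_band(alpha: float, n_replicates: int) -> Tuple[float, float]:
    """Approximate 95% range for the empirical rate of a well-calibrated test.

    Uses the normal approximation to the binomial; n_replicates is the
    (assumed) number of simulated null data sets.
    """
    se = math.sqrt(alpha * (1 - alpha) / n_replicates)
    return alpha - 1.96 * se, alpha + 1.96 * se


# Hypothetical p-values from four null replicates, for illustration only.
example_rate = empirical_type1_error([0.01, 0.20, 0.04, 0.50])
example_band = monte_carlo_band(0.05, 1000)  # 1000 replicates is an assumption
```

Under an assumed 1000 replicates the band is roughly 0.036 to 0.064, so rates such as 0.9 or 1.000 would indicate a test that rejects far too often under the null.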
Power comparison under two different scenarios
| Sample size | Censoring | Scenario 1 | | | | Scenario 2 | | | |
|---|---|---|---|---|---|---|---|---|---|
| Strong effect: | | | | | | | | | |
| 50 | 10% | 0.757 | 0.742 | 0.675 | 0.650 | 0.600 | 0.599 | 0.723 | 0.692 |
| 50 | 30% | 0.624 | 0.580 | 0.536 | 0.490 | 0.480 | 0.422 | 0.546 | 0.505 |
| 50 | 50% | 0.448 | 0.350 | 0.381 | 0.312 | 0.350 | 0.250 | 0.359 | 0.294 |
| 75 | 10% | 0.960 | 0.956 | 0.907 | 0.902 | 0.876 | 0.870 | 0.944 | 0.942 |
| 75 | 30% | 0.883 | 0.864 | 0.828 | 0.766 | 0.783 | 0.771 | 0.875 | 0.822 |
| 75 | 50% | 0.758 | 0.626 | 0.690 | 0.526 | 0.607 | 0.494 | 0.694 | 0.580 |
| 100 | 10% | 0.998 | 0.997 | 0.985 | 0.982 | 0.974 | 0.973 | 0.996 | 0.995 |
| 100 | 30% | 0.982 | 0.974 | 0.948 | 0.917 | 0.940 | 0.905 | 0.966 | 0.955 |
| 100 | 50% | 0.928 | 0.846 | 0.868 | 0.730 | 0.806 | 0.695 | 0.883 | 0.802 |
| Low effect: | | | | | | | | | |
| 50 | 10% | 0.666 | 0.625 | 0.594 | 0.576 | 0.266 | 0.242 | 0.326 | 0.305 |
| 50 | 30% | 0.498 | 0.487 | 0.440 | 0.430 | 0.206 | 0.165 | 0.224 | 0.214 |
| 50 | 50% | 0.362 | 0.285 | 0.328 | 0.244 | 0.144 | 0.122 | 0.162 | 0.124 |
| 75 | 10% | 0.930 | 0.924 | 0.859 | 0.850 | 0.492 | 0.466 | 0.574 | 0.570 |
| 75 | 30% | 0.824 | 0.756 | 0.756 | 0.688 | 0.370 | 0.325 | 0.432 | 0.421 |
| 75 | 50% | 0.642 | 0.553 | 0.571 | 0.469 | 0.263 | 0.206 | 0.312 | 0.224 |
| 100 | 10% | 0.992 | 0.990 | 0.964 | 0.950 | 0.662 | 0.654 | 0.796 | 0.792 |
| 100 | 30% | 0.959 | 0.944 | 0.918 | 0.850 | 0.558 | 0.505 | 0.652 | 0.594 |
| 100 | 50% | 0.852 | 0.760 | 0.802 | 0.654 | 0.412 | 0.319 | 0.472 | 0.370 |
We compared the power of each method (SRC/SRC and SC/SC). The first scenario considers 30 disease-related genes. The second scenario considers 30 disease-related genes, with the other 27 genes considered to have no effect.
Figure 2. Power curves with varying gene effect.
Figure 3. Power curves with varying gene effect and number of noise genes (sample size, 50; censoring fraction, 10%).
Figure 4. Power curves with different numbers of genes and sample sizes.
Figure 5. Kaplan-Meier curves for two data sets.
Breast cancer data set analysis
| Method | Coef | RR | p-value |
|---|---|---|---|
| | 0.052 | 1.12 | 1.9 × 10⁻⁸ |
| | 0.022 | 1.12 | 1.8 × 10⁻⁸ |
| | 0.093 | 1.10 | 1.1 × 10⁻⁷ |
| | 0.040 | 1.04 | 1.3 × 10⁻⁷ |
| | 0.078 | 1.08 | 8.6 × 10⁻¹³ |
| | 0.015 | 1.02 | 1.1 × 10⁻¹³ |
Evaluation of the established 70-gene breast cancer signature published by van't Veer, using the proposed method.
Non-small-cell lung cancer data set analysis
| Method | Pathway | Coef | RR | p-value | Overall p-value |
|---|---|---|---|---|---|
| | NOD | 0.033 | 1.0013 | 0.59 | 0.236 |
| | P53 | 0.037 | 1.0044 | 0.67 | |
| | NOD | 0.016 | 1.0063 | 0.37 | 0.358 |
| | P53 | 0.001 | 1.0002 | 0.99 | |
| | NOD | 0.077 | 1.08 | 0.36 | 0.432 |
| | P53 | 0.015 | 1.01 | 0.90 | |
| | NOD | 0.034 | 1.03 | 0.24 | 0.432 |
| | P53 | -0.01 | 0.99 | 0.74 | |
| | NOD | 0.072 | 1.07 | 0.37 | 2.29 × 10⁻⁶ |
| | P53 | 0.314 | 1.37 | 0.003 | |
| | NOD | 0.019 | 1.02 | 0.21 | 1.85 × 10⁻⁵ |
| | P53 | 0.055 | 1.06 | 0.006 | |
Analysis of the 90 gene expression profiling data from the National Center for Biotechnology Information Gene Expression Omnibus (GSE14814). The first signaling pathway is the p53 pathway; the other is the NOD-like receptor signaling pathway.