Yvonne E Pittelkow, Susan R Wilson.
Abstract
Scientific advances are raising expectations that patient-tailored treatment will soon be available. The development of the resulting clinical approaches needs to be based on well-designed experimental and observational procedures that provide data to which proper biostatistical analyses are applied. Gene expression microarrays and related technologies are rapidly evolving. They provide extremely large gene expression profiles containing many thousands of measurements. Choosing a subset of these gene expression measurements to include in a gene expression signature is one of the many challenges that need to be met. The choice of signature depends on many factors, including the selection of patients in the training set. The reliability and reproducibility of the resultant prognostic gene signature therefore need to be evaluated in a way that is relevant to the clinical setting. A relatively straightforward approach is based on cross-validation, with separate selection of genes at each iteration to avoid selection bias. Within this approach we developed two different methods, one based on forward selection, the other on genes that were statistically significant in all training blocks of data. We demonstrate our approach to gene signature evaluation with a well-known breast cancer data set.
Year: 2010 PMID: 20111740 PMCID: PMC2810473 DOI: 10.1155/2009/587405
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
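The abstract's central point, that genes must be reselected inside every cross-validation iteration to avoid selection bias, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' procedure: the synthetic data, the Welch-style ranking statistic, the nearest-centroid classifier, and the names `make_synthetic`, `select_genes`, `nearest_centroid_predict` and `cross_validate` are all hypothetical. Neither of the paper's two methods (forward selection, or genes significant in all training blocks) is implemented here.

```python
import random
import statistics


def make_synthetic(n_samples=60, n_genes=200, n_informative=5, seed=1):
    """Hypothetical data: only the first n_informative genes carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 2.0 * label  # shift informative genes in the positive class
        X.append(row)
        y.append(label)
    return X, y


def select_genes(X, y, n_genes):
    """Rank genes by an absolute Welch-style t statistic on the given rows only."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, lab in zip(X, y) if lab == 1]
        neg = [row[g] for row, lab in zip(X, y) if lab == 0]
        diff = statistics.fmean(pos) - statistics.fmean(neg)
        se = (statistics.variance(pos) / len(pos)
              + statistics.variance(neg) / len(neg)) ** 0.5
        scores.append((abs(diff) / (se + 1e-12), g))
    return [g for _, g in sorted(scores, reverse=True)[:n_genes]]


def nearest_centroid_predict(X_train, y_train, X_test, genes):
    """Assign each test sample to the class with the closer centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [statistics.fmean([r[g] for r in rows]) for g in genes]
    preds = []
    for row in X_test:
        dist = {label: sum((row[g] - c) ** 2 for g, c in zip(genes, cen))
                for label, cen in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds


def cross_validate(X, y, n_folds=10, n_genes=5, seed=0):
    """Return (selected genes, number correct, fold size) for each fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    results = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        # Key step: the signature is chosen from the training block only,
        # so the held-out samples never influence which genes are used.
        genes = select_genes(X_tr, y_tr, n_genes)
        preds = nearest_centroid_predict(X_tr, y_tr, [X[i] for i in held_out], genes)
        correct = sum(p == y[i] for p, i in zip(preds, held_out))
        results.append((sorted(genes), correct, len(held_out)))
    return results


X, y = make_synthetic()
results = cross_validate(X, y)
for fold, (genes, correct, size) in enumerate(results, start=1):
    print(f"fold {fold}: signature {genes}, {correct}/{size} correct")
```

Because each fold yields its own signature, comparing the selected gene lists across folds gives a direct view of how stable the signature is, which is the reproducibility question the abstract raises.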
Annotation analysis of the signatures 1 to 10; the table entries show the number of genes found in each signature to include the function shown in the first column, with a blank indicating zero.
| Function | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Cell cycle | 2 | 2 | 1 | 3 | 1 | 3 | ||||
| Cell Proliferation | 1 | 2 | 1 | 1 | ||||||
| DNA repair | 1 | 1 | 1 | 1 | ||||||
| Immune response | 1 | 1 | 2 | 3 | 1 | 1 | ||||
| Cell Growth | 1 | 2 | 1 | |||||||
| Transcription | 3 | 3 | 1 | 2 | 2 | 3 | 2 | 1 | 1 | 2 |
| Cell-cell signal | 1 | 1 | 1 | |||||||
| Development | 2 | 1 | 2 | 1 | ||||||
| ATP binding | 2 | 2 | 3 | 2 | 3 | 3 | 4 | 2 | 1 | |
| Nucleotide binding | 3 | 4 | 3 | 1 | 1 | 4 | 2 | 4 | 2 | |
| DNA binding | 2 | 2 | 1 | 1 | 1 | |||||
| Cell adhesion | 1 | 1 | 2 | 2 | 3 | 1 | 1 | |||
| Golgi stack | 1 | 1 | 1 | 1 | ||||||
| Kinase activity | 2 | 3 | 3 | 1 | 2 | 1 | 1 | 1 | 1 | 1 |
| Transferase activity | 2 | 2 | 3 | 3 | 1 | 1 | 2 | 1 | 1 |
Biased estimators of prediction performance for the ten signatures estimated on the training sets.
| Signature | Prop. of true positives | Prop. of false positives | Prop. correctly predicted |
|---|---|---|---|
| 1 | 0.943 | 0.213 | 0.891 |
| 2 | 0.943 | 0.206 | 0.892 |
| 3 | 0.926 | 0.230 | 0.874 |
| 4 | 0.934 | 0.230 | 0.879 |
| 5 | 0.959 | 0.213 | 0.901 |
| 6 | 0.942 | 0.197 | 0.896 |
| 7 | 0.942 | 0.262 | 0.874 |
| 8 | 0.967 | 0.164 | 0.923 |
| 9 | 0.950 | 0.279 | 0.874 |
| 10 | 0.942 | 0.344 | 0.846 |
| Average | 0.945 | 0.234 | 0.885 |
Unbiased estimators of prediction performance for the ten signatures estimated on the validation sets.
| Signature | Prop. of true positives | Prop. of false positives | Prop. correctly predicted |
|---|---|---|---|
| 1 | 0.750 | 0.714 | 0.579 |
| 2 | 1.000 | 0.800 | 0.778 |
| 3 | 0.769 | 0.714 | 0.600 |
| 4 | 0.786 | 0.714 | 0.619 |
| 5 | 0.929 | 0.429 | 0.810 |
| 6 | 0.857 | 0.571 | 0.714 |
| 7 | 0.929 | 0.714 | 0.714 |
| 8 | 0.538 | 0.714 | 0.450 |
| 9 | 0.857 | 0.571 | 0.714 |
| 10 | 0.857 | 0.429 | 0.762 |
| Average | 0.827 | 0.637 | 0.674 |
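The three columns in the performance tables are standard proportions computed from a two-by-two classification table. The sketch below shows the arithmetic. The function name `performance` and the counts passed to it are hypothetical: the source reports proportions only, and the counts were chosen here simply because they reproduce the first validation row (0.750, 0.714, 0.579).

```python
def performance(tp, fn, fp, tn):
    """Return (true positive fraction, false positive fraction, proportion correct)."""
    tpf = tp / (tp + fn)                       # positives correctly predicted
    fpf = fp / (fp + tn)                       # negatives wrongly predicted positive
    correct = (tp + tn) / (tp + fn + fp + tn)  # all samples correctly predicted
    return tpf, fpf, correct


# Hypothetical counts: 12 positives and 7 negatives in a validation block.
tpf, fpf, correct = performance(tp=9, fn=3, fp=5, tn=2)
print(f"{tpf:.3f} {fpf:.3f} {correct:.3f}")  # prints "0.750 0.714 0.579"
```

Note that a high proportion of true positives can coexist with a high proportion of false positives, as in several validation rows, so the proportion correctly predicted should not be read on its own.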
Figure 1. Performance assessment for method (ii). The pairs (TPF, FPF) estimated on the validation set are plotted in each graph. The assessment for the 59-gene signature (α1 ≤ 0.001) is shown in the upper graph and that for the 14-gene signature (α1 ≤ 0.0001) is shown in the lower graph.