David E Axelrod, Naomi Miller, Judith-Anne W Chapman.
Abstract
Information about tumors is usually obtained from a single assessment of a tumor sample, performed at some point in the course of the development and progression of the tumor, with patient characteristics being surrogates for natural history context. Differences between cells within individual tumors (intratumor heterogeneity) and between tumors of different patients (intertumor heterogeneity) may mean that a small sample is not representative of the tumor as a whole, particularly for solid tumors, which are the focus of this paper. This issue is of increasing importance as high-throughput technologies generate large multi-feature data sets in the areas of genomics, proteomics, and image analysis. Three potential pitfalls in statistical analysis are discussed (sampling, cut-points, and validation) and suggestions are made about how to avoid these pitfalls.
Year: 2009 PMID: 20191105 PMCID: PMC2828739 DOI: 10.4137/bii.s2222
Source DB: PubMed Journal: Biomed Inform Insights ISSN: 1178-2226
Figure 1. Cut-points of a continuous distribution of breast cancer patients. Eighty patients with in situ duct carcinoma of the breast were ranked by a canonical variable derived by weighting 39 nuclear features of 200 cells per patient [25]. Left panel: there is a continuous distribution of patients. Middle panel: one cut-point at the mean separates patients into two groups (Low and High). Right panel: two cut-points separate patients into three groups (Low, Intermediate, and High). Different choices of the number and values of the cut-points produce different groups of patients. Such discrete groups are useful for comparing patient outcomes, for instance by Kaplan-Meier survival analysis. However, the discrete groups are not distinct biological classes recognized from the distribution of their canonical variable of nuclear features.
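The effect of cut-point choice described in the caption can be illustrated with a minimal sketch. The scores below are synthetic draws from a normal distribution, not the study's canonical variable, and the helper `assign_groups` and the use of tertiles for the two-cut-point case are assumptions made only for illustration; the caption does not state how the two cut-points in the right panel were chosen.

```python
import random
import statistics


def assign_groups(scores, cut_points, labels):
    """Label each score by the interval it falls in.

    cut_points must be sorted ascending; a score equal to a cut-point
    is placed in the higher group. Hypothetical helper for illustration.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut-points")
    groups = []
    for s in scores:
        # Number of cut-points at or below the score gives the group index.
        idx = sum(s >= c for c in cut_points)
        groups.append(labels[idx])
    return groups


# Synthetic stand-in for a continuous canonical variable in 80 patients
# (assumed data, generated with a fixed seed for reproducibility).
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(80)]

# One cut-point at the mean: two groups.
mean_cut = statistics.mean(scores)
two_groups = assign_groups(scores, [mean_cut], ["Low", "High"])

# Two cut-points (tertiles, an assumed choice): three groups.
t1, t2 = statistics.quantiles(scores, n=3)
three_groups = assign_groups(
    scores, [t1, t2], ["Low", "Intermediate", "High"]
)

# Patients whose label differs between the two schemes.
relabelled = sum(a != b for a, b in zip(two_groups, three_groups))
print(f"{relabelled} of {len(scores)} patients change label")
```

The same patients receive different labels under the two schemes even though the underlying scores are unchanged, which is the sense in which the groups reflect the analyst's choice of cut-points rather than distinct biological classes.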