Ahmed Ashour Ahmed, James D Brenton.
Abstract
This review takes a sceptical view of the impact of breast cancer studies that have used microarrays to identify predictors of clinical outcome. In addition to discussing general pitfalls of microarray experiments, we also critically review the key breast cancer studies to highlight methodological problems in cohort selection, statistical analysis, validation of results and reporting of raw data. We conclude that the optimum use of microarrays in clinical studies requires further optimisation and standardisation of methodology and reporting, together with improvements in clinical study design.
Year: 2005 PMID: 15987437 PMCID: PMC1143564 DOI: 10.1186/bcr1017
Source DB: PubMed Journal: Breast Cancer Res ISSN: 1465-5411 Impact factor: 6.466
Figure 1. A simple case of over-fitting. Consider a researcher studying the effect of TP53 expression level (x) on survival (y) in a group of breast cancer patients. (a) Simple regression: given the expression level and survival (the variables) for each patient, the relationship between the two variables can be modelled with a simple univariable linear regression equation of the form y = a + bx, where a is the intercept on the y axis and b is the slope of the line. Applying this equation to a TP53 expression value gives a new y value that corresponds to predicted survival. However, the equation seldom gives a perfect match between the real survival (triangles) and the predicted survival (circles) for any given x. In general, the closer the predicted values are to the real values, the better the equation (model) explains the observations, or the better the 'fit' of the model. The fit of the model is therefore used as a measure of its performance. (b) Over-fitting: a line fitted to only two observations will always pass through both of them, giving an artificially perfect match between the predicted and the observed data. This meaningless good performance of a model is 'over-fitting'. It results from using too few observations (patients) per variable (gene) studied. A more complex 'multi-variable analysis' requires even more observations (patients) to avoid over-fitting. In practice, a working ratio of 10 patients for every variable studied is recommended. In microarray studies, however, few patients are evaluated for many thousands of genes.
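The point made in panel (b) can be checked numerically. The following is a minimal sketch, not taken from the review: the data are synthetic random numbers with no real relationship between x and y, and the helper names `fit_line` and `r_squared` are hypothetical. It fits y = a + bx by ordinary least squares to two "patients" and then to thirty, and compares the fit (R²) on the fitted patients with the fit on patients the model has not seen.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept on the y axis
    return a, b

def r_squared(xs, ys, a, b):
    """Fraction of the variance in ys explained by the line a + b*x."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Synthetic data (an assumption for illustration): "expression" and
# "survival" are independent noise, so any apparent fit is spurious.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(30)]  # hypothetical expression levels
ys = [rng.uniform(0, 60) for _ in range(30)]  # hypothetical survival, months

# Two patients: the line passes through both points.
a2, b2 = fit_line(xs[:2], ys[:2])
r2_two = r_squared(xs[:2], ys[:2], a2, b2)      # 1.0 up to float rounding
r2_holdout = r_squared(xs[2:], ys[2:], a2, b2)  # same line, unseen patients

# Thirty patients: the fit is no longer artificially perfect.
a_all, b_all = fit_line(xs, ys)
r2_all = r_squared(xs, ys, a_all, b_all)

print(f"R^2, 2 patients (fitted):   {r2_two:.3f}")
print(f"R^2, same line on the rest: {r2_holdout:.3f}")
print(f"R^2, 30 patients (fitted):  {r2_all:.3f}")
```

The perfect fit on two patients says nothing about the remaining 28, which is why performance measured on the fitted data alone can be misleading when patients are few relative to variables.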