Microarrays and breast cancer clinical studies: forgetting what we have not yet learnt.

Abstract

This review takes a sceptical view of the impact of breast cancer studies that have used microarrays to identify predictors of clinical outcome. In addition to discussing general pitfalls of microarray experiments, we also critically review the key breast cancer studies to highlight methodological problems in cohort selection, statistical analysis, validation of results and reporting of raw data. We conclude that the optimum use of microarrays in clinical studies requires further optimisation and standardisation of methodology and reporting, together with improvements in clinical study design.

Substances:
Genetic Markers

Year: 2005 PMID: 15987437 PMCID: PMC1143564 DOI: 10.1186/bcr1017

Source DB: PubMed Journal: Breast Cancer Res ISSN: 1465-5411 Impact factor: 6.466

Introduction

By the time that a breast cancer is clinically apparent it has undergone multiple genetic and epigenetic primary carcinogenic events and further secondary molecular changes that ensure the adaptation of its cells to the changing micro-environment. The diversity of these genetic changes has made it difficult to classify breast cancer molecularly, and as a consequence there has been great enthusiasm for using genome-wide profiling methods to acquire a better understanding of the disease. This has led to an increasing number of studies using expression array profiling to improve the prediction of cancer prognosis [1-7]. Great things have been promised by exponents of these technologies [8]. How should we view the impact of current work?

Microarray technology

Irrespective of the questions being addressed in a profiling study, microarray techniques have inherent problems that lead to considerable data variability. Major sources of variability can arise from methods of RNA extraction [9,10], different types of probe preparation [9,11], probe labelling [12,13] and hybridisation [14,15]. It is also clear that varying the microarray platform, reference sample or segmentation method used for microarray image analysis leads to significant differences in data repeatability and gene discovery [16-18]. Although the MIAME (minimum information about a microarray experiment) report defines standards for information needed for reporting microarray experiments [19], it does not describe or quantify the variability in the experiments. More studies addressing these experimental issues are urgently needed [20,21] along with efforts to define common standards for expression measurement controls. Guidelines are already emerging for best practice in using expression profiling for clinical trials [22].

The aim of supervised classification of microarray data is to detect genes that might prospectively predict defined outcomes. Existing studies in breast cancer have involved three steps: identifying a set of genes whose expression differs between outcome groups (survival or drug response), refining this set for optimal classification within the sample set and finally validating the performance of the classifier genes on independent samples. Several studies have addressed these questions [1-7], but even before examining the technology a critical appraisal of the studies shows multiple methodological problems that make the interpretation of the results difficult.

Clinical study design

The problems can be summarised into four main categories: cohort selection, statistical analysis, validation of results and reporting of raw data. With the exception of the report by Chang and colleagues [5], studies were conducted as retrospective analyses of 'available' samples. Data collected retrospectively are inevitably incomplete, posing a complex problem in the interpretation of results [23,24]. Lack of detailed clinical information from paper records often means that important clinical predictors cannot be included in multivariate analysis to estimate the true predictive values of novel classifiers. This is exemplified by the studies from Ahr and colleagues [4] and van 't Veer and colleagues [2] that examined the association between a microarray classifier and prognosis without accounting for the effects of important clinical parameters such as performance status or treatment modality. The use of 'available' samples may introduce significant heterogeneity into patient characteristics and unexpected temporal effects. van de Vijver and colleagues [3] used a 'validation set' (see below) containing patients treated with different modalities of surgery, chemotherapy and radiotherapy over 11 years. Each of these variables could introduce significant prognostic differences and make the estimation of the true independent effect of a molecular classifier difficult. A multi-variable analysis of data from van de Vijver and colleagues [3] clearly shows a highly significant decrease in hazard of recurrence in patients treated with chemotherapy in comparison with those who received no chemotherapy (hazard ratio of 0.37; P < 0.001). This confounding variable combined with the limited number of samples tested makes the microarray results difficult to interpret. Prospective studies that are much less sensitive to these sources of bias should be the priority for future research.

Defined criteria and endpoints

However, it is vital that both prospective and retrospective studies use clinically relevant criteria for categorising patients; these should be clearly defined and prospectively applied. Chang and colleagues [5] used median residual volume to measure tumour response to docetaxel in a prospective study of 24 patients with primary breast cancer, although pathological response is known to be the most important clinical outcome measure because it is strongly correlated with survival [25]. van de Vijver and colleagues [3] classified their breast cancers as positive or negative for oestrogen receptor on the basis of the expression array values and not a validated immunohistochemical test. This value was then used inconsistently as a categorical variable for examining association with the prognostic signature, and as a continuous variable in multivariate analysis to examine the independent effect of the signature on prognosis. Arbitrarily defined outcome measures that do not represent established clinical criteria are likely to increase subjective bias.

Statistical considerations

How can we decide whether a classifier might be a useful clinical test? The performance of any test is dependent upon the cut-off point used to discriminate between outcomes. van 't Veer and colleagues [2] and van de Vijver and colleagues [3] claim a correct classification rate of 83% for good prognosis. Similarly, Huang and colleagues [7] report a 90% accuracy for predicting outcome. However, these results were based on arbitrarily defined cut-off values. As these cut-off points were user defined they do not allow true estimation of the predictive power of the classifier and the use of differing values by van de Vijver and colleagues [3] is inappropriate and confusing. A more robust estimate is obtained by using sensitivity and specificity values obtained at multiple cut-off points to draw a receiver operating characteristic (ROC) curve. The area under the curve (AUC) is the best estimate of the performance of a classifier and this method was used by Chang and colleagues [5]: the reported area under the curve for their classifier was 0.96 (possible range 0 to 1).

Even with robust technology and rigorous analysis, the major challenge in the experimental design is the huge disproportion between the number of variables tested (gene expression values) and the number of samples. This inevitably leads to a high false-discovery rate and over-fitting of statistical models to the cohort under study (Fig. 1). It follows that appropriate validation of the classifier is an essential requirement in estimating the error of a classifier. Internal validation on the set from which a classifier was generated is usually performed. This is done either by dividing the data into a training set (for obtaining a classifier) and a test set (for estimating the error) or by leaving one case out at a time, developing a model from the remaining cases (training set) and testing it on the omitted case (test set). In either method it is mandatory not to include all cases for developing a classifier before testing it on the test set because this results in overestimating the accuracy of a classifier. van 't Veer and colleagues [2] performed an internal validation on their data set with (properly) and without (improperly) this distinct separation between training sets and test sets. The published sensitivity of their classifier was 73% when the internal validation was improperly done and only 59% when the validation was properly done (published as supplementary material) [26,27].
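The two points above, summarising performance by the area under the ROC curve and keeping gene selection out of reach of the test case, can be illustrated together. The following is a minimal sketch under stated assumptions, not the method of any study reviewed here: the data are pure random noise, the gene ranking (absolute difference in class means) and the nearest-centroid score are stand-ins chosen for brevity, and all function names are hypothetical. The AUC is computed as the proportion of (case, control) pairs that the score ranks correctly, which equals the area under the empirical ROC curve.

```python
import random
from statistics import mean


def top_genes(expr, labels, k):
    """Rank genes by absolute difference in class means and keep the top k."""
    def score(g):
        m1 = mean(r[g] for r, y in zip(expr, labels) if y == 1)
        m0 = mean(r[g] for r, y in zip(expr, labels) if y == 0)
        return abs(m1 - m0)
    return sorted(range(len(expr[0])), key=score, reverse=True)[:k]


def centroid_score(sample, expr, labels, genes):
    """Positive when the sample lies closer to the class-1 centroid than to class 0."""
    total = 0.0
    for g in genes:
        c1 = mean(r[g] for r, y in zip(expr, labels) if y == 1)
        c0 = mean(r[g] for r, y in zip(expr, labels) if y == 0)
        total += (sample[g] - c0) ** 2 - (sample[g] - c1) ** 2
    return total


def loo_scores(expr, labels, k, select_inside_loop):
    """Leave-one-out scores.

    select_inside_loop=True  : genes are re-selected without the omitted case.
    select_inside_loop=False : genes are selected once using all cases, so the
                               omitted case has already influenced the classifier.
    """
    fixed = None if select_inside_loop else top_genes(expr, labels, k)
    scores = []
    for i in range(len(expr)):
        train_x = expr[:i] + expr[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        genes = top_genes(train_x, train_y, k) if select_inside_loop else fixed
        scores.append(centroid_score(expr[i], train_x, train_y, genes))
    return scores


def auc(scores, labels):
    """Fraction of (class-1, class-0) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Simulated noise: 20 "patients", 300 "genes", no real link to the outcome label.
rng = random.Random(1)
n_samples, n_genes, k = 20, 300, 10
labels = [i % 2 for i in range(n_samples)]
expr = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]

auc_improper = auc(loo_scores(expr, labels, k, select_inside_loop=False), labels)
auc_proper = auc(loo_scores(expr, labels, k, select_inside_loop=True), labels)
print(f"AUC, genes selected on all cases first: {auc_improper:.2f}")
print(f"AUC, genes selected inside the loop:    {auc_proper:.2f}")
```

Because the simulated data contain no signal, any apparent discrimination is an artefact. Selecting genes on all cases before the leave-one-out loop typically produces a flattering AUC, whereas re-selecting genes within each fold gives the classifier no such built-in advantage.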

Figure 1

A simple case of over-fitting. Consider that a researcher is studying the effect of TP53 expression level (x) on survival (y) of a group of breast cancer patients. (a) Simple regression: from knowing the expression level and survival (the variables) for each patient, the relationship between the two variables can be modelled with a simple univariable linear regression equation of the form y = a + bx, where a is the intercept on the y axis and b is the slope of the line. Applying this equation to a TP53 expression value will result in a new y value that corresponds to predicted survival. However, the equation seldom gives a perfect match between the real survival (triangles) and the predicted survival from the equation (circles) for any given x. In general, the closer the predicted values are to the real values, the better the equation (model) is in explaining the observations or the better the 'fit' of the model. The fit of the model is therefore used as a measure of its performance. (b) Over-fitting: an equation that is dependent on only two observations will always result in a line that passes through these two observations, giving an artificially perfect match between the predicted and the observed data. This represents meaningless good performance of a model or 'over-fitting'. This results from using too few observations (patients) per variable (gene) studied. A more complex 'multi-variable analysis' requires even more observations (patients) to avoid over-fitting. In practice, a working ratio of 10 patients for every variable studied is recommended. However, in microarray studies few patients are evaluated for many thousands of genes.
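The two-observation case in panel (b) can be reproduced numerically. The sketch below fits y = a + bx by ordinary least squares; the (expression, survival) pairs are invented for illustration and carry no clinical meaning, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residual_sum_of_squares(xs, ys, a, b):
    """Sum of squared differences between observed and predicted y."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: expression level (x) and survival (y) for six patients.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
survival = [9.0, 4.0, 8.0, 3.0, 6.0, 2.0]

# Model built from only the first two patients: the line passes through both.
a2, b2 = fit_line(expression[:2], survival[:2])
error_on_own_data = residual_sum_of_squares(expression[:2], survival[:2], a2, b2)
error_on_new_data = residual_sum_of_squares(expression[2:], survival[2:], a2, b2)

# Model built from all six patients: the fit is no longer perfect.
a6, b6 = fit_line(expression, survival)
error_all = residual_sum_of_squares(expression, survival, a6, b6)

print(f"two-patient model: y = {a2:.1f} + {b2:.1f}x")
print(f"  error on the two patients used to fit: {error_on_own_data:.1f}")
print(f"  error on the four patients not used:   {error_on_new_data:.1f}")
print(f"six-patient model error on its own data: {error_all:.1f}")
```

The two-patient model reports zero error on the data it was built from, yet its predictions for the remaining patients are poor, which is the sense in which a perfect fit can be meaningless.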

Neither of the two types of internal validation is a substitute for independent validation on different data sets. Only three reports attempted such validation in breast cancer studies [2,3,5]. van 't Veer and colleagues [2] and Chang and colleagues [5] performed only a limited validation on 15 and 6 patients, respectively. Although van de Vijver and colleagues [3] reported a validation of the classifier of van 't Veer and colleagues [2] on 151 patients with lymph-node-negative disease, 61 patients were in fact taken from the original study. It is therefore unclear how applicable these classifiers are to the wider population at risk.
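To give a sense of how little a validation set of 15 or 6 patients can establish, the sketch below computes a Wilson score 95% confidence interval for an observed proportion of correct classifications. The counts of correct calls are hypothetical and are not taken from the studies discussed; only the set sizes of 15 and 6 come from the text, and the set of 150 is added purely for contrast.

```python
import math


def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = correct / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical numbers of correctly classified patients for each set size.
examples = [(5, 6), (13, 15), (130, 150)]
for correct, n in examples:
    low, high = wilson_interval(correct, n)
    print(f"{correct}/{n} correct: observed {correct / n:.2f}, "
          f"95% interval {low:.2f} to {high:.2f}")
```

With similar observed accuracy, the interval narrows markedly as the validation set grows, so an apparently high accuracy in a handful of patients remains compatible with much weaker true performance.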

Reproducible analysis

These criticisms underscore the importance of comprehensive reporting of the raw data so that results can be compared and possibly validated against different studies. Sorlie and colleagues [1] published both microarray image files and individual feature intensity values, allowing full reinterpretation of their data. This example has not been followed by subsequent researchers. For example, van 't Veer and colleagues [2] merely reported average outcome correlations for 232 genes of their classifier and not the original raw data. Sotiriou and colleagues [6] identified 56 overlapping genes between their set of 485 differentially expressed genes and those reported by van 't Veer and colleagues [2]. Because the raw data for all the genes in the latter study are not available, it is difficult to exclude a random effect as the cause of this overlap. In addition, most descriptions of analysis methods in published papers are inadequate (for example see [28]). Analysis tools such as the open-source statistical language R and its microarray-specific Bioconductor packages are essentially high-level programming environments that oblige the user to enter declarations and expressions to analyse data [29,30]. This type of interaction makes it relatively easy to output detailed transcripts that contain both commands and data, and therefore allow reproducible analyses [31]. Analysis methods based on using software with graphical user interfaces are harder to record, but as a minimum, significant intermediate calculations and data objects should be submitted as supplementary information so that cross-checking by the reader is possible. Finally, to make the best use of microarray data sets, individual patient data should be anonymously reported and electronically accessible. The use of controlled vocabulary and standardised indices is critical for the reuse of clinical information.

Conclusion

Microarray profiling has, unquestionably, been established as a powerful tool in unravelling mechanistic insights into tumour biology. We argue here that the optimum use of such a technique in clinical studies requires the further optimisation and standardisation of reporting procedures coupled with carefully planned prospective studies. It is important to underscore the difference between validating a classifier and justifying its use in clinical practice. The latter requires evidence of significant improvement of clinical outcome for patients when a classifier is used to guide management. This ultimately requires testing a classifier in a randomised prospective trial to prove that a 'classifier-informed' management yields a better clinical outcome than a 'classifier-blind' arm. However, we argue that the data produced so far may be too preliminary to launch large-scale expensive phase III studies. Many of the methodological problems in identifying prognostic factors are not new and have been successively ignored by the clinical community over the past 20 years. The great danger of using new technology with newer problems is that these older lessons are quickly forgotten.

Competing interests

The author(s) declare that they have no competing interests.

30 in total

1. High-fidelity mRNA amplification for gene profiling.

Authors: E Wang; L D Miller; G A Ohnmacht; E T Liu; F M Marincola
Journal: Nat Biotechnol Date: 2000-04 Impact factor: 54.908

2. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data.

Authors: A Brazma; P Hingamp; J Quackenbush; G Sherlock; P Spellman; C Stoeckert; J Aach; W Ansorge; C A Ball; H C Causton; T Gaasterland; P Glenisson; F C Holstege; I F Kim; V Markowitz; J C Matese; H Parkinson; A Robinson; U Sarkans; S Schulze-Kremer; J Stewart; R Taylor; J Vilo; M Vingron
Journal: Nat Genet Date: 2001-12 Impact factor: 38.330

3. Current issues for DNA microarrays: platform comparison, double linear amplification, and universal RNA reference.

Authors: Peter J Park; Yun Anna Cao; Sun Young Lee; Jong-Woo Kim; Mi Sook Chang; Rebecca Hart; Sangdun Choi
Journal: J Biotechnol Date: 2004-09-09 Impact factor: 3.307

4. Microarrays and clinical investigations.

Authors: Edison T Liu; Krishna R Karuturi
Journal: N Engl J Med Date: 2004-04-15 Impact factor: 91.245

Review 5. When is a genomic classifier ready for prime time?

Authors: Richard Simon
Journal: Nat Clin Pract Oncol Date: 2004-11

6. Pre-validation and inference in microarrays.

Authors: Robert J Tibshirani; Brad Efron
Journal: Stat Appl Genet Mol Biol Date: 2002-08-22

7. Gene expression profiling predicts clinical outcome of breast cancer.

Authors: Laura J van 't Veer; Hongyue Dai; Marc J van de Vijver; Yudong D He; Augustinus A M Hart; Mao Mao; Hans L Peterse; Karin van der Kooy; Matthew J Marton; Anke T Witteveen; George J Schreiber; Ron M Kerkhoven; Chris Roberts; Peter S Linsley; René Bernards; Stephen H Friend
Journal: Nature Date: 2002-01-31 Impact factor: 49.962

8. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers.

Authors: C M Perou; S S Jeffrey; M van de Rijn; C A Rees; M B Eisen; D T Ross; A Pergamenschikov; C F Williams; S X Zhu; J C Lee; D Lashkari; D Shalon; P O Brown; D Botstein
Journal: Proc Natl Acad Sci U S A Date: 1999-08-03 Impact factor: 11.205

9. Prognostic significance of a complete pathological response after induction chemotherapy in operable breast cancer.

Authors: P Chollet; S Amat; H Cure; M de Latour; G Le Bouedec; M-A Mouret-Reynier; J-P Ferriere; J-L Achard; J Dauplat; F Penault-Llorca
Journal: Br J Cancer Date: 2002-04-08 Impact factor: 7.640

Review 10. Diagnostic and prognostic prediction using gene expression profiles in high-dimensional microarray data.

Authors: R Simon
Journal: Br J Cancer Date: 2003-11-03 Impact factor: 7.640
