MOTIVATION: The process of producing microarray data involves multiple steps, some of which may suffer from technical problems and seriously damage the quality of the data. Thus, it is essential to identify low-quality arrays. This article addresses two questions: (1) how to assess the quality of a microarray dataset using the measures provided in quality control (QC) reports; (2) how to identify possible sources of the quality problems. RESULTS: We propose a novel multivariate approach to evaluating the quality of an array, which examines the 'Mahalanobis distance' of its quality attributes from those of other arrays. We therefore call it Mahalanobis Distance Quality Control (MDQC), and we examine several variants of the method. MDQC flags problematic arrays based on the idea of outlier detection, i.e. it flags those arrays whose quality attributes jointly depart from those of the bulk of the data. Using two case studies, we show that a multivariate analysis gives substantially richer information than analyzing each parameter of the QC report in isolation. Moreover, once the QC report is produced, our quality assessment method is computationally inexpensive and the results can be easily visualized and interpreted. Finally, we show that computing these distances on subsets of the quality measures in the report may increase the method's ability to detect unusual arrays and help identify possible causes of the quality problems. AVAILABILITY: The library to implement MDQC will soon be available from Bioconductor.
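The idea of flagging arrays whose quality attributes jointly depart from the bulk of the data can be illustrated with a minimal sketch. It is not the MDQC library itself: it assumes the classical sample mean and covariance, whereas outlier detection in practice often relies on robust estimates. The function names, the synthetic QC values and the cutoff of 2.0 are all hypothetical and chosen only for illustration.

```python
import numpy as np

def mahalanobis_distances(qc):
    """Mahalanobis distance of each row (one array) from the sample centre.

    Assumption: classical mean and covariance are used for simplicity.
    """
    qc = np.asarray(qc, dtype=float)
    center = qc.mean(axis=0)
    cov = np.cov(qc, rowvar=False)
    diff = qc - center
    # Solve cov @ x = diff.T instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov, diff.T).T
    return np.sqrt(np.einsum("ij,ij->i", diff, solved))

def flag_arrays(qc, cutoff):
    """Boolean mask of arrays whose distance exceeds a user-chosen cutoff."""
    return mahalanobis_distances(qc) > cutoff

# Synthetic QC report: rows are arrays, columns are two correlated
# quality attributes (hypothetical values, not real data).
qc = np.array([
    [1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9],
    [5.0, 5.1], [6.0, 5.8], [7.0, 7.1], [8.0, 8.0],
    [2.0, 7.0],  # each value lies within its column's range, but the pair is unusual
])

distances = mahalanobis_distances(qc)
flags = flag_arrays(qc, cutoff=2.0)  # illustrative cutoff only
print(np.round(distances, 2))
print(np.flatnonzero(flags))
```

In this toy report the last array would pass a check on either attribute alone, since both of its values fall inside the observed ranges, yet it has the largest joint distance. Passing a column subset such as `qc[:, cols]` to the same function gives the distance on a subset of the quality measures.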