Arief Gusnanto (1), Stefano Calza, Yudi Pawitan.
(1) Medical Research Council - Biostatistics Unit, Institute of Public Health, Cambridge, UK. Arief.Gusnanto@mrc.cam.ac.uk
Abstract
PURPOSE OF REVIEW: To highlight developments in microarray data analysis for the identification of differentially expressed genes, particularly via control of the false discovery rate.
RECENT FINDINGS: The emergence of high-throughput technologies such as microarrays raises two fundamental statistical issues: multiplicity and sensitivity. We focus on the biological problem of identifying differentially expressed genes. First, multiplicity arises from testing tens of thousands of hypotheses, which renders the standard P value meaningless. Second, known optimal single-test procedures such as the t-test perform poorly in the context of highly multiple tests. The standard approach to dealing with multiplicity is too conservative in the microarray context. The false discovery rate concept is fast becoming the key statistical assessment tool, replacing the P value. We review the false discovery rate approach and argue that it is more sensible for microarray data. We also discuss some methods that take into account additional information from the microarrays to improve the false discovery rate.
SUMMARY: There is growing consensus on how to analyse microarray data using the false discovery rate framework in place of the classical P value. Further research is needed on the preprocessing of the raw data, such as the normalization step and filtering, and on finding the most sensitive test procedure.
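To make the multiplicity argument concrete: if 10,000 genes are each tested at the 0.05 level and none is truly differentially expressed, about 500 are expected to be declared significant by chance alone. The abstract does not name a specific procedure, so the sketch below is only an illustration. It assumes the Benjamini-Hochberg step-up rule as one common way to control the false discovery rate, and a Bonferroni correction as a stand-in for the "standard approach" to multiplicity. The function names, the level `q = 0.05`, and the ten P values are hypothetical choices made for the example, not taken from the review.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m (family-wise error control)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up rule at false discovery rate level q.

    Returns a list of booleans in the original order: True where the
    hypothesis is rejected.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by ascending P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose ordered P value satisfies p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical P values for ten genes (illustrative only).
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print("Bonferroni rejections:", sum(bonferroni(p)))
    print("BH rejections:        ", sum(benjamini_hochberg(p)))
```

On these ten values the Bonferroni threshold is 0.05 / 10 = 0.005, so only the smallest P value passes, whereas the step-up rule also admits the second smallest (0.008 <= 2 x 0.05 / 10). This is the sense in which a family-wise correction is more conservative than false discovery rate control.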