V M Lourenço¹, A M Pires, M Kirst. ¹Department of Mathematics, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal. vmml@fct.unl.pt
Abstract
MOTIVATION: It is well known that data deficiencies, such as coding/rounding errors, outliers or missing values, may lead to misleading results for many statistical methods. Robust statistical methods are designed to accommodate certain types of these deficiencies, yielding reliable results under a range of conditions. We analyze statistical tests for detecting associations between single-nucleotide polymorphisms (SNPs) and quantitative traits when the data deviate from the normality assumption. We consider the classical analysis of variance tests for the parameters of the appropriate linear model and a robust version of those tests based on M-regression. We then compare their empirical power and level using simulated data with several degrees of contamination. RESULTS: Data normality is nothing but a mathematical convenience. In practice, experiments usually yield data with non-conforming observations. In the presence of such data, classical least-squares methods perform poorly: they give biased estimates, increase the number of spurious associations and often fail to detect true ones. We show, through a simulation study and a real data example, that the robust methodology can be more powerful, and thus more adequate for association studies, than the classical approach. AVAILABILITY: Code for the robustified version of the function lmekin() from the R package kinship is provided as Supplementary Material.
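The abstract contrasts classical least-squares tests with tests based on M-regression. The sketch below shows that contrast for a single SNP. It is not the authors' robustified lmekin(), which fits a linear mixed model with a kinship matrix. It is a minimal illustration under these assumptions:

- unrelated individuals;
- additive 0/1/2 genotype coding;
- Huber's ψ with the conventional tuning constant c = 1.345, fitted by iteratively reweighted least squares (IRLS) with a MAD residual scale;
- normal-approximation Wald statistics for both tests.

The function names (`ols_test`, `huber_test`, `simulate`, `rejection_rates`) and all simulation settings are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

C_HUBER = 1.345  # conventional Huber tuning constant (assumed, not taken from the paper)


def wls_line(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def centered_ss(x):
    """Centered sum of squares of the genotype scores."""
    mx = statistics.fmean(x)
    return sum((xi - mx) ** 2 for xi in x)


def ols_test(x, y):
    """Classical least-squares slope and Wald z-statistic for H0: b1 = 0."""
    b0, b1 = wls_line(x, y, [1.0] * len(x))
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (len(x) - 2) / centered_ss(x)) ** 0.5
    return b1, b1 / se


def huber_test(x, y, c=C_HUBER, tol=1e-8, max_iter=200):
    """Huber M-regression via IRLS with a MAD residual scale, plus an
    approximate Wald z-statistic for H0: b1 = 0."""
    b0, b1 = wls_line(x, y, [1.0] * len(x))  # start from the LS fit
    for _ in range(max_iter):
        r = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        cs = c * statistics.median(abs(ri) for ri in r) / 0.6745  # c * MAD scale
        if cs == 0:
            break
        # Huber weights: 1 for |r| <= c*s, c*s/|r| beyond (bounded influence)
        w = [1.0 if abs(ri) <= cs else cs / abs(ri) for ri in r]
        nb0, nb1 = wls_line(x, y, w)
        converged = abs(nb0 - b0) + abs(nb1 - b1) < tol
        b0, b1 = nb0, nb1
        if converged:
            break
    r = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s = statistics.median(abs(ri) for ri in r) / 0.6745
    if s == 0:
        raise ValueError("zero residual scale; Huber fit is degenerate")
    u = [ri / s for ri in r]
    e_psi2 = statistics.fmean(max(-c, min(c, ui)) ** 2 for ui in u)  # E[psi(u)^2]
    e_dpsi = statistics.fmean(1.0 if abs(ui) <= c else 0.0 for ui in u)  # E[psi'(u)]
    # Huber's large-sample approximation: Var(b1) ~ s^2 E[psi^2] / E[psi']^2 / Sxx
    se = s * (e_psi2 / e_dpsi ** 2 / centered_ss(x)) ** 0.5
    return b1, b1 / se


def two_sided_p(z):
    """Two-sided p-value from the standard normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate(n=200, maf=0.3, beta=0.3, contam=0.1, shift=10.0, seed=0):
    """Hypothetical design: minor-allele counts 0/1/2 under Hardy-Weinberg,
    N(0, 1) errors, and a fraction `contam` of traits shifted by `shift`."""
    rng = random.Random(seed)
    g = [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]
    y = [beta * gi + rng.gauss(0.0, 1.0) for gi in g]
    for i in rng.sample(range(n), round(contam * n)):
        y[i] += shift  # gross error, e.g. a coding/recording mistake
    return g, y


def rejection_rates(beta, reps=100, alpha=0.05, **kw):
    """Empirical rejection rates (LS, Huber): level if beta == 0, else power."""
    hits_ls = hits_m = 0
    for rep in range(reps):
        g, y = simulate(beta=beta, seed=rep, **kw)
        hits_ls += two_sided_p(ols_test(g, y)[1]) < alpha
        hits_m += two_sided_p(huber_test(g, y)[1]) < alpha
    return hits_ls / reps, hits_m / reps


if __name__ == "__main__":
    print("level (LS, Huber):", rejection_rates(beta=0.0))
    print("power (LS, Huber):", rejection_rates(beta=0.3))
```

The printed rates depend on the seeds and on the hypothetical contamination model, so they illustrate the comparison rather than reproduce the paper's results. Huber's ψ is used here only as a common choice of M-estimator. Other ψ functions, or a mixed model that accounts for relatedness, would change the details.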
Authors: Albert M Levin; Rasika A Mathias; Lili Huang; Lindsey A Roth; Denise Daley; Rachel A Myers; Blanca E Himes; Isabelle Romieu; Mao Yang; Celeste Eng; Julie E Park; Karla Zoratti; Christopher R Gignoux; Dara G Torgerson; Joshua M Galanter; Scott Huntsman; Elizabeth A Nguyen; Allan B Becker; Moira Chan-Yeung; Anita L Kozyrskyj; Pui-Yan Kwok; Frank D Gilliland; W James Gauderman; Eugene R Bleecker; Benjamin A Raby; Deborah A Meyers; Stephanie J London; Fernando D Martinez; Scott T Weiss; Esteban G Burchard; Dan L Nicolae; Carole Ober; Kathleen C Barnes; L Keoki Williams Journal: J Allergy Clin Immunol Date: 2012-11-10 Impact factor: 10.793
Authors: D Zhi; M R Irvin; C C Gu; A J Stoddard; R Lorier; A Matter; D C Rao; V Srinivasasainagendra; H K Tiwari; A Turner; U Broeckel; D K Arnett Journal: Front Genet Date: 2012-05-28 Impact factor: 4.599
Authors: Marguerite R Irvin; Degui Zhi; Stella Aslibekyan; Steven A Claas; Devin M Absher; Jose M Ordovas; Hemant K Tiwari; Steve Watkins; Donna K Arnett Journal: PLoS One Date: 2014-06-06 Impact factor: 3.240
Authors: Marguerite R Irvin; May E Montasser; Tobias Kind; Sili Fan; Dinesh K Barupal; Amit Patki; Rikki M Tanner; Nicole D Armstrong; Kathleen A Ryan; Steven A Claas; Jeffrey R O'Connell; Hemant K Tiwari; Donna K Arnett Journal: Nutrients Date: 2021-11-10 Impact factor: 5.717
Authors: Tomi Akinyemiju; Anh N Do; Amit Patki; Stella Aslibekyan; Degui Zhi; Bertha Hidalgo; Hemant K Tiwari; Devin Absher; Xin Geng; Donna K Arnett; Marguerite R Irvin Journal: Clin Epigenetics Date: 2018-04-10 Impact factor: 6.551
Authors: Marguerite R Irvin; Stella Aslibekyan; Anh Do; Degui Zhi; Bertha Hidalgo; Steven A Claas; Vinodh Srinivasasainagendra; Steve Horvath; Hemant K Tiwari; Devin M Absher; Donna K Arnett Journal: Clin Epigenetics Date: 2018-04-18 Impact factor: 6.551