Beware of risk for increased false positive rates in genome-wide association studies for phenotypic variability.

Xia Shen, Örjan Carlborg

Year:  2013        PMID: 23734164      PMCID: PMC3659368          DOI: 10.3389/fgene.2013.00093

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.599


Performing genome-wide association studies (GWAS) to identify genes regulating the between-genotype variability, rather than the mean, is a promising new approach for dissecting the genetics of complex traits. Using this strategy with a parametric regression model, Yang et al. (2012) identified and replicated the FTO locus and showed that it has a role in regulating the between-genotype variance heterogeneity of human body mass index. This finding illustrates the potential clinical contribution of this type of inheritance and shows that it is not only a feature of model organisms (e.g., Queitsch et al., 2002; Sangster et al., 2008; Gangaraju et al., 2011; Jimenez-Gomez et al., 2011; Queitsch et al., 2012; Shen et al., 2012).
As this paper is likely to increase interest in applying this methodology in other human and experimental populations, we think it is important to make prospective users aware that one needs to be careful when applying similar methodology to datasets smaller than those used by Yang et al. (2012). Yang et al. (2012) noticed that the mapping of variance-controlling loci is prone to inflated test statistics when the minor allele frequency (MAF) is small, but provided no further explanation for this. Here, we briefly explain why this observation is only half true, why GWAS analyses to detect variance heterogeneity are inherently sensitive to unbalanced data, and why researchers aiming to perform similar analyses need to be careful to avoid reporting false positive signals.
The basis for the sensitivity of variance-heterogeneity GWAS analyses is that the commonly applied statistical tests for variance heterogeneity, including, e.g., regression using the squared Z-score, the Levene test (Levene, 1960), and the Brown–Forsythe test (Brown and Forsythe, 1974), are biased when applied to imbalanced samples.
The major reason for this is that the distribution of the variance often deviates from normality, as it (1) is bounded at zero; (2) is skewed to the right; and (3) has a variance that depends on its mean. Such deviations lead to violations of, e.g., the Gauss–Markov assumptions in a regression model (Plackett, 1950), which could cause problems such as those highlighted here. This bias is usually not discussed in the standard statistics literature, as it appears only when the samples are severely imbalanced and is not strong enough to matter when the tests are used without excessive multiple testing. GWAS analyses, however, go well beyond normal statistical theory by performing hundreds of thousands to millions of tests in severely imbalanced samples. As we show below, these situations could lead to problems with type I errors, even when stringent Bonferroni-corrected thresholds are used, unless caution is taken in the design of the study and in the quality control of the results.
To illustrate this inherent problem in the statistical methodology used to test for variance heterogeneity, we used simple simulations in two populations: one with two genotypes (AA and BB) and one with three genotypes (AA, AB, and BB). In the simulations, the number of individuals in the minor genotype class (NMG) was varied in populations of increasing sizes. Phenotypes were simulated as pure noise from a standard normal distribution, i.e., all significant signals are false positives, as no genetic effect was simulated. We performed 1,000,000 tests for a variance difference for each combination of population size and NMG. The number of tests that exceeded the Bonferroni-corrected significance threshold for 1,000,000 independent tests was counted to provide an estimate of the expected number of false positive signals in a genome scan.
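The two-genotype simulation described above can be sketched in a few lines. This is a minimal, scaled-down illustration and not the authors' code: the function name `count_false_positives`, the sample size of 500, the 20,000 replicates, and the nominal threshold of 1e-3 (far looser than a Bonferroni threshold for 1,000,000 tests) are all assumptions chosen to keep the run short. Regression of the squared Z-score on a 0/1 genotype code is carried out through the equivalent pooled-variance two-sample t-test.

```python
import numpy as np
from scipy import stats


def count_false_positives(n, n_minor, alpha, n_tests, seed=0, chunk=2000):
    """Count null tests with p < alpha in a two-genotype variance test.

    Hypothetical helper: phenotypes are pure noise, so every hit is a
    false positive.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    done = 0
    while done < n_tests:
        m = min(chunk, n_tests - done)
        # Pure-noise phenotypes: no genetic effect on mean or variance.
        y = rng.standard_normal((m, n))
        # Squared Z-score of each individual within its own replicate.
        centered = y - y.mean(axis=1, keepdims=True)
        z2 = (centered / y.std(axis=1, ddof=1, keepdims=True)) ** 2
        # The first n_minor columns play the role of the minor genotype class.
        # Regressing z2 on a 0/1 genotype code gives the same slope test as
        # this pooled-variance two-sample t-test.
        p = stats.ttest_ind(
            z2[:, :n_minor], z2[:, n_minor:], axis=1, equal_var=True
        ).pvalue
        hits += int(np.sum(p < alpha))
        done += m
    return hits


alpha, n_tests = 1e-3, 20000
expected = alpha * n_tests  # false positives expected if the test were exact
fp_unbalanced = count_false_positives(500, 10, alpha, n_tests)   # NMG = 10
fp_balanced = count_false_positives(500, 250, alpha, n_tests)    # NMG = 250
print(f"expected under nominal alpha: {expected:.0f}")
print(f"observed, NMG = 10:  {fp_unbalanced}")
print(f"observed, NMG = 250: {fp_balanced}")
```

Under these assumptions, the setting with NMG = 10 would be expected to produce clearly more false positives than `alpha * n_tests`, while the balanced setting should stay close to that number. Reproducing the published analysis would require 1,000,000 replicates per setting and the Bonferroni-corrected threshold.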
As shown in Figure 1A, when there are only two genotype classes, the type I error rate can be very large if the NMG contains fewer than 100 observations when using regression on the squared Z-score, and this cannot be overcome by increasing the total sample size. The Levene and Brown–Forsythe tests also show such an inflation of false positives (Figure 1B), but use of a Gamma regression model, which accounts for the fact that the squared Z-score follows a chi-square distribution, overcomes this problem.
Populations with three genotypes will, in practice, be more robust when the allele substitution model implemented in most GWAS software is used (i.e., when regression on all three genotypes is used to estimate the additive effect). Inflated type I error rates are then observed only when the intermediate-size genotype class (i.e., in practice most often the heterozygotes) contains fewer than 100 individuals (Figures 1C–E). It should be noted, however, that if the additive genetic effect is estimated as a contrast between the homozygotes (ignoring heterozygotes), or if the dominance effect is included in the model, the bias will be determined by NMG in the same way as when only two genotype classes are present in the population.
In our simulations, false signals appear only when the number of observations is lower in the high-variance class (not shown). When the low-variance class has fewer observations, the test is underpowered, which is a likely reason for the lack of false positives. This asymmetry in power was discussed earlier by Shen et al. (2012).
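The Levene and Brown–Forsythe tests mentioned above can be tried on the same kind of null data. The sketch below is an illustration under assumptions, not the authors' analysis: it uses `scipy.stats.levene`, where `center="mean"` gives the original Levene test and `center="median"` gives the Brown–Forsythe variant. The helper name `null_pvalues`, the sample sizes, and the loose nominal threshold are hypothetical choices made to keep the run short.

```python
import numpy as np
from scipy import stats


def null_pvalues(n, n_minor, n_tests, center, seed=0):
    """P-values of a two-group variance test on pure-noise phenotypes.

    Hypothetical helper: center="mean" is the Levene test,
    center="median" is the Brown-Forsythe test.
    """
    rng = np.random.default_rng(seed)
    pvals = np.empty(n_tests)
    for i in range(n_tests):
        y = rng.standard_normal(n)
        # The first n_minor individuals form the minor genotype class.
        pvals[i] = stats.levene(y[:n_minor], y[n_minor:], center=center).pvalue
    return pvals


alpha, n_tests = 1e-3, 5000  # far looser than a genome-wide threshold
for name, center in [("Levene", "mean"), ("Brown-Forsythe", "median")]:
    p = null_pvalues(500, 10, n_tests, center)
    print(f"{name}: {int(np.sum(p < alpha))} of {n_tests} null tests below {alpha}")
```

Comparing the printed counts with `alpha * n_tests` gives a rough impression of how each test behaves for a small minor class. The Gamma regression model is not included here, because fitting it would need a GLM library that the text does not name.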
Figure 1

Different sample sizes (n) and numbers of individuals in the minor genotype class (NMG = n × fAA) were simulated. 1,000,000 replicates for each combination of n and NMG were performed with phenotypes simulated as white noise from a standard normal distribution (i.e., no genetic effects). The same method as that employed by Yang et al. (2012), regression using the squared Z-scores, was used in the analyses of (A,C–E). Two genotype classes were simulated in (A) and (B), and three were simulated in (C) (one minor class), (D) (Hardy–Weinberg equilibrium), and (E) (two minor classes). The dashed horizontal lines show the Bonferroni-corrected threshold for 1,000,000 tests. The black dot on each bar indicates the median of the 1,000,000 scores, and the top ends of the bars with different widths indicate the 85, 95, 99, and 100% (maximum) quantiles of the scores. The labels on top of the bars are the corresponding numbers of false positives (NFP) above the threshold. f, frequency; p, minor allele frequency; q, 1–p; GLM, generalized linear model.

In practice, this means that researchers aiming to perform a GWAS to detect genes affecting the between-genotype variance difference should be aware that they run a considerable risk of obtaining excessive numbers of false positives when the allele frequencies differ and the NMG is associated with the high-variance estimate. This applies even when stringent multiple-testing corrections are used. We therefore advise that results should be interpreted with caution when (i) the genetic effect in the model is a contrast between two genotype classes and there are fewer than 100 observations in the minor genotype class, or (ii) the genetic effect in the model is estimated using observations from three genotype classes and there are fewer than 100 observations in the intermediate-size genotype class. In such situations, a Gamma generalized linear model (GLM) should be applied to further examine the results.
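The two rules of thumb above lend themselves to a simple quality-control check on genotype counts. The following is a minimal sketch, assuming the counts per genotype class are already available. The function names `critical_class_size` and `needs_caution` and the flag `additive_three_class` are hypothetical and not part of any GWAS package.

```python
def critical_class_size(counts, additive_three_class=True):
    """Return the genotype class size that drives the bias (hypothetical helper).

    Two classes: the minor genotype class.
    Three classes, additive regression on all genotypes: the
    intermediate-size class.
    Three classes with a dominance term (additive_three_class=False):
    the minor genotype class, as in the two-class case.
    """
    ordered = sorted(counts)
    if len(ordered) == 2:
        return ordered[0]
    if len(ordered) == 3:
        return ordered[1] if additive_three_class else ordered[0]
    raise ValueError("expected counts for two or three genotype classes")


def needs_caution(counts, additive_three_class=True, min_count=100):
    """Flag a marker whose critical class has fewer than min_count observations."""
    return critical_class_size(counts, additive_three_class) < min_count


# Example genotype counts (made up for illustration).
print(needs_caution([40, 960]))                                   # two classes
print(needs_caution([12, 220, 768]))                              # additive model
print(needs_caution([12, 220, 768], additive_three_class=False))  # with dominance
```

Markers flagged this way are candidates for re-analysis with a Gamma GLM, as recommended above, rather than for automatic exclusion.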
References

1.  HSP90 affects the expression of genetic variation and developmental stability in quantitative traits.

Authors:  Todd A Sangster; Neeraj Salathia; Soledad Undurraga; Ron Milo; Kurt Schellenberg; Susan Lindquist; Christine Queitsch
Journal:  Proc Natl Acad Sci U S A       Date:  2008-02-19       Impact factor: 11.205

2.  Some theorems in least squares.

Authors:  R L Plackett
Journal:  Biometrika       Date:  1950-06       Impact factor: 2.445

3.  Hsp90 as a capacitor of phenotypic variation.

Authors:  Christine Queitsch; Todd A Sangster; Susan Lindquist
Journal:  Nature       Date:  2002-05-12       Impact factor: 49.962

4.  Inheritance beyond plain heritability: variance-controlling genes in Arabidopsis thaliana.

Authors:  Xia Shen; Mats Pettersson; Lars Rönnegård; Örjan Carlborg
Journal:  PLoS Genet       Date:  2012-08-02       Impact factor: 5.917

5.  Drosophila Piwi functions in Hsp90-mediated suppression of phenotypic variation.

Authors:  Vamsi K Gangaraju; Hang Yin; Molly M Weiner; Jianquan Wang; Xiao A Huang; Haifan Lin
Journal:  Nat Genet       Date:  2010-12-26       Impact factor: 38.330

6.  Lessons from model organisms: phenotypic robustness and missing heritability in complex disease.

Authors:  Christine Queitsch; Keisha D Carlson; Santhosh Girirajan
Journal:  PLoS Genet       Date:  2012-11-15       Impact factor: 5.917

7.  Genomic analysis of QTLs and genes altering natural variation in stochastic noise.

Authors:  Jose M Jimenez-Gomez; Jason A Corwin; Bindu Joseph; Julin N Maloof; Daniel J Kliebenstein
Journal:  PLoS Genet       Date:  2011-09-29       Impact factor: 5.917

8.  FTO genotype is associated with phenotypic variability of body mass index.

Authors:  Jian Yang; Ruth J F Loos; Joseph E Powell; Sarah E Medland; Elizabeth K Speliotes; Daniel I Chasman; Lynda M Rose; Gudmar Thorleifsson; Valgerdur Steinthorsdottir; Reedik Mägi; Lindsay Waite; Albert Vernon Smith; Laura M Yerges-Armstrong; Keri L Monda; David Hadley; Anubha Mahajan; Guo Li; Karen Kapur; Veronique Vitart; Jennifer E Huffman; Sophie R Wang; Cameron Palmer; Tõnu Esko; Krista Fischer; Jing Hua Zhao; Ayşe Demirkan; Aaron Isaacs; Mary F Feitosa; Jian'an Luan; Nancy L Heard-Costa; Charles White; Anne U Jackson; Michael Preuss; Andreas Ziegler; Joel Eriksson; Zoltán Kutalik; Francesca Frau; Ilja M Nolte; Jana V Van Vliet-Ostaptchouk; Jouke-Jan Hottenga; Kevin B Jacobs; Niek Verweij; Anuj Goel; Carolina Medina-Gomez; Karol Estrada; Jennifer Lynn Bragg-Gresham; Serena Sanna; Carlo Sidore; Jonathan Tyrer; Alexander Teumer; Inga Prokopenko; Massimo Mangino; Cecilia M Lindgren; Themistocles L Assimes; Alan R Shuldiner; Jennie Hui; John P Beilby; Wendy L McArdle; Per Hall; Talin Haritunians; Lina Zgaga; Ivana Kolcic; Ozren Polasek; Tatijana Zemunik; Ben A Oostra; M Juhani Junttila; Henrik Grönberg; Stefan Schreiber; Annette Peters; Andrew A Hicks; Jonathan Stephens; Nicola S Foad; Jaana Laitinen; Anneli Pouta; Marika Kaakinen; Gonneke Willemsen; Jacqueline M Vink; Sarah H Wild; Gerjan Navis; Folkert W Asselbergs; Georg Homuth; Ulrich John; Carlos Iribarren; Tamara Harris; Lenore Launer; Vilmundur Gudnason; Jeffrey R O'Connell; Eric Boerwinkle; Gemma Cadby; Lyle J Palmer; Alan L James; Arthur W Musk; Erik Ingelsson; Bruce M Psaty; Jacques S Beckmann; Gerard Waeber; Peter Vollenweider; Caroline Hayward; Alan F Wright; Igor Rudan; Leif C Groop; Andres Metspalu; Kay Tee Khaw; Cornelia M van Duijn; Ingrid B Borecki; Michael A Province; Nicholas J Wareham; Jean-Claude Tardif; Heikki V Huikuri; L Adrienne Cupples; Larry D Atwood; Caroline S Fox; Michael Boehnke; Francis S Collins; Karen L Mohlke; Jeanette Erdmann; Heribert Schunkert; Christian Hengstenberg; 
Klaus Stark; Mattias Lorentzon; Claes Ohlsson; Daniele Cusi; Jan A Staessen; Melanie M Van der Klauw; Peter P Pramstaller; Sekar Kathiresan; Jennifer D Jolley; Samuli Ripatti; Marjo-Riitta Jarvelin; Eco J C de Geus; Dorret I Boomsma; Brenda Penninx; James F Wilson; Harry Campbell; Stephen J Chanock; Pim van der Harst; Anders Hamsten; Hugh Watkins; Albert Hofman; Jacqueline C Witteman; M Carola Zillikens; André G Uitterlinden; Fernando Rivadeneira; M Carola Zillikens; Lambertus A Kiemeney; Sita H Vermeulen; Goncalo R Abecasis; David Schlessinger; Sabine Schipf; Michael Stumvoll; Anke Tönjes; Tim D Spector; Kari E North; Guillaume Lettre; Mark I McCarthy; Sonja I Berndt; Andrew C Heath; Pamela A F Madden; Dale R Nyholt; Grant W Montgomery; Nicholas G Martin; Barbara McKnight; David P Strachan; William G Hill; Harold Snieder; Paul M Ridker; Unnur Thorsteinsdottir; Kari Stefansson; Timothy M Frayling; Joel N Hirschhorn; Michael E Goddard; Peter M Visscher
Journal:  Nature       Date:  2012-09-16       Impact factor: 49.962
