
Issues with data transformation in genome-wide association studies for phenotypic variability.

Abstract

The purpose of this correspondence is to discuss and clarify a few points about data transformation used in genome-wide association studies, especially for phenotypic variability. By commenting on the recent publication by Sun et al. in the American Journal of Human Genetics, we emphasize the importance of statistical power in detecting functional loci and the real meaning of the scale of the phenotype in practice.


Year: 2013 PMID: 24555098 PMCID: PMC3869493 DOI: 10.12688/f1000research.2-200.v1

Source DB: PubMed Journal: F1000Res ISSN: 2046-1402

Correspondence

Recently, Sun et al. [1] raised an interesting suggestion concerning the use of variance-stabilizing transformations in genome-wide association studies (GWAS) for phenotypic variability. Specifically, Sun et al. revisited Yang et al.’s [2] results on FTO, a locus controlling the variability of human body mass index (BMI), and claimed that the underlying difference in variability across genotypes might not be as large as Yang et al. had reported. Sun et al. raise an important point, especially now that the quantitative study of phenotypic variability has become such an active topic, but in our opinion the transformation approach they propose has several problems.

First, if we derive Sun et al.’s transformation from Yang et al.’s phenotypic mean and variance per FTO genotype class, i.e. a one-to-one map through an inverse hyperbolic sine function, the BMI scale becomes rather different from the ordinary measurement scale (Figure 1). On the transformed scale, the difference between two persons with BMIs of 24 and 25 kg/m² is much larger than that between two persons with BMIs of 20 and 21 kg/m². This is odd in practice, since the original BMI scale is the one we commonly use and care about. Sun et al.’s main argument here is that nearly all measurement units are man-made. However, considering one of the traits of most interest, e.g. height, why should we regard the difference between 160cm and 170cm different from 170cm and 180cm? Although the definitions of most units can be arbitrary, some measurement scales do have meaning in real life.
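The distortion of the scale described above can be illustrated with a minimal sketch. It uses a plain inverse hyperbolic sine map with a hypothetical centre and scale, chosen only to show how equal steps in BMI become unequal after a nonlinear monotone transformation. These are not the parameters Sun et al. fitted to the FTO data, and the function names are illustrative.

```python
import math

# Hypothetical centre and scale, chosen only for illustration;
# they are NOT the values fitted by Sun et al.
CENTRE = 26.0
SCALE = 1.0

def asinh_scale(bmi, centre=CENTRE, scale=SCALE):
    """Map a BMI value onto an illustrative inverse-hyperbolic-sine scale."""
    return math.asinh((bmi - centre) / scale)

def step(lo, hi):
    """Size of the step from lo to hi on the transformed scale."""
    return asinh_scale(hi) - asinh_scale(lo)

# Two steps of exactly 1 kg/m^2 on the original scale.
step_20_21 = step(20.0, 21.0)
step_24_25 = step(24.0, 25.0)

print(f"20 -> 21 kg/m^2 on transformed scale: {step_20_21:.3f}")
print(f"24 -> 25 kg/m^2 on transformed scale: {step_24_25:.3f}")
```

With these assumed parameters the step from 24 to 25 is larger than the step from 20 to 21, although both are one unit of BMI. A different centre or scale would change which step is stretched, which is why the choice of scale matters for interpretation.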

Figure 1.

Comparison of the original scale of body mass index (BMI) and the transformed scale using Sun et al.’s [1] transformation.

The transformation was determined by the phenotypic distribution across FTO genotypes reported by Yang et al. [2].
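To make concrete what it means for a transformation to be determined by the phenotypic distribution across genotypes, here is a generic variance-stabilization sketch. It assumes, purely for illustration, that the within-genotype SD is a linear function of the genotype mean, and it uses the standard delta-method result that a stabilizing map g satisfies g'(y) = 1/sd(y). This is not claimed to be Sun et al.'s exact procedure, and the genotype summaries for the two markers are hypothetical numbers.

```python
import math

def fit_sd_line(means, sds):
    """Least-squares fit of sd = a + b * mean over the genotype classes."""
    n = len(means)
    m_bar = sum(means) / n
    s_bar = sum(sds) / n
    sxx = sum((m - m_bar) ** 2 for m in means)
    sxy = sum((m - m_bar) * (s - s_bar) for m, s in zip(means, sds))
    b = sxy / sxx
    a = s_bar - b * m_bar
    return a, b

def stabilise(y, a, b):
    """Delta-method transform g with g'(y) = 1 / (a + b * y)."""
    if a + b * y <= 0:
        raise ValueError("assumed SD model is not positive at this value")
    if abs(b) < 1e-12:
        return y / a  # constant SD: the transform is just a rescaling
    return math.log(a + b * y) / b

# Hypothetical per-genotype BMI summaries (mean, SD) for two markers.
marker_a = {"means": [26.0, 26.4, 26.8], "sds": [4.0, 4.1, 4.2]}
marker_b = {"means": [26.0, 26.4, 26.8], "sds": [4.2, 4.1, 4.0]}

a_a, b_a = fit_sd_line(marker_a["means"], marker_a["sds"])
a_b, b_b = fit_sd_line(marker_b["means"], marker_b["sds"])

# Distance between the same two BMI values on each marker's scale.
gap_a = stabilise(30.0, a_a, b_a) - stabilise(20.0, a_a, b_a)
gap_b = stabilise(30.0, a_b, b_b) - stabilise(20.0, a_b, b_b)

print(f"marker A: a={a_a:.2f}, b={b_a:.2f}, gap(20->30)={gap_a:.3f}")
print(f"marker B: a={a_b:.2f}, b={b_b:.2f}, gap(20->30)={gap_b:.3f}")
```

Because the fitted coefficients differ between the two hypothetical markers, the same pair of BMI values ends up a different distance apart depending on which marker defined the scale.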

Second, a key practical problem with Sun et al.’s transformation is that it is marker-specific. That is, when performing a GWAS, one needs to transform the phenotypic records differently for each marker, according to the phenotypic distribution across that marker’s genotypes. This makes little sense in practical analyses: if there is a "best" scale for the phenotype, it should be used for all markers across the genome and chosen before testing the association between the phenotype and the markers. Using the tested marker itself to determine the transformation of the phenotype is strange. If a marker-specific transformation can be estimated, one should instead estimate a single genome-wide transformation for GWAS, rather than applying a different transformation marker by marker.

Third, if the transformation of the phenotype is determined by one marker that shows a significant effect on phenotypic variability before the other markers are tested, the transformation may itself create another significant effect on phenotypic variability at a different marker. In such a situation, it is unclear which phenotypic scale we should choose.

Fourth, several recent studies have discussed how gene-gene or gene-environment interactions can cause significant variance heterogeneity across genotypes [3–6], which makes testing for variance-controlling loci a powerful tool for revealing potential interaction effects. Reducing the difference in variance across genotypes with a marker-specific variance-stabilizing transformation would dramatically reduce this power. Regarding the biological sense of genetically regulated variance heterogeneity, empirical evidence has shown that a single causal locus can have a much more significant effect on the variance than on the mean [6].
In a particular population, such a locus may be mappable only by testing the variability rather than the magnitude of the phenotype.

The above issues cause us to question Sun et al.’s transformation in practice. The scale of the phenotype is certainly an important concern when interpreting an effect on phenotypic variability [7]. However, one needs to consider the points above carefully before applying any transformation to the data. In particular, the statistical power to detect functional loci and the real meaning of the scale used should be emphasized.

We agree with the criticism raised by Shen and Rönnegård in their points 2 and 3 concerning the application of Sun et al.’s transformation in the context of whole-genome scans. Indeed, applying this transformation in a SNP-specific manner is difficult to accept conceptually. Sun et al. rightly suggest that “the scales on which we measure interval-scale quantitative traits are man-made and have little intrinsic biological relevance”, but the underlying intrinsic scale, and the function mapping this scale onto the observed one, is likely to be unique and does not change with the SNP. Hence, the transformation applied to a trait should not change across the markers studied. Practically, this is not very difficult to implement: as the simplest option, one could estimate the parameters of Sun et al.’s transformation from the upper, middle and lower tertiles of the total phenotypic distribution. A more general approach (not restricting the data to three groups, but modelling the variance as a function of the mean) should be straightforward to implement.

We also understand the reasoning behind Shen and Rönnegård’s points 1 and 4, but here we are less certain that the problem raised can be easily addressed.
Specifically, one could argue with point 1 (“why should we regard the difference between 160cm and 170cm different from 170cm and 180cm?”): it is not hard to imagine a biologically relevant model in which the same changes on an “intrinsic scale” lead to different changes on the observed scale as the mean advances (an example would be Michaelis–Menten kinetics). Also, both points 1 and 4 (losing power after transformation) relate not only to Sun et al.’s transformation but to almost any transformation in wide use (e.g. log, Box-Cox, Gaussianization/inverse-normal). While it is true that analysis of a transformed trait may lead to reduced power (and it certainly should when Sun et al.’s transformation is applied in a marker-specific manner to the analysis of variance heterogeneity), we feel that one would still like to check whether the variance heterogeneity found can be modelled as a function of the mean (in which case any SNP affecting the mean is likely to show “control” of the variance as well).

Finally, we fully agree with the comment of William Hill and Ian White, who criticize Sun et al.'s statement that “In the absence of genotypic mean differences, we can hardly infer that differences in variances are per se of biological interest”. We think that differences in variance per se are biologically and genetically plausible and interesting.

We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Shen and Rönnegård (SR) comment critically and succinctly on the paper by Sun et al. published in AJHG, which advocates that, before any claim of differences in variance among genotypes is made in a GWAS or similar study, a check should first be made of whether these can be removed by a monotonic transformation. Each of SR’s four criticisms seems well justified.
As 10⁵ or more SNPs may be fitted in a GWAS, what biological interpretation could be given to that number of different transformations, or even to those for a limited subset of loci showing possible variance differences? If some loci give signals of a mean but not a variance difference, should these then be transformed to eliminate the scale effect on the mean and perhaps reveal variance differences? Any concept of an original scale of measurement is lost, as SR point out.

It is not obvious why the mere existence of a transformation designed to minimise differences in variance should prevent discussion of variance heterogeneity on the chosen scale. Equivalently, if we considered the means of the three genotypes at a locus rather than just average effects, would our ability to transform the data at each locus such that heterozygotes were intermediate imply there was no dominance, or only that it was on a particular scale?

On a further point, Sun et al. (p395) comment: ‘In the absence of genotypic mean differences, we can hardly infer that differences in variances are per se of biological interest.’ That is to take too narrow a view: the mean and phenotypic variance (or CV) of a quantitative trait in any species take typical values, e.g. the CV for adult human height is ca. 4% and for BMI ca. 16%. There is direct evidence, from GWAS and other single-gene studies, of genetic differences within species in environmental variance that cannot be removed by a change of scale, so the level of the environmental variance is subject to evolutionary forces (e.g. Hill & Mulder 2010 Genet. Res. 92:381). To view variance as a biological phenomenon that is just some adjunct to the mean seems simplistic, as SR argue. Indeed, one has to ask whether scale transformations have value unless there is a biological basis, such as a log transformation to account for multiplicative genetic effects; but that must then apply across all loci.

We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
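The referees' remarks about modelling the variance as a function of the mean, and about a log transformation accounting for multiplicative effects, can be illustrated with a small worked calculation. The sketch assumes a log-normal trait whose genotypes differ only in the mean on the log scale. The genotype labels and all numerical values are hypothetical, and the formulas are the standard moments of the log-normal distribution.

```python
import math

# Hypothetical values: a common within-genotype SD on the log scale and
# genotype means that differ only on the log scale (multiplicative effects).
LOG_SD = 0.16
LOG_MEANS = {"AA": 3.20, "AB": 3.25, "BB": 3.30}

def raw_moments(log_mean, log_sd):
    """Mean and SD on the raw scale of a log-normally distributed trait."""
    mean = math.exp(log_mean + log_sd ** 2 / 2)
    sd = mean * math.sqrt(math.exp(log_sd ** 2) - 1)
    return mean, sd

summary = {g: raw_moments(mu, LOG_SD) for g, mu in LOG_MEANS.items()}

for genotype, (mean, sd) in summary.items():
    print(f"{genotype}: raw mean={mean:.2f}, raw SD={sd:.2f}, CV={sd / mean:.3f}")
```

Under these assumptions the raw-scale SD rises with the genotype mean while the CV is identical across genotypes, so the apparent variance heterogeneity disappears on the log scale. Variance differences that persist after such a biologically motivated, genome-wide change of scale are the ones the correspondence and its referees regard as being of interest in their own right.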

7 in total

1. What is the significance of difference in phenotypic variability across SNP genotypes?

Authors: Xiangqing Sun; Robert Elston; Nathan Morris; Xiaofeng Zhu
Journal: Am J Hum Genet Date: 2013-08-01 Impact factor: 11.025

2. On the use of variance per genotype as a tool to identify quantitative trait interaction effects: a report from the Women's Genome Health Study.

Authors: Guillaume Paré; Nancy R Cook; Paul M Ridker; Daniel I Chasman
Journal: PLoS Genet Date: 2010-06-17 Impact factor: 5.917

3. Variance heterogeneity analysis for detection of potentially interacting genetic loci: method and its limitations.

Authors: Maksim V Struchalin; Abbas Dehghan; Jacqueline Cm Witteman; Cornelia van Duijn; Yurii S Aulchenko
Journal: BMC Genet Date: 2010-10-13 Impact factor: 2.797

4. Inheritance beyond plain heritability: variance-controlling genes in Arabidopsis thaliana.

Authors: Xia Shen; Mats Pettersson; Lars Rönnegård; Örjan Carlborg
Journal: PLoS Genet Date: 2012-08-02 Impact factor: 5.917

5. Recent developments in statistical methods for detecting genetic loci affecting phenotypic variability.

Authors: Lars Rönnegård; William Valdar
Journal: BMC Genet Date: 2012-07-24 Impact factor: 2.797

6. Detecting major genetic loci controlling phenotypic variability in experimental crosses.

Authors: Lars Rönnegård; William Valdar
Journal: Genetics Date: 2011-04-05 Impact factor: 4.562

7. FTO genotype is associated with phenotypic variability of body mass index.

Authors: Jian Yang; Ruth J F Loos; Joseph E Powell; Sarah E Medland; Elizabeth K Speliotes; Daniel I Chasman; Lynda M Rose; Gudmar Thorleifsson; Valgerdur Steinthorsdottir; Reedik Mägi; Lindsay Waite; Albert Vernon Smith; Laura M Yerges-Armstrong; Keri L Monda; David Hadley; Anubha Mahajan; Guo Li; Karen Kapur; Veronique Vitart; Jennifer E Huffman; Sophie R Wang; Cameron Palmer; Tõnu Esko; Krista Fischer; Jing Hua Zhao; Ayşe Demirkan; Aaron Isaacs; Mary F Feitosa; Jian'an Luan; Nancy L Heard-Costa; Charles White; Anne U Jackson; Michael Preuss; Andreas Ziegler; Joel Eriksson; Zoltán Kutalik; Francesca Frau; Ilja M Nolte; Jana V Van Vliet-Ostaptchouk; Jouke-Jan Hottenga; Kevin B Jacobs; Niek Verweij; Anuj Goel; Carolina Medina-Gomez; Karol Estrada; Jennifer Lynn Bragg-Gresham; Serena Sanna; Carlo Sidore; Jonathan Tyrer; Alexander Teumer; Inga Prokopenko; Massimo Mangino; Cecilia M Lindgren; Themistocles L Assimes; Alan R Shuldiner; Jennie Hui; John P Beilby; Wendy L McArdle; Per Hall; Talin Haritunians; Lina Zgaga; Ivana Kolcic; Ozren Polasek; Tatijana Zemunik; Ben A Oostra; M Juhani Junttila; Henrik Grönberg; Stefan Schreiber; Annette Peters; Andrew A Hicks; Jonathan Stephens; Nicola S Foad; Jaana Laitinen; Anneli Pouta; Marika Kaakinen; Gonneke Willemsen; Jacqueline M Vink; Sarah H Wild; Gerjan Navis; Folkert W Asselbergs; Georg Homuth; Ulrich John; Carlos Iribarren; Tamara Harris; Lenore Launer; Vilmundur Gudnason; Jeffrey R O'Connell; Eric Boerwinkle; Gemma Cadby; Lyle J Palmer; Alan L James; Arthur W Musk; Erik Ingelsson; Bruce M Psaty; Jacques S Beckmann; Gerard Waeber; Peter Vollenweider; Caroline Hayward; Alan F Wright; Igor Rudan; Leif C Groop; Andres Metspalu; Kay Tee Khaw; Cornelia M van Duijn; Ingrid B Borecki; Michael A Province; Nicholas J Wareham; Jean-Claude Tardif; Heikki V Huikuri; L Adrienne Cupples; Larry D Atwood; Caroline S Fox; Michael Boehnke; Francis S Collins; Karen L Mohlke; Jeanette Erdmann; Heribert Schunkert; Christian Hengstenberg; 
Klaus Stark; Mattias Lorentzon; Claes Ohlsson; Daniele Cusi; Jan A Staessen; Melanie M Van der Klauw; Peter P Pramstaller; Sekar Kathiresan; Jennifer D Jolley; Samuli Ripatti; Marjo-Riitta Jarvelin; Eco J C de Geus; Dorret I Boomsma; Brenda Penninx; James F Wilson; Harry Campbell; Stephen J Chanock; Pim van der Harst; Anders Hamsten; Hugh Watkins; Albert Hofman; Jacqueline C Witteman; M Carola Zillikens; André G Uitterlinden; Fernando Rivadeneira; M Carola Zillikens; Lambertus A Kiemeney; Sita H Vermeulen; Goncalo R Abecasis; David Schlessinger; Sabine Schipf; Michael Stumvoll; Anke Tönjes; Tim D Spector; Kari E North; Guillaume Lettre; Mark I McCarthy; Sonja I Berndt; Andrew C Heath; Pamela A F Madden; Dale R Nyholt; Grant W Montgomery; Nicholas G Martin; Barbara McKnight; David P Strachan; William G Hill; Harold Snieder; Paul M Ridker; Unnur Thorsteinsdottir; Kari Stefansson; Timothy M Frayling; Joel N Hirschhorn; Michael E Goddard; Peter M Visscher
Journal: Nature Date: 2012-09-16 Impact factor: 49.962

