How the quality of GWAS of human lifespan and health span can be improved.

Anatoliy I Yashin, Konstantin G Arbeev, Deqing Wu, Liubov S Arbeeva, Alexander M Kulminski, Igor Akushevich, Irina Culminskaya, Eric Stallard, Svetlana V Ukraintseva.

Year: 2013 PMID: 23825477 PMCID: PMC3695378 DOI: 10.3389/fgene.2013.00125

Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599

Introduction

Most connections between phenotype and genetic variants detected in genome-wide association studies (GWAS) of human longevity-related traits did not reach the genome-wide level of statistical significance. These estimates also suffer from a lack of replication in studies of independent populations (Deelen et al., 2011; Nebel et al., 2011). Critical analysis of these studies reveals underused reserves in the data that may improve the accuracy of genetic estimates. One such reserve is the genetic information contained in the age structure of study participants at the time of bio-specimen collection. As we will see later, this information has been ignored in genetic analyses of follow-up data. In this paper, we explain how such information can be used in analyses of follow-up data and elucidate the benefits of joint analyses of both types of data. The proposed approach exploits the fact that participants in prospective studies often have different ages at baseline. The bio-specimen collection is often (but not always) done at baseline. Studies included in the CHARGE Consortium [except for the Original cohort of the Framingham Heart Study (FHS), where the bio-specimen collection was done some time after baseline] are typical examples. Since data on lifespan or other durations are often incomplete (e.g., censored), Cox-type regression models are usually used in GWAS of these data (where, in addition to other covariates, conditioning on age at baseline is used). Note that using follow-up data alone and conditioning on age at baseline may be required by the goal of the study (e.g., in the search for genes responsible for the longest survival after reaching a certain age, such as 95 or 100 years).
However, conditioning on the ages at bio-specimen collection when some of these ages are high enough may diminish or even totally exclude a substantial part of the genetic variation in longevity, i.e., eliminate the effects we are looking for. This is because the oldest old individuals participating in bio-specimen collection are precisely those who passed the process of mortality selection in the genetically heterogeneous population and, therefore, are likely to carry genetic variants linked with “longevity” alleles (assuming that such alleles exist). Thus conditioning on the age at genotyping, especially when the oldest old study participants are in the sample, may leave little hope that associations of the remaining genetic variants with human longevity will reach genome-wide significance. Such conditioning, however, seems to be common practice in GWAS of human longevity-related traits dealing with prospective data. For example, a large group of researchers performed comprehensive genetic analyses of human lifespan and lifespan free of major diseases using data from nine studies of the CHARGE Consortium (Walter et al., 2011). The authors “conducted a survival analysis, adjusted for age at baseline and sex, to model continuous time to death or end of follow-up” [Walter et al. (2011), section “Methods”]. The Cox proportional hazards model was used to describe the connections between genetic variants and time to event. Note that, in addition to eliminating useful associations, adjusting for age at baseline may bias the analyses if the bio-specimen collection was performed well after the time of the first examination (baseline). Such a situation characterizes the FHS data, which are part of the CHARGE Consortium.
It turns out, however, that when the ages at bio-specimen collection include both young adults and the oldest old individuals, additional information about the role of genetic variants in lifespan can be obtained from the age patterns of genetic frequencies evaluated for any genetic variant, even without using the follow-up data. The approach based on comparison of genetic frequencies among individuals of different age categories is typically used in genetic studies of centenarians (Weir, 1996; Yashin et al., 1999, 2000; Tan et al., 2004). A monotonic increase in genetic frequency with age indicates that the corresponding variant is associated with an increase in lifespan (“longevity” allele). A monotonic decline in such frequency indicates that the variant contributes to shortening lifespan (deleterious, or “frailty” allele). When genotyping involves individuals from a wide spectrum of ages, including young adults, old, and oldest old individuals, there is an additional opportunity to improve the quality of genetic analyses that is typically not used in genetic association studies of follow-up data. To realize this potential, new methods of association analysis are required.
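A minimal sketch of how mortality selection produces such age patterns is given here. It assumes Gompertz mortality with a proportional effect exp(γ) for carriers, and the parameter values (ln a = −9.0, b = 0.08, initial carrier proportion 0.25) are illustrative, chosen to match the simulation example later in this paper. The function names are hypothetical.

```python
import math

def gompertz_survival(x, ln_a, b, gamma=0.0):
    """Probability of surviving to age x under hazard exp(ln_a + gamma + b*x)."""
    cum_hazard = math.exp(ln_a + gamma) / b * (math.exp(b * x) - 1.0)
    return math.exp(-cum_hazard)

def carrier_frequency(x, p0, gamma, ln_a=-9.0, b=0.08):
    """Expected proportion of carriers among individuals alive at age x."""
    s1 = gompertz_survival(x, ln_a, b, gamma)  # carriers
    s0 = gompertz_survival(x, ln_a, b, 0.0)    # non-carriers
    return p0 * s1 / (p0 * s1 + (1.0 - p0) * s0)

if __name__ == "__main__":
    # gamma < 0: protective allele; gamma = 0: neutral; gamma > 0: deleterious
    for gamma in (-0.3, 0.0, 0.3):
        freqs = [carrier_frequency(age, 0.25, gamma) for age in (40, 70, 100)]
        print(gamma, [round(f, 3) for f in freqs])
```

Under these assumptions the carrier frequency rises with age for negative γ, stays at the initial proportion for γ = 0, and declines for positive γ.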

The approach to improve the quality of GWAS of human longevity

One such method for joint analyses of the two types of data is based on maximizing the joint likelihood function of the combined data comprising the cross-sectional age patterns and the follow-up data. This method is a special case of a more general approach (Arbeev et al., 2011) in which follow-up data on non-genotyped individuals can also be added to the likelihood function whenever such information is available. The total likelihood of the data is the product of the two likelihood functions representing each subset of available data. The benefit of such analyses is that both likelihoods are functions of the same parameters describing the relationship between genetic factors and the phenotype of interest. The total likelihood is a function of the mortality rates for carriers, μ(x|G = 1), and non-carriers, μ(x|G = 0), and the initial proportion of carriers of the index variant, P0 = P(G = 1) (see Arbeev et al., 2011). By maximizing this likelihood function, one can estimate the respective quantities and test the null hypothesis that the survival functions for carriers and non-carriers of the minor allele coincide, using the likelihood ratio test. The nature of the effect of such genetic variants on survival (e.g., a protective effect, so that the survival curve for carriers is shifted to the right compared to non-carriers; a deleterious one, so that the curve for carriers is shifted to the left; or a trade-off, so that the survival curves intersect) can be understood by inspecting the respective parameter estimates and/or visualizing the estimated survival curves.
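One way to write such a joint likelihood is sketched below. The notation is introduced here for illustration and is an assumption, not taken from the source: genotyped individual i has age at biospecimen collection x_i, carrier status g_i, age at death or censoring t_i, and death indicator δ_i; S(x|g) is the survival function implied by μ(x|g); and π(x) is the expected proportion of carriers among those alive at age x.

```latex
L = \prod_{i}
    \underbrace{\pi(x_i)^{g_i}\,\bigl[1-\pi(x_i)\bigr]^{1-g_i}}_{\text{ages at biospecimen collection}}
    \;\times\;
    \underbrace{\mu(t_i \mid g_i)^{\delta_i}\,\frac{S(t_i \mid g_i)}{S(x_i \mid g_i)}}_{\text{follow-up}},
\qquad
\pi(x) = \frac{P_0\,S(x \mid 1)}{P_0\,S(x \mid 1) + (1-P_0)\,S(x \mid 0)},
\qquad
S(x \mid g) = \exp\!\left(-\int_0^{x} \mu(u \mid g)\,du\right).
```

In this form both factors depend on the same mortality rates μ(x|G), and P0 enters through π(x), so the age distribution of genotypes at collection contributes information about the genetic effect in addition to the follow-up term.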

Example

We performed a simulation study to illustrate how the approach combining information on follow-up and information on ages at biospecimen collection improves the accuracy and power of parameter estimates compared to analyses that use only data on follow-up. To make the simulations close to reality, we generated a data structure resembling the Long Life Family Study (LLFS) data (see description, e.g., in Yashin et al., 2010). This study deals with follow-up data on mortality, involves GWAS, and has a relatively short follow-up period. We assumed that carriers and non-carriers of some hypothetical allele in a population have mortality rates μ(x|G) = μ0(x)·exp(γG), where G = 0 for non-carriers and G = 1 for carriers, and the baseline mortality μ0(x) is the Gompertz function, i.e., ln μ0(x) = ln a + bx. The Gompertz parameters a and b correspond to the initial mortality at birth and the rate of exponential growth in mortality with age in non-carriers. The regression parameter γ modifies the initial mortality for carriers so that it becomes a·exp(γ). We used the Gompertz parameters ln a = −9.0 and b = 0.08 to produce reasonable survival patterns corresponding to human populations, and the proportion of carriers at birth P0 = 0.25. The parameter γ varied from −0.5 to 0.5 in steps of 0.05 to simulate scenarios with different effect sizes. First, we generated a large (10,000,000 individuals) “general population,” assigning the genetic status (carrier/non-carrier of a hypothetical allele) to individuals in accordance with the initial proportion P0. Then we generated life spans for all these individuals from the respective probability distributions [i.e., those corresponding to the hazard μ0(x)·exp(γ) for carriers and μ0(x) for non-carriers, with the parameters defined above].
Then we assigned a hypothetical “age at entry” into the study to each individual in the population, generated as a discrete random variable uniformly distributed over the interval 40–100 years. Such “ages at entry” were assumed to be the same as ages at biospecimen collection, in line with the LLFS design. We collected a sample of 4500 individuals (close to the actual number of genotyped individuals in the LLFS data) whose life spans exceeded their hypothetical “age at entry.” Individuals with simulated life spans exceeding “age at entry” plus 6 years were considered censored at that age. The above procedure was repeated 1000 times (in each scenario with the respective γ) to generate 1000 datasets, each of which was then analyzed using the likelihood that uses only follow-up information and the likelihood that takes into account information on ages at biospecimen collection in addition to follow-up data. We calculated the average values and standard deviations of the parameter estimates over the 1000 simulated datasets for both methods (i.e., with only follow-up data and with follow-up data and information on ages at biospecimen collection) for each γ. The results showed that the standard deviations of the estimates of γ were smaller in the method that takes into account ages at biospecimen collection (0.057 vs. 0.085, on average; a 33.2% relative decrease). Although such a difference may look small in absolute value, it corresponds to a substantial gain in power if we translate these observations into power curves and curves for the level of the test that yields specific power as a function of the effect size (regression parameter). Figure 1A illustrates the empirical power of the two methods (with follow-up only, “FU”, and follow-up and ages at biospecimen collection, “FU + A”) for different effect sizes (i.e., values of the parameter γ) and α = 0.05.
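The data-generating procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used for the reported results: instead of first drawing a population of 10,000,000, it draws individuals one at a time and keeps those still alive at entry until the sample is full, which approximates sampling from a very large population. The function names, the record layout, and the seed are hypothetical.

```python
import math
import random

LN_A, B, P0 = -9.0, 0.08, 0.25  # Gompertz parameters and carrier proportion from the text
FOLLOW_UP = 6                    # years of follow-up before censoring

def gompertz_lifespan(rng, gamma_g):
    """Inverse-transform draw from the hazard exp(LN_A + gamma_g + B*x)."""
    u = 1.0 - rng.random()       # u lies in (0, 1], so log(u) is defined
    a_g = math.exp(LN_A + gamma_g)
    return math.log(1.0 - (B / a_g) * math.log(u)) / B

def simulate_dataset(gamma, n=4500, seed=0):
    """Return n records (entry_age, exit_age, died, carrier)."""
    rng = random.Random(seed)
    records = []
    while len(records) < n:
        carrier = 1 if rng.random() < P0 else 0
        lifespan = gompertz_lifespan(rng, gamma * carrier)
        entry = rng.randint(40, 100)  # discrete uniform age at entry
        if lifespan <= entry:
            continue                  # died before biospecimen collection
        if lifespan > entry + FOLLOW_UP:
            records.append((entry, entry + FOLLOW_UP, 0, carrier))  # censored
        else:
            records.append((entry, lifespan, 1, carrier))           # death observed
    return records
```

Each record carries both kinds of information used in the joint analysis: the pair (entry age, carrier status) reflects the age pattern of genotype frequency, and the pair (exit age, death indicator) is the follow-up outcome.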
We also fitted these empirical values with the power curves of a one-sample Z-test of the mean and found the values of the standard deviations that produced the best fit to the empirical power curves for each method (0.056 for “FU + A” and 0.084 for “FU”); see the solid and dashed lines in Figure 1A. Figure 1B shows the level of the test [shown as −log10(α) for better visibility] that yields power w = 0.8, as a function of the effect size in both methods (the curves were calculated using the above-mentioned values of the standard deviations). Figure 1C shows an example of the distributions of the estimates in simulated datasets in both methods and their fits by normal distributions (this example corresponds to γ = 0.4, for which the standard deviations and the relative decrease are close to the average ones: 0.058, 0.087, and 33.6%).
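For reference, the power curve of a one-sample Z-test can be computed as in the sketch below. The source does not say whether the fitted test was one- or two-sided; a two-sided test is assumed here, and the function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_test_power(effect, sd, alpha=0.05):
    """Power of a two-sided one-sample Z-test of H0: effect = 0.

    `sd` is the standard deviation of the estimator of the effect.
    """
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(effect) / sd
    # Probability that the test statistic falls in either rejection tail.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

if __name__ == "__main__":
    for gamma in (0.1, 0.2, 0.3):
        print(gamma,
              round(z_test_power(gamma, 0.084), 3),   # "FU"
              round(z_test_power(gamma, 0.056), 3))   # "FU + A"
```

With the fitted standard deviations, the smaller value (0.056) gives higher power than the larger one (0.084) at every nonzero effect size, which is the gap between the two curves in Figure 1A.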

Figure 1

(A) Power of the two methods described in the text (with follow-up only, “FU,” and follow-up and ages at biospecimen collection, “FU + A”) for different effect sizes (i.e., values of the regression parameter γ) and fixed α = 0.05; the lines denote the fit of the empirical curves by the power curves of a one-sample Z-test of the mean (the standard deviations that produced the best fit are 0.056 for “FU + A” and 0.084 for “FU”). (B) The level of the test [shown as −log10(α) for better visibility] that yields power w = 0.8, as a function of the effect size in both methods (the curves are calculated using the above-mentioned values of the standard deviations). (C) Example of distributions of the estimates of the regression parameter (shown by histograms) for the scenario with γ = 0.4 in both methods. The lines show their fits by normal distributions (the Shapiro-Wilk p-values are 0.34 and 0.64 for “FU” and “FU + A,” respectively). The pentagram denotes the true value of the regression parameter.

Discussion

The results shown in Figure 1 indicate that using the information on ages at biospecimen collection in addition to follow-up data gives a substantial increase in power compared to the traditional approach that uses the follow-up data only. It also follows from Figure 1 that for an effect size of 0.3 the p-value falls from 10^−2 to 10^−5, and for an effect size of 0.4 it drops from 10^−4 to 10^−9. This means that many genetic variants that would not be genome-wide significant in GWAS of follow-up data using the traditional Cox-type approach would become highly significant if the proposed approach were used. Our simulations with different follow-up periods (data not shown) reveal that this effect diminishes as the follow-up period increases. It is intuitively clear that, as the follow-up period grows, information from the follow-up makes an increasing contribution compared to the information hidden in the distribution of ages at biospecimen collection. Conversely, with a shorter follow-up period, the distribution of ages at biospecimen collection plays a more important role in differentiating the allele- or genotype-specific survival patterns from the data. Thus our results show that the additional use of information on ages at biospecimen collection may have important implications for GWAS of longevity, especially in cases with relatively short follow-up periods. The LLFS data provide a good example of a study that could potentially benefit from the addition of information on ages at biospecimen collection in its GWAS of longevity (as well as of healthspan, for which similar conclusions can be made). As longevity is known to run in families, family-based genome-wide association studies can provide additional advantages compared to analyses of independent samples. However, performing GWAS in cohort studies that contain both unrelated individuals and family members requires special consideration of the analytical approach to take full advantage of such data (Manichaikul et al., 2012).
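The orders of magnitude quoted in this section for effect sizes 0.3 and 0.4 can be checked with a short calculation. The sketch below assumes a two-sided Z-test (the source does not state sidedness), uses the fitted standard deviations 0.084 (“FU”) and 0.056 (“FU + A”) reported with Figure 1, and relies on the usual approximation that ignores the rejection probability in the opposite tail. The function name is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def neg_log10_alpha(effect, sd, power=0.8):
    """-log10 of the two-sided level at which a Z-test attains the given power.

    Solves effect/sd = z_(1 - alpha/2) + z_power for alpha.
    """
    z_crit = abs(effect) / sd - STD_NORMAL.inv_cdf(power)
    # Two-sided tail probability 2*(1 - Phi(z_crit)); erfc keeps precision in the far tail.
    alpha = math.erfc(z_crit / math.sqrt(2.0))
    return -math.log10(alpha)

if __name__ == "__main__":
    for gamma in (0.3, 0.4):
        print(gamma,
              round(neg_log10_alpha(gamma, 0.084), 1),   # "FU"
              round(neg_log10_alpha(gamma, 0.056), 1))   # "FU + A"
```

Under these assumptions the computed exponents are of the same order as the values read from Figure 1B.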

References

1. Genes, demography, and life span: the contribution of demographic data in genetic studies on aging and longevity.

Authors: A I Yashin; G De Benedictis; J W Vaupel; Q Tan; K F Andreev; I A Iachine; M Bonafe; M DeLuca; S Valensin; L Carotenuto; C Franceschi
Journal: Am J Hum Genet Date: 1999-10 Impact factor: 11.025

2. Genes and longevity: lessons from studies of centenarians.

Authors: A I Yashin; G De Benedictis; J W Vaupel; Q Tan; K F Andreev; I A Iachine; M Bonafe; S Valensin; M De Luca; L Carotenuto; C Franceschi
Journal: J Gerontol A Biol Sci Med Sci Date: 2000-07 Impact factor: 6.053

3. Assessing genetic association with human survival at multi-allelic loci.

Authors: Qihua Tan; G De Benedictis; A I Yashin; L Bathum; L Christiansen; J Dahlgaard; N Frizner; W Vach; J W Vaupel; K Christensen; T A Kruse
Journal: Biogerontology Date: 2004 Impact factor: 4.277

4. A genome-wide association study confirms APOE as the major gene influencing survival in long-lived individuals.

Authors: Almut Nebel; Rabea Kleindorp; Amke Caliebe; Michael Nothnagel; Hélène Blanché; Olaf Junge; Michael Wittig; David Ellinghaus; Friederike Flachsbart; Heinz-Erich Wichmann; Thomas Meitinger; Susanna Nikolaus; Andre Franke; Michael Krawczak; Mark Lathrop; Stefan Schreiber
Journal: Mech Ageing Dev Date: 2011-06-29 Impact factor: 5.432

5. Analysis of family- and population-based samples in cohort genome-wide association studies.

Authors: Ani Manichaikul; Wei-Min Chen; Kayleen Williams; Quenna Wong; Michèle M Sale; James S Pankow; Michael Y Tsai; Jerome I Rotter; Stephen S Rich; Josyf C Mychaleckyj
Journal: Hum Genet Date: 2011-07-30 Impact factor: 4.132

6. "Predicting" parental longevity from offspring endophenotypes: data from the Long Life Family Study (LLFS).

Authors: Anatoli I Yashin; Konstantin G Arbeev; Alexander Kulminski; Ingrid Borecki; Kaare Christensen; Michael Barmada; Evan Hadley; Winifred Rossi; Joseph H Lee; Rong Cheng; Irma T Elo
Journal: Mech Ageing Dev Date: 2010-02-23 Impact factor: 5.432

7. Evaluation of genotype-specific survival using joint analysis of genetic and non-genetic subsamples of longitudinal data.

Authors: Konstantin G Arbeev; Svetlana V Ukraintseva; Liubov S Arbeeva; Igor Akushevich; Alexander M Kulminski; Anatoliy I Yashin
Journal: Biogerontology Date: 2010-12-31 Impact factor: 4.277

8. A genome-wide association study of aging.

Authors: Stefan Walter; Gil Atzmon; Ellen W Demerath; Melissa E Garcia; Robert C Kaplan; Meena Kumari; Kathryn L Lunetta; Yuri Milaneschi; Toshiko Tanaka; Gregory J Tranah; Uwe Völker; Lei Yu; Alice Arnold; Emelia J Benjamin; Reiner Biffar; Aron S Buchman; Eric Boerwinkle; David Couper; Philip L De Jager; Denis A Evans; Tamara B Harris; Wolfgang Hoffmann; Albert Hofman; David Karasik; Douglas P Kiel; Thomas Kocher; Maris Kuningas; Lenore J Launer; Kurt K Lohman; Pamela L Lutsey; Johan Mackenbach; Kristin Marciante; Bruce M Psaty; Eric M Reiman; Jerome I Rotter; Sudha Seshadri; Michelle D Shardell; Albert V Smith; Cornelia van Duijn; Jeremy Walston; M Carola Zillikens; Stefania Bandinelli; Sebastian E Baumeister; David A Bennett; Luigi Ferrucci; Vilmundur Gudnason; Mika Kivimaki; Yongmei Liu; Joanne M Murabito; Anne B Newman; Henning Tiemeier; Nora Franceschini
Journal: Neurobiol Aging Date: 2011-07-22 Impact factor: 4.673

9. Genome-wide association study identifies a single major locus contributing to survival into old age; the APOE locus revisited.

Authors: Joris Deelen; Marian Beekman; Hae-Won Uh; Quinta Helmer; Maris Kuningas; Lene Christiansen; Dennis Kremer; Ruud van der Breggen; H Eka D Suchiman; Nico Lakenberg; Erik B van den Akker; Willemijn M Passtoors; Henning Tiemeier; Diana van Heemst; Anton J de Craen; Fernando Rivadeneira; Eco J de Geus; Markus Perola; Frans J van der Ouderaa; David A Gunn; Dorret I Boomsma; André G Uitterlinden; Kaare Christensen; Cornelia M van Duijn; Bastiaan T Heijmans; Jeanine J Houwing-Duistermaat; Rudi G J Westendorp; P Eline Slagboom
Journal: Aging Cell Date: 2011-05-06 Impact factor: 9.304
