Literature DB >> 27980655

Multipoint association mapping for longitudinal family data: an application to hypertension phenotypes.

Yen-Feng Chiu1, Chun-Yi Lee1, Fang-Chi Hsu2.   

Abstract

It is essential to develop adequate statistical methods to fully utilize information from longitudinal family studies. We extend our previous multipoint linkage disequilibrium approach-simultaneously accounting for correlations between markers and repeat measurements within subjects, and the correlations between subjects in families-to detect loci relevant to disease through gene-based analysis. Estimates of disease loci and their genetic effects along with their 95 % confidence intervals (or significance levels) are reported. Four different phenotypes-ever having hypertension at 4 visits, incidence of hypertension, hypertension status at baseline only, and hypertension status at 4 visits-are studied using the proposed approach. The efficiency of estimates of disease locus positions (inverse of standard error) improves when using the phenotypes from 4 visits rather than using baseline only.

Entities:  

Year:  2016        PMID: 27980655      PMCID: PMC5133529          DOI: 10.1186/s12919-016-0049-2

Source DB:  PubMed          Journal:  BMC Proc        ISSN: 1753-6561


Background

Approaches for analyzing longitudinal family data have been categorized into 2 groups [1]: (a) first summarizing repeated measurements into 1 statistic (eg, a mean or slope per subject) and then using the summarized statistic as a standard outcome for genetic analysis; or (b) simultaneous modeling of genetic and longitudinal parameters. In general, joint modeling is appealing because (a) all parameter estimates are mutually adjusted, and (b) within- and between-individual variability at the levels of gene markers, repeat measurements, and family characteristics are correctly accounted for [1]. The semiparametric linkage disequilibrium mapping for the hybrid family design we developed previously [2] uses all markers simultaneously to localize the disease locus without making an assumption about genetic mechanism, except that only 1 disease gene lies in the region under study. The advantages of this approach are (a) it does not require the specification of an underlying genetic model, so estimating the position of a disease locus and its standard error is robust to a wide variety of genetic mechanisms; (b) it provides estimates of disease locus positions, along with a confidence interval for further fine mapping; and (c) it uses linkage disequilibrium between markers to localize the disease locus, which may not have been typed. We extended this approach to map susceptibility genes using longitudinal nuclear family data with an application to hypertension. Four different outcomes were used based on the proposed method: (I) ever having hypertension (“Ever”), (II) incidence event with status changed from unaffected to affected (“Progression”), (III) first available visit as baseline only (“Baseline”), and (IV) all available time points (“Longitudinal”). We compared the estimates of the disease locus positions, their standard errors, the genetic effect estimate at the disease loci, and their significance for the 4 phenotypes to examine the efficiency gained from using repeated longitudinal phenotypes.

Methods

Genome-wide genotypes and phenotype data

Association mapping was conducted on chromosome 3 of the genome-wide association study (GWAS) data. A total of 65,519 single-nucleotide polymorphisms (SNPs) included in 1095 genes were genotyped on chromosome 3 for 959 individuals from 20 original pedigrees in Genetic Analysis Workshop 19 (GAW19). Of these individuals, there were 178 (38 %) affected offspring out of 469 offspring for phenotype (I) “Ever”; 130 (31 %) out of 421 offspring for phenotype (II) “Progression”; 64 (11 %) out of 600 offspring for phenotype (III) “Baseline”; and 60 (11 %) out of 565 offspring to approximately 85 (45 %) out of 189 offspring across the 4 visits (or 87 [21.63 %] out of 402 offspring on average) for phenotype (IV) “Longitudinal” (Table 1). To compare phenotypes (I) and (II), only individuals with at least 2 measurements were included in the “ever” phenotype. PedCut [3] was used to split large pedigrees with members more than 20 members into nuclear pedigrees. Consequently, we analyzed a total of 138 pedigrees with 1,495 individuals (the IDs for missing parents were added to form trios). In divided pedigrees, the nuclear families contained between 3 and 25 individuals. Five SNPs were removed because they failed the test of Hardy-Weinberg equilibrium (HWE) (p value < 10−4). The HWE test was performed using PLINK 1.07 [4] based on 56 unrelated subjects. (For information on PLINK, see http://pngu.mgh.harvard.edu/purcell/plink/.) A total of 22,056 genotypes from various SNPs with genotyping errors (genotyping error rate was around 3.51 × 10−4) were further excluded by the MERLIN 1.1.2 computing package (see http://www.sph.umich.edu/csg/abecasis/merlin/tour/linkage.html). None of the covariates was adjusted for in this approach.
Table 1

Number of offspring for different phenotypes

EverProgressionBaselineVisit 1Visit 2Visit 3Visit 4
Affected offspring17813064607812585
All offspring469421600565426429189
Percentage0.380.310.110.110.180.290.45
Number of nuclear families17414921320316816579
Number of offspring for different phenotypes

Multipoint linkage disequilibrium mapping

Suppose M markers were genotyped in the region R at locations of 0 ≤ t 1 < t 2 < … < t  ≤ T. We assume there are 2 alleles per marker. With H (t) being the target allele at marker position t, and h (t) being the nontarget allele, we define for the affected offspring , and for the unaffected offspring . Then, we define the preferential transmission statistic for the paternal side and for the maternal side for a trio; similarly, the preferential transmission statistic and for an unaffected trio for both parental sides, respectively, where k  = 1, …, N 1 (for unaffected), N 1 (N 2) is the number of affected (unaffected) offspring in the family i at the l th time point, i = 1, … n, l = 1, …, L (L = 1 or 4 in this study). The expectation of the statistic is for case-parent trios and for control-parent trios, where is the recombination fraction between marker position t and disease locus position τ, the recombination fraction Θ is a parametric function of the parameter of primary interest (τ, the physical position of the functional variant), N is the number of generations since the initiation of the disease variant, Φ 1 denotes the event that the offspring is affected, Φ 2 represents the event that the offspring is unaffected, , is the vector of parameters, and π  = Pr [h(t ) |h(τ)]. is the probability for an affected offspring to receive a target allele, and is the probability for an unaffected offspring to receive a target allele. The statistic and were used to estimate the parameters. The estimating equations used to solve for parameters δ are: where is the average of nontransmitted parental alleles in the sample. The estimating equations were solved iteratively for parameters τ, N, C, and C*, where τ and C are the 2 parameters of interest. The variance of the disease locus position estimate was estimated to make inferences about the disease locus position (τ) and its genetic effect (C) [2]. Theoretically, the genetic effect of τ, characterized by C, is the transmission probability that the affected offspring will carry the disease allele, H, at τ. Detailed derivations for case-parent trios in a cross-sectional design can be found in Chiu et al. [2, 5]. We will present the details of this proposed methodology elsewhere. Gene-based association mapping was conducted for all SNPs on chromosome 3. This approach accounts for correlations between markers and repeated phenotypes within subjects, and correlations between subjects per family. The consistent estimates of hypertension locus position using “Ever” and “Progression” are shown in Table 2 and Fig. 1, while the consistent estimates of hypertension locus position using baseline and longitudinal data (at all 4 visits) are listed in Table 3 and Fig. 2.
Table 2

Significant and consistent estimates of disease locus positions and their genetic effects using “Ever” and “Progression” phenotypes

Gene*EverProgressionPrevious hits
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \widehat{\tau} $$\end{document}τ^ ± SE Ĉ p Value \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \widehat{\tau} $$\end{document}τ^ ± SE Ĉ p Value
FBLN2 13.6464 ± 0.000260.808.85 × 10−7 13.6462 ± 0.000300.882.47 × 10−6 L
C3orf19 14.6810 ± 0.000990.341.61 × 10−12 14.6802 ± 0.00100.346.87 × 10−11 L
C3orf20 14.7245 ± 0.000770.511.31 × 10−6 14.7244 ± 0.000910.452.81 × 10−7 L
OSBPL10 31.6853 ± 0.000510.417.69 × 10−8 31.6856 ± 0.000240.623.95 × 10−6 LG
CMTM8 32.3186 ± 0.000800.566.01 × 10−6 32.3183 ± 0.000520.701.20 × 10−5
BSN 49.6596 ± 0.000620.833.42 × 10−8 49.6594 ± 0.000770.773.58 × 10−6
RFT1 53.1117 ± 0.00120.374.07 × 10−6 53.1111 ± 0.00120.362.54 × 10−5
ADAMTS9 64.5214 ± 0.000280.531.05 × 10−11 64.5216 ± 0.000300.542.54 × 10−11 L
EPHA3 89.6014 ± 0.000420.801.41 × 10−6 89.6018 ± 0.000400.891.19 × 10−5
EPHA6 98.2999 ± 0.000470.413.57 × 10−8 98.2997 ± 0.000520.487.3 × 10−7 L
C3orf52 113.3097 ± 0.00260.626.55 × 10−9 113.3088 ± 0.00300.587.29 × 10−6 L
SIDT1 114.7743 ± 0.000390.788.45 × 10−7 114.7741 ± 0.000710.678.46 × 10−6 L
IFT122 130.7107 ± 0.00120.571.10 × 10−5 130.7118 ± 0.000600.714.90 × 10−7
RBP1 140.7325 ± 0.000190.656.40 × 10−7 140.7345 ± 0.000330.423.89 × 10−11 L
PLOD2 147.3469 ± 0.000980.343.56 × 10−6 147.3469 ± 0.00160.343.02 × 10−5 L
LEKR1 158.2181 ± 0.000360.721.66 × 10−10 158.2183 ± 0.000430.774.35 × 10−10 L
RSRC1 159.4005 ± 0.000590.514.35 × 10−6 159.4003 ± 0.000640.511.64 × 10−5 L
ECT2 174.0021 ± 0.000640.881.91 × 10−6 174.0022 ± 0.000631.002.92 × 10−7 L
PEX5L 181.0080 ± 0.00780.292.99 × 10−5 181.0145 ± 0.0130.235.23 × 10−7 LG
LPP 189.5573 ± 0.000350.501.71 × 10−6 189.5574 ± 0.000220.537.05 × 10−6
OSTN 192.4272 ± 0.00180.735.33 × 10−14 192.4301 ± 0.00120.804.11 × 10−9 G

Ĉ, the genetic effect estimate; G, previous GWAS hits; L, previous linkage hits; , the disease locus position estimate in cM

*Because of space limitations, we list only the 2 phenotypes with consistent estimates for the disease locus positions (the difference between the 2 for both phenotypes is less than 10−2 cM) and significant estimates for the genetic effects (both with P < 4.57 × 10−5, Bonferroni)

Fig. 1

Length of 95 % confidence intervals (CIs) for the estimate of the disease locus position for “Ever” and “Progression” phenotypes

Table 3

Significant and consistent estimates of disease locus positions and their genetic effects using “Baseline” and “Longitudinal” phenotypes

Gene*BaselineLongitudinalPrevious hits
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \widehat{\tau} $$\end{document}τ^ ± SE Ĉ p Value \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \widehat{\tau} $$\end{document}τ^ ± SE Ĉ p Value
GRM7 7.4917 ± 0.000480.442.87 × 10−5 7.4871 ± 0.00150.756.04 × 10−14 LG
SLC4A7 27.4521 ± 0.0000450.300.01427.4520 ± 0.0000670.300.0024LG
SCN10A 38.7559 ± 0.00890.0880.01938.7611 ± 0.00180.730.0022
AC092058.3 39.5105 ± 0.00200.0760.03639.5102 ± 0.000240.210.00022
LTF 46.4731 ± 0.000590.170.04646.4733 ± 0.000450.310.0099
NEK4 52.7326 ± 0.000710.830.0001052.7277 ± 0.00240.860.00024
FAM116A 57.6101 ± 0.000230.692.58 × 10−6 57.6107 ± 0.000320.610.011
LRIG1 66.5968 ± 0.00260.280.01866.5961 ± 0.000640.600.0022L
TBC1D23 101.5084 ± 0.00110.460.026101.5148 ± 0.00100.730.0011L
ALCAM 106.7625 ± 0.000690.830.028106.7598 ± 0.000410.620.00013L
PLCXD2 112.9440 ± 0.000870.500.0016112.9422 ± 0.00620.480.00020L
LSAMP 117.0676 ± 0.000600.430.00022117.0671 ± 0.000250.860.00012L
ILDR1 123.2009 ± 0.00110.700.013123.2008 ± 0.000980.910.023
PDIA5 124.3194 ± 0.00760.0650.0028124.3225 ± 0.00200.680.0086
HPS3 150.3484 ± 0.00160.141.65 × 10−5 150.3521 ± 0.000630.770.0080L
CASRL1 157.2304 ± 0.00370.190.012157.2295 ± 0.000940.280.031L
C3orf55 158.7595 ± 0.000740.900.0051158.7634 ± 0.00120.913.90 × 10−6 L
IGF2BP2 186.9725 ± 0.000180.740.041186.9719 ± 0.000311.000.031
FETUB 187.8470 ± 0.000310.380.0012187.8503 ± 0.0170.0420.0021
IL1RAP 191.8193 ± 0.0120.0740.0060191.8203 ± 0.00120.794.75 × 10−6
C3orf21 196.2815 ± 0.00360.62<10−18 196.2821 ± 0.00110.970.00057
KIAA0226 198.9161 ± 0.00380.0710.024198.9168 ± 0.000760.150.022

Ĉ, the genetic effect estimate; G, previous GWAS hits; L, previous linkage hits; , the disease locus position estimate in cM

*Displayed are all genes where p ≤ 0.05

†The gene is significant with the Bonferroni correction (P < 4.57 × 10−5) and its P values are 2.31 × 10−6 and 0.00044 for “Ever” and “Progression,” respectively

‡ The same genes for the “Ever” and “Progression” phenotypes had P values <0.05 but > 4.57 × 10−5 for the genetic effect estimate

Fig. 2

Length of 95 % confidence intervals (CIs) for the estimate of the disease locus position for “Baseline” and “Longitudinal” phenotypes

Significant and consistent estimates of disease locus positions and their genetic effects using “Ever” and “Progression” phenotypes Ĉ, the genetic effect estimate; G, previous GWAS hits; L, previous linkage hits; , the disease locus position estimate in cM *Because of space limitations, we list only the 2 phenotypes with consistent estimates for the disease locus positions (the difference between the 2 for both phenotypes is less than 10−2 cM) and significant estimates for the genetic effects (both with P < 4.57 × 10−5, Bonferroni) Length of 95 % confidence intervals (CIs) for the estimate of the disease locus position for “Ever” and “Progression” phenotypes Significant and consistent estimates of disease locus positions and their genetic effects using “Baseline” and “Longitudinal” phenotypes Ĉ, the genetic effect estimate; G, previous GWAS hits; L, previous linkage hits; , the disease locus position estimate in cM *Displayed are all genes where p ≤ 0.05 †The gene is significant with the Bonferroni correction (P < 4.57 × 10−5) and its P values are 2.31 × 10−6 and 0.00044 for “Ever” and “Progression,” respectively ‡ The same genes for the “Ever” and “Progression” phenotypes had P values <0.05 but > 4.57 × 10−5 for the genetic effect estimate Length of 95 % confidence intervals (CIs) for the estimate of the disease locus position for “Baseline” and “Longitudinal” phenotypes

Results and discussion

A total of 119 (11 %), 79 (7 %), 49 (4 %), and 42 (4 %) of 1095 genes had a significant genetic effect (P < 4.57 × 10−5 with Bonferroni correction) based on hypertension status at “Ever,” “Progression,” baseline (“Baseline”), and 4 visits (“Longitudinal”), respectively. There are only 3 significantly associated genes (P ≤ 0.05) for baseline and longitudinal phenotypes duplicated with the significantly associated genes for “Ever” and “Progression” outcomes: FETUB, IL1RAP, and C3orf21. Several hits identified here have been reported from linkage or GWAS studies. Table 2 shows genes with a significant genetic effect (P < 4.57 × 10−5). Table 3 presents the genes that are significant at a significance level of 0.05. Only 1 gene, GRM7, is significant at the level of P < 4.57 × 10−5. Figures 1 and 2 display the 95 % confidence intervals for the estimate of the hypertension locus position for the 4 phenotypes centered at the estimated disease locus position. The comparison is shown for the genes listed in Tables 2 and 3. The standard errors of the estimates for the disease locus position are smaller in 64 % of the genes based on longitudinal data (Table 3) compared to those based on baseline data. This is because those incidence cases included in “Progression” were also included in the analysis of “Ever.” Only prevalent cases, a relatively small proportion, are additionally included in the analysis of “Ever.” Thus, the results from “Progression” and “Ever” are similar.

Conclusions

Methods of genetic analysis rely heavily on correlations among family members’ outcomes to infer genetic effects, whereas longitudinal studies allow investigators to study factors’ effects on outcomes and changes over time [1]. To retrieve full information from longitudinal family data, appropriate statistical approaches are crucial. We proposed a multipoint linkage disequilibrium approach accounting for multilevel correlations between markers per subject, within-subject longitudinal observations, and subjects within families, aiming to correctly localize the disease locus and assess its genetic effects. This approach has several advantages: it allows us to estimate the disease locus position, the disease locus’s genetic effect, and the 95 % confidence intervals without specifying a disease genetic mode and yet making full use of the markers and repeated measurements. In addition, this approach treats the genotype data as random conditional on the phenotype, eliminating the problem of ascertainment bias. We applied this approach to the baseline and longitudinal prevalence/incidence of hypertension events. The efficiency of parameter estimates was similar for the “Ever” and “Progression” categories, but was improved with repeated longitudinal outcomes compared to the use of “Baseline” only. This difference between analyses might largely result from the different total sample sizes and proportions of hypertensive subjects for different phenotypes. Several identified genes on chromosome 3 for hypertension were consistent with findings from previous linkage and association studies. Despite its advantages, this proposed approach also has limitations; for example, covariate adjustment is not available.
  5 in total

1.  Longitudinal data analysis in pedigree studies.

Authors:  W James Gauderman; Stuart Macgregor; Laurent Briollais; Katrina Scurrah; Martin Tobin; Taesung Park; Dai Wang; Shaoqi Rao; Sally John; Shelley Bull
Journal:  Genet Epidemiol       Date:  2003       Impact factor: 2.135

2.  Incorporating covariates into multipoint association mapping in the case-parent design.

Authors:  Yen-Feng Chiu; Kung-Yee Liang; Wen-Harn Pan
Journal:  Hum Hered       Date:  2010-03-24       Impact factor: 0.444

3.  PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors:  Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal:  Am J Hum Genet       Date:  2007-07-25       Impact factor: 11.025

4.  Analysis of family- and population-based samples using multiple linkage disequilibrium mapping.

Authors:  Yen-Feng Chiu; Chun-Yi Lee; Hui-Yi Kao; Wen-Harn Pan; Fang-Chi Hsu
Journal:  Ann Hum Genet       Date:  2013-01-18       Impact factor: 1.670

5.  An approach for cutting large and complex pedigrees for linkage analysis.

Authors:  Fan Liu; Anatoliy Kirichenko; Tatiana I Axenovich; Cornelia M van Duijn; Yurii S Aulchenko
Journal:  Eur J Hum Genet       Date:  2008-02-27       Impact factor: 4.246

  5 in total
  1 in total

1.  RNA Sequencing Reveals Novel Transcripts from Sympathetic Stellate Ganglia During Cardiac Sympathetic Hyperactivity.

Authors:  Emma N Bardsley; Harvey Davis; Olujimi A Ajijola; Keith J Buckler; Jeffrey L Ardell; Kalyanam Shivkumar; David J Paterson
Journal:  Sci Rep       Date:  2018-06-05       Impact factor: 4.379

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.