Comparison of multipoint linkage analyses for quantitative traits in the CEPH data: parametric LOD scores, variance components LOD scores, and Bayes factors.

Yun Ju Sung, Yanming Di, Audrey Q Fu, Joseph H Rothstein, Weiva Sieh, Liping Tong, Elizabeth A Thompson, Ellen M Wijsman.

Abstract

We performed multipoint linkage analyses with multiple programs and models for several gene expression traits in the Centre d'Etude du Polymorphisme Humain families. All analyses provided consistent results for both peak location and shape. Variance-components (VC) analysis gave wider peaks and Bayes factors gave fewer peaks. Among programs from the MORGAN package, lm_multiple performed better than lm_markers, resulting in less Markov-chain Monte Carlo (MCMC) variability between runs, and the program lm_twoqtl provided higher LOD scores by also including either a polygenic component or an additional quantitative trait locus.

Year: 2007; PMID: 18466597; PMCID: PMC2367516; DOI: 10.1186/1753-6561-1-s1-s93

Source DB: PubMed; Journal: BMC Proc; ISSN: 1753-6561


Background

Our aims were 1) to compare results from several multipoint linkage analysis programs that are available for quantitative traits and 2) to investigate the performance of MCMC-based programs on the GAW15 expression data in 14 three-generation CEPH families genotyped for clustered SNP markers [1]. We used three recently developed programs in the MORGAN package [2]: lm_markers, lm_multiple, and lm_twoqtl. These programs provide MCMC-based parametric LOD score analysis, the first two with a one-QTL (1Q) model and the last with more complex models, including a second linked (2Q) or unlinked (UQ) QTL and/or a polygenic component (P). In addition, we used Loki [3] for Bayesian oligogenic analysis and Merlin [4] for VC analysis. These analyses cover most approaches that fully use quantitative trait data from three-generation pedigrees.

Methods

Phenotypes used

For 62 traits previously reported to show evidence of linkage [5,6], we performed genome-wide VC analysis and obtained the maximum likelihood estimate (MLE) of heritability (h2). We chose six traits that showed high VC LOD scores and h2 ≥ 0.31: CHI3L2, GSTM1, PSPH, VAMP8, PPAT, and TM7SF3. The first two of these had only a single peak with VC LOD > 3, representing potentially simple traits, and the latter four had multiple peaks, representing potentially complex traits. For these six traits, we performed Bayesian oligogenic joint segregation and linkage analyses using Loki and parametric LOD score analysis with a 1Q model using lm_markers and lm_multiple. For the first four traits only, we also performed parametric LOD score analysis with more complex models using lm_twoqtl.

Genetic map and marker data

We used the Rutgers map [7] for linkage analysis. We converted Kosambi map positions to Haldane map positions for analysis, although for ease of comparison with other GAW contributions we present all results on a Kosambi scale. We also constructed a jittered map by adding 0.01 cM between markers with identical positions on this map. We excluded sex chromosomes and used the sex-averaged jittered map for all our linkage analyses because neither MORGAN nor Loki allows multiple markers at the same position. For the VC analysis, we also used the nonjittered map as a comparison. We used Merlin to identify all Mendelian-inconsistent genotypes (69 marker-family combinations) and any obligate recombinations within each cluster (166 cluster-family, or 508 marker-family combinations), where a cluster is defined as a set of markers that have the same Rutgers map position. We coded these markers as missing genotypes in all members of the families with an apparent error.
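The map preparation described above can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: it assumes that each inter-marker interval is converted separately through the recombination fraction (Kosambi: theta = 0.5 tanh(2d); Haldane: d = -0.5 ln(1 - 2 theta), with d in Morgans), that positions are reported relative to the first marker, and that the 0.01 cM jitter is applied after conversion. The function names are hypothetical.

```python
import math


def kosambi_to_haldane_cm(d_cm):
    """Convert one interval length from Kosambi cM to Haldane cM."""
    # Kosambi distance -> recombination fraction
    theta = 0.5 * math.tanh(2.0 * d_cm / 100.0)
    # recombination fraction -> Haldane distance (in cM)
    return -50.0 * math.log(1.0 - 2.0 * theta)


def haldane_jittered_map(kosambi_positions_cm, jitter_cm=0.01):
    """Return Haldane positions (relative to the first marker), with
    jitter_cm inserted between markers that share a map position.

    Assumes the input positions are sorted in map order.
    """
    positions = [0.0]
    for prev, cur in zip(kosambi_positions_cm, kosambi_positions_cm[1:]):
        if cur < prev:
            raise ValueError("marker positions must be sorted")
        gap = kosambi_to_haldane_cm(cur - prev)
        if gap <= 0.0:
            # markers clustered at the same position: separate them slightly
            gap = jitter_cm
        positions.append(positions[-1] + gap)
    return positions
```

For example, `haldane_jittered_map([0.0, 10.0, 10.0, 12.0])` places the two clustered markers 0.01 cM apart, so that no two markers share a position.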

Segregation and linkage analyses

For the 62 traits, we performed genome-wide VC linkage analysis with Merlin for both the jittered and original nonjittered maps. VC LOD scores were computed only at the marker positions. We also obtained MLEs of h2 for these 62 traits with a VC polygenic model [8]. Using Merlin, we obtained MLEs of marker allele frequencies, which we used in all linkage analyses.

For the six traits, we performed Bayesian oligogenic segregation analysis and oligogenic joint segregation and linkage analysis using Loki. For segregation analysis, we used every fourth iteration in a 50 k iteration run to estimate QTL models. For linkage analysis, we used every fourth iteration in a 999 k iteration run to compute Bayes factors for presence versus absence of a QTL in each 2-cM bin. We used QTL models estimated from Bayesian segregation analysis in all our LOD score analyses.

We recently developed three programs in MORGAN: lm_markers, lm_multiple, and lm_twoqtl. The first two programs compute LOD scores for the 1Q model, and lm_twoqtl computes LOD scores for more complex models [9]. In addition to its MCMC-based approach, lm_markers now can also provide exact computation of LOD scores for small pedigrees with many markers. No other programs provide parametric LOD scores for quantitative traits with many markers. The program lm_multiple differs from lm_markers only in that, instead of updating only one meiosis at a time, it uses an improved sampler that simultaneously updates either a randomly chosen subset of up to eight meioses or a possibly larger subset of meioses in closely related individuals, such as siblings [10]. This multiple-meiosis updating can improve estimates of LOD scores, particularly for data with large sibships. Finally, lm_twoqtl provides LOD scores with models that include additional linked or unlinked QTLs and a polygenic component. Incorporating better modeling of complex traits into linkage analysis can provide higher LOD scores and better localization for complex traits [9].

We performed parametric linkage analysis using these three MORGAN programs. For the six traits, we obtained ten MCMC estimates of LOD scores with both lm_markers (3 k and 30 k scans) and lm_multiple (3 k scans), to compare their performance. For comparison, we also computed exact LOD scores for the 1Q model using lm_markers. Parameter values for the trait model were almost identical to those for the mixed model in Table 1, except for using σ2(a) + σ2(e) as the environmental variance. For the first four traits, we also used lm_twoqtl with one linked plus one unlinked QTL (1Q + UQ) and one QTL plus a polygenic component (1Q + P) models. In addition, for VAMP8, we used lm_twoqtl with a two-linked-QTL (2Q) model. For the first three traits, the secondary QTL model was from oligogenic segregation analysis, whereas for VAMP8, the secondary QTL model was the same as the first QTL model. LOD scores at the marker positions as well as midway between two markers were evaluated for all MORGAN programs.

We obtained initial starting configurations by using sequential imputation for all MORGAN programs and the locus sampler for Loki. Burn-in iterations were 150 for all MORGAN programs and 1000 for Loki. We used a 50:50 ratio of locus to meiosis sampler for all MCMC-based analyses. For lm_multiple, the probabilities for updating meioses from random subsets, individuals, full sibships, and full three-generation families were 0.2, 0.3, 0.3, and 0.2, respectively. For lm_twoqtl, we used every tenth scan in a 30 k scan run for computing LOD scores. For lm_markers and lm_multiple, we used every scan.
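The per-bin Bayes factor for presence versus absence of a QTL, described above for the Loki analysis, can be illustrated with a short sketch. It is not Loki's implementation: it assumes that each saved MCMC iteration is available as a list of sampled QTL positions (in cM) on one chromosome, and that the prior probability of at least one QTL falling in a bin is supplied by the user, since it depends on prior settings not given here. The function name and argument names are hypothetical.

```python
import math


def bin_bayes_factors(samples, chrom_length_cm, prior_prob,
                      bin_width_cm=2.0, thin=4):
    """Bayes factor (posterior odds / prior odds) for a QTL in each bin.

    samples: list of iterations, each a list of QTL positions in cM.
    prior_prob: assumed prior probability of >= 1 QTL in a single bin.
    """
    kept = samples[::thin]  # e.g. every fourth iteration
    n_bins = math.ceil(chrom_length_cm / bin_width_cm)
    hits = [0] * n_bins
    for positions in kept:
        # count each bin at most once per iteration
        occupied = {min(int(pos // bin_width_cm), n_bins - 1)
                    for pos in positions}
        for b in occupied:
            hits[b] += 1
    prior_odds = prior_prob / (1.0 - prior_prob)
    factors = []
    for h in hits:
        post = h / len(kept)
        if post >= 1.0:
            factors.append(math.inf)  # QTL present in every kept iteration
        else:
            factors.append((post / (1.0 - post)) / prior_odds)
    return factors
```

Taking log10 of these values gives a quantity on the same scale as the log10 Bayes factors reported in the Results.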
Table 1

Oligogenic segregation analysis results

Trait     Transcript   P(A)  μ(AA)  μ(Aa)  μ(aa)  σ2(q)  σ2(a)  σ2(e)  h2 Loki  h2 MLE
1 CHI3L2  213060_s_at  0.56  7.98   9.84   10.51  0.96   0.24   0.22   0.80     0.69
2 GSTM1   204550_x_at  0.77  8.01   9.17   9.50   0.35   0.03   0.15   0.70     0.68
3 PSPH    205048_s_at  0.89  6.43   8.88   9.51   1.02   0.55   0.12   0.85     0.64
4 VAMP8   202546_at    0.28  10.20  10.36  10.69  0.03   0.02   0.07   0.38     0.38
5 PPAT    209433_s_at  0.21  8.73   9.59   9.70   0.04   0.07   0.08   0.55     0.33
6 TM7SF3  217974_at    0.19  5.20   6.82   6.96   0.11   0.17   0.20   0.56     0.31

P(A), frequency of allele A; μ(AA), phenotypic mean of genotype AA; σ2(q), variance due to the major QTL; σ2(a), polygenic variance; σ2(e), environmental variance; h2, heritability

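As a rough consistency check on Table 1, the variance due to the major QTL can be recomputed from the allele frequency and the three genotype means. The sketch below assumes Hardy-Weinberg genotype frequencies, which is an assumption of this illustration rather than something stated in the text. Under that assumption the result is close to the tabulated σ2(q), for example about 0.97 versus 0.96 for CHI3L2; exact agreement is not expected because the table entries are estimates from MCMC output. The helper for the 1Q-model residual variance follows the Methods statement that σ2(a) + σ2(e) was used as the environmental variance. Function names are hypothetical.

```python
def qtl_variance(p_a, mu_AA, mu_Aa, mu_aa):
    """Variance of the genotype means under Hardy-Weinberg proportions."""
    q = 1.0 - p_a
    freqs = (p_a * p_a, 2.0 * p_a * q, q * q)  # AA, Aa, aa
    means = (mu_AA, mu_Aa, mu_aa)
    overall = sum(f * m for f, m in zip(freqs, means))
    return sum(f * (m - overall) ** 2 for f, m in zip(freqs, means))


def one_qtl_residual_variance(sigma2_a, sigma2_e):
    """Residual variance for the 1Q model: polygenic plus environmental."""
    return sigma2_a + sigma2_e


# Table 1 values for CHI3L2
chi3l2_qtl_var = qtl_variance(0.56, 7.98, 9.84, 10.51)
chi3l2_resid_var = one_qtl_residual_variance(0.24, 0.22)
```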

Results

VC LOD scores and heritabilities for the 62 traits

Of the 62 traits, 24 had a VC LOD score ≥ 3, with h2 ranging from 0.13 to 0.86. Five traits had a maximum VC LOD score < 1, with h2 ranging from 0 to 0.11. Most traits had only a single peak in the genome with VC LOD ≥ 3, suggesting a simple mode of inheritance. Two traits (PSPH and DDX17) had three peaks with VC LOD ≥ 3, and three traits (PPAT, HSD17B12, TUBG1) had two peaks with VC LOD ≥ 3. The jittered and nonjittered maps yielded virtually identical VC LOD scores, except for VAMP8 on chr 2, where the largest peak was slightly narrower with the nonjittered map. We chose the six traits CHI3L2, GSTM1, PPAT, PSPH, TM7SF3, and VAMP8 for further analysis. The actual locations of these genes were at the maximum VC LOD scores (CHI3L2, GSTM1, PSPH), 10 cM away (VAMP8), or 25 cM away (PPAT). Bayesian oligogenic segregation analysis for these traits provided posterior mean numbers of QTLs ranging from 2 to 3.5. Estimation of the primary QTL model was relatively straightforward (Table 1), whereas the secondary or weaker QTL models were less obvious. Heritabilities estimated from Bayesian oligogenic segregation analysis were sometimes higher than MLEs of h2 obtained from a VC polygenic model. This is not surprising because VC analysis with Merlin uses only additive genetic variance, thus providing only narrow-sense heritabilities, whereas Loki allows for dominance effects, thus providing larger broad-sense heritabilities.
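The distinction between narrow-sense and broad-sense heritability can be made concrete with the standard single-locus decomposition of genetic variance into additive and dominance parts. The sketch below is only an illustration of that textbook decomposition under Hardy-Weinberg proportions; it is not the computation performed by Merlin or Loki, and the function name is hypothetical. With genotype means coded as +a, d, and -a around the homozygote midpoint, the average effect is alpha = a + d(q - p), the additive variance is 2pq alpha^2, and the dominance variance is (2pqd)^2.

```python
def additive_dominance_variance(p_a, mu_AA, mu_Aa, mu_aa):
    """Split single-locus genetic variance into additive and dominance parts."""
    p = p_a
    q = 1.0 - p_a
    a = (mu_AA - mu_aa) / 2.0            # half the homozygote difference
    d = mu_Aa - (mu_AA + mu_aa) / 2.0    # heterozygote deviation from midpoint
    alpha = a + d * (q - p)              # average effect of allele substitution
    var_additive = 2.0 * p * q * alpha ** 2
    var_dominance = (2.0 * p * q * d) ** 2
    return var_additive, var_dominance


# Table 1 values for CHI3L2: most of the QTL variance is additive,
# with a smaller dominance component.
var_a, var_d = additive_dominance_variance(0.56, 7.98, 9.84, 10.51)
```

A model restricted to additive variance captures only the first component, which is one way a broad-sense estimate can exceed a narrow-sense one.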

Bayes factors using an oligogenic model for the 6 traits

Bayes factors generally matched the VC LOD scores in both peak location and general shape (Figure 1, Table 2), with two minor differences. First, Bayes factors provided much narrower peaks than did VC LOD scores. Second, Bayes factors did not provide several modest peaks that were obtained with VC analysis. For PSPH, Bayes factors did not provide evidence of linkage on chr 2, whereas VC LOD scores provided bimodal peaks with VC LODs of 2.6 and 2.8. Also, Bayes factors did not confirm a secondary peak obtained by VC analysis on chr 8 for PSPH and chr 2 for VAMP8. The primary QTL model estimated from segregation analysis almost always appeared on the chromosomes with the strongest linkage signals. The traits with support for linkage to more than one chr are: PSPH with a strong signal on chr 7 (Fig. 1C) and a modest signal on chr 8, TM7SF3 with moderate signals on both chr 2 and chr 12, and VAMP8 with a strong signal on chr 2 (Fig. 1D) and a weaker signal on chr 4.
Figure 1

Linkage analyses of 4 traits. A, CHI3L2 on chr 1. B, GSTM1 on chr 1. C, PSPH on chr 7. D, VAMP8 on chr 2. One linked QTL plus polygenic (magenta, long-dashed), one linked QTL plus one unlinked QTL (blue, dotted), one QTL (black, solid), VC (green, short-dashed), log10 of Bayes factors (cyan, dot-dashed), and two linked QTLs (red, dot-dot-dashed).

Table 2

Highest LOD score or log10 (Bayes factor) and run time (in minutes)

                              CHI3L2 (a)       GSTM1            PSPH             VAMP8
                              147 cM (chr 1)   142 cM (chr 1)   80 cM (chr 7)    113 cM (chr 2)
Model     Program      Scans  Stat (b)   Time  Stat       Time  Stat       Time  Stat       Time
1Q        Exact        NA     11.5       1229  6.3        1234  10.1       470   3.6        1044
1Q        lm_multiple  3 k    11.3–11.5  44    6.2–6.3    45    9.9–10.1   33    3.6–3.6    43
1Q        lm_markers   3 k    10.7–11.6  21    5.7–6.3    21    8.1–10.3   13    3.5–3.6    20
1Q        lm_markers   30 k   10.6–11.6  177   5.7–6.3    168   8.1–10.1   110   3.2–3.6    153
1Q + P    lm_twoqtl    30 k   13.7       563   7.2        604   10.8       401   3.8        535
1Q + UQ   lm_twoqtl    3 k    13.4       3568  5.6        816   40.4       542   4.1        808
VC        Merlin       NA     13         2     5.7        2     14.3       1     5          2
Bayesian  Loki         999 k  2.9        707   2.6        700   2.6        504   1.9        513

(a) Peak position (± 1 cM) from all analyses and gene location, except VAMP8 (120–123 cM).

(b) For MORGAN and VC programs, the statistic (Stat) is the LOD score, with the range (min and max) over 10 runs, and time is the median of 10 runs for MCMC programs; for Loki, Stat is the log10 (Bayes factor) for one run.


LOD scores using a one-QTL model for the six traits

Model-based LOD scores matched VC LOD scores in both peak location and general shape (Fig. 1, Table 2). The only minor difference was that the model-based LOD score did not provide a third peak between the two peaks that the VC LOD score provided for TM7SF3 on chr 12. For most traits several of the 14 pedigrees were almost uninformative for linkage, the model giving negligible probability that the QTL was segregating in the pedigree (Table 3). For PSPH, the low trait allele frequency led to 9 of the 14 pedigrees being uninformative.
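Because the pedigrees are independent, the overall LOD score at a location is the sum of the per-family LOD scores, and a family contributing a LOD of zero is uninformative there. The short sketch below illustrates this with the CHI3L2 and PSPH rows of Table 3; the variable and function names are hypothetical, and the zero tolerance is an assumption chosen to match the two-decimal rounding of the table.

```python
# Per-family exact LOD scores (pedigrees 1-14), copied from Table 3
FAMILY_LODS = {
    "CHI3L2": [0.6, 0.76, 2.34, 2.34, -0.03, 1.84, -0.72,
               1.09, 1.68, 2.02, -0.62, -0.05, 0.79, -0.56],
    "PSPH": [2.34, 0, 0, 2.03, 0, 0, 0,
             2.01, 0, 0, 2.03, 0, 0, 1.64],
}


def total_lod(family_lods):
    """Overall LOD score: sum over independent pedigrees."""
    return sum(family_lods)


def uninformative_count(family_lods, tol=0.005):
    """Number of pedigrees whose LOD rounds to zero at two decimals."""
    return sum(1 for lod in family_lods if abs(lod) < tol)


for trait, lods in FAMILY_LODS.items():
    print(trait, round(total_lod(lods), 2), uninformative_count(lods))
```

The sums recover the "All" column of Table 3 for these two traits, and the PSPH count recovers the 9 uninformative pedigrees mentioned above.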
Table 3

Exact LOD scores by family at chromosomal locations with the highest overall LOD score

                  Pedigree
Trait   Chr  cM   1      2      3      4      5      6      7      8      9      10     11     12     13     14     All
CHI3L2  1    147  0.6    0.76   2.34   2.34   -0.03  1.84   -0.72  1.09   1.68   2.02   -0.62  -0.05  0.79   -0.56  11.48
GSTM1   1    142  0      1.67   -0.01  1.4    0.42   0      0      0      0      -1.09  1.16   1.48   0.34   0.89   6.26
PSPH    7    80   2.34   0      0      2.03   0      0      0      2.01   0      0      2.03   0      0      1.64   10.05
VAMP8   2    113  0      0.08   0.31   -0.06  0.62   0.33   0.47   -0.17  1.01   0.38   0.21   0.49   -0.03  -0.08  3.56
PPAT    4    78   0.1    0.01   0.31   0.01   1.23   0      0.01   0      0.14   0.01   -0.11  0      0.02   1.49   3.22
TM7SF3  12   55   0.01   0.33   0      0.09   0      0.02   0      0      0.04   1.02   0.03   0      0      0      1.54

For all six traits, lm_multiple runs with 3 k scans provided better results than lm_markers runs with 30 k scans. Computation time for 3 k scans with lm_multiple was about one-third that of 30 k scans with lm_markers (Table 2). In particular, for VAMP8, all 10 lm_multiple runs were an almost perfect match to the exact LOD scores, whereas lm_markers runs with 30 k scans showed moderate run-to-run variation (Fig. 2). For all six traits, lm_multiple showed the smallest run-to-run variation of the LOD scores at the peak (Table 2) as well as elsewhere on the chromosome. Runs of lm_markers with 3 k scans differed little from runs with 30 k scans, showing only slightly more variability.
Figure 2

Linkage analyses of VAMP8. 10 lm_markers runs with 30 k scans (cyan, solid), 10 lm_multiple runs with 3 k scans (magenta, solid), and exact run (black, medium-dashed).


LOD scores using more complex models for the four traits

More complex trait models led to higher LOD scores than the 1Q model (Table 2). For GSTM1, the 1Q + P model provided the highest LOD scores (Fig. 1B), while for CHI3L2 and VAMP8, LOD scores for 1Q + UQ and 1Q + P models were almost identical (Fig. 1A, D). For CHI3L2, the model labeled as 1Q + UQ in Table 2 actually included a polygenic component, i.e., 1Q + UQ + P, which increased the run time significantly. In contrast, for PSPH, the 1Q + UQ model produced anomalous results, with LOD scores ranging from less than -3000 to 40 (Fig. 1C). This may be due to inaccurate estimation of the secondary QTL model: the combined genetic variance from the two QTLs exceeded the total genetic variance obtained from segregation analysis. For VAMP8, the 2Q model provided two peaks of equal magnitude (Fig. 1D), resulting from the identical model for both QTLs.

Discussion

We performed several multipoint linkage analyses for quantitative traits: VC, Bayesian oligogenic, and parametric LOD score linkage analysis with 1Q, 1Q + P, 1Q + UQ, and 2Q models. We found that all of these analyses provided similar inferences about peak location and shape, with some advantage to using the 1Q + P and 1Q + UQ models over the 1Q model. Use of parametric LOD scores also provided insights into genetic heterogeneity of the traits, which was considerable. However, models for QTLs other than the primary QTL were difficult to estimate with the Bayesian approach for these gene expression traits, suggesting the need for better segregation analysis tools for estimating parameters of complex trait models.

We were able to obtain reliable results for analysis with clustered SNPs with several newly developed MCMC programs in MORGAN. We found that lm_multiple provided better estimates of LOD scores than lm_markers with fewer scans in less time, although, in general, both programs performed well with only minor differences in the variability between runs. The MCMC performance obtained here is improved relative to our results for GAW14 [11]. Factors in this improvement likely include the use of sequential imputation to obtain starting configurations [12], less missing data, and different SNP marker maps, in addition to improved algorithms and software.

Finally, although our goal here was to compare our developing MCMC-based methods, we advocate use of exact computation when this is practical. On small pedigrees, such as those used here, exact analysis with a 1Q model and lm_markers or with VC methods may be best initially since this is faster than MCMC analysis. Further analyses may use lm_twoqtl, if the evidence warrants it. However, on larger pedigrees, exact multipoint computation may not be possible, in which case these MCMC options are a viable and practical alternative.

Conclusion

We showed that MCMC-based programs from the MORGAN package provide accurate LOD scores for quantitative traits with SNP markers. The program lm_multiple gives more accurate results than lm_markers, and the program lm_twoqtl expands the trait models to include two loci plus a possible polygenic component.

List of Abbreviations

1Q: One QTL
1Q + P: One QTL plus a polygenic component
1Q + UQ: One linked QTL plus one unlinked QTL
2Q: Two linked QTL
CEPH: Centre d'Etude du Polymorphisme Humain
chr: chromosome
GAW: Genetic Analysis Workshop
h2: heritability
MCMC: Markov chain Monte Carlo
MLE: Maximum likelihood estimate
QTL: Quantitative trait locus
SNP: Single-nucleotide polymorphism
VC: Variance components

Competing interests

The author(s) declare that they have no competing interests.

References

1. Gonçalo R Abecasis, Stacey S Cherny, William O Cookson, Lon R Cardon. Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2001-12-03.

2. Vivian G Cheung, Richard S Spielman, Kathryn G Ewens, Teresa M Weber, Michael Morley, Joshua T Burdick. Mapping determinants of human gene expression by regional and genome-wide association. Nature. 2005-10-27.

3. Yun Ju Sung, Elizabeth A Thompson, Ellen M Wijsman. MCMC-based linkage analysis for complex traits on general pedigrees: multipoint analysis with a two-locus model and a polygenic component. Genet Epidemiol. 2007-02.

4. Ellen M Wijsman, Joseph H Rothstein, Elizabeth A Thompson. Multipoint linkage analysis with many multiallelic or dense diallelic markers: Markov chain-Monte Carlo provides practical approaches for genome scans on general pedigrees. Am J Hum Genet. 2006-09-20.

5. Liping Tong, Elizabeth Thompson. Multilocus lod scores in large pedigrees: combination of exact and approximate calculations. Hum Hered. 2007-10-12.

6. S C Heath. Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am J Hum Genet. 1997-09.

7. Michael Morley, Cliona M Molony, Teresa M Weber, James L Devlin, Kathryn G Ewens, Richard S Spielman, Vivian G Cheung. Genetic analysis of genome-wide variation in human gene expression. Nature. 2004-07-21.

8. Yun Ju Sung, Geraldine Dawson, Jeffrey Munson, Annette Estes, Gerard D Schellenberg, Ellen M Wijsman. Genetic investigation of quantitative traits related to autism: use of multivariate polygenic models with ascertainment adjustment. Am J Hum Genet. 2004-11-16.

9. Weiva Sieh, Saonli Basu, Audrey Q Fu, Joseph H Rothstein, Paul A Scheet, William C L Stewart, Yun J Sung, Elizabeth A Thompson, Ellen M Wijsman. Comparison of marker types and map assumptions using Markov chain Monte Carlo-based linkage analysis of COGA data. BMC Genet. 2005-12-30.

10. Vivian G Cheung, Richard S Spielman. Data for Genetic Analysis Workshop (GAW) 15, Problem 1: genetics of gene expression variation in humans. BMC Proc. 2007-12-18.