
Examining the effect of linkage disequilibrium on multipoint linkage analysis.

Qiqing Huang, Sanjay Shete, Michael Swartz, Christopher I Amos.

Abstract

Most linkage programs assume linkage equilibrium among multiple linked markers. This assumption may lead to bias for tightly linked markers where strong linkage disequilibrium (LD) exists. We used simulated data from Genetic Analysis Workshop 14 to examine the possible effect of LD on multipoint linkage analysis. Single-nucleotide polymorphism packets from a non-disease-related region that was generated with LD were used for both model-free and parametric linkage analyses. Results showed that high LD among markers can induce false-positive evidence of linkage for affected sib-pair analysis when parental data are missing. Bias can be eliminated with parental data and can be reduced when additional markers not in LD are included in the analyses.

Year:  2005        PMID: 16451698      PMCID: PMC1866697          DOI: 10.1186/1471-2156-6-S1-S83

Source DB:  PubMed          Journal:  BMC Genet        ISSN: 1471-2156            Impact factor:   2.797


Background

Most multipoint linkage programs assume linkage equilibrium among the markers being studied. This assumption is appropriate for sparsely spaced markers with inter-marker distances exceeding a few centimorgans, because linkage equilibrium is expected over these intervals in almost all populations. However, with recent advances in high-throughput genotyping technology, much denser marker sets are available, and linkage disequilibrium (LD) may exist among the markers. Applying linkage analyses that assume linkage equilibrium to dense markers may lead to bias. It is well known that misspecification of allele frequencies can inflate LOD scores for both model-free [1] and model-based [2,3] linkage approaches. However, estimating allele frequencies from the available data will generally correct this problem [4]. Rare exceptions, such as unrecognized inbreeding at a high level or pronounced stratification, might cause an excess of false positives for linkage tests when only affected sib pairs lacking parents are analyzed [5]. For tightly linked loci, assuming linkage equilibrium causes incorrect inference of haplotype frequencies, which can lead to a bias similar to that induced by misspecification of allele frequencies for multi-allelic markers. However, accurately estimating haplotype frequencies is more difficult than estimating allele frequencies because of phase uncertainty. Many currently available programs, such as ALLEGRO and GENEHUNTER, do not allow the user to specify haplotype frequencies, while programs that do, including LINKAGE and LIPED, are very unwieldy to use in this case. Recently, Huang et al. [6] demonstrated that assuming linkage equilibrium between tightly linked markers where strong LD exists may cause apparent over-sharing of multipoint identity by descent (IBD) among affected sibs and thus result in false-positive evidence for linkage.
For Genetic Analysis Workshop 14 (GAW14), we used the simulated data to further explore how LD can cause an excess of false-positive results. The workshop data afforded a more realistic setting in which to study the effects of LD than was covered by Huang et al. [6], because the data were simulated to represent a complex disease model and a large set of markers was available for further examination of the possible effects of LD on multipoint linkage analysis.

Methods

To examine the possible effect of LD on linkage analysis, we studied markers from a dense marker dataset, because the inter-marker distances are smaller and the simulated LD was higher. Single-nucleotide polymorphism (SNP) packets from non-disease-related regions that were generated with LD were purchased and used for the analyses. The inter-marker distance averaged 0.29 cM among these markers (20 SNPs per packet). Pedigree samples from the Aipotu population of the simulated GAW14 data were used for the analyses. There were 100 nuclear families in the replicate sample, and in each family at least two sibs were affected with Kofendrerd Personality Disorder (KPD). We treated the parents from each family as unrelated individuals and used them to estimate haplotype frequencies and LD. Haplotype frequencies were estimated using the expectation maximization algorithm [7], and pair-wise LD was calculated using the standard formulas [8] implemented in the EMLD program. We randomly selected a single sib pair from each family to ensure independence of the sib pairs. We then analyzed each family both with and without the parental genotype data. Multipoint and single-point linkage analyses of the affected sib-pair data were carried out using ALLEGRO [9]. For model-free multipoint linkage analyses, we used the Kong and Cox exponential model [10] and the Spairs score function [11]. For the parametric linkage analyses, we assumed a simple dominant disease model with 100% penetrance in carriers and 0% penetrance in non-carriers, and we incorporated a heterogeneity parameter [12], thus allowing some but not all families to be linked.
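
As an illustration of the haplotype-frequency and LD calculations described above, the following Python sketch estimates two-SNP haplotype frequencies by expectation maximization and then computes pair-wise D' and r². It is a minimal sketch restricted to two biallelic SNPs in unrelated individuals, not the EMLD implementation; the function names, the allele labels A/a and B/b, and the genotype-count input format are assumptions introduced here.

```python
# Haplotype pair implied by each unambiguous two-SNP genotype,
# keyed by (copies of allele A at SNP 1, copies of allele B at SNP 2).
UNAMBIGUOUS = {
    (2, 2): ("AB", "AB"), (2, 1): ("AB", "Ab"), (2, 0): ("Ab", "Ab"),
    (1, 2): ("AB", "aB"), (1, 0): ("Ab", "ab"),
    (0, 2): ("aB", "aB"), (0, 1): ("aB", "ab"), (0, 0): ("ab", "ab"),
}


def em_haplotype_freqs(geno_counts, n_iter=500, tol=1e-12):
    """EM estimate of haplotype frequencies from genotype counts.

    geno_counts maps (copies of A, copies of B) to a number of unrelated
    individuals. Only the double heterozygote (1, 1) has ambiguous phase.
    """
    n = sum(geno_counts.values())
    fixed = dict.fromkeys(("AB", "Ab", "aB", "ab"), 0.0)
    for genotype, (h1, h2) in UNAMBIGUOUS.items():
        count = geno_counts.get(genotype, 0)
        fixed[h1] += count
        fixed[h2] += count
    n_double_het = geno_counts.get((1, 1), 0)

    freqs = dict.fromkeys(fixed, 0.25)  # start from equal frequencies
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is AB/ab (cis).
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: expected haplotype counts divided by 2n chromosomes.
        new = {
            "AB": (fixed["AB"] + n_double_het * w) / (2 * n),
            "ab": (fixed["ab"] + n_double_het * w) / (2 * n),
            "Ab": (fixed["Ab"] + n_double_het * (1 - w)) / (2 * n),
            "aB": (fixed["aB"] + n_double_het * (1 - w)) / (2 * n),
        }
        converged = max(abs(new[h] - freqs[h]) for h in freqs) < tol
        freqs = new
        if converged:
            break
    return freqs


def ld_measures(freqs):
    """Return (|D'|, r^2) for two biallelic SNPs given haplotype frequencies."""
    p_a = freqs["AB"] + freqs["Ab"]  # frequency of allele A
    p_b = freqs["AB"] + freqs["aB"]  # frequency of allele B
    d = freqs["AB"] - p_a * p_b      # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


if __name__ == "__main__":
    # Hypothetical counts: only AB and ab haplotypes are present.
    counts = {(2, 2): 25, (1, 1): 50, (0, 0): 25}
    print(ld_measures(em_haplotype_freqs(counts)))
```

Because D' is scaled by the largest value D can take given the allele frequencies, D' can equal 1 while r² is much lower whenever one of the four haplotypes is absent but the allele frequencies at the two SNPs differ. For example, haplotype frequencies of 0.5 (AB), 0 (Ab), 0.25 (aB), and 0.25 (ab) give D' = 1 and r² = 1/3.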

Results

Although all the SNP packets that we examined were from regions that were generated with LD, LD was not strong in most of the regions and did not have an obvious effect on linkage analysis. However, strong LD existed among three markers in SNP packet 121: B03T2407, B03T2408, and C03R0221, with pair-wise D' > 0.95 and r² > 0.38. The pair-wise LD as measured by D' and r² for this packet is shown in Table 1.

Table 1

Pair-wise LD between 20 SNPs of SNP packet 121 in sample replicate 1 from Aipotu population (D' measure above the diagonal and r² below the diagonal).

SNP  1      2      3      4      5      6      7      8      9      10     11     12     13     14     15     16     17     18     19     20
1    --     0.653  0.278  0.130  0.211  0.234  0.085  0.214  0.358  0.154  0.004  0.138  0.093  0.254  0.053  0.392  0.546  0.068  0.057  0.163
2    0.280  --     0.309  0.062  0.141  0.206  0.227  0.369  0.493  0.048  0.109  0.021  0.053  0.023  0.049  0.115  0.151  0.127  0.228  0.116
3    0.040  0.075  --     0.272  0.103  0.018  0.218  0.064  0.160  0.349  0.198  0.083  0.134  0.393  0.177  0.047  0.332  0.046  0.450  0.227
4    0.011  0.001  0.025  --     0.223  0.016  0.122  0.772  0.690  0.475  0.397  0.103  0.084  0.375  0.272  0.162  0.371  0.480  0.385  0.051
5    0.005  0.003  0.002  0.003  --     1.000  0.408  0.524  0.325  0.001  0.095  0.000  0.012  0.121  0.062  0.225  0.058  0.026  0.006  0.010
6    0.027  0.029  0.000  0.000  0.210  --     0.081  0.427  0.365  0.224  0.196  0.264  0.212  0.113  0.194  0.121  0.093  0.048  0.408  0.098
7    0.003  0.034  0.048  0.004  0.041  0.006  --     1.000  1.000  0.216  0.052  0.139  0.241  0.005  0.019  0.169  0.140  0.068  0.085  0.117
8    0.009  0.039  0.002  0.074  0.152  0.069  0.442a --     1.000  0.023  0.102  0.109  0.007  0.197  0.023  0.059  0.045  0.143  0.007  0.057
9    0.020  0.059  0.008  0.049  0.070  0.042  0.396  0.837  --     0.029  0.088  0.181  0.118  0.219  0.079  0.080  0.106  0.169  0.133  0.084
10   0.015  0.001  0.034  0.021  0.000  0.016  0.013  0.000  0.001  --     0.088  0.523  0.490  0.318  0.037  0.143  0.504  0.014  0.140  0.114
11   0.000  0.006  0.029  0.071  0.003  0.028  0.002  0.003  0.002  0.002  --     0.424  0.284  0.339  0.073  0.049  0.193  0.034  0.345  0.009
12   0.009  0.000  0.006  0.003  0.000  0.065  0.018  0.005  0.011  0.083  0.122  --     0.987  0.017  0.024  0.094  0.108  0.054  0.209  0.025
13   0.003  0.003  0.015  0.002  0.000  0.032  0.048  0.000  0.004  0.054  0.041  0.728  --     0.023  0.105  0.053  0.091  0.083  0.144  0.009
14   0.027  0.000  0.126  0.039  0.003  0.011  0.000  0.017  0.018  0.034  0.070  0.000  0.000  --     0.136  0.053  0.082  0.069  0.182  0.008
15   0.000  0.001  0.007  0.005  0.000  0.009  0.000  0.000  0.000  0.001  0.001  0.000  0.003  0.005  --     0.036  0.717  0.034  0.415  0.245
16   0.045  0.006  0.002  0.005  0.007  0.009  0.019  0.002  0.003  0.004  0.001  0.005  0.002  0.002  0.000  --     0.039  0.170  0.193  0.001
17   0.026  0.003  0.019  0.008  0.000  0.002  0.003  0.001  0.006  0.014  0.010  0.002  0.002  0.001  0.021  0.000  --     0.300  0.860  0.186
18   0.001  0.005  0.001  0.032  0.000  0.001  0.002  0.018  0.021  0.000  0.001  0.001  0.004  0.002  0.001  0.008  0.036  --     0.095  0.091
19   0.000  0.004  0.009  0.002  0.000  0.008  0.000  0.000  0.000  0.003  0.004  0.002  0.001  0.002  0.002  0.001  0.007  0.001  --     0.010
20   0.019  0.012  0.036  0.001  0.000  0.007  0.010  0.002  0.004  0.003  0.000  0.000  0.000  0.000  0.021  0.000  0.004  0.006  0.000  --

a The three markers in strong linkage disequilibrium (SNPs 7, 8, and 9) are shown in bold in the original table.

Single-point linkage analysis showed no evidence of linkage, either for the three markers in strong LD alone or for the whole marker set (Fig. 1). However, when only the three markers in strong LD were used and only affected sib-pair data were analyzed, multipoint linkage analysis showed false-positive evidence of linkage for both the model-free analysis and the parametric analysis that incorporated a heterogeneity parameter (Fig. 1). This confirmed the observation by Huang et al. [6]. Including parents in the multipoint analysis eliminated the false-positive evidence (data not shown). The false-positive evidence induced by LD was gradually reduced by adding markers that are not in LD to either or both sides of the three core markers in strong LD, and adding markers to both sides appeared to give a better "rescue" effect than adding them to a single side (Table 2). With all 20 markers, there was no evidence of linkage (maximal LOD score at the peak position: 0.34 ± 0.2).

Figure 1

Linkage analysis results for the 20 SNPs and the three SNPs with strong LD. The left panel indicates results using a nonparametric NPL approach, while the right panel indicates results from a parametric linkage analysis allowing for locus heterogeneity.
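
The parametric results allow for locus heterogeneity, so that only a proportion of families is assumed to be linked. One common way to express this is the admixture heterogeneity LOD (HLOD), which combines per-family LOD scores Z_i as the sum over families of log10(α·10^Z_i + 1 − α) and maximizes over the linked proportion α. The Python sketch below assumes that per-family LOD scores at a fixed map position are already available; the function names, the grid search, and the example values are hypothetical, and the exact computation performed by ALLEGRO or described in reference [12] may differ in detail.

```python
import math


def hlod(family_lods, alpha):
    """Admixture heterogeneity LOD for a given proportion alpha of linked families."""
    total = 0.0
    for z in family_lods:
        # Mixture of the linked (10**z) and unlinked (1) likelihood ratios.
        lik_ratio = alpha * 10.0 ** z + (1.0 - alpha)
        if lik_ratio <= 0.0:
            return float("-inf")
        total += math.log10(lik_ratio)
    return total


def max_hlod(family_lods, grid=1000):
    """Maximize the HLOD over alpha in [0, 1] with a simple grid search."""
    best_alpha, best_value = 0.0, 0.0  # alpha = 0 always gives HLOD = 0
    for k in range(1, grid + 1):
        alpha = k / grid
        value = hlod(family_lods, alpha)
        if value > best_value:
            best_alpha, best_value = alpha, value
    return best_alpha, best_value


if __name__ == "__main__":
    # Hypothetical per-family LOD scores: two supportive, two unsupportive.
    lods = [0.5, 0.5, -1.0, -1.0]
    alpha_hat, peak = max_hlod(lods)
    print(f"alpha = {alpha_hat:.3f}, HLOD = {peak:.3f}")
```

Because α = 0 always yields an HLOD of zero, the maximized HLOD is never negative, even when the summed LOD under homogeneity (α = 1) is negative.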

Table 2

Multipoint LOD scores for different sets of markers from model-free linkage analysis.

Marker       Multipoint LOD scores at the marker location from model-free linkage analyses b
B03T2401     --    --    --    --    --    --    --    0.48
B03T2402     --    --    --    --    --    --    --    0.46
B03T2403     --    --    --    --    --    --    --    0.42
B03T2404     --    --    --    --    --    --    --    0.36
B03T2405     --    --    1.06  --    --    --    0.69  0.35
B03T2406     --    1.06  1.08  --    --    0.78  0.73  0.33
B03T2407 a   2.05  1.13  1.16  1.74  1.42  0.85  0.83  0.34
B03T2408 a   2.06  1.15  1.18  1.75  1.43  0.86  0.85  0.37
C03R0221 a   2.09  1.18  1.22  1.74  1.42  0.87  0.84  0.46
B03T2410     --    --    --    1.72  1.40  0.86  0.82  0.47
B03T2411     --    --    --    --    1.39  --    0.80  0.44
B03T2412     --    --    --    --    --    --    --    0.34
B03T2413     --    --    --    --    --    --    --    0.19
B03T2414     --    --    --    --    --    --    --    0.17
B03T2415     --    --    --    --    --    --    --    0.14
B03T2416     --    --    --    --    --    --    --    0.11
B03T2417     --    --    --    --    --    --    --    0.11
B03T2418     --    --    --    --    --    --    --    0.09
C03R0222     --    --    --    --    --    --    --    0.08
B03T2420     --    --    --    --    --    --    --    0.08

a Markers in strong LD (shown in bold in the original table).

b Each column reports one multipoint analysis; a LOD score appears only for markers included in that analysis.
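
The "rescue" effect in Table 2 can be summarized by comparing the peak LOD score at the three core markers across the eight marker sets. The short Python sketch below does this with the values transcribed from Table 2; the set labels are descriptive names introduced here, inferred from which flanking markers carry a score in each column.

```python
# LOD scores at B03T2407, B03T2408, and C03R0221 for each marker set,
# transcribed from Table 2. The labels are descriptive, not from the paper.
core_lods = {
    "core markers only": (2.05, 2.06, 2.09),
    "core + 1 marker on left": (1.13, 1.15, 1.18),
    "core + 2 markers on left": (1.16, 1.18, 1.22),
    "core + 1 marker on right": (1.74, 1.75, 1.74),
    "core + 2 markers on right": (1.42, 1.43, 1.42),
    "core + 1 marker each side": (0.85, 0.86, 0.87),
    "core + 2 markers each side": (0.83, 0.85, 0.84),
    "all 20 markers": (0.34, 0.37, 0.46),
}

# Peak LOD at the core markers for each analysis.
peak = {label: max(scores) for label, scores in core_lods.items()}
baseline = peak["core markers only"]

for label, value in peak.items():
    reduction = 100.0 * (1.0 - value / baseline)
    print(f"{label:<28} peak LOD {value:.2f} ({reduction:.0f}% below core-only)")
```

In these data, a single flanking marker on each side lowers the peak LOD at the core markers more than two flanking markers on either side alone.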

Conclusion

In multipoint linkage analysis of affected sib-pair data, parental phase information must be inferred from the sib pairs; the usual methods assume linkage equilibrium between multiple linked markers and assign equal probabilities to all possible phases. This assumption can cause overestimation of multipoint identity by descent (IBD) sharing and induce false positives for both model-free and parametric linkage analysis, as shown by Huang et al. [6]. This study confirmed that observation using independently generated data that were simulated to reflect conditions that might be found in a genome scan. Among the markers that we studied, false-positive evidence for linkage was obtained only for a small subset of markers that showed high LD. We also showed that including markers that are not in LD can reduce the false-positive evidence of linkage induced by markers in high LD. This suggests that including markers that are not in strong LD brings the haplotype frequencies closer to those expected under the linkage equilibrium assumption and thus may help to reduce false-positive linkage findings. We also found that the LD effect is severe only when the majority of the markers being jointly examined are in strong LD. Single-point linkage analysis is not affected by LD. Therefore, given the relatively accurate allele frequencies that can readily be obtained for a single marker, single-point linkage analysis can be used to check suspected false positives by comparing its results with those of multipoint analysis. However, when a very large number of SNPs are studied, a possibility remains that allele frequency estimates for individual SNPs might be biased, either by unrecognized strong stratification in the sample or by nonrandom errors introduced during processing.
A potential further check is the confirmation of linkage at multiple SNPs in a region, as well as the absence of a linkage signal across most of the remainder of the genome. With current advances in high-throughput genotyping technology, high-density marker data are easily generated. Caution must be taken when applying traditional linkage analysis to dense markers where strong LD may exist. Our results indicate that LD among tightly linked markers should be examined, especially in the fine-mapping stage, where strong LD is likely to exist between the markers. To avoid possible false positives, markers that are in strong LD should not be used together for linkage analysis. An alternative approach is to modify current linkage programs to allow for LD so that all marker information can be used in the search for a disease-related region.
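
The consequence of assigning equal probabilities to all phases can be seen in the simplest case: an individual who is heterozygous at both of two SNPs. The Python sketch below compares the probability of the AB/ab phase under linkage equilibrium with its probability under strong LD. It illustrates only the phase weighting, not the full multipoint IBD computation; the function names and the haplotype frequencies are hypothetical values chosen for illustration.

```python
def cis_phase_probability(freqs):
    """Probability that a double heterozygote carries haplotypes AB and ab
    (rather than Ab and aB), given population haplotype frequencies."""
    cis = freqs["AB"] * freqs["ab"]
    trans = freqs["Ab"] * freqs["aB"]
    return cis / (cis + trans)


def equilibrium_freqs(p_a, p_b):
    """Haplotype frequencies implied by linkage equilibrium, where p_a and p_b
    are the frequencies of alleles A and B."""
    return {
        "AB": p_a * p_b,
        "Ab": p_a * (1 - p_b),
        "aB": (1 - p_a) * p_b,
        "ab": (1 - p_a) * (1 - p_b),
    }


if __name__ == "__main__":
    # Under linkage equilibrium both phases are equally likely,
    # whatever the allele frequencies.
    print(cis_phase_probability(equilibrium_freqs(0.5, 0.5)))

    # Hypothetical strong-LD frequencies with the same allele frequencies:
    # the AB/ab phase is now far more likely than Ab/aB.
    strong_ld = {"AB": 0.45, "Ab": 0.05, "aB": 0.05, "ab": 0.45}
    print(cis_phase_probability(strong_ld))
```

When parents are untyped, a program that weights the two phases equally in such a case is working from haplotype frequencies that do not match the population, which is the misspecification discussed above.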

Abbreviations

GAW14: Genetic Analysis Workshop 14
IBD: Identity by descent
KPD: Kofendrerd Personality Disorder
LD: Linkage disequilibrium
SNP: Single-nucleotide polymorphism

Authors' contributions

QH did the analysis and prepared the manuscript. MS assisted in the development of data for this project and performed analysis of the LD patterns of simulated data, and also presented the results at the Genetic Analysis Workshop. SS provided guidance in concept development. CIA directed the project and revised the manuscript.
  11 in total

1.  Allegro, a new computer program for multipoint linkage analysis.

Authors:  D F Gudbjartsson; K Jonasson; M L Frigge; A Kong
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  Ignoring linkage disequilibrium among tightly linked markers induces false-positive evidence of linkage for affected sib pair analysis.

Authors:  Qiqing Huang; Sanjay Shete; Christopher I Amos
Journal:  Am J Hum Genet       Date:  2004-10-18       Impact factor: 11.025

3.  Strategies for characterizing highly polymorphic markers in human gene mapping.

Authors:  J Ott
Journal:  Am J Hum Genet       Date:  1992-08       Impact factor: 11.025

4.  A comparison of linkage disequilibrium measures for fine-scale mapping.

Authors:  B Devlin; N Risch
Journal:  Genomics       Date:  1995-09-20       Impact factor: 5.736

5.  Caution in the interpretation of MLS.

Authors:  S Eichenbaum-Voline; E Génin; M C Babron; P Margaritte-Jeannin; B Prum; F Clerget-Darpoux
Journal:  Genet Epidemiol       Date:  1997       Impact factor: 2.135

6.  Allele-sharing models: LOD scores and accurate linkage tests.

Authors:  A Kong; N J Cox
Journal:  Am J Hum Genet       Date:  1997-11       Impact factor: 11.025

7.  Guess LOD approach: sufficient conditions for robustness.

Authors:  J A Williamson; C I Amos
Journal:  Genet Epidemiol       Date:  1995       Impact factor: 2.135

8.  Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population.

Authors:  L Excoffier; M Slatkin
Journal:  Mol Biol Evol       Date:  1995-09       Impact factor: 16.240

9.  Incorrect specification of marker allele frequencies: effects on linkage analysis.

Authors:  N B Freimer; L A Sandkuijl; S M Blower
Journal:  Am J Hum Genet       Date:  1993-06       Impact factor: 11.025

10.  A class of tests for linkage using affected pedigree members.

Authors:  A S Whittemore; J Halpern
Journal:  Biometrics       Date:  1994-03       Impact factor: 2.571
