
Investigation of altering single-nucleotide polymorphism density on the power to detect trait loci and frequency of false positive in nonparametric linkage analyses of qualitative traits.

Alison P Klein, Ya-Yu Tsai, Priya Duggal, Elizabeth M Gillanders, Michael Barnhart, Rasika A Mathias, Ian P Dusenberry, Amy Turiff, Peter S Chines, Janet Goldstein, Robert Wojciechowski, Wayne Hening, Elizabeth W Pugh, Joan E Bailey-Wilson.

Abstract

Genome-wide linkage analysis using microsatellite markers has been successful in the identification of numerous Mendelian and complex disease loci. The recent availability of high-density single-nucleotide polymorphism (SNP) maps provides a potentially more powerful option. Using the simulated and Collaborative Study on the Genetics of Alcoholism (COGA) datasets from the Genetic Analysis Workshop 14 (GAW14), we examined how altering the density of SNP marker sets impacted the overall information content, the power to detect trait loci, and the number of false positive results. For the simulated data we used SNP maps with density of 0.3 cM, 1 cM, 2 cM, and 3 cM. For the COGA data we combined the marker sets from Illumina and Affymetrix to create a map with average density of 0.25 cM and then, using a sub-sample of these markers, created maps with density of 0.3 cM, 0.6 cM, 1 cM, 2 cM, and 3 cM. For each marker set, multipoint linkage analysis using MERLIN was performed for both dominant and recessive traits derived from marker loci. Our results showed that information content increased with increased map density. For the homogeneous, completely penetrant traits we created, there was only a modest difference in ability to detect trait loci. Additionally, as map density increased there was only a slight increase in the number of false positive results when there was linkage disequilibrium (LD) between markers. The presence of LD between markers may have led to an increased number of false positive regions, but no clear relationship between regions of high LD and locations of false positive linkage signals was observed.

Year:  2005        PMID: 16451629      PMCID: PMC1866766          DOI: 10.1186/1471-2156-6-S1-S20

Source DB:  PubMed          Journal:  BMC Genet        ISSN: 1471-2156            Impact factor:   2.797


Background

Genome-wide linkage analysis using microsatellite markers has been successful in the identification of numerous Mendelian and complex disease loci. Recently available high-density single-nucleotide polymorphism (SNP) maps theoretically provide greater information content (IC), which should help to both identify and narrow linkage regions. This is supported by a few published reports comparing genome-wide linkage analysis using microsatellites to studies of the same dataset using dense SNP maps [1,2]. Yet questions remain about the optimal density of SNP marker sets for linkage studies. Additionally, current algorithms for linkage analysis assume that adjacent markers are in linkage equilibrium. However, there may be significant linkage disequilibrium (LD) between adjacent markers in dense SNP marker sets, which can lead to false positive results [3]. To explore these issues we used the simulated and Collaborative Study on the Genetics of Alcoholism (COGA) datasets to examine how altering the SNP density impacted the overall IC, the power to detect trait loci, and the number of false positive results. We compared these results to analyses performed using microsatellite markers.

Methods

Simulated data

Analyses were performed (separately for each population and replicate) using all replicates of the Aiputo, Danaca, and Karanga populations. The full marker sets for both the MS (7.5 cM) and SNP (3 cM) maps were used. Additional fine mapping markers were purchased for chromosomes 8 and 9 (packets 400–406 and 416–419) to increase the density of the SNPs (0.3 cM). We had knowledge of the answers.

Trait definition (simulated)

Dominant or recessive traits were created using these marker loci: B08T8044, B08T8045, B08T8050, and B08T8051. Affection status for a dominant trait was defined as individuals with ≥ 1 copy of allele 1 at the marker and for a recessive trait as individuals with 2 copies of allele 1.
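The trait definition above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the function name `affection_status` and the tuple representation of a genotype are assumptions made for the example.

```python
def affection_status(genotype, risk_allele=1, model="dominant"):
    """Return True if an individual is affected under the given model.

    genotype: pair of allele codes at the trait-defining marker, e.g. (1, 2).
    The names and the tuple layout are hypothetical, chosen for illustration.
    """
    copies = sum(1 for allele in genotype if allele == risk_allele)
    if model == "dominant":
        # Dominant trait: at least one copy of the risk allele.
        return copies >= 1
    if model == "recessive":
        # Recessive trait: two copies of the risk allele.
        return copies == 2
    raise ValueError("model must be 'dominant' or 'recessive'")
```

Because affection status is a deterministic function of the genotype, traits built this way are homogeneous and completely penetrant.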

COGA data

Using a Perl script, we created an interpolated genetic map that used MS markers from the deCode map and SNPs from both Illumina and Affymetrix. For each SNP, the 2 MS markers from the deCode map that flanked the SNP were identified using the physical positions of these markers obtained from sequence build 34. From the physical and genetic positions of the 2 flanking microsatellites, and assuming a linear interpolation between the markers, the genetic position of the SNP was determined. Any MS or SNP without a physical position was removed. If SNP markers mapped to the same genetic location, the SNP with the largest physical position was kept.
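The interpolation step can be illustrated as follows. This is a sketch under stated assumptions rather than the original Perl script: the function name `interpolate_cm` and the `(physical_bp, genetic_cm)` tuple layout are hypothetical.

```python
def interpolate_cm(snp_bp, left, right):
    """Linearly interpolate the genetic position (cM) of a SNP.

    left, right: (physical_bp, genetic_cm) tuples for the two flanking
    microsatellites. The layout is an assumption made for this example.
    """
    left_bp, left_cm = left
    right_bp, right_cm = right
    if left_bp == right_bp or not (left_bp <= snp_bp <= right_bp):
        raise ValueError("SNP must lie between two distinct flanking markers")
    # Fraction of the physical interval covered up to the SNP.
    fraction = (snp_bp - left_bp) / (right_bp - left_bp)
    return left_cm + fraction * (right_cm - left_cm)
```

For example, a SNP halfway (in base pairs) between microsatellites at 10 cM and 12 cM is placed at 11 cM. The approach assumes a constant recombination rate between the flanking markers.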

Trait definition (COGA)

The following markers (and risk alleles) were used to create a dominant and/or a recessive trait: rs0041510 (allele 2), tsc2832191 (allele 1), tsc0061481 (allele 1). To avoid errors due to differences in allele frequencies between ethnic groups, analysis was limited to the white/non-Hispanic families, which comprised the largest ethnic subgroup.

Creation of SNP maps

Using a Perl script, we selected a subset of the SNP markers to create maps that were less dense. Our goal was to select markers with the desired inter-marker distances. To avoid tight clusters of markers, we moved at least the desired distance minus 10% of that distance before another marker was selected. If there were multiple markers within ± 10% of the desired distance, the marker with the major allele frequency (MAF) closest to 0.5 was selected. For example, for the 0.3-cM map, markers were forced to be at least 0.27 cM apart, and if there were multiple markers located between 0.27 cM and 0.33 cM from the last marker, the marker with the MAF closest to 0.5 was selected.
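The selection rule can be sketched as a greedy pass over a position-sorted map. This is an illustration only. Two details are assumptions not stated in the text: the walk starts from the first marker on the chromosome, and when no marker falls inside the ± 10% window the next marker beyond the minimum gap is taken. The function name `thin_markers` is hypothetical.

```python
def thin_markers(markers, spacing_cm, tolerance=0.10):
    """Select a sparser marker subset from a dense, position-sorted map.

    markers: list of (name, position_cm, allele_freq) tuples sorted by position.
    """
    if not markers:
        return []
    min_gap = spacing_cm * (1 - tolerance)
    max_gap = spacing_cm * (1 + tolerance)
    selected = [markers[0]]  # assumption: start from the first marker
    i = 1
    while i < len(markers):
        last_pos = selected[-1][1]
        # Skip markers that are closer than the minimum allowed gap.
        while i < len(markers) and markers[i][1] - last_pos < min_gap:
            i += 1
        if i == len(markers):
            break
        # Find the end of the run of markers inside the +/-10% window.
        j = i
        while j < len(markers) and markers[j][1] - last_pos <= max_gap:
            j += 1
        if j > i:
            # Several candidates: keep the one with frequency closest to 0.5.
            best = min(range(i, j), key=lambda k: abs(markers[k][2] - 0.5))
        else:
            best = i  # assumption: empty window, take the next marker
        selected.append(markers[best])
        i = best + 1
    return selected
```

Preferring allele frequencies near 0.5 favors the most heterozygous, and therefore most informative, SNP among equally well-placed candidates.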

Statistical analysis

We used the analysis program MERLIN for all linkage analyses [4]. Allele frequencies were estimated from all founders. Kong and Cox LOD scores [5] and the associated p-values for Whittemore and Halpern's NPLall [6] statistic were used for the analysis of qualitative traits. IC was measured by entropy. Multipoint evaluation was performed at each of the marker loci (between-marker evaluations were not performed). For the evaluation of power and type I error we used 4 standard p-value thresholds (0.05, 0.01, 0.001, and 0.0001) and 2 Lander-Kruglyak [7] genome-wide significance levels (0.0017 and 0.000049). We calculated power as the number of replicates with a p-value less than the threshold within a 20 cM region (10 cM in either direction) of the trait loci. To assess the frequency of false positive results, we counted the number of regions where a p-value less than the above-mentioned cut-off occurred on chromosomes not containing the trait loci. To ensure that adjacent markers with p-values below the given level were not counted as multiple false positive results, a region with a p-value greater than or equal to 0.2 was required to occur between two false positive regions.
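The two counting rules can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and input layouts are hypothetical, and power is returned as a proportion of replicates rather than a raw count.

```python
def count_false_positive_regions(p_values, threshold, reset=0.2):
    """Count distinct false positive regions on one unlinked chromosome.

    p_values: p-values at consecutive marker positions, in map order.
    A new region is counted only after a p-value >= reset has been seen
    since the previous region.
    """
    regions = 0
    in_region = False  # True while still inside an already counted region
    for p in p_values:
        if p < threshold:
            if not in_region:
                regions += 1
                in_region = True
        elif p >= reset:
            in_region = False
    return regions


def power(replicates, trait_cm, threshold, half_window=10.0):
    """Proportion of replicates with a p-value below threshold near the locus.

    replicates: list of replicates, each a list of (position_cm, p_value)
    pairs on the chromosome that carries the trait locus.
    """
    hits = sum(
        any(abs(pos - trait_cm) <= half_window and p < threshold
            for pos, p in replicate)
        for replicate in replicates
    )
    return hits / len(replicates)
```

In the first function, two signals separated only by intermediate p-values (for example 0.03, 0.1, 0.04) count as one region, because no p-value of 0.2 or more lies between them.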

Results

Table 1 presents the IC for the various map densities. In the simulated data, the average IC of the MS map was 0.934. The 3-cM SNP map had a lower IC (0.833) than the MS map. Conversely, the very dense (0.3 cM) SNP map had the highest mean IC (0.986), a modest increase over the MS map. In the COGA dataset IC increased with increasing map density and was lowest in the MS marker set. The overall IC was somewhat lower in the COGA data; this could be due in part to the presence of missing data in the COGA dataset or to overall marker heterozygosity. Note that the MS map in the COGA dataset (13.6 cM) is less dense than the MS map in the simulated dataset (7.5 cM).
Table 1

Information content

Dataset    Marker set      Number of markers in map   Mean minimum^a (SD)   Overall mean^a (SD)   Mean maximum^a (SD)
Simulated  MS (~7.5 cM)    416                        0.812 (0.077)         0.934 (0.004)         0.9724 (0.005)
Simulated  SNP 3 cM        917                        0.644 (0.084)         0.833 (0.015)         0.914 (0.010)
Simulated  SNP 1 cM^b      34                         0.849 (0.018)         0.937 (0.006)         0.969 (0.006)
Simulated  SNP 0.3 cM^b    201                        0.933 (0.013)         0.986 (0.001)         0.998 (0.001)
COGA       MS (~13.5 cM)   315                        0.586 (0.078)         0.744 (0.060)         0.840 (0.064)
COGA       SNP 3 cM        1103                       0.674 (0.076)         0.747 (0.012)         0.820 (0.015)
COGA       SNP 2 cM        1792                       0.566 (0.074)         0.767 (0.010)         0.825 (0.010)
COGA       SNP 1 cM        2382                       0.692 (0.055)         0.868 (0.008)         0.910 (0.007)
COGA       SNP 0.6 cM      3671                       0.724 (0.059)         0.895 (0.006)         0.930 (0.006)
COGA       SNP 0.3 cM      5405                       0.751 (0.062)         0.916 (0.005)         0.943 (0.006)
COGA       SNP 0.25 cM     15015                      0.825 (0.046)         0.939 (0.005)         0.955 (0.003)

^a Overall mean, average minimum, and average maximum information content across all 3 populations and replicates for simulated data and across all 22 chromosomes for COGA data.

^b The SNP 1 cM and SNP 0.3 cM maps for the simulated data are based only on the regions for which fine mapping markers were purchased.

There was a modest increase in power with increasing SNP map density in the simulated data (Table 2). Power was greatest for the 0.3-cM density. Power for the MS map generally fell between that of the 1-cM and 3-cM SNP maps. Overall power was quite low when we used a genome-wide significance level of 0.000049. However, in the COGA dataset (Table 3) there were less consistent trends in the ability to detect the trait loci as map density increased. In fact, the denser maps sometimes gave smaller LOD scores than the less dense maps (e.g., Drs0041510). This could be due to errors in marker order or inter-marker distance for the denser map sets. It is important to note that our created traits were homogeneous and had complete penetrance, and thus overall power was very high, possibly masking any true variations in power due to differences in map density. For all map sets disease frequencies had a large impact on power. Additionally, because we only performed analysis at the marker loci and not between marker loci, we cannot evaluate whether denser maps yielded smaller confidence intervals for the linkage peaks, because 1-LOD confidence intervals are dependent upon the density of analytic evaluations.
Table 2

Power in simulated data

                                          Percentage of replicates with p-value below^a
Trait    Marker set    Pop. dz. freq.     0.05    0.01    0.0017   0.001   0.0001   0.000049
Dominant
D8044    MS 7.5 cM     0.06               0.96    0.89    0.67     0.57    0.27     0.16
D8044    SNP 3 cM      0.06               0.95    0.86    0.60     0.51    0.18     0.12
D8044    SNP 1 cM      0.06               0.98    0.90    0.72     0.63    0.29     0.21
D8044    SNP 0.3 cM    0.06               0.96    0.92    0.77     0.69    0.37     0.27
D8050    MS 7.5 cM     0.18               1.00    1.00    0.99     0.99    0.97     0.94
D8050    SNP 3 cM      0.18               1.00    1.00    0.99     0.99    0.93     0.91
D8050    SNP 1 cM      0.18               1.00    1.00    1.00     1.00    0.98     0.98
D8050    SNP 0.3 cM    0.18               0.99    0.99    0.99     0.99    0.97     0.96
D8051    MS 7.5 cM     0.50               0.99    0.95    0.88     0.83    0.62     0.54
D8051    SNP 3 cM      0.50               0.99    0.95    0.85     0.81    0.59     0.51
D8051    SNP 1 cM      0.50               1.00    0.97    0.90     0.88    0.70     0.61
D8051    SNP 0.3 cM    0.50               1.00    0.98    0.92     0.90    0.77     0.70
Recessive
R8045    MS 7.5 cM     0.08               0.99    0.93    0.74     0.67    0.32     0.22
R8045    SNP 3 cM      0.08               0.98    0.89    0.69     0.61    0.24     0.15
R8045    SNP 1 cM      0.08               0.99    0.95    0.78     0.73    0.36     0.26
R8045    SNP 0.3 cM    0.08               0.98    0.95    0.84     0.78    0.44     0.34
R8050    MS 7.5 cM     0.01               0.46    0.10    0.005    0       0        0
R8050    SNP 3 cM      0.01               0.41    0.05    0        0       0        0
R8050    SNP 1 cM      0.01               0.79    0.45    0.13     0.004   0        0
R8050    SNP 0.3 cM    0.01               0.47    0.13    0.005    0.005   0        0
R8051    MS 7.5 cM     0.22               1.00    1.00    0.96     0.94    0.83     0.78
R8051    SNP 3 cM      0.22               1.00    1.00    0.94     0.92    0.81     0.74
R8051    SNP 1 cM      0.22               1.00    1.00    0.98     0.98    0.88     0.85
R8051    SNP 0.3 cM    0.22               0.99    0.99    0.99     0.98    0.90     0.88

^a Percentage of replicates with p-value below the following criteria within a 20 cM range of the given "true" trait locus. The results were summarized across the 3 simulated populations. Each population was analyzed separately.

Table 3

Power in COGA data

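Trait          Marker set     Dz. freq.   LOD     Minimum p-value^a
Dominant
Drs0041510     MS             0.14        3.1     0.00008
Drs0041510     SNP 3 cM       0.14        3.8     0.00001
Drs0041510     SNP 2 cM       0.14        2.8     0.0002
Drs0041510     SNP 1 cM       0.14        4.1     0.00001
Drs0041510     SNP 0.6 cM     0.14        3.0     0.00008
Drs0041510     SNP 0.3 cM     0.14        3.9     0.00001
Drs0041510     SNP 0.25 cM    0.14        4.1     0.00001
Dtsc0061481    MS             0.31        1.0     0.02^b
Dtsc0061481    SNP 3 cM       0.31        5.7     <0.00001
Dtsc0061481    SNP 2 cM       0.31        5.0     <0.00001
Dtsc0061481    SNP 1 cM       0.31        5.9     <0.00001
Dtsc0061481    SNP 0.6 cM     0.31        6.1     <0.00001
Dtsc0061481    SNP 0.3 cM     0.31        6.3     <0.00001
Dtsc0061481    SNP 0.25 cM    0.31        6.1     <0.00001
Recessive
Rtsc0061481    MS             0.03        0.81    0.03^b
Rtsc0061481    SNP 3 cM       0.03        1.65    0.003
Rtsc0061481    SNP 2 cM       0.03        1.70    0.003
Rtsc0061481    SNP 1 cM       0.03        1.68    0.002
Rtsc0061481    SNP 0.6 cM     0.03        1.68    0.003
Rtsc0061481    SNP 0.3 cM     0.03        1.68    0.002
Rtsc0061481    SNP 0.25 cM    0.03        1.79    0.002
Rtsc2832191    MS             0.22        6.0     <0.00001
Rtsc2832191    SNP 3 cM       0.22        4.5     <0.00001
Rtsc2832191    SNP 2 cM       0.22        6.2     <0.00001
Rtsc2832191    SNP 1 cM       0.22        6.7     <0.00001
Rtsc2832191    SNP 0.6 cM     0.22        7.4     <0.00001
Rtsc2832191    SNP 0.3 cM     0.22        7.5     <0.00001
Rtsc2832191    SNP 0.25 cM    0.22        3.6     <0.00003

^a Minimum p-value within 20 cM of the "true" trait locus.

^b Marker D13S325, located about 12.3 cM from the trait loci, gave a p-value of 0.0004 for trait Dtsc0061481 and a p-value of 0.004 for trait Rtsc0061481.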

The numbers of false positive linkages (p-value below a given level in a region unlinked to the trait loci) for the simulated data are given in Tables 4 and 5. When we compare the 3-cM SNP map to the MS map, or the 0.3-cM SNP map to the 1-cM SNP map, the number of false positive results remains similar. Although the 0.3-cM map shows a slight increase in the number of false positive results compared to the 1-cM map, this is hard to interpret because such a dense map was only available in one 18-cM region. We also examined the number of false positive regions for each of the traits in the COGA dataset (Table 6) by tabulating significant linkages on 18 unlinked chromosomes. Overall, the number of false positive regions at the 0.05 level was greater in the combined 0.25-cM SNP map than it was in the less dense maps. At the more stringent p-value levels there were only a few false positive results, and no false positives were observed for any of the traits at genome-wide significant p-values (0.000049) [7].
Table 4

Type I error count in simulated data for full dataset

                                                                   Mean number of false positives below p-value criterion of^b
Trait    Marker set    Pop. dz. freq.   # of replicates with data^a   0.05    0.01    0.0017   0.001   0.0001   0.000049
Dominant
D8044    MS 7.5 cM     0.06             300                           8.10    1.92    0.18     0.08    0        0
D8044    SNP 3 cM      0.06             300                           7.73    1.79    0.16     0.09    0        0
D8050    MS 7.5 cM     0.18             300                           7.63    1.82    0.53     0.38    0.03     0.02
D8050    SNP 3 cM      0.18             300                           7.33    2.11    0.47     0.30    0.03     0.02
D8051    MS 7.5 cM     0.50             300                           7.48    2.56    0.45     0.30    0.04     0.02
D8051    SNP 3 cM      0.50             300                           7.22    2.19    0.45     0.30    0.04     0.02
Recessive
R8045    MS 7.5 cM     0.08             300                           8.59    2.36    0.29     0.16    0.01     0
R8045    SNP 3 cM      0.08             300                           8.11    2.23    0.25     0.14    0.01     0
R8050    MS 7.5 cM     0.01             229                           2.66    0.06    0        0       0        0
R8050    SNP 3 cM      0.01             229                           2.20    0.03    0        0       0        0
R8051    MS 7.5 cM     0.22             300                           7.57    2.02    0.39     0.27    0.03     0.02
R8051    SNP 3 cM      0.22             300                           7.44    2.03    0.47     0.34    0.04     0.01

^a For rare disease not all replicates contained informative pedigrees.

^b Mean number of false positive regions in the 9 unlinked chromosomes per replicate with p-value below the following criteria.

Table 5

Type I error count in densely mapped simulated data

                                                                 Mean number of false positives below p-value criterion of^a
Trait    Marker set    Pop. dz. freq.   # of replicates with data   0.05     0.01     0.0017   0.001    0.0001   0.000049
Dominant
D8044    SNP 1 cM      0.06             300                         0.157    0.063    0        0        0        0
D8044    SNP 0.3 cM    0.06             300                         0.177    0.053    0.010    0.007    0        0
D8050    SNP 1 cM      0.18             300                         0.146    0.003    0.007    0.007    0.003    0
D8050    SNP 0.3 cM    0.18             300                         0.18     0.037    0.013    0.007    0.003    0
D8051    SNP 1 cM      0.50             300                         0.230    0.057    0.02     0.013    0        0
D8051    SNP 0.3 cM    0.50             300                         0.280    0.100    0.02     0.017    0        0
Recessive
R8045    SNP 1 cM      0.08             300                         0.113    0.037    0.007    0.003    0        0
R8045    SNP 0.3 cM    0.08             300                         0.147    0.050    0.003    0        0        0
R8050    SNP 1 cM      0.01             229                         0.039    0.004    0        0        0        0
R8050    SNP 0.3 cM    0.01             229                         0.037    0.004    0        0        0        0
R8051    SNP 1 cM      0.22             300                         0.160    0.060    0.020    0.020    0.007    0.007
R8051    SNP 0.3 cM    0.22             300                         0.183    0.063    0.020    0.003    0.007    0.007

^a Mean number of false positive results in the ~18 cM unlinked region per replicate with p-value below the following criteria.

Table 6

Type I error in COGA data

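                                          Number of false positives below p-value criterion of^a
Trait          Marker set     Dz. freq.   0.05   0.01   0.0017   0.001   0.0001   0.000049
Dominant
Drs0041510     MS             0.14        7      2      2        2       0        0
Drs0041510     SNP 3 cM       0.14        10     3      1        0       0        0
Drs0041510     SNP 2 cM       0.14        7      1      0        0       0        0
Drs0041510     SNP 1 cM       0.14        15     2      1        1       0        0
Drs0041510     SNP 0.6 cM     0.14        12     5      1        1       0        0
Drs0041510     SNP 0.3 cM     0.14        14     5      2        1       0        0
Drs0041510     SNP 0.25 cM    0.14        18     8      2        1       0        0
Dtsc0061481    MS             0.31        6      1      1        0       0        0
Dtsc0061481    SNP 3 cM       0.31        8      4      2        1       0        0
Dtsc0061481    SNP 2 cM       0.31        12     6      2        1       0        0
Dtsc0061481    SNP 1 cM       0.31        17     7      0        0       0        0
Dtsc0061481    SNP 0.6 cM     0.31        16     6      2        2       0        0
Dtsc0061481    SNP 0.3 cM     0.31        15     7      1        1       0        0
Dtsc0061481    SNP 0.25 cM    0.31        24     9      3        1       0        0
Recessive
Rtsc0061481    MS             0.03        8      0      0        0       0        0
Rtsc0061481    SNP 3 cM       0.03        7      0      0        0       0        0
Rtsc0061481    SNP 2 cM       0.03        7      0      0        0       0        0
Rtsc0061481    SNP 1 cM       0.03        9      0      0        0       0        0
Rtsc0061481    SNP 0.6 cM     0.03        10     0      0        0       0        0
Rtsc0061481    SNP 0.3 cM     0.03        9      0      0        0       0        0
Rtsc0061481    SNP 0.25 cM    0.03        13     1      0        0       0        0
Rtsc2832191    MS             0.22        6      4      2        0       0        0
Rtsc2832191    SNP 3 cM       0.22        10     3      1        0       0        0
Rtsc2832191    SNP 2 cM       0.22        12     3      2        2       0        0
Rtsc2832191    SNP 1 cM       0.22        13     4      1        1       0        0
Rtsc2832191    SNP 0.6 cM     0.22        14     6      2        1       0        0
Rtsc2832191    SNP 0.3 cM     0.22        16     6      1        1       1        0
Rtsc2832191    SNP 0.25 cM    0.22        23     6      2        2       0        0

^a Number of false positive regions across the 18 unlinked chromosomes with p-value below the following criteria.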

Conclusion

Overall, IC was higher for the dense SNP maps as compared with the less dense SNP and MS maps. In the simulated data, there was a modest increase in power with increasing SNP map density. However, in the COGA data, no consistent trends were observed in our ability to detect trait loci with increasing map density. There was variation in the LOD scores across maps, with more dense maps sometimes yielding lower LOD scores. This could be due to errors in map order and supports the need for precise genetic maps when using dense SNP maps for linkage. Unsurprisingly, power was dependent on disease prevalence for these homogeneous, completely penetrant traits. In the simulated data, in which there was no significant LD between markers, the number of false positives did not increase with increasing map density. In the COGA data, more false positives were observed for the densest map set, 0.25 cM, in which there was significant intermarker LD. Huang et al. [3] reported that the presence of intermarker LD caused an increase in false positives, particularly when there is missing parental data. This is of particular concern because others have reported that SNPs are more powerful than microsatellites when there is missing parental data. To examine this, we calculated the LD between all SNPs up to 500 kb apart. Twenty-one percent of all pairwise SNPs had a D' > 0.70 (high LD). Of those SNPs with a D' > 0.70, 89% were <200 kb apart, 9% were 200–400 kb apart, and 2% were >400 kb apart. The LD between SNPs diminished as distance increased, suggesting that maps with an average marker distance >200 kb would have limited intermarker LD. Comprehensive review of the locations of all type I errors observed for two of these traits (created from marker tsc0061481 on chromosome 13) showed that while 90% of these regions contained markers exhibiting LD, the LD patterns in these regions did not differ markedly from the LD on the remainder of the chromosomes. Interestingly, 20% of the false positives occurred at the telomeres of chromosomes. While some of the increases in numbers of type I errors could be due to increased intermarker LD in the densest maps, they could also be caused by the fact that more evaluations of linkage were performed for the dense maps, since we evaluated linkage at each marker location and did not perform any intermarker evaluations. Thus, the densest map had the largest number of linkage tests performed (see Table 1), so increased type I errors could be due to LD or to the increased number of tests.
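For reference, the standard definition of D' for a pair of biallelic SNPs can be sketched as follows. The text does not state how D' was estimated, so this is only the textbook formula, assuming that haplotype and allele frequencies are already available; the function name `d_prime` is hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Normalized linkage disequilibrium |D'| between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A (first SNP) and allele B (second SNP).
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return abs(d) / d_max
```

Under this definition, a SNP pair would be classed as high LD in the sense used above when the returned value exceeds 0.70.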

Abbreviations

COGA: Collaborative Study on the Genetics of Alcoholism
GAW14: Genetic Analysis Workshop 14
IC: Information content
LD: Linkage disequilibrium
MAF: Major allele frequency
MS: Microsatellite
SNP: Single-nucleotide polymorphism

References

1.  Whole-genome scan, in a complex disease, using 11,245 single-nucleotide polymorphisms: comparison with microsatellites.
Authors:  Sally John; Neil Shephard; Guoying Liu; Eleftheria Zeggini; Manqiu Cao; Wenwei Chen; Nisha Vasavda; Tracy Mills; Anne Barton; Anne Hinks; Steve Eyre; Keith W Jones; William Ollier; Alan Silman; Neil Gibson; Jane Worthington; Giulia C Kennedy
Journal:  Am J Hum Genet       Date:  2004-05-20

2.  Comparison of microsatellites versus single-nucleotide polymorphisms in a genome linkage screen for prostate cancer-susceptibility loci.
Authors:  Daniel J Schaid; Jennifer C Guenther; Gerald B Christensen; Scott Hebbring; Carsten Rosenow; Christopher A Hilker; Shannon K McDonnell; Julie M Cunningham; Susan L Slager; Michael L Blute; Stephen N Thibodeau
Journal:  Am J Hum Genet       Date:  2004-10-08

3.  Ignoring linkage disequilibrium among tightly linked markers induces false-positive evidence of linkage for affected sib pair analysis.
Authors:  Qiqing Huang; Sanjay Shete; Christopher I Amos
Journal:  Am J Hum Genet       Date:  2004-10-18

4.  Merlin--rapid analysis of dense genetic maps using sparse gene flow trees.
Authors:  Gonçalo R Abecasis; Stacey S Cherny; William O Cookson; Lon R Cardon
Journal:  Nat Genet       Date:  2001-12-03

5.  Allele-sharing models: LOD scores and accurate linkage tests.
Authors:  A Kong; N J Cox
Journal:  Am J Hum Genet       Date:  1997-11

6.  A class of tests for linkage using affected pedigree members.
Authors:  A S Whittemore; J Halpern
Journal:  Biometrics       Date:  1994-03

7.  Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results.
Authors:  E Lander; L Kruglyak
Journal:  Nat Genet       Date:  1995-11