
The BRCA1 Ashkenazi founder mutations occur on common haplotypes and are not highly correlated with anonymous single nucleotide polymorphisms likely to be used in genome-wide case-control association studies.

Lutécia H Mateus Pereira, Marbin A Pineda, William H Rowe, Libia R Fonseca, Mark H Greene, Kenneth Offit, Nathan A Ellis, Jinghui Zhang, Andrew Collins, Jeffery P Struewing.

Abstract

BACKGROUND: We studied linkage disequilibrium (LD) patterns at the BRCA1 locus, a susceptibility gene for breast and ovarian cancer, using a dense set of 114 single nucleotide polymorphisms in 5 population groups. We focused on Ashkenazi Jews in whom there are known founder mutations, to address the question of whether we would have been able to identify the 185delAG mutation in a case-control association study (should one have been done) using anonymous genetic markers. This mutation is present in approximately 1% of the general Ashkenazi population and 4% of Ashkenazi breast cancer cases. We evaluated LD using pairwise and haplotype-based methods, and assessed correlation of SNPs with the founder mutations using Pearson's correlation coefficient.
RESULTS: BRCA1 is characterized by very high linkage disequilibrium in all populations spanning several hundred kilobases. Overall, haplotype blocks and pair-wise LD bins were highly correlated, with lower LD in African versus non-African populations. The 185delAG and 5382insC founder mutations occur on the two most common haplotypes among Ashkenazim. Because these mutations are rare, even though they are in strong LD with many other SNPs in the region as measured by D-prime, there were no strong associations when assessed by Pearson's correlation coefficient, r (maximum of 0.04 for the 185delAG).
CONCLUSION: Since the required sample size is related to the inverse of r, this suggests that it would have been difficult to map BRCA1 in an Ashkenazi case-unrelated control association study using anonymous markers that were linked to the founder mutations.

Year:  2007        PMID: 17916242      PMCID: PMC2093936          DOI: 10.1186/1471-2156-8-68

Source DB:  PubMed          Journal:  BMC Genet        ISSN: 1471-2156            Impact factor:   2.797


Background

Numerous advances in our understanding of genetic susceptibility to breast cancer have been made over the past decade, most notably the discovery of BRCA1 in 1994 and BRCA2 in 1995 [1,2]. Mutations in these genes account for approximately 2/3 of families with clearly inherited forms of breast and ovarian cancer (5 or more cases in a family) [3,4]. In addition to the high-penetrance genes BRCA1/BRCA2, rare mutations in a number of other genes, such as CHEK2, ATM, BRIP1, and PALB2, predispose to breast cancer [5-9], as do more common variants in CASP8 and TGFB1 [10]. The total heritability of breast cancer is at least 10% [11,12], and possibly up to 25% or higher [13,14]. Mutations in known high-risk genes, however, account for a relatively small proportion (probably less than 20%) of the excess risk due to genetic factors [15,16]. Fueled by the completion of the first phases of the HapMap project [17], which characterized common variation within the genome of four population groups, there is considerable interest in using these resources to map susceptibility genes for common, complex diseases. The genome-wide case-control association study, whereby the prevalence of genetic variants is compared between cases and unrelated control subjects without the disease, may have the greatest power to identify novel susceptibility genes [18-20]. Such studies rely on a very dense set of markers that capture a significant fraction of all common genetic variation, such that the variants assayed either include those that are biologically relevant or are highly correlated with them due to linkage disequilibrium. Although some design and analysis issues remain, numerous common variants have now been identified for breast cancer and other conditions using this design [21,22].
Breast cancer may serve as a useful paradigm for common, complex disease mapping studies because, while a portion of susceptibility genes have been identified, the majority of the residual familial clustering remains unexplained, and is likely to be polygenic in nature, due to a number of lower-penetrance genes in the context of environmental exposures [23,24]. Furthermore, there are two common Ashkenazi Jewish (AJ) founder BRCA1 mutations, 185delAG and 5382insC, initially identified in linkage studies of multiple-case breast/ovarian cancer families [25]. This contrasts with most other populations in which there are numerous unique BRCA1/BRCA2 mutations, with none present at 1% or greater population frequency. The BRCA1 AJ founder mutations account for the majority of Jewish breast-ovarian cancer families, and are present in approximately 1% of the general Jewish population [26]. The AJ founder mutations, owing to their high prevalence compared to other populations, offered an opportunity to test whether they might have been identified through a case-control association study of the kind suggested as the new gene discovery strategy in the post-HapMap era.

Results

SNP allele frequencies

A total of 289 unrelated reference subjects selected without regard to breast cancer from five population groups (48 each from African-Americans, Chinese-Americans and Mexican-Americans, 60 CEPH subjects, and 85 Ashkenazi Jews) were genotyped across BRCA1, spanning a region of approximately 646 kb. Table 1 presents the allele frequencies and Hardy-Weinberg P-values for the 112 polymorphic SNPs and the two founder mutations that were typed in all 5 populations. Eight of 570 tests showed departures from equilibrium at the 0.01 level, but because none of the eight showed Mendelian segregation errors within families, and because of the number of comparisons performed, they were not excluded from later calculations. Allele frequencies were generally highly correlated among Ashkenazi Jews, CEPH, Chinese-Americans and Mexican-Americans (minimum r >0.82), whereas African-Americans presented the lowest correlation values with all the other populations (maximum r <0.44). The highest correlation was found between Ashkenazi Jews and CEPH (r = 0.947) and the lowest between African-Americans and Mexican-Americans (r = 0.362). Most of the private SNPs (n = 18) originated in the African-American samples, although private SNPs were also observed in Ashkenazi Jews (n = 4), Chinese-Americans (n = 3), and Mexican-Americans (n = 1) (Table 1). Total observed heterozygosity for each marker across the five populations ranged from 0.4% for private SNPs to 49.3% [see Additional file 1]. FST ranged from 0.0047 (SNP8 – rs8176072) to 0.4338 (SNP102 – rs2593595). Sixty-three percent of the SNPs showed little genetic differentiation (FST < 0.05), twenty-eight percent showed moderate differentiation (0.05–0.15), and less than ten percent showed higher differentiation.
We also calculated pair-wise FST measures, and the distribution was very similar for the Ashkenazi Jews, CEPH, Chinese-Americans and Mexican-Americans versus each other (range from 0.008–0.018) as compared with African-Americans versus all the other populations (range from 0.082–0.092), showing that African-Americans had by far the greatest level of differentiation. These results are congruent with the low allele frequency correlation values observed between African-Americans versus all the other groups.
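The estimator used for FST is not stated in this section, so the following is only a minimal sketch, assuming Wright's definition FST = (HT - HS)/HT for a single biallelic SNP with sample-size weighting. The function name `fst_biallelic` and the weighting scheme are illustrative assumptions, not the authors' method. The example frequencies are the SNP1 (rs13119) values from Table 1.

```python
def fst_biallelic(freqs, sizes=None):
    """Wright's F_ST for one biallelic SNP.

    freqs: frequency of the same allele in each population.
    sizes: optional sample sizes used as weights (assumption: simple
           sample-size weighting; equal weights if omitted).
    """
    if sizes is None:
        sizes = [1] * len(freqs)
    total = sum(sizes)
    weights = [n / total for n in sizes]

    # Pooled allele frequency and total expected heterozygosity.
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic everywhere: no differentiation to measure

    # Weighted mean of within-population expected heterozygosity.
    h_s = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))
    return (h_t - h_s) / h_t


# SNP1 (rs13119) minor allele frequencies from Table 1, in the order
# Ashkenazi Jews, CEPH, African Americans, Chinese Americans, Mexican Americans.
snp1_freqs = [0.424, 0.350, 0.177, 0.396, 0.458]
snp1_sizes = [85, 60, 48, 48, 48]
snp1_fst = fst_biallelic(snp1_freqs, snp1_sizes)
print(f"Illustrative F_ST for SNP1: {snp1_fst:.4f}")
```

Because the published values may come from a different estimator, the number printed here should not be expected to match the paper's FST for this SNP.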
Table 1

SNP Frequencies and Hardy-Weinberg P-Values

Population columns, each with maf and Hardy-Weinberg P-value: Ashkenazi Jews (n = 85); CEPH (n = 60); African Americans (n = 48); Chinese Americans (n = 48); Mexican Americans (n = 48)
Columns: rs number; SNP Number; SNP name; Position b33 (a); Position b36.1 (b); Com Allele (c); Min Allele (d); then, for each population in the order above, maf (e) and P-value
131191C_92704204137348638718247ct0.4240.5800.3500.7120.1770.6240.3960.3590.4580.961
7486202C_92704214137303738717798cg0.4400.6400.3500.7120.2400.5500.3910.5180.4650.669
175999483C_92704544136417538708936ag0.2110.6490.1920.8650.1040.0220.0940.3260.3620.591
116532314C_952014124022538584986ga0.4350.7030.3500.7120.2400.5500.3960.3590.4670.905
99088055C_31786924123067538575436ct0.0080.9480.3750.8780.0110.941
21759576C_116311834119558738540348tg0.4470.6650.3500.7120.2500.4410.3960.3590.4460.935
30929867BR1_0003404118676138531522ag0.0510.6360.0420.7360.0100.9420.0310.823
81760728BR1_0003924118670938531470ta0.006f0.957
81760749BR1_0005714118653038531291ga0.0100.942
376564010C_26151644118501238529773ag0.4400.6400.3500.7120.2280.7400.3940.4330.4440.592
NAg11M_185delAG4118481138529572a-0.0060.957
817609012BR1_0079184117918338523944cg0.0080.9480.0960.338
180006213BR1_0105744117652838521289ga0.0210.883
817610114BR1_0107964117630638521067at0.0210.882
817610315C_26151714117581538520576ga0.4630.2910.3500.7120.2340.6400.3960.3590.4580.961
817610416BR1_0113404117576238520523ga0.0240.8240.0330.7890.0100.942
817610917C_26151724117454138519302ag0.4410.4970.3500.7120.2390.6090.3960.3590.4690.751
806587218BR1_0167754117032838515089ta0.1350.2780.0100.942
817612019BR1_0171054116999838514759ga0.4240.9130.3500.7120.2400.5500.3960.3590.4580.961
79991420BR1_0185734116854638513307ga0.1350.278
79991321BR1_0194084116771138512472ag0.0320.7720.0420.7360.3650.8130.0100.942
817612822BR1_0199044116721538511976ta0.0630.644
817613323BR1_0208964116622338510984tg0.4220.7380.3500.7120.1670.4880.3960.3590.4580.961
79991224C_26151804116589938510660ct0.4720.7010.3640.9260.1250.3220.3960.3590.4890.882
79992325BR1_0264224116069638505457ct0.2320.0310.3080.8580.0420.7630.1040.459
817614526BR1_0292584115785938502620tc0.4000.0380.3190.5860.2400.5500.4040.3090.4480.829
817614627BR1_0294484115766938502430ct0.0310.7740.0080.948
750315428BR1_0307484115636938501130tg0.4150.6180.3390.4780.2230.7710.3960.3590.4680.862
179995029BR1_0318754115524638500007ag0.0770.4420.0420.7360.0100.9420.0310.823
498685030BR1_0328854115423638498997ga0.1330.0960.1130.0710.0330.8190.0130.935
1694031BR1_0331194115400238498763tc0.4220.4290.3500.7120.1770.6240.3960.3590.4480.829
79991732BR1_0334204115370138498462ct0.4290.7660.3580.8680.1250.3220.3960.3590.4790.571
498685233BR1_0039274115319438497955ga0.0120.9120.0420.7360.0100.942
222794534BR1_0342264115289538497656ag0.0520.7030.0100.942
1694235BR1_0343564115276538497526ag0.4330.8680.3480.6150.2450.5210.3960.3590.4470.716
79991636C_75301094115195538496716tg0.4380.4820.3580.8680.2710.7030.3940.4330.4790.894
207083337BR1_0355074115161438496375ca0.0310.0010.2710.7260.1350.167
207083438BR1_0360774115105038495811ac0.4350.6260.3420.5680.2501.0000.3960.3590.4380.634
817615839BR1_0367934115033438495095ag0.4180.3930.3530.6650.1740.5320.3960.3590.4580.961
817616040BR1_0368594115026838495029ag0.4350.6260.3500.7120.2400.5500.3960.3590.4580.961
817616641BR1_0380854114904238493803ag0.1960.3910.1500.1720.1040.0220.1200.6320.3440.395
817617442BR1_0403504114677738491538at0.0630.644
395098943C_31786964114671838491479ga0.4410.5660.3480.6150.2340.6400.3960.3590.4570.923
817617544BR1_0406694114645838491220t-0.0060.9560.0730.586
817617745BR1_0412884114584038490601ag0.0210.883
817617846BR1_0417214114540738490168ag0.0420.763
106091547C_31786764114323538487996ag0.3800.1050.3520.8510.1770.6240.3960.3590.4680.862
373755948C_31786774114306938487830ct0.1240.4810.0670.5800.0830.2080.0430.761
817618749BR1_0451544114197438486735tc0.0100.942
817618850BR1_0455054114162338486384tg0.0630.644
641692751BR1_00460194114110938485870gc0.0830.208
817619852BR1_0478264113930238484063ta0.4220.4290.3620.4270.3480.1140.4260.7610.4790.571
817619953BR1_04778394113928938484050ac0.3410.5940.2580.5020.1350.1670.1770.6240.3960.135
423914754BR1_0485514113857738483338tc0.4400.6400.3240.6770.2710.7030.3940.4330.4520.711
817620655BR1_0502444113688538481646ag0.0520.7030.0100.942
223676256C_116210424113544038480201at0.4350.6260.3580.8680.2710.7030.4060.5810.4790.894
179996657C_26152084113185938476620tc0.4560.4560.3500.7120.2330.7810.3960.3590.4690.751
309298758BR1_0556694113148838476249ag0.4160.5450.3500.7120.1700.5100.3960.3590.4580.961
817622559BR1_0567964113036138475122gt0.0310.823
817623260BR1_0583694112878838473549ct0.0130.936
817623461BR1_0586144112854538473306ag0.4370.3450.3500.7120.2390.6090.3960.3590.4580.961
817623562BR1_0588344112832538473086ga0.3390.8720.2580.5020.1670.4880.3960.3590.4360.972
817623663BR1_0595894112757038472331tc0.0080.9480.3590.5560.0100.942
817624064BR1_0600224112713738471898tc0.0630.644
817624265BR1_0605204112663938471400ga0.4330.5380.3500.7120.1700.5100.4060.5810.4580.961
817624566BR1_0610144112614538470906tc0.0630.644
309299467C_26152204112459038469351ct0.4340.4710.3500.7120.2280.2420.3960.3590.2660.808
817625968BR1_0625884112457138469332t-0.0520.703
817626569BR1_00643984112276138467522ga0.4170.4770.3500.7120.1670.7290.3940.4330.2600.848
218760370BR1_0645014112265838467419ga0.4220.4290.3500.7120.1670.7290.3960.3590.2600.848
817627371BR1_0667414112041838465179tc0.4240.5800.3420.5680.1670.7290.3960.3590.2600.848
817627872BR1_0679784111918138463942ag0.0080.9480.5000.5640.0210.883
806617173BR1_0680634111909638463857gt0.1460.2370.0100.942
NA74M_5382insC4111784538462606-c0.0060.957
817628975C_26152304111482138459582tc0.4410.5660.3420.5680.2230.2580.3960.3590.2660.808
817629376BR1_0730234111413638458897a-0.0310.823
479319277BR1_0740084111315538457916ag0.4350.6260.3360.7940.2290.2140.3960.3590.2600.848
817629678BR1_0748074111235638457117ag0.4350.7030.3500.7120.2290.2140.3960.3590.2600.848
309298879C_26152384111046738455228ct0.4240.5800.3500.7120.1700.7090.3960.3590.2600.848
817630380BR1_0769334111023038454991ag0.0100.9420.0420.763
817630581BR1_0770344111012938454890ag0.0940.7530.0920.4420.0210.8830.0100.942
817630782BR1_0773284110983538454596tc0.0630.644
806846383BR1_0792204110794338452704ct0.1250.322
817631384BR1_0793964110776738452528ga0.0060.9570.0170.896
817631685BR1_0805704110659438451355ca0.0210.883
817631886BR1_0811254110603938450800gt0.4350.9630.3500.7120.1670.7290.3850.4910.2501.000
1251687BR1_0819904110517338449934ct0.4350.6260.3500.7120.2290.2140.4060.5810.2600.848
817632088BR1_0820354110512838449889ga0.0060.9570.0330.7890.0100.942
817632189BR1_0821994110496438449725ga0.0100.942
817632390BR1_0826874110447638449237cg0.4290.7660.3500.7120.2290.2140.3960.3590.2600.848
722395291C_116210124110364938448411tc0.4410.5660.3560.7650.2810.5690.3960.3590.2870.931
991163092C_31786654109710838441868ag0.4220.4290.3640.9260.2660.8080.3960.3590.2830.813
1146096393C_26152454109006238435121-g0.4410.4970.3500.7120.2280.2420.3960.3590.2660.808
229886194C_31786994108559638430357ga0.4410.3090.3500.7120.2230.2580.3960.3590.2660.808
229886295C_31786984108545338430214tc0.4470.3830.3500.7120.2080.3430.3960.3590.2600.848
44375996C_22879054107449938419260ct0.2290.1290.2410.6560.3540.5200.0830.2080.1460.981
1187163697C_116172314106358238408343ac0.2350.3040.2590.3860.3020.3450.1670.0830.1350.278
227153998C_15884474105971438404475ag0.3880.9320.3880.3390.2130.4470.4480.3410.2770.768
69097199C_7652274102533038370091gt0.2810.885
528854100C_15884174100687338351634ag0.1060.2300.0980.4880.4790.9900.0630.644
323495101C_15884054098325238328013ga0.3160.1820.3420.5680.2080.0010.3960.1350.4430.407
2593595102C_32568854096501038309771ag0.1240.0070.0540.6720.1560.0020.1350.8830.177<.0001
324075103C_32568814093528838280049ag0.1710.7170.1950.0630.1150.3700.1560.3650.3300.558
2290041104C_158833104085608538200846ct0.0120.9130.3020.1030.0310.823
1078523105C_21600774082152538166286ag0.4800.1580.3700.6500.1350.8830.4780.3690.4740.075
752313106C_10756214081058938155350tc0.3630.3140.4220.8510.2920.9540.4060.2500.3510.251
7359598107C_32568674080623438150996ct0.3860.0730.4400.6740.1350.8830.4380.4860.3830.243
2271027108C_159592774077956738124328ct0.0650.6360.0310.823
7214055109C_14414354076193638106697cg0.1490.1090.0080.9480.3850.0030.0310.8230.0210.882
9766110C_14414364076160638106367ga0.3530.2190.4330.7000.2920.9540.4060.2500.3440.823
1553469111C_75296394075152738096288ac0.0650.2480.0580.0610.031<.00010.2020.943
2271029112C_11253694074468738089448ca0.4050.0090.3500.4440.3750.4410.3850.9370.4020.117
3760384113C_14414384074421638088977ac0.3590.0200.3920.6660.4470.7160.3850.9370.4580.023
2292749114C_14414444072734938072110ct0.2170.2190.3830.5180.1920.7940.3750.8780.1280.316

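Number of polymorphic loci: 83 (Ashkenazi Jews), 80 (CEPH), 99 (African Americans), 71 (Chinese Americans), 82 (Mexican Americans)
Number of SNPs with maf<0.05: 11 (Ashkenazi Jews), 13 (CEPH), 14 (African Americans), 6 (Chinese Americans), 17 (Mexican Americans)
Mean maf: 0.305 (Ashkenazi Jews), 0.269 (CEPH), 0.181 (African Americans), 0.327 (Chinese Americans), 0.287 (Mexican Americans)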

a April 2003 (Build 33) Position

b March 2006 (Build 36.1) Position

c Com Allele = Common allele

d Min Allele = Minor allele

e maf = Minor allele frequency

f Population private SNPs are shown underlined

gNA= Not Applicable – not in dbSNP


LD structure

In order to analyze the LD structure at the BRCA1 locus, we chose two methods that rely on different premises. The first is haplotype block analysis which identifies sequential and non-overlapping sets of variants in high LD, separated by low levels of LD that are consistent with historical recombination. In this method, all htSNPs need to be genotyped in order to capture most of the genetic variation [27]. The second is a binning method in which SNPs in one LD bin can be interleaved with SNPs in other overlapping bins. Under this approach, one TagSNP per bin needs to be tested in order to capture SNP diversity [28]. Our analyses of D' and r2 showed BRCA1 residing in a large region (~288 kb) of high LD (Fig. 1), in agreement with other reports [29-31]. The entire region studied showed long-range LD, falling primarily into three blocks among non-African populations. The block containing BRCA1 includes 95 SNPs and overlaps the largest LDSelect bin of SNPs correlated at r2 > 0.8 (Fig. 2 and Fig. 3).
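For reference, both pairwise measures used here can be computed from one haplotype frequency and two allele frequencies. The sketch below uses the standard textbook definitions of D, D' and r2; the function name `ld_stats` and the example frequencies are hypothetical and are not taken from the study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b

    # D' scales |D| by its largest possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two allele indicators.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical example: two alleles of equal frequency that always co-occur.
d, d_prime, r2 = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3)
print(f"D={d:.3f}  D'={d_prime:.3f}  r2={r2:.3f}")
```

D' reaches 1 whenever at least one of the four possible haplotypes is absent, whereas r2 reaches 1 only when the two alleles also have matching frequencies, which is why the two measures can diverge sharply.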
Figure 1

Comparison of haplotype blocks at 114 loci across five populations. Blocks were defined as in [27]; markers with MAF <0.05 are shown with a white background and were ignored in the calculations and block boundary estimation. Haplotype tag SNPs (htSNPs) within a block are indicated by arrowheads; htSNPs in only one population are shown on a yellow background while the single htSNP shared between all populations is shown on a green background.

Figure 2

Comparison of SNP bins derived from pair-wise measurements of linkage disequilibrium using LDSelect-Comp. SNPs with MAF < 5% do not have a vertical line or arrowhead in the column. A) Scale representation of the ~650 kb region studied, indicating the BRCA1 gene, founder mutations, and genome sequence gap of unknown true size. Anchor lines link to position of the SNP within the region. B-F) LDSelect creates bins of SNPs that have an r2 value of 0.8 or greater with at least one other SNP in the bin. Each vertical line and arrowhead represents a SNP, with dashed lines and shaded background connecting SNPs within the same bin. Down arrowheads indicate Tag SNPs (those with r2 ≥ 0.8 with all other SNPs in a bin). Note that this use of the term Tag-SNP is different from Haploview – with LDSelect, only one Tag-SNP per bin would be required to capture the majority of the nucleotide diversity. Singleton bins (SNPs that did not have r2 ≥ 0.8 with any other SNP) are indicated by solid dots on a single row. SNP number refers to numbering in column 1 of Table 1.

Figure 3

Pair-wise measures of linkage disequilibrium and the two founder mutation-containing haplotypes. SNP number refers to numbering in column 1 of Table 1; only the 70 with MAF ≥ 0.05 in Ashkenazi Jews are shown in B-D. A) Scale representation of the ~650 kb region studied, indicating the BRCA1 gene, founder mutations, and genome sequence gap of unknown true size. B) LDSelect-Comp output showing a total of 22 bins for Ashkenazi Jews, with 17 "singleton" bins indicated by solid dots on a single row. C) Haploview output showing three block structures and related ht-SNPs (indicated with up arrowheads). D) Haplotypes estimated for 85 unrelated Ashkenazi Jews using SNPHAP as implemented in HapScope. The block boundaries were calculated in Haploview and overlaid on this figure. All haplotypes with an estimated frequency of at least 1% are displayed (h1 to h11), with individual frequencies and sums indicated to the right of the blocks. The common allele is designated "1" and the minor allele "2". The 185delAG and 5382insC containing haplotypes, determined from the family based genotypes, are indicated with gray (haplotype 2) and blue background (haplotype 1), respectively. Black arrows indicate the relative position of these two founder mutations.

African-Americans presented the least LD of all populations, with the presence of more distinct blocks within the region (Fig. 1). Maps for all five populations shared a break-point that maps approximately 20 kb downstream of the BRCA1 gene, between SNP95 (rs2298862) and SNP96 (rs443759). Among non-African groups, only Mexican-Americans exhibited an additional break point within the 288 kb block structure that encompasses BRCA1. The 3' end of the entire region showed less extensive LD but a similar pattern across all the groups. Only one htSNP (SNP3 – rs17599948) was found to be completely shared across populations, which is not unexpected since htSNPs often are population-specific [28] (Fig. 1). When the bin-based approach was used, we found that bins were largely shared across different ethnic groups (Fig. 2). The differences across populations were related to the number of bins as well as the number and position of TagSNPs.
As expected, African-Americans were the most diverse group, containing the highest number of bins (34), followed by Ashkenazi Jews (22), CEPH (19), Mexican-Americans and Chinese-Americans (14). This contrasts with 28 htSNPs in African-Americans, and 16, 13, 25 and 18 among Ashkenazi Jews, CEPH, Mexican-Americans and Chinese-Americans, respectively. Three TagSNPs were shared by all populations (SNP3 – rs17599948, SNP41 – rs8176166, and SNP67 – rs3092994), showing average MAF of 0.193, 0.183 and 0.335, respectively (Fig. 2). Mexican-Americans showed two disjoint bins of highly correlated SNPs, rather than one extended bin structure as evidenced in Ashkenazim, CEPH and Chinese-Americans. The split occurred between positions 38,471,400 (SNP65 – rs8176242) and 38,469,351 (SNP67 – rs3092994), mapping between introns 17 and 18 of BRCA1 (Fig. 2F). Interestingly, our results resemble what others have observed [32] in Native-Americans, namely an historical recombination event between introns 15 and 18. All five populations showed a large bin spanning ~288 kb encompassing SNP1 (rs13119) through SNP95 (rs2298862) (Fig. 2A–F), which represented the same extended region found in the block analysis. This large bin had 0.278 average MAF across populations and included BRCA1 coding polymorphisms L771L_(TTG>CTG), P871L_(CCG>CTG), K1183R_(AAA>AGA) and S1436S_(TCT>TCC) (Fig. 2). The maps of linkage disequilibrium in LD units (LDU) corresponded well with the two previous approaches of assessing disequilibrium. The four major breakpoints that were observed in Fig. 1 and 2, when haplotype blocks and bin structures were inferred, coincided with the same major steps in the LDU analysis (Fig. 4). In addition, we were able to observe two small steps in Fig. 4 for the Mexican-Americans, which were not observed in any other population. The first step occurred between SNP65 (rs8176242, intron 17) and SNP69 (rs8176265, intron 19).
The corresponding site of possible recombination could be observed as a split at the main bin structure for Mexican-Americans in Fig. 2F. The second step was found between SNP90 (rs8176323) and SNP91 (rs7223952), downstream of the gene. A close correspondence was evidenced as a breakdown in LD around the same position in Fig. 1.
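The bin-based approach discussed above can be sketched as a greedy procedure over a matrix of pair-wise r2 values. This is an illustration in the spirit of the binning described in the Figure 2 legend, not the LDSelect implementation itself; the function name `greedy_ld_bins`, the toy matrix, and the tie-breaking rule (lowest index wins) are all assumptions.

```python
def greedy_ld_bins(r2, threshold=0.8):
    """Group SNPs into bins from a symmetric matrix of pair-wise r^2 values.

    Each round picks the remaining SNP with the most partners at or above
    the threshold, bins it with those partners, and marks as tag SNPs the
    bin members that meet the threshold with every other member.
    """
    remaining = set(range(len(r2)))
    bins = []

    def partners(i):
        # Remaining SNPs whose r^2 with SNP i meets the threshold.
        return {j for j in remaining if j != i and r2[i][j] >= threshold}

    while remaining:
        # max() returns the first maximal item, so ties go to the lowest index.
        seed = max(sorted(remaining), key=lambda i: len(partners(i)))
        members = {seed} | partners(seed)
        tags = sorted(
            i for i in members
            if all(r2[i][j] >= threshold for j in members if j != i)
        )
        bins.append({"members": sorted(members), "tags": tags})
        remaining -= members  # in-place update, so partners() sees it
    return bins


# Toy matrix (hypothetical values): SNP 0 is correlated with SNPs 1 and 2,
# SNPs 1 and 2 are only weakly correlated with each other, SNP 3 stands alone.
toy_r2 = [
    [1.00, 0.90, 0.85, 0.10],
    [0.90, 1.00, 0.50, 0.05],
    [0.85, 0.50, 1.00, 0.02],
    [0.10, 0.05, 0.02, 1.00],
]
toy_bins = greedy_ld_bins(toy_r2)
for b in toy_bins:
    print(b)
```

In the toy example SNP 3 forms a singleton bin, the situation that the Figure 2 legend marks with solid dots.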
Figure 4

LDU maps. Comparison of LDU maps across the ~650 kb region containing the BRCA1 gene. Top. Scale representation of the ~650 kb region studied, indicating the BRCA1 gene, founder mutations, and genome sequence gap of unknown true size. Bottom. LDU maps for five populations.


185delAG and 5382insC haplotype reconstruction

We were particularly interested in fine mapping the BRCA1 locus to identify possible gene variants or haplotypes associated with the two founder mutations in Ashkenazi Jews. Therefore, 82 intragenic and 30 flanking SNPs were tested against the two founder mutations using Pearson's correlation coefficient (r) (Table 2). Although 185delAG was in strong LD with the majority of markers as measured by D' (Table 2), its highest pair-wise r2 was 0.04, owing to its relatively low frequency. That value was observed with a common SNP (SNP96, rs443759) that mapped outside the large BRCA1-containing block, approximately 110 kb downstream of the gene. Regarding 5382insC, there was one highly significant association (r2 = 1.0) with SNP8 (rs8176072) (Table 2). This is a rare SNP that was present only in 5382insC mutation carriers, and 5382insC was not correlated above 0.03 with any other SNP in the region.
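The contrast between D' and r2 for a rare mutation follows from allele-frequency arithmetic. If every haplotype carrying a rare allele (frequency q) also carries a marker allele (frequency p), then D' = 1 but r2 = q(1 - p) / (p(1 - q)), which is small whenever p is much larger than q. The sketch below evaluates this bound and, under the commonly used approximation that the required sample size scales with 1/r2, the resulting inflation factor. The function names and the marker frequency of 0.4 are hypothetical; q = 0.006 is the 185delAG frequency listed in Table 2.

```python
def r2_rare_on_marker(q, p):
    """r^2 between a rare allele (freq q) and a marker allele (freq p)
    when every rare-allele haplotype also carries the marker allele,
    i.e. the D' = 1 case with the rare allele nested in the marker allele.
    """
    if not (0 < q <= p < 1):
        raise ValueError("expected 0 < q <= p < 1")
    return q * (1 - p) / (p * (1 - q))


def sample_size_inflation(r2):
    """Approximate factor by which sample size grows when testing a
    correlated marker instead of the causal variant (assumes ~1/r^2)."""
    return 1.0 / r2


q_mutation = 0.006      # 185delAG frequency among Ashkenazi Jews (Table 2)
p_common_snp = 0.4      # hypothetical common marker allele frequency
r2_common = r2_rare_on_marker(q_mutation, p_common_snp)
print(f"r2 with a common marker: {r2_common:.4f}")
print(f"approximate sample-size inflation: {sample_size_inflation(r2_common):.0f}x")

# A marker with the same frequency as the mutation, and always found with it,
# gives r2 = 1, as reported for SNP8 and 5382insC.
r2_matched = r2_rare_on_marker(q_mutation, q_mutation)
print(f"r2 with a frequency-matched marker: {r2_matched:.1f}")
```

This is why a rare marker found only in mutation carriers can tag a founder mutation perfectly while common SNPs on the same haplotype cannot.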
Table 2

Pair-wise correlation coefficients between the two founder BRCA1 mutations and all other SNPs among Ashkenazi Jews

Ashkenazim (n = 85)
Columns: SNP name; rs number; SNP number; SNP description; maf (a); Het (b); HW P-value; Correlation (r2) with 185delAG; Correlation (r2) with 5382insC; D' with 185delAG; D' with 5382insC
C_9270420131191NBR1: UTR0.4240.520.5800.0110.0011.0001.000
C_92704217486202NBR1: UTR0.4400.520.6400.0160.0001.0001.000
C_9270454175999483NBR1: intron0.2110.350.6490.0160.0011.0001.000
C_95201116532314NBR1: intron0.4350.510.7030.0140.0011.0001.000
C_317869299088055LBRCA1: UTR0.0001.0001.000
C_1163118321759576NBR2: intron0.4470.520.6650.0130.0011.0001.000
BR1_00034030929867NBR2: intron0.0510.100.6360.0010.0001.0001.000
BR1_00039281760728NBR2:intron0.006c0.010.9570.0001.0001.0001.000
BR1_00057181760749NBR2:UTR0.0001.0001.000
C_2615164376564010BRCA1:UTR0.4400.520.6400.0150.0001.0001.000
M_185delAGNAd11BRCA1:exon 20.0060.010.957NA0.000NA1.000
BR1_007918817609012BRCA1: intron0.0001.0001.000
BR1_010574180006213BRCA1: P_K38K_(AAG>AAA)0.0001.0001.000
BR1_010796817610114BRCA1: intron0.0001.0001.000
C_2615171817610315BRCA1: intron0.4630.560.2910.0140.0001.0001.000
BR1_011340817610416BRCA1: intron0.0240.050.8240.0010.0001.0001.000
C_2615172817610917BRCA1: intron0.4410.530.4970.0110.0001.0000.000
BR1_016775806587218BRCA1: intron0.0001.0001.000
BR1_017105817612019BRCA1: intron0.4240.490.9130.0150.0011.0001.000
BR1_01857379991420BRCA1: intron0.0001.0001.000
BR1_01940879991321BRCA1: intron0.0320.060.7720.0010.0001.0001.000
BR1_019904817612822BRCA1: intron0.0001.0001.000
BR1_020896817613323BRCA1: intron0.4220.510.7380.0120.0001.0001.000
C_261518079991224BRCA1: intron0.4720.520.7010.0110.0001.0000.000
BR1_02642279992325BRCA1: intron0.2320.440.0310.0030.0001.0001.000
BR1_029258817614526BRCA1: intron0.4000.590.0380.0210.0011.0000.000
BR1_029448817614627BRCA1: intron0.0310.060.7740.0010.0001.0001.000
BR1_030748750315428BRCA1: intron0.4150.510.6180.0120.0011.0001.000
BR1_031875179995029BRCA1: P_Q356R_(CAG>CGG)0.0770.150.4420.0020.0001.0001.000
BR1_032885498685030BRCA1: P_D693N_(GAC>AAC)0.1330.190.0960.0010.0000.0001.000
BR1_0331191694031BRCA1: P_L771L_(TTG>CTG)0.4220.530.4290.0120.0011.0001.000
BR1_03342079991732BRCA1:P_P871L_(CCG>CTG)0.4290.510.7660.0150.0011.0001.000
BR1_003927498685233BRCA1: P_S1040N_(AGC>AAC)0.0120.020.9120.0000.0001.0001.000
BR1_034226222794534BRCA1: P_S1140G_(AGT>GGT)0.0001.0001.000
BR1_0343561694235BRCA1: P_K1183R_(AAA>AGA)0.4330.500.8680.0150.0011.0001.000
C_753010979991636BRCA1: intron0.4380.530.4820.0150.0001.0001.000
BR1_035507207083337BRCA1: intron0.0310.040.0010.0010.0001.0001.000
BR1_036077207083438BRCA1: intron0.4350.520.6260.0140.0011.0001.000
BR1_036793817615839BRCA1: intron0.4180.540.3930.0120.0001.0001.000
BR1_036859817616040BRCA1: intron0.4350.520.6260.0140.0011.0001.000
BR1_038085817616641BRCA1: intron0.1960.350.3910.0180.0000.3881.000
BR1_040350817617442BRCA1: intron0.0001.0001.000
C_3178696395098943BRCA1: intron0.4410.520.5660.0070.0001.0001.000
BR1_040669817617544BRCA1: intron0.0060.010.9560.0000.0001.0001.000
BR1_041288817617745BRCA1: intron0.0001.0001.000
BR1_041721817617846BRCA1: intron0.0001.0001.000
C_3178676106091547BRCA1: P_S1436S_(TCT>TCC)0.3800.560.1050.0180.0011.0000.000
C_3178677373755948BRCA1: intron0.1240.200.4810.0010.0010.4621.000
BR1_045154817618749BRCA1: intron0.0001.0001.000
BR1_045505817618850BRCA1: intron0.0001.0001.000
BR1_0046019641692751BRCA1: intron0.0001.0001.000
BR1_047826817619852BRCA1: intron0.4220.530.4290.0160.0011.0001.000
BR1_0477839817619953BRCA1: intron0.3410.420.5940.0160.0001.0001.000
BR1_048551423914754BRCA1: intron0.4400.520.6400.0110.0011.0001.000
BR1_050244817620655BRCA1: intron0.0001.0001.000
C_11621042223676256BRCA1: intron0.4350.520.6260.0160.0011.0001.000
C_2615208179996657BRCA1: intron0.4560.540.4560.0080.0001.0000.185
BR1_055669309298758BRCA1: intron0.4160.520.5450.0130.0021.0001.000
BR1_056796817622559BRCA1: intron0.0001.0001.000
BR1_058369817623260BRCA1: intron0.0130.030.9360.0000.0001.0001.000
BR1_058614817623461BRCA1: intron0.4370.540.3450.0160.0001.0001.000
BR1_058834817623562BRCA1: intron0.3390.440.8720.0170.0001.0001.000
BR1_059589817623663BRCA1: intron0.0001.0001.000
BR1_060022817624064BRCA1: intron0.0001.0001.000
BR1_060520817624265BRCA1: intron0.4330.520.5380.0110.0011.0001.000
BR1_061014817624566BRCA1: intron0.0001.0001.000
C_2615220309299467BRCA1: intron0.4340.530.4710.0120.0001.0000.000
BR1_062588817625968BRCA1: intron0.0001.0001.000
BR1_0064398817626569BRCA1: intron0.4170.520.4770.0120.0011.0001.000
BR1_064501218760370BRCA1: intron0.4220.530.4290.0090.0011.0001.000
BR1_066741817627371BRCA1: intron0.4240.520.5800.0110.0011.0001.000
BR1_067978817627872BRCA1: intron0.0001.0001.000
BR1_068063806617173BRCA1: intron0.0001.0001.000
M_5382insCNA74BRCA1:exon 200.0060.010.9570.000NA1.000NA
C_2615230817628975BRCA1: intron0.4410.520.5660.0140.0001.0001.000
BR1_073023817629376BRCA1: intron0.0001.0001.000
BR1_074008479319277BRCA1: intron0.4350.520.6260.0160.0011.0001.000
BR1_074807817629678BRCA1: intron0.4350.510.7030.0150.0001.0001.000
C_2615238309298879BRCA1: intron0.4240.520.5800.0150.0011.0001.000
BR1_076933817630380BRCA1: intron0.0001.0001.000
BR1_077034817630581BRCA1: intron0.0940.160.7530.0020.0011.0001.000
BR1_077328817630782BRCA1: intron0.0001.0001.000
BR1_079220806846383BRCA1: intron0.0001.0001.000
BR1_079396817631384BRCA1: intron0.0060.010.9570.0000.0001.0001.000
BR1_080570817631685BRCA1: intron0.0001.0001.000
BR1_081125817631886BRCA1: UTR0.4350.490.9630.0090.0011.0001.000
BR1_0819901251687BRCA1: UTR0.4350.520.6260.0140.0011.0001.000
BR1_082035817632088BRCA1: UTR0.0060.010.9570.0000.0001.0001.000
BR1_082199817632189Intergenic0.0001.0001.000
BR1_082687817632390Intergenic0.4290.510.7660.0150.0011.0001.000
C_11621012722395291Intergenic0.4410.520.5660.0140.0001.0001.000
C_3178665991163092Intergenic0.4220.530.4290.0160.0011.0001.000
C_26152451146096393Intergenic0.4410.530.4970.0140.0011.0001.000
C_3178699229886194ARHN: locus0.4410.550.3090.0110.0010.0001.000
C_3178698229886295ARHN: locus0.4470.540.3830.0110.0011.0001.000
C_228790544375996IFI35:intron0.2290.410.1290.0400.0021.0001.000
C_116172311187163697RPL27: intron0.2350.400.3040.0360.0021.0001.000
C_1588447227153998RPL27: intron0.3880.470.9320.0230.0031.0001.000
C_76522769097199MGC2744: intron0.0001.0001.000
C_1588417528854100MGC2744: intergenic0.1060.160.2300.0000.0011.0001.000
C_1588405323495101G6PC: intergenic0.3160.370.1820.0040.0000.2051.000
C_32568852593595102G6PC: intron0.1240.150.0070.0000.0011.0001.000
C_3256881324075103Intergenic0.1710.290.7170.0000.0000.2661.000
C_158833102290041104PRKWNK4: mis-sense0.0120.020.9130.0050.0001.0001.000
C_21600774321242105RAMP2: intron0.4800.580.1580.0030.0001.0001.000
C_1075621752313106EZH1: intergenic0.3630.400.3140.0050.0001.0001.000
C_32568677359598107EZH1: intergenic0.3860.570.0730.0050.0011.0000.000
C_159592772271027108EZH1: intron0.0001.0001.000
C_14414357214055109CNTNAP1: UTR0.1490.300.1090.0010.0011.0001.000
C_14414369766110CNTNAP1: UTR0.3530.520.2190.0070.0000.1731.000
C_75296391553469111CNTNAP1: silent0.0650.110.2480.0020.0000.4631.000
C_11253692271029112CNTNAP1: silent0.4050.620.0090.0000.0010.0401.000
C_14414383760384113CNTNAP1: intron0.3590.580.0200.0050.0001.0001.000
C_14414442292749114TUBG2: intron0.2170.390.2190.0060.0011.0001.000

a maf = Minor allele frequency

b Het = Heterozygosity

c Population private SNPs are shown underlined

d NA = Not applicable

Pair-wise correlation coefficients between the two founder BRCA1 mutations and all other SNPs among Ashkenazi Jews

Haplotypes were estimated for the set of all SNPs with MAF ≥ 0.05. Haplotypes for the chromosomes carrying the founder mutations were unambiguously determined in at least one family across the entire region studied. The 185delAG and 5382insC mutations occurred on the two most common haplotypes, representing 15% (haplotype 2) and 29% (haplotype 1) of the chromosomes, respectively, among Ashkenazi Jews (Fig. 3D). In the haplotype analyses, the 185delAG mutation occurred on a chromosome with the minor allele at most loci, and the 5382insC on a chromosome with the major allele at most loci (Fig. 3D). This pattern constitutes what has been previously described as "yin yang haplotypes", in which two high-frequency haplotypes have different alleles at most SNP sites [33].

Discussion

The primary objective of this study was to address the question of whether we could identify the Ashkenazi BRCA1 founder mutation 185delAG in a typical case-control association study, using anonymous genetic markers. The answer is no. When one studies a marker in LD with the true disease allele rather than the allele itself, the required sample size (S) is inflated in proportion to the inverse of the square of their correlation coefficient (r), as in S = 1/r² [34]. Almost none of the SNPs had a high correlation coefficient with either founder mutation (Table 2). Since most markers were more common than the founder mutations, this result is not surprising. However, our SNP selection strategy did not exclude low-frequency SNPs. In fact, one of the SNPs identified in 3 of the 90 Polymorphism Discovery Resource subjects [35] was perfectly correlated with 5382insC. However, we did not observe this SNP in any of the four non-Ashkenazi reference populations. Based on our results (the largest correlation value observed for 185delAG was 0.04, and 1/0.04 = 25), the sample size required to detect the 185delAG mutation in a breast cancer case-control study of Ashkenazi women that did not directly test for the mutation would be at least 25 times larger than one that measured the mutation directly, requiring on the order of 62,000 subjects. Based on pair-wise measurements, we conclude that it would have been extremely difficult to map the two founder mutations with the case-control association methodology using common SNPs.

Association studies may also compare combinations of SNPs, or haplotypes, between cases and controls, and the founder mutations might have been discoverable if they occurred on uncommon haplotypes. Using relatively common SNPs (MAF ≥ 5%), like those on whole-genome SNP platforms, we found that the two mutations were present on haplotypes representing a polar pattern, termed yin-yang haplotypes [33].
These two haplotypes together accounted for a large percentage of the total chromosomes studied in every population, ranging from 64% for the Chinese-Americans to 43% for the CEPH. It is highly unlikely that the founder mutations could have been discovered owing to a difference in haplotype frequency between cases and controls, largely because they occur on the two most common haplotypes. For example, consider a case-control study of Ashkenazi Jews with 500 cases and 500 controls. Among controls (1000 chromosomes), the distribution of BRCA1-containing haplotypes would be roughly as in Fig. 3D (i.e., there would have been 288 chromosomes with haplotype 1 and 153 with haplotype 2). Among cases, assuming 1% carried 5382insC and 4% carried 185delAG, there would be 5 additional haplotype 1 (total = 293) and 20 additional haplotype 2 (total = 173) chromosomes. These case-control contrasts, 293 vs. 288 (OR 1.02) and 173 vs. 153, would require extremely large sample sizes of over 30,000 subjects to detect either mutation with 80% statistical power. Conversely, in the more advantageous situation in which the 185delAG mutation by chance occurred on a rare haplotype (for example, haplotype 8), there would have been 32 such chromosomes in cases vs. 12 in controls, requiring approximately 4,800 subjects for the same statistical power.

The BRCA1 locus is well known to have significant LD [29,30]. Nonetheless, we found a marked differentiation between African-Americans and the non-African-American populations in the haplotype block analysis. Compared with African-Americans, the non-African-American populations had less haplotype diversity and more extensive LD (Fig. 1). The increased number of crossovers along the entire region for African-Americans probably reflects older evolutionary events. Our data conform to previous findings [27], which describe higher haplotype diversity as well as less extensive LD in the Yoruban and African-American samples than in European and Asian populations.
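The chromosome counts in the worked case-control example earlier in this Discussion can be turned into odds ratios with a few lines of Python. This is a minimal sketch rather than the authors' calculation: the helper name `odds_ratio` is hypothetical, the counts are those quoted in the text (1000 chromosomes per group), and only the odds ratio of 1.02 is stated in the source; the other values are simply what the same formula yields.

```python
def odds_ratio(case_count, control_count, total=1000):
    """Odds ratio for carrying a haplotype, cases vs. controls.

    Counts are chromosomes carrying the haplotype out of `total`
    chromosomes per group (500 subjects x 2 in the text's example).
    """
    case_odds = case_count / (total - case_count)
    control_odds = control_count / (total - control_count)
    return case_odds / control_odds

# (chromosomes in cases, chromosomes in controls), as quoted in the text
contrasts = {
    "haplotype 1 (5382insC background)": (293, 288),
    "haplotype 2 (185delAG background)": (173, 153),
    "hypothetical rare haplotype 8": (32, 12),
}

odds_ratios = {name: odds_ratio(*counts) for name, counts in contrasts.items()}
for name, value in odds_ratios.items():
    print(f"{name}: OR = {value:.2f}")
```

The two common-haplotype contrasts give odds ratios close to 1, whereas the same number of extra mutation-bearing chromosomes on a rare haplotype gives a much larger odds ratio, which is why the required sample size drops so sharply in that scenario.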
When SNP "bins" derived from pair-wise measurements of LD were compared, we found that more LD boundaries were shared across the five ethnic groups (Fig. 2). Ashkenazi Jews and the CEPH population had highly similar patterns of LD, regardless of the type of analysis used to generate the LD structures (haplotype or pair-wise bin methods) (Fig. 1 and Fig. 2). Overall, haplotype blocks and bins showed similar patterns, probably owing to the strong LD present throughout this region. The LDU analysis showed a remarkable overall similarity with the two previous methods that were used to analyze LD (Fig. 4). There were four major breakdowns in LD downstream of BRCA1 that were largely shared across populations. Nevertheless, African-Americans presented more recombination events than the other four populations, consistent with the smaller block sizes shown in Fig. 1.

Conclusion

In summary, our detailed analyses of 114 polymorphic SNPs in a 646 kb region around BRCA1 in Ashkenazi Jews and other populations confirmed a high level of linkage disequilibrium across nearly the entire region. In addition to 85 unrelated Ashkenazi Jews, we over-sampled carriers of the founder mutations 185delAG and 5382insC and their relatives to calculate correlations with other markers more precisely and to determine the mutation-associated haplotypes molecularly (these subjects were not included in allele frequency estimates). This allowed us to assess the likelihood of discovering the founder mutations by virtue of their association with individual SNPs or haplotypes that one would assay in a breast cancer case-control study in Ashkenazi Jews. We did not observe a high correlation coefficient between any individual SNP likely to be included in a genome-wide anonymous scan and either founder mutation. Our findings suggest that a study at least 25 times larger (60,000 subjects or more) would have been required if the mutations were not tested for directly. The two founder mutations occur on the two most common haplotypes, representing over 40% of the chromosomes, which also suggests that a haplotype-based analysis would not have succeeded in detecting either of the underlying mutations. These results are influenced heavily by the relative rarity of the founder mutations, as reflected by high values for Lewontin's D' measure of LD but low correlation coefficients. Our results suggest caution in using genome-wide association studies with common SNPs for detecting uncommon, disease-causing mutations.
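The contrast between a high D' and a low correlation coefficient can be illustrated numerically. The sketch below is a toy calculation, not the paper's analysis: the function `ld_stats` is hypothetical, and the frequencies (a mutation at 0.6% that always sits on a marker allele at 44%) are round illustrative values chosen to resemble a rare founder mutation on a common haplotype.

```python
def ld_stats(p_mut, p_marker, p_both):
    """Return (D', r^2) for two biallelic loci.

    p_mut    -- frequency of the mutation allele
    p_marker -- frequency of the marker allele
    p_both   -- frequency of haplotypes carrying both alleles
    """
    d = p_both - p_mut * p_marker
    # D' rescales D by the largest value it could take given the
    # allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_mut * (1 - p_marker), (1 - p_mut) * p_marker)
    else:
        d_max = min(p_mut * p_marker, (1 - p_mut) * (1 - p_marker))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_mut * (1 - p_mut) * p_marker * (1 - p_marker))
    return d_prime, r2

# Illustrative assumption: every mutation-bearing chromosome also
# carries the common marker allele, so p_both equals p_mut.
d_prime, r2 = ld_stats(p_mut=0.006, p_marker=0.44, p_both=0.006)
print(f"D' = {d_prime:.3f}, r^2 = {r2:.4f}")
```

With these inputs D' equals 1, because the mutation never occurs without the marker allele, while r² stays below 0.01, because r² is limited by the mismatch between the two allele frequencies.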

Methods

Subjects

Independent subjects included 85 unrelated Ashkenazi Jews, 60 European-Americans (Utah) from the CEPH (The Centre d'Etude du Polymorphisme Humain) family collection, and 48 each from African-Americans, Chinese-Americans and Mexican-Americans (Human Diversity Collection, Coriell Cell Repository, Camden, NJ). The 30 children of the 60 Utah CEPH subjects were also assayed to test for Mendelian errors. In addition, six unrelated BRCA1:185delAG and three unrelated BRCA1:5382insC founder mutation carriers and their relatives [36], identified through the National Cancer Institute's Cancer Family Registry, were included in the study in order to establish mutation-associated haplotypes from family data. Mutation-associated haplotypes were inferred through inspection of genotypes for all available first-degree relatives of mutation carriers. The Ashkenazi Jewish samples were obtained from anonymous control subjects from the National Laboratory for the Genetics of Israeli Populations at Tel-Aviv University [37].

Marker selection and genotyping

The 90 kb BRCA1 locus was previously re-sequenced in 90 individuals representing five major US ethnic/population groups from the Polymorphism Discovery Resource (PDR-90) [35], by the University of Washington as part of the Environmental Genome Project (EGP) [38]. Samples consisted of 24 European-, 24 African-, 24 Asian-, 12 Mexican-, and six Native-Americans. The geographic origin of individual donors, however, is masked and may not be used to assign allele frequencies to specific sub-populations. Most of the 301 variants identified were SNPs. Genotyping all 301 variants at this locus in the current study was not necessary since many are highly correlated. We developed the following strategy to identify a reduced set of variants that still captured much of the diversity of the region. Using the EGP data on all 299 biallelic single nucleotide substitutions (i.e., no lower minor allele frequency cutoff), and using custom software, we calculated all pair-wise correlations (r²) and created "clusters", defined as groups of SNPs that were perfectly correlated with all others in the cluster. Our method is similar to LDSelect 1.0 [28] except that it required that all pair-wise correlations of SNPs in a cluster be 1.0 (i.e., complete LD). LDSelect is typically used with an r² threshold of 0.8, and SNPs are clustered into "LD bins" if their pair-wise r² is at or above this threshold with at least one other SNP (but not necessarily all) in the bin. Using an r² of 1.0 resulted in more clusters than a lower r² threshold would, increasing the number of SNPs assayed in this study. Taqman 5'-nuclease assays were developed through the Applied Biosystems (Foster City, CA) Assay-by-Design service after first filtering for repetitive, non-unique, and low-complexity sequence. We developed assays for all "singleton" SNPs (those that did not have pair-wise r² values of 1.0 with another SNP).
For the 59 clusters with two or more SNPs, we chose one SNP from each cluster of two, three and four SNPs, and for 9 clusters of five or more SNPs, we chose one fourth of them for assay development. In addition, we selected all (n = 43) commercially available Assay-on-Demand assays (Applied Biosystems, Foster City, CA) that mapped within approximately 200 kb upstream and 400 kb downstream of the BRCA1 locus. This SNP set represented almost all known variants (or ones highly correlated) at this locus. Of the 143 resulting assays, three were excluded due to technical problems (poor clustering or more than one Mendelian error), and 28 were not polymorphic in our complete sample set, leaving 112 polymorphic SNPs in addition to the two founder mutations. There were 82 BRCA1 intragenic SNPs (approximately one SNP per 1 kb) and 30 SNPs that mapped to the region outside BRCA1 (approximately one SNP per 20 kb). The allelic discrimination assays were performed in 5 microliter reactions in 384-well plates according to manufacturer's recommendations. Data were analyzed with the allelic discrimination SDS 2.1 software on an ABI 7900HT (Applied Biosystems, Foster City, CA), with manual determination of genotype clusters [see Additional file 2].
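The clustering rule described in this section (every pair of SNPs in a cluster has r² = 1.0, with the rest treated as singletons) can be sketched as follows. This is not the authors' custom software, which is not shown in the paper; the function names and the toy genotypes are hypothetical, and r² is computed here as the squared Pearson correlation of 0/1/2 genotype codes, which is one possible choice among several.

```python
from statistics import mean

def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 codes)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP: correlation undefined
        return 0.0
    return sxy * sxy / (sxx * syy)

def perfect_clusters(genotypes, tol=1e-9):
    """Group SNPs so that all pairs within a group have r^2 == 1.

    Perfect correlation is transitive, so comparing each SNP with one
    representative of each existing cluster is sufficient.
    """
    clusters = []
    for snp, geno in genotypes.items():
        for cluster in clusters:
            if genotype_r2(geno, genotypes[cluster[0]]) >= 1 - tol:
                cluster.append(snp)
                break
        else:
            clusters.append([snp])
    return clusters

# Toy data (hypothetical): each list holds genotypes for six subjects.
toy = {
    "snpA": [0, 1, 2, 0, 1, 0],
    "snpB": [0, 1, 2, 0, 1, 0],  # identical to snpA
    "snpC": [2, 1, 0, 2, 1, 2],  # allele labels swapped, still r^2 = 1
    "snpD": [0, 0, 1, 0, 2, 0],  # only partially correlated with snpA
}
clusters = perfect_clusters(toy)
singletons = [c[0] for c in clusters if len(c) == 1]
print(clusters, singletons)
```

In this toy set the first three SNPs form one cluster, from which a single representative could be assayed, and the fourth is a singleton that would need its own assay.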

Statistical analysis

Allele frequencies and chi-square goodness-of-fit tests for Hardy-Weinberg equilibrium (HWE) were calculated using SAS/Genetics 9.1 (SAS Institute, Inc., Cary, North Carolina). To assess the correlation between the two founder mutations and all other SNPs, we over-sampled mutation carriers and calculated a weighted Pearson's correlation coefficient using SAS 9.1. We also tested association using Tagger [39], which operates in either pairwise or aggressive mode; we used both approaches. Heterozygosity levels, as well as the variation in gene frequencies between populations measured by FST (Wright's F-statistics), were calculated using POPGENE 1.31 [40]. Haplotypes and their frequencies were inferred from genotypes across the entire region for each population separately, using the software package SNPHAP 1.3 [41], as implemented in Hapscope [42], for loci with minor allele frequencies (MAF) > 5%. SNPHAP uses the expectation-maximization algorithm to calculate maximum likelihood estimates of haplotype frequencies from unphased genotype data. In order to compare LD patterns across different populations, we employed two different analyses, the first based on partitioning SNPs into haplotype blocks [27] using Haploview [43] and the second based on "bins" of correlated SNPs not constrained to be adjacent to each other [28]. The binning method used a modified version of LDSelect 1.0 that calculates composite LD measures [44], without assuming that loci are in Hardy-Weinberg equilibrium. We used an r² threshold of 0.8 for binning SNPs, and filtered SNPs with population-specific MAF ≤ 0.05. LDSelect identifies tagSNPs, which are those SNPs in a bin that have r² values at or above the threshold with all other SNPs in the bin. Only one tagSNP in each bin needs to be assayed to capture the majority of the SNP diversity.
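The bin-and-tag idea can be sketched in a few lines of Python. This is a simplified greedy procedure in the spirit of LDSelect, not the modified LDSelect 1.0 code used in the study; the function name `greedy_bins`, the input format (a dictionary of pair-wise r² values) and the toy numbers are all assumptions made for illustration.

```python
def greedy_bins(snps, r2, threshold=0.8):
    """Greedy LD binning; returns a list of (bin_members, tag_snps).

    r2 maps frozenset({snp_i, snp_j}) -> pair-wise r^2; pairs that are
    missing from the mapping are treated as r^2 = 0.
    """
    def linked(a, b):
        return r2.get(frozenset((a, b)), 0.0) >= threshold

    remaining = list(snps)
    bins = []
    while remaining:
        # Seed the bin with the SNP that has the most partners above threshold.
        seed = max(remaining,
                   key=lambda s: sum(linked(s, t) for t in remaining if t != s))
        members = [seed] + [t for t in remaining if t != seed and linked(seed, t)]
        # tagSNPs reach the threshold with every other SNP in the bin.
        tags = [s for s in members
                if all(linked(s, t) for t in members if t != s)]
        bins.append((members, tags))
        remaining = [s for s in remaining if s not in members]
    return bins

# Hypothetical pair-wise r^2 values for four SNPs.
toy_r2 = {
    frozenset(("s1", "s2")): 0.95,
    frozenset(("s1", "s3")): 0.85,
    frozenset(("s2", "s3")): 0.60,  # below threshold
    frozenset(("s3", "s4")): 0.10,
}
bins = greedy_bins(["s1", "s2", "s3", "s4"], toy_r2)
print(bins)
```

Here s2 and s3 join the bin seeded by s1 although they are not strongly correlated with each other, so only s1 qualifies as a tagSNP for that bin, and s4 forms a bin of its own.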
The block method employed by Haploview groups adjacent SNPs in strong LD, defined as pairs whose one-sided upper 95% confidence bound on D' is > 0.98 and whose lower bound is > 0.7. In this method, haplotype tag SNPs (htSNPs) represent the set of SNPs that must be assayed in each block to capture all haplotypes at 1% frequency in the population. LD maps were constructed from genotype data with the software LDMAP [45]. LD maps are scaled in linkage disequilibrium units (LDU) and show (when plotted against the physical map) a pattern of plateaus (reflecting regions of low haplotype diversity and low recombination) and steps (representing regions of historical recombination events). We genotyped related individuals from families segregating the 185delAG and 5382insC founder mutations in order to reconstruct their haplotypes. The 185delAG- and 5382insC-containing haplotypes were unambiguously determined by analyzing the genotypes of all available family members. The frequencies of these mutation-containing haplotypes were determined from SNPHAP analyses of the five populations separately. Block boundaries were defined based on Haploview analyses and overlaid upon the SNPHAP results. Using EpiInfo 4.0 [46], we estimated that approximately 2492 subjects would be required to have 80% statistical power to identify the 185delAG mutation if it were tested directly in a case-control study, assuming equal numbers of cases and controls, an alpha of 0.0001, and heterozygous carrier frequencies of 0.6% for controls and 3.3% for cases.
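The sample-size figures can be approximately reproduced with the standard two-proportion formula with Fleiss's continuity correction. This is a sketch under stated assumptions: the paper says only that EpiInfo 4.0 was used, so the choice of formula, the two-sided alpha and the function name `fleiss_n_per_group` are assumptions. The inputs (alpha of 0.0001, 80% power, carrier frequencies of 0.6% and 3.3%) come from the text, and the value 0.04 is treated here as the maximum r² for 185delAG, an assumption that is consistent with the 25-fold inflation quoted in the Discussion.

```python
from math import ceil, sqrt
from statistics import NormalDist

def fleiss_n_per_group(p_cases, p_controls, alpha=0.0001, power=0.80):
    """Per-group n for comparing two proportions (equal group sizes),
    with Fleiss's continuity correction and a two-sided alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_cases + p_controls) / 2
    diff = abs(p_cases - p_controls)
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p_cases * (1 - p_cases)
                         + p_controls * (1 - p_controls))) ** 2 / diff ** 2
    # The continuity correction inflates the uncorrected n.
    n_cc = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return ceil(n_cc)

# Direct test of the mutation: cases plus controls.
direct_total = 2 * fleiss_n_per_group(p_cases=0.033, p_controls=0.006)

# Indirect study through a correlated marker: inflate by 1 / r^2.
max_r2 = 0.04  # assumed to be the maximum r^2 for 185delAG
indirect_total = direct_total / max_r2

print(direct_total, round(indirect_total))
```

Under these assumptions the direct-test total comes out close to the 2492 subjects reported above, and dividing by an r² of 0.04 gives a figure on the order of 62,000 subjects.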

Competing interests

The authors declare that there are no competing interests.

Authors' contributions

LMP performed laboratory and statistical analysis and drafted the manuscript. MAP and LRF performed laboratory analysis. WHR and JZ wrote custom software and performed statistical analysis. MHG provided biomaterials and critically revised the manuscript. KO and NAE participated in the study design and analysis. AC provided design assistance and performed statistical analysis. JPS conceived and designed the study and performed statistical analysis. All authors read and approved the final manuscript.

Additional file 1

Summary of F-statistics and heterozygosity for all loci. Heterozygosity levels, as well as the variation in gene frequencies between populations by means of their FST (Wright's F-statistics).

Additional file 2

Genotypes for all 114 polymorphic SNPs in 5 populations. Raw genotypes. Populations: (1) CEPH (includes related individuals), (2) African Americans, (3) Chinese Americans, (4) Mexican Americans and (5) Ashkenazi Jews.