
Characterization of the linkage disequilibrium structure and identification of tagging-SNPs in five DNA repair genes.

Kristina Allen-Brady, Nicola J Camp.

Abstract

BACKGROUND: Characterization of the linkage disequilibrium (LD) structure of candidate genes is the basis for an effective association study of complex diseases such as cancer. In this study, we report the LD and haplotype architecture and tagging-single nucleotide polymorphisms (tSNPs) for five DNA repair genes: ATM, MRE11A, XRCC4, NBS1 and RAD50.
METHODS: The genes ATM, MRE11A, and XRCC4 were characterized using a panel of 94 unrelated female subjects (47 breast cancer cases, 47 controls) obtained from high-risk breast cancer families. A similar LD structure and tSNP analysis was performed for NBS1 and RAD50, using publicly available genotyping data. We studied a total of 61 SNPs at an average marker density of 10 kb. Using a matrix decomposition algorithm, based on principal component analysis, we captured >90% of the intragenic variation for each gene.
RESULTS: Our results revealed that three of the five genes did not conform to a haplotype block structure (MRE11A, RAD50 and XRCC4). Instead, the data fit a more flexible LD group paradigm, where SNPs in high LD are not required to be contiguous. Traditional haplotype blocks assume recombination is the only dynamic at work. For ATM, MRE11A and XRCC4 we repeated the analysis in cases and controls separately to determine whether LD structure was consistent across breast cancer cases and controls. No substantial difference in LD structures was found.
CONCLUSION: This study suggests that appropriate SNP selection for an association study involving candidate genes should allow for both mutation and recombination, which shape the population-level genomic structure. Furthermore, LD structure characterization in either breast cancer cases or controls appears to be sufficient for future cancer studies utilizing these genes.

Year:  2005        PMID: 16091150      PMCID: PMC1208870          DOI: 10.1186/1471-2407-5-99

Source DB:  PubMed          Journal:  BMC Cancer        ISSN: 1471-2407            Impact factor:   4.430


Background

Candidate gene association studies are a powerful study design for complex diseases such as cancer. Association studies have been advanced by the recent discovery of single nucleotide polymorphisms (SNPs); their vast density throughout the genome, ease of genotyping and moderate cost contribute greatly to their utility. Association testing is efficient when the SNPs being analyzed represent the entire genetic variation of the gene. It has been suggested that nearby SNPs are organized into regions of high linkage disequilibrium (LD) separated by short segments of very low LD[1-6]. In Caucasians, high LD regions may vary in length from a few kb to >300 kb[2,6,7]. Regions of high LD contain redundant information and can be reduced to smaller subsets of tagging-SNPs (tSNPs)[8], such that tSNPs identify all common haplotypes within the region of high LD. A number of algorithms have been proposed to define regions of high LD and tSNPs[4,8-14]. Thus far, no consensus has been reached on which algorithm is best. Several studies have suggested the utility of matrix decomposition algorithms[12,13,15-17]. One advantage of these algorithms is that SNPs in high LD are not required to be contiguous or mutually exclusive, a flexibility that is necessary for analyzing small genomic regions and rare variants. Further, these methods are stable with regard to marker density, minor allele frequency, analysis window, and possible analysis window length[18].
Growing evidence suggests that tumorigenesis is a multi-step process of genetic alterations that transform a normal human cell into a malignant derivative[19]. The ability of a cell to maintain genomic stability through DNA repair mechanisms is essential to prevent tumor initiation and progression.
A number of different types of cancer have been attributed to defective DNA repair, including xeroderma pigmentosum[20], hereditary nonpolyposis colorectal cancer[21], and breast cancer due to mutations in BRCA1 and BRCA2 as well as other DNA repair genes (e.g., ATM, TP53 and CHK2)[22]. Many published candidate gene association studies involving DNA repair genes and cancer risk have assessed risk either by examining a single SNP per gene or by analyzing one locus at a time. Unfortunately, the former approach is often inadequate in comprehensively accounting for the genetic variation of a gene, and the latter incurs multiple testing corrections, which usually eliminate all or most of the association evidence found. It has been suggested that use of haplotypes in association studies may have increased power over single-allele studies[8]. Descriptions of haplotype diversity and LD structure, as well as identification of potential tSNPs, will be key for success in candidate gene association studies. Here we describe haplotypes, LD structure and potential tSNPs in five DNA repair breast cancer susceptibility genes: ATM, MRE11A, NBS1, RAD50, and XRCC4. We used a matrix decomposition algorithm based on a method of principal components analysis[13]; this method does not require SNPs to be in contiguous block structure. Characterization of the LD structure and tSNPs is necessary for the design of future effective association studies.

Methods

Subjects

This study is part of a larger study involving 139 high-risk Caucasian breast cancer families, defined as high risk because cancer rates in these families were significantly higher than the general population rate determined using the Utah Population Database (UPDB)[23-25]. All breast cancer cases in the larger cohort met at least one of the following criteria: (1) their family tested negative for a BRCA1 or BRCA2 mutation, (2) the case themselves tested negative for the same BRCA1/2 mutation that was present in their family, or (3) their family had a low probability of carrying a BRCA1/2 mutation based on the number of breast cancer cases present in the family and/or ages at diagnosis of breast cancer within the family. Therefore, all breast cancer cases in the larger study had a low residual probability of their cancer being due to mutations in BRCA1/2. Breast cancer diagnosis information was obtained from medical records for the subject or the Utah Cancer Registry. For this LD characterization study, we selected a panel of 94 individuals (47 female breast cancer cases and 47 female controls), chosen randomly from separate kindreds to ensure independence. Both cases and controls were chosen such that comparisons of LD structure could be made between the groups. The sample size of 188 chromosomes is larger than generally used for this type of study[26-29], but inadequate for an association analysis. This current study is not a case-control study and associations with disease were not assessed. Blood samples were collected from all subjects and all individuals signed consent to participate in this study. This study was approved by the University of Utah Institutional Review Board.

Genes and SNP selection

For each gene of interest (i.e., ATM, MRE11A, NBS1, RAD50 and XRCC4), we selected all SNPs available from Applied Biosystems[30], within each gene and the flanking 10 kb on either side, that had been validated to have a minor allele frequency greater than 0.01 in Caucasians. For ATM (on chromosome 11q22-q23), which spans approximately 143 kb and contains 64 exons, 14 SNPs were studied with a SNP resolution of 1 SNP/10,489 bp. For MRE11A (11q21), which spans approximately 76 kb and contains 20 exons, 11 SNPs were studied with a SNP resolution of 1 SNP/8539 bp. For NBS1 (8q21), which contains 16 exons and spans about 51 kb, 5 SNPs were studied with a SNP resolution of 1 SNP/8256 bp. The RAD50 gene (5q31) spans approximately 87 kb and contains 25 exons, and we studied 10 SNPs at a resolution of 1 SNP/10,533 bp. Finally, for XRCC4 (5q13-q14), with 8 exons and approximately 276 kb in length, we studied 21 SNPs at a resolution of 1 SNP/13,198 bp. The vast majority of the SNPs studied were intronic (see Table 1).
Table 1

Characteristics of SNPs analyzed

| Gene | SNP Code | SNP ID | Base change* | Position† | MAF‡ | ABI reported MAF§ | # bp from the most 5' SNP |
|---|---|---|---|---|---|---|---|
| ATM | A1 | rs228589 | T/A | Flanking | 0.45 | 0.33 | 0 |
| ATM | A2 | rs228591 | G/A | mRNA-utr | 0.45 | 0.33 | 4125 |
| ATM | A3 | rs641605 | T/C | Intron | 0.45 | 0.33 | 8,711 |
| ATM | A4 | rs228599 | A/G | Intron | 0.44 | 0.31 | 14,452 |
| ATM | A5 | rs600931 | T/C | Intron | 0.45 | 0.35 | 24,127 |
| ATM | A6 | rs228592 | A/C | Intron | 0.45 | 0.33 | 29,981 |
| ATM | A7 | rs664677 | T/C | Intron | 0.43 | 0.33 | 49,974 |
| ATM | A8 | rs1003623 | T/C | Intron | 0.45 | 0.33 | 59,374 |
| ATM | A9 | rs609261 | C/T | mRNA-utr, intron | 0.45 | 0.32 | 64,926 |
| ATM | A10 | rs645485 | G/A | Intron | 0.45 | 0.32 | 75,655 |
| ATM | A11 | rs673281 | A/G | Intron | 0.45 | 0.31 | 88,861 |
| ATM | A12 | rs227061 | G/A | mRNA-utr, intron | 0.45 | 0.34 | 112,121 |
| ATM | A13 | rs227062 | A/G | mRNA-utr, intron | 0.45 | 0.33 | 112,175 |
| ATM | A14 | rs652311 | A/G | Flanking | 0.45 | 0.36 | 146,861 |
| MRE11 | M1 | rs646130 | T/C | Flanking | 0.3 | 0.39 | 0 |
| MRE11 | M2 | rs491404 | G/C | Flanking | 0.3 | 0.4 | 9192 |
| MRE11 | M3 | rs10831227 | G/A | Intron | 0.3 | 0.4 | 16,336 |
| MRE11 | M4 | rs601341 | G/A | Intron | 0.38 | 0.36 | 28,536 |
| MRE11 | M5 | rs554715 | T/C | Intron | 0.3 | 0.4 | 32,986 |
| MRE11 | M6 | rs556477 | A/G | Intron | 0.3 | 0.4 | 40,565 |
| MRE11 | M7 | rs1805365 | A/G | Intron | 0.02 | 0.02 | 61,721 |
| MRE11 | M8 | rs680695 | A/G | Intron | 0.34 | 0.36 | 72,913 |
| MRE11 | M9 | rs1009455 | C/G | Intron | 0.02 | 0.01¶ | 85,033 |
| MRE11 | M10 | rs1009456 | C/A | locus-region, mRNA-utr | 0.01 | 0.02 | 87,401 |
| MRE11 | M11 | rs10831234 | C/T | Flanking | 0.09 | 0.06 | 93,946 |
| NBS1 | N1 | rs12680687 | G/T | Intron | -** | 0.28 | 0 |
| NBS1 | N2 | rs709816 | A/G | Coding-synon | - | 0.45 | 16,323 |
| NBS1 | N3 | rs1805790 | C/T | Intron | - | 0.39 | 23,313 |
| NBS1 | N4 | rs741778 | C/G | Intron | - | 0.36 | 33,415 |
| NBS1 | N5 | rs1805841 | C/G | Intron | - | 0.45 | 41,282 |
| RAD50 | R1 | rs2522406 | G/A | Flanking | - | 0.01 | 0 |
| RAD50 | R2 | rs2244012 | C/T | Intron | - | 0.19 | 12,116 |
| RAD50 | R3 | rs2299015 | T/G | Intron | - | 0.19 | 12,388 |
| RAD50 | R4 | rs2299014 | G/T | Intron | - | 0.41 | 14,290 |
| RAD50 | R5 | rs2706377 | A/G | Intron | - | 0.01 | 50,388 |
| RAD50 | R6 | rs2301713 | C/T | Intron | - | 0.19 | 62,887 |
| RAD50 | R7 | rs2040703 | C/G | Intron | - | 0.22 | 83,149 |
| RAD50 | R8 | rs2240032 | C/T | Intron | - | 0.18 | 88,018 |
| RAD50 | R9 | rs1800925 | C/T | Flanking | - | 0.19 | 103,700 |
| RAD50 | R10 | rs2066960 | C/A | Flanking | - | 0.17 | 105,326 |
| XRCC4 | X1 | rs1993948 | T/A | Flanking | 0.46 | 0.47 | 0 |
| XRCC4 | X2 | rs1478485 | G/A | mRNA-utr | 0.47 | 0.45 | 8247 |
| XRCC4 | X3 | rs11951257 | T/C | Intron | 0.47 | 0.45 | 31,031 |
| XRCC4 | X4 | rs10045104 | C/T | Intron | 0.43 | 0.42 | 40,082 |
| XRCC4 | X5 | rs6452526 | C/T | Intron | 0.47 | 0.43 | 64,531 |
| XRCC4 | X6 | rs1382369 | G/A | Intron | 0.47 | 0.43 | 69,149 |
| XRCC4 | X7 | rs1382368 | C/T | Intron | 0.47 | 0.41 | 78,795 |
| XRCC4 | X8 | rs1382363 | C/T | Intron | 0.47 | 0.42 | 80,292 |
| XRCC4 | X9 | rs13180316 | G/A | Intron | 0.23 | 0.26 | 87,173 |
| XRCC4 | X10 | rs11741420 | A/T | Intron | 0.47 | 0.44 | 98,452 |
| XRCC4 | X11 | rs2731861 | T/C | Intron | 0.47 | 0.45 | 112,984 |
| XRCC4 | X12 | rs2662238 | G/A | Intron | 0.46 | 0.45 | 127,027 |
| XRCC4 | X13 | rs1039786 | C/T | Intron | 0.46 | 0.45 | 127,761 |
| XRCC4 | X14 | rs963248 | T/C | Intron | 0.19 | 0.16 | 161,614 |
| XRCC4 | X15 | rs301276 | G/A | Intron | 0.23 | 0.23 | 175,451 |
| XRCC4 | X16 | rs35268 | T/C | Intron | 0.16 | 0.13 | 216,216 |
| XRCC4 | X17 | rs301286 | T/C | Intron | 0.16 | 0.18 | 230,675 |
| XRCC4 | X18 | rs301289 | C/T | Intron | 0.17 | 0.17 | 233,955 |
| XRCC4 | X19 | rs2386275 | G/A | Intron | 0.09 | 0.12 | 270,260 |
| XRCC4 | X20 | rs2891980 | T/C | Intron | 0.09 | 0.13 | 270,383 |
| XRCC4 | X21 | rs1056503 | T/G | Coding-synon | 0.09 | 0.12 | 276,697 |

* Base change listed as Major allele / Minor allele

† Position obtained from the University of California, Santa Cruz Genome Browser; Flanking = within 10 kb of either side of gene; Locus region = variation in region of gene, but not in transcript; mRNA-utr = variation in transcript, but not in coding region interval

‡ MAF = minor allele frequency using our panel of 94 breast cancer case and control subjects

§ Applied Biosystems reported minor allele frequency in Caucasians

¶ Corrected value. Applied Biosystems acknowledged error in reported minor allele frequency of 0.49 on their web site, but it has not been updated.

** NBS1 and RAD50 were not genotyped in the current study. All analyses for these two genes were performed using the raw genotype data freely available online from Applied Biosystems. Base change obtained from University of California, Santa Cruz Genome Browser.
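The SNP resolutions quoted in the text can be checked approximately against Table 1. The sketch below assumes that resolution is simply the distance from the most 5' SNP to the most 3' SNP divided by the number of SNPs; the source does not state the exact formula, so small discrepancies (about 22 bp for XRCC4) are expected. The variable and function names are illustrative.

```python
# Gene -> (number of SNPs studied, bp from the most 5' SNP to the most 3' SNP),
# transcribed from Table 1.
TABLE1_SPANS = {
    "ATM": (14, 146_861),
    "MRE11A": (11, 93_946),
    "NBS1": (5, 41_282),
    "RAD50": (10, 105_326),
    "XRCC4": (21, 276_697),
}

# Resolutions quoted in the text, in bp per SNP.
REPORTED_BP_PER_SNP = {
    "ATM": 10_489,
    "MRE11A": 8_539,
    "NBS1": 8_256,
    "RAD50": 10_533,
    "XRCC4": 13_198,
}


def bp_per_snp(n_snps, span_bp):
    """Average spacing, ASSUMED here to be span divided by SNP count."""
    return span_bp / n_snps


resolutions = {gene: bp_per_snp(n, span) for gene, (n, span) in TABLE1_SPANS.items()}
total_snps = sum(n for n, _ in TABLE1_SPANS.values())

for gene, value in resolutions.items():
    reported = REPORTED_BP_PER_SNP[gene]
    print(f"{gene}: about {value:,.0f} bp per SNP (text reports {reported:,})")
print("total SNPs:", total_snps)
```

Under this assumption every computed value falls within 0.5% of the figure given in the text, and the SNP counts sum to the 61 SNPs reported in the abstract.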

Genotyping

For ATM and XRCC4, all SNPs that met the above criteria were genotyped on our panel of 94 subjects. For MRE11A, one SNP (rs10831224) repeatedly failed to amplify and was removed from the study. Genomic DNA was isolated and purified using standard phenol/chloroform DNA extraction. SNP genotyping was performed using the fluorogenic 5' nuclease TaqMan Assay[31] (Applied Biosystems). The TaqMan Assay requires TaqMan PCR Master Mix (Applied Biosystems), which we used according to the manufacturer's instructions, yielding a final volume of 5 μl per well. PCR amplification was also performed according to the Applied Biosystems protocol. The 7900HT Sequence Detection System (Applied Biosystems) was used to measure each fluorescent dye-labeled probe specific for each allele studied, and results were analyzed with the Sequence Detection Software (Applied Biosystems).

Haplotype structure and tSNP selection

Haplotypes and haplotype frequencies were estimated from unphased genotype data using SNPHAP[32], a program that applies an expectation-maximization algorithm to obtain maximum-likelihood estimates of multilocus haplotype frequencies. Haplotypes with a frequency of at least 0.01 were analyzed using a two-step PCA method[13]. This method does not require that groups of SNPs be contiguous along a DNA fragment and also allows SNPs to be present in more than one group. In step I, LD groups are determined. In brief, the PCA method extracts factors (LD groups) to capture ≥ 90% of the genetic diversity. An LD group is defined as those SNPs that load onto the same factor. In step II, tSNPs are selected for each LD group. Each LD group is considered separately and the PCA method again extracts factors; tSNPs are chosen as the SNPs with the highest factor loading. When a number of SNPs load equally well on an LD group, these can all be considered potential tSNPs. Under such circumstances, we selected the single SNP that performed best in the genotyping assay, in order to minimize errors in allele calls.
We compared our genotype data for ATM, MRE11A, and XRCC4 with genotyping data for these same genes obtained from Applied Biosystems (ABI)[30] on 45 Caucasians. We found good concordance in allele frequencies between the data sets. Further, we applied the same LD characterization to both data sets and found excellent concordance in the LD groups and potential tSNPs (see Results). We therefore characterized LD groups and tSNPs for NBS1 and RAD50 using the genotyping data available online. We also examined whether differences existed between LD group structure and tSNP selection when cases and controls were considered separately. This analysis could only be performed for ATM, MRE11A, and XRCC4.
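To illustrate the idea behind expectation-maximization haplotype estimation, here is a minimal sketch for the simplest case of two biallelic SNPs. It is not SNPHAP and does not reproduce its multilocus algorithm; the function name, the genotype coding (minor-allele counts 0, 1 or 2) and the toy genotypes are all assumptions made for illustration.

```python
from collections import Counter

# Haplotypes as (allele at SNP 1, allele at SNP 2); 0 = major, 1 = minor.
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def em_two_snp_haplotypes(genotypes, n_iter=50):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2), each a minor-allele count (0, 1 or 2).
    Only double heterozygotes (1, 1) have ambiguous phase.
    """
    fixed = Counter()   # haplotype counts from phase-unambiguous genotypes
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
            continue
        a = (0, 1) if g1 == 1 else (g1 // 2, g1 // 2)
        b = (0, 1) if g2 == 1 else (g2 // 2, g2 // 2)
        fixed[(a[0], b[0])] += 1
        fixed[(a[1], b[1])] += 1

    n_chromosomes = 2 * len(genotypes)
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uninformative starting point
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11 rather than 01/10.
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-count haplotypes using the expected phase assignments.
        counts = {h: float(fixed[h]) for h in HAPLOTYPES}
        counts[(0, 0)] += n_double_het * w_cis
        counts[(1, 1)] += n_double_het * w_cis
        counts[(0, 1)] += n_double_het * (1 - w_cis)
        counts[(1, 0)] += n_double_het * (1 - w_cis)
        freqs = {h: c / n_chromosomes for h, c in counts.items()}
    return freqs


# Hypothetical genotypes for 10 subjects (not data from this study).
toy_genotypes = [(0, 0)] * 4 + [(2, 2)] * 3 + [(1, 1)] * 3
estimated = em_two_snp_haplotypes(toy_genotypes)
print(estimated)
```

In this toy data set the unambiguous subjects carry only the 00 and 11 haplotypes, so the iterations assign the double heterozygotes almost entirely to the 00/11 phase.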

Results

Characteristics of the SNPs studied are listed in Table 1. Minor allele frequencies from our 94 subjects compared well with those listed by Applied Biosystems[30]. Despite the very low minor allele frequencies in some of the SNPs studied, we observed heterozygosity for all SNPs genotyped. Table 2 lists the haplotypes with a frequency > 0.01 obtained from SNPHAP, and the LD group designation and the tSNPs that were selected using the PCA method, for ATM, MRE11A, and XRCC4. Haplotypes are reported using the standard convention of designating the major allele as '1' and the minor allele as '2', in order to more easily spot occurrences of the minor allele. Please see Table 1 for the corresponding base pair change. For ATM, 7 haplotypes overall were observed and 5 had a frequency > 0.01. Using the PCA method, a single LD group was identified, encompassing the entire gene and accounting for 98.8% of the genetic variance across the gene. From this single LD group, a single tSNP (A13) was selected.
Table 2

Haplotypes with frequency > 0.01, LD group characterization and tSNPs selected using Utah genotyping data*

a. ATM

| Haplotype (alleles at A1 to A14, in order) | Freq |
|---|---|
| 11111111111111 | 0.54 |
| 22222222222222 | 0.42 |
| 22222212222222 | 0.01 |
| 22212212221222 | 0.01 |
| 11111111112111 | 0.01 |

LD Group and tSNP Designation (A1 to A14): 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1†, 1

b. MRE11A

| Haplotype (alleles at M1 to M11, in order) | Freq |
|---|---|
| 22212211111 | 0.30 |
| 11121112111 | 0.28 |
| 11111111111 | 0.25 |
| 11121111112 | 0.09 |
| 11111112111 | 0.06 |
| 11121121221 | 0.01 |

LD Group and tSNP Designation (M1 to M11): 1, 1, 1, 4†, 1, 1†, 2, 4, 2, 2†, 3†

c. XRCC4

| Haplotype (alleles at X1 to X21, in order) | Freq |
|---|---|
| 222222221222211111111 | 0.35 |
| 111111112111112111111 | 0.19 |
| 111111111111111111111 | 0.11 |
| 111111111111121222111 | 0.10 |
| 122222221222211111111 | 0.05 |
| 111111112111112111222 | 0.03 |
| 111111111111121211111 | 0.02 |
| 222222221222211111222 | 0.02 |
| 211111111111111111111 | 0.02 |
| 222122221222221222111 | 0.02 |
| 111111111111111111222 | 0.02 |
| 111111111111121122111 | 0.01 |
| 11111121111121122111 | 0.01 |

LD Group and tSNP Designation (X1 to X21): 1, 1†, 1, 1, 1, 1, 1, 1, 4†, 1, 1, 1, 1, 2†, 4, 2, 2, 2, 3, 3, 3†

* Analysis considers the total panel of 94 individuals together

† tSNP selected / group
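One way to see why the MRE11A groups cannot be contiguous blocks is to compute a pairwise LD measure directly from the Table 2b haplotype frequencies. The sketch below uses r², a standard pairwise measure; it is not the PCA method applied in this study, and the function name is illustrative. Because the listed haplotypes sum to 0.99, frequencies are renormalized before use.

```python
# MRE11A haplotypes (alleles at M1..M11; 1 = major, 2 = minor) and their
# estimated frequencies, transcribed from Table 2b.
MRE11A_HAPLOTYPES = {
    "22212211111": 0.30,
    "11121112111": 0.28,
    "11111111111": 0.25,
    "11121111112": 0.09,
    "11111112111": 0.06,
    "11121121221": 0.01,
}


def r_squared(haplotypes, i, j):
    """Pairwise r^2 between the SNPs at 1-based positions i and j."""
    total = sum(haplotypes.values())  # renormalize to the listed haplotypes
    p_i = sum(f for h, f in haplotypes.items() if h[i - 1] == "2") / total
    p_j = sum(f for h, f in haplotypes.items() if h[j - 1] == "2") / total
    p_ij = sum(
        f for h, f in haplotypes.items() if h[i - 1] == "2" and h[j - 1] == "2"
    ) / total
    d = p_ij - p_i * p_j  # disequilibrium coefficient D
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))


r2_m1_m6 = r_squared(MRE11A_HAPLOTYPES, 1, 6)  # both assigned to LD group 1
r2_m1_m4 = r_squared(MRE11A_HAPLOTYPES, 1, 4)  # M4 lies between M1 and M6
r2_m7_m9 = r_squared(MRE11A_HAPLOTYPES, 7, 9)  # both assigned to LD group 2
r2_m7_m8 = r_squared(MRE11A_HAPLOTYPES, 7, 8)  # M8 lies between M7 and M9

print(f"M1-M6: {r2_m1_m6:.2f}  M1-M4: {r2_m1_m4:.2f}")
print(f"M7-M9: {r2_m7_m9:.2f}  M7-M8: {r2_m7_m8:.3f}")
```

Among these haplotypes, M1 and M6 are perfectly correlated even though M4, which sits between them, is only weakly correlated with M1. The same holds for M7 and M9 around M8, which is the pattern a contiguous block model cannot represent.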

For MRE11A, we observed 9 haplotypes in total and 6 with frequency > 0.01. From the PCA, four LD groups were identified based on these 6 haplotypes and accounted for 99.1% of the genetic variance. The LD groups did not conform to haplotype blocks: SNP M4 separated LD group 1 into two parts and M8 separated LD group 2. Each LD group was represented by a single tSNP, such that the tSNP set contained 4 tSNPs (M4, M6, M10, and M11). For XRCC4, we observed 26 haplotypes overall, 13 of which had a frequency > 0.01. From the PCA method, four LD groups were observed, which accounted for 97.2% of the variance. As with MRE11A, the LD groups were not contiguous blocks. LD group 1 was divided by X9 and LD group 2 was divided by X15. Each of the LD groups could be represented by a single SNP, resulting in the tSNP set (X2, X9, X14, and X21). Table 3 shows the LD groups and tSNPs for ATM, MRE11A and XRCC4 using our panel of 94 subjects and using the 45 Caucasian subjects from Applied Biosystems[30]. For these three genes, we observed the same number of LD groups containing precisely the same SNPs for both data sets. The difference between the results was in the number of potential tSNPs for each LD group. For the majority of LD groups, the potential tSNPs using Applied Biosystems data were a subset of those from our data. This is perhaps expected, because our sample size was more than double theirs and is therefore capable of better resolution.
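The variance percentages reported here follow from the rule given in the Methods: factors are extracted until at least 90% of the genetic diversity is captured. A minimal sketch of that stopping rule is shown next, assuming the eigenvalues of a SNP correlation matrix are already available. The eigenvalues are hypothetical, chosen only so that they sum to 11 (the number of MRE11A SNPs), and the function name is illustrative. The sketch omits the rest of the published PCA method[13], including how SNPs are assigned to factors.

```python
def n_factors_for_threshold(eigenvalues, threshold=0.90):
    """Return (k, fraction): the smallest number of leading factors whose
    eigenvalues capture at least `threshold` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    captured = 0.0
    for k, value in enumerate(ordered, start=1):
        captured += value
        if captured / total >= threshold:
            return k, captured / total
    return len(ordered), 1.0


# HYPOTHETICAL eigenvalues for an 11-SNP correlation matrix (not study data).
hypothetical_eigenvalues = [5.0, 3.0, 2.0, 0.9, 0.1]
n_groups, fraction = n_factors_for_threshold(hypothetical_eigenvalues)
print(n_groups, round(fraction, 3))
```

With these made-up values, two factors capture about 73% of the variance and three capture about 91%, so three factors would be retained.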
Table 3

Comparison of LD groups for the Utah breast cancer cases and controls with Applied Biosystems (ABI) data*

| Gene | Group | Utah breast cancer case/control SNPs | Utah potential tSNPs | Utah % variance captured/group | ABI SNPs | ABI potential tSNPs | ABI % variance captured/group |
|---|---|---|---|---|---|---|---|
| ATM | 1 | A1-A14 | A1-A3, A5, A6, A8-A10, A12-A14 | 98.8% | A1-A14 | A1-A3, A5, A8, A13, A14 | 98.2% |
| MRE11 | 1 | M1, M2, M3, M5, M6 | M1, M2, M3, M5, M6 | 100% | M1, M2, M3, M5, M6 | M1, M2, M3, M5, M6 | 100% |
| MRE11 | 2 | M7, M9, M10 | M10 | 84.3% | M7, M9, M10 | M7, M9, M10 | 100% |
| MRE11 | 3 | M11 | M11 | 100% | M11 | M11 | 100% |
| MRE11 | 4 | M4, M8 | M4, M8 | 82.2% | M4, M8 | M4, M8 | 83.9% |
| XRCC4 | 1 | X1-X8, X10-X13 | X2-X3, X5-X8, X10-X11 | 95.3% | X1-X8, X10-X13 | X2, X3, X10, X11, X13 | 96.0% |
| XRCC4 | 2 | X14, X16-X18 | X14 | 91.6% | X14, X16-X18 | X14, X17, X18 | 93.5% |
| XRCC4 | 3 | X19-X21 | X19-X21 | 100% | X19-X21 | X19-X21 | 100% |
| XRCC4 | 4 | X9, X15 | X9, X15 | 97.4% | X9, X15 | X9, X15 | 96.8% |

* We used Applied Biosystems' validated SNP genotype data for 45 Caucasian subjects.
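The statement that, for most LD groups, the ABI potential tSNPs are a subset of the Utah potential tSNPs can be checked mechanically against Table 3. The sketch below transcribes the two "potential tSNPs" columns and counts the groups where the subset relation holds. The helper and variable names are illustrative, and equal sets count as subsets.

```python
import re


def expand(spec):
    """Expand a SNP list such as 'A1-A3, A5' into {'A1', 'A2', 'A3', 'A5'}."""
    snps = set()
    for part in spec.split(","):
        part = part.strip()
        match = re.fullmatch(r"([A-Z])(\d+)-\1(\d+)", part)
        if match:
            prefix, lo, hi = match.group(1), int(match.group(2)), int(match.group(3))
            snps.update(f"{prefix}{k}" for k in range(lo, hi + 1))
        else:
            snps.add(part)
    return snps


# (gene, LD group, Utah potential tSNPs, ABI potential tSNPs), from Table 3.
TABLE3_POTENTIAL_TSNPS = [
    ("ATM", 1, "A1-A3, A5, A6, A8-A10, A12-A14", "A1-A3, A5, A8, A13, A14"),
    ("MRE11A", 1, "M1, M2, M3, M5, M6", "M1, M2, M3, M5, M6"),
    ("MRE11A", 2, "M10", "M7, M9, M10"),
    ("MRE11A", 3, "M11", "M11"),
    ("MRE11A", 4, "M4, M8", "M4, M8"),
    ("XRCC4", 1, "X2-X3, X5-X8, X10-X11", "X2, X3, X10, X11, X13"),
    ("XRCC4", 2, "X14", "X14, X17, X18"),
    ("XRCC4", 3, "X19-X21", "X19-X21"),
    ("XRCC4", 4, "X9, X15", "X9, X15"),
]

abi_subset_of_utah = [
    (gene, group)
    for gene, group, utah, abi in TABLE3_POTENTIAL_TSNPS
    if expand(abi) <= expand(utah)
]
n_groups_total = len(TABLE3_POTENTIAL_TSNPS)
print(f"{len(abi_subset_of_utah)} of {n_groups_total} LD groups:", abi_subset_of_utah)
```

By this count the relation holds in 6 of the 9 LD groups. The exceptions are MRE11A group 2 and XRCC4 groups 1 and 2, where the ABI data yield potential tSNPs that the Utah data do not.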

Table 4 lists the haplotypes, LD group designation, potential tSNPs, and tSNP selected per group for NBS1 and RAD50 using the Applied Biosystems' data. For NBS1, 6 haplotypes overall were observed and all 6 haplotypes had a frequency > 0.01. Using the PCA method, two LD groups were identified and accounted for 93.8% of the variance. Two tSNPs were sufficient to tag these groups (N1, N2). However, N5 could replace N2 with no reduction in the variance explained. For the RAD50 gene, in order to include two available rare SNPs in the analysis, we lowered the haplotype acceptance threshold to 0.009. We observed a total of 14 haplotypes, 10 with a frequency greater than 0.01. Using the PCA method, we identified three LD groups, which accounted for 91.5% of the variance. Similarly to MRE11A and XRCC4, the LD groups for RAD50 were not contiguous blocks. Three tSNPs were sufficient to tag the groups (R1, R3, and R10), although R5 could replace R1 and R6 could replace R3 with no loss of variance explained.
Table 4

Haplotypes with frequency > 0.01, LD group characterization and tSNP selected using data from Applied Biosystems*

a. NBS1

| Haplotype (alleles at N1, N2, N3†, N4†, N5†, in order) | Frequency |
|---|---|
| 11111 | 0.55 |
| 22222 | 0.26 |
| 12222 | 0.10 |
| 12212 | 0.03 |
| 22112 | 0.03 |
| 12112 | 0.03 |

LD Group and tSNP Designation (N1 to N5): 2‡, 1‡, 1, 1, 1

b. RAD50

| Haplotype (alleles at R1†, R2, R3†, R4†, R5, R6†, R7, R8, R9, R10, in order) | Frequency |
|---|---|
| 1111111111 | 0.50 |
| 1112111111 | 0.21 |
| 1222122221 | 0.11 |
| 1111111112 | 0.08 |
| 1222122222 | 0.05 |
| 1222122122 | 0.01 |
| 1222122212 | 0.01 |
| 1112122121 | 0.01 |
| 1111122221 | 0.01 |
| 2212212112 | 0.009§ |

LD Group and tSNP Designation (R1 to R10): 2‡, 1, 1‡, 1, 2, 1, 1, 1, 1, 3‡

* We used Applied Biosystems' validated SNP genotype data for 45 Caucasian subjects.

† Allele designations have been changed from that listed by ABI to conform to the convention 1 = common allele, 2 = rare allele.

‡ tSNP selected / group

§ The haplotype with a frequency 0.009 was also analyzed to allow inclusion of rare variants at R1 and R5.
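As a consistency check, the minor allele frequency implied at each SNP can be recovered by summing the frequencies of the haplotypes that carry the minor allele. The sketch below does this for the NBS1 haplotypes in Table 4a and compares the result with the ABI-reported MAFs in Table 1. The names are illustrative and the 0.015 tolerance is an arbitrary allowance for rounding.

```python
# NBS1 haplotypes (alleles at N1..N5; 1 = common, 2 = rare) and frequencies,
# transcribed from Table 4a.
NBS1_HAPLOTYPES = {
    "11111": 0.55,
    "22222": 0.26,
    "12222": 0.10,
    "12212": 0.03,
    "22112": 0.03,
    "12112": 0.03,
}

# ABI-reported minor allele frequencies for N1..N5, from Table 1.
ABI_REPORTED_MAF = [0.28, 0.45, 0.39, 0.36, 0.45]


def implied_maf(haplotypes, position):
    """Sum the frequencies of haplotypes carrying allele '2' at a 1-based position."""
    return sum(f for h, f in haplotypes.items() if h[position - 1] == "2")


implied = [implied_maf(NBS1_HAPLOTYPES, k) for k in range(1, 6)]
for k, (from_haps, reported) in enumerate(zip(implied, ABI_REPORTED_MAF), start=1):
    print(f"N{k}: implied {from_haps:.2f}, ABI reported {reported:.2f}")
```

The implied frequencies agree with the reported values to within 0.01 at every NBS1 SNP.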

For ATM, MRE11A, and XRCC4, we compared haplotypes and LD structure between the breast cancer cases and controls. For ATM and XRCC4, no difference in the LD structure was observed when cases and controls were analyzed separately. For the MRE11A gene, differences in LD structure were noted; however, these were minor and likely attributable to small sample size, since the differences were driven by 3 rare haplotypes (frequency = 0.02).

Discussion

Identification of the most informative markers to use in a large-scale association analysis for studies of complex disease, such as breast cancer, is critical to the success of the study. The key to this process is to select SNPs that are most informative about the underlying haplotype structure in a population of interest. As haplotype-based designs have been suggested as being more powerful than the single-allele approach for association studies[8], a haplotype-based approach should result in more accurate and definitive findings. In this study, we have described haplotypes and characterized the LD structure of the ATM, MRE11A, and XRCC4 genes using a panel of 94 subjects, including breast cancer cases from high-risk breast cancer families as well as controls. Further, we identified tSNPs that can be used in future haplotype-based association studies. A similar analysis was performed for NBS1 and RAD50 using publicly available genotype data. Using principal components analysis[13], we identified a single LD group for ATM, four noncontiguous LD groups for MRE11A, two LD groups for NBS1, three noncontiguous LD groups for RAD50, and four noncontiguous LD groups for XRCC4. In each case, the LD groups captured greater than 90% of the variance of the total SNPs available from Applied Biosystems across the gene. Furthermore, for each gene, we present tSNPs that could be selected to represent the gene. It is of interest that the LD structure for three of these five DNA repair genes did not conform to the haplotype block model; that is, the LD groups did not contain contiguous SNPs. This was true whether the genotyping data came from our own study or from Applied Biosystems. Although we did not directly sequence these genes to identify all possible variants, the discontinuity we observed illustrates that the underlying LD structure cannot conform to contiguous haplotype blocks.
A more flexible LD group representation (as supported under principal components analysis) fit the data better and appears to be stable to differences in minor allele frequency. Similar findings of a complex pattern of LD structure were recently reported in a high-resolution study of the ELAC2 gene[15]. Our results suggest that when studying small genomic regions and low frequency variants (<0.2), mutation is an important dynamic in LD structure, and the simple recombination-only model used in classical haplotype block methods does not fit the data well and hence will lead to a poor selection of tSNPs. Due to the stability of the results for ATM, MRE11A and XRCC4, we pursued two additional DNA repair genes of interest (i.e., NBS1 and RAD50). Applied Biosystems provides freely available genotyping data for four ethnically diverse populations of 45 subjects each; therefore, even with limited funds, the haplotype structure and selection of tSNPs can be estimated for a study prior to any genotyping costs. However, caution must be used if this option is exercised, as one's population must be one of Applied Biosystems' ethnic cohorts (i.e., Caucasian, African American, Chinese, or Japanese) and our experience is that errors occasionally exist in the data. Of the genes studied here, only ATM has previously been studied in any depth for LD structure. ATM has received so much attention because patients with the recessive disease ataxia-telangiectasia, due to a mutation in the ATM gene, have a 100-fold increased risk of cancer[33,34], and obligate heterozygous carriers of ATM mutations may have an increased risk of cancer, particularly breast cancer[35-39], although this finding is controversial[40,41]. Extensive LD across the ATM gene has previously been reported[42-44], and sequence analysis reveals that ATM polymorphisms are relatively rare, resulting in low overall sequence diversity[44].
Thus, it follows that only a small number of haplotypes have been found, particularly in Caucasian populations of European descent. Thorstenson et al[44] predicted seven haplotypes in populations throughout the world, only three of which were found in Europeans or the Americans. Bonnen et al[43] identified 22 unique haplotypes, seven of which occurred in Caucasians, and only five of these occurred at a frequency of greater than 5% among Caucasians. We observed five haplotypes for the ATM gene, but only two of these could be considered common haplotypes (>0.01) and together accounted for 96% of all chromosomes. A recently published study using those haplotypes defined by Thorstenson et al[44] and Bonnen et al[43] identified five haplotype tagging-SNPs that were necessary to capture all of these haplotypes with a frequency >1%[45]. In our study, which is limited to Applied Biosystems' validated SNPs, we found that one tSNP was sufficient to represent 98.8% of the total genetic variance for all the SNPs available. The results of our study most likely differed from these other studies because of differences in the minor allele frequency range of the SNPs utilized. The minor allele frequency for the 14 SNPs we studied in the ATM gene varied minimally, from 0.43 to 0.45. Thorstenson et al[44] and Bonnen et al[43] included 2 and 3 SNPs, respectively, that had minor allele frequencies <0.25. Population structure exists in SNP-allele frequencies[43] and, as shown by the results of this study, exclusion of rarer SNPs has an impact on the frequency of haplotypes that are observed. Comparison of haplotype and LD structure between cases and controls for ATM, MRE11A, and XRCC4 indicated that LD structure for these genes was similar in both groups. Results for ATM and XRCC4 were identical and only minor differences in LD structure were noted for MRE11A due to three rare haplotypes.
A recent study has reported that rare haplotypes may be important for disease susceptibility; in that study, these rare haplotypes had significant effects on the phenotype of interest[46]. Therefore, if rare haplotypes are of interest to an investigator, it may be prudent to characterize LD in both cases and controls and select tSNPs that comprehensively cover the diversity of both groups. However, most studies to date have empirically found that LD structure is similar across phenotype[1,47]. If major differences in LD structure were to exist, this would have a profound effect on guidelines for tSNP selection and for application of projects such as the HapMap[48,49]. Some limitations are inherent in this study and must be pointed out. First, we did not sequence our genes of interest and thus all of the genetic diversity within these genetic regions may not be captured. Our results must be interpreted in light of this. The gold standard is to identify all variants within a gene and select a subset of tSNPs from this set. It would be interesting to evaluate the robustness of our findings using sequence data. However, the SNPs examined were relatively evenly spaced, on the order of 1 SNP every 10 kb, and our results are important as they illustrate how smaller budget studies can best select tSNPs. Second, our sample size was modest (188 chromosomes), although larger than in previous studies examining LD and tSNPs[26-29]. Finally, haplotype block and haplotype-tagging SNP analyses have been suggested to be reliable only when markers are dense; otherwise marker sets have considerable loss of information[50]. This result may extend to PCA methods; however, the matrix decomposition algorithm used has been suggested to be stable with regard to varying levels of marker density[18].

Conclusion

In conclusion, we have described haplotypes and linkage disequilibrium structure, and identified tSNPs, from all available Applied Biosystems' validated SNPs in the ATM, MRE11A, NBS1, RAD50, and XRCC4 genes in a Caucasian population. As has been found for other genes, we identified LD structures that did not conform to contiguous haplotype block structures. This illustrates the importance of using flexible methods, such as matrix decomposition, that allow for multiple population dynamics such as recombination, mutation and selection. Although the gold standard for SNP characterization across a candidate gene is sequencing to identify all variants, we describe a low-budget means to characterize the LD structure and select tSNPs using publicly available data. Comprehensive characterization of the LD structure at genes of interest will be essential for future, effective association studies.

Electronic database information

The data from the 94 breast cancer case and control subjects for these tables is publicly available at under Supplemental Materials to Publication. On request from Dr. Nicola Camp a username and password to access the data will be given.

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

KAB assisted in the study design, performed the genotyping, and drafted the manuscript. NJC conceived of the study and its design and helped to draft the manuscript. All authors read and approved the final manuscript.

Pre-publication history

The pre-publication history for this paper can be accessed here:
  43 in total

Review 1.  The hallmarks of cancer.

Authors:  D Hanahan; R A Weinberg
Journal:  Cell       Date:  2000-01-07       Impact factor: 41.582

2.  Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21.

Authors:  N Patil; A J Berno; D A Hinds; W A Barrett; J M Doshi; C R Hacker; C R Kautzer; D H Lee; C Marjoribanks; D P McDonough; B T Nguyen; M C Norris; J B Sheehan; N Shen; D Stern; R P Stokowski; D J Thomas; M O Trulson; K R Vyas; K A Frazer; S P Fodor; D R Cox
Journal:  Science       Date:  2001-11-23       Impact factor: 47.728

3.  Global analysis of ATM polymorphism reveals significant functional constraint.

Authors:  Y R Thorstenson; P Shen; V G Tusher; T L Wayne; R W Davis; G Chu; P J Oefner
Journal:  Am J Hum Genet       Date:  2001-07-03       Impact factor: 11.025

4.  Haplotype tagging for the identification of common disease genes.

Authors:  G C Johnson; L Esposito; B J Barratt; A N Smith; J Heward; G Di Genova; H Ueda; H J Cordell; I A Eaves; F Dudbridge; R C Twells; F Payne; W Hughes; S Nutland; H Stevens; P Carr; E Tuomilehto-Wolf; J Tuomilehto; S C Gough; D G Clayton; J A Todd
Journal:  Nat Genet       Date:  2001-10       Impact factor: 38.330

5.  High-resolution haplotype structure in the human genome.

Authors:  M J Daly; J D Rioux; S F Schaffner; T J Hudson; E S Lander
Journal:  Nat Genet       Date:  2001-10       Impact factor: 38.330

6.  Linkage disequilibrium in the human genome.

Authors:  D E Reich; M Cargill; S Bolk; J Ireland; P C Sabeti; D J Richter; T Lavery; R Kouyoumjian; S F Farhadian; R Ward; E S Lander
Journal:  Nature       Date:  2001-05-10       Impact factor: 49.962

Review 7.  DNA double-strand breaks: signaling, repair and the cancer connection.

Authors:  K K Khanna; S P Jackson
Journal:  Nat Genet       Date:  2001-03       Impact factor: 38.330

8.  Cancer in patients with ataxia-telangiectasia and in their relatives in the nordic countries.

Authors:  J H Olsen; J M Hahnemann; A L Børresen-Dale; K Brøndum-Nielsen; L Hammarström; R Kleinerman; H Kääriäinen; T Lönnqvist; R Sankila; N Seersholm; S Tretli; J Yuen; J D Boice; M Tucker
Journal:  J Natl Cancer Inst       Date:  2001-01-17       Impact factor: 13.506

9.  ATM-heterozygous germline mutations contribute to breast cancer-susceptibility.

Authors:  A Broeks; J H Urbanus; A N Floore; E C Dahler; J G Klijn; E J Rutgers; P Devilee; N S Russell; F E van Leeuwen; L J van 't Veer
Journal:  Am J Hum Genet       Date:  2000-02       Impact factor: 11.025

10.  Dominant negative ATM mutations in breast cancer families.

Authors:  Georgia Chenevix-Trench; Amanda B Spurdle; Magtouf Gatei; Helena Kelly; Anna Marsh; Xiaoqing Chen; Karen Donn; Margaret Cummings; Dale Nyholt; Mark A Jenkins; Clare Scott; Gulietta M Pupo; Thilo Dörk; Regina Bendix; Judy Kirk; Katherine Tucker; Margaret R E McCredie; John L Hopper; Joseph Sambrook; Graham J Mann; Kum Kum Khanna
Journal:  J Natl Cancer Inst       Date:  2002-02-06       Impact factor: 13.506
