
The impact of sample size and marker selection on the study of haplotype structures.

Xiao Sun, J Claiborne Stephens, Hongyu Zhao.

Abstract

Several studies of haplotype structures in the human genome in various populations have found that the human chromosomes are structured such that each chromosome can be divided into many blocks, within which there is limited haplotype diversity. In addition, only a few genetic markers in a putative block are needed to capture most of the diversity within a block. There has been no systematic empirical study of the effects of sample size and marker set on the identified block structures and representative marker sets, however. The purpose of this study was to conduct a detailed empirical study to examine such impacts. Towards this goal, we have analysed three representative autosomal regions from a large genome-wide study of haplotypes with samples consisting of African-Americans and samples consisting of Japanese and Chinese individuals. For both populations, we have found that the sample size and marker set have significant impact on the number of blocks and the total number of representative markers identified. The marker set in particular has very strong impacts, and our results indicate that the marker density in the original datasets may not be adequate to allow a meaningful characterisation of haplotype structures. In general, we conclude that we need a relatively large sample size and a very dense marker panel in the study of haplotype structures in human populations.

Year:  2004        PMID: 15588478      PMCID: PMC3525083          DOI: 10.1186/1479-7364-1-3-179

Source DB:  PubMed          Journal:  Hum Genomics        ISSN: 1473-9542            Impact factor:   4.639


Introduction

Human DNA sequence variation accounts for a large fraction of the observed phenotypic differences between individuals, including susceptibility to disease. Sites in the DNA sequence where individuals differ at a single DNA base are called single nucleotide polymorphisms (SNPs). SNPs represent by far the most common source of genetic variation, and it is estimated that the human genome may contain over 10 million SNPs, about one in every 300 bases [1-3]. A haplotype is the specific combination of marker alleles within a region of a chromosome. Tightly linked SNPs are not independent on a given chromosome, but tend to be associated with each other across small regions. This tendency is called linkage disequilibrium. Empirical data suggest that relatively few of the theoretically possible haplotypes are observed at significant frequencies for a set of SNPs within a very short physical distance [4]. Genome-wide disease association studies using SNPs and haplotypes may be the most promising approach to identifying genetic variants underlying complex diseases, and recent technological advances have made high-throughput sequencing and genotyping possible. With the aim of speeding the discovery of genes related to common illnesses, as well as preventing adverse drug reactions, the National Institutes of Health launched the international HapMap Project to organise what is known about genetic variation within the human genome. One objective of this project was to understand haplotype structures throughout the human genome. Recent studies [5-8] have shown that haplotypes may be divided into discrete blocks, within which there is limited haplotype diversity. 
For example, Gabriel and colleagues systematically examined 51 autosomal regions in four populations and found that the minimal span of the blocks averaged 9 kb in Yoruban and African-American samples, with a range of < 1 kb up to 94 kb, whereas the average in European and Asian samples was 18 kb, with a range of < 1 kb to 73 kb [8]. Furthermore, in a study of the class II region of the major histocompatibility complex, researchers found that the haplotype blocks were flanked by precisely localised recombination hotspots, leading to the hypothesis that 'punctuate recombination' could be the molecular mechanism underlying block structure [9]. One attractive feature of statistical association methods based on haplotype blocks is the idea that, although blocks may contain a large number of SNPs, only a few SNPs are needed to uniquely identify the haplotypes in a block. This much smaller subset of SNPs, which are termed 'haplotype tagging SNPs' (htSNPs), can be used to explain a large proportion of diversity. Tag SNPs make it unnecessary to genotype all the SNPs in a given region and therefore represent an economic approach to genome-wide association studies. Zhang et al. [10] studied the power of different association tests in a variety of disease models by using Tag SNPs and concluded that the genotyping efforts can be significantly reduced without much loss of power. Despite these findings of block-like structures in the human genome, there is no universally accepted definition of haplotype blocks. In fact, each study has its own definition. 
Different definitions of haplotype blocks include: (1) a continuous set of markers in which the average pairwise D' is greater than some predetermined threshold;[11] (2) a region where a small number of common haplotypes account for the majority of the chromosomes;[6,12] (3) regions with both limited haplotype diversity and strong linkage disequilibrium but allowing several markers to be skipped;[7] and (4) regions with absolutely no evidence for historical recombination between any pair of SNPs [13]. Therefore, block definition remains subjective and arbitrary, and it is not yet clear how to compare haplotype blocks between studies. Furthermore, each method varies in terms of the SNP minor allele frequency threshold used. The most appropriate definition may depend on how the inferred blocks are used, such as whether the identified blocks will be used to infer recombination hot spots or to identify regions that are associated with disease. Moreover, recent studies suggest that there may be non-trivial departures from block structures [14]. Despite extensive empirical studies on haplotype blocks, one issue that has not been well addressed is the impact of sample size on the assessment of haplotype block structure. In some previous studies, blocks were identified based on a small set of chromosomes and may not provide a comprehensive representation of the whole population. For example, the chromosome 21 study only examined 20 independent chromosomes from diverse populations [6]. The largest dataset reported to date contains samples from 275 individuals, leading to 400 independent chromosomes [8]. It is not known, however, how many individuals are sufficient to get reliable characterisation of haplotype block structures. In addition, the effect of SNP marker selection on the inferred haplotype block structures has not been well studied either. To date, the density of SNPs analysed has ranged from approximately one marker per kilobase [6,9] to one marker per 15 kb [7]. 
Published results suggest that a denser marker panel tends to give rise to a larger number of shorter blocks,[6] whereas a sparser marker panel generates fewer, longer blocks [7,8]. Furthermore, the block boundaries and Tag SNPs may change substantially if we select a different set of SNPs, even when the SNP density is kept constant. In a recent simulation study, Wall and Pritchard [15] found that marker density is more important than sample size for inferring haplotype structures. One of the objectives of the HapMap project is to understand population differences in haplotype structures. It is important to compare haplotype blocks in different populations and to examine whether the same set of Tag SNPs can be used in different populations to capture haplotype diversity. Existing data have shown that the blocks in a Yoruban population from Nigeria are generally the same as, but shorter than, those in European and Asian populations [8]. If different populations indeed share similar haplotype block structures, one broad map would be sufficient. If the populations are different enough, however, it might be necessary to construct population-specific haplotype maps. These are very important questions requiring answers, and the data collected from the HapMap project may help us to gain a better understanding of these issues. In the current study, we focused on the impact of sample size and SNP marker selection on haplotype block partitioning and Tag SNP selection in a sample consisting of African-Americans and a sample consisting of Japanese and Chinese people.
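Block definition (1) above relies on the pairwise D' statistic. As a minimal sketch of how |D'| can be computed for two biallelic SNPs, the following assumes phased haplotypes coded as 0/1 pairs; the function name d_prime and the encoding are illustrative assumptions and are not taken from any of the cited programs.

```python
def d_prime(haplotypes):
    """Return |D'| for two biallelic SNPs.

    `haplotypes` is a list of (a, b) tuples, one per chromosome, with
    alleles coded 0/1 (an assumed encoding for this sketch).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic SNP leaves D' undefined; reported as 0 here
    return abs(d) / d_max


# Two SNPs in complete association, then two SNPs in equilibrium.
linked = [(0, 0)] * 5 + [(1, 1)] * 5
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(d_prime(linked), d_prime(unlinked))
```

Averaging this quantity over all SNP pairs in a candidate run of markers and comparing it with a threshold would give a block test in the spirit of definition (1); the threshold itself differs between studies.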

Materials and methods

Datasets

SNP genotype data of 51 autosomal regions that collectively span ~0.4 per cent of the human genome from African-American samples (called population B in the original study) and from Japanese and Chinese samples (called population C in the original study) were downloaded from the following website: http://www.genome.wi.mit.edu/mpg/hapmap/hapstruc.html. A detailed description of the data can be found in the paper by Gabriel et al. [8]. Population B contains 50 samples from unrelated African-Americans and population C includes 42 samples from unrelated individuals of Japanese and Chinese origin. This is the largest public dataset available to date. In order to examine the impact of sample size and marker selection on haplotype block boundaries and Tag SNPs, three regions from the above database were chosen in our study. Region 52a spans 237.22 kb on chromosome 22 and contains 46 SNPs for population B and 45 SNPs for population C. Region 42a is 409.92 kb long and is located on chromosome 15; it includes 100 SNPs for population B and 99 SNPs for population C. Region 31a is the shortest of the three. It is on chromosome 9, is 181.98 kb long and has 23 SNPs for population B and 25 SNPs for population C. The density of the markers in these three regions is one SNP per 4 to 8 kb. We chose these three regions because they represent small, medium and large numbers of SNPs within a given region in this dataset.
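The stated marker density can be checked directly from the region lengths and SNP counts given above. The short calculation below is only a worked arithmetic example; the dictionary layout and the helper name kb_per_snp are illustrative.

```python
# (length in kb, SNPs in population B, SNPs in population C), as stated above
regions = {
    "52a": (237.22, 46, 45),
    "42a": (409.92, 100, 99),
    "31a": (181.98, 23, 25),
}


def kb_per_snp(length_kb, n_snps):
    """Average spacing between markers, in kb per SNP."""
    return length_kb / n_snps


densities = {
    name: (kb_per_snp(length_kb, n_b), kb_per_snp(length_kb, n_c))
    for name, (length_kb, n_b, n_c) in regions.items()
}

for name, (dens_b, dens_c) in densities.items():
    print(f"{name}: one SNP per {dens_b:.2f} kb (B), {dens_c:.2f} kb (C)")
```

All six values fall between 4 and 8 kb per SNP, in line with the range quoted in the text.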

Haplotype block partitioning and Tag SNP selections

To obtain haplotype boundaries and Tag SNPs, we used 'HapBlock', a dynamic programming algorithm for haplotype block partitioning with a minimum number of Tag SNPs, developed by Zhang et al. [12]. The following parameters were used in our analysis: the input data type was genotype data; the method for block definition was the one used in Patil et al.;[6] the threshold to define the block was set at 0.8; the threshold to define the common haplotype was set at 0.099; the method to find the Tag SNPs was the haplotype block diversity introduced by Johnson et al.;[16] and the threshold to find the Tag SNPs was set at 0.9.
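One way to read the two block thresholds above is that a run of SNPs qualifies as a block when haplotypes with frequency of at least 0.099 together account for at least 0.8 of the chromosomes. The sketch below illustrates that reading only; it is not HapBlock's code, the function name is_block is hypothetical, and the use of non-strict comparisons is an assumption.

```python
from collections import Counter


def is_block(haplotypes, coverage=0.8, common_freq=0.099):
    """Check a common-haplotype coverage criterion for one candidate block.

    `haplotypes` is a list of strings, one per chromosome, giving the
    alleles over the candidate block (assumed already phased).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    # Chromosomes carrying a haplotype that reaches the common-frequency cutoff.
    covered = sum(c for c in counts.values() if c / n >= common_freq)
    return covered / n >= coverage


# 20 chromosomes: two common haplotypes plus five singletons -> 15/20 covered.
example = ["AAG"] * 8 + ["ATG"] * 7 + ["CTG", "CAG", "ATC", "AAC", "CTC"]
print(is_block(example))  # coverage is 0.75, below the 0.8 threshold
```

A dynamic programming partition would evaluate such a test on many candidate intervals and then choose the partition that minimises the total number of Tag SNPs.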

Impact of sample size

To examine the impact of sample size on the identified haplotype structures, we randomly selected 10, 20, 30 and 40 individuals out of 50 African-Americans in population B and repeated the random selection 100 times. For each randomly selected sample, we took their SNP genotype data in regions 52a, 42a and 31a and ran the HapBlock program to identify the number of blocks, the block boundaries and the Tag SNPs for each block. The same procedures were applied to population C, which included 42 unrelated Japanese and Chinese people. These results were used to assess the effect of sample size on haplotype block structures.
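The resampling scheme described above can be sketched as follows. This is a minimal illustration assuming individuals are identified by integer indices; the function name subsample_individuals and the fixed seed are assumptions made for reproducibility of the example, and the HapBlock run on each draw is not shown.

```python
import random


def subsample_individuals(individual_ids, sizes=(10, 20, 30, 40), n_rep=100, seed=0):
    """Draw `n_rep` random subsets of each size, without replacement."""
    rng = random.Random(seed)
    return {
        size: [sorted(rng.sample(individual_ids, size)) for _ in range(n_rep)]
        for size in sizes
    }


# Population B has 50 individuals; population C would use range(42).
draws = subsample_individuals(list(range(50)))
print(len(draws[10]), len(draws[10][0]))
```

Each subset would then be used to extract the genotype rows for regions 52a, 42a and 31a before block partitioning.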

Impact of marker selection

Random marker selection

To study the impact of marker selection on the assessment of haplotype block structures, we carried out random selection on SNP markers for the three regions. Because region 52a contains 46 SNPs for population B (African-American) and 45 SNPs for population C (Japanese and Chinese), we randomly selected 10, 20, 30 and 40 SNPs for each population and repeated random selection 100 times. For region 42a, which includes 100 SNPs for population B and 99 SNPs for population C, we randomly selected 20, 40, 60 and 80 SNPs for each population and repeated this 100 times. Similarly for region 31a, where there are 23 SNPs for population B and 25 SNPs for population C, we randomly selected 5, 10, 15 and 20 SNPs for each population and repeated this 100 times. For each marker set selected, we ran the HapBlock program to identify the total number of blocks, the block boundaries and the Tag SNPs for each block.

Sequential marker selection

Since an SNP could only be a boundary marker in the event that it was in the subset chosen, comparing block boundaries among totally different sets of SNP markers is difficult. In order to further investigate the underlying mechanism explaining why higher density markers usually give rise to more, smaller blocks than is the case for lower density markers, we applied a sequential marker selection method to 46 SNPs on chromosome region 52a from the African-American population. First, we randomly selected ten SNPs out of the original 46 SNPs to identify block structures. Secondly, we randomly selected another ten SNPs out of the 36 remaining SNPs and combined them with the previously selected 10 SNPs to identify block structures. Then, we randomly selected another 10 SNPs out of the 26 remaining SNPs and combined them with the previously selected 20 SNPs to do the analysis. Lastly, we randomly selected 10 more SNPs out of the 16 remaining SNPs and combined them with the previously selected 30 SNPs to identify block structures. This simulation approach ensured that the lower density marker set is a subset of the higher density marker set. The whole selection process was repeated 100 times. Comparisons of the block boundary results were based on these results.
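The nesting property of the sequential selection can be illustrated with a short sketch. It assumes SNPs are numbered 1 to 46 as in region 52a for the African-American sample; the function name sequential_marker_sets and the seed are illustrative, and one call corresponds to one of the 100 repetitions.

```python
import random


def sequential_marker_sets(n_markers=46, step=10, n_steps=4, seed=0):
    """Build nested marker sets of size step, 2*step, ... by adding
    `step` randomly chosen new SNPs to the previous set each time."""
    rng = random.Random(seed)
    remaining = list(range(1, n_markers + 1))
    chosen = []
    nested = []
    for _ in range(n_steps):
        new = rng.sample(remaining, step)
        chosen.extend(new)
        remaining = [m for m in remaining if m not in new]
        nested.append(sorted(chosen))
    return nested


marker_sets = sequential_marker_sets()
print([len(s) for s in marker_sets])
```

Because every smaller set is contained in the next larger one, a boundary SNP found at low density remains available as a candidate boundary at higher density, which is what makes the boundary comparison meaningful.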

Block boundary and Tag SNP comparisons

In the comparison of block boundaries, we counted the frequency of each SNP that was used as the starting or ending position of the block boundaries in the results based on 100 randomly selected samples. Comparing Tag SNPs is more complicated than comparing block boundaries because the Tag SNPs are not unique for each block. In other words, there is usually more than one set of Tag SNPs (see Appendix A for a Tag SNP example) in a block. Therefore, to incorporate the multiplicities of the Tag SNPs, for the results from each randomly selected sample, we counted the frequency of each SNP that was selected as a Tag SNP across all Tag SNP sets and divided this frequency by the number of Tag SNP sets in each block and the total number of blocks in the region. Based on the 100 randomly selected samples, we then calculated the mean weighted frequency for each SNP.
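The weighting described above can be written out as a small sketch. It assumes each block is represented as a list of alternative Tag SNP sets; the function names are illustrative, and the example input uses the candidate Tag SNP sets listed for population C in Appendix A (SNP numbers in combined order).

```python
from collections import defaultdict


def weighted_tag_frequency(blocks):
    """Weight of each SNP for one sample: number of Tag SNP sets that
    contain it, divided by the number of sets in its block and by the
    total number of blocks in the region."""
    n_blocks = len(blocks)
    weights = defaultdict(float)
    for tag_sets in blocks:
        for tag_set in tag_sets:
            for snp in tag_set:
                weights[snp] += 1.0 / (len(tag_sets) * n_blocks)
    return dict(weights)


def mean_weighted_frequency(samples):
    """Average the per-sample weights over all resampled datasets."""
    totals = defaultdict(float)
    for blocks in samples:
        for snp, w in weighted_tag_frequency(blocks).items():
            totals[snp] += w
    return {snp: t / len(samples) for snp, t in totals.items()}


# Candidate Tag SNP sets for population C, region 52a (Appendix A).
pop_c_blocks = [
    [(1,)],
    [(7, 15), (7, 17), (7, 18), (7, 19)],
    [(25, 27), (25, 31), (26, 27), (26, 31), (27, 31)],
    [(32, 36), (34, 36)],
    [(37, 39), (37, 40), (39, 46), (40, 46)],
    [(48,)],
]
weights = weighted_tag_frequency(pop_c_blocks)
print(round(weights[7], 4), round(weights[15], 4))
```

In this example SNP 7 appears in all four sets of its block and so receives the full per-block weight of 1/6, whereas SNP 15 appears in only one of the four sets and receives 1/24.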

Results

Haplotype block partitioning based on the observed data

Using the observed genotype data, region 52a was partitioned into nine blocks with a total of 19 Tag SNPs for the African-Americans (population B) and six blocks with a total of ten Tag SNPs for the Japanese and Chinese (population C). Region 42a, however, was divided into 16 blocks with a total of 33 Tag SNPs for African-Americans and 14 blocks with a total of 22 Tag SNPs for Japanese and Chinese. As for region 31a, both populations had three blocks and six Tag SNPs (see appendix for detailed block information using region 52a as an example). Inspection of all 51 autosomal regions in the Gabriel et al. data set reveals that, in general, chromosomal regions were partitioned into more blocks and had more Tag SNPs based on the African-American samples than those based on the Japanese and Chinese samples. In addition, for both populations, the total number of Tag SNPs increases as the number of blocks increases (data not shown). Table 1 summarises the results of the number of blocks when we randomly selected 10, 20, 30 and 40 individuals 100 times from each population. For example, in the upper left panel of Table 1, column 'ran10' corresponds to the results based on 100 simulated datasets consisting of ten individuals. The sum did not add up to 100 because the HapBlock program we used for block partitioning would tend to fail when we had few individuals or few markers included in the sample. Among the 99 simulated samples with HapBlock results, region 52a was partitioned into five blocks 17 times, six blocks 55 times, seven blocks 21 times, and eight blocks six times. If we focus on the trend of modes for each sample size based on 100 simulated samples, it is apparent that the number of blocks generally increases as we include more individuals in the sample. With the original 50 African-Americans, region 52a was partitioned into nine blocks. When we included only ten people, most of the time we obtained six blocks for this region. 
When we increased the sample size to 20 people, most of the time the region was partitioned into eight blocks. When the sample size grew to 30 and 40, most of the time the region was partitioned into nine blocks, the same as that in the original dataset. Therefore, a minimum of 30 individuals is needed for this given set of markers to infer the number of blocks.
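The 'ran10' example above can be reproduced as a small worked calculation. The counts are those quoted in the text for region 52a in the African-American sample; the variable names are illustrative.

```python
# Number of blocks -> number of simulated samples (region 52a, ten individuals)
ran10 = {5: 17, 6: 55, 7: 21, 8: 6}

total = sum(ran10.values())        # simulated samples with HapBlock results
mode = max(ran10, key=ran10.get)   # most frequent number of blocks
print(total, mode)
```

The total of 99 reflects the one simulated sample for which HapBlock returned no result, and the mode of six blocks is the value compared with the nine blocks found in the full sample.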
Table 1

Frequency of the number of blocks in which the number of individuals is varied in simulations

*Sum does not always add up to 100. See results part for detailed explanation.

We also examined the sample size effect on the total number of Tag SNPs associated with block partitioning, and the results are summarised in Table 2. Similar to the results summarised in Table 1, the total number of Tag SNPs increases as the sample size increases. A shorter region with fewer SNPs, such as region 31a, seems to require fewer individuals than a longer region with more SNPs, such as regions 52a and 42a, to identify a similar number of Tag SNPs as the original sample. In fact, the inferred number of blocks and Tag SNPs did not level off in region 42a in either population, indicating that our sample size may not have been adequate to define a set of Tag SNPs for this region. Statistical comparisons based on t-tests or Wilcoxon tests also indicated that there was a significant difference between the inferred block structures from samples of size 30 and those from samples of size 40 in region 52a.
Table 2

Frequency of the total number of Tag SNPs when the number of individuals is varied in simulations

* Sum does not always add up to 100. See results part for detailed explanation.

Using region 52a as an example, Figure 1 summarises the frequency of each SNP being used as a block boundary against its chromosomal location across 100 simulated samples with 10, 20, 30 and 40 individuals, respectively. Although block boundaries differed from one sample to another (for samples consisting of the same number of individuals), when we pooled the results of 100 random selections, the overall patterns were very similar for samples of different sizes. The block boundaries in region 52a from the Japanese and Chinese samples were more clear-cut than those from the African-American samples. The high-frequency bars matched perfectly the block boundary positions identified in the original sample of 42 Japanese and Chinese people.
Figure 1

Frequency of each SNP being selected as block boundary against its chromosomal location in individual selection for Region 52a. (a) African-American. (b) Japanese & Chinese. + indicates the position of block boundaries using the original sample.

Detailed Tag SNP comparisons are more difficult than block boundary comparisons mainly because Tag SNPs are not unique. Usually there is more than one set of Tag SNPs in a block (see Appendix A for a Tag SNP example). In order to examine the impact of sample size on Tag SNP selections, we calculated the weighted frequency of each SNP being selected as a Tag SNP and plotted it against the SNPs in the combined order (see Appendix B for SNPs in the combined order due to differences between SNP sets between the two populations). Figure 2 summarises the results for Tag SNP selections for different sample sizes (10, 20, 30 and 40) and it can clearly be seen that similar sets of Tag SNPs were identified on average across all simulations for different sizes. Comparing these to the Tag SNPs from the original sample of 50 African-Americans, we found that they were almost identical, with the exception of SNP numbers 20 and 45. Both of these had a relatively high frequency of being selected as Tag SNPs using randomly selected samples, but they did not show up in the Tag SNP list using the original sample. In addition, we found that most of the Tag SNPs selected for the Japanese and Chinese population also appeared on the Tag SNP list for the African-American population, but not vice versa, indicating that the Tag SNPs for the Japanese and Chinese population are largely a subset of those for the African-American population.
Figure 2

Weighted frequency of the selected Tag SNPs for region 52a when the number of individuals is varied in simulations. Arrows indicate those Tag SNPs scoring highest in the block using the original sample.

Table 3 summarises the results of the number of blocks after we randomly selected: 10, 20, 30 and 40 SNPs for region 52a; 20, 40, 60 and 80 SNPs for region 42a; and 10, 15 and 20 SNPs for region 31a. Simulated samples consisting of a random selection of five SNPs for region 31a crashed the HapBlock program every time, and therefore no results from this part of the study are shown in Table 3. It is apparent from this Table that as we included more SNP markers in our sample, the number of blocks continued to grow, and there was evidence that the inferred haplotype structures would have continued to change if more markers had been included.
Table 3

Frequency of the number of blocks when the number of markers is varied in simulations

* Sum does not always add up to 100. See results part for detailed explanation.

As for the number of Tag SNPs, Table 4 clearly shows that, as we included more SNP markers in our sample, the total number of Tag SNPs also continued to grow, and did not show any sign of stabilisation.
Table 4

Frequency of the total number of Tag SNPs when the number of markers is varied in simulations

* Sum does not always add up to 100. See results part for detailed explanation.

To answer the question of why denser marker sets usually give rise to more, smaller blocks than is the case for sparser marker sets, we studied chromosomal region 52a in the African-American population in detail. Figure 3 shows two representative patterns of how region 52a was partitioned into blocks using 10, 20, 30 and 40 sequentially-selected SNP markers, as well as the original 46 SNP marker set. Both marker sets of size 10 generated three blocks, with one set consisting of SNPs number 2, 8, 19, 21, 24, 30, 36, 42, 43 and 46, and the other set consisting of SNPs number 5, 6, 11, 15, 22, 23, 24, 25, 40 and 45. The blank space between the blocks is due to the lack of information regarding which block the SNPs belong to. By adding ten more SNPs to both marker sets, the two 20-marker sets generated five blocks, as shown in Figures 3a and 3b. As we included additional SNPs in the marker set within this region, i.e. as we increased the marker density, the number of blocks increased for two reasons. First, the old large blocks at lower densities are often broken into smaller pieces at higher density. For example, in Figure 3a, block 7 in marker set 40 became block 7 and block 8 when two more SNPs (numbers 35 and 40) were added to this region. Secondly, new blocks emerged from areas where there was a lack of information due to the lack of markers in the smaller marker set, such as block 3 in marker set 20 in Figure 3a and block 1 in marker set 30 in Figure 3b. The block boundaries were obviously not random but were in fact quite consistent across different marker sets.
Figure 3

Two representative samples of block partitions on region 52a using 46 original SNP markers from the African-American population and 10, 20, 30, 40 marker sets generated by partially fixed marker selection method. Each block is denoted by the shaded areas above. Labels such as '1', '2', etc on each shaded area indicate the position where a particular SNP was selected in the marker set, as well as which block it is on.


Discussion

Our studies have clearly demonstrated that sample size and marker selection have a significant impact on the number of blocks and the total number of Tag SNPs inferred from a population sample. As we include more individuals in our sample, both the number of blocks and the total number of Tag SNPs increase. For a shorter region with fewer SNP markers, like region 31a (181.98 kb, 23 SNPs), 20 people may be adequate to infer the haplotype patterns, while for a longer region with more SNP markers, such as 52a (237.22 kb, 46 SNPs) and 42a (409.92 kb, 100 SNPs), the required sample size may be 30 or more. The minimal sample size needed for a reliable haplotype structure inference clearly depends on the structure of the region being investigated. Although the patterns of block boundary and the set of Tag SNPs selected look very similar on average across all sample sizes, there is more variation from one simulated sample to another when the sample size is small. In addition, the set of Tag SNPs selected in the Japanese and Chinese population seems to be a subset of those in the African-American population [8]. This observation, however, may be due to the ascertainment of the specific set of markers being examined in the original study. Our marker selection results demonstrate that the number of blocks and the total number of Tag SNPs increase as more SNP markers are included. In addition, our results indicate that we would need to include more SNP markers in these regions in order to draw a valid conclusion on the number of blocks and Tag SNPs. The number of SNPs needed for a reliable inference on the haplotype structures may be a function of both the region and the specific population under study. Another issue to bear in mind is that our haplotypes were inferred from genotype data, not directly observed. 
Although the accuracy is quite high, greater than 80 per cent,[17] it is likely that the results may differ if different algorithms are used to reconstruct individual haplotypes. In addition, the inaccuracy in haplotype inference may contribute to the observed sample size effect. It should also be noted that the specific set of parameters used in the HapBlock program in our analysis to infer blocks and Tag SNPs does not affect the general patterns for the impact of the sample size and marker selection on the inferred haplotype structures (results not shown). In summary, our study indicates that sample size and marker selection have a significant impact on the inferred haplotype structures reflected in the haplotype blocks and Tag SNPs. Although haplotype blocks may be an over-simplistic representation of the haplotype structures,[14] we hypothesise that the impact would have been equally significant if we had used other approaches to analysing haplotype structures in the human genome. In order to draw valid conclusions on haplotype block structure, we need a relatively large sample size and a dense marker panel and we need to make adaptive adjustments according to the specific region and specific population to be studied.

Appendix A

Table 5

Region 52a (Chromosome 22, 237.22 kb)

Population B (African-American)†
# of blocks = 9; total # of Tag SNPs = 19

BlockID      NumTagSNP  StartPos  EndPos  BlockSize  NumHap
Block_0001   3          1         4       4          100
Block_0002   2          5         9       5          100
Block_0003   2          10        19      10         100
Block_0004   2          20        24      5          100
Block_0005   2          25        28      4          100
Block_0006   2          29        33      5          100
Block_0007   2          34        35      2          100
Block_0008   2          36        43      8          100
Block_0009   2          44        46      3          100

Block        Tag SNPs   Score     Highest*
Block_0001   1, 4, 5    0.95825   *
Block_0002   7, 10      0.94339   *
Block_0002   9, 10      0.90594
Block_0003   11, 15     0.93446
Block_0003   11, 17     0.93107
Block_0003   11, 18     0.93234
Block_0003   11, 19     0.9346    *
Block_0003   15, 20     0.92013
Block_0003   17, 20     0.91561
Block_0003   18, 20     0.91455
Block_0003   19, 20     0.92754
Block_0004   25, 26     0.90927   *
Block_0005   27, 29     0.91095   *
Block_0006   31, 33     0.90816   *
Block_0006   32, 33     0.90614
Block_0007   36, 37     1         *
Block_0008   40, 42     0.90181   *
Block_0009   46, 47     0.93096   *
Block_0009   46, 48     0.92839

Population C (Japanese & Chinese)†
# of blocks = 6; total # of Tag SNPs = 10

BlockID      NumTagSNP  StartPos  EndPos  BlockSize  NumHap
Block_0001   1          1         1       1          84
Block_0002   2          2         22      21         84
Block_0003   2          23        29      7          84
Block_0004   2          30        34      5          84
Block_0005   2          35        43      9          84
Block_0006   1          44        45      2          84

Block        Tag SNPs   Score     Highest*
Block_0001   1          1.00000   *
Block_0002   7, 15      0.96085   *
Block_0002   7, 17      0.96085
Block_0002   7, 18      0.96085
Block_0002   7, 19      0.96085
Block_0003   25, 27     0.92191
Block_0003   25, 31     0.90516
Block_0003   26, 27     0.92676   *
Block_0003   26, 31     0.91236
Block_0003   27, 31     0.92645
Block_0004   32, 36     0.90335
Block_0004   34, 36     0.9073    *
Block_0005   37, 39     0.93032
Block_0005   37, 40     0.92593
Block_0005   39, 46     0.93265   *
Block_0005   40, 46     0.92716
Block_0006   48         0.95869   *

† Tag SNPs are in combined order.

* Marks the Tag SNP set that scored the highest in each block by the HapBlock program.

Table 6
SNP_ID  COMBINED ORDER  POP_B ORDER  POP_C ORDER  CHROM_POS  POP_B BLOCK  POP_C BLOCK
110924  1               1            1            40077996   Block_0001   Block_0001
110926  2               2            2            40078865   Block_0001   Block_0002
110525  3               NA           3            40104585   NA           Block_0002
110527  4               3            4            40112652   Block_0001   Block_0002
110528  5               4            5            40120338   Block_0001   Block_0002
110529  6               5            6            40120419   Block_0002   Block_0002
3884    7               6            7            40131747   Block_0002   Block_0002
117587  8               7            8            40147031   Block_0002   Block_0002
117590  9               8            9            40147256   Block_0002   Block_0002
91037   10              9            10           40159355   Block_0002   Block_0002
82256   11              10           11           40162170   Block_0003   Block_0002
117575  12              11           NA           40163399   Block_0003   NA
117578  13              12           NA           40163843   Block_0003   NA
3943    14              13           12           40163920   Block_0003   Block_0002
2442    15              14           13           40164108   Block_0003   Block_0002
117580  16              15           14           40164192   Block_0003   Block_0002
117581  17              16           15           40164236   Block_0003   Block_0002
117582  18              17           16           40164840   Block_0003   Block_0002
117583  19              18           17           40165138   Block_0003   Block_0002
37728   20              19           18           40165262   Block_0003   Block_0002
14523   21              NA           19           40166038   NA           Block_0002
82025   22              20           20           40166144   Block_0004   Block_0002
84395   23              21           21           40168971   Block_0004   Block_0002
117586  24              22           22           40173352   Block_0004   Block_0002
117592  25              23           23           40182141   Block_0004   Block_0003
117593  26              24           24           40182498   Block_0004   Block_0003
117596  27              25           25           40207457   Block_0005   Block_0003
26726   28              26           26           40218483   Block_0005   Block_0003
16893   29              27           27           40229786   Block_0005   Block_0003
11692   30              28           28           40241571   Block_0005   Block_0003
117608  31              29           29           40242422   Block_0006   Block_0003
32936   32              30           30           40249849   Block_0006   Block_0004
117566  33              31           31           40250303   Block_0006   Block_0004
44133   34              32           32           40250387   Block_0006   Block_0004
117567  35              33           33           40256951   Block_0006   Block_0004
23139   36              34           34           40257384   Block_0007   Block_0004
118681  37              35           35           40283200   Block_0007   Block_0005
99869   38              36           36           40283420   Block_0008   Block_0005
2584    39              37           37           40284703   Block_0008   Block_0005
118669  40              38           38           40285521   Block_0008   Block_0005
118674  41              39           39           40294440   Block_0008   Block_0005
30109   42              40           40           40295018   Block_0008   Block_0005
118676  43              41           41           40300494   Block_0008   Block_0005
88347   44              42           NA           40303907   Block_0008   Block_0005
118679  45              43           42           40303949   Block_0008   NA
88348   46              44           43           40303993   Block_0009   Block_0005
3742    47              45           44           40314969   Block_0009   Block_0006
54      48              46           45           40315218   Block_0009   Block_0006
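
To make the comparison in Table 6 concrete, the Python sketch below locates the combined-order positions at which a new block begins in each population. It uses only the first 12 rows of the table, treats NA as "SNP absent from that population's marker set" (an assumption about the table's meaning), and the helper name `block_starts` is hypothetical.

```python
# First 12 rows of Table 6: (combined order, POP_B block, POP_C block).
# None stands for "NA". This excerpt is for illustration only.
rows = [
    (1, "Block_0001", "Block_0001"),
    (2, "Block_0001", "Block_0002"),
    (3, None, "Block_0002"),
    (4, "Block_0001", "Block_0002"),
    (5, "Block_0001", "Block_0002"),
    (6, "Block_0002", "Block_0002"),
    (7, "Block_0002", "Block_0002"),
    (8, "Block_0002", "Block_0002"),
    (9, "Block_0002", "Block_0002"),
    (10, "Block_0002", "Block_0002"),
    (11, "Block_0003", "Block_0002"),
    (12, "Block_0003", None),
]


def block_starts(table_rows, column):
    """Return combined-order positions where a new block label first appears.

    Rows whose label is None (NA) are skipped, so a missing SNP does not
    count as a block boundary.
    """
    starts = []
    previous = None
    for row in table_rows:
        label = row[column]
        if label is None:
            continue
        if label != previous:
            starts.append(row[0])
            previous = label
    return starts


pop_b_starts = block_starts(rows, 1)  # column 1 holds the POP_B block label
pop_c_starts = block_starts(rows, 2)  # column 2 holds the POP_C block label

print("POP_B block starts:", pop_b_starts)
print("POP_C block starts:", pop_c_starts)
```

Within this excerpt the two populations place their block boundaries at different SNPs, which is the kind of difference the full table documents across the region.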
