Xiao Sun, J Claiborne Stephens, Hongyu Zhao.
Abstract
Several studies of haplotype structure in various human populations have found that each chromosome can be divided into many blocks, within which haplotype diversity is limited. In addition, only a few genetic markers in a putative block are needed to capture most of the diversity within that block. There has, however, been no systematic empirical study of the effects of sample size and marker set on the identified block structures and representative marker sets. The purpose of this study was to examine such effects in detail. Towards this goal, we analysed three representative autosomal regions from a large genome-wide study of haplotypes, with samples consisting of African-Americans and samples consisting of Japanese and Chinese individuals. For both populations, we found that the sample size and marker set have a significant impact on the number of blocks and the total number of representative markers identified. The marker set in particular has a very strong impact, and our results indicate that the marker density in the original datasets may not be adequate to allow a meaningful characterisation of haplotype structures. In general, we conclude that a relatively large sample size and a very dense marker panel are needed in the study of haplotype structures in human populations.
Year: 2004 PMID: 15588478 PMCID: PMC3525083 DOI: 10.1186/1479-7364-1-3-179
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Frequency of the number of blocks in which the number of individuals is varied in simulations
* The sum does not always equal 100. See the Results section for a detailed explanation.
Frequency of the total number of Tag SNPs when the number of individuals is varied in simulations
* The sum does not always equal 100. See the Results section for a detailed explanation.
Figure 1. Frequency of each SNP being selected as a block boundary against its chromosomal location in individual selection for Region 52a. (a) African-American. (b) Japanese & Chinese. + indicates the positions of block boundaries using the original sample.
Figure 2. Weighted frequency of the selected Tag SNPs for Region 52a when the number of individuals is varied in simulations. Arrows indicate the Tag SNPs scoring highest in the block using the original sample.
Frequency of the number of blocks when the number of markers is varied in simulations
* The sum does not always equal 100. See the Results section for a detailed explanation.
Frequency of the total number of Tag SNPs when the number of markers is varied in simulations
* The sum does not always equal 100. See the Results section for a detailed explanation.
Figure 3. Two representative samples of block partitions on Region 52a using the 46 original SNP markers from the African-American population and marker sets of 10, 20, 30 and 40 markers generated by the partially fixed marker selection method. Each block is denoted by a shaded area. Labels such as '1', '2', etc. on each shaded area indicate the position where a particular SNP was selected in the marker set, as well as the block to which it belongs.
Region 52a (Chromosome 22, 237.22 kb)
| Population B (African-American)† | ||||||
|---|---|---|---|---|---|---|
| # of blocks = 9 | Total # of Tag SNPs = 19 | |||||
| BlockID | NumTagSNP | StartPos | EndPos | BlockSize | NumHap | |
| Block_0001 | 3 | 1 | 4 | 4 | 100 | |
| Block_0002 | 2 | 5 | 9 | 5 | 100 | |
| Block_0003 | 2 | 10 | 19 | 10 | 100 | |
| Block_0004 | 2 | 20 | 24 | 5 | 100 | |
| Block_0005 | 2 | 25 | 28 | 4 | 100 | |
| Block_0006 | 2 | 29 | 33 | 5 | 100 | |
| Block_0007 | 2 | 34 | 35 | 2 | 100 | |
| Block_0008 | 2 | 36 | 43 | 8 | 100 | |
| Block_0009 | 2 | 44 | 46 | 3 | 100 | |
| Tag SNP for block_0001 | Tag SNP for block_0005 | |||||
| 1 | 4 | 5 | 0.95825 | 27 | 29 | 0.91095 |
| 1 | 4 | 5 | 0.95825 - 1* | 27 | 29 | 0.91095 - 1* |
| Tag SNP for block_0002 | Tag SNP for block_0006 | |||||
| 7 | 10 | 0.94339 | 31 | 33 | 0.90816 | |
| 9 | 10 | 0.90594 | 32 | 33 | 0.90614 | |
| 7 | 10 | 0.94339 - 1* | 31 | 33 | 0.90816 - 1* | |
| Tag SNP for block_0003 | Tag SNP for block_0007 | |||||
| 11 | 15 | 0.93446 | 36 | 37 | 1 | |
| 11 | 17 | 0.93107 | 36 | 37 | 1 - 1* | |
| 11 | 18 | 0.93234 | ||||
| 11 | 19 | 0.9346 | Tag SNP for block_0008 | |||
| 15 | 20 | 0.92013 | 40 | 42 | 0.90181 | |
| 17 | 20 | 0.91561 | 40 | 42 | 0.90181 - 1* | |
| 18 | 20 | 0.91455 | ||||
| 19 | 20 | 0.92754 | Tag SNP for block_0009 | |||
| 11 | 19 | 0.9346 - 1* | 46 | 47 | 0.93096 | |
| 46 | 48 | 0.92839 | ||||
| Tag SNP for block_0004 | 46 | 47 | 0.93096 - 1* | |||
| 25 | 26 | 0.90927 | ||||
| 25 | 26 | 0.90927 - 1* | ||||
| Population C (Japanese & Chinese)† | ||||||
| # of blocks = 6 | Total # of Tag SNPs = 10 | |||||
| BlockID | NumTagSNP | StartPos | EndPos | BlockSize | NumHap | |
| Block_0001 | 1 | 1 | 1 | 1 | 84 | |
| Block_0002 | 2 | 2 | 22 | 21 | 84 | |
| Block_0003 | 2 | 23 | 29 | 7 | 84 | |
| Block_0004 | 2 | 30 | 34 | 5 | 84 | |
| Block_0005 | 2 | 35 | 43 | 9 | 84 | |
| Block_0006 | 1 | 44 | 45 | 2 | 84 | |
| Tag SNP for block_0001 | Tag SNP for block_0004 | |||||
| 1 | 1.00000 | 32 | 36 | 0.90335 | ||
| 1 | 1.00000 - 1* | 34 | 36 | 0.9073 | ||
| 34 | 36 | 0.9073 - 1* | ||||
| Tag SNP for block_0002 | ||||||
| 7 | 15 | 0.96085 | Tag SNP for block_0005 | |||
| 7 | 17 | 0.96085 | 37 | 39 | 0.93032 | |
| 7 | 18 | 0.96085 | 37 | 40 | 0.92593 | |
| 7 | 19 | 0.96085 | 39 | 46 | 0.93265 | |
| 7 | 15 | 0.96085 - 1* | 40 | 46 | 0.92716 | |
| 39 | 46 | 0.93265 - 1* | ||||
| Tag SNP for block_0003 | ||||||
| 25 | 27 | 0.92191 | Tag SNP for block_0006 | |||
| 25 | 31 | 0.90516 | 48 | 0.95869 | ||
| 26 | 27 | 0.92676 | 48 | 0.95869 - 1* | ||
| 26 | 31 | 0.91236 | ||||
| 27 | 31 | 0.92645 | ||||
| 26 | 27 | 0.92676 - 1* | ||||
† Tag SNP positions are given in combined order.
* Rows ending in "- 1" indicate the Tag SNPs that scored the highest in each block according to the HapBlock program.
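The block summaries above can be checked for internal consistency with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the rows are transcribed from the Region 52a table, the function name `check_partition` is hypothetical, and the second list is assumed to be the six-block partition with NumHap = 84. It confirms that the blocks tile the marker order without gaps, that each BlockSize equals EndPos - StartPos + 1, and that the NumTagSNP column sums to the stated totals.

```python
# Rows transcribed from the Region 52a block tables:
# (block_id, num_tag_snp, start_pos, end_pos, block_size)
POP_B_BLOCKS = [
    ("Block_0001", 3, 1, 4, 4),
    ("Block_0002", 2, 5, 9, 5),
    ("Block_0003", 2, 10, 19, 10),
    ("Block_0004", 2, 20, 24, 5),
    ("Block_0005", 2, 25, 28, 4),
    ("Block_0006", 2, 29, 33, 5),
    ("Block_0007", 2, 34, 35, 2),
    ("Block_0008", 2, 36, 43, 8),
    ("Block_0009", 2, 44, 46, 3),
]

# Assumption: this is the second partition in the table (6 blocks, NumHap 84).
POP_C_BLOCKS = [
    ("Block_0001", 1, 1, 1, 1),
    ("Block_0002", 2, 2, 22, 21),
    ("Block_0003", 2, 23, 29, 7),
    ("Block_0004", 2, 30, 34, 5),
    ("Block_0005", 2, 35, 43, 9),
    ("Block_0006", 1, 44, 45, 2),
]


def check_partition(blocks):
    """Validate a block table and return (n_blocks, total_tag_snps, n_markers)."""
    expected_start = 1
    for block_id, _num_tags, start, end, size in blocks:
        # Consecutive blocks must tile the marker order with no gap or overlap.
        if start != expected_start:
            raise ValueError(f"{block_id}: starts at {start}, expected {expected_start}")
        # BlockSize counts markers inclusively.
        if size != end - start + 1:
            raise ValueError(f"{block_id}: size {size} does not match {start}..{end}")
        expected_start = end + 1
    total_tags = sum(row[1] for row in blocks)
    return len(blocks), total_tags, expected_start - 1


print(check_partition(POP_B_BLOCKS))  # (9, 19, 46)
print(check_partition(POP_C_BLOCKS))  # (6, 10, 45)
```

The marker counts returned (46 and 45) agree with the last population-order positions listed in the SNP table for the two populations.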
| SNP_ID | COMBINED ORDER | POP_B ORDER | POP_C ORDER | CHROM_POS | POP_B BLOCK | POP_C BLOCK |
|---|---|---|---|---|---|---|
| 110924 | 1 | 1 | 1 | 40077996 | Block_0001 | Block_0001 |
| 110926 | 2 | 2 | 2 | 40078865 | Block_0001 | Block_0002 |
| 110525 | 3 | NA | 3 | 40104585 | NA | Block_0002 |
| 110527 | 4 | 3 | 4 | 40112652 | Block_0001 | Block_0002 |
| 110528 | 5 | 4 | 5 | 40120338 | Block_0001 | Block_0002 |
| 110529 | 6 | 5 | 6 | 40120419 | Block_0002 | Block_0002 |
| 3884 | 7 | 6 | 7 | 40131747 | Block_0002 | Block_0002 |
| 117587 | 8 | 7 | 8 | 40147031 | Block_0002 | Block_0002 |
| 117590 | 9 | 8 | 9 | 40147256 | Block_0002 | Block_0002 |
| 91037 | 10 | 9 | 10 | 40159355 | Block_0002 | Block_0002 |
| 82256 | 11 | 10 | 11 | 40162170 | Block_0003 | Block_0002 |
| 117575 | 12 | 11 | NA | 40163399 | Block_0003 | NA |
| 117578 | 13 | 12 | NA | 40163843 | Block_0003 | NA |
| 3943 | 14 | 13 | 12 | 40163920 | Block_0003 | Block_0002 |
| 2442 | 15 | 14 | 13 | 40164108 | Block_0003 | Block_0002 |
| 117580 | 16 | 15 | 14 | 40164192 | Block_0003 | Block_0002 |
| 117581 | 17 | 16 | 15 | 40164236 | Block_0003 | Block_0002 |
| 117582 | 18 | 17 | 16 | 40164840 | Block_0003 | Block_0002 |
| 117583 | 19 | 18 | 17 | 40165138 | Block_0003 | Block_0002 |
| 37728 | 20 | 19 | 18 | 40165262 | Block_0003 | Block_0002 |
| 14523 | 21 | NA | 19 | 40166038 | NA | Block_0002 |
| 82025 | 22 | 20 | 20 | 40166144 | Block_0004 | Block_0002 |
| 84395 | 23 | 21 | 21 | 40168971 | Block_0004 | Block_0002 |
| 117586 | 24 | 22 | 22 | 40173352 | Block_0004 | Block_0002 |
| 117592 | 25 | 23 | 23 | 40182141 | Block_0004 | Block_0003 |
| 117593 | 26 | 24 | 24 | 40182498 | Block_0004 | Block_0003 |
| 117596 | 27 | 25 | 25 | 40207457 | Block_0005 | Block_0003 |
| 26726 | 28 | 26 | 26 | 40218483 | Block_0005 | Block_0003 |
| 16893 | 29 | 27 | 27 | 40229786 | Block_0005 | Block_0003 |
| 11692 | 30 | 28 | 28 | 40241571 | Block_0005 | Block_0003 |
| 117608 | 31 | 29 | 29 | 40242422 | Block_0006 | Block_0003 |
| 32936 | 32 | 30 | 30 | 40249849 | Block_0006 | Block_0004 |
| 117566 | 33 | 31 | 31 | 40250303 | Block_0006 | Block_0004 |
| 44133 | 34 | 32 | 32 | 40250387 | Block_0006 | Block_0004 |
| 117567 | 35 | 33 | 33 | 40256951 | Block_0006 | Block_0004 |
| 23139 | 36 | 34 | 34 | 40257384 | Block_0007 | Block_0004 |
| 118681 | 37 | 35 | 35 | 40283200 | Block_0007 | Block_0005 |
| 99869 | 38 | 36 | 36 | 40283420 | Block_0008 | Block_0005 |
| 2584 | 39 | 37 | 37 | 40284703 | Block_0008 | Block_0005 |
| 118669 | 40 | 38 | 38 | 40285521 | Block_0008 | Block_0005 |
| 118674 | 41 | 39 | 39 | 40294440 | Block_0008 | Block_0005 |
| 30109 | 42 | 40 | 40 | 40295018 | Block_0008 | Block_0005 |
| 118676 | 43 | 41 | 41 | 40300494 | Block_0008 | Block_0005 |
| 88347 | 44 | 42 | NA | 40303907 | Block_0008 | Block_0005 |
| 118679 | 45 | 43 | 42 | 40303949 | Block_0008 | NA |
| 88348 | 46 | 44 | 43 | 40303993 | Block_0009 | Block_0005 |
| 3742 | 47 | 45 | 44 | 40314969 | Block_0009 | Block_0006 |
| 54 | 48 | 46 | 45 | 40315218 | Block_0009 | Block_0006 |
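Because block boundaries are reported in population order while Tag SNPs are reported in combined order, comparing the two requires a position mapping. The sketch below illustrates one way to do this for Population B only; it is an illustration under stated assumptions, not the authors' procedure. The helper names are hypothetical, the absent positions (combined order 3 and 21) are read from the rows where POP_B ORDER is NA, and the Tag SNP lists are the top-scoring ("- 1") rows of the block table.

```python
def population_to_combined(n_combined, absent):
    """Map 1-based population order to combined order, skipping absent SNPs."""
    present = [pos for pos in range(1, n_combined + 1) if pos not in absent]
    return dict(enumerate(present, start=1))


# Combined-order positions whose POP_B ORDER is NA in the SNP table.
POP_B_ABSENT = {3, 21}
POP_B_TO_COMBINED = population_to_combined(48, POP_B_ABSENT)

# block_id -> ((start, end) in Population B order, top-scoring Tag SNPs in combined order)
POP_B_BLOCKS = {
    "Block_0001": ((1, 4), [1, 4, 5]),
    "Block_0002": ((5, 9), [7, 10]),
    "Block_0003": ((10, 19), [11, 19]),
    "Block_0004": ((20, 24), [25, 26]),
    "Block_0005": ((25, 28), [27, 29]),
    "Block_0006": ((29, 33), [31, 33]),
    "Block_0007": ((34, 35), [36, 37]),
    "Block_0008": ((36, 43), [40, 42]),
    "Block_0009": ((44, 46), [46, 47]),
}


def tags_outside_blocks(blocks, to_combined):
    """Return (block_id, tag) pairs whose Tag SNP lies outside its block."""
    misplaced = []
    for block_id, ((start, end), tags) in blocks.items():
        # Translate the block's population-order span into combined order.
        members = {to_combined[pos] for pos in range(start, end + 1)}
        misplaced.extend((block_id, tag) for tag in tags if tag not in members)
    return misplaced


print(POP_B_TO_COMBINED[3], POP_B_TO_COMBINED[20], POP_B_TO_COMBINED[46])  # 4 22 48
print(tags_outside_blocks(POP_B_BLOCKS, POP_B_TO_COMBINED))  # []
```

An empty result means every top-scoring Tag SNP for Population B falls inside the block it is listed under once both are expressed in combined order.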