Vanessa J Clark, Michael Dean.
Abstract
Chemokine signals and their cell-surface receptors are important modulators of HIV-1 disease and cancer. To aid future case/control association studies, we aim to further characterise the haplotype structure of variation in chemokine and chemokine receptor genes. To perform haplotype analysis in a population-based association study without family information or laboratory methods to establish phase, haplotypes must be determined by estimation. Here, we test the accuracy of estimates of haplotype frequency and linkage disequilibrium by comparing haplotypes estimated with the expectation maximisation (EM) algorithm to haplotypes determined from Centre d'Etude du Polymorphisme Humain (CEPH) pedigree data. To do this, we characterised haplotypes comprising alleles at 11 biallelic loci in four chemokine receptor genes (CCR3, CCR2, CCR5 and CCRL2), which span 150 kb on chromosome 3p21, and haplotypes of nine biallelic loci in six chemokine genes [MCP-1 (CCL2), Eotaxin (CCL11), RANTES (CCL5), MPIF-1 (CCL23), PARC (CCL18) and MIP-1alpha (CCL3)] on chromosome 17q11-12. Forty multi-generation CEPH families, totalling 489 individuals, were genotyped by the TaqMan 5'-nuclease assay. Phased haplotypes and haplotypes estimated from unphased genotypes were compared in 103 grandparents, who were assumed to have mated at random. For the 3p21 single nucleotide polymorphism (SNP) data, haplotypes determined by pedigree analysis and haplotypes generated by the EM algorithm were nearly identical. Linkage disequilibrium, measured by the D' statistic, was nearly maximal across the 150 kb region, with complete disequilibrium maintained between the markers at its extremes, CCR3-Y17Y and CCRL2-I243V. D' values calculated from estimated haplotypes on 3p21 were highly concordant with pairwise comparisons between pedigree-phased chromosomes.
Conversely, estimated and pedigree-phased haplotypes agreed less well, in both haplotype frequencies and linkage disequilibrium, for SNPs on chromosome 17q11-12. These results suggest that, while estimation of haplotype frequency and linkage disequilibrium may be relatively simple for the 3p21 chemokine receptor cluster in population samples, the more complex environment on chromosome 17q11-12 will require higher-resolution haplotype analysis.
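The abstract describes estimating haplotypes from unphased genotypes with the EM algorithm (these estimates are labelled MLOCUS in the tables). As an illustration only, here is a minimal, generic EM sketch for biallelic SNPs. It assumes Hardy-Weinberg proportions (random mating) and the tables' "1"/"2" allele coding. It is not the MLOCUS program, and the function names `phase_options` and `em_haplotypes` are hypothetical. The sketch enumerates every phase resolution of each genotype. That cost is exponential in the number of heterozygous loci but stays manageable for the 9 to 14 loci typed here.

```python
from collections import defaultdict
from itertools import product


def phase_options(genotype):
    """Unordered haplotype pairs consistent with one unphased genotype.

    `genotype` lists the two alleles at each locus in arbitrary order,
    using the tables' "1"/"2" coding, e.g. ["12", "11", "12"].
    """
    per_locus = []
    phase_fixed = False
    for alleles in genotype:
        a, b = alleles[0], alleles[1]
        if a == b:
            per_locus.append([(a, a)])
        elif not phase_fixed:
            # Fix the orientation of the first heterozygous locus so that
            # each unordered pair {h1, h2} is generated exactly once.
            per_locus.append([(a, b)])
            phase_fixed = True
        else:
            per_locus.append([(a, b), (b, a)])
    pairs = []
    for combo in product(*per_locus):
        h1 = "".join(x for x, _ in combo)
        h2 = "".join(y for _, y in combo)
        pairs.append((h1, h2))
    return pairs


def em_haplotypes(genotypes, max_iter=1000, tol=1e-10):
    """EM estimate of haplotype frequencies from unphased genotypes."""
    options = [phase_options(g) for g in genotypes]
    haps = sorted({h for opts in options for pair in opts for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for opts in options:
            # E-step: weight each phase resolution by p(h1) * p(h2), assuming
            # Hardy-Weinberg proportions. The factor 2 for h1 != h2 is the same
            # for every option of one individual, so it cancels on normalising.
            weights = [freq[h1] * freq[h2] for h1, h2 in opts]
            total = sum(weights)
            for (h1, h2), w in zip(opts, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts over the number of chromosomes.
        new_freq = {h: counts[h] / n_chrom for h in haps}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < tol:
            break
    return freq


# Toy data: two loci; the last individual is a double heterozygote.
genotypes = [["11", "11"]] * 4 + [["22", "22"]] * 4 + [["12", "12"]]
print(em_haplotypes(genotypes))
```

In the toy data, the double heterozygote is resolved almost entirely as 11/22, because those haplotypes are common among the homozygotes. When every genotype is unambiguous, the EM simply returns observed haplotype counts divided by the number of chromosomes.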
Year: 2004 PMID: 15588479 PMCID: PMC3525080 DOI: 10.1186/1479-7364-1-3-195
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Biallelic loci typed in CEPH pedigrees
| Haplotype position | Gene | NCBI LocusLink | Nucleotide/AA position | NCBI GenBank number | NCBI contig | Contig position | NCBI dbSNP ss# | Allele 1 | CEPH grandparent (GP) frequency |
|---|---|---|---|---|---|---|---|---|---|
| 3p21 | | | | | | | | | |
| 1 | CCR3 | 1232 | Y17Y, A/G | NM_001837 | NT_05827 | 3997337 | 4987053 | A | 0.907 |
| | CCR2 | 1231 | -5983 G/A | U95626 | NT_05827 | 4083672 | | G | 1 |
| 2 | CCR2 | 1231 | -5048 G/T | U95626 | NT_05827 | 4084607 | 3918357 | G | 0.892 |
| | CCR2 | 1231 | -4866 C/G | U95626 | NT_05827 | 4084789 | 3918370 | C | 1 |
| 3 | CCR2 | 1231 | -3433 T/C | U95626 | NT_05827 | 4086222 | 3092964 | T | 0.802 |
| 4 | CCR2 | 1231 | V64I, C/T | NM_000647 | NT_05827 | 4089845 | 1799864 | C | 0.898 |
| 5 | CCR2 | 1231 | N260N, A/G | NM_000647 | NT_05827 | 4090435 | 1799865 | A | 0.696 |
| 6 | CCR5 | 1234 | 208 C/A | NM_000579 | NT_05827 | 4102477 | 2734648 | G | 0.631 |
| 7 | CCR5 | 1234 | 303 C/T | NM_000579 | NT_05827 | 4102572 | 1799987 | G | 0.524 |
| 8 | CCR5 | 1234 | 676 T/C | NM_000579 | NT_05827 | 4102945 | 1800023 | A | 0.631 |
| 9 | CCR5 | 1234 | L55Q, T/A | NM_000579 | NT_05827 | 4105194 | 1799863 | T | 0.976 |
| 10 | CCR5 | 1234 | Δ32 | NM_000579 | NT_05827 | | | NODEL | 0.905 |
| 11 | CCRL2 | 9034 | I243V, C/T | NM_003965 | NT_05827 | 4140934 | 3204850 | C | 0.902 |
| | CCRL2 | 9034 | 1137 C/G | NM_003965 | NT_05827 | 4141344 | | C | 1 |
| 17q11-12 | | | | | | | | | |
| 1 | MCP-1 (CCL2) | 6347 | -362 C/G | M37719 | NT_010799 | 7315787 | 2857656 | C | 0.718 |
| 2 | Eotaxin (CCL11) | 6356 | -1382 C/T | Z92709 | NT_010799 | 7345226 | 4795895 | C | 0.777 |
| 3 | RANTES (CCL5) | 6352 | -8147 A/G | NM_002985 | NT_010799 | 8932972 | | A | 0.903 |
| 4(1) | MPIF-1 (CCL23) | 6368 | M106V, G/A | U85767 | NT_010799 | 9074064 | 1003645 | A | 0.832 |
| 5(2) | PARC (CCL18) | 6362 | -116 C/T | AB012113 | NT_010799 | 9125397 | 2015086 | C | 0.662 |
| 6(3) | PARC (CCL18) | 6362 | 81 G/A | AB012113 | NT_010799 | 9125563 | 2015070 | G | 0.97 |
| 7(4) | PARC (CCL18) | 6362 | 311 C/A | AB012113 | NT_010799 | 9125793 | 2015052 | A | 0.922 |
| 8(5) | PARC (CCL18) | 6362 | 6793 A/G | AB012113 | NT_010799 | 9132275 | 14304 | G | 0.909 |
| 9(6) | MIP-1alpha (CCL3) | 6348 | -1541 T/C | M23178 | NT_010799 | 9152727 | 1634497 | A | 0.705 |
Comparison of pedigree-derived and MLOCUS-estimated haplotype frequencies for the 3p21 chemokine receptor loci in CEPH grandparents (n = 103).
| Haplotype number | Haplotype | GP count | Pedigree frequency | MLOCUS frequency | Similarity index | MSE |
|---|---|---|---|---|---|---|
| 1 | 11111111111111 | 60 | 0.2913 | 0.2936 | 0.0024 | 0.00001 |
| 2 | 11111112221111 | 39 | 0.1893 | 0.1909 | 0.0015 | 0.00000 |
| 3 | 11112122221111 | 35 | 0.1699 | 0.1719 | 0.0020 | 0.00000 |
| 5 | 11211211111111 | 21 | 0.1019 | 0.1001 | 0.0019 | 0.00000 |
| 4 | 21111121211121 | 20 | 0.0971 | 0.0882 | 0.0089 | 0.00008 |
| 6 | 11111111111211 | 18 | 0.0874 | 0.0919 | 0.0045 | 0.00002 |
| 8 | 11111111112111 | 5 | 0.0243 | 0.0223 | 0.0020 | 0.00000 |
| 7 | 11112121111111 | 4 | 0.0194 | 0.0186 | 0.0008 | 0.00000 |
| | 11112121211121 | 2 | 0.0097 | 0.0098 | 0.0001 | 0.00000 |
| | 11212122221111 | 1 | 0.0049 | 0.0049 | 0.0000 | 0.00000 |
| | 11111121211111 | 1 | 0.0049 | 0.0049 | 0.0000 | 0.00000 |
The total similarity index (I_F) and mean squared error (MSE) values are indicated at the bottom of the table. Haplotypes present only in the MLOCUS estimates are denoted in italics. The haplotype number identifies the equivalent seven-SNP haplotype discussed in Clark et al. 2001 [25].
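The per-haplotype columns in the 3p21 table appear to be |p - q| (the "Similarity index" column) and (p - q)^2 (the "MSE" column). Here p is the pedigree frequency, the GP count divided by 206 chromosomes for the 103 grandparents (e.g. 60/206 = 0.2913), and q is the MLOCUS estimate. The sketch below recomputes those columns from the tabulated values. The summary statistics are assumptions. I_F is taken as 1 - 0.5 * sum|p - q|, which equals sum(min(p, q)) when both frequency sets sum to one. MSE is taken as the mean squared difference over the listed haplotypes. The paper may define its totals differently. The italic MLOCUS-only haplotypes are also absent from the extracted table, so totals from this sketch are not expected to reproduce the paper's. The function `compare_frequencies` is hypothetical.

```python
def compare_frequencies(counts, estimated):
    """Per-haplotype comparison of pedigree-phased and estimated frequencies.

    counts: haplotype -> number of pedigree-phased chromosomes (GP count).
    estimated: haplotype -> EM frequency estimate (the MLOCUS column).
    Returns (rows, i_f, mse); each row is (haplotype, p, q, |p - q|, (p - q)**2).
    """
    n_chrom = sum(counts.values())  # 206 chromosomes for 103 grandparents
    rows = []
    for hap in sorted(set(counts) | set(estimated)):
        p = counts.get(hap, 0) / n_chrom
        q = estimated.get(hap, 0.0)
        rows.append((hap, p, q, abs(p - q), (p - q) ** 2))
    # Assumed summary statistics (see lead-in): I_F = 1 - 0.5 * sum|p - q|
    # and MSE = mean squared difference over the haplotypes listed.
    i_f = 1 - 0.5 * sum(r[3] for r in rows)
    mse = sum(r[4] for r in rows) / len(rows)
    return rows, i_f, mse


# GP counts and MLOCUS estimates transcribed from the 3p21 table.
PEDIGREE_COUNTS = {
    "11111111111111": 60, "11111112221111": 39, "11112122221111": 35,
    "11211211111111": 21, "21111121211121": 20, "11111111111211": 18,
    "11111111112111": 5, "11112121111111": 4, "11112121211121": 2,
    "11212122221111": 1, "11111121211111": 1,
}
MLOCUS_FREQS = {
    "11111111111111": 0.2936, "11111112221111": 0.1909, "11112122221111": 0.1719,
    "11211211111111": 0.1001, "21111121211121": 0.0882, "11111111111211": 0.0919,
    "11111111112111": 0.0223, "11112121111111": 0.0186, "11112121211121": 0.0098,
    "11212122221111": 0.0049, "11111121211111": 0.0049,
}

rows, i_f, mse = compare_frequencies(PEDIGREE_COUNTS, MLOCUS_FREQS)
for hap, p, q, diff, sq in rows:
    print(f"{hap}  {p:.4f}  {q:.4f}  {diff:.4f}  {sq:.5f}")
print(f"I_F = {i_f:.4f}, MSE = {mse:.6f}")
```

The same function applies unchanged to the 17q11-12 tables, where the GP counts sum to 174 (87 grandparents) and 192 (96 grandparents) chromosomes respectively.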
Comparison of pedigree-phased haplotypes for nine SNPs over 2 Mb of 17q11-12 in Centre d'Etude du Polymorphisme Humain (CEPH) grandparents (n = 87) with MLOCUS estimates from unphased genotype data from the same individuals
| Haplotype | GP count | Pedigree frequency | MLOCUS frequency | Similarity index | MSE |
|---|---|---|---|---|---|
| 111111111 | 36 | 0.2069 | 0.2374 | 0.0305 | 0.0009 |
| 111111122 | 22 | 0.1264 | 0.1332 | 0.0067 | 0.0000 |
| 121111111 | 19 | 0.1092 | 0.1450 | 0.0358 | 0.0013 |
| 211111111 | 18 | 0.1034 | 0.0631 | 0.0404 | 0.0016 |
| 211111122 | 9 | 0.0517 | 0.0605 | 0.0088 | 0.0001 |
| 111211111 | 8 | 0.0460 | 0.0245 | 0.0214 | 0.0005 |
| 111111121 | 6 | 0.0345 | 0.0341 | 0.0004 | 0.0000 |
| 121122111 | 5 | 0.0287 | 0.0145 | 0.0143 | 0.0002 |
| 121111121 | 4 | 0.0230 | 0.0198 | 0.0032 | 0.0000 |
| 121211122 | 3 | 0.0172 | 0.0000 | 0.0172 | 0.0003 |
| 211122111 | 3 | 0.0172 | 0.0315 | 0.0143 | 0.0002 |
| 112111122 | 3 | 0.0172 | 0.0168 | 0.0004 | 0.0000 |
| 112111111 | 3 | 0.0172 | 0.0196 | 0.0023 | 0.0000 |
| 121111122 | 3 | 0.0172 | 0.0209 | 0.0037 | 0.0000 |
| 121211111 | 2 | 0.0115 | 0.0051 | 0.0064 | 0.0000 |
| 212111122 | 2 | 0.0115 | 0.0000 | 0.0115 | 0.0001 |
| 212111111 | 2 | 0.0115 | 0.0345 | 0.0230 | 0.0005 |
| 211221211 | 2 | 0.0115 | 0.0165 | 0.0050 | 0.0000 |
| 211111121 | 2 | 0.0115 | 0.0088 | 0.0027 | 0.0000 |
| 211211111 | 2 | 0.0115 | 0.0218 | 0.0103 | 0.0001 |
| 111121211 | 2 | 0.0115 | 0.0000 | 0.0115 | 0.0001 |
| 122211111 | 2 | 0.0115 | 0.0113 | 0.0002 | 0.0000 |
| 111122111 | 2 | 0.0115 | 0.0000 | 0.0115 | 0.0001 |
| 122111122 | 2 | 0.0115 | 0.0000 | 0.0115 | 0.0001 |
| 111211122 | 2 | 0.0115 | 0.0245 | 0.0130 | 0.0002 |
| 111111112 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| 112211111 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| 121222111 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| 211211122 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| 111221211 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| 121111112 | 1 | 0.0057 | 0.0066 | 0.0009 | 0.0000 |
| 111222111 | 1 | 0.0057 | 0.0136 | 0.0078 | 0.0001 |
| 122111111 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| 211122122 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
| 111121212 | 1 | 0.0057 | 0.0000 | 0.0057 | 0.0000 |
The similarity index (I_F) and the mean squared error (MSE) for the two haplotype analyses are indicated at the bottom of the table. Haplotypes present only in the MLOCUS analysis (frequency below 1 per cent) are not included.
Comparison of MLOCUS-estimated and pedigree-phased haplotypes for six SNPs in the 79 kb 'core' region of 17q11-12 in CEPH grandparents (n = 96)
| No. | Haplotype | GP count | Pedigree frequency | MLOCUS frequency | Similarity index | MSE |
|---|---|---|---|---|---|---|
| 1 | 111111 | 85 | 0.4427 | 0.4519 | 0.0092 | 0.00008 |
| 2 | 111122 | 46 | 0.2396 | 0.2632 | 0.0236 | 0.00056 |
| 3 | 211111 | 20 | 0.1042 | 0.0934 | 0.0108 | 0.00012 |
| 4 | 111121 | 12 | 0.0625 | 0.0610 | 0.0015 | 0.00000 |
| 5 | 122111 | 10 | 0.0521 | 0.0503 | 0.0018 | 0.00000 |
| 6 | 211122 | 6 | 0.0313 | 0.0148 | 0.0164 | 0.00027 |
| 7 | 221211 | 4 | 0.0208 | 0.0254 | 0.0045 | 0.00002 |
| 8 | 111112 | 3 | 0.0156 | 0.0063 | 0.0093 | 0.00009 |
| 9 | 121211 | 2 | 0.0104 | 0.0046 | 0.0058 | 0.00003 |
| 10 | 222111 | 2 | 0.0104 | 0.0174 | 0.0070 | 0.00005 |
| 11 | 121212 | 1 | 0.0052 | 0.0065 | 0.0013 | 0.00000 |
| 12 | 122122 | 1 | 0.0052 | 0.0000 | 0.0052 | 0.00003 |
Those haplotypes that are only present in the MLOCUS estimation results are denoted in italics.
Estimated D' values generated by two methods for all polymorphic loci in the 3p21 chemokine receptor gene region in the CEPH sample
Numbers above the diagonal for each table indicate the pairwise D' value for pairs of SNPs in the CEPH grandparent sample (n = 103). p values for each test are indicated below the diagonal. Values in the upper table (A) indicate D' values calculated in DnaSP from the pedigree-derived haplotypes. Values in the lower table (B) indicate D' values calculated using the PAIRWISE program from the MLOCUS haplotype frequency estimates. Those values in boxed cells indicate non-significant results. D' estimates in the lower table denoted in italics indicate differences from the values generated in DnaSP for that particular comparison of loci.
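The D' tables were computed with DnaSP (from pedigree-phased haplotypes) and PAIRWISE (from MLOCUS estimates). As a generic illustration, not the code of either program, the sketch below computes Lewontin's D' for two biallelic loci from phased chromosomes coded "1"/"2". The p value comes from a 1-df chi-square test on r^2 (chi-square = n * r^2 for n chromosomes). That test is an assumption, and DnaSP and PAIRWISE may use other tests. The function name `pairwise_ld` is hypothetical.

```python
import math


def pairwise_ld(haplotypes, i, j):
    """Lewontin's D' and a 1-df chi-square p value for loci i and j.

    haplotypes: phased chromosomes as strings over {"1", "2"}, e.g. "1121".
    Returns (d_prime, p_value), or (nan, nan) if either locus is monomorphic.
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n  # allele 1 frequency, locus i
    p_b = sum(h[j] == "1" for h in haplotypes) / n  # allele 1 frequency, locus j
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan"), float("nan")
    # D_max, the largest |D| allowed by the allele frequencies, depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    chi2 = n * d * d / denom  # equals n * r**2
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return d_prime, p_value


print(pairwise_ld(["11", "11", "12", "22"], 0, 1))
```

In the example, D' is 1 because one of the four two-locus haplotypes (21) is absent. This is the situation the abstract describes as complete disequilibrium between CCR3-Y17Y and CCRL2-I243V. With small samples, D' = 1 can arise even when the chi-square p value is not significant.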
Estimated D' values generated by two methods for all nine SNPs in the 2 Mb chemokine gene region on chromosome 17q11-12 in CEPH grandparents (n = 87)
Numbers above the diagonal for each table indicate the pairwise D' value for pairs of SNPs in the CEPH grandparent sample; p values for each test are indicated below the diagonal. Values in the upper table (A) indicate D' values calculated in DnaSP from the pedigree-derived haplotypes. Values in the lower table (B) indicate D' values calculated using the PAIRWISE program from the MLOCUS haplotype frequency estimates. Those values in boxed cells indicate non-significant results. D' estimates in the lower table denoted in italics indicate differences from the values generated in DnaSP for that particular comparison of loci.
Estimated D' values generated by two methods for six SNPs in the 79 kb 'core' region of three chemokine genes on chromosome 17q11-12 in CEPH grandparents
Numbers above the diagonal for each table indicate the pairwise D' value for pairs of SNPs in the CEPH grandparent sample (n = 96). p values for each test are indicated below the diagonal. Values in the upper table (A) indicate D' values calculated in DnaSP from the pedigree-derived haplotypes. Values in the lower table (B) indicate D' values calculated using the PAIRWISE program from the MLOCUS haplotype frequency estimates. Those values in boxed cells indicate non-significant results. D' estimates in the lower table denoted in italics indicate differences from the values generated in DnaSP for that particular comparison of loci.