Audrey Sabbagh, Pierre Darlu.
Abstract
BACKGROUND: Numerous studies have attempted to relate genetic polymorphisms within the N-acetyltransferase 2 gene (NAT2) to interindividual differences in response to drugs or in disease susceptibility. However, genotyping individual single-nucleotide polymorphisms (SNPs) alone may not always provide enough information to reach these goals. It is important to link SNPs into haplotypes, which carry more information about the genotype-phenotype relationship. Special analytical techniques have been designed to determine unequivocally the allocation of mutations to either DNA strand. However, these molecular haplotyping methods are labour-intensive and expensive and do not appear to be good candidates for routine clinical applications. A cheap and relatively straightforward alternative is the use of computational algorithms. The objective of this study was to assess the performance of the computational approach in NAT2 haplotype reconstruction from phase-unknown genotype data, for population samples of various ethnic origins.
Year: 2005 PMID: 15932650 PMCID: PMC1173101 DOI: 10.1186/1471-2156-6-30
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
The major human NAT2 alleles and their associated phenotype a.
| Allele | Nucleotide change b | Phenotype | ||||||
| C282T | C481T | A803G | ||||||
| rapid | ||||||||
| x | slow | |||||||
| x | x | slow | ||||||
| x | slow | |||||||
| x | slow | |||||||
| slow | ||||||||
| slow | ||||||||
| x | slow | |||||||
| x | rapid | |||||||
| x | x | rapid | ||||||
| x | rapid | |||||||
| slow | ||||||||
| x | slow | |||||||
Nucleotide substitutions shown in bold have a functional consequence on enzyme activity. Bold-faced alleles contain functional polymorphisms and are hence associated with the slow acetylator phenotype. Classification of NAT2 alleles into clusters is based on the most functionally significant nucleotide substitution present: the NAT2*5, NAT2*6, NAT2*7 and NAT2*14 clusters carry signature nucleotide substitutions at positions 341, 590, 857 and 191, respectively, and are hence all decreased-function alleles ('slow alleles'). The other alleles display enzymatic activity comparable to that of the rapid acetylator allele NAT2*4. There are significant interethnic differences in NAT2 allele distribution and frequency [14, 15].
a Adapted from Hein et al. [3]. NAT2 nomenclature is accessible on the internet at website
b Only changes from the reference sequence (NAT2*4) are indicated.
Figure 1. The ambiguous gametic phase of haplotypes for a given multilocus genotype. To illustrate the relevance of linkage phase ascertainment, consider an individual who is heterozygous at four sites (positions 191, 341, 481 and 803) within the NAT2 coding sequence. Eight possible combinations of haplotypes can be inferred from this multilocus genotype, two of which are shown here. Depending on the allocation of mutations to either DNA strand, the individual's NAT2 genotype, composed of two multilocus haplotypes, will not be the same. Moreover, an incorrect resolution of mutation linkage patterns may entail an error in individual phenotype prediction: the subject will be classified as either a slow or a rapid acetylator depending on the haplotypic combination chosen. The symbol (*) marks mutations leading to a decrease in NAT2 enzyme activity, while the symbol (×) indicates those with no impact on the acetylator phenotype.
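The count of eight follows from pinning one mutation to a strand and letting each of the remaining three fall on either strand (2^3 = 8). The sketch below enumerates those configurations for the four sites named in the caption. The phenotype rule is a deliberate simplification introduced here for illustration (an assumption, not the study's classification procedure): a subject is called slow only when both haplotypes carry a substitution at one of the slow-cluster signature positions (191, 341, 590, 857).

```python
from itertools import product

# Heterozygous sites from the Figure 1 example (NAT2 coding-sequence positions).
SITES = (191, 341, 481, 803)
# Signature positions of the slow-allele clusters, as listed in the table notes.
SLOW_SITES = {191, 341, 590, 857}


def phase_configurations(sites):
    """Yield every unordered haplotype pair compatible with a genotype that is
    heterozygous at all of the given sites.

    Each haplotype is the frozenset of sites at which it carries the mutation.
    """
    first, rest = sites[0], sites[1:]
    # Pin the first mutation to haplotype A so that (A, B) and (B, A) are not
    # counted twice; every other mutation may sit on either strand.
    for on_a_flags in product((True, False), repeat=len(rest)):
        hap_a = {first} | {s for s, on_a in zip(rest, on_a_flags) if on_a}
        hap_b = set(sites) - hap_a
        yield frozenset(hap_a), frozenset(hap_b)


def predicted_phenotype(hap_a, hap_b):
    """Simplified rule (assumption): slow only if both haplotypes carry at
    least one slow-cluster signature substitution."""
    slow_haplotypes = sum(bool(h & SLOW_SITES) for h in (hap_a, hap_b))
    return "slow" if slow_haplotypes == 2 else "rapid"


configs = list(phase_configurations(SITES))
phenotypes = [predicted_phenotype(a, b) for a, b in configs]

for (a, b), pheno in zip(configs, phenotypes):
    print(sorted(a), "/", sorted(b), "->", pheno)
```

Under this rule the predicted phenotype depends only on whether the 191 and 341 mutations sit on the same strand, which is why the same unphased genotype can yield either call.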
Figure 2. Linkage disequilibrium (r² value) between SNP markers in the NAT2 locus. Graphical representation of the disequilibrium matrices obtained by computing the r² coefficient between each pair of markers, for the Spanish, Korean and Black South African samples. The British and Nicaraguan samples showed patterns and levels of LD comparable to those of the Spanish data. For each marker pair, GOLD [60] plotted the color-coded pairwise r² statistics at the Cartesian coordinates corresponding to marker location, and the plots were completed by interpolation. These graphs show strong LD between markers at positions 341, 481 and 803, as well as between the SNPs at positions 282 and 590: these markers are thus strongly predictive of one another. In Black South Africans, LD patterns are less pronounced and more diffuse across marker pairs.
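For readers unfamiliar with the statistic, the following is a minimal sketch of the standard pairwise r² for two biallelic sites, computed from phase-known haplotypes as D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA·pB. It is not the GOLD implementation, and the haplotype lists are toy data invented for the example.

```python
def r_squared(haplotypes, site_a, site_b):
    """Pairwise r^2 between two biallelic sites.

    haplotypes: iterable of sets, each holding the positions at which that
    haplotype carries the mutant allele.
    """
    haplotypes = list(haplotypes)
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(site_a in h for h in haplotypes) / n
    p_b = sum(site_b in h for h in haplotypes) / n
    p_ab = sum(site_a in h and site_b in h for h in haplotypes) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom


# Toy data: the two mutations always travel together (complete LD).
linked = [{341, 481}] * 3 + [set()] * 5
# Toy data: all four allele combinations equally frequent (no LD).
unlinked = [{341, 481}, {341}, {481}, set()]

print(r_squared(linked, 341, 481))
print(r_squared(unlinked, 341, 481))
```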
Performance of the four computational methods in haplotype identification, as measured by the I index.
| Population sample | Hapar | PL-EM | Haplotyper | PHASE |
| 258 Spanish [45] | 0.933 | 0.933 | 0.933 | 0.933 |
| 137 Nicaraguans [30] | 0.952 | 0.952 | 0.909 | 0.952 |
| 112 UK Caucasians [24] | 1 | 1 | 1 | 1 |
| 101 Black South Africans [24] | 0.917 | 0.917 | 0.917 | 1 |
| 1000 Koreans [31] | - * | 1 | 1 | 1 |
Numbers in brackets indicate the number of haplotypes for which an error of prediction was made. * The size of the Korean sample was too large to be correctly handled by the Hapar program.
Individual error rate in haplotype reconstruction
| Population sample | PL-EM | Haplotyper | PHASE |
| 258 Spanish [45] | 0.39% | 0.39% | 0.39% |
| 137 Nicaraguans [30] | 2.19% | 3.65% | 2.19% |
| 112 UK Caucasians [24] | 0.89% | 0% | 0% |
| 101 Black South Africans [24] | 3.96% | 3.96% | 2.97% |
| 1000 Koreans [31] | 0.30% | 0.30% | 0.30% |
The individual error rate is defined as the ratio of erroneous phase calls to the total number of phase calls (see text).
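A minimal sketch of this ratio, assuming one phase call per individual and treating a call as erroneous when the inferred haplotype pair differs from the molecularly determined pair. The haplotype labels and the four-person sample are hypothetical.

```python
def individual_error_rate(true_pairs, inferred_pairs):
    """Fraction of individuals whose inferred haplotype pair differs from the
    true (phase-known) pair. The order of haplotypes within a pair is ignored."""
    if len(true_pairs) != len(inferred_pairs):
        raise ValueError("both lists must cover the same individuals")
    if not true_pairs:
        raise ValueError("no individuals supplied")
    errors = sum(
        tuple(sorted(t)) != tuple(sorted(i))
        for t, i in zip(true_pairs, inferred_pairs)
    )
    return errors / len(true_pairs)


# Hypothetical sample of four individuals; the last one is phased incorrectly.
truth = [("A", "B"), ("A", "A"), ("B", "C"), ("A", "D")]
calls = [("B", "A"), ("A", "A"), ("B", "C"), ("B", "C")]

rate = individual_error_rate(truth, calls)
print(f"{rate:.2%}")
```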
Index of similarity (I) between haplotype frequencies estimated with and without molecular haplotyping information.
| Population sample | PL-EM | PHASE |
| 258 Spanish [45] | 0.996 | 0.996 |
| 137 Nicaraguans [30] | 0.986 | 0.986 |
| 112 UK Caucasians [24] | 0.994 | 0.998 |
| 101 Black South Africans [24] | 0.981 | 0.988 |
| 1000 Koreans [31] | 0.997 | 0.998 |
Computations of I indices were based on the haplotype frequency estimates obtained by considering all possible haplotype configurations (with a nonzero probability) inferred for each subject, weighted by their estimated probability. Note that such estimates can be rather different from those obtained by gene counting on the basis of the "best" reconstruction (that is, when only the most probable pair of haplotypes is selected for each sampled individual). Since Haplotyper only provides a summary of the frequency with which each haplotype occurred in the "best" reconstruction, it was excluded from the comparison.
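The difference between the two estimation schemes can be sketched as follows. The per-individual configuration lists, haplotype labels and probabilities are invented for illustration. The similarity index is computed as one minus half the sum of absolute frequency differences; this section does not give the formula, so that definition is an assumption.

```python
from collections import Counter


def weighted_frequencies(posteriors):
    """Haplotype frequencies using every configuration of every individual,
    each weighted by its estimated probability.

    posteriors: one list per individual of ((hap1, hap2), probability) items.
    """
    totals = Counter()
    for configs in posteriors:
        for (h1, h2), prob in configs:
            totals[h1] += prob
            totals[h2] += prob
    n_copies = 2 * len(posteriors)
    return {h: c / n_copies for h, c in totals.items()}


def best_pair_frequencies(posteriors):
    """Gene counting on the 'best' reconstruction: only the most probable
    haplotype pair of each individual is counted."""
    totals = Counter()
    for configs in posteriors:
        (h1, h2), _ = max(configs, key=lambda item: item[1])
        totals[h1] += 1
        totals[h2] += 1
    n_copies = 2 * len(posteriors)
    return {h: c / n_copies for h, c in totals.items()}


def similarity_index(freqs_a, freqs_b):
    """Assumed definition: 1 - 0.5 * sum of absolute frequency differences."""
    haps = set(freqs_a) | set(freqs_b)
    diff = sum(abs(freqs_a.get(h, 0.0) - freqs_b.get(h, 0.0)) for h in haps)
    return 1 - 0.5 * diff


# Hypothetical output of a phasing program for two individuals.
posteriors = [
    [(("A", "B"), 1.0)],                     # unambiguous genotype
    [(("A", "D"), 0.6), (("B", "C"), 0.4)],  # two competing configurations
]

weighted = weighted_frequencies(posteriors)
best = best_pair_frequencies(posteriors)
print(weighted)
print(best)
print(similarity_index(weighted, best))
```

In this toy case haplotype C has a nonzero weighted frequency but vanishes entirely from the "best" reconstruction, which is the kind of discrepancy the note describes.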
Figure 3. The change coefficient (C) as a function of haplotype frequency. The change coefficient reflects the discrepancy between haplotype frequencies deduced from phase-known data and those estimated computationally (here with the PHASE program). All haplotypes occurring in any of the five population samples are considered.
Average change coefficients of PL-EM and PHASE programs computed for three classes of haplotype frequency.
| | Haplotype frequency | | |
| Program | < 1% | 1–5% | > 5% |
| PL-EM | 30.6% | 8.3% | 1.2% |
| PHASE | 17.4% | 6.8% | 1.2% |
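The averaging by frequency class can be sketched as below. The exact formula for C is not given in this section, so the code assumes one plausible definition: the absolute difference between estimated and phase-known frequency, relative to the phase-known frequency, expressed as a percentage. Binning on the phase-known frequency is likewise an assumption, and all frequencies are toy values.

```python
def change_coefficient(true_freq, est_freq):
    """Assumed definition: relative absolute discrepancy, in percent."""
    if true_freq <= 0:
        raise ValueError("the phase-known frequency must be positive")
    return abs(est_freq - true_freq) / true_freq * 100.0


def frequency_class(freq):
    """Assign a frequency (as a proportion) to one of the three classes."""
    if freq < 0.01:
        return "< 1%"
    if freq <= 0.05:
        return "1-5%"
    return "> 5%"


def average_change_by_class(true_freqs, est_freqs):
    """Average C per class, binning on the phase-known frequency (assumption).
    Haplotypes missing from the estimate are treated as frequency 0."""
    grouped = {}
    for hap, f_true in true_freqs.items():
        c = change_coefficient(f_true, est_freqs.get(hap, 0.0))
        grouped.setdefault(frequency_class(f_true), []).append(c)
    return {cls: sum(vals) / len(vals) for cls, vals in grouped.items()}


# Toy frequencies for three hypothetical haplotypes.
true_freqs = {"A": 0.50, "B": 0.04, "C": 0.005}
est_freqs = {"A": 0.49, "B": 0.05}

averages = average_change_by_class(true_freqs, est_freqs)
print(averages)
```

With any definition of this relative form, a fixed absolute error weighs far more heavily on a rare haplotype than on a common one, consistent with the larger averages reported for the rarest class.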
The five phase-resolved NAT2 molecular data sets investigated.
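| Population sample | Multiply heterozygous individuals a | Expected heterozygosity b | Hardy-Weinberg p-value c |
| 258 Spanish [45] | 66.7% | 0.65 | |
| 137 Nicaraguans [30] | 59.1% | 0.70 | 0.072 |
| 112 UK Caucasians [24] | 52.7% | 0.69 | 0.222 |
| 101 Black South Africans [24] | 63.4% | 0.86 | 0.122 |
| 1000 Koreans [31] | 50.0% | 0.52 | |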
All population samples were genotyped for the same seven nucleotide changes (G191A, C282T, T341C, C481T, G590A, A803G, G857A), except the Korean sample, in which the C190T mutation was investigated instead of G191A.
a Proportion of multiply heterozygous individuals with ambiguous genotype whose phase has been resolved molecularly.
b Expected heterozygosities for the NAT2 haplotyped system were estimated as $H = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$, where n is the number of gene copies in the sample, and p_i is the sample frequency of the i-th haplotype.
c The significance of deviations from Hardy-Weinberg equilibrium was tested for genotypic data with known gametic phase using the random-permutation procedure implemented in the Arlequin package [51]: a Fisher's exact test using a Markov chain random walk algorithm was performed for each data set. The resulting p-values were considered significant if below 0.05 (significant p-values are shown in bold).
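Footnote b defines expected heterozygosity in terms of n (the number of gene copies) and p_i (the sample frequency of the i-th haplotype). A minimal sketch follows, assuming the usual unbiased gene-diversity estimator H = n/(n - 1) · (1 - Σ p_i²) is the one meant. The haplotype counts are invented and do not come from any of the five samples.

```python
def expected_heterozygosity(haplotype_counts):
    """Unbiased gene diversity, assuming H = n/(n-1) * (1 - sum(p_i^2)).

    haplotype_counts: mapping from haplotype label to the number of gene
    copies carrying it in the sample.
    """
    n = sum(haplotype_counts.values())
    if n < 2:
        raise ValueError("at least two gene copies are required")
    sum_sq = sum((count / n) ** 2 for count in haplotype_counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical sample of 10 gene copies (5 diploid individuals).
counts = {"A": 5, "B": 3, "C": 2}
h = expected_heterozygosity(counts)
print(round(h, 3))
```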