Cédric Coulonges, Olivier Delaneau, Manon Girard, Hervé Do, Ronald Adkins, Jean-Louis Spadoni, Jean-François Zagury.
Abstract
BACKGROUND: Genetic association studies aim to find correlations between a disease state and genetic variations such as SNPs or combinations of SNPs, termed haplotypes. Some haplotypes have a particular biological meaning, such as those derived from SNPs located in promoters or from nonsynonymous SNPs. All these haplotypes are "subhaplotypes" because they refer to only a subset of the SNPs found in the gene. Until now, subhaplotypes were computed directly from the SNPs chosen to constitute them, without taking into account the information carried by the other SNPs located in the gene. In the present work, we describe an alternative approach, called the "global method", which takes into account all the SNPs known in the region, and we compare the efficacy of the "direct" and "global" methods.
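The difference between the two approaches can be illustrated with a minimal sketch. It assumes that phasing over all SNPs has already been done (by PHASE or any other tool) and shows only the projection step that a "global" approach would then apply to obtain subhaplotypes. The function names, the 0/1 allele coding, the six-SNP haplotypes, and the chosen subset are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def project(haplotype, snp_indices):
    """Keep only the alleles at the chosen SNP positions."""
    return "".join(haplotype[i] for i in snp_indices)

def global_subhaplotypes(phased_pairs, snp_indices):
    """Project each individual's full-length haplotype pair onto the SNP subset."""
    return [
        tuple(sorted(project(h, snp_indices) for h in pair))
        for pair in phased_pairs
    ]

# Hypothetical phased output over 6 SNPs (alleles coded 0/1), one pair per individual.
phased = [("010011", "110010"), ("010011", "010011"), ("110010", "000110")]
subset = [0, 1, 2]  # hypothetical subset, e.g. the SNPs located in a promoter

subs = global_subhaplotypes(phased, subset)
counts = Counter(h for pair in subs for h in pair)  # subhaplotype counts
```

A "direct" approach would instead restrict the genotypes to the subset columns before phasing, so the phasing step never sees the remaining SNPs.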
Year: 2006 PMID: 17067372 PMCID: PMC1636337 DOI: 10.1186/1471-2156-7-50
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Figure 1. Graphical representation comparing the individual error rates (IER) of the direct and global methods. This figure presents detailed graphs of the average error rates obtained by the two subhaplotyping methods, "direct" (in white) and "global" (in black), when they rely on the resolution with maximum probability (Rmax) produced by PHASE. Each graph corresponds to a different level of missing data introduced in the GH1 genotypic dataset (0%, 2%, 5%, 10%, 15% and 20%) and presents the mean IER of all the replicates tested. The error rate obtained by the global method is always lower.
Error rates obtained according to the level of missing information in the GH1 dataset
| MD | Method | IER | SimER | IF | Res rate |
| 0% | Local_3snp | 1.56% | 0.52% | 0.9904 | 100% |
| 0% | Local_5snp | 3.57% | 0.72% | 0.9784 | 100% |
| 0% | Local_7snp | 5.57% | 0.83% | 0.9704 | 100% |
| 2% | Local_3snp | 2.57% | 0.68% | 0.9898 | 100% |
| 2% | Local_5snp | 5.26% | 1.20% | 0.9792 | 100% |
| 2% | Local_7snp | 8.70% | 1.03% | 0.961 | 100% |
| 5% | Local_3snp | 3.65% | 0.79% | 0.9894 | 100% |
| 5% | Local_5snp | 7.68% | 1.20% | 0.972 | 100% |
| 5% | Local_7snp | 11.00% | 1.37% | 0.959 | 100% |
| 10% | Local_3snp | 6.60% | 1.33% | 0.983 | 99.87% |
| 10% | Local_5snp | 12.91% | 1.86% | 0.962 | 100% |
| 10% | Local_7snp | 16.41% | 1.88% | 0.946 | 100% |
| 15% | Local_3snp | 9.49% | 1.84% | 0.977 | 99.45% |
| 15% | Local_5snp | 17.00% | 2.33% | 0.953 | 100% |
| 15% | Local_7snp | 21.87% | 2.35% | 0.931 | 100% |
| 20% | Local_3snp | 12.45% | 2.41% | 0.97 | 97.02% |
| 20% | Local_5snp | 22.15% | 3.02% | 0.936 | 99.60% |
| 20% | Local_7snp | 26.44% | 2.81% | 0.916 | 100% |
Summary of the mean error rates obtained by each subhaplotyping method when using the Rmax resolution produced by PHASE. The average individual error rate (IER) and similarity error rate (SimER) were computed according to the level of missing data (MD) introduced in the population (0%, 2%, 5%, 10%, 15%, 20%). Tests were performed on randomly selected SNP subsets of size 3, 5, and 7 taken from the 14 SNPs present in the GH1 genomic dataset (see text and Material and Methods). This table also presents the average of the IF coefficients, which compare the accuracy of the subhaplotype frequencies found by each subhaplotyping method. The global method always fares better.
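The exact definitions of these measures are given in the paper's Material and Methods and are not reproduced here. As a rough sketch only, the code below assumes that IER is the share of individuals whose inferred haplotype pair differs from the true pair, and that IF is a frequency similarity index of the form 1 minus half the sum of absolute frequency differences. Both definitions, the function names, and the toy data are assumptions made for illustration.

```python
from collections import Counter

def individual_error_rate(true_pairs, inferred_pairs):
    """Assumed IER: fraction of individuals whose inferred pair is wrong."""
    wrong = sum(sorted(t) != sorted(i) for t, i in zip(true_pairs, inferred_pairs))
    return wrong / len(true_pairs)

def haplotype_frequencies(pairs):
    """Relative frequency of each haplotype across all chromosomes."""
    counts = Counter(h for pair in pairs for h in pair)
    total = sum(counts.values())
    return {h: k / total for h, k in counts.items()}

def frequency_similarity(true_pairs, inferred_pairs):
    """Assumed IF: 1 - 0.5 * sum of absolute frequency differences."""
    ft = haplotype_frequencies(true_pairs)
    fi = haplotype_frequencies(inferred_pairs)
    haplotypes = set(ft) | set(fi)
    return 1 - 0.5 * sum(abs(ft.get(h, 0.0) - fi.get(h, 0.0)) for h in haplotypes)

# Hypothetical true and inferred subhaplotype pairs for four individuals.
true_pairs = [("010", "110"), ("010", "010"), ("000", "110"), ("010", "000")]
inferred = [("110", "010"), ("010", "010"), ("010", "100"), ("010", "000")]

ier = individual_error_rate(true_pairs, inferred)      # one of four individuals is wrong
if_coeff = frequency_similarity(true_pairs, inferred)  # closer to 1 means closer frequencies
```

Under these assumed definitions, the order of the two haplotypes within a pair does not count as an error, and an IF of 1 would mean the estimated and true frequencies coincide.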
Error rates obtained according to the level of missing information in the APOE and simulated datasets
| MD | Method | IER | SimER | IF | Res rate |
| 0% | Local | 4.88% | 1.22% | 0.97 | 100% |
| 2% | Local | 5.12% | 1.19% | 0.972 | 100% |
| 5% | Local | 5.64% | 1.83% | 0.978 | 100% |
| 10% | Local | 6.89% | 1.91% | 0.972 | 100% |
| 15% | Local | 10.33% | 1.97% | 0.964 | 99.75% |
| 20% | Local | 15.21% | 2.54% | 0.943 | 99.21% |
| MD | Method | IER | SimER | IF | Res rate |
| 0% | Local | 0.81% | 0.10% | 0.989 | 100% |
| 2% | Local | 1.26% | 0.13% | 0.982 | 100% |
| 5% | Local | 1.46% | 0.18% | 0.975 | 100% |
| 10% | Local | 2.65% | 0.32% | 0.961 | 100% |
| 15% | Local | 4.80% | 0.59% | 0.947 | 100% |
| 20% | Local | 8.70% | 1.06% | 0.934 | 100% |
Summary of the average error rates found when working with subhaplotypes based on 4 SNPs out of 9 in the APOE genomic dataset and 10 SNPs out of 30 in the 10 simulated datasets.
IER: individual error rate, Res Rate: resolution rate, SimER: similarity error rate, MD: Missing data, IF: frequency error rate (see Material and Methods).
Error rates found by each method when using cut-offs for the probabilities provided by PHASE
| MD | Method | Abs IER | Rel IER | Res rate |
| 0% | Local | 2.56% | 2.65% | 96.54% |
| 2% | Local | 2.97% | 3.05% | 94.78% |
| 5% | Local | 3.17% | 3.51% | 90.42% |
| 10% | Local | 4.07% | 5.00% | 81.50% |
| 15% | Global | 5.83% | 6.56% | 88.88% |
| 20% | Local | 6.00% | 8.81% | 68.12% |
| MD | Method | Abs IER | Rel IER | Res rate |
| 0% | Local | 3.50% | 3.50% | 99.83% |
| 2% | Local | 4.18% | 4.26% | 97.47% |
| 5% | Local | 5.08% | 5.32% | 95.45% |
| 10% | Local | 7.24% | 8.04% | 90.06% |
| 15% | Local | 9.50% | 10.95% | 86.83% |
| 20% | Local | 11.96% | 14.45% | 82.77% |
This table summarizes the average error rates obtained when each method selects its resolution using a cut-off on the resolution probability given by PHASE instead of Rmax. The two cut-offs chosen were 50% and 70%. The error rate is almost always lower than with Rmax, but the number of assigned haplotypes decreases markedly.
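The relation between the three columns can be sketched as follows. The sketch assumes that an individual is "assigned" when the probability of its best resolution reaches the cut-off, that the absolute IER counts errors over all individuals, and that the relative IER counts errors over assigned individuals only. This reading is an assumption, although it agrees with the tabulated values (for example, 2.65% × 96.54% ≈ 2.56%). The function name and the toy calls are hypothetical.

```python
def cutoff_error_rates(calls, cutoff):
    """calls: list of (best_resolution_probability, is_correct), one per individual.

    Returns (absolute IER, relative IER, resolution rate) under the assumed
    definitions described in the lead-in.
    """
    assigned = [ok for prob, ok in calls if prob >= cutoff]
    errors = sum(not ok for ok in assigned)
    total = len(calls)
    res_rate = len(assigned) / total
    abs_ier = errors / total
    rel_ier = errors / len(assigned) if assigned else 0.0
    return abs_ier, rel_ier, res_rate

# Hypothetical calls for ten individuals.
calls = [
    (0.99, True), (0.85, True), (0.72, False), (0.65, True), (0.55, False),
    (0.40, False), (0.95, True), (0.90, True), (0.60, True), (0.30, True),
]

rates_50 = cutoff_error_rates(calls, 0.50)
rates_70 = cutoff_error_rates(calls, 0.70)
```

With these definitions the absolute IER equals the relative IER multiplied by the resolution rate, so raising the cut-off trades assigned individuals for fewer counted errors.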
IER: individual error rate, Abs IER: absolute IER, Rel IER: relative IER, Res Rate: resolution rate, MD: Missing Data.
Error rates of the combination method
| MD | Method | Abs IER | Rel IER | SimER | Res rate |
| 2% | combi_3snp | 1.83% | 1.84% | 0.17% | 100.00% |
| 2% | combi_5snp | 2.56% | 2.57% | 0.36% | 99.58% |
| 2% | combi_7snp | 4.89% | 4.89% | 0.54% | 99.23% |
| 5% | combi_3snp | 2.72% | 2.72% | 0.29% | 100.00% |
| 5% | combi_5snp | 3.64% | 3.69% | 0.57% | 98.39% |
| 5% | combi_7snp | 6.09% | 6.27% | 0.60% | 97.15% |
| 10% | combi_3snp | 3.16% | 3.22% | 0.45% | 98.06% |
| 10% | combi_5snp | 5.60% | 5.81% | 0.68% | 96.40% |
| 10% | combi_7snp | 8.12% | 8.54% | 0.67% | 95.12% |
| 15% | combi_3snp | 5.03% | 5.09% | 0.61% | 98.81% |
| 15% | combi_5snp | 5.98% | 6.31% | 0.82% | 94.70% |
| 15% | combi_7snp | 6.62% | 7.36% | 0.77% | 89.95% |
| 20% | combi_3snp | 4.09% | 4.56% | 0.83% | 89.78% |
| 20% | combi_5snp | 8.16% | 8.70% | 0.96% | 93.80% |
| 20% | combi_7snp | 6.43% | 7.34% | 0.90% | 87.54% |
This table presents the summary of average error rates when using the combination method (see Material and Methods). In that case, we present the example of the results obtained for subsets of 5 SNPs.
IER: individual error rate, Abs IER: absolute IER, Rel IER: relative IER, SimER: similarity error rate, Res Rate: resolution rate, MD: Missing Data.
Modification of the results obtained in the GRIV case-control study when using the various subhaplotyping methods
| Gene | Region | Direct | Global | Combination |
| IL10Receptor | Exon | 0.026 | *0.103 | 0.093 |
| IL4Receptor | Promoter | 0.019 | *0.072 | *0.088 |
| IL6 | Promoter | 0.059 | 0.012 | *0.009 |
*Best method regarding the missing data level.
This table presents the p-values of the Fisher's exact tests comparing the subhaplotype distributions between seropositive patients of the GRIV cohort (cases) and seronegative subjects (controls). The subhaplotypes were computed with the direct Rmax method as previously published [2, 43], with the global Rmax method, or with the combination Rmax method described in our study. The percentage of missing information was 6.7%, 11.1% and 14.8% for the IL10R, IL4R, and IL6 genotypic data, respectively. Some signals that were previously published as positive (p < 0.05) using the direct method become negative, while some signals that were previously published as negative become much stronger with the novel subhaplotyping methods. For IL10R, there is a deficit of information for cases and, as a consequence, a lower percentage of assigned haplotypes with the combination method, which is more restrictive.
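To make the test concrete, here is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2×2 table, for instance the counts of one subhaplotype versus all others in cases and controls. The study may well have compared full subhaplotype distributions rather than a single 2×2 table, so this is an illustration of the test itself and not the authors' procedure. The function name and the counts are hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts: rows are cases/controls, columns are
# chromosomes carrying a given subhaplotype / all other chromosomes.
p_value = fisher_exact_2x2(8, 2, 1, 5)
```

Because the p-value depends directly on the counts in each cell, reassigning even a few haplotypes between methods can move a result across the 0.05 threshold.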
A.H cases: percentage of assigned haplotypes attributed in the tests for cases
A.H control: percentage of assigned haplotypes attributed in the tests for controls.