Romdhane Rekaya, Shannon Smith, El Hamidi Hay, Nourhene Farhat, Samuel E Aggrey.
Abstract
Errors in the binary status of some response traits are frequent in human, animal, and plant applications. These error rates tend to differ between cases and controls because diagnostic and screening tests have different sensitivity and specificity. This makes the classification of individuals into the correct groups less accurate, giving rise to both false-positive and false-negative cases. Analyzing binary responses that are noisy because of misclassification will undoubtedly reduce the statistical power of genome-wide association studies (GWAS). A threshold model that accommodates varying diagnostic errors between cases and controls was investigated. A simulation study was carried out in which several binary (case-control) data sets were generated with varying effects for the most influential single nucleotide polymorphisms (SNPs) and different diagnostic error rates for cases and controls. Each simulated data set consisted of 2000 individuals. Ignoring misclassification resulted in biased estimates of the effects of truly influential SNPs and inflated estimates for truly noninfluential markers. A substantial reduction in bias and an increase in accuracy ranging from 12% to 32% were observed when the misclassification procedure was invoked. In fact, the majority of influential SNPs that were not identified using the noisy data were captured using the proposed method. Additionally, truly misclassified binary records were identified with high probability using the proposed method. The superiority of the proposed method was maintained across different simulation parameters (misclassification rates and odds ratios), attesting to its robustness.
Keywords: binary responses; misclassification; sensitivity; specificity
Year: 2016 PMID: 27942229 PMCID: PMC5138056 DOI: 10.2147/TACG.S122250
Source DB: PubMed Journal: Appl Clin Genet ISSN: 1178-704X
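The simulation design described in the abstract, in which binary labels are corrupted at different rates for cases and controls, can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function name `add_label_noise`, the balanced split of 1000 cases and 1000 controls, and the assignment of the 5% rate to cases and the 0% rate to controls are all assumptions, since this page does not state them.

```python
import random


def add_label_noise(labels, rate_cases, rate_controls, seed=0):
    """Flip each true case (1) with probability rate_cases and each
    true control (0) with probability rate_controls.

    Returns the noisy labels and a parallel list of booleans marking
    which records were flipped.
    """
    rng = random.Random(seed)
    noisy, flipped = [], []
    for y in labels:
        rate = rate_cases if y == 1 else rate_controls
        flip = rng.random() < rate
        noisy.append(1 - y if flip else y)
        flipped.append(flip)
    return noisy, flipped


# Hypothetical data set of 2000 individuals (balanced split is an assumption).
true_labels = [1] * 1000 + [0] * 1000

# Assumed mapping: 5% error rate for cases, 0% for controls.
noisy_labels, flipped = add_label_noise(true_labels, 0.05, 0.00)
print("miscoded records:", sum(flipped))
```

Keeping the `flipped` mask alongside the noisy labels makes it possible to check later how many truly miscoded records a method recovers.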
Summary of the posterior distribution of the misclassification probability (π) for the four simulation scenarios (averaged over five replicates)
| True | Moderate: PM | Moderate: PSD | Extreme: PM | Extreme: PSD |
|---|---|---|---|---|
| 5%, 0% | 0.04, 0.002 | 0.006, 0.0003 | 0.05, 0.002 | 0.006, 0.0003 |
| 7%, 3% | 0.05, 0.02 | 0.008, 0.004 | 0.06, 0.02 | 0.007, 0.004 |
Note: Moderate effects for influential single nucleotide polymorphisms.
Abbreviations: PM, posterior mean; PSD, posterior standard deviation.
Figure 1. Average posterior misclassification probability for the 54 miscoded observations (A: moderate and C: extreme) and the 1946 correctly coded observations (B: moderate and D: extreme) when the misclassification rates were set to 5% and 0%.
Figure 2. Average posterior misclassification probability for the 98 miscoded observations (A: moderate and C: extreme) and the 1902 correctly coded observations (B: moderate and D: extreme) when the misclassification rates were set to 7% and 3%.
Percent of misclassified individuals correctly identified on the basis of two cutoff probabilities across the four simulation scenarios
| Cutoff probability | D1: Misclass | D1: Correct | D2: Misclass | D2: Correct | D3: Misclass | D3: Correct | D4: Misclass | D4: Correct |
|---|---|---|---|---|---|---|---|---|
| Hard | 0.65 | 0 | 0.94 | 0 | 0.44 | 0 | 0.97 | 0 |
| Soft | 1.00 | 0 | 0.98 | 0 | 0.86 | 0 | 1.00 | 0 |
Notes: Hard: cutoff probability was set at 0.5. Soft: cutoff probability was equal to the overall mean of the probabilities of being misclassified over the entire data set plus two standard deviations. Misclass: individuals who were misclassified. Correct: correctly coded individuals. The following data sets were simulated: 5% and 0% miscoding rates with moderate OR (D1) or extreme OR (D2); 7% and 3% miscoding rates with moderate OR (D3) or extreme OR (D4).
Abbreviation: OR, odds ratio.
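The two cutoff rules in the notes above can be sketched in a few lines. This is an illustrative sketch only: the function name `flag_misclassified`, the use of the population standard deviation, the strict `>` comparison, and the example probabilities are assumptions rather than details given in the source.

```python
from statistics import mean, pstdev


def flag_misclassified(probs, rule="hard"):
    """Flag records whose posterior misclassification probability
    exceeds the chosen cutoff.

    hard: fixed cutoff of 0.5.
    soft: mean of all probabilities plus two standard deviations
          (population SD is an assumption; the source does not specify).
    """
    if rule == "hard":
        cutoff = 0.5
    elif rule == "soft":
        cutoff = mean(probs) + 2 * pstdev(probs)
    else:
        raise ValueError("rule must be 'hard' or 'soft'")
    return cutoff, [p > cutoff for p in probs]


# Made-up posterior probabilities for 100 records (illustration only).
probs = [0.02] * 95 + [0.4] * 3 + [0.9] * 2

hard_cutoff, hard_flags = flag_misclassified(probs, "hard")
soft_cutoff, soft_flags = flag_misclassified(probs, "soft")
print("hard:", hard_cutoff, sum(hard_flags))
print("soft:", round(soft_cutoff, 3), sum(soft_flags))
```

When most records have probabilities near zero, the soft cutoff falls below 0.5, so it flags records with intermediate probabilities that the hard rule misses. That is consistent with the soft rule identifying at least as many miscoded individuals as the hard rule in every scenario of the table.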
Correlation between true* and estimated SNP effects under four simulation scenarios using noisy data analyzed with threshold models either ignoring (M2) or contemplating (M3) misclassification
| Model | 5% and 0%: Moderate | 5% and 0%: Extreme | 7% and 3%: Moderate | 7% and 3%: Extreme |
|---|---|---|---|---|
| M2 | 0.894 | 0.777 | 0.807 | 0.675 |
| M3 | 0.969 | 0.911 | 0.907 | 0.892 |
Notes: *True effects were calculated based on analysis of the true data (M1). Moderate effects for influential SNPs. M1: true data analyzed with a standard model. M2: noisy data analyzed with threshold model ignoring misclassification. M3: noisy data analyzed with threshold model contemplating misclassification (proposed method).
Abbreviation: SNP, single nucleotide polymorphism.
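The accuracy measure in the table above is a correlation between SNP effects estimated from the true data (M1) and those estimated from the noisy data (M2 or M3). A minimal sketch of that comparison follows, assuming a Pearson correlation (the source says only "correlation"). The effect values are made up for illustration and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical SNP effect estimates (illustration only).
effects_m1 = [0.80, 0.55, 0.30, 0.05, -0.02, 0.01]  # true data
effects_m2 = [0.50, 0.40, 0.10, 0.12, 0.08, -0.06]  # noisy, ignoring errors
effects_m3 = [0.74, 0.52, 0.26, 0.06, 0.00, 0.02]   # noisy, modeling errors

print("M1 vs M2:", round(pearson(effects_m1, effects_m2), 3))
print("M1 vs M3:", round(pearson(effects_m1, effects_m3), 3))
```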
Figure 3. Distribution of SNP effects for 5% and 0% misclassification rates. The effects are sorted in decreasing order based on estimates using M1 when odds ratios of influential SNPs are moderate (A) and extreme (B). M1: true data analyzed with a standard model. M2: noisy data analyzed with threshold model ignoring misclassification. M3: noisy data analyzed with threshold model contemplating misclassification (proposed method).
Abbreviation: SNP, single nucleotide polymorphism.
Figure 4. Distribution of SNP effects for 7% and 3% misclassification rates. The effects are sorted in decreasing order based on estimates using M1 when odds ratios of influential SNPs are moderate (A) and extreme (B). M1: true data analyzed with a standard model. M2: noisy data analyzed with threshold model ignoring misclassification. M3: noisy data analyzed with threshold model contemplating misclassification (proposed method).
Abbreviation: SNP, single nucleotide polymorphism.