Shannon Smith, El Hamidi Hay, Nourhene Farhat, Romdhane Rekaya.
Abstract
BACKGROUND: Misclassification has been shown to be highly prevalent in binary responses in both livestock and human populations. Leaving these errors uncorrected before analysis undermines the overall goal of genome-wide association studies (GWAS), including by reducing predictive power. A liability threshold model that accounts for misclassification was developed to assess the effects of diagnostic errors on GWAS. Four simulated scenarios of case-control datasets were generated. Each dataset consisted of 2000 individuals and was analyzed with varying odds ratios of the influential SNPs and misclassification rates of 5% and 10%.
Year: 2013 PMID: 24369108 PMCID: PMC3879434 DOI: 10.1186/1471-2156-14-124
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
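The abstract describes case-control datasets of 2000 individuals with misclassification rates of 5% and 10%. A minimal sketch of how such label noise could be injected is given below. It assumes each label is flipped independently with the stated probability and that cases and controls are balanced; neither assumption is confirmed by the record, and the function name `add_misclassification` is hypothetical.

```python
import random

def add_misclassification(labels, rate, seed=0):
    """Flip each binary label independently with probability `rate`.

    Independent flipping is an assumption of this sketch, not a detail
    taken from the source.
    """
    rng = random.Random(seed)
    noisy, flipped = [], []
    for y in labels:
        flip = rng.random() < rate
        noisy.append(1 - y if flip else y)
        flipped.append(flip)
    return noisy, flipped

# Hypothetical balanced dataset: 1000 cases (1) and 1000 controls (0).
true_labels = [1] * 1000 + [0] * 1000
noisy_labels, flipped = add_misclassification(true_labels, rate=0.05)
n_flipped = sum(flipped)
print(n_flipped, "of", len(true_labels), "labels flipped")
```

With independent flips the number of miscoded records varies around the nominal rate rather than matching it exactly.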
Summary of the posterior distribution of the misclassification probability for the four simulation scenarios (averaged over 10 replicates)
| True rate | PM (moderate) | HPD95% (moderate) | PM (extreme) | HPD95% (extreme) |
| 5% | 0.03 | 0.01-0.05 | 0.04 | 0.03-0.06 |
| 10% | 0.06 | 0.04-0.09 | 0.07 | 0.06-0.09 |
PM = posterior mean; HPD95% = 95% highest posterior density interval; moderate and extreme refer to the effect sizes of the influential SNPs.
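For readers unfamiliar with the two summaries in this table, the sketch below computes a posterior mean and a 95% highest posterior density interval from a set of posterior draws. The draws here come from an arbitrary Beta(5, 95) distribution chosen only for illustration; they are not the paper's MCMC output, and `hpd_interval` is a hypothetical helper that takes the shortest window covering the requested share of sorted samples.

```python
import random
import statistics

def hpd_interval(samples, mass=0.95):
    """Return the shortest interval containing `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, round(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

# Illustrative draws only (assumption): Beta(5, 95) has mean 0.05.
rng = random.Random(1)
draws = [rng.betavariate(5, 95) for _ in range(5000)]
post_mean = statistics.fmean(draws)
low, high = hpd_interval(draws)
print(f"PM = {post_mean:.3f}, HPD95% = {low:.3f}-{high:.3f}")
```

For a skewed posterior, as is typical for a small probability, the HPD interval is not symmetric around the mean.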
Correlation between true and estimated SNP effects under four simulation scenarios using noisy data (M2) and the proposed approach (M3)
| M2 | 0.828 | 0.664 | 0.714 | 0.558 |
| M3 | 0.925 | 0.843 | 0.864 | 0.815 |
1 True effects were calculated based on analysis of the true data (M1); 2 Moderate effects for influential SNPs.
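The correlations in this table compare SNP effects estimated from the true data (M1) with those estimated under M2 or M3. A minimal sketch of that comparison, assuming a plain Pearson correlation (the record does not name the correlation type) and using made-up effect values:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical effect estimates for five SNPs (not values from the paper).
effects_m1 = [0.90, 0.10, 0.50, 0.30, 0.70]
effects_m2 = [0.60, 0.20, 0.30, 0.35, 0.40]
r = pearson(effects_m1, effects_m2)
print(f"correlation with M1 effects: {r:.3f}")
```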
Figure 1. Distribution of SNP effects for 5% misclassification rate. The effects are sorted in decreasing order based on estimates using M1 when odds ratios of influential SNPs are moderate (A) and extreme (B). M1: misclassification was not present in the data. M2: misclassification was present in the data set but was not addressed. M3: misclassification was addressed using the proposed method.
Figure 2. Distribution of SNP effects for 10% misclassification rate. The effects are sorted in decreasing order based on estimates using M1 when odds ratios of influential SNPs are moderate (A) and extreme (B). M1: misclassification was not present in the data. M2: misclassification was present in the data set but was not addressed. M3: misclassification was addressed using the proposed method.
Number of the top 10% (15 SNPs) most influential SNPs that were correctly identified for all simulation scenarios using the noisy data (M2) and the proposed approach (M3)
| M2 | 12 | 10 | 10 | 9 |
| M3 | 14 | 13 | 13 | 12 |
1 Moderate and extreme odds ratios (OR) for influential SNPs.
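The counts in this table are overlaps between two rankings of SNPs. One way to compute such an overlap is sketched below, assuming SNPs are ranked by absolute effect size (the record does not state the ranking criterion). The helper `top_k_overlap` and the toy effect values are hypothetical.

```python
def top_k_overlap(true_effects, est_effects, k):
    """Count SNPs ranked in the top k by absolute effect in both lists."""
    def top(effects):
        order = sorted(range(len(effects)),
                       key=lambda i: abs(effects[i]), reverse=True)
        return set(order[:k])
    return len(top(true_effects) & top(est_effects))

# Toy example with six SNPs; the paper's table uses k = 15.
true_effects = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
est_effects = [0.8, 0.4, 0.1, 0.35, 0.6, 0.05]
shared = top_k_overlap(true_effects, est_effects, k=3)
print(shared, "of the top 3 SNPs recovered")
```

In this toy case SNPs 0 and 4 are in both top-3 sets, so two of the three are recovered.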
Figure 3. Average posterior misclassification probability for the 113 miscoded observations (a: moderate and c: extreme) and the 1887 correctly coded observations (b: moderate and d: extreme) when the misclassification rate was set to 5%.
Figure 4. Average posterior misclassification probability for the 205 miscoded observations (a: moderate and c: extreme) and the 1795 correctly coded observations (b: moderate and d: extreme) when the misclassification rate was set to 10%.
Percent of misclassified individuals correctly identified based on two cutoff probabilities across the four simulation scenarios
| | Misclass2 | Correct | Misclass | Correct | Misclass | Correct | Misclass | Correct |
| Hard1 | 0.27 | 0 | 0.95 | 0 | 0.24 | 0 | 0.90 | 0 |
| Soft | 0.94 | 0 | 0.99 | 0 | 0.79 | 0 | 0.97 | 0 |
1 Hard: cutoff probability was set at 0.5. Soft: cutoff probability was equal to the overall mean of the probabilities of being misclassified over the entire dataset plus two standard deviations; 2 Misclass: individuals that were misclassified. Correct: correctly coded individuals.
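The two cutoff rules in the footnote can be sketched as follows. The sketch assumes the population standard deviation and a strict "greater than" comparison, neither of which is specified in the record; `flag_misclassified` and the probabilities are hypothetical.

```python
import statistics

def flag_misclassified(probs, rule="hard"):
    """Return the cutoff and the indices whose probability exceeds it.

    hard: fixed cutoff of 0.5.
    soft: mean of all probabilities plus two standard deviations.
    """
    if rule == "hard":
        cutoff = 0.5
    else:
        # Population SD is an assumption; the source does not say which SD.
        cutoff = statistics.fmean(probs) + 2 * statistics.pstdev(probs)
    flagged = [i for i, p in enumerate(probs) if p > cutoff]
    return cutoff, flagged

# Hypothetical posterior misclassification probabilities for 100 individuals.
probs = [0.02] * 98 + [0.30, 0.80]
hard_cutoff, hard_flagged = flag_misclassified(probs, "hard")
soft_cutoff, soft_flagged = flag_misclassified(probs, "soft")
print("hard:", hard_cutoff, hard_flagged)
print("soft:", round(soft_cutoff, 3), soft_flagged)
```

Because most probabilities sit near zero, the soft cutoff falls well below 0.5 and flags individuals with intermediate probabilities that the hard rule misses, which matches the direction of the differences between the Hard and Soft rows.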