Andrew McDavid, Paul K Crane, Katherine M Newton, David R Crosslin, Wayne McCormick, Noah Weston, Kelly Ehrlich, Eugene Hart, Robert Harrison, Walter A Kukull, Carla Rottscheit, Peggy Peissig, Elisha Stefanski, Catherine A McCarty, Rebecca Lynn Zuvich, Marylyn D Ritchie, Jonathan L Haines, Joshua C Denny, Gerard D Schellenberg, Mariza de Andrade, Iftikhar Kullo, Rongling Li, Daniel Mirel, Andrew Crenshaw, James D Bowen, Ge Li, Debby Tsuang, Susan McCurry, Linda Teri, Eric B Larson, Gail P Jarvik, Chris S Carlson.
Abstract
The feasibility of using imperfectly phenotyped "silver standard" samples identified from electronic medical record diagnoses is considered in genetic association studies when these samples might be combined with an existing set of samples phenotyped with a gold standard technique. An analytic expression is derived for the power of a chi-square test of independence using either research-quality case/control samples alone, or augmented with silver standard data. The subset of the parameter space where inclusion of silver standard samples increases statistical power is identified. A case study of dementia subjects identified from electronic medical records from the Electronic Medical Records and Genomics (eMERGE) network, combined with subjects from two studies specifically targeting dementia, verifies these results.
Year: 2013 PMID: 23762230 PMCID: PMC3677889 DOI: 10.1371/journal.pone.0063481
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The parameter space Ω considered in simulation of power.
| Parameter | Levels considered in simulations |
| R (number of gold controls per gold case) | 1, 2, 4 |
| γca (# silver cases per gold case) | 0, 1, 4 |
| γco (# silver controls per gold control) | 0, 1, 4 |
| φ (positive predictive value of silver case) | 0.6, 0.8, 1 |
| θ (negative predictive value of silver control) | 0.6, 0.8, 1 |
| RRAA (Relative risk in risk allele homozygote) | 1 |
| Nco (# gold controls) | 200, 1000, 5000 |
| m (risk allele frequency) | 5%, 10%, 30% |
| k (disease prevalence) | 0.1%, 1%, 30% |
| Genetic risk model | Dominant, recessive, or multiplicative |
RRAA = 1 denotes the null model with no genetic risk.
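The parameter space Ω in the table above is a full factorial grid, so it can be enumerated as a Cartesian product. The following is a minimal sketch, not the authors' simulation code: the dictionary keys and the function name `enumerate_omega` are illustrative, and only the levels listed in the table are used (including the single RRAA level shown there).

```python
from itertools import product

# Levels copied from the table above; key names are illustrative only.
OMEGA_LEVELS = {
    "R": [1, 2, 4],                 # gold controls per gold case
    "gamma_ca": [0, 1, 4],          # silver cases per gold case
    "gamma_co": [0, 1, 4],          # silver controls per gold control
    "phi": [0.6, 0.8, 1.0],         # PPV of a silver case
    "theta": [0.6, 0.8, 1.0],       # NPV of a silver control
    "RR_AA": [1],                   # only the level listed in the table
    "N_co": [200, 1000, 5000],      # number of gold controls
    "m": [0.05, 0.10, 0.30],        # risk allele frequency
    "k": [0.001, 0.01, 0.30],       # disease prevalence
    "model": ["dominant", "recessive", "multiplicative"],
}


def enumerate_omega(levels):
    """Yield one dict per point omega in the factorial grid."""
    names = list(levels)
    for values in product(*(levels[name] for name in names)):
        yield dict(zip(names, values))


grid = list(enumerate_omega(OMEGA_LEVELS))
print(len(grid))  # number of parameter combinations in this grid
```

Each element of `grid` is one ω that could be passed to a power calculation or a simulation routine.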
Figure 1. Error in asymptotic model.
Percentage error in asymptotic model plotted against non-centrality, λ(ω), for various ω in Ω. Grey bands give approximate 90% bounds on percentage error, such that 90% of realizations in any λ-interval lie inside the region enclosed by the error bands.
Figure 2. A subset with increasing power.
Values of φ (y axis, between 0.4 and 1) and R (x axis, between 1 and 4) for which power is decreasing (dark) or increasing (light). Panels are arranged by prevalence k in rows (0.05, 0.3) and by homozygous relative risk RRAA in columns (range 2–7). Prevalences below 0.05 are not shown because the panels are similar to those for k = 0.05.
Parameters to estimate asymptotic power and results from association study for eMERGE gold standard (N = 2526) cohort.
| SNP | Nearest Gene | Het OR | MAF | Gold P |
| rs4938933# | MS4A4A | | 0.39 | 0.18 |
| rs9349407# | CD2AP | | 0.27 | 0.16 |
| rs11767557 | EPHA1 | | 0.19 | 0.18 |
| rs3865444 | CD33 | | 0.3 | 0.23 |
| rs6701713 | CR1 | | 0.2 | 0.02 |
| rs1532278# | CLU | 0.89 | 0.36 | 0.59 |
| rs7561528 | BIN1 | | 0.35 | 0.48 |
| rs561655# | PICALM | | 0.34 | 0.58 |
| rs2075650 | APOE | 2.2 | 0.12 | <1e-21 |
| rs3752246# | ABCA7 | 1.13 | 0.19 | 0.40 |
Abbreviations: MAF, Minor Allele Frequency; Het OR, Heterozygous Odds Ratio; Gold P, P value in gold standard participants.
# denotes imputed loci.
Figure 3. Empirical power and asymptotic power.
Comparison of empirical power (E) to asymptotic power (A) for various γ and R = 1 (solid line), R = 2 (dashed), R = 4 (dotted) at two loci that nominally replicate in the gold standard subset. Power is shown as the 20th percentile of χ² statistics over 1000 bootstrapped replicates for empirical graphs, or as the 20th percentile of the non-central chi-square distribution for asymptotic graphs, with non-centrality determined from the genotypic disease model given in Table 2.
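The asymptotic quantity in this caption, a percentile of a non-central chi-square distribution, can be computed without special libraries in the one-degree-of-freedom case. The sketch below assumes 1 degree of freedom (the caption does not state the degrees of freedom used) and takes the non-centrality λ as a given input rather than deriving it from the disease model; the function names are hypothetical. It relies on the fact that a 1-df non-central chi-square variable is the square of a normal variable with mean √λ and unit variance.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def ncx2_cdf_1df(x, lam):
    """CDF of a non-central chi-square with 1 df (assumed) and non-centrality lam.

    If Z is standard normal, (Z + sqrt(lam))**2 has this distribution.
    """
    if x <= 0:
        return 0.0
    root, mu = x ** 0.5, lam ** 0.5
    return _STD_NORMAL.cdf(root - mu) - _STD_NORMAL.cdf(-root - mu)


def ncx2_quantile_1df(p, lam, tol=1e-10):
    """Quantile found by bisection on the CDF above."""
    lo, hi = 0.0, (lam ** 0.5 + 10.0) ** 2  # upper bound far in the right tail
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if ncx2_cdf_1df(mid, lam) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def power_1df(lam, alpha=0.05):
    """Probability that the statistic exceeds the central chi-square critical value."""
    critical = _STD_NORMAL.inv_cdf(1 - alpha / 2) ** 2
    return 1.0 - ncx2_cdf_1df(critical, lam)


# 20th percentile of the test statistic under an illustrative non-centrality.
print(ncx2_quantile_1df(0.2, 10.0))
```

Reporting a low percentile of the statistic rather than a rejection probability lets the empirical and asymptotic curves be compared on the same scale as the bootstrapped statistics.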
Participants by institution and genotyping center in combined gold/silver standard association study.
| Batch | Genotyping center | MF | VU | GH | MAYO | Total |
| 1 | CIDR | 222 | 270 | 2791 | 0 | 3283 |
| 2 | CIDR | 231 | 0 | 0 | 0 | 231 |
| 3 | BROAD | 0 | 0 | 0 | 265 | 265 |
| Total | | 453 | 270 | 2791 | 265 | 3779 |
Abbreviations: MF, Marshfield Clinic Personalized Medicine Research Project; VU, Vanderbilt University BioVU; GH, Group Health/University of Washington Adult Changes in Thought and Alzheimer's Disease Patient Registry; MAYO, Mayo Clinic biobank; CIDR, Center for Inherited Disease Research.
Concordance between power in logistic regression and slope of power curve given by equation 6.
| R | φ | RRAA = 1.4 | RRAA = 3 | RRAA = 9 |
| 1 | 0.6 | 1 | 1 | 1 |
| 1 | 0.8 | 0.75 | 1 | 1 |
| 2 | 0.6 | 0.33 | 0.83 | 1 |
| 2 | 0.8 | 0.92 | 1 | 1 |
| 4 | 0.6 | 0.75 | 0.92 | 0.83 |
| 4 | 0.8 | 0.92 | 1 | 1 |
Proportion of scenarios in which the observed change in power agreed with the predicted slope of chi-square power. The observed change in power is calculated as the sign of the difference between the median likelihood ratio statistic at γca = 0 and at γca = 1. In total, 160 parameter values were considered, a subset of the parameter space described in Table 1. For each combination, 500 realizations at γca = 0 and 500 at γca = 1 were simulated to find the median likelihood ratio statistics.
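The concordance measure described in this caption can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, the difference is taken as the median at γca = 1 minus the median at γca = 0 (so a positive sign means power increased), and the numbers in the example are made up purely to exercise the function.

```python
from statistics import median


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def concordance(scenarios):
    """Proportion of scenarios where the observed and predicted directions agree.

    Each scenario is a tuple:
      (statistics at gamma_ca = 0, statistics at gamma_ca = 1, predicted slope)
    """
    agree = 0
    total = 0
    for stats0, stats1, predicted_slope in scenarios:
        # Assumed direction: positive means the median statistic rose
        # when silver cases were added.
        observed = sign(median(stats1) - median(stats0))
        agree += observed == sign(predicted_slope)
        total += 1
    return agree / total


# Made-up likelihood ratio statistics for three hypothetical scenarios.
example = [
    ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], 0.5),   # rises, predicted rise
    ([4.0, 5.0, 6.0], [1.0, 2.0, 3.0], 0.2),   # falls, predicted rise
    ([1.0, 1.5, 2.0], [0.5, 1.0, 1.2], -0.1),  # falls, predicted fall
]
print(concordance(example))
```

In the table, each cell would correspond to one such proportion computed over the scenarios sharing the given R, φ, and RRAA.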