
Enhancing the power of genetic association studies through the use of silver standard cases derived from electronic medical records.

Andrew McDavid¹, Paul K Crane, Katherine M Newton, David R Crosslin, Wayne McCormick, Noah Weston, Kelly Ehrlich, Eugene Hart, Robert Harrison, Walter A Kukull, Carla Rottscheit, Peggy Peissig, Elisha Stefanski, Catherine A McCarty, Rebecca Lynn Zuvich, Marylyn D Ritchie, Jonathan L Haines, Joshua C Denny, Gerard D Schellenberg, Mariza de Andrade, Iftikhar Kullo, Rongling Li, Daniel Mirel, Andrew Crenshaw, James D Bowen, Ge Li, Debby Tsuang, Susan McCurry, Linda Teri, Eric B Larson, Gail P Jarvik, Chris S Carlson.

Abstract

The feasibility of using imperfectly phenotyped "silver standard" samples identified from electronic medical record diagnoses is considered in genetic association studies when these samples might be combined with an existing set of samples phenotyped with a gold standard technique. An analytic expression is derived for the power of a chi-square test of independence using either research-quality case/control samples alone, or augmented with silver standard data. The subset of the parameter space where inclusion of silver standard samples increases statistical power is identified. A case study of dementia subjects identified from electronic medical records from the Electronic Medical Records and Genomics (eMERGE) network, combined with subjects from two studies specifically targeting dementia, verifies these results.


Year: 2013 PMID: 23762230 PMCID: PMC3677889 DOI: 10.1371/journal.pone.0063481

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Genome-wide association studies (GWAS) increasingly examine conditions for which cases are difficult or expensive to ascertain using traditional research approaches, such as rare adverse reactions to medications. On the other hand, as more health systems computerize their health data into Electronic Medical Records (EMRs), biobanks linked to EMRs would offer a rich source of potential cases, if suitable criteria for distinguishing cases were developed. Such “silver standard” EMR-derived criteria would likely have lower positive predictive value (PPV) of phenotype than the methods used in a traditional study of a disease, but researchers who used such a regime could augment the size of their studies for only the cost of data mining and informatics. Immediately some practical concerns arise, such as whether inclusion of cases identified using a silver standard with a lower PPV might dilute the power of a study to detect a true genetic association. We address this concern by deriving an analytic expression for the power to detect an association using the chi-square test of independence, and confirm this expression by simulation. This analytic expression allows us to identify a subset of the parameter space Ω that characterizes a combined gold/silver study design that obtains increased power. The asymptotic expression and simulation framework are published in the R package bimetallic, available on cran.r-project.org, for researchers who wish to evaluate their own studies. The increased power of this subset is then validated in real data from a GWAS of dementia risk from the Electronic Medical Records and Genomics (eMERGE) network [1]. In eMERGE, genome-wide Single Nucleotide Polymorphism (SNP) data were obtained from participants in five distinct healthcare systems, and linked to the longitudinal EMR data available at each site. 
At one site, participants with genome-wide SNP data were drawn from a prospective cohort study designed to detect incident dementia cases, with cognitive ability measured at two-year intervals after enrollment. The other sites had EMR data for their consenting participants. Because genotype data were available from the other sites, the only cost to using these data in a GWAS of dementia was the effort required for informatics and analyses. Thus, we have the opportunity of using EMR-derived cases from the other sites to supplement the research-grade cohort study. The gold standard case and control data from the first site were used in the recently published multi-site Alzheimer's Disease Genetics Consortium GWAS of Alzheimer's disease [2]. In that study of gold standard cases and controls, nine different SNPs were associated with late-onset Alzheimer's disease (AD) at genome-wide significance levels (P<5×10^−8), while one SNP had suggestive levels of association. Using these ten SNPs as positive controls, we compared the strength of association within eMERGE between analyses using solely the gold standard samples (n = 2526), or gold standard samples augmented with silver standard samples (n = 3369).

Methods

The power of a chi-square test of independence with misclassification

Several authors have considered the effect of misclassification on estimation and inference in categorical responses. For chi-square tests on contingency tables, misclassification does not alter type-I error rates, but does reduce power [3]. The asymptotic power for chi-square tests with a given alternate hypothesis is known [4]. Others have applied this finding in the context of case-control genetic association studies under a constant rate of phenotype error and found an expression for the increase in sample size required to maintain constant power per percentage increase in misclassification [5].

Chi-square tests

Let G be a 2 by 3 table of observed counts of genotypes given presence or absence of a dichotomous trait i. Let $G_{i\cdot}$ be the marginal total for the ith row (trait status), defined as $G_{i\cdot} = \sum_{j} G_{ij}$, and similarly let $G_{\cdot j} = \sum_{i} G_{ij}$ be the marginal total for the jth column (genotype). Under the null hypothesis of independence between rows and columns, the expected number in cell i, j is given by $E_{ij} = G_{i\cdot} G_{\cdot j} / N$, where N is the total number of counts in the table. Then the statistic defined by

$$X^2 = \sum_{i}\sum_{j} \frac{(G_{ij} - E_{ij})^2}{E_{ij}} \qquad (1)$$

is distributed chi-square with 2 degrees of freedom under the null hypothesis of no association.
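As a concrete illustration of the statistic just defined, the following minimal sketch computes X² for a small table. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the study or from the bimetallic package.

```python
def chi_square_independence(table):
    """Pearson X^2 for a table of counts (rows: trait status, columns: genotype)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # an empty genotype column contributes nothing
                x2 += (observed - expected) ** 2 / expected
    return x2


# Hypothetical counts: rows are trait-present / trait-deficient,
# columns are genotypes aa, Aa, AA.
counts = [[10, 20, 30], [20, 20, 20]]
print(round(chi_square_independence(counts), 4))
```

For this table the expected counts are 15, 20 and 25 in each row, so the statistic is 16/3.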

Distribution under alternative hypothesis

Under an alternative hypothesis of dependence of genotype frequency on trait status, $X^2$ is distributed non-central chi-square with a non-centrality parameter λ that depends on the difference between the case and control genotype frequencies. Edwards [5], adopting results originally described by Mitra [4], showed that if $M_1$ trait-present and $M_0$ trait-deficient individuals are sampled then

$$\lambda = M_1 M_0 \sum_{k=1}^{3} \frac{(m_{1k} - m_{0k})^2}{M_1 m_{1k} + M_0 m_{0k}} \qquad (2)$$

where $m_{1k}$ and $m_{0k}$ are the (conditional) frequencies of genotype k in the trait-present and trait-deficient populations, respectively. With a non-centrality parameter λ and a desired significance value of α, the power to detect an association is given by $1 - F_{2,\lambda}\left(Q_{2}(1-\alpha)\right)$, where $F_{2,\lambda}$ is the cumulative distribution function (CDF) of the non-central chi-square distribution with 2 degrees of freedom and $Q_{2}$ is the quantile function of the (central) chi-square distribution with 2 degrees of freedom. We note that power monotonically increases as λ increases. Finding the power under a multipart phenotypic misclassification model is a matter of deriving the dependence of $m_{1k}$ and $m_{0k}$ on the genotypic disease risk and the misclassification parameters. To do that we need to specify parameters for phenotypic misclassification and genotypic disease risk models.
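The non-centrality and power calculation described above can be sketched with the Python standard library alone. This is an illustrative sketch, not the bimetallic implementation: the function names are assumptions, and the power routine relies on two standard facts for 2 degrees of freedom, namely that the central quantile at 1 − α equals −2 ln α, and that a non-central chi-square is a Poisson(λ/2) mixture of central chi-squares with 2 + 2j degrees of freedom.

```python
import math


def noncentrality(n_present, n_deficient, freq_present, freq_deficient):
    """Non-centrality for a 2x3 table given group sizes and genotype frequencies."""
    total = 0.0
    for f1, f0 in zip(freq_present, freq_deficient):
        pooled = n_present * f1 + n_deficient * f0
        if pooled > 0:
            total += (f1 - f0) ** 2 / pooled
    return n_present * n_deficient * total


def power_2df(lam, alpha=0.05, terms=500):
    """P(non-central chi-square(2 df, lam) > central (1 - alpha) quantile)."""
    if lam > 500:  # power is indistinguishable from 1; avoids underflow below
        return 1.0
    half_x = -math.log(alpha)       # critical value is x = -2 ln(alpha)
    half_lam = lam / 2.0
    poisson = math.exp(-half_lam)   # Poisson(half_lam) weight at j = 0
    erlang_term = 1.0               # (x/2)^j / j!
    erlang_sum = 1.0                # sum over i <= j of (x/2)^i / i!
    power = 0.0
    for j in range(terms):
        # survival function of central chi-square with 2 + 2j df at x
        power += poisson * math.exp(-half_x) * erlang_sum
        poisson *= half_lam / (j + 1)
        erlang_term *= half_x / (j + 1)
        erlang_sum += erlang_term
    return power


# Hypothetical design: 500 cases and 500 controls.
lam = noncentrality(500, 500, (0.40, 0.46, 0.14), (0.49, 0.42, 0.09))
print(round(lam, 2), round(power_2df(lam), 3))
```

With λ = 0 the routine returns α, as it should under the null hypothesis, and power rises monotonically with λ.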

A multipart phenotypic misclassification model

We draw a distinction between the terms “affected,” “unaffected,” “case,” and “control.” We consider affected/unaffected status to be a latent random variable Z that is unobserved in the silver standard subjects. Instead, a researcher observes X, the case or control criteria, for instance a set of criteria in an EMR. This leads to a 2×2 confusion matrix for silver standard subjects, with elements giving P(Z|X) values in silver standard subjects. Denote the diagonal elements as φ and θ. These elements are equivalent to the positive and negative predictive values, respectively, of the EMR criteria. A crucial assumption here is that X and genotype are conditionally independent given Z, i.e., genotype influences observed case/control status only through Z. Note that this implies that genotype errors are non-differential between cases and controls, a point which is further addressed in the Discussion. It also imposes a restriction on the path through which genotype affects the EMR phenotype. In particular, it could be the case that there are two conditions, Z and Z', which are both detected as X. In this case, the genotypes associated with Z or Z' will both appear to be associated with X, so one cannot conclude that an association between X and genotype is due to Z alone unless one can rule out the presence of the intermediary Z'. When this conditional independence can be assumed to hold, then for gold standard subjects the researcher directly observes Z or, equivalently, takes φ and θ to be unity. In practice this may not be a realistic assumption. Freeing φ and θ for gold standard subjects is an easy extension computationally, but it complicates the exposition symbolically. A simple modification of the “setup.chisq” function in bimetallic allows an investigator to freely specify all classification rates; however, we do not treat this possibility in this paper.
We express the numbers of gold and silver standard cases and controls primarily in terms of the number of gold standard controls, N_co, and derive the number of gold standard cases, silver standard controls and silver standard cases from the ratios R, γ_ca and γ_co. Table 1 enumerates these relationships. With the numbers and ratios of cases and controls defined in this way, the total numbers of trait-present subjects $M_1$ and trait-deficient subjects $M_0$ are given by $M_1 = \frac{N_{co}}{R}(1 + \gamma_{ca})$ and $M_0 = N_{co}(1 + \gamma_{co})$.

Table 1

The parameter space Ω considered in simulation of power.

Parameter	Levels considered in simulations
R (number of gold controls per gold case)	1, 2, 4
γ_ca (# silver cases per gold case)	0, 1, 4
γ_co (# silver controls per gold control)	0, 1, 4
φ (positive predictive value of silver case)	0.6, 0.8, 1
θ (negative predictive value of silver control)	0.6, 0.8, 1
RR_AA (Relative risk in risk allele homozygote)	1†, 1.4, 3, 9
N_co (# gold controls)	200, 1000, 5000
m (risk allele frequency)	5%, 10%, 30%
k (disease prevalence)	0.1%, 1%, 30%
Genetic risk model	Dominant, recessive, or multiplicative

†denotes null model with no genetic risk.

The 2-by-3 matrix $A$, with elements $a_{zk}$, gives the affected ($z = 1$) and unaffected ($z = 0$) conditional genotype frequencies. (In the section below, a genotypic disease risk model that could be used to populate $A$ is described.) Then the permuted genotype frequencies in the silver standard population, the 2-by-3 matrix $S$, are given by the matrix product $S = C' \cdot A$, where $C'$ is the matrix transpose of the confusion matrix $C$. This, for example, yields for the first cell in $S$:

$$s_{11} = \varphi\, a_{11} + (1 - \varphi)\, a_{01}$$

Then the case ($m_{1k}$) and control ($m_{0k}$) conditional genotype frequencies in a mixed silver/gold study may be expressed in terms of $A$, $S$ and the phenotype misclassification model. The mixed frequencies $m_{1k}$ and $m_{0k}$ are merely weighted averages of $A$ and $S$ given by

$$m_{1k} = \frac{a_{1k} + \gamma_{ca}\, s_{1k}}{1 + \gamma_{ca}} \qquad (3)$$

and

$$m_{0k} = \frac{a_{0k} + \gamma_{co}\, s_{0k}}{1 + \gamma_{co}} \qquad (4)$$

Combining equations 2, 3 and 4 and simplifying yields λ in terms of the multipart phenotypic misclassification model:

$$\lambda = N_{co}\,(1 + \gamma_{ca})(1 + \gamma_{co}) \sum_{k=1}^{3} \frac{(m_{1k} - m_{0k})^2}{a_{1k} + \gamma_{ca}\, s_{1k} + R\,(a_{0k} + \gamma_{co}\, s_{0k})} \qquad (5)$$

Using expression (5) and the fact that power is monotonic in λ allows the calculation of the marginal effect of adding a single silver standard case (or control), while holding other disease parameters fixed, by finding, for example,

$$\lambda' = \frac{\partial \lambda}{\partial \gamma_{ca}} \qquad (6)$$

If $\lambda' > 0$, then power increases with the inclusion of silver standard cases.
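The weighted-average construction above can be sketched numerically. In the following illustration the function names, the genotype frequencies and the use of a forward finite difference in place of the analytic derivative are all assumptions made for the example; the authors' own slope calculation lives in bimetallic and is not reproduced here.

```python
def mixed_noncentrality(a_aff, a_unaff, n_co, r, g_ca, g_co, phi, theta):
    """Non-centrality for a mixed gold/silver design.

    a_aff, a_unaff: genotype frequencies given affected / unaffected status.
    n_co: gold controls; r: gold controls per gold case.
    g_ca, g_co: silver cases per gold case, silver controls per gold control.
    phi, theta: positive and negative predictive values of the silver criteria.
    """
    # Silver standard groups are mixtures of affected and unaffected subjects.
    s_case = [phi * a1 + (1 - phi) * a0 for a1, a0 in zip(a_aff, a_unaff)]
    s_ctrl = [(1 - theta) * a1 + theta * a0 for a1, a0 in zip(a_aff, a_unaff)]
    # Study-level frequencies are weighted averages of gold and silver groups.
    m_case = [(a1 + g_ca * s1) / (1 + g_ca) for a1, s1 in zip(a_aff, s_case)]
    m_ctrl = [(a0 + g_co * s0) / (1 + g_co) for a0, s0 in zip(a_unaff, s_ctrl)]
    n_case = (n_co / r) * (1 + g_ca)
    n_ctrl = n_co * (1 + g_co)
    return n_case * n_ctrl * sum(
        (m1 - m0) ** 2 / (n_case * m1 + n_ctrl * m0)
        for m1, m0 in zip(m_case, m_ctrl)
    )


def silver_case_slope(a_aff, a_unaff, n_co, r, phi, h=1e-6):
    """Forward-difference estimate of d(lambda)/d(gamma_ca) at gamma_ca = 0."""
    base = mixed_noncentrality(a_aff, a_unaff, n_co, r, 0.0, 0.0, phi, 1.0)
    bumped = mixed_noncentrality(a_aff, a_unaff, n_co, r, h, 0.0, phi, 1.0)
    return (bumped - base) / h


# Hypothetical genotype frequencies (aa, Aa, AA) for affected and unaffected.
A_AFF = (0.28994, 0.49704, 0.21302)
A_UNAFF = (0.50053, 0.41595, 0.08352)

for r, phi in ((1, 0.4), (4, 0.8)):
    slope = silver_case_slope(A_AFF, A_UNAFF, 1000, r, phi)
    print(r, phi, "power increases" if slope > 0 else "power decreases")
```

A useful sanity check on any such implementation is that silver cases with φ = 1 behave exactly like additional gold cases.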

Genotypic Disease Risk Model

We adopt here a simplified version of Purcell's model for discrete traits [6], but any model for genotype frequencies conditioned on a dichotomous phenotype may be used. A bi-allelic locus in a diploid organism with genotypes AA, Aa and aa is assumed. For simplicity, it is assumed that Hardy-Weinberg equilibrium holds at the margin for a locus with minor allele frequency m. Let the disease prevalence be given by P(Aff) = k and the homozygous relative risk be given by $RR_{AA} = P(\mathrm{Aff}|AA) / P(\mathrm{Aff}|aa)$. We consider three models of allelic risk: dominant, recessive and multiplicative, corresponding to heterozygous relative risks $RR_{Aa}$ equal to $RR_{AA}$, 1 and $RR_{AA}^{1/2}$, respectively. Using the law of total probability, P(Aff|aa) may be expressed in terms of k, m and the relative risks as

$$P(\mathrm{Aff}|aa) = \frac{k}{(1-m)^2 + 2m(1-m)\,RR_{Aa} + m^2\,RR_{AA}}$$

from which the rest of the genotypic conditional disease probabilities may be derived. Note that the model is over-specified, in that some parameter values induce P(Aff|AA) or P(Aff|Aa) >1, which we refer to as “unphysical” parameter values.
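A minimal sketch of this risk model follows, assuming A is the risk allele with frequency m and using the conventional definitions of the three risk models. The function name and the error handling for unphysical values are illustrative choices, not the bimetallic interface.

```python
def conditional_genotype_frequencies(m, k, rr_aa, model):
    """Return genotype frequencies (aa, Aa, AA) given affected and unaffected.

    m: risk allele frequency; k: prevalence; rr_aa: homozygous relative risk;
    model: 'dominant', 'recessive' or 'multiplicative'.
    """
    rr_het = {
        "dominant": rr_aa,          # one risk allele confers the full risk
        "recessive": 1.0,           # heterozygotes carry baseline risk
        "multiplicative": rr_aa ** 0.5,
    }[model]
    hwe = [(1 - m) ** 2, 2 * m * (1 - m), m ** 2]   # Hardy-Weinberg proportions
    rel_risk = [1.0, rr_het, rr_aa]
    # Law of total probability: k = P(Aff|aa) * sum(genotype freq * relative risk)
    p_aff_aa = k / sum(f * rr for f, rr in zip(hwe, rel_risk))
    penetrance = [p_aff_aa * rr for rr in rel_risk]
    if max(penetrance) > 1:
        raise ValueError("unphysical parameters: a penetrance exceeds 1")
    affected = [f * p / k for f, p in zip(hwe, penetrance)]
    unaffected = [f * (1 - p) / (1 - k) for f, p in zip(hwe, penetrance)]
    return affected, unaffected


aff, unaff = conditional_genotype_frequencies(0.3, 0.05, 4.0, "multiplicative")
print([round(x, 4) for x in aff], [round(x, 4) for x in unaff])
```

Under the null model (relative risk 1) both rows reduce to the Hardy-Weinberg proportions, which is a convenient check.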

Simulation Studies

We compared the power estimates derived in (5) to power in a multifactorial simulation of 72900 different values in the 10-dimensional parameter space (4×3^9 = 78732 combinations, less 5832 unphysical values). Of the 72900 combinations, 19683 correspond to models having no association between genotype and phenotype, allowing examination of the sampling distribution of X² from expression 1 under a null hypothesis. Table 1 shows the parameter values considered in the simulation. The values in simulation were selected in an attempt to bound the set of plausible parameter values and thus exhaustively test the validity of the approximation, rather than being the most likely values an investigator would consider. We wish to evaluate the fidelity of the asymptotic approximation of X² to its true sampling distribution, as determined through stochastic simulation. We therefore simulated 500 replicates of each ω in Ω and calculated X² for each replicate. We also calculated λ(ω), the value of the non-centrality parameter in equation 5 induced by ω. The 20th percentile of all simulated X² values for a given ω was adopted as the point of comparison. Since 80% of all realizations exceed this threshold, this percentile corresponds to the significance value achieved if power was fixed at 80% and type-I error allowed to vary. We compared this empirical percentile against the 20th percentile of the non-central chi-square distribution with two degrees of freedom and non-centrality parameter λ(ω). The percentage error of using the asymptotic approximation, E(ω), was calculated from these two percentiles and was plotted for various ω in Figure 1.
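The simulation step for a single ω can be sketched as follows. This is an illustration under assumed inputs rather than the authors' code: the function names, seed, group sizes and genotype frequencies are hypothetical, and genotype counts are drawn by simple multinomial sampling with the standard library.

```python
import random


def x2_stat(table):
    """Pearson X^2 for a 2x3 table, skipping empty genotype columns."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2) for j in range(3) if cols[j] > 0
    )


def simulated_percentile(n_case, n_ctrl, f_case, f_ctrl,
                         reps=1000, pct=0.20, seed=1):
    """Empirical percentile of X^2 over simulated case/control tables."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        table = []
        for n, freqs in ((n_case, f_case), (n_ctrl, f_ctrl)):
            draws = rng.choices(range(3), weights=freqs, k=n)
            table.append([draws.count(g) for g in range(3)])
        stats.append(x2_stat(table))
    stats.sort()
    return stats[int(pct * reps)]


hwe = (0.49, 0.42, 0.09)
# Under the null, the asymptotic 20th percentile of chi-square(2) is -2 ln(0.8).
print(simulated_percentile(500, 500, hwe, hwe))
print(simulated_percentile(500, 500, (0.40, 0.46, 0.14), hwe))
```

Comparing each empirical percentile with the corresponding percentile of the non-central chi-square distribution gives the kind of error measure plotted in Figure 1.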

Figure 1

Error in asymptotic model.

Percentage error in asymptotic model plotted against non-centrality, λ(ω), for various ω in Ω. Grey bands give approximate 90% bounds on percentage error, such that 90% of realizations in any λ-interval lie inside the region enclosed by the error bands.

In Figure 2, the sign of λ', i.e., the direction of the change in power, is depicted for a representative subset of the parameter space.

Figure 2

A subset with increasing power.

Values of φ (y axis, between 0.4 and 1) and R (x axis, between 1 and 4) for which power is decreasing (dark) and increasing (light). Each panel shows a combination of prevalence k by row (0.05, 0.3) and homozygous relative risk RR_AA by column (range 2–7). Prevalences <0.05 are not shown here because of similarity to the panels for k = 0.05.

Power at ADGC-identified SNPs using silver standard cases

In order to empirically validate our models, we used the genotypes of gold and silver standard eMERGE participants (described below) to compare power under expression 5 to a bootstrapped estimate of power under the 2×3 chi-square test of independence. Ten loci identified in two recent GWAS of AD [2], [7] were considered. Two loci had evidence of association using a chi-square test for independence between genotype and phenotype at P<0.05 using gold standard participants (N = 2526) alone. We then considered a series of hypothetical studies including both gold and silver standard participants in varying ratios. Genotypes from the combined studies were resampled many times, to approximate the sampling distribution of X² statistics through bootstrapping. More exactly, to compute the bootstrapped estimate, we sampled with replacement the genotypes at the locus in question, conditional on the genotypes belonging to the set of gold standard dementia cases, silver standard cases or gold standard controls in our study. Various ratios of gold and silver standard cases were used, corresponding to different γ and R in the model described above. For each locus and ratio combination, 1000 replicates were drawn to calculate the 20th percentile of all X² statistics. We then compared this bootstrapped percentile to the asymptotic value, the 20th percentile of the non-central chi-square distribution with non-centrality λ(ω), by making assumptions about disease prevalence (k = 0.13), risk model (multiplicative) and the PPV of the EMR criteria (φ = 0.7). The minor allele frequencies and odds ratios (ORs) assumed are taken from the replication cohort and are given in Table 2, except for rs2075650, for which a homozygous OR of 3.2 was assumed.
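The bootstrap described above can be sketched as follows. The genotype vectors here are small made-up lists coded 0, 1 or 2 copies of the minor allele, and the function names, seed and group sizes are assumptions for illustration only; none of this is eMERGE data or the authors' code.

```python
import random


def x2_stat(table):
    """Pearson X^2 for a 2x3 table, skipping empty genotype columns."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2) for j in range(3) if cols[j] > 0
    )


def bootstrap_percentile(gold_cases, silver_cases, gold_controls,
                         n_gold_case, gamma, r, reps=1000, pct=0.20, seed=1):
    """20th percentile of X^2 over bootstrap resamples of a hypothetical study.

    gamma: silver cases per gold case; r: gold controls per gold case.
    """
    rng = random.Random(seed)
    n_silver = round(gamma * n_gold_case)
    n_control = round(r * n_gold_case)
    stats = []
    for _ in range(reps):
        # Resample with replacement within each phenotype group.
        cases = (rng.choices(gold_cases, k=n_gold_case)
                 + rng.choices(silver_cases, k=n_silver))
        controls = rng.choices(gold_controls, k=n_control)
        table = [[grp.count(g) for g in (0, 1, 2)] for grp in (cases, controls)]
        stats.append(x2_stat(table))
    stats.sort()
    return stats[int(pct * reps)]


# Made-up genotype vectors standing in for the three participant groups.
GOLD_CASES = [0] * 40 + [1] * 46 + [2] * 14
SILVER_CASES = [0] * 44 + [1] * 44 + [2] * 12
GOLD_CONTROLS = [0] * 49 + [1] * 42 + [2] * 9

for gamma in (0, 1, 2):
    print(gamma, bootstrap_percentile(GOLD_CASES, SILVER_CASES, GOLD_CONTROLS,
                                      n_gold_case=200, gamma=gamma, r=2))
```

Fixing the seed makes each curve reproducible, which helps when comparing designs that differ only in γ or R.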

Table 2

Parameters to estimate asymptotic power and results from association study for eMERGE gold standard (N = 2526) cohort.

SNP	Nearest Gene	Het OR	MAF	Gold P
rs4938933#	MS4A4A	0.88	0.39	0.18
rs9349407#	CD2AP	1.12	0.27	0.16
rs11767557	EPHA1	0.87	0.19	0.18
rs3865444	CD33	0.89	0.3	0.23
rs6701713	CR1	1.16	0.2	0.02
rs1532278#	CLU	0.89	0.36	0.59
rs7561528	BIN1	1.17	0.35	0.48
rs561655#	PICALM	0.87	0.34	0.58
rs2075650	APOE	2.2	0.12	<1e-21
rs3752246#	ABCA7	1.13	0.19	0.40

Abbreviations: MAF, Minor Allele Frequency; Het OR, Heterozygous Odds Ratio; Gold P, P value in gold standard participants.

#denotes imputed loci.

Results are presented in Figure 3 below.

Figure 3

Empirical power and asymptotic power.

Comparison of empirical power (E) to asymptotic power (A) for various γ and R = 1 (solid line), R = 2 (dashed), R = 4 (dotted) at two loci that nominally replicate in the gold standard subset. Power is shown as the 20th percentile of X² statistics over 1000 bootstrapped replicates for empirical graphs, or as the 20th percentile of the non-central chi-square distribution for asymptotic graphs, with non-centrality determined from the genotypic disease model given in Table 2.

Gold standard cases and controls

Participants with gold standard case and control status were drawn from a study and its planned successor based at Group Health Cooperative in Seattle, a large health maintenance organization. The initial study provided only cases. The University of Washington/Group Health Cooperative Alzheimer's Disease Patient Registry (ADPR) provided 243 cases of AD. Case identification methods of the UW/GHC ADPR have been published [8]. Potential early AD cases were identified from a number of clinical data sources and were brought in for thorough neuropsychological and neurological examinations, from 1987 to 1996. Dementia was diagnosed using Diagnostic and Statistical Manual (DSM) III-R or IV criteria [9], and AD using NINCDS-ADRDA criteria [10]. DNA was extracted as previously described [11]. The succeeding study, the Adult Changes in Thought (ACT) study, provided both gold standard cases and all of the gold standard controls used in this study. Both studies have the same grant number and PI (U01 AG 06781, Eric Larson, PI). ACT includes urban and suburban elderly populations from a stable health management organization [12], [13]. ACT began as a cohort of 2,581 cognitively intact participants older than 64 years. Later, an expansion cohort (n = 811) was enrolled. Currently the study employs a continuous enrollment strategy to maintain approximately 2000 at-risk persons in the study, resulting in a total enrollment of 4,600 participants as of June 2012. ACT has an exemplary Completeness of Follow-up Index (95.6%) [14]. ACT participants are administered the Cognitive Abilities Screening Instrument [15] at baseline and again every 2 years. A 2-stage screening process is used to identify dementia cases; Cognitive Ability scores ≤85 prompt a dementia evaluation. Informant, subject, or staff reports of cognitive difficulties also trigger evaluation. The 2nd stage diagnostic examination includes neuropsychological testing and a neurological exam. 
Medical records are abstracted for standard labs and neuroimaging reports. If any are unavailable in the prior year they are requested. These data are used to complete DSM-IV diagnostic criteria for dementia and subtypes [9], NINCDS-ADRDA criteria for AD [10], and criteria for vascular dementia [9], [16], [17], [18]. All clinical data are reviewed at a consensus conference. These procedures are unchanged from the ADPR case finding methods described earlier and are conducted by the same personnel. ACT dementia and AD incidence rates are consistent with those found worldwide [12]. The IRB granted a waiver of consent for eMERGE for deceased ACT and ADPR participants, but required re-consent for living participants. We asked participants for consent by mail, or in person if they had an imminent study visit. Participants were very receptive; the acceptance rate was 86%. We also made a great effort to obtain consent from the legally authorized representative for participants who had developed dementia. There were 391 individuals from ACT included in eMERGE genotyping with probable or possible AD, 121 with other forms of dementia, and 2,065 controls without dementia.

Ethics

The institutional review board of the Group Health Research Institute approved this study. Participants from the gold standard cohort gave written consent for genetic analyses either under the auspices of the Genetic Differences study (R01AG007584, Walter Kukull, PI) or gave written consent as indicated above. Participants in the silver-standard cohorts (described below) gave consent as follows: Marshfield [1] and Mayo [19] participants gave written consent upon enrollment in their respective studies or biobanks. Vanderbilt's participants gave written consent to entry in the BioVU biobank on their consent-to-treatment forms before blood is drawn for clinical purposes, or could opt out at that time [20].

Derivation of an EMR-based silver standard Alzheimer's case definition

We used data from the ACT study to develop an EMR-derived silver standard case definition [21]. The development set consisted of 537 cases meeting DSM-IV criteria for dementia and 2915 dementia-free controls. We divided the development set into training and test sets to avoid overfitting. No participants in our development set were under 65 years old, but we also set an a priori exclusion on participants younger than 65 years at first ICD-9 code/medication fill to screen out early onset AD participants that might be found in younger populations. We considered several sources of data for the model, including ICD-9 codes for Alzheimer's disease and dementia, specialty of the healthcare provider using those codes, other events at visits producing those codes such as neuroimaging tests or laboratory tests used to diagnose dementia subtypes (e.g. TSH or B12 levels), and pharmacy data for medications used to treat dementia such as memantine, donepezil, galantamine, or rivastigmine. We considered data for each case up to the date they were evaluated for dementia by the ACT study. We did this because the ACT study notifies the primary care providers of enrollees whom the study identifies as having dementia, and this notification likely influences subsequent medical care and resulting ICD-9 codes. This truncation mechanism limits the severity and overtness of the dementia cases. As dementia progresses, it becomes more clinically obvious, and less likely to be missed. Our choice to truncate clinical data at the time of dementia diagnosis suggests that our PPV will be a conservative estimate of the accuracy of silver standard case criteria in other settings, where cases could have any level of dementia severity. Our first priority for the EMR-derived case definition was to maximize the PPV of our case definition, and the second priority was to maximize sensitivity of the definition. 
The criteria “five or more qualifying ICD-9 codes, or one or more Alzheimer's medication fills” provided the highest PPV and sensitivity in our training set. In the test set, these criteria yielded a PPV of 0.73 and a sensitivity of 55%. The specific drugs and ICD-9 codes are indicated in Document S1. For the purposes of power calculations, we assume that the PPV found in the ACT study cases serves as a lower bound on the PPV in the other sites when they applied the algorithm for the reasons detailed above. Note that there may be differences between silver standard sites in the PPV. This differential classification will not impact our power calculations as long as the PPV used in the calculations is indeed a lower bound for all of the silver standard sites.
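The case criteria and the two accuracy measures quoted above can be expressed compactly. The sketch below is illustrative only: the function names and the toy records are hypothetical, and the qualifying codes and drugs themselves are those listed in Document S1, which is not reproduced here.

```python
def is_silver_case(n_qualifying_icd9, n_medication_fills, age_at_first_event):
    """Apply the EMR criteria: at least 65 years old at first code or fill,
    and five or more qualifying ICD-9 codes or one or more medication fills."""
    if age_at_first_event < 65:
        return False  # a priori exclusion to screen out early-onset AD
    return n_qualifying_icd9 >= 5 or n_medication_fills >= 1


def ppv_and_sensitivity(records):
    """records: (flagged_by_emr, gold_standard_dementia) pairs of booleans."""
    tp = sum(1 for flagged, truth in records if flagged and truth)
    fp = sum(1 for flagged, truth in records if flagged and not truth)
    fn = sum(1 for flagged, truth in records if not flagged and truth)
    return tp / (tp + fp), tp / (tp + fn)


# Toy validation set: 3 true positives, 1 false positive,
# 2 false negatives and 4 true negatives.
toy = ([(True, True)] * 3 + [(True, False)]
       + [(False, True)] * 2 + [(False, False)] * 4)
print(ppv_and_sensitivity(toy))
```

In this framing the PPV plays the role of φ in the power calculations, while sensitivity only affects how many silver standard cases a site can contribute.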

Source of silver standard cases

Three other sites within eMERGE were selected to implement the EMR-case definition on the basis of the participant demographics at the sites. The Marshfield Clinic Personalized Medicine Research Project, the Mayo Clinic's biobank and Vanderbilt University's biobank BioVU have been described previously [1],[19],[20],[22]. Data from Marshfield included a subset (N = 153) that had previously been evaluated, also with an EMR-based algorithm. As these individuals were identified from a clinical delivery system rather than a research study design that characterized our gold-standard cases, we treated the Marshfield cases, including the subset previously re-evaluated as published in [23], as silver-standard cases in the analyses presented here. The number of cases contributed from each of these sites is listed in table 3.

Table 3

Participants by institution and genotyping center in combined gold/silver standard association study.

Batch	Genotyping center	MF	VU	GH	MAYO	Total
1	CIDR	222	270	2791	0	3283
2	CIDR	231	0	0	0	231
3	BROAD	0	0	0	265	265
Total		453	270	2791	265	3779

Abbreviations: MF, Marshfield Clinic Personalized Medicine Research Project; VU, Vanderbilt University BioVU; GH, Group Health/University of Washington Adult Changes in Thought and Alzheimer's Disease Patient Registry; MAYO, Mayo Clinic biobank; CIDR, Center for Inherited Disease Research.

Since the negative predictive value of EMR proxies for absence of dementia appeared to be poor, we did not pursue adding silver standard controls. The high prevalence of dementia in elderly populations means that the specificity of any proxy must be quite high for the negative predictive value to be acceptable. In conditions with lower background prevalence and more dramatic clinical features likely to be picked up in the course of routine clinical care, searching for silver standard controls would be more reasonable.

Genome-wide SNP methods

The Group Health, Mayo Clinic, Marshfield, and Vanderbilt samples were genotyped at the Center for Inherited Disease Research at Johns Hopkins University or the Broad Institute of Harvard and the Massachusetts Institute of Technology; 84% of samples were genotyped in batch 1, which we treated as the primary dataset for the purposes of quality control. The samples in batch 1 were block randomized by phenotype and study center, with assay plate as the blocking factor. All sets and samples were genotyped on the Illumina Human660W-Quadv1_A array. Genotypes were called by the respective genotyping centers using the software package GenomeStudio. We undertook an extensive quality control process, using software packages PLINK v1.07 [24] and R [25] and following published protocols [26]. We began with a total of 3,779 unique samples. We tested for and removed sex-discrepant samples, samples with significant kinship and samples with non-European ancestry [27]. Population structure appeared to be well controlled (genomic control coefficient 1.003). All samples exceeded a call rate of 98%. After the above filtering steps, a set of 672 gold standard cases, 1854 gold standard controls and 843 silver standard cases remained (total n = 3,369, 89% of the original samples). We also examined the quality of individual SNPs with several metrics. We received genotypes at a total of approximately 560,000 SNPs from the genotyping centers. We removed monomorphic SNPs, SNPs with call rates lower than 98%, and SNPs with more than 1 replicate discrepancy. We screened for technical artifacts in the genotype clustering between the primary and secondary datasets by using common controls. For some loci in our bootstrap power comparison (noted in Table 2), we imputed the value of the locus because it was not genotyped directly.
We used the software package IMPUTE2, with 120 European samples from the 1000 Genomes Project [28] and a multi-ethnic set of 1920 samples from HapMap phase 3 as reference panels [29], [30]. The studies are available from dbGaP under accession number phs000234.v1.p1.

Results

Simulation study

Figure 1 shows E(ω), the percentage error between the sampling distribution implied by (5) and the simulated, true distribution of X² statistics under a wide array of values considered in Ω. For small values of λ, the asymptotic distribution does depart from the true distribution. However, as λ increases, the error decreases and the asymptotic distribution better approximates the true distribution. Thus for larger sample sizes, one may use the analytic expression for the slope of the asymptotic power function, available as the function “dlambda” in bimetallic, to test the benefit of including a silver-standard case under a desired study design. In the null models, a two-sided Kolmogorov-Smirnov test finds a decisive lack of fit to the central chi-square distribution with 2 degrees of freedom (P<10−) over all the 7.8×10 realizations of X². This reflects the fact that X² converges to that distribution only asymptotically. Indeed, as the effective sample size of the simulation increases, the goodness of fit increases, such that there is no evidence of departure from the chi-square distribution (Kolmogorov-Smirnov P = 0.8) for N ≥4000 evenly split between cases and controls (194,400 realizations of X² considered). Figure S1 suggests that under simulation type-I error is maintained at nominal levels: the P-values from null models are uniformly distributed. Figure S2 illustrates bias in point estimates of allelic ORs under the misclassification model. In particular, ORs are biased towards one. This bias is a function of φ and γ_ca. However, since estimates of ORs in GWAS are inherently biased (away from one) when the samples used to ascertain significance are also used to estimate ORs [31], we do not believe this is a practical impediment for investigators who wish to use the combined study designs we describe here for discovery of linked loci. We do recommend that investigators locate an independent replication set (measured without error), or utilize double sampling methods previously described for the purposes of calculating ORs [32], [33].

A subset of Ω for which power is increasing in γ_ca

If (6) is positive, power increases with the addition of silver standard cases to the analysis. We demonstrate a range of disease/diagnosis models for which this is true in Figure 2. We examine here a multiplicative risk model (RR_AA = RR_Aa^2) and a risk allele with population frequency m = 0.3, and plot positive predictive value of silver standard diagnosis (φ) versus gold control:gold case ratio (R) for combinations of disease prevalence (k) and relative risk in risk homozygotes (RR_AA), but note that these results hold qualitatively for many other risk models and values of m (data not shown). The most important relationship observed is between R and φ. The inclusion of silver standard cases with relatively low PPV (φ) can still increase the power of a study if the ratio of controls to cases (R) is relatively high. Other minor features of the model are that smaller values of RR_AA allow smaller values of φ at large R, and that risk models that result in high penetrance SNPs (such as the k = 0.3 and RR_AA >5 panels) require larger φ for all R.

Application to previously identified SNPs using silver standard cases

Figure 3 plots the 20th percentile of X² statistics for two AD risk loci under various ratios of gold and silver standard cases and controls, described above in Methods. These loci replicate at P<0.05 in the gold standard case/control set. The “empirical power” curves (suffixed with E) are determined by bootstrap at each abscissa of γ. The asymptotic power (suffixed with A) is determined by λ(ω), with ω parameters given in Table 2. Power is plotted for ratios of γ (abscissa) and for control:case ratios R of 1, 2 and 4. Although power is systematically overestimated at rs2075650 and underestimated at rs6701713, the shape of the empirical curves matches the shape of the asymptotic curves: higher R yields marginally greater returns to γ, and for R = 1, power is reduced by including silver standard cases. Some reasons for systematic deviation of empirical power from asymptotic power are described in the Discussion below.

Discussion

Samples with high-density genotyping data available across multiple phenotypes in an EMR are a potentially valuable resource for genomic association studies. Our study demonstrates that even for a disease with a relatively low PPV of EMR diagnosis, there are realistic scenarios in which the addition of silver standard participants boosts the power to detect a true association. We find that there is no inflation of type I error under such scenarios. There is very good agreement between asymptotic and simulated power, and good agreement between bootstrapped and asymptotic power. However, estimated asymptotic power deviated modestly from the true power to detect a disease association. The genetic risk model is not identifiable from the data alone, so these deviations may stem from incorrect assumptions about the mode of inheritance (dominant, recessive, or otherwise). An overestimate of power could indicate phenotype or genotype error in our GWAS samples. Although there is much discussion of bias in ORs derived from GWAS and of factors that inflate the type I error rate [31], [34], we know of no study comparing observed to expected power for well-characterized risk loci.

We find that two parameters should have the greatest influence on investigators contemplating augmentation of their GWAS with silver standard samples. Of greatest import is the control:case ratio for gold standard phenotypes, R. When excess gold standard controls are available (high R), the inclusion of silver standard cases yields the greatest improvements in power. Conversely, at small R, scenarios exist in which incorporating additional silver standard cases reduces power. Of secondary importance is the PPV of the criteria used to identify silver standard cases. We show in Figure 2 that there exists a minimum PPV that silver standard cases must reach to yield an increase in power for hypothesized small effect sizes. This minimum is subject to other parameters of the risk model, but is typically around 0.6.

The evidence presented here that differential error in phenotype classification has a limited and predictable effect on hypothesis testing must be tempered by the fact that silver standard predictive values are unlikely to be known without access to a cohort measured along both dimensions. Indeed, as described in the Methods, the existence of the GHC cohort, measurable by both criteria, was intrinsic to the development of the EMR criteria. However, since it is the PPV (as opposed to the sensitivity or specificity) that needs to be estimated, conducting a secondary chart review or additional diagnostic tests in a subset of the silver standard population will suffice. There are successful examples of this approach described for peripheral arterial disease [19], diabetes [35] and other phenotypes [36].

Although this manuscript suggests that combining cases from multiple studies can yield improvements in power, we recommend unified genotyping of the experiment, so that phenotype (e.g., gold and silver standard cases and controls) may be randomized across nuisance factors (like plate or chip version). Caution must be exercised when such randomization cannot be performed, since differential genotyping error between phenotypes can result in not only reduced power, but also spurious findings [37], [38]. In practice, additional data will need to be collected unless the experimental design is flawless. These data could take the form of double sampling all genotypes in a subset of subjects, which allows the estimation of the error rate for each locus and the application of tests that efficiently use such double sampling to correct for differential genotype error [39]. The additional data could also simply be the validation of interesting findings via an alternate, lower-throughput technology in which appropriate experimental design is applied.

We readily acknowledge that chi-square tests are unlikely to be optimal in many GWAS. However, we believe a characterization of their power is useful due to the existence of closed-form formulae. This makes it feasible to consider a variety of scenarios, and to examine power at the margin of an additional sample, with the expectation that the qualitative results will continue to hold in more complex models. Replacing the chi-square test with logistic regression in a representative subset of the parameter space considered in the simulation study supports this assertion. In 88% of scenarios, the predicted slope of the power curve given by expression 6 matched the observed change in power after adding silver standard cases (see Table 4).
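The Discussion's point that only the PPV needs to be estimated, for example by chart review of a subset of silver standard cases, can be illustrated with a short interval calculation. The sketch below uses the Wilson score interval for a binomial proportion, which is our choice and not a method named in the paper. The review counts and the function name are hypothetical.

```python
import math


def wilson_interval(confirmed, reviewed, z=1.96):
    """Wilson score interval for the PPV, given chart-review counts."""
    if reviewed <= 0 or not 0 <= confirmed <= reviewed:
        raise ValueError("need 0 <= confirmed <= reviewed and reviewed > 0")
    p = confirmed / reviewed
    denom = 1 + z * z / reviewed
    centre = (p + z * z / (2 * reviewed)) / denom
    half = z * math.sqrt(p * (1 - p) / reviewed + z * z / (4 * reviewed ** 2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical review: 42 of 60 sampled silver standard cases confirmed on chart review.
lower, upper = wilson_interval(42, 60)
print(round(lower, 3), round(upper, 3))
```

Comparing the lower bound with the minimum PPV suggested by a plot such as Figure 2 gives a conservative check on whether the silver standard cases are likely to help, and reviewing more charts narrows the interval.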

Table 4

Concordance between power in logistic regression and slope of power curve given by equation 6.

R	φ	RR_AA = 1.4	RR_AA = 3	RR_AA = 9
1	0.6	1	1	1
1	0.8	0.75	1	1
2	0.6	0.33	0.83	1
2	0.8	0.92	1	1
4	0.6	0.75	0.92	0.83
4	0.8	0.92	1	1

Proportion of scenarios in which the observed change in power agreed with the predicted slope of chi-square power. The observed change in power is calculated as the sign of the difference of the median likelihood ratio statistic at γca = 0 and at γca = 1. 160 parameter values were considered, a subset of the parameter space described in Table 1. 500 realizations at γca = 0 and at γca = 1 of each combination were undertaken to find the median likelihood ratio statistics.

In conclusion, the re-use of samples with available high-density genotype data and rich phenotypic data (such as in an EMR) can cost-effectively enhance statistical power under a range of realistic scenarios.

P-values from null models in simulation are approximately uniformly distributed. (EPS)

Bias in point estimates of odds ratio for various φ and γca with k = .01, R = 4, m = .3, RR (EPS)

ICD-9 Codes and Medications defining silver standard cases from EMR. (XLS)

33 in total

1. AN INVESTIGATION OF THE EFFECT OF MISCLASSIFICATION ON THE PROPERTIES OF CHI-2-TESTS IN THE ANALYSIS OF CATEGORICAL DATA.

Authors: V L MOTE; R L ANDERSON
Journal: Biometrika Date: 1965-06 Impact factor: 2.445

2. A cost-effective statistical method to correct for differential genotype misclassification when performing case-control genetic association.

Authors: Douglas Londono; Chad Haynes; Francisco M De La Vega; Stephen J Finch; Derek Gordon
Journal: Hum Hered Date: 2010-07-03 Impact factor: 0.444

3. Effects of differential genotyping error rate on the type I error probability of case-control studies.

Authors: Valentina Moskvina; Nick Craddock; Peter Holmans; Michael J Owen; Michael C O'Donovan
Journal: Hum Hered Date: 2006-04-06 Impact factor: 0.444

4. Exercise is associated with reduced risk for incident dementia among persons 65 years of age and older.

Authors: Eric B Larson; Li Wang; James D Bowen; Wayne C McCormick; Linda Teri; Paul Crane; Walter Kukull
Journal: Ann Intern Med Date: 2006-01-17 Impact factor: 25.391

5. A map of human genome variation from population-scale sequencing.

Authors: Gonçalo R Abecasis; David Altshuler; Adam Auton; Lisa D Brooks; Richard M Durbin; Richard A Gibbs; Matt E Hurles; Gil A McVean
Journal: Nature Date: 2010-10-28 Impact factor: 49.962

6. The Cognitive Abilities Screening Instrument (CASI): a practical test for cross-cultural epidemiological studies of dementia.

Authors: E L Teng; K Hasegawa; A Homma; Y Imai; E Larson; A Graves; K Sugimoto; T Yamaguchi; H Sasaki; D Chiu
Journal: Int Psychogeriatr Date: 1994 Impact factor: 3.878

Review 7. A comprehensive review of genetic association studies.

Authors: Joel N Hirschhorn; Kirk Lohmueller; Edward Byrne; Kurt Hirschhorn
Journal: Genet Med Date: 2002 Mar-Apr Impact factor: 8.822

8. Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer's disease.

Authors: Adam C Naj; Gyungah Jun; Gary W Beecham; Li-San Wang; Badri Narayan Vardarajan; Jacqueline Buros; Paul J Gallins; Joseph D Buxbaum; Gail P Jarvik; Paul K Crane; Eric B Larson; Thomas D Bird; Bradley F Boeve; Neill R Graff-Radford; Philip L De Jager; Denis Evans; Julie A Schneider; Minerva M Carrasquillo; Nilufer Ertekin-Taner; Steven G Younkin; Carlos Cruchaga; John S K Kauwe; Petra Nowotny; Patricia Kramer; John Hardy; Matthew J Huentelman; Amanda J Myers; Michael M Barmada; F Yesim Demirci; Clinton T Baldwin; Robert C Green; Ekaterina Rogaeva; Peter St George-Hyslop; Steven E Arnold; Robert Barber; Thomas Beach; Eileen H Bigio; James D Bowen; Adam Boxer; James R Burke; Nigel J Cairns; Chris S Carlson; Regina M Carney; Steven L Carroll; Helena C Chui; David G Clark; Jason Corneveaux; Carl W Cotman; Jeffrey L Cummings; Charles DeCarli; Steven T DeKosky; Ramon Diaz-Arrastia; Malcolm Dick; Dennis W Dickson; William G Ellis; Kelley M Faber; Kenneth B Fallon; Martin R Farlow; Steven Ferris; Matthew P Frosch; Douglas R Galasko; Mary Ganguli; Marla Gearing; Daniel H Geschwind; Bernardino Ghetti; John R Gilbert; Sid Gilman; Bruno Giordani; Jonathan D Glass; John H Growdon; Ronald L Hamilton; Lindy E Harrell; Elizabeth Head; Lawrence S Honig; Christine M Hulette; Bradley T Hyman; Gregory A Jicha; Lee-Way Jin; Nancy Johnson; Jason Karlawish; Anna Karydas; Jeffrey A Kaye; Ronald Kim; Edward H Koo; Neil W Kowall; James J Lah; Allan I Levey; Andrew P Lieberman; Oscar L Lopez; Wendy J Mack; Daniel C Marson; Frank Martiniuk; Deborah C Mash; Eliezer Masliah; Wayne C McCormick; Susan M McCurry; Andrew N McDavid; Ann C McKee; Marsel Mesulam; Bruce L Miller; Carol A Miller; Joshua W Miller; Joseph E Parisi; Daniel P Perl; Elaine Peskind; Ronald C Petersen; Wayne W Poon; Joseph F Quinn; Ruchita A Rajbhandary; Murray Raskind; Barry Reisberg; John M Ringman; Erik D Roberson; Roger N Rosenberg; Mary Sano; Lon S Schneider; William Seeley; Michael L Shelanski; Michael A Slifer; 
Charles D Smith; Joshua A Sonnen; Salvatore Spina; Robert A Stern; Rudolph E Tanzi; John Q Trojanowski; Juan C Troncoso; Vivianna M Van Deerlin; Harry V Vinters; Jean Paul Vonsattel; Sandra Weintraub; Kathleen A Welsh-Bohmer; Jennifer Williamson; Randall L Woltjer; Laura B Cantwell; Beth A Dombroski; Duane Beekly; Kathryn L Lunetta; Eden R Martin; M Ilyas Kamboh; Andrew J Saykin; Eric M Reiman; David A Bennett; John C Morris; Thomas J Montine; Alison M Goate; Deborah Blacker; Debby W Tsuang; Hakon Hakonarson; Walter A Kukull; Tatiana M Foroud; Jonathan L Haines; Richard Mayeux; Margaret A Pericak-Vance; Lindsay A Farrer; Gerard D Schellenberg
Journal: Nat Genet Date: 2011-04-03 Impact factor: 38.330

9. LRTae: improving statistical power for genetic association with case/control data when phenotype and/or genotype misclassification errors are present.

Authors: Sandra Barral; Chad Haynes; Millicent Stone; Derek Gordon
Journal: BMC Genet Date: 2006-04-27 Impact factor: 2.797

10. Power and sample size calculations in the presence of phenotype errors for case/control genetic association studies.

Authors: Brian J Edwards; Chad Haynes; Mark A Levenstien; Stephen J Finch; Derek Gordon
Journal: BMC Genet Date: 2005-04-08 Impact factor: 2.797
