Anurag Verma1,2, Yuki Bradford1, Scott Dudek1, Anastasia M Lucas1, Shefali S Verma1,2, Sarah A Pendergrass3, Marylyn D Ritchie4,5. 1. Department of Genetics and Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA. 2. The Huck Institutes of the Life Science, Pennsylvania State University, University Park, PA, USA. 3. Biomedical and Translational Informatics, Geisinger, Danville, PA, USA. 4. Department of Genetics and Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA. marylyn@pennmedicine.upenn.edu. 5. The Huck Institutes of the Life Science, Pennsylvania State University, University Park, PA, USA. marylyn@pennmedicine.upenn.edu.
Abstract
BACKGROUND: Phenome-wide association studies (PheWAS) are a high-throughput approach to evaluate comprehensive associations between genetic variants and a wide range of phenotypic measures. PheWAS has varying sample sizes for quantitative traits, and variable numbers of cases and controls for binary traits across the many phenotypes of interest, which can affect the statistical power to detect associations. The motivation of this study is to investigate the various parameters which affect the estimation of statistical power in PheWAS, including sample size, case-control ratio, minor allele frequency, and disease penetrance. RESULTS: We performed a PheWAS simulation study, where we investigated variations in statistical power based on different parameters, such as overall sample size, number of cases, case-control ratio, minor allele frequency, and disease penetrance. The simulation was performed on both binary and quantitative phenotypic measures. Our simulation on binary traits suggests that the number of cases has more impact on statistical power than the case to control ratio; also, we found that a sample size of 200 cases or more maintains the statistical power to identify associations for common variants. For quantitative traits, a sample size of 1000 or more individuals performed best in the power calculations. We focused on common genetic variants (MAF > 0.01) in this study; however, in future studies, we will be extending this effort to perform similar simulations on rare variants. CONCLUSIONS: This study provides a series of PheWAS simulation analyses that can be used to estimate statistical power for some potential scenarios. These results can be used to provide guidelines for appropriate study design for future PheWAS analyses.
BACKGROUND: Phenome-wide association studies (PheWAS) are a high-throughput approach to evaluate comprehensive associations between genetic variants and a wide range of phenotypic measures. PheWAS has varying sample sizes for quantitative traits, and variable numbers of cases and controls for binary traits across the many phenotypes of interest, which can affect the statistical power to detect associations. The motivation of this study is to investigate the various parameters which affect the estimation of statistical power in PheWAS, including sample size, case-control ratio, minor allele frequency, and disease penetrance. RESULTS: We performed a PheWAS simulation study, where we investigated variations in statistical power based on different parameters, such as overall sample size, number of cases, case-control ratio, minor allele frequency, and disease penetrance. The simulation was performed on both binary and quantitative phenotypic measures. Our simulation on binary traits suggests that the number of cases has more impact on statistical power than the case to control ratio; also, we found that a sample size of 200 cases or more maintains the statistical power to identify associations for common variants. For quantitative traits, a sample size of 1000 or more individuals performed best in the power calculations. We focused on common genetic variants (MAF > 0.01) in this study; however, in future studies, we will be extending this effort to perform similar simulations on rare variants. CONCLUSIONS: This study provides a series of PheWAS simulation analyses that can be used to estimate statistical power for some potential scenarios. These results can be used to provide guidelines for appropriate study design for future PheWAS analyses.
Entities:
Keywords:
EHR; ICD-9 codes; PheWAS; Power analysis; Simulation study
Phenome-wide association study (PheWAS) has been implemented in a variety of different studies, like within the eMERGE network [1-5], using electronic health record (EHR) information that includes international classification of disease version 9 (ICD-9) code based diagnoses, laboratory test measurements and demographic information [6-12]. Other PheWAS have used data from epidemiological studies [13, 14], as well as clinical trials [8, 9] such as the AIDS clinical trial group (ACTG), which consist of measurements for different clinical domains like pharmacology, metabolism, virology, and immunology [15, 16]. Cohorts like these with a large number of measurements for every individual have made PheWAS a practical approach when scanning over hundreds and thousands of phenotypes in a high-throughput way. PheWAS generates genetic association hypotheses for further study and provides insights through cross-phenotype associations.Unlike genome-wide association studies (GWAS) where one phenotype is investigated in a study population, PheWAS uses a wide range of phenotypes collected for a variety of reasons for each dataset, often with minimal curation. Thus, in PheWAS, the data collected for different measurements can vary considerably in sample size, including the numbers of cases for diagnoses, depending on the rarity of the diagnosis. This makes the estimation of statistical power for PheWAS a challenge. For example, in electronic health record (EHR) data, one of the most commonly used data types to define case-control status is through ICD-9 codes; these codes provide information on disease diagnosis, procedures, and medications in the form of three- to five-digit codes. The longitudinal ICD-9 data collected over many years varies between patients due to multiple factors, such as differences in the frequency of patient visits, differences in length of records due to different start and end dates, and incomplete patient medical history. These factors generate sparseness and missing information in the data and, hence, variability in the number of cases, the case-control ratio, and the overall sample size in case-control study designs. These factors can then affect downstream association testing. Three issues exist for measures with low sample sizes: 1) low statistical power to identify or replicate genetic associations and, 2) potentially biased estimates in analyses with low sample size, and 3) an increase in multiple hypothesis testing burden through including low powered phenotypes that may not provide insights but increase the number of statistical tests.In Table 1, we provide the range of sample sizes, including case and control counts from already published PheWAS analyses. It highlights that some studies have included binary phenotype with case counts as low as 11 and for quantitative traits, we found phenotypes with sample size as low as 48 individuals (Table 1). Using the information from these existing studies, our goal for this study is to investigate differences in the power estimates by testing different parameters commonly used in a PheWAS analysis. We investigated these parameters using a simulation approach to characterize and determine the effect of case numbers or case-control ratio in a case-control study, as well as the sample size for quantitative trait PheWAS. With our findings from the simulation analysis, we provided recommendations to consider for future PheWAS study design.
Table 1
Distribution of Samples in Published PheWAS. In this table, we highlight the range of cases and controls for PheWAS on binary phenotypes and range of sample sizes for the quantitative phenotypes used in few PheWAS analyses in literature
Study
Min/Max Case Counts Range
Min/Max Control Count Range
Binary Phenotypes (EHR-based ICD-9 codes)
Denny et al. [23]
25/8185
4206/13746
Hebbring et al. [9]
11/3450
NA
Namjou et al. [24]
23/1131
1840/3609
Simonti et al. [25]
20/8128
794/13258
Shameer et al. [26]
22/6082
NA
Karnes et al. [11]
40/12197
9354/28737
Quantitative Phenotypes
Study
Sample Size Range
Karaca et al. [27]
323–973
Hall et al. [14]
258–6363
Moore and Verma et al. [16]
1175–3269
Pendergrass et al. [13]
48–1331
Distribution of Samples in Published PheWAS. In this table, we highlight the range of cases and controls for PheWAS on binary phenotypes and range of sample sizes for the quantitative phenotypes used in few PheWAS analyses in literature
Methods
Simulation design
To investigate the power of the PheWAS approach and the sample size requirements, we simulated an additive genetic model with a range of effect sizes, case-control ratios, and minor allele frequencies. The simulation design assumes no confounding effects due to the environmental exposure and covariates. Genotypes were simulated as SNPs, which were drawn according to Hardy Weinberg Equilibrium (HWE) probabilities. Since multiple diseases and phenotypes are investigated in an EHR-based PheWAS, the prevalence of the diseases could differ in the study population. In our simulated datasets, we used a constant disease prevalence, which is represented as ß0 in the regression model and set to 0.1 [17]. We represented the effect sizes in terms of penetrance functions, which is a combination of genotypes and the risk of diseases. The penetrance function is useful in estimating the probability of disease given the genotype in a specific population. It is used to assign the case disease status to samples whose genotypes are influencing the disease risk and vice versa for controls. We used a custom script written in R [18] to generate random population-based samples with genotypes in HWE and their phenotypes using different input parameters as shown in Fig. 1. We simulated both binary and quantitative trait phenotypes. For quantitative trait phenotypes, the same penetrance function was used to generate the normalized distribution of phenotypes where genotypes of samples at the upper and lower end of the distribution are associated with phenotype values. All the samples in the simulated datasets were drawn at random, so there is no relatedness among the individuals.
Fig. 1
Simulation workflow. We designed a simulation approach to test different testing parameters for a PheWAS analysis and their effect on the power estimates. For each combination of testing parameters, we generated 1000 simulated datasets. Then, we performed association testing using logistic and linear regression for binary and quantitative phenotypes respectively. The power estimates were calculated for each combination of parameter setting at significance level of α = 0.00025 (Binary trait) and α = 0.004 (Quantitative trait)
Simulation workflow. We designed a simulation approach to test different testing parameters for a PheWAS analysis and their effect on the power estimates. For each combination of testing parameters, we generated 1000 simulated datasets. Then, we performed association testing using logistic and linear regression for binary and quantitative phenotypes respectively. The power estimates were calculated for each combination of parameter setting at significance level of α = 0.00025 (Binary trait) and α = 0.004 (Quantitative trait)For binary phenotypes, we generated the simulated datasets by varying the following parameter settings: cases, case-control ratios, SNP minor allele frequencies, and disease penetrance. In this simulation study, we only investigated a study population with unbalanced and unmatched cases and controls. For example, we simulated a dataset with a random set of 30 individuals. The parameter settings for this simulated dataset were as follows: case-control ratio = 1:2, cases = 10, controls = 20, disease penetrance =0.15, and SNP MAF = 0.01. The simulated dataset was generated for four SNPs and 10 phenotypes, including one SNP-phenotype model with signal, and other models were simulated as noise. The noise was added to evaluate any systematic bias in the power estimates. The noise SNPs were generated by randomly assigning the genotypes in the study population but keeping the MAF the same as the signal SNP. We randomly assigned the cases for the noise phenotypes. Under each parameter setting, we generated 1000 datasets and then calculated associations using logistic regression. Please refer to Fig. 1 for all the different combinations of parameter values used for simulation.For the continuous or quantitative trait simulations, we investigated the power estimates similarly by varying the sample size, minor allele frequencies, and disease penetrance. The simulated dataset was generated for four SNPs and one phenotype, with one signal SNP-phenotype model, and the rest was noise data. We generated 3 noise SNPs as in the binary phenotype simulations. Again, we generated 1000 datasets for each parameter setting and then used linear regression to calculate associations with the quantitative trait. Please refer to Fig. 1 for all the different combinations of parameter settings used for the quantitative trait simulations. All the association testing for binary and quantitative phenotypes was performed using PLATO [19].We calculated the power estimates by counting the number of associations below an alpha value based on total number of tests within each set of 1000 simulated datasets for all parameter settings. For binary trait, we used α = 0.00025 (0.01/40) and for the quantitative trait, we used α = 0.004 (0.01/4).
Results
Binary trait simulations
We designed a simulation approach with different combinations of genotype and phenotype parameters and then performed association testing so as to investigate the factors that could influence the power to detect the signal.In Fig. 2, we show trends in the estimates of power at α = 0.00025 for the parameters used in the simulations. First, we observed an increase in power with an increase in penetrance irrespective of any change in other parameters, and this is expected as highly penetrant diseases are more likely to be identified even with small numbers of samples (this is due to high effect size). We also determined that the ratio between cases and controls does not have much impact on the power. The number of cases is what primarily influences the power to detect genetic associations. For example, as shown in Fig. 2, the case-control ratio has a negligible effect on power, whereas with the increase in case numbers, we see the increase in the power to identify an association. These simulations also show the importance of minor allele frequency threshold when calculating associations on genotype models with an additive effect. Here, we find that all of the simulation models showed increased performance, with minor allele frequency greater than 5%. The models with lower frequency variants (MAF between 1% and 5%) did not reach 100% power until the case threshold of 1000 samples, and it was only represented in the model simulated with high disease penetrance. We observed that the common variants (MAF > 1%) have signal when there are 200 or more cases. The Type 1 error for the parameter settings used to design simulation dataset is well controlled (Additional file 1: Figure S1) and we show an example of Type 1 error for one parameter set simulated data with cases = 200 in Fig. 3.
Fig. 2
Binary Trait Power Results. The power estimates of the simulations are represented in the gradient color. Each panel represents the power estimate for a specific parameter setting. For each panel, minor allele frequencies are represented on the y-axis; the disease penetrance appear on the x-axis; the case numbers appear on the top, and the respective case-control ratios are on the right side of the box
Fig. 3
Binary Trait Type I Errors. The plot shows the Type I errors for different parameter settings at cases = 200. Each panel represents the different case-control ratio used for the simulation dataset. The Type I error on the y-axis is calculated based on the number of false positive association below significance level of α = 0.00025. The disease penetrance is represented on the x-axis and each colored point represent different MAF used in the simulations
Binary Trait Power Results. The power estimates of the simulations are represented in the gradient color. Each panel represents the power estimate for a specific parameter setting. For each panel, minor allele frequencies are represented on the y-axis; the disease penetrance appear on the x-axis; the case numbers appear on the top, and the respective case-control ratios are on the right side of the boxBinary Trait Type I Errors. The plot shows the Type I errors for different parameter settings at cases = 200. Each panel represents the different case-control ratio used for the simulation dataset. The Type I error on the y-axis is calculated based on the number of false positive association below significance level of α = 0.00025. The disease penetrance is represented on the x-axis and each colored point represent different MAF used in the simulations
Quantitative trait simulations
We also performed similar simulation analyses on quantitative traits (such as clinical lab variables) to identify a sample size threshold for multi-phenotype -based studies like PheWAS. For quantitative traits, we used different sample sizes for the simulation, ranging from 10 to 25,000, as these are based on sample sizes we observed in previously published EHR or clinical trials based PheWAS analyses (Table 1) [20, 21]. As shown in Fig. 4, we observed almost no power until the dataset had approximately 1000 samples for a phenotype with a penetrance of 0.15 and; as expected, we saw the increase in power with higher penetrance, even in smaller sample sizes. Around the sample size of 1000, we see an increase in power, with a slight variation with different minor allele frequencies. Again, variants with rare minor allele did not perform well until we had reached a sample size of 1000 and a penetrance of 0.3. These quantitative trait simulations suggest that a threshold of 1000 samples for models with MAF greater than 5% in PheWAS is recommended. Also, for the variants with MAF < 5% require either association analyses with much larger sample sizes or a different statistical approach to evaluate rare variants. As shown in Fig. 5, the Type I error for quantitative trait simulations are also well controlled.
Fig. 4
Quantitative Trait Power Results. Power estimates of the simulations are represented in the gradient color. Each panel represents the sample size of the simulated dataset, and for each panel, the minor allele frequencies are represented on the y-axis; the disease penetrance is on the x-axis
Fig. 5
Quantitative Trait Type I errors. The plot shows the Type I errors for different parameter settings for quantitative trait simulations. Each panel represents the different sample sizes. The Type I error on the y-axis is calculated based on the number of false positive association below significance level of α = 0.00025. The disease penetrance is represented on the x-axis and each colored point represent different MAF used in the simulations
Quantitative Trait Power Results. Power estimates of the simulations are represented in the gradient color. Each panel represents the sample size of the simulated dataset, and for each panel, the minor allele frequencies are represented on the y-axis; the disease penetrance is on the x-axisQuantitative Trait Type I errors. The plot shows the Type I errors for different parameter settings for quantitative trait simulations. Each panel represents the different sample sizes. The Type I error on the y-axis is calculated based on the number of false positive association below significance level of α = 0.00025. The disease penetrance is represented on the x-axis and each colored point represent different MAF used in the simulations
Discussion
PheWAS provides the genomic landscape for multiple phenotypes, but a challenge of PheWAS is the range of sample sizes and case numbers inherent when using a wide range of data instead of a single phenotype like in GWAS. For example, there are 14,025 possible ICD-9 diagnosis codes and 3824 procedure codes in a EHR system used by healthcare providers. With the introduction of ICD-10, the number of ICD-based codes has further increased to approximately 66,000. Testing 14,025 diagnosis codes for association with up to 1 million or more genetic variants results in a very high multiple testing burden. Usually, a large fraction of codes have very low case numbers due to the rarity of the diagnoses, and thus, they may not be sufficiently powered for association detection. For example, Geisinger Health System (GHS) is one of the largest healthcare providers in central Pennsylvania, with an EHR system including ~ 1.2 million unique patients. Using the EHR data in Geisinger for ~ 100,000 consented participants in the MyCode Community Health Initiative [20], we evaluated the extent of the variability in the number of ICD-9 codes by case count. In order to account for misdiagnosis, we defined a patient as a case for an ICD-9 code only when a patient had three or more independent visits where that specific code was represented in the patient’s record. Out of 14,025 codes for data collected between the years 1996 and 2015 for ICD-9 codes alone, 33% were not present at all and ~ 30% ICD-9 based diagnoses had less than 10 patients with that code (case count < 10). In Fig. 6, we show the trends on ICD-9 codes with cases at different thresholds and even after dropping out more than 60% of the ICD-9 codes, there are still 3568 ICD-9 codes with 10 or more patients labeled as cases. This can increase the number of ICD-9 code based phenotypes for association testing, and it adds to the multiple hypothesis burden. In our binary trait simulations, we show that we need 200 or more cases to have enough power to detect genetic associations for common variants (MAF > 0.01). So, at that threshold, there are 831 ICD-9 codes with at least 200 patients with status as the case (as shown in Fig. 6). Based on our simulations, we recommend using 200 as a case threshold for the common variant PheWAS analysis.
Fig. 6
A number of ICD-9 codes, by case threshold, are generated from EHR data of 100,000 Geisinger’s MyCode participants. The x-axis represents the different case thresholds we used in the binary data simulations. The y-axis, it shows the number of ICD-9 codes for each case threshold (> = cases) in 100,000 MyCode participants. For each ICD-9 code, the cases are defined as the individuals with three or more visits to the clinic
A number of ICD-9 codes, by case threshold, are generated from EHR data of 100,000 Geisinger’s MyCode participants. The x-axis represents the different case thresholds we used in the binary data simulations. The y-axis, it shows the number of ICD-9 codes for each case threshold (> = cases) in 100,000 MyCode participants. For each ICD-9 code, the cases are defined as the individuals with three or more visits to the clinicUsing the findings from these simulations, we addressed three issues related to low sample size and its impact on PheWAS approach. First, the impact of low sample size is evident in quantitative trait simulations, which suggests that the sample size of 1000 individuals for each phenotype is important to consider in the study design. However, in binary trait simulations, we observed that overall sample size does not affect the power, but instead the number of cases drives the power estimates. Secondly, low Type 1 error across all parameter settings (Figs. 3 and 5, Additional file 1: Figure S1) shows no systematic bias in the regression method. However, low sample size or low case numbers will not have enough statistical power to detect the associations. Lastly, we demonstrate that using the above-suggested thresholds of case numbers for binary traits and sample size for quantitative traits can help with the selection of phenotypes and reduce the number of tests and; hence, this can reduce the multiple hypothesis testing burden.
Limitations
Using the simulation approach, we were able to identify the parameters impacting the power to determine genetic associations and we provided recommendations for PheWAS analysis design. However, there can be other factors that can influence the power of PheWAS analysis. We primarily ran all the simulations based on a regression model (linear or logistic regression), but there are now many other statistical methods for phenome-wide association analysis [22]. Further extensions of these simulation studies to explore other statistical methods will be important. We limited our investigation in this study to an additive effect of genotypes. However, it will be important to investigate other genetic effects as well such as dominant, recessive, weighted, and interaction. There are also other factors that can influence the power estimates; these include environmental exposure, confounding covariates (age, sex, and ancestry. In the future, we will plan to include such factors in our simulation design.
Conclusions
PheWAS have become a common tool to explore the genotype-phenotype landscape of large biobanks linked to comprehensive phenotype/trait data collections as in EHRs, clinical trials, or epidemiological cohort studies. This high-throughput analysis approach has been met with much success in recent years [4, 6, 14]. However, the community has been lacking guidance for making study design decisions regarding sample size, case to control ratios, and minor allele frequency thresholds. At present, there is not a PheWAS Power Calculator available to researchers. Thus, we implemented a large-scale simulation study to provide some guidelines for understanding the statistical power of PheWAS analyses under different scenarios. We believe these simulation results provide the needed power estimates for future PheWAS analysis decisions.Figure S1. Binary Trait Type I Errors. The plot shows the Type I errors for different parameter settings. Each panel represents the different case number on the top and case-control ratio on the right which was used for the simulation dataset. The Type I error on the y-axis is calculated based on the number of false positive association below significance level of α = 0.00025. The disease penetrance is represented on the x-axis and each colored point represent different MAF used in the simulations. (PNG 379 kb)Summary Results: The summary results used to generate the figures. (XLSX 56 kb)
Authors: Corinne N Simonti; Benjamin Vernot; Lisa Bastarache; Erwin Bottinger; David S Carrell; Rex L Chisholm; David R Crosslin; Scott J Hebbring; Gail P Jarvik; Iftikhar J Kullo; Rongling Li; Jyotishman Pathak; Marylyn D Ritchie; Dan M Roden; Shefali S Verma; Gerard Tromp; Jeffrey D Prato; William S Bush; Joshua M Akey; Joshua C Denny; John A Capra Journal: Science Date: 2016-02-12 Impact factor: 47.728
Authors: Catherine A McCarty; Rex L Chisholm; Christopher G Chute; Iftikhar J Kullo; Gail P Jarvik; Eric B Larson; Rongling Li; Daniel R Masys; Marylyn D Ritchie; Dan M Roden; Jeffery P Struewing; Wendy A Wolf Journal: BMC Med Genomics Date: 2011-01-26 Impact factor: 3.063
Authors: Joshua C Denny; Marylyn D Ritchie; Melissa A Basford; Jill M Pulley; Lisa Bastarache; Kristin Brown-Gentry; Deede Wang; Dan R Masys; Dan M Roden; Dana C Crawford Journal: Bioinformatics Date: 2010-03-24 Impact factor: 6.937
Authors: Jason H Karnes; Lisa Bastarache; Christian M Shaffer; Silvana Gaudieri; Yaomin Xu; Andrew M Glazer; Jonathan D Mosley; Shilin Zhao; Soumya Raychaudhuri; Simon Mallal; Zhan Ye; John G Mayer; Murray H Brilliant; Scott J Hebbring; Dan M Roden; Elizabeth J Phillips; Joshua C Denny Journal: Sci Transl Med Date: 2017-05-10 Impact factor: 17.956
Authors: Joshua C Denny; Lisa Bastarache; Marylyn D Ritchie; Robert J Carroll; Raquel Zink; Jonathan D Mosley; Julie R Field; Jill M Pulley; Andrea H Ramirez; Erica Bowton; Melissa A Basford; David S Carrell; Peggy L Peissig; Abel N Kho; Jennifer A Pacheco; Luke V Rasmussen; David R Crosslin; Paul K Crane; Jyotishman Pathak; Suzette J Bielinski; Sarah A Pendergrass; Hua Xu; Lucia A Hindorff; Rongling Li; Teri A Manolio; Christopher G Chute; Rex L Chisholm; Eric B Larson; Gail P Jarvik; Murray H Brilliant; Catherine A McCarty; Iftikhar J Kullo; Jonathan L Haines; Dana C Crawford; Daniel R Masys; Dan M Roden Journal: Nat Biotechnol Date: 2013-12 Impact factor: 54.908
Authors: Omri Gottesman; Helena Kuivaniemi; Gerard Tromp; W Andrew Faucett; Rongling Li; Teri A Manolio; Saskia C Sanderson; Joseph Kannry; Randi Zinberg; Melissa A Basford; Murray Brilliant; David J Carey; Rex L Chisholm; Christopher G Chute; John J Connolly; David Crosslin; Joshua C Denny; Carlos J Gallego; Jonathan L Haines; Hakon Hakonarson; John Harley; Gail P Jarvik; Isaac Kohane; Iftikhar J Kullo; Eric B Larson; Catherine McCarty; Marylyn D Ritchie; Dan M Roden; Maureen E Smith; Erwin P Böttinger; Marc S Williams Journal: Genet Med Date: 2013-06-06 Impact factor: 8.822
Authors: Molly A Hall; Anurag Verma; Kristin D Brown-Gentry; Robert Goodloe; Jonathan Boston; Sarah Wilson; Bob McClellan; Cara Sutcliffe; Holly H Dilks; Nila B Gillani; Hailing Jin; Ping Mayo; Melissa Allen; Nathalie Schnetz-Boutaud; Dana C Crawford; Marylyn D Ritchie; Sarah A Pendergrass Journal: PLoS Genet Date: 2014-12-04 Impact factor: 5.917
Authors: Molly A Hall; John Wallace; Anastasia Lucas; Dokyoon Kim; Anna O Basile; Shefali S Verma; Cathy A McCarty; Murray H Brilliant; Peggy L Peissig; Terrie E Kitchner; Anurag Verma; Sarah A Pendergrass; Scott M Dudek; Jason H Moore; Marylyn D Ritchie Journal: Nat Commun Date: 2017-10-27 Impact factor: 14.919
Authors: Anurag Verma; Anastasia Lucas; Shefali S Verma; Yu Zhang; Navya Josyula; Anqa Khan; Dustin N Hartzel; Daniel R Lavage; Joseph Leader; Marylyn D Ritchie; Sarah A Pendergrass Journal: Am J Hum Genet Date: 2018-03-29 Impact factor: 11.025
Authors: Ridge Dershem; Raghu P R Metpally; Kirk Jeffreys; Sarathbabu Krishnamurthy; Diane T Smelser; Michal Hershfinkel; David J Carey; Janet D Robishaw; Gerda E Breitwieser Journal: J Biol Chem Date: 2019-10-18 Impact factor: 5.157
Authors: Robert H George Markowitz; Abigail Leavitt LaBella; Mingjian Shi; Antonis Rokas; John A Capra; Jane F Ferguson; Jonathan D Mosley; Seth R Bordenstein Journal: Proc Natl Acad Sci U S A Date: 2022-06-24 Impact factor: 12.779
Authors: Katherine A Fawcett; Kijoung Song; Guoqing Qian; Aliki-Eleni Farmaki; Richard Packer; Catherine John; Nick Shrine; Raquel Granell; Sue Ring; Nicholas J Timpson; Laura M Yerges-Armstrong; Richard Eastell; Louise V Wain; Robert A Scott; Martin D Tobin; Ian P Hall Journal: ERJ Open Res Date: 2021-05-10
Authors: Xue Li; Xiangrui Meng; Athina Spiliopoulou; Maria Timofeeva; Wei-Qi Wei; Aliya Gifford; Xia Shen; Yazhou He; Tim Varley; Paul McKeigue; Ioanna Tzoulaki; Alan F Wright; Peter Joshi; Joshua C Denny; Harry Campbell; Evropi Theodoratou Journal: Ann Rheum Dis Date: 2018-02-06 Impact factor: 19.103
Authors: Dipender Gill; Beben Benyamin; Luke S P Moore; Grace Monori; Ang Zhou; Fotios Koskeridis; Evangelos Evangelou; Mike Laffan; Ann P Walker; Konstantinos K Tsilidis; Abbas Dehghan; Paul Elliott; Elina Hyppönen; Ioanna Tzoulaki Journal: PLoS Med Date: 2019-06-20 Impact factor: 11.069
Authors: Maya S Safarova; Benjamin A Satterfield; Xiao Fan; Erin E Austin; Zhan Ye; Lisa Bastarache; Neil Zheng; Marylyn D Ritchie; Kenneth M Borthwick; Marc S Williams; Eric B Larson; Aaron Scrol; Gail P Jarvik; David R Crosslin; Kathleen Leppig; Laura J Rasmussen-Torvik; Sarah A Pendergrass; Amy C Sturm; Bahram Namjou; Amy Sanghavi Shah; Robert J Carroll; Wendy K Chung; Wei-Qi Wei; QiPing Feng; C Michael Stein; Dan M Roden; Teri A Manolio; Daniel J Schaid; Joshua C Denny; Scott J Hebbring; Mariza de Andrade; Iftikhar J Kullo Journal: NPJ Genom Med Date: 2019-02-11 Impact factor: 8.617
Authors: Xinyuan Zhang; Yogasudha Veturi; Shefali Verma; William Bone; Anurag Verma; Anastasia Lucas; Scott Hebbring; Joshua C Denny; Ian B Stanaway; Gail P Jarvik; David Crosslin; Eric B Larson; Laura Rasmussen-Torvik; Sarah A Pendergrass; Jordan W Smoller; Hakon Hakonarson; Patrick Sleiman; Chunhua Weng; David Fasel; Wei-Qi Wei; Iftikhar Kullo; Daniel Schaid; Wendy K Chung; Marylyn D Ritchie Journal: Pac Symp Biocomput Date: 2019