Laura J Corbin, Vanessa Y Tan, David A Hughes, Kaitlin H Wade, Dirk S Paul, Katherine E Tansey, Frances Butcher, Frank Dudbridge, Joanna M Howson, Momodou W Jallow, Catherine John, Nathalie Kingston, Cecilia M Lindgren, Michael O'Donavan, Stephen O'Rahilly, Michael J Owen, Colin N A Palmer, Ewan R Pearson, Robert A Scott, David A van Heel, John Whittaker, Tim Frayling, Martin D Tobin, Louise V Wain, George Davey Smith, David M Evans, Fredrik Karpe, Mark I McCarthy, John Danesh, Paul W Franks, Nicholas J Timpson.
Abstract
Detailed phenotyping is required to deepen our understanding of the biological mechanisms behind genetic associations. In addition, the impact of potentially modifiable risk factors on disease requires analytical frameworks that allow causal inference. Here, we discuss the characteristics of Recall-by-Genotype (RbG) as a study design aimed at addressing both these needs. We describe two broad scenarios for the application of RbG: studies using single variants and those using multiple variants. We consider the efficacy and practicality of the RbG approach, provide a catalogue of UK-based resources for such studies and present an online RbG study planner.
Year: 2018 PMID: 29459775 PMCID: PMC5818506 DOI: 10.1038/s41467-018-03109-y
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Properties of RbG strata compared to randomised controlled trials. a For randomised controlled trials (RCTs), participants are randomly allocated to intervention or control groups. Randomisation should equally distribute any confounding variables between the two groups. b For Recall-by-Genotype (RbG) studies, strata are defined by genotype and, analogous to RCTs, potential confounding factors are equally distributed between groups. Hence, RbG studies are not subject to reverse causality or confounding factors with respect to the phenotype under study.
Fig. 2 Contrast between phenotype and genotype-based sampling strategies. Histograms show the distributions of a body mass index (BMI) and b the BMI genetic risk score (GRS) in the Avon Longitudinal Study of Parents and Children (ALSPAC). For a description of the ALSPAC data, please see Supplementary Note 2. Red bars represent the top and bottom 30% of these distributions. Mean differences in BMI, systolic blood pressure (SBP) and confounding factors (alcohol, income and education) were compared between the top and bottom 30% of the a BMI and b BMI GRS distribution. a For extreme-phenotype recall studies, participants at the extreme ends of the phenotypic distribution are invited to participate in the study. As an exemplar of this, phenotype data from 1855 individuals in ALSPAC was used. While differences in BMI and SBP are observed between the top and bottom 30% of the BMI distribution, extreme-phenotype sampling strategies are often prone to confounding and potential reverse causality (as shown by the association of the recalled strata with confounding factors). b In contrast, RbG studies have the ability to generate reliable gradients of biological difference in combination with essentially randomised groups. As an exemplar of this, genetic data from 1420 individuals in ALSPAC was used to generate a BMI GRS. Differences in BMI and SBP are observed between the top and bottom 30% of the BMI GRS distribution that are not prone to confounding and reverse causality (as shown by the lack of association of the recalled strata with confounding factors).
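The contrast in Fig. 2 can be illustrated with a small simulation. The sketch below is not the ALSPAC analysis: it uses purely synthetic data, and the effect sizes (0.3 for the GRS, 0.5 for the confounder), the function names and the cohort size are all assumptions chosen for illustration. It selects the top and bottom 30% of either the phenotype or the GRS and compares the mean difference in a confounder between the two recalled strata.

```python
import random
import statistics


def recall_strata(scores, fraction=0.30):
    """Return indices of the bottom and top `fraction` of `scores`."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = round(len(scores) * fraction)
    if k == 0:
        return [], []
    return order[:k], order[-k:]


def mean_difference(values, low, high):
    """Mean of `values` in the high stratum minus the mean in the low stratum."""
    return (statistics.mean(values[i] for i in high)
            - statistics.mean(values[i] for i in low))


def simulate(n=2000, seed=1):
    """Synthetic cohort: the confounder affects BMI but is independent of the GRS."""
    rng = random.Random(seed)
    grs = [rng.gauss(0, 1) for _ in range(n)]
    confounder = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical effect sizes, in standard-deviation units.
    bmi = [0.3 * g + 0.5 * c + rng.gauss(0, 1) for g, c in zip(grs, confounder)]
    result = {}
    for label, scores in (("phenotype", bmi), ("genotype", grs)):
        low, high = recall_strata(scores)
        result[label] = {
            "bmi": mean_difference(bmi, low, high),
            "confounder": mean_difference(confounder, low, high),
        }
    return result
```

Under these assumptions, `simulate()["phenotype"]["confounder"]` should be clearly positive, because selecting on BMI also selects on anything that influences BMI, whereas `simulate()["genotype"]["confounder"]` should be close to zero while the BMI difference between strata remains.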
Fig. 3 Comparative power: RbGsv versus random recall study design. a Top panel: A comparison of power (y-axis) achieved by an RbGsv study design versus a random sample selection design for a given minor allele frequency (MAF) and standardized per-allele effect size. The x-axis is the total sample size of the recall experiment. Solid lines represent the situation where an equal number of major and minor homozygotes are recruited. Dashed lines represent the situation where an equal number of major homozygotes and heterozygotes are recruited. Lower panel: A representation of the difference (y-axis) between the power within an RbGsv study design and that from the equivalent random recall experiment. Solid lines represent the situation where an equal number of major and minor homozygotes are recruited. Dashed lines represent the situation where an equal number of major homozygotes and heterozygotes are recruited. b An illustration of the expected number of participants with genotypic data (y-axis) needed in order to recruit sufficient minor homozygotes or heterozygotes for a given RbGsv study sample size (x-axis) and minor allele frequency (MAF) (assuming HWE and a 100% participation rate). Solid lines represent the situation where an equal number of major and minor homozygotes are recruited. Dashed lines represent the situation where an equal number of major homozygotes and heterozygotes are recruited. For details of how the power calculations were carried out, see Supplementary Note 1. Here we assume a Type I error rate (alpha) of 0.05 and equal-sized genotype groups.
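The quantities in Fig. 3 can be approximated with a few lines of code. The sketch below is not the method of Supplementary Note 1, which is not reproduced here. It assumes Hardy-Weinberg equilibrium and 100% participation for panel b, and for panel a it swaps in a simple two-sided normal (z-test) approximation with an additive per-allele effect in standard-deviation units and a residual standard deviation of about 1. All function names are hypothetical.

```python
from statistics import NormalDist


def genotyped_needed(study_size, maf, contrast="homozygotes"):
    """Expected genotyped cohort size needed to fill two equal genotype groups.

    Assumes Hardy-Weinberg equilibrium and a 100% participation rate.
    `contrast` is "homozygotes" (major vs minor homozygotes) or
    "heterozygotes" (major homozygotes vs heterozygotes).
    """
    p, q = 1.0 - maf, maf
    per_group = study_size / 2
    major_hom = p * p
    other = q * q if contrast == "homozygotes" else 2 * p * q
    # The rarer of the two genotype groups limits recruitment.
    return per_group / min(major_hom, other)


def approx_power(ncp, alpha=0.05):
    """Two-sided power of a z-test whose statistic has mean `ncp` and SD 1."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(ncp - crit) + z.cdf(-ncp - crit)


def power_rbg_homozygotes(study_size, beta):
    """Approximate power when comparing equal groups of opposite homozygotes."""
    # The groups differ by two alleles, so the mean difference is 2 * beta.
    # Standard error is sqrt(1/n1 + 1/n2) with n1 = n2 = study_size / 2.
    se = (4 / study_size) ** 0.5
    return approx_power(2 * beta / se)


def power_random(study_size, maf, beta):
    """Approximate power for a random sample of the same total size."""
    # Regression on allele count; genotype variance under HWE is 2pq.
    return approx_power(beta * (study_size * 2 * maf * (1 - maf)) ** 0.5)
```

For example, under these assumptions `genotyped_needed(200, 0.1)` is roughly 10,000, since only about 1% of a cohort is expected to be homozygous for an allele with a MAF of 0.1, while `genotyped_needed(200, 0.1, "heterozygotes")` is under 600. With the same inputs and a per-allele effect of 0.2, `power_rbg_homozygotes(200, 0.2)` exceeds `power_random(200, 0.1, 0.2)` by a wide margin.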
Fig. 4 Comparative power: RbGmv versus random recall study design. a Top panel: A comparison of power (y-axis) achieved by an RbGmv study design versus a random sample selection design for a given variance in exposure explained by the genetic risk score (GRS) and a given percentile. The x-axis is the total sample size. Lower panel: A representation of the difference (y-axis) between the power within an RbGmv study design and that from the equivalent random recall experiment. In both the top and bottom panels, solid lines represent the situation where the variance in outcome explained by exposure is equal to 0.3 and dashed lines represent the situation where it is equal to 0.1. b An illustration of the minimum recruitment rate needed in order to recruit sufficient study participants for a given RbGmv study sample size (x-axis) and percentile. Solid lines represent the situation where the size of the genotyped cohort (or biobank) is equal to 5000 people and dashed lines represent the situation where the size of the genotyped cohort (or biobank) is equal to 10,000 people. For details of how the power calculations were carried out, see Supplementary Note 1. Here we use the analytical method and assume a Type I error rate (alpha) of 0.05 and equal-sized genotype groups. The ‘percentile’ is the threshold used to recruit from the GRS distribution in the genotyped cohort (or biobank) in the RbGmv study (e.g., percentile 5 corresponds to recruitment from the top and bottom 5%).
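The minimum recruitment rate in panel b of Fig. 4 follows from simple arithmetic. The sketch below assumes that the eligible pool is the top and bottom `percentile` percent of the genotyped cohort, and that the required rate is the study sample size divided by that pool. The function name and this exact formulation are assumptions, not taken from Supplementary Note 1.

```python
def min_recruitment_rate(study_size, cohort_size, percentile):
    """Minimum fraction of eligible participants who must agree to take part.

    The eligible pool is taken to be the top and bottom `percentile` percent
    of the GRS distribution in a genotyped cohort of `cohort_size` people.
    A result above 1 means the study cannot be filled from this cohort.
    """
    eligible = 2 * cohort_size * percentile / 100
    return study_size / eligible
```

For example, recruiting 200 participants from the top and bottom 5% of a 5000-person cohort leaves 500 eligible people, so `min_recruitment_rate(200, 5000, 5)` is 0.4; doubling the cohort to 10,000 halves the required rate to 0.2.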
UK patient and population-based studies available for RbG studies
| Study | Sample size | Local phenotypic expertise | Patient group/population sample |
|---|---|---|---|
| The Avon Longitudinal Study of Parents and Children (ALSPAC) | ~9000 mother-child duos and ~2000 trios. Smaller number of children of index participants (third generation) | Lifecourse epidemiology, birth cohort (‘complete’ phenotyping) | Population-based cohort |
| East London Genes & Health (ELGH) | 26,476 (at Nov. 2017, actively recruiting, total sample size 100 k) | Human knockouts, primary care e-health records, diabetes and cardiovascular | Population-based cohort (Bangladeshi and Pakistani ethnicity, age > 16) |
| EXtended Cohort for E-health, Environment and DNA (EXCEED) | Over 9300 recruits to date; recruitment planned to continue to 10,000 | Cardiovascular, respiratory, renal, metabolic, infectious disease and cancer | Population-based cohort (aged 30–69) |
| Exeter 10,000 (EXTEND) | 10,000 | Type 2 diabetes, ischaemic heart disease, vascular function and healthy ageing | Population-based sample (based in Exeter; enriched for patients with diabetes; aged > 18) |
| Genetics of Diabetes and Audit Research Tayside Study (GoDARTS) | 9439 cases and 8187 controls | Complete EMR linkage, type 2 diabetes, heart disease, asthma and cancer | Case-control cohort |
| INTERVAL | 50,000 | >6000 molecular phenotypes, including serum NMR metabolomics, plasma MS lipidomics and metabolomics, plasma proteomics, Sysmex FBC, hepcidin and others | Population-based sample of healthy blood donors |
| National Centre for Mental Health | Over 10,000 | Mental health conditions | Population-based cohort (variety of mental health conditions; all ages; primarily Wales-based) |
| The Oxford Biobank | 7900 | Metabolic and anthropometric, obesity | Random, population-based sample of healthy 30–50-year-old men and women (Oxfordshire) |
| Scottish Health Research Register (SHARE) | 50,000 samples obtained. 155,000 consented for spare blood interception | Complete EMR linkage. Type 2 diabetes, heart disease, asthma and cancer. Mobile App Patient Reported Outcomes. | Population-based cohort |
| Generation Scotland: Scottish Family Health Study (GS:SFHS) | 20,032 | Complete EHR linkage, urinary traits and kidney disease, eye phenotypes, family based data analysis | Family-based population cohort |
NMR, nuclear magnetic resonance; MS, mass spectrometry; EHR, electronic health record; EMR, electronic medical records; FBC, full blood count. An expanded version of this table with additional information can be found in Supplementary Table 1