The Precision Medicine Initiative[1] announced by President Obama aims to be a novel approach to disease treatment and prevention by allowing for individual variability in genes, environment, and lifestyle. Although its immediate goal is to support clinical trials of targeted cancer therapies based on a tumor’s molecular signature, about half of its $200+ million budget for this year is aimed at a new cohort study of 1 million people for research on etiology and prevention.[1,2] Here, I focus on the relevance of genomic research to personalized prevention.[3-6]
In his seminal article entitled “Sick individuals and sick populations,” Rose[7] distinguished the 2 primary goals of epidemiology: discovering “the determinants of individual cases and the determinants of incidence rate.” He called the corresponding prevention strategies the “high-risk approach,” which seeks to protect susceptible individuals, and the “population approach,” which seeks to control the underlying causes of incidence. In the genomics era, these correspond to using an individual’s genetic profile to target primary and secondary prevention versus classical public health strategies to clean up the environment or promote healthful behaviors so as to reduce the overall burden of disease. Population-wide interventions may provide large benefits to the population as a whole while offering little benefit to most individuals (the “prevention paradox”[8]), so the trade-off depends in part on how much risk is confined to an identifiable minority.
This commentary discusses 2 main themes: the evaluation of targeted versus population-wide interventions and the need for causal inference methods to accomplish this. I will illustrate these points with examples from primary prevention (air pollution regulation and genetic targeting of smoking cessation) and from secondary prevention (colonoscopies, as discussed in the companion article[9]).
THE COUNTER-FACTUAL NATURE OF PREVENTION RESEARCH
It is often argued that the importance of research on gene–environment (G × E) interactions is that it could lead to novel prevention strategies based on modifiable risk factors. The difficulty with using observational studies to evaluate prevention programs lies in the causal interpretation of the effect of an intervention. Even if an exposure is associated with disease across individuals, it does not follow that changing exposure would change any individual’s disease risk. The general idea of counter-factual inference can be understood in terms of a hypothetical 2 × 2 table[10] comparing the potential outcomes of the same individual under the alternative exposure scenarios. The outcomes of subjects on the main diagonal are unaffected by exposure (“doomed” or “immune”). The only ones for whom exposure has any effect (caused or prevented) are those in the off-diagonal cells. The net benefit of removing exposure is thus the difference in the size of these two cells. Unfortunately, it is only in a crossover trial that we observe individual outcomes under both exposure scenarios and these are only possible for acute responses and unethical for hazardous exposures. In observational studies, we see only the margins—the distribution of outcomes among exposed and among unexposed individuals—and these are different individuals. We therefore have to assume there is no uncontrolled confounding, that is, that the groups are comparable in terms of the distribution of other risk factors, or at least that all known confounders have been taken into account in the analysis. Only a randomized controlled trial can ensure that, and only in expectation. Causal inference[11-15] changes the target of inference from association across individuals to the mean difference within individuals in their expected outcomes under exposed and unexposed scenarios (the “average causal effect”). 
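The hypothetical 2 × 2 table of potential outcomes can be made concrete with a small sketch. The counts below are invented purely for illustration, and the class and method names are my own, not taken from the cited work. The two tables have identical margins (the same risk if everyone were exposed and the same risk if no one were), and hence the same average causal effect, yet they differ in how many individuals are actually affected by exposure.

```python
from dataclasses import dataclass


@dataclass
class PotentialOutcomeTable:
    """Counts of individuals by their pair of potential outcomes (hypothetical data)."""
    doomed: int     # diseased whether exposed or not (main diagonal)
    caused: int     # diseased only if exposed (off-diagonal)
    prevented: int  # diseased only if unexposed (off-diagonal)
    immune: int     # disease-free whether exposed or not (main diagonal)

    @property
    def total(self) -> int:
        return self.doomed + self.caused + self.prevented + self.immune

    def risk_if_all_exposed(self) -> float:
        # Margin of the table under the "everyone exposed" scenario
        return (self.doomed + self.caused) / self.total

    def risk_if_all_unexposed(self) -> float:
        # Margin of the table under the "no one exposed" scenario
        return (self.doomed + self.prevented) / self.total

    def average_causal_effect(self) -> float:
        # The diagonal cells cancel; only the off-diagonal cells contribute
        return (self.caused - self.prevented) / self.total


# Two invented populations with the same margins but different cells
table_a = PotentialOutcomeTable(doomed=100, caused=150, prevented=50, immune=700)
table_b = PotentialOutcomeTable(doomed=150, caused=100, prevented=0, immune=750)

for name, t in (("A", table_a), ("B", table_b)):
    print(name, t.risk_if_all_exposed(), t.risk_if_all_unexposed(),
          t.average_causal_effect())
```

In table A, 200 individuals are affected by exposure in one direction or the other, whereas in table B only 100 are; the margins alone cannot distinguish the two.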
Details of the various procedures differ, but one relatively easy to understand approach begins with a model for the “propensity” for exposure based on the available confounder information and then reweights subjects inversely to that propensity to examine the effect of exposure in a hypothetically unconfounded population.
Hernán et al.[16] have refined the potential outcomes approach based on inverse probability weighting, illustrating the different conclusions obtained in the case of AZT (zidovudine) treatment in HIV/AIDS survival and reconciling the conflicting results on hormone replacement therapy use and cardiovascular disease from observational epidemiology studies and the randomized Women’s Health Initiative trial.[17] None of those examples bears directly on primary prevention, however, so I begin with an example applied to population-based prevention.
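The reweighting idea can be sketched on simulated data. Everything here is an assumption made for illustration: a single binary confounder, a propensity model that is simply the proportion exposed within each confounder stratum, and a true causal risk difference fixed at 0.10 by construction. It is not the procedure used in any of the cited analyses, which involve far richer models.

```python
import random


def simulate(n=20000, seed=0):
    """Simulate (confounder, exposure, outcome) triples; true risk difference is 0.10."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0        # binary confounder
        p_exposed = 0.8 if c else 0.2             # confounder drives exposure...
        x = 1 if rng.random() < p_exposed else 0
        p_disease = 0.10 + 0.10 * x + 0.30 * c    # ...and also drives disease
        y = 1 if rng.random() < p_disease else 0
        rows.append((c, x, y))
    return rows


def crude_risk_difference(rows):
    """Unadjusted comparison of exposed versus unexposed subjects."""
    exposed = [y for _, x, y in rows if x == 1]
    unexposed = [y for _, x, y in rows if x == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def ipw_risk_difference(rows):
    """Risk difference after weighting each subject by 1 / P(own exposure | confounder)."""
    # Propensity model: proportion exposed within each confounder stratum
    propensity = {}
    for level in (0, 1):
        stratum = [x for c, x, _ in rows if c == level]
        propensity[level] = sum(stratum) / len(stratum)

    weighted_cases = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for c, x, y in rows:
        p_own = propensity[c] if x == 1 else 1.0 - propensity[c]
        w = 1.0 / p_own
        weighted_cases[x] += w * y
        weight_total[x] += w
    return (weighted_cases[1] / weight_total[1]
            - weighted_cases[0] / weight_total[0])


rows = simulate()
print("crude:", round(crude_risk_difference(rows), 3))
print("weighted:", round(ipw_risk_difference(rows), 3))
```

Under these simulation settings the crude difference is expected to be far larger than 0.10 because exposed subjects carry more of the confounder, while the weighted estimate should fall close to the value built into the simulation.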
Causal Inference for Population-wide Primary Prevention
One of the most effective interventions for reducing all-cause mortality nationwide has been the Clean Air Act of 1970. A huge body of epidemiologic evidence has demonstrated associations between ambient air pollution and various health outcomes. But how can we assess the impact specifically of the Clean Air Act on mortality? Zigler et al.[18] used causal inference to compare the counties of the southwest United States that were in and out of compliance in 1990 in terms of changes in air pollution and mortality over the following decade. Since counties could differ in ways that might confound any comparisons, they developed a spatial hierarchical model for the propensity to be in compliance and then computed the expected pollution and mortality outcomes for each county under its observed compliance status and under the counterfactual one. They found that the average causal effect of regulation was a 2.7% reduction in mortality and an improvement in air quality attributable to regulation. However, only about half the change in mortality was directly attributable to improvements in ozone and PM10, whereas the rest was mediated by changes in other risk factors (possibly including other pollutants). This approach to evaluating policy could be helpful in many other areas besides air pollution.[19]
GENETIC TARGETING PREVENTION PROGRAMS
Any prevention effort presumes that we have an effective intervention and that the intervention changes outcomes. Next, would either be improved by targeting high-risk populations, say by genetics? This in turn presumes that we have a way of identifying high-risk individuals, assuming we can obtain genotypes or a surrogate like family history from the population at risk. Finally, is the intervention more effective in that group (gene–treatment interaction)?
Individual Variability
Although many people see some heavy smokers living to old age without lung cancer while some nonsmokers get lung cancer at an early age as evidence of interindividual variability in sensitivity to tobacco smoking (e.g.,[21]), Peto[21] argued that this observation is consistent with cancer simply being a stochastic process with a homogeneous baseline risk: “To ask why a particular individual failed to get cancer is probably as meaningless as asking why a particular uranium atom failed to decay.” Even if there were no interindividual variation, we would still see some smokers surviving to old age and other nonsmokers getting cancer at young ages. Of course, we have identified some differences in risk due to known genetic variants, health status, contextual factors, and so on, and family studies show that there must be more genetic variation still to be discovered.
The predictive value of a risk model is usually evaluated by its receiver operating characteristic curve. For Crohn’s disease and type 1 diabetes, the area under the curve (AUC) is substantial, but for most cancers, the predictive value of genetic risk scores is modest (AUCs in the range of 55%–70%),[22-24] comparable to the AUCs obtained from models based only on family history or nongenetic factors, like the Gail models for breast[25] or colorectal[26] cancers. Furthermore, the gain from adding genetic markers to nongenetic risk factors is often modest, typically about a 5% improvement in the AUC.[27,28] This will improve with larger consortia, but there may be a practical limit. For example, meta-analysis of height in 134,000 subjects has identified more than 100 variants, but they explain only about 7% of the total phenotypic variation; to raise that figure just to 15% would require a sample size of almost 500,000 individuals.[29] Even the projected AUCs for all future single nucleotide polymorphisms will likely still be modest for most cancers.[30,31]
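For readers less familiar with the AUC, it can be read as the probability that a randomly chosen case has a higher risk score than a randomly chosen control. The following minimal sketch computes it directly from that definition; the function name and the scores are hypothetical and serve only to illustrate the calculation.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve as the fraction of case-control pairs ranked correctly.

    Ties between a case and a control count as half a correctly ranked pair.
    """
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical risk scores for 3 cases and 4 controls
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]
print(round(auc(cases, controls), 3))  # 11 of the 12 pairs are ranked correctly
```

An AUC of 50% corresponds to a score no better than a coin toss, which is why values of 55%–70% represent only modest discrimination.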
Gene–Environment Interactions
G × E interactions have been studied for specific genes and specific exposures with limited success for several decades, the poster child being NAT2 and smoking for bladder cancer.[32] Only recently have genome-wide interaction studies been done for specific exposures and metabolomic data used for environment-wide association studies[33] and G × E-wide interaction studies.[34-36] More efficient statistical methods are being developed for such studies,[37] but applications to date have been disappointing. Colorectal cancer is an obvious candidate, as there are 14 established environmental risk and protective factors, most of them modifiable. Based on these findings, several prevention trials have been launched,[38-46] aimed at prevention of further adenomas among individuals who have already had at least one. While not designed as genetically targeted trials, some postrandomization gene–treatment interaction analyses have been reported.[47,48] There have been numerous observational studies of G × E interactions with candidate genes,[49-54] but a comprehensive analysis of a genetic risk score composed of 27 genome-wide association study single nucleotide polymorphisms and these 14 environmental factors found no significant multiplicative interactions, and only weak additive interactions with height, processed meat, and hormone replacement therapy use. For other cancers, the yield of significant G × E interactions has been more rewarding: breast with body mass, age at menarche and parity,[55] bladder with tobacco smoking,[56] and lung with asbestos,[57,58] as well as for other diseases like asthma,[33,59,60] diabetes,[61] and stroke.[62]
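The distinction between multiplicative and additive interaction can be illustrated with relative risks for the four combinations of gene and exposure. The sketch below uses two standard summary measures, the ratio of the joint relative risk to the product of the separate ones and the relative excess risk due to interaction (RERI); the function names and the relative risks are hypothetical, not estimates from any of the cited studies.

```python
def multiplicative_interaction(rr_gene, rr_env, rr_both):
    """Ratio of the joint relative risk to the product of the separate ones.

    A value of 1 means no interaction on the multiplicative scale.
    """
    return rr_both / (rr_gene * rr_env)


def reri(rr_gene, rr_env, rr_both):
    """Relative excess risk due to interaction.

    A value of 0 means no interaction on the additive scale.
    """
    return rr_both - rr_gene - rr_env + 1.0


# Hypothetical relative risks, each versus noncarriers who are unexposed
rr_gene, rr_env, rr_both = 2.0, 3.0, 6.0
print(multiplicative_interaction(rr_gene, rr_env, rr_both))  # 1.0
print(reri(rr_gene, rr_env, rr_both))                        # 2.0
```

With these invented numbers there is no multiplicative interaction, yet the joint effect exceeds the sum of the separate excess risks, which is the scale most relevant to deciding whom an intervention would benefit most in absolute terms.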
Genetic Targeting for Primary Prevention
All smokers would benefit from quitting, but is that benefit any greater for those who are genetically at highest risk? A randomized placebo-controlled trial compared two smoking cessation treatments, the nicotine patch and varenicline, stratified by the nicotine metabolite ratio, a phenotypic assay of the activity level of the CYP2A6 enzyme that metabolizes nicotine and cotinine.[63] The investigators found that varenicline was more efficacious than the patch and had fewer side effects in normal metabolizers but not in slow metabolizers.
Genetic Targeting for Secondary Prevention
A recent perspective[64] asked how much of the nearly 50% declines in colorectal cancer incidence and mortality since 1975 could be attributed to screening. Nine randomized controlled trials showed effectiveness of fecal occult blood testing and sigmoidoscopy, but the authors argued that much of the decline occurred before any effect of recent increases in screening. (Other simulations,[65] however, have attributed about half of the decline to screening.) They discussed several possible explanations, including improvements in treatment and early detection for mortality and changes in risk factors for incidence. But neither article considered personalized prevention. Hsu et al.[66] however, have provided a model for predicting the age at start of screening that would provide the same benefit conditional on gender, family history, and 27 genetic variants as would population-wide screening at age 50.
The complex dependence of an individual’s screening behavior on their own and family members’ screening histories could make screening look deleterious if high-risk individuals are more prone to get screened. In the companion article,[9] I describe the application of causal inference methods to evaluating population-wide and targeted screening programs, using simulation and analysis of a large case–control study.[67,68] However, it is not clear that the predicted reductions in the number needed to screen by any of the targeted approaches compared with population-wide screening are enough to justify their greater complexity, which would require obtaining information on risk factors, family history, or genetics before deciding on an appropriate screening schedule. Although there may be a benefit in terms of reduced cancer incidence, it is not clear what the impact on mortality would be.[69] This would require either a cohort study of cancer deaths in relation to screening (which would have the same problems of self-selection of screening behavior) or a randomized controlled trial. 
A full cost–benefit analysis would also have to address false negatives and false positives, acceptability, and the costs of targeting, and weigh these against the benefits of reduced incidence and mortality, all in relation to other screening modalities like occult blood and DNA-based tests.[70-72]
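The number needed to screen mentioned above is the reciprocal of the absolute risk reduction achieved by screening. As a rough sketch, assuming screening reduces risk by the same relative amount in every group, the comparison between population-wide and targeted screening might look as follows. The baseline risks and the relative reduction are invented for illustration and are not estimates from the companion article.

```python
def number_needed_to_screen(baseline_risk, relative_risk_reduction):
    """Number of people screened to prevent one case: 1 / absolute risk reduction."""
    absolute_risk_reduction = baseline_risk * relative_risk_reduction
    if absolute_risk_reduction <= 0:
        raise ValueError("screening must reduce risk for the NNS to be defined")
    return 1.0 / absolute_risk_reduction


# Hypothetical inputs: the same 50% relative reduction in both groups
rrr = 0.5
nns_population = number_needed_to_screen(baseline_risk=0.02, relative_risk_reduction=rrr)
nns_high_risk = number_needed_to_screen(baseline_risk=0.06, relative_risk_reduction=rrr)

print(round(nns_population, 1))  # whole population
print(round(nns_high_risk, 1))   # genetically defined high-risk group
```

The targeted group needs fewer screens per case prevented, but the comparison ignores the cost of identifying that group and the cases arising among those not targeted, which is why the gain may not justify the added complexity.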
CONCLUSIONS
Although the genomics revolution has the potential to transform prevention in addition to treatment, we still have a long way to go to identify more effectively who should be targeted. Environmental interventions like air pollution regulation cannot be targeted to any subgroup, genetic or otherwise. Others, like antismoking campaigns, exercise, or diets, could in principle be targeted, but this might not be practical or cost efficient. The most efficient strategy would be to identify those at high genetic sensitivity to avoidable exposures. But simply predicting genetic risks is not sufficient: we need evidence of G × E interaction. We should not let the enthusiasm for personalized medicine distract us from opportunities for classical public health approaches to prevention.
ABOUT THE AUTHOR
DUNCAN C. THOMAS is a Professor of Biostatistics and holds the Verna Richter Chair in Cancer Research at the USC/Norris Comprehensive Cancer Center. His primary research interests are in the development of statistical methods for genetic and environmental epidemiology.