OBJECTIVES: This scoping review synthesizes the recent literature on precision public health and the influence of predictive models on health equity with the intent to highlight central concepts for each topic and identify research opportunities for the biomedical informatics community. METHODS: Searches were conducted using PubMed for publications between 2017-01-01 and 2019-12-31. RESULTS: Precision public health is defined as the use of data and evidence to tailor interventions to the characteristics of a single population. It differs from precision medicine in terms of its focus on populations and the limited role of human genomics. High-resolution spatial analysis in a global health context and application of genomics to infectious organisms are areas of progress. Opportunities for informatics research include (i) the development of frameworks for measuring non-clinical concepts, such as social position, (ii) the development of methods for learning from similar populations, and (iii) the evaluation of precision public health implementations. Just as the effects of interventions can differ across populations, predictive models can perform systematically differently across subpopulations due to information bias, sampling bias, random error, and the choice of the output. Algorithm developers, professional societies, and governments can take steps to prevent and mitigate these biases. However, even if the steps to avoid bias are clear in theory, they can be very challenging to accomplish in practice. CONCLUSIONS: Both precision public health and predictive modelling require careful consideration in how subpopulations are defined and access to data on subpopulations can be challenging. While the theory for both topics has advanced considerably, there is much work to be done in understanding how to implement and evaluate these approaches in practice. Georg Thieme Verlag KG Stuttgart.
Precision public health and the influence of predictive models on health equity are two topics that have received considerable attention recently. Common drivers for both topics include the increasing amount of data available and advances in statistical and machine learning methods. This scoping review synthesizes the recent literature on these two topics with the intent to highlight central concepts for each topic and identify research opportunities for the biomedical informatics community.
Methods
Searches were conducted using PubMed with a single query for each topic and publication dates between 2017-01-01 and 2019-12-31, the date of the final search. Each title, and the abstract if necessary, was reviewed to determine whether the article explicitly considered and commented on the topic. Potentially relevant articles were retrieved, and their references were reviewed to identify other relevant articles. The search for “precision public health” returned 77 articles, of which 20 were retained. There were 4 articles matching this query prior to 2017-01-01. The search for “equity and (algorithm or prediction)” returned 156 articles, of which 15 were retained after review. There were 204 articles matching this query prior to 2017-01-01.
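
For readers who wish to repeat searches of this kind programmatically, the following is a minimal sketch that builds date-restricted PubMed queries against the public NCBI E-utilities `esearch` endpoint. The review does not state which PubMed interface was used, so the endpoint, the helper name `build_pubmed_search_url`, and the choice to pass the query strings verbatim are assumptions made for illustration. The sketch only constructs the request URLs and does not contact the service, and result counts may differ from those reported above as the database is updated.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint. Assumption: the review does not
# say which PubMed interface was used for the searches.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term, min_date, max_date):
    """Return an esearch URL for `term`, restricted by publication date.

    Hypothetical helper; dates use the YYYY/MM/DD form accepted by esearch.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": min_date,
        "maxdate": max_date,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Query strings copied verbatim from the Methods text.
QUERIES = [
    '"precision public health"',
    "equity and (algorithm or prediction)",
]

urls = [build_pubmed_search_url(q, "2017/01/01", "2019/12/31") for q in QUERIES]
for url in urls:
    print(url)
```
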
Results
Precision Public Health
The most generally accepted definition of precision public health (PPH) is the use of data and evidence to tailor interventions to the characteristics of a single population
1
. Achieving ‘precision’ requires high-resolution surveillance data to set priorities that are tailored to a specific population and the ability to select an evidence-based public health intervention best matched to the characteristics of a population
2
3
. In this section, the author reviews recent publications about PPH to characterize the issue, to examine how the concept has been applied, and to identify opportunities for informatics research.

Although precision medicine
4
laid the foundation for precision public health, there are two important distinctions between ‘precision’ in medicine and in public health. One fundamental difference is the unit of interest, namely a population as compared to a single patient
5
. Although some have argued that the aggregate effects of precision medicine can improve population health one patient at a time
6
, there is broad support in the public health community for considering populations explicitly
7
so that equity across sub-populations can be assessed, and inequities addressed. The second distinction is that the role of genomic information is currently much more limited in PPH. However, the concept of PPH has been applied to population health genomics
8
, for example pharmacogenomics
9
10
, and some have proposed that polygenic risk scores may have application within a PPH context
11
.

While a consensus appears to be emerging around the concept of PPH, the topic is not without controversy, which has led some junior investigators to argue for the importance of continuing research in this area
12
. Most notably, some have suggested that PPH could divert attention away from the broader determinants of health towards clinical concepts, which tend to be measured with greater accuracy
13
. For example, the social determinants of health, such as social status, are generally not measured well, so they may play a limited role in characterizing populations and identifying optimal interventions
7
14
. Concerns have also been raised about attempts to use genomic data in PPH, due to the limited availability of such data for many sub-populations and the usually small effect size at a population scale compared to other determinants of health
7
. Even if genomic data are not used, some have argued that PPH is simply a new term for what public health has always done
1
15
, although others have countered that PPH highlights the role of ‘Big Data’ and computational methods in targeting public health interventions to improve population health
3
.

Research in PPH has tended to address the ‘diagnostic’ (i.e., measurement and priority setting) more than the ‘therapeutic’ (i.e., selection and management of evidence-based public health interventions) aspects of PPH. In terms of measurement, advances in methods such as spatial statistics have allowed highly accurate sub-regional mapping of population characteristics in global health research. For example, one method estimates, at a resolution of 5 km by 5 km cells for the African continent, HIV prevalence
16
and exclusive breastfeeding until 6 months of age
17
. In another study, the authors created similar high-resolution estimates of educational attainment across low and middle-income countries
18
. Such high-resolution measurement of important health indicators facilitates consideration of intervention strategies with greater precision than was previously possible
19
20
. This application of PPH in a global health context has been called ‘precision global health’
21
, and it draws on a range of technologies and methods to improve measurement in global health
20
21
.

From a therapeutic or public health intervention perspective, some of the most promising applications of PPH have been in infectious disease control. In this context, which some have called ‘precision epidemiology’
22
, the genomics of the infectious organism plays a central role
22
23
24
. Through the identification of transmission networks
25
and optimal antimicrobial therapy
26
, rapid sequencing and analysis of the genomes of infectious organisms can be used to identify the best strategies for preventing transmission and treating disease. However, some have noted that genomic data on organisms are rarely sufficient to understand the mechanisms of disease transmission, and that data for ‘deep phenotyping’, or describing precisely the characteristics of disease, are also needed
22
.

While the concept of PPH is gaining traction and initial research results in global health and infectious diseases are promising, there remain many opportunities for informatics research in this area. From a measurement or diagnostic perspective, widely adopted frameworks for measuring non-clinical concepts, such as social position
14
, are needed to classify populations consistently for the purpose of precisely identifying interventions
27
. These advances in measurement, especially through the use of new and large data sources, have some overlap with the concept of digital epidemiology
28
. In terms of therapy, or the identification of interventions that are best matched to a population, most efforts in PPH have so far relied on experts to interpret the data and identify interventions. However, as in precision medicine, there is considerable opportunity to develop methods for learning from similar populations
29
and for using causal reasoning to integrate evidence from different sources to estimate the effect of an intervention in a specific population
30
. For example, if public health agencies were to systematically record and share information about implemented interventions along with characteristics of populations and effects of interventions, it would create a foundation for a “learning public health system”. Finally, there are many research opportunities in the implementation and evaluation of PPH strategies, although some have noted that the digitization of public health practice is a prerequisite for implementation of PPH
31
.
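
To make the idea of learning from similar populations more concrete, the following is a minimal sketch of one possible first step: ranking populations by their similarity to a target population on a few standardized characteristics. It is not the method of any cited work. The district names, the three characteristics (median age, proportion urban, a deprivation index), and all values are hypothetical, and Euclidean distance on z-scores is only one of many reasonable similarity measures.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so that no single characteristic dominates the distance."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]  # guard against zero variance
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def most_similar(populations, target, k=2):
    """Return the names of the k populations closest to `target`."""
    names = list(populations)
    z = dict(zip(names, standardize(list(populations.values()))))
    t = z[target]
    distances = [
        (sqrt(sum((a - b) ** 2 for a, b in zip(t, v))), name)
        for name, v in z.items()
        if name != target
    ]
    return [name for _, name in sorted(distances)[:k]]


# Hypothetical characteristics: [median age, proportion urban, deprivation index]
populations = {
    "district_a": [34.0, 0.80, 0.20],
    "district_b": [35.0, 0.78, 0.22],
    "district_c": [52.0, 0.30, 0.60],
    "district_d": [33.0, 0.85, 0.25],
}

similar = most_similar(populations, "district_a", k=2)
print(similar)
```

In a “learning public health system” of the kind described above, evidence on interventions implemented in the populations returned by such a ranking could then be reviewed for the target population, subject to the causal assumptions needed to transport effects between populations.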
Prediction and Equity
The ethical and equitable distribution of healthcare resources and health outcomes, a focus of this yearbook, has long been an explicit goal of modern health systems and is central to global sustainable development goals
32
. In the context of the recent renaissance of machine learning
33
, and particularly deep neural networks and reinforcement learning, the potential for prediction models in clinical medicine and public health to exacerbate health inequities is increasingly recognized
34
35
. In this section, the author reviews recent publications on this topic to characterize the issue, examine how model biases may have inequitable effects, and identify what can be done to prevent and mitigate these biases.

At a high level, prediction models in healthcare, whether statistical or machine-learning in nature, take inputs in the form of patient data and produce an output, usually in the form of a probability or a predicted class. From a population or public health perspective, if the validity of the outputs differs systematically across subpopulations, then the use of the model to guide decisions in practice can exacerbate health inequalities. For example, in a justice context, a model that predicts recidivism to guide decisions about granting parole could increase sex-based inequalities if it systematically overpredicts recidivism in women
36
. In a healthcare context, a model that systematically underpredicts the resources needed by black patients could increase racial inequalities if it is used to direct proportionally more resources to white patients
37
.

Models can perform systematically differently across subpopulations due to information bias, sampling bias, random error, and the choice of the output
34
38
39
40
. Information bias can occur where the quality or amount of data differs systematically between subpopulations. For example, people from areas with lower socioeconomic status tend to visit more clinicians and have fewer tests ordered, which could produce systematic differences in data held within electronic medical records
38
41
. Sampling bias can arise when the proportion sampled differs systematically across subpopulations. For example, an algorithm trained to predict depression from language used on Facebook
42
may not work well when applied to text from teenagers, who are less likely to use that platform
43
. However, even if sampling is uniform across subpopulations, the number of individuals available for training a model may be too low in smaller subpopulations to achieve acceptable precision when making predictions
38
. The choice of the output for the model to predict can also be a source of bias if the output is not aligned with what the model is expected to accomplish
39
. For example, using healthcare cost as an output, as opposed to a composite of cost and health status, can reinforce existing racial inequalities in the allocation of healthcare resources
37
44
.

Algorithm developers, professional societies, and governments can take steps to prevent and mitigate the biases described. Many papers have been published suggesting steps that model developers can take to address biases that may lead to inequalities
34
38
39
40
45
. Suggested actions have been identified at the stages of model conception, model training and testing, and deployment and monitoring
34
. Mistaking the objective
39
is a potential problem at the conception stage that may be prevented by consulting with diverse groups, considering the ethical implications of using the model, and ensuring that its outputs are aligned with its intended use in the health system
34
38
. In model training, authors have identified pitfalls
39
and challenges
45
, with suggestions to build and test algorithms in diverse socioeconomic health systems
38
and to measure important metrics
46
and allocation across subpopulations
34
. Once implemented and in routine use, it is possible to monitor the outputs and associated outcomes using automated alerts and through feedback from patients and clinicians
34
.

Although the steps to avoid biases may be clear in theory, some have noted that taking these steps in practice can be very challenging
47
. In particular, access to data across diverse settings has been noted as a challenge
48
. A potential approach, proposed in New Zealand, is to develop a national data resource that model developers can use to generate predictive models of cardiovascular disease
49
. Recommendations to avoid biases in machine learning were also made in a report on artificial intelligence in medicine from the National Academies in the US
50
and a similar report commissioned by the National Health Service in the UK
51
.
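
The suggestion above to measure performance across subpopulations can be illustrated with a minimal sketch. It assumes that each record holds a subpopulation label, a predicted probability, and an observed binary outcome. The function name `audit_by_group`, the two metrics reported (the gap between mean predicted risk and observed event rate, and the false positive rate at a fixed threshold), and the data are hypothetical choices for illustration rather than metrics prescribed by the cited papers.

```python
from collections import defaultdict


def audit_by_group(records, threshold=0.5):
    """Summarize model outputs per subpopulation.

    `records` is an iterable of (group, predicted_probability, observed_outcome),
    where observed_outcome is 0 or 1.
    """
    by_group = defaultdict(list)
    for group, p, y in records:
        by_group[group].append((p, y))

    report = {}
    for group, rows in by_group.items():
        n = len(rows)
        mean_predicted = sum(p for p, _ in rows) / n
        observed_rate = sum(y for _, y in rows) / n
        negatives = [p for p, y in rows if y == 0]
        # Share of true negatives flagged as positive at the chosen threshold.
        fpr = (
            sum(p >= threshold for p in negatives) / len(negatives)
            if negatives
            else None
        )
        report[group] = {
            "n": n,
            # Positive values mean the model overpredicts risk in this group.
            "calibration_gap": mean_predicted - observed_rate,
            "false_positive_rate": fpr,
        }
    return report


# Hypothetical predictions for two subpopulations.
records = [
    ("A", 0.2, 0), ("A", 0.8, 1), ("A", 0.3, 0), ("A", 0.7, 1),
    ("B", 0.6, 0), ("B", 0.7, 0), ("B", 0.8, 1), ("B", 0.5, 0),
]

report = audit_by_group(records)
for group, metrics in sorted(report.items()):
    print(group, metrics)
```

In this hypothetical example the model is well calibrated on average in group A but overpredicts in group B, which is the kind of systematic difference that could lead to inequitable decisions if the model were used in practice. With subpopulations this small the estimates would be far too imprecise to act on, which mirrors the random error concern raised above.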
Discussion
The definition of subpopulations or population strata is central to both of the topics explored in this survey. In precision public health, a subpopulation must be identified in order to characterize its needs and identify the interventions that best match those needs. In assessing the potential impact of a model on health equity, subpopulations must also be identified, so that the distribution of model outcomes across subpopulations can be assessed. In other words, PPH tends to focus within a subpopulation to optimize interventions for that subpopulation, while assessment of equity tends to look across subpopulations to distribute resources fairly among them. The approach to defining subpopulations tends to differ, however: in PPH, subpopulations are generally defined spatially based on geographical boundaries, especially in global health settings, whereas for prediction models, subpopulations are usually defined by individual characteristics, such as sex and ethnicity. Interestingly, although the definition of subpopulations is central to both topics, there appears to be little explicit consideration in the literature of how subpopulations should be defined. In both cases, though, the most important characteristics of subpopulations (e.g., sex, social status) are those that may modify or influence the inference or prediction. If both PPH and prediction modeling are to be applied in an equitable and effective manner, then renewed attention must be given to ensuring that data are available to measure and model the most important characteristics of subpopulations.

Unfortunately, accessing data to characterize subpopulations is a challenge central to both topics. The measurement aspect of PPH and the assessment of equity in prediction models both tend to rely on the secondary analysis of data originally collected for other purposes. However, such data tend not to be uniformly available across subpopulations, which can be problematic, especially if the non-uniform coverage is not acknowledged explicitly. The implication for PPH is that needs will be assessed with greater precision in some subpopulations, which may lead to more effective interventions in those subpopulations, potentially increasing inequalities. A similar situation also exists for the therapeutic aspect of PPH. If interventions are less likely to be evaluated in some subpopulations, then the evidence about interventions in those subpopulations will be limited, making it difficult to identify optimal interventions. For prediction models, as discussed above, non-uniform data across subpopulations can result in differing model performance across subpopulations, which can exacerbate inequalities.

In addition to improvements in data, advances in training are necessary if the potential benefits of PPH and prediction modeling are to be realized. Both topics, like biomedical informatics, are at the intersection of multiple disciplines and draw on a range of methods. While trainees in some programs in biomedical informatics may be exposed to aspects of prediction modeling and PPH, these topics may not be addressed directly. In other fields, such as epidemiology, biostatistics, and computer science, training tends to address some, but not all, of the underlying methods needed to successfully develop, implement, and evaluate PPH approaches and prediction models. Education programs in biomedical informatics and related disciplines could benefit from a more direct and holistic consideration of both PPH and prediction modeling as examples of how multiple methods and perspectives are relevant to advancing public health.

Finally, both topics examined in this survey are somewhat abstract, in that they define overarching frameworks which are meant to be helpful in advancing public health, including health equity. However, the practical implementation of these frameworks can be challenging. In PPH, there has been considerably more focus on increasing precision in measurement than on how to use this improved precision to identify optimal interventions for subpopulations. Similarly, for mitigating biases in prediction models that may exacerbate inequities, the strategies proposed, such as building and testing algorithms in diverse socioeconomic health systems, are likely to be challenging in practice.
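
The point made earlier in this discussion, that needs will be assessed with greater precision in some subpopulations than in others when data coverage is uneven, can be illustrated with a short worked calculation. It assumes simple random sampling and hypothetical values: a true prevalence of $p = 0.2$ estimated from $n = 10\,000$ records in a well-covered subpopulation and from $n = 100$ records in a poorly covered one.

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}, \qquad
\mathrm{SE}_{n=10\,000} = \sqrt{\frac{0.2 \times 0.8}{10\,000}} = 0.004, \qquad
\mathrm{SE}_{n=100} = \sqrt{\frac{0.2 \times 0.8}{100}} = 0.04
```

Under these assumptions, an approximate 95% confidence interval ($\hat{p} \pm 1.96\,\mathrm{SE}$) is about $\pm 0.008$ in the well-covered subpopulation and about $\pm 0.078$ in the poorly covered one. A hundredfold difference in available data therefore yields a tenfold difference in precision.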
Conclusion
The theory underlying precision public health and the prevention and mitigation of biases in prediction models to advance health equity has advanced considerably. Driven by the increasing availability of data and advances in statistics and machine learning, researchers and practitioners are increasingly applying and evaluating these frameworks. However, there remains much work to be done to understand how to implement and evaluate these concepts in practice. Most notably, there is a need to clarify how subpopulations are defined, to ensure that data are available to measure important characteristics of subpopulations, and to adequately train researchers and practitioners in these frameworks and the underlying methods upon which they depend.