Elisabeth Engl¹, Peter Smittenaar¹, Sema K Sgaier¹,²,³.
Abstract
One-size-fits-all interventions that aim to change behavior are a missed opportunity to improve human health and well-being, as they do not target the different reasons that drive people's choices and behaviors. Psycho-behavioral segmentation is an approach to uncover such differences and enable the design of targeted interventions, but is rarely implemented at scale in global development. In part, this may be due to the many choices program designers and data scientists face, and the lack of available guidance through the process. Effective segmentation encompasses conceptualization and selection of the dimensions to segment on, which often requires the design of suitable qualitative and quantitative primary research. The choice of algorithm and its parameters also profoundly shape the resulting output and how useful the results are in the field. Analytical outputs are not self-explanatory and need to be subjectively evaluated and described. Finally, segments can be prioritized and targeted with matching interventions via appropriate channels. Here, we provide an end-to-end overview of all the stages from planning, designing field-based research, analyzing, and implementing a psycho-behavioral segmentation solution. We illustrate the choices and critical steps along the way, and discuss a case study of segmentation for voluntary medical male circumcision that implemented the method described here. Though our examples mostly draw on health interventions in the developing world, the principles in this approach can be used in any context where understanding human heterogeneity in driving behavior change is valuable.
Keywords: Psycho-behavioral segmentation; behavioral science; cluster analysis; global health and development; targeted intervention design; human heterogeneity; machine learning; unsupervised learning
Year: 2019 PMID: 31701090 PMCID: PMC6820452 DOI: 10.12688/gatesopenres.13029.2
Source DB: PubMed Journal: Gates Open Res ISSN: 2572-4754
Figure 1. Process flow for end-to-end segmentation.
This pathway captures the overall and detailed steps, the decision points, and the target outputs at each stage, from conceptualization to research design, implementation, analysis, and targeting. It assumes the maximum workload for a program, in which a de novo data set must be generated through primary research. Other scenarios are possible. For example, after defining a goal and structuring potential drivers, a program may find that it can use existing data sets, in which case it might move straight to cluster analysis. In any case, the critical steps are interlinked and strongly interdependent, so they should be considered together from the start.
Converting behavioral drivers into survey items for data collection.
These sample items, each linked to a distinct driver of behavior [4], are an excerpt from a questionnaire used to survey Zimbabwean men aged 15–29 about how they relate to voluntary medical male circumcision (VMMC) [3].
| Construct / Scale | Sample item |
|---|---|
| Sample variables used as segmentation inputs | |
| | Once circumcised, a man cannot get any disease, so no longer has |
| | Based on what you know about circumcision, which of the following |
| | HIV/AIDS is a big problem in our country/community/for people I |
| | During the healing process after male circumcision, one experiences |
| | I believe that chances are high that I could get HIV |
| | It is difficult to control whether you get HIV or not - even if I do my |
| | Which of the following statements best describes your discussions |
| Sample variables used for profiling | |
| | Who or which sources of information have encouraged you to start to |
| | What is your relationship status? |
| | Are you circumcised? |
| | Considering what you know about male circumcision, which |
| | How do you normally look for healthcare information on the internet? |
| | I use condoms all the time with my current sexual partner(s) |
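Before clustering, answers to several items that tap the same driver are often combined into one construct score, and scores are standardized so that no single input dominates a distance-based algorithm. The sketch below illustrates this under stated assumptions: a 1–5 agreement scale, and a hypothetical mapping of item IDs to constructs (the IDs, construct names, and the reverse-keyed item are illustrative, loosely inspired by the items above, and are not the study's actual coding scheme).

```python
from statistics import mean, pstdev

# Hypothetical item IDs grouped by the behavioral driver (construct) they measure.
CONSTRUCTS = {
    "risk_perception": ["q_chance_hiv_high", "q_hiv_big_problem"],
    "perceived_control": ["q_hiv_hard_to_control"],
}
# Hypothetical reverse-keyed item: agreeing signals LESS perceived control.
REVERSE_KEYED = {"q_hiv_hard_to_control"}
LIKERT_MAX = 5  # assumes a 1-5 agreement scale


def construct_scores(response):
    """Average one respondent's item answers into a score per construct."""
    scores = {}
    for construct, items in CONSTRUCTS.items():
        values = [
            LIKERT_MAX + 1 - response[item] if item in REVERSE_KEYED else response[item]
            for item in items
        ]
        scores[construct] = mean(values)
    return scores


def standardize(rows):
    """Z-score each construct across respondents so that all segmentation
    inputs share a common scale (important for distance-based algorithms)."""
    stats = {}
    for key in rows[0]:
        column = [row[key] for row in rows]
        stats[key] = (mean(column), pstdev(column))
    return [
        {key: (row[key] - mu) / sd if sd > 0 else 0.0 for key, (mu, sd) in stats.items()}
        for row in rows
    ]


# Two hypothetical respondents.
responses = [
    {"q_chance_hiv_high": 5, "q_hiv_big_problem": 4, "q_hiv_hard_to_control": 2},
    {"q_chance_hiv_high": 1, "q_hiv_big_problem": 2, "q_hiv_hard_to_control": 5},
]
inputs = standardize([construct_scores(r) for r in responses])
print(inputs)
```

Averaging items is only one option; a real study would typically check scale reliability or use factor analysis before settling on construct scores.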
Overview of common cluster analysis algorithms.
The choice of algorithm depends on the shape of the underlying input data and on weighing each method's advantages and disadvantages. Distance-based clustering uses relatively simple algorithms and is the most widely used. Model-based clustering has many applications, and latent class analysis is particularly worthwhile when only categorical data exist. TwoStep clustering is a model-based method but also relies on estimated distances between individuals. As discussed in the text, supervised clustering using decision trees can be useful in select circumstances. “Manual segmentation” is included to emphasize that an entire population is often divided into subgroups for targeting without any algorithm, e.g. by manually setting an age or geographic cut-off based on prior knowledge or examination of descriptive statistics.
| Algorithm name | Input data | Particularly useful for | Advantages | Disadvantages | Example |
|---|---|---|---|---|---|
| Distance-based unsupervised clustering | | | | | |
| | Dissimilarity | Quick explorations, visualizing | Versatile technique as it can | Commits to joining most | Clustering symptoms of |
| | Continuous | Continuous data sets, quick | Simple, widely used, many tools | Variables need to be on same | Translating health |
| | Dissimilarity | Data with outliers (more robust than | Works with any distance matrix. | Takes slightly longer to | Classification of incidence |
| Distribution-based unsupervised clustering | | | | | |
| | Categorical | Data sets that are categorical or have | One of few methods made to | Lose information as continuous | Heterogeneity in urban |
| | Multivariate | Continuous data where it can be | Where k-means assumes | Inaccurate if continuous | Consumer segmentation |
| | Continuous | Large data (solves some computation | Provides list of relevance of | Order of records matters for | Subgrouping outpatients of |
| Supervised clustering | | | | | |
| | Any mix of | Segmentations with one outcome | Segment construction very | Population split by most | Financial profiling of public |
| Manual (no algorithm) | | | | | |
| | Cut-offs | Situations in which there is strong | Fast, leverages domain | Will never be better than | Interventions targeting |
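To make the distance-based family concrete, here is a minimal sketch of k-means (Lloyd's algorithm), the method also shown in Figure 2, in plain Python. It assumes continuous inputs that are already on a common scale and uses Euclidean distance; the data points are hypothetical. In practice a maintained implementation with multiple random starts (for example R's `kmeans` or scikit-learn's `KMeans`) would be preferred. Note that k-means partitions the data into k groups whether or not the data are truly clustered.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternately assign each point to its nearest centroid
    and move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical standardized inputs, e.g. (peer-pressure sensitivity, risk perception).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Because the result depends on the starting centroids, running several seeds and keeping the solution with the lowest within-cluster sum of squares is a common safeguard.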
Figure 2. Idealized and realistic outputs of k-means (upper panel) and hierarchical (lower panel) cluster analysis.
The reality of psycho-behavioral segmentation is that clusters are rarely perfectly defined. We illustrate this with simulated data showing a segmentation on two input variables in the context of getting tested for sexually transmitted diseases (STDs): sensitivity to peer pressure to get tested, and perceived health risk of STDs. In the idealized case (left side), the k-means algorithm detects three clearly distinct clusters, perfectly separated from one another. The dendrogram of these data (bottom left) also shows a clear point at which the tree can be ‘cut’ to define the clusters, as is done in hierarchical clustering. In the real world (right side), however, variables are often normally distributed and noisy. K-means will still identify three segments because that number of segments was specified in advance, but there is clearly little actual ‘clustering’ of people. Similarly, the dendrogram of these data suggests potential cut-off points at several heights that seem equally reasonable. This is not to say that clusters could not be useful in this case, as the segments do differ in sensitivity to peer pressure and risk perception. However, tailored interventions could take into account that individuals near the middle of the cloud are not very different from one another. The code to generate these graphics is available at https://github.com/SurgoFoundation/segmentation.
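The contrast between a clear and an ambiguous cut point can be quantified. The sketch below is an illustration, not the code in the Surgo Foundation repository: it assumes complete linkage (the figure's linkage method is not stated here) and simulated data on two hypothetical axes, runs a naive agglomerative clustering, records the height of every merge, and compares the merge from three to two clusters with the merge that produced three clusters. A large ratio corresponds to an obvious place to cut the tree; a ratio near 1 corresponds to several equally reasonable cuts.

```python
import math
import random


def complete_linkage_heights(points):
    """Naive agglomerative clustering (complete linkage). Returns the height of
    each successive merge, i.e. the y-values of the dendrogram's joins."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best_d, best_i, best_j = math.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members.
                d = max(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if d < best_d:
                    best_d, best_i, best_j = d, i, j
        heights.append(best_d)
        clusters[best_i].extend(clusters[best_j])
        del clusters[best_j]
    return heights


def cut_gap_ratio(heights, k):
    """Height of the merge from k to k-1 clusters divided by the height of the
    merge that produced k clusters. Large values mean a clear cut at k."""
    return heights[-(k - 1)] / heights[-k]


rng = random.Random(42)
# Hypothetical axes: sensitivity to peer pressure (x), perceived STD risk (y).
centers = [(0, 0), (40, 0), (20, 32)]
idealized = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _ in range(20)]
realistic = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]

ideal_ratio = cut_gap_ratio(complete_linkage_heights(idealized), k=3)
real_ratio = cut_gap_ratio(complete_linkage_heights(realistic), k=3)
print(f"idealized: {ideal_ratio:.1f}x jump at 3 clusters; realistic: {real_ratio:.1f}x")
```

This naive version recomputes all pairwise distances at every step, which is fine for a few dozen points; for survey-sized data, an optimized library routine would be needed.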
Figure 3. The PACERS framework for segment prioritization.
Figure 4. Representative schematic of a segment typing tool.
For individual targeting, field workers or other stakeholders can use typing tools that quickly identify which segment an individual most likely belongs to. Splits in a decision tree-based typing tool can be based on categorical or continuous variables alike, and are chosen by the algorithm to identify members of each segment as accurately as possible. As a person answers each question, they follow a path through the tree and are allocated to the segment at the end of that path. Here, we show a hypothetical example of a typing tool that allocates a parent to existing segments relating to child vaccination behaviors. A parent in a given segment might be more or less likely to vaccinate their child, for different reasons. The field worker can then select the intervention or message most likely to resonate with that specific segment. For practicality, typing tools often stick to the three or four most predictive questions. However, this practicality comes at the cost of typing accuracy: the more accurate a typing tool needs to be, the more questions must be asked.
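A tree-based typing tool is typically derived by training a classification tree on the segmentation survey, with the segment labels from the cluster analysis as the outcome. The sketch below makes that idea concrete under simplifying assumptions: yes/no questions only, a greedy split on the lowest weighted Gini impurity, and hypothetical question names, segments, and respondents. It also shows the tradeoff described above, since limiting the tree depth limits how many questions each parent is asked.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: chance that two random members belong to different segments."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, questions, depth):
    """Greedy classification tree on yes/no answers. Leaves hold a segment label."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1 or not questions:
        return majority

    def split_impurity(q):
        yes = [lab for r, lab in zip(rows, labels) if r[q]]
        no = [lab for r, lab in zip(rows, labels) if not r[q]]
        return sum(len(part) / len(labels) * gini(part) for part in (yes, no) if part)

    best_q = min(questions, key=split_impurity)
    yes_idx = [i for i, r in enumerate(rows) if r[best_q]]
    no_idx = [i for i, r in enumerate(rows) if not r[best_q]]
    if not yes_idx or not no_idx:  # best question does not split this group
        return majority
    rest = [q for q in questions if q != best_q]
    return {
        "question": best_q,
        "yes": build_tree([rows[i] for i in yes_idx], [labels[i] for i in yes_idx], rest, depth - 1),
        "no": build_tree([rows[i] for i in no_idx], [labels[i] for i in no_idx], rest, depth - 1),
    }


def assign_segment(tree, answers):
    """Ask the questions along one path until a leaf (segment) is reached."""
    while isinstance(tree, dict):
        tree = tree["yes"] if answers[tree["question"]] else tree["no"]
    return tree


def accuracy(tree, rows, labels):
    return sum(assign_segment(tree, r) == lab for r, lab in zip(rows, labels)) / len(labels)


# Hypothetical data: 8 parents, one per combination of three yes/no answers.
QUESTIONS = ["trusts_health_worker", "clinic_nearby", "worried_side_effects"]
parents = [
    dict(zip(QUESTIONS, (t, n, w)))
    for t in (True, False) for n in (True, False) for w in (True, False)
]


def true_segment(p):  # hypothetical segment definitions
    if not p["trusts_health_worker"]:
        return "hesitant"
    return "confident" if p["clinic_nearby"] else "access-constrained"


segments = [true_segment(p) for p in parents]
for depth in (1, 2):
    tree = build_tree(parents, segments, QUESTIONS, depth)
    print(f"up to {depth} question(s): accuracy {accuracy(tree, parents, segments):.2f}")
```

On this toy data, a one-question tool types 6 of the 8 parents correctly and a two-question tool types all 8. Accuracy measured on the training data itself, as here, overstates real-world accuracy; held-out respondents give a fairer estimate.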
A selection of practical tools for cluster segmentation.
This non-exhaustive selection gives practitioners a starting point to learn from case studies in the field, work through examples of cluster segmentation in the R programming language, and consult more detailed characterizations of the main clustering algorithms and their implementations in popular software packages.
| Resource | Description |
|---|---|
| | This manual by the Consultative Group to Assist the Poor (CGAP) focuses on planning |
| | Breakthrough ACTION, based at the Johns Hopkins Center for Communication Programs (CCP), gives a brief overview of steps for successful segmentation programs and reference |
| | Sample data analysis use case used at INSEAD, using hierarchical vs. k-means clustering |
| | Tutorial using k-means clustering in R |
| | Overview of the main R packages for cluster analysis, with sample code |
| | Overview and code examples of model-based clustering techniques for the R mclust |
| | Highly technical yet practical statistical textbook; section 14.3 discusses clustering in |
| | In-depth technical overview of major clustering algorithms and their parameters |
| | Overview of clustering functionality in STATA, a popular statistical program |
| | IBM’s technical report on the TwoStep cluster algorithm for the SPSS software package |
| | Extensive cluster analysis manual for the SYSTAT software package |