Shi Yan, Yu Heng Kwan, Chuen Seng Tan, Julian Thumboo, Lian Leng Low.
Abstract
BACKGROUND: Data-driven population segmentation analysis uses data analytics to divide a heterogeneous population into a parsimonious set of relatively homogeneous groups with similar healthcare characteristics. It is a promising patient-centric analysis that enables effective integrated healthcare interventions specific to each segment. Although the approach is widely applied, no systematic review has examined its clinical application.
Keywords: Data analytics; Health policy; Health services research; Population health; Population segmentation; Public health; Systematic review
Year: 2018 PMID: 30390641 PMCID: PMC6215625 DOI: 10.1186/s12874-018-0584-9
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1 Flow chart of retrieval of articles
Characteristics of target population subjected to data-driven segmentation (n = 216)
| Population selection | No. of studies | Examples |
|---|---|---|
| Without specific diseases/conditions | 163 | A nationally representative sample of adults aged 18 years and over in Ireland |
| With specific diseases/conditions | 53 | |
| Patients with psychological conditions | 12 | US adults who were diagnosed with lifetime post-traumatic stress disorder in wave 2 of the National Epidemiologic Survey on Alcohol and Related Conditions |
| Patients with cancer | 9 | Consecutive referrals with a diagnosis of non-curable cancer to the Palliative Medicine Program at the Cleveland Clinic Foundation |
| Patients with respiratory conditions | 8 | Children 6–17 years of age who underwent standardized characterization in the Severe Asthma Research Program |
| Patients with heart diseases | 5 | Elderly patients admitted with ischemic coronary heart disease and recruited in a clinical trial |
| Patients with HIV positive status | 3 | A random stratified sample of HIV/AIDS patients recruited in French hospital departments delivering HIV care |
| Patients with gastrointestinal conditions | 3 | Patients with intractable irritable bowel syndrome enrolled in a randomised controlled trial |
| Others | 13 | |
| Sample Size | ||
| ≤500 | 49 | |
| 501–1000 | 41 | |
| 1001–10,000 | 87 | |
| 10,001–100,000 | 24 | |
| ≥100,001 | 10 | |
| N.A. | 5 | |
| Country/Region | ||
| Multiple countries | 11 | |
| North America | 122 | |
| US | 109 | |
| Canada | 13 | |
| Europe | 60 | |
| UK | 24 | |
| Other European countries | 36 | |
| Asia | 13 | |
| Oceania | 8 | |
| Africa | 2 | |
Abbreviations: HIV, Human Immunodeficiency Virus; AIDS, Acquired Immune Deficiency Syndrome; US, United States of America; UK, United Kingdom; N.A., Not Available
Features of data used for data-driven population segmentation
| Data source | No. of studies | Examples |
|---|---|---|
| Primary | 46 | Conducting clinical interviews and administering questionnaires |
| Secondary | 168 | 12-month follow-up data from a randomized clinical trial |
| Both | 2 | |
| Settings | ||
| Healthcare institutions | 31 | Hospitals |
| Community | 181 | Primary schools |
| Both | 4 | |
Objectives of segmentation
| Objective (themes) | No. of studies | Examples |
|---|---|---|
| Resource Allocation | 12 | Patients were grouped into segments with distinct care utilization, based on six utilization variables: non-elective inpatient admissions, elective inpatient admissions, outpatient visits, GP practice visits, GP home visits, and prescriptions, yielding eight distinct care user types |
| Health /Prognostic Index | 17 | Patients were divided into groups with a similar risk of atrial fibrillation after coronary artery bypass graft, facilitating informed decision making regarding aggressive prophylaxis of atrial fibrillation |
| Health Grouping / Profiling | 216 | Individuals were divided into groups based on their dietary patterns: ‘traditional fish eaters’, ‘healthy eaters’, ‘average, less fish, less healthy’, ‘Western’, ‘traditional bread eaters’, and ‘alcohol users’ |
| Delivery of Healthcare Interventions | 50 | Participants in the Wellington Respiratory Survey were divided into five distinct clinical phenotypes of airflow obstruction. These phenotypes may form the basis of a modified taxonomy for disorders of airways obstruction and of treatment targeted at defined phenotypic groups, rather than at asthma or COPD in general, as in the current management approach |
Abbreviations: COPD, Chronic Obstructive Pulmonary Disease
Commonly used data-driven population segmentation methods
| Methods# | No. of studies | Advantages | Disadvantages | Notes |
|---|---|---|---|---|
| Unsupervised Classifications | ||||
| Latent class/profile/transition/growth analysis | 96 | 1. Can handle missing data | Can be computationally intensive, especially with datasets that contain thousands of observations | 1. Segmenting variables need to be categorical for latent class analysis, continuous for latent profile analysis, and categorical at multiple time points for latent transition analysis |
| K-means cluster analysis | 60 | 1. Can deal with very large datasets | 1. Might not guarantee reproducible solutions (may get a different solution for each set of specified seed points) | Users need to pre-specify the desired number of segments. |
| Hierarchical analysis | 50 | 1. Stopping rules are readily available (e.g. Duda’s pseudo T-squared statistic and Calinski’s pseudo F statistic) to determine ideal cluster solutions | 1. Difficult to handle large datasets (sample size is preferably under 300–400, not exceeding 1000) | |
| Supervised Classification | ||||
| Decision Tree Methods (CHAID/CART) | 10 | 1. Can handle outliers and missing data | Models are based on splits that depend on previous splits; an error made in a higher split will propagate down | Users need to pre-specify dependent (or target) variables |
Abbreviations: CHAID, Chi-square Automatic Interaction Detector; CART, Classification and Regression Tree
# Some studies applied multiple methods in tandem or in combination
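Two of the K-means properties listed in the table, the need to pre-specify the number of segments and the dependence of the solution on the seed points, can be illustrated with a minimal sketch. It uses Lloyd's algorithm, a standard form of K-means, and is not the procedure of any particular reviewed study. The function names (`kmeans`, `best_of`, `within_ss`) and the toy utilization data are hypothetical. Running several seeds and keeping the solution with the lowest within-segment sum of squares is a common way to reduce seed dependence, shown here as an assumption rather than a recommendation from the review.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans(points, k, seed, max_iter=100):
    """Lloyd's algorithm; both k and the seed are chosen by the user."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # seed points drawn at random
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the center to the mean of its members.
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty segment: keep old center
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return assign(points, centers), centers

def within_ss(points, labels, centers):
    """Total within-segment sum of squared distances (lower is tighter)."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))

def best_of(points, k, seeds):
    """Run K-means once per seed and keep the tightest solution."""
    runs = [kmeans(points, k, s) for s in seeds]
    return min(runs, key=lambda run: within_ss(points, run[0], run[1]))

# Hypothetical toy data: (outpatient visits, inpatient admissions) per person.
people = [(1, 0), (2, 0), (2, 1), (10, 1), (11, 2), (12, 1),
          (30, 6), (31, 7), (29, 8)]

labels, centers = best_of(people, k=3, seeds=range(10))
print(labels, within_ss(people, labels, centers))
```

Fixing the seed makes a single run repeatable, while comparing runs across seeds shows whether the segmentation is sensitive to the starting points.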
Segmentation outcome evaluations
| Number of segments (parsimony) | No. of studies | Examples |
|---|---|---|
| ≤3 | 76 | A population of PTSD patients was segmented based on symptoms: “High-Symptom”, “Dysphoric”, and “Threat” |
| 4–5 | 98 | A group of children was divided into clusters of different patterns of sun protective behaviors: “Multiple protective behaviors”, “Clothing and shade”, “Pants only”, and “Low/inconsistent protective behaviors” |
| 6–9 | 55 | An adult population was segmented by dietary patterns: “Traditional Irish”, “Continental”, “Unhealthy foods”, “Light-meal foods & low-fat milk”, “Healthy foods”, and “Wholemeal bread & dessert” |
| ≥10 | 4 | A female population was divided into 43 groups based on mammography status, access to care, health behaviors (e.g. smoking), health status, etc. [44] |
| Internal validation | ||
| Yes | 216 | The optimal number of clusters was assessed using the Bayesian Information Criterion |
| No | 0 | |
| External validation | ||
| Yes | 138 | Using risks of tonsillectomies and wheezing frequency to validate segmentation analysis based on symptoms of sleep disordered breathing |
| No | 78 | |
| Identifiability/Interpretability | ||
| Yes | 216 | Segmentation analysis of dietary patterns derived clusters that are easily identified as “Alcohol cluster”, “Meat cluster”, “Healthy cluster”, and “Refined sugars cluster” |
| No | 0 | |
| Substantiality | ||
| Yes | 216 | The smallest segment in a clustering analysis of asthma symptoms comprises 15.8% of the population |
| No | 0 | |
| Stability | ||
| Yes | 10 | A segmentation analysis of an asthma patient population with 10-year follow-up showed that the segments remained relatively stable 10 years apart (the probability of membership in the same asthma cluster at both times ranged from 54% to 88%) |
| No | 206 | |
| Actionability/Accessibility | ||
| Yes | 216 | A population was divided into segments with distinct sun protection behavioral patterns, so that future sun protection interventions can be tailored to each segment and delivered to achieve meaningful behavioral changes |
| No | 0 | |
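Two of the evaluation criteria in the table reduce to simple proportions: substantiality (the share of the population in the smallest segment) and stability (the proportion of each baseline segment found in the same segment at follow-up). The sketch below illustrates both, under the assumption that segment labels refer to the same segment definitions at both time points. The function names and the ten-person example are hypothetical and are not data from any reviewed study.

```python
from collections import Counter

def smallest_segment_share(labels):
    """Substantiality: fraction of the population in the smallest segment."""
    counts = Counter(labels)
    return min(counts.values()) / len(labels)

def same_segment_probability(labels_t1, labels_t2):
    """Stability: for each baseline segment, the proportion of its members
    assigned to the same segment at follow-up."""
    totals = Counter(labels_t1)
    stayed = Counter(a for a, b in zip(labels_t1, labels_t2) if a == b)
    return {seg: stayed[seg] / totals[seg] for seg in totals}

# Hypothetical segment labels for the same ten people at two time points.
baseline = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
followup = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "A"]

print(smallest_segment_share(baseline))              # share of segment C
print(same_segment_probability(baseline, followup))  # per-segment stability
```

In this toy example the smallest segment holds 2 of 10 people, and 4 of the 5 people who started in segment A remain in it at follow-up.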