Vibhu Agarwal¹, Lichy Han¹, Isaac Madan², Shaurya Saluja², Aaditya Shidham², Nigam H Shah¹.
Abstract
The steady rise in healthcare costs has deprived over 45 million Americans of healthcare services (1, 2) and has encouraged healthcare providers to look for opportunities to improve their operational efficiency. Prior studies have shown that evidence of healthcare-seeking intent in Internet searches correlates well with healthcare resource utilization. Given the ubiquity of mobile Internet search, we hypothesized that analyzing geo-tagged mobile search logs could enable us to machine-learn predictors of future patient visits. Using a de-identified dataset of geo-tagged mobile Internet search logs, we mined text and location patterns that predict healthcare resource utilization and built statistical models that estimate the probability of a user's future visit to a medical facility. Our efforts will enable the development of innovative methods for modeling and optimizing the use of healthcare resources, a crucial prerequisite for securing healthcare access for everyone in the days to come.
Year: 2016 PMID: 27570641 PMCID: PMC5001755
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 1. Definition of patients and controls. (A) Generation of search logs based on searches within geo-fences. (B) Identification of searches proximal to medical facilities. (C) Selection of patients and controls based on filtering criteria.
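Panel (B) amounts to a distance test between each geo-tagged search and known facility locations. The following is a minimal sketch, assuming point coordinates for facilities, a hypothetical 100 m radius, and a hypothetical `Search` record; the entry does not give the paper's actual geo-fence shapes or thresholds.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass
class Search:
    # Hypothetical record for one geo-tagged search (not the paper's schema)
    user_id: str
    lat: float
    lon: float
    text: str


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def searches_near_facilities(searches, facilities, radius_m=100.0):
    """Keep searches issued within radius_m of any facility (lat, lon).

    radius_m=100.0 is a hypothetical default, not a value from the paper.
    """
    return [
        s for s in searches
        if any(haversine_m(s.lat, s.lon, f_lat, f_lon) <= radius_m
               for f_lat, f_lon in facilities)
    ]


if __name__ == "__main__":
    facilities = [(39.9042, 116.4074)]
    log = [Search("u1", 39.9042, 116.4074, "a"), Search("u2", 40.0, 116.4074, "b")]
    print([s.user_id for s in searches_near_facilities(log, facilities)])  # ['u1']
```

A linear scan over facilities is fine for illustration; a large facility list would call for a spatial index.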
Figure 2. Longitudinal partitioning of search log data into analysis windows consisting of 41 search days.
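Read literally, "41 search days" counts days on which the user searched rather than calendar days. A minimal sketch of that reading, assuming consecutive non-overlapping windows and that an incomplete trailing window is discarded (both are assumptions, not details stated in the entry):

```python
from datetime import date, timedelta

WINDOW_SEARCH_DAYS = 41  # window length reported in Figure 2


def partition_into_windows(search_dates, window_size=WINDOW_SEARCH_DAYS):
    """Split one user's distinct search days into consecutive, non-overlapping windows.

    search_dates: iterable of datetime.date values, one per search (repeats allowed).
    Returns a list of windows, each a sorted list of `window_size` distinct dates.
    Assumption: a trailing window with fewer than `window_size` search days is dropped.
    """
    days = sorted(set(search_dates))
    return [
        days[i:i + window_size]
        for i in range(0, len(days) - window_size + 1, window_size)
    ]


if __name__ == "__main__":
    start = date(2015, 1, 1)
    # 100 consecutive search days, with the first day searched twice
    log = [start + timedelta(days=k) for k in range(100)] + [start]
    windows = partition_into_windows(log)
    print(len(windows), [len(w) for w in windows])  # 2 [41, 41]
```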
Description of feature classes
| Class | Feature description | Aggregate | Day-wise | Day-wise offset |
|---|---|---|---|---|
| General (GE) | Number of searches | ✓ | ✓ | ✓ |
| | Number of healthcare-related searches | ✓ | ✓ | ✓ |
| | Mean session duration | ✓ | ✓ | ✓ |
| | Mean length of search text | ✓ | ✓ | ✓ |
| Semantic (SE) | Number of searches for a disease | ✓ | ✓ | ✓ |
| | Number of searches for a drug | ✓ | ✓ | ✓ |
| | Number of searches for a medical device | ✓ | ✓ | ✓ |
| | Number of searches for a medical procedure | ✓ | ✓ | ✓ |
| | Number of searches containing one of 108 enriched (Chinese) words | ✓ | ✓ | ✓ |
| Location (LO) | Number of searches mapped to one of 53 enriched location categories | ✓ | ✓ | ✗ |
| | Number of searches whose location labels contain one of 113 words | ✓ | ✗ | ✗ |
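Most features in the table are counts, taken either over the whole window (aggregate) or per search day (day-wise). The sketch below computes both for one predicate; the example predicate and its English terms are hypothetical stand-ins for the paper's disease, drug, device, procedure, and enriched-word lexicons. The day-wise offset variant is not shown because the entry does not define the offset.

```python
from collections import Counter
from datetime import date


def count_features(searches, window_days, is_match):
    """Aggregate and day-wise counts of searches whose text satisfies `is_match`.

    searches: list of (date, text) pairs for one user.
    window_days: sorted list of the window's search days.
    is_match: predicate on search text, e.g. a hypothetical "mentions a disease" test.
    Returns (aggregate_count, day_wise_counts), where day_wise_counts[i]
    is the number of matching searches on window_days[i].
    """
    in_window = set(window_days)
    per_day = Counter(d for d, text in searches if d in in_window and is_match(text))
    day_wise = [per_day.get(d, 0) for d in window_days]
    return sum(day_wise), day_wise


if __name__ == "__main__":
    DISEASE_TERMS = {"flu", "fever"}  # hypothetical lexicon
    d1, d2, d3 = date(2015, 1, 1), date(2015, 1, 2), date(2015, 1, 3)
    log = [(d1, "flu symptoms"), (d1, "weather"), (d2, "fever medicine"), (d3, "flu shot")]
    print(count_features(log, [d1, d2], lambda t: any(w in t for w in DISEASE_TERMS)))
    # (2, [1, 1])
```

The search on `d3` falls outside the window and is ignored, so the aggregate count equals the sum of the day-wise counts.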
Figure 3. Framework for search text translation and mapping
Figure 4. Workflow to extract location features
Figure 5. Building and testing prediction models
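The models compared below include lasso, ridge, and elastic net, which are penalized regressions. As one illustration only, here is a from-scratch ridge (L2-penalized) logistic regression trained by batch gradient descent. The penalty, learning rate, and epoch count are hypothetical, and this is not presented as the authors' implementation; a production pipeline would normally use an established library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, epochs=500):
    """Fit L2-penalized logistic regression by batch gradient descent.

    Minimizes mean log-loss + (lam / 2) * ||w||^2; the intercept is not penalized.
    X: list of feature rows (lists of floats); y: list of 0/1 labels
    (1 = later visited a medical facility). Hyperparameter defaults are hypothetical.
    Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Averaged log-loss gradient plus the ridge term lam * w
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of a future facility visit for feature row x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0, 0, 1, 1]
    w, b = fit_ridge_logistic(X, y)
    print([round(predict_proba(w, b, x), 2) for x in X])
```

Swapping the L2 term for an L1 or mixed penalty would give lasso or elastic net, but plain gradient descent handles the non-smooth L1 term poorly; coordinate descent or proximal methods are the usual choice there.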
Figure 6. Heat map of words in location
AUCs for aggregate, day-wise and day-wise offset feature sets
| Model | Aggregate (training) | Aggregate (test) | Day-wise (training) | Day-wise (test) | Day-wise offset (training) | Day-wise offset (test) |
|---|---|---|---|---|---|---|
| Lasso | 0.918 | 0.624 | 0.899 | 0.509 | 0.967 | 0.813 |
| Ridge | 0.912 | 0.601 | 0.947 | 0.561 | 0.973 | 0.785 |
| Random forest | 0.895 | 0.481 | 0.896 | 0.750 | 0.941 | 0.557 |
| Elastic net | 0.918 | 0.621 | 0.914 | 0.532 | 0.966 | 0.812 |
| SVM (radial kernel) | 0.998 | 0.583 | - | - | - | - |
| Forward stepwise selection | 0.924 | 0.633 | - | - | - | - |
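The AUC in the table above can be read as the probability that a randomly chosen patient receives a higher predicted score than a randomly chosen control, so 0.5 corresponds to chance-level ranking. A minimal sketch of that rank-based definition (ties count as one half); this is the standard formula, not code taken from the paper:

```python
def auc(scores, labels):
    """ROC AUC as P(score of random positive > score of random negative), ties = 0.5.

    scores: predicted probabilities; labels: 1 for patients, 0 for controls.
    Quadratic in the number of examples, which is fine for small illustrations.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Here three of the four patient/control pairs are ranked correctly (only 0.35 < 0.4 is misordered), giving 0.75.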
Figure 7. ROC curves for the (A) aggregate, (B) day-wise, and (C) day-wise offset feature sets
Top-ranked features for the best-performing model of each feature set (GE = general, SE = semantic, LO = location)
| Rank | Aggregate: class | Aggregate: feature | Day-wise: class | Day-wise: feature | Day-wise offset: class | Day-wise offset: feature |
|---|---|---|---|---|---|---|
| 1 | SE | 哪, 哪个 (which) | SE | 视频 (video) | SE | 拉贝哪 (Bella Yao) |
| 2 | SE | 视频 (video) | LO | Accommodation, Hotel | SE | 家庭醫生訪問 (Home Doctor Visit) |
| 3 | SE | 至 (to) | LO | Health | SE | 范冰冰 (Bingbing Fan) |
| 4 | SE | 走 (go) | GE | Day 1 # Health searches | SE | 喝 (drink) |
| 5 | LO | “hua” | SE | 哪 (which) | SE | 哪个 (which) |
| 6 | LO | “mall” | LO | Safety, Police | SE | 买 (buy) |
| 7 | LO | “metro” | GE | Average IC | SE | 死 (die) |