Shaun Comfort, Sujan Perera, Zoe Hudson, Darren Dorrell, Shawman Meireis, Meenakshi Nagarajan, Cartic Ramakrishnan, Jennifer Fine.
Abstract
INTRODUCTION: There is increasing interest in social digital media (SDM) as a data source for pharmacovigilance activities; however, SDM is considered a low information content data source for safety data. Given that pharmacovigilance itself operates in a high-noise, lower-validity environment without objective 'gold standards' beyond process definitions, the introduction of large volumes of SDM into the pharmacovigilance workflow has the potential to exacerbate issues with limited manual resources to perform adverse event identification and processing. Recent advances in medical informatics have resulted in methods for developing programs which can assist human experts in the detection of valid individual case safety reports (ICSRs) within SDM.
Year: 2018 PMID: 29446035 PMCID: PMC5966485 DOI: 10.1007/s40264-018-0641-7
Source DB: PubMed Journal: Drug Saf ISSN: 0114-5916 Impact factor: 5.606
Fig. 1 Breakdown of source data and curated subsets. Blue boxes indicate the data batches received by the software development team in various stages of development. The green boxes contain the split between valid, invalid, and excluded individual case safety reports (ICSRs) in the respective dataset. Posts were excluded because they fell outside the scope of the proof-of-concept study (see Sect. 2)
Fig. 2 Components of the individual case safety report (ICSR) classification framework. AE adverse event
Performance metrics
| Name | Value |
|---|---|
| True positive (tp) | No. of true positives^a |
| True negative (tn) | No. of true negatives^a |
| False positive (fp) | No. of false positives^a |
| False negative (fn) | No. of false negatives^a |
| Accuracy (Acc) | |
| Gwet AC1 | |
| Area under the curve | Trapezoidal method |
SME subject matter expert
^a True positive, true negative, etc. are based on SME-determined ‘ground truth’
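
The metrics named in the table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the textbook definition of accuracy, the standard two-rater, two-category form of Gwet's AC1 (classifier versus SME), and a plain trapezoidal rule over ROC points. The function names and the example counts are hypothetical.

```python
from typing import Sequence, Tuple


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of posts on which the classifier and the SME agree."""
    return (tp + tn) / (tp + tn + fp + fn)


def gwet_ac1(tp: int, tn: int, fp: int, fn: int) -> float:
    """Gwet's AC1 for two raters and two categories (assumed standard form)."""
    n = tp + tn + fp + fn
    observed = (tp + tn) / n  # observed agreement
    # Mean proportion of 'valid' calls across the two raters.
    q = ((tp + fp) / n + (tp + fn) / n) / 2
    chance = 2 * q * (1 - q)  # chance agreement under AC1
    return (observed - chance) / (1 - chance)


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under an ROC curve from (fpr, tpr) points, by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical counts, for illustration only (not taken from the study).
print(accuracy(tp=40, tn=40, fp=10, fn=10))
print(gwet_ac1(tp=40, tn=40, fp=10, fn=10))
print(auc_trapezoid([(0.0, 0.0), (0.2, 0.7), (1.0, 1.0)]))
```

With the hypothetical counts above, accuracy is 0.8 and AC1 is about 0.6. Because the AC1 chance term 2q(1 - q) never exceeds 0.5, the denominator cannot reach zero, although an empty confusion matrix would still raise a division error.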
Breakdown of the social media sources of the core dataset
| Social media site | Number of posts |
|---|---|
| | 168,745 |
| Online news and blogs | 106,336 |
| Tumblr | 32,961 |
| | 2754 |
| YouTube | 142 |
| Other | 251 |
| Total | 311,189 |
Contrived examples of social media posts containing valid and invalid ICSRs
| Valid ICSRs | Invalid ICSRs |
|---|---|
| Got the | This |
| I | |
| I took | The most common side effects of |
The terms in italics indicate drug keywords and the terms in bold indicate potential adverse event keywords and phrases
ICSR individual case safety report
Fig. 3 Performance of Iteration I and Iteration II classifiers. a Graph of accuracy and Gwet AC1 for both classifiers. b Confusion matrix for Iteration I classifier. c Confusion matrix for Iteration II classifier
Fig. 4 Performance of Iteration III classifier. a Plot of the receiver operator characteristic (ROC) curve of the Iteration III classifier. b Graph of area under the ROC curve and Gwet AC1 for the Iteration III classifier (average ± SD). c Confusion matrix of the five cross-validation results for the Iteration III classifier (average ± SD). AUC area under the curve, SD standard deviation
Fig. 5 Performance of the Iteration III classifier on a blind set. a Confusion matrix of the blind testing set results for the Iteration III classifier. b Chart of count of false-positive results by reason. c Chart of average length of post by reason for false-positive result. AE adverse event, Avg average
Program evaluation and review technique estimate of time for a human to evaluate the digital media data collection
| Variables | Minimum | Maximum | Mode | Exp | SD | L90%CI* | U90%CI* |
|---|---|---|---|---|---|---|---|
| Posts | 311,189 | 311,189 | 311,189 | 311,189 | 0.0 | 311,189 | 311,189 |
| Human reading speed (wpm) | 136 | 232 | 178 | 180 | 16 | 153 | 206 |
| Post length (words) | 10 | 10,000 | 316 | 1879 | 1665 | 48 | 2101 |
| Read/ID speed (min/post) | 0.07 | 43.10 | 1.78 | 8.38 | 7.17 | 0.31 | 10.22 |
| Total evaluation time (min) | 22,882 | 13,413,319 | 554,001 | 2,608,701 | 2,231,701 | 96,517 | 3,179,934 |
| Total evaluation time (h) | 381 | 223,555 | 9233 | 43,478 | 37,196 | 1609 | 52,999 |
Exp expected value, ICSR individual case safety report, L90%CI lower 90% confidence interval, SD standard deviation, U90%CI upper 90% confidence interval, wpm words per minute
* PERT Beta Function Approximate Confidence Intervals
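
The Exp and SD columns of the table are consistent with the classical PERT three-point formulas, (minimum + 4 × mode + maximum) / 6 for the expected value and (maximum − minimum) / 6 for the standard deviation. The sketch below reproduces three rows under that assumption; the function name `pert_estimate` is hypothetical, and deriving the minutes-per-post inputs as words divided by words per minute is inferred from the table values rather than stated in the source.

```python
def pert_estimate(minimum: float, mode: float, maximum: float) -> tuple:
    """Classical PERT three-point estimate: (expected value, standard deviation)."""
    expected = (minimum + 4 * mode + maximum) / 6
    sd = (maximum - minimum) / 6
    return expected, sd


# Inputs taken from the Minimum, Mode, and Maximum columns of the table.
reading_speed = pert_estimate(136, 178, 232)   # words per minute
post_length = pert_estimate(10, 316, 10_000)   # words per post

# Assumption: minutes per post = post length / reading speed, column by column.
minutes_per_post = pert_estimate(10 / 136, 316 / 178, 10_000 / 232)

print(reading_speed)
print(post_length)
print(tuple(round(v, 2) for v in minutes_per_post))
```

Rounded to the table's precision, these calls give 180 and 16 for reading speed, 1879 and 1665 for post length, and 8.38 and 7.17 for minutes per post, matching the corresponding Exp and SD cells.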
- A machine learning classifier achieved substantial agreement with a human expert when classifying social digital media posts as valid individual case safety reports
- This level of performance could not be achieved with a conventional rule and dictionary approach to classification
- Combining a machine learning approach with human review has the potential to be an effective and scalable solution to the challenge of identifying individual case safety reports within social digital media posts