Vassilis G Koutkias, Marie-Christine Jaulent.
Abstract
Computational signal detection constitutes a key element of postmarketing drug monitoring and surveillance. Diverse data sources are considered within the 'search space' of pharmacovigilance scientists, and respective data analysis methods are employed, all with their qualities and shortcomings, towards more timely and accurate signal detection. Recent systematic comparative studies highlighted not only event-based and data-source-based differential performance across methods but also their complementarity. These findings reinforce the arguments for exploiting all possible information sources for drug safety and for the parallel use of multiple signal detection methods. Combinatorial signal detection has been pursued in only a few studies up to now, employing a rather limited number of methods and data sources but yielding promising outcomes. However, the large-scale realization of this approach requires systematic frameworks to address the challenges of the concurrent analysis setting. In this paper, we argue that semantic technologies provide the means to address some of these challenges, and we particularly highlight their contribution in (a) annotating data sources and analysis methods with quality attributes to facilitate their selection given the analysis scope; (b) consistently defining study parameters such as health outcomes and drugs of interest, and providing guidance for study setup; (c) expressing analysis outcomes in a common format enabling data sharing and systematic comparisons; and (d) assessing/supporting the novelty of the aggregated outcomes through access to reference knowledge sources related to drug safety. A semantically-enriched framework can facilitate seamless access and use of different data sources and computational methods in an integrated fashion, bringing a new perspective for large-scale, knowledge-intensive signal detection.
Year: 2015 PMID: 25749722 PMCID: PMC4374117 DOI: 10.1007/s40264-015-0278-8
Source DB: PubMed Journal: Drug Saf ISSN: 0114-5916 Impact factor: 5.606
Data sources for signal detection: advantages, shortcomings and respective challenges for the application/development of computational signal detection methods based on empirical knowledge and the literature
| Signal source | Advantages | Shortcomings | Challenges |
|---|---|---|---|
| SRS databases | Highly relevant (specific focus on drug safety incident documentation); controlled (data captured via predefined/standard forms); coverage of diverse populations in international SRSs; public availability (in some cases, e.g. FAERS) | Insufficient reporting, missing or incomplete data, and misattributed causal links; reporting bias; duplicate information; latency; difficult to account for confounding factors | Account for behavioral patterns in the reporting; cope with missing data and duplicates; account for the masking effect; use of independent information sources for hypothesis generation and assessment; identify complex safety patterns, exceeding single drug–adverse event pairs; the 'lack of denominator' (i.e. only the number of people who are exposed to drugs and have the event is known, not the number of people who are exposed to the drugs) |
| Observational healthcare databases (a) | Longitudinal healthcare information maintained by professionals (healthcare or administrative staff); no interviewer bias; enable active and real-time surveillance | Not designed for drug safety incident identification; complexities and potential inabilities to extract data (including access issues); bias introduced by local terminologies/vocabularies | Need for sufficient data on drug exposure; multiple options available for defining health outcomes/events, and exposure; definition of data mappings in case applied to heterogeneous, diverse data; account for aspects such as confounding; replication of results |
| EHRs | Quality-controlled, detailed information (inpatient EHR data are supposed to provide accurate diagnosis, laboratory results, drug dosage and administration time) | Difficulty in acquiring an adequate sample size to cover diverse populations for drugs and events (however, population size can be increased by applying methods to combine data sources) | |
| Administrative claims | Database may be significantly large, offering great variety in the population | Questionable information granularity and accuracy since they are maintained for billing purposes | |
| Free-text data sources (b) | Vast information content | Not designed for drug safety incident identification | Requirement for sophisticated linguistic processing to account for colloquial language, grammatical/spelling errors, etc.; context-based text mining |
| Clinical narratives | Produced by healthcare professionals; contain rich documentation of clinical conditions, treatments, and patient history | Complexities and potential inabilities to extract data (including access issues); bias introduced by local documentation procedures | Account for temporal association among reports for the same patient; efficient big data management and analytics |
| Literature | Quality-controlled through peer-reviewing and (sometimes) indexing | May rely on assumptions and contain subjective conclusions | Cope with the varying strength of the provided evidence; utilize indexing annotations, apply pure text processing, or use both? |
| Patient-generated data | Real-time nature; large-scale data production | Highly subjective data; questionable reliability, validity and quality of data; duplicates (reproduction of content from users) | Encapsulate mechanisms for quality control; cope with missing data and filter duplicates; construct real-time surveillance methods; efficient big data management and analytics |
SRS spontaneous reporting system, FAERS FDA Adverse Event Reporting System, EHRs electronic health records
(a) The features that are attributed to observational healthcare databases are also applicable to their subcategories, i.e. EHRs and administrative claims databases
(b) The features that are attributed to free-text data sources are also applicable to their subcategories, i.e. clinical narratives, literature and patient-generated data
Summary of indicative comparative studies of signal detection methods: design and major findings
| Study | Data explored and methods applied | Comparison measure(s) | Gold standard | Analysis choices | Major findings |
|---|---|---|---|---|---|
| van Holle and Bauchau | | PPV primarily, and NPV, TP, FP, TN and FN secondarily | Events listed in a company’s Global Product Information System | Extensive parameterization of methods, i.e.: for the DP-based method, a total of 336 different combinations of four stratification factors (sex, age, etc.) and cut-off values were assessed; for the TTO algorithm, 18 different combinations of alpha levels and time windows were investigated | TTO algorithm superior to MGPS, whatever the choice of parameter values; trade-off between Sp and Sn, and TTO dependent on data quality; suggestion to use both methods to benefit from the greater ability of TTO to detect TP signals, while avoiding signals being missed (or delayed) when the respective data are of low quality |
| Harpaz et al. | | AUC | OMOP reference set | Analysis of performance at fixed levels of Sn and Sp; application of Youden’s weighted index to identify optimal signal thresholds; adoption of the broadest definition of events provided by OMOP | Multivariate modeling methods superior to DP-based methods; DP-based methods simpler and faster to compute; not all events are equally detectable |
| Ryan et al. | | Threshold-based (i.e. Sn, Sp, PPV at RR thresholds) and threshold-free measures (e.g. AUC) | Nine drug-outcome pairs classified as ‘positive controls’ and 44 pairs classified as ‘negative controls’ | Multiple parameter settings explored per method | Many FP associations obtained from all methods; no clear optimal algorithm (result dependent on the desired trade-off between Sn and Sp) |
| Schuemie et al. | | AUC | Reference set of positive and negative controls for 10 (of the 23) important events proposed in EU-ADR | Assumption: for DP methods, the occurrence of the event of interest during a period of drug exposure constitutes a potential drug–event association; common settings applied for all methods to define exposures and outcomes | LEOPARD had a positive effect on the overall performance of all methods, but some of the known ADRs were incorrectly flagged as protopathic bias; LGPS and case–control adjusting for drug count slightly superior; DP-based methods had lower performance, although not statistically significant; some ADRs were not detected by all methods |
| Reps et al. | | Natural threshold-based measures, average precision at cut-off K, AUC | A set of known ADRs for specific drug families (NSAIDs, quinolones and calcium channel blocker drugs, with multiple drugs per category) | Assumption: all medical events that occur within 30 days of the drug prescription are considered as possible drug–event pairs (i.e. filtering chronic conditions); multiple drugs from the same family were explored | No generally superior algorithm for all the drugs considered in the study; none of the algorithms performed well at detecting rare events |
| Liu et al. | | Precision, recall and | Two independent reference datasets of drug–event pairs: (1) 470 drug–event pairs (10 drugs and 47 laboratory abnormalities); (2) 378 drug–event pairs (9 drugs and 42 laboratory abnormalities) | Principle: assess the correlation of abnormal laboratory results and specific drug administrations by comparing the outcomes of a drug-exposed group and a matched unexposed group; a potential drug–laboratory test ADR involved an individual with a normal pre-drug laboratory test result who had an abnormal laboratory result after drug administration | Results varied according to the dataset: for the first dataset, ROR had the best; for the second dataset, CHI, ROR, PRR, and Yule’s |
AUC area under the receiver operating characteristic curve, BCPNN Bayesian Confidence Propagation Neural Network, BHM Bayesian Hierarchical Model, CC case–control, DP disproportionality, EMR electronic medical record, ELR extended logistic regression, FN false negative, FP false positive, FAERS FDA Adverse Event Reporting System, GPS Gamma Poisson Shrinker, HUNT Highlighting Unexpected Temporal Association Rules Negating Temporal Association Rules, IRR incidence rate ratio, LGPS Longitudinal Gamma Poisson Shrinker, LEOPARD Longitudinal Evaluation of Observational Profiles of Adverse events Related to Drugs, LR logistic regression, MGPS multi-item Gamma Poisson Shrinker, MUTARA Mining Unexpected Temporary Association Rules given the Antecedent, NPV negative predictive value, NSAID non-steroidal anti-inflammatory drug, OMOP Observational Medical Outcomes Partnership, PPV positive predictive value, PRR proportional reporting ratio, Q3 quarter 3, RR Relative Risk, ROR reporting odds ratio, SCCS self-controlled case series, Sn sensitivity, Sp specificity, SRS spontaneous reporting system, THIN The Health Improvement Network, TN true negative, TP true positive, TPD temporal pattern discovery, TTO time-to-onset
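Several of the measures abbreviated above (PRR, ROR, Sn, Sp, PPV, NPV) reduce to simple ratios over count tables. The following is a minimal sketch of their textbook definitions, not the implementation used in any of the studies summarized; the class and function names, and all counts used with them, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """2x2 report counts for one drug-event pair (illustrative layout)."""
    a: int  # reports mentioning the drug and the event
    b: int  # reports mentioning the drug and other events
    c: int  # reports mentioning other drugs and the event
    d: int  # reports mentioning other drugs and other events


def prr(t: Contingency) -> float:
    """Proportional reporting ratio: event share for the drug vs. all other drugs."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def ror(t: Contingency) -> float:
    """Reporting odds ratio: cross-product ratio of the 2x2 table."""
    return (t.a * t.d) / (t.b * t.c)


def screening_measures(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sn, Sp, PPV and NPV from counts scored against a reference set."""
    return {
        "Sn": tp / (tp + fn),   # sensitivity
        "Sp": tn / (tn + fp),   # specificity
        "PPV": tp / (tp + fp),  # positive predictive value
        "NPV": tn / (tn + fn),  # negative predictive value
    }
```

For example, `prr(Contingency(a=20, b=80, c=100, d=9800))` compares a 20% event share for the drug with roughly 1% for all other drugs. Zero cells would raise a division error here; a practical implementation would need a continuity correction or shrinkage, which this sketch deliberately omits.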
Fig. 1 Scaling-up computational signal detection towards combinatorial-integrated approaches: (a) the quite typical approach of one data source being explored by a single method in a ‘coupled’ fashion; (b) the benchmarking setting, i.e. one data source explored by various methods to enable the methods’ comparison; (c) studies assessing replication of outcomes, i.e. one method applied to various data sources (typically of the same type); and (d) the integrated perspective, i.e. various data sources of different types explored by diverse methods in parallel
Fig. 2 Overview of the computational signal detection process in an integrated perspective. (a) Diverse data sources; (b) relevant computational signal detection methods per data source type; (c) the signal detection workflow; (d) stakeholders involved in signal detection for whom uniform-combined access to data and computational methods for signal detection shall be provided; (e) proposed add-ons for semantically-enriched, large-scale signal detection. ATC Anatomical Therapeutic Chemical classification system, EHR electronic health record, MedDRA Medical Dictionary for Regulatory Activities, NLP Natural Language Processing, OMOP Observational Medical Outcomes Partnership, SRS spontaneous reporting system
| A number of comparative studies assessing various signal detection methods applied to diverse types of data have highlighted the need for combinatorial-integrated approaches. |
| Large-scale integrated signal detection requires systematic frameworks in order to address the challenges posed within the underlying concurrent analysis setting. |
| Semantic technologies and tools may provide the means to address the challenges posed in integrated signal detection, and establish the basis for knowledge-intensive signal detection. |
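To make the idea of expressing analysis outcomes in a common format more concrete, the sketch below shows one possible shape for such a record and a simple aggregation across methods and data sources. It is an assumption-laden illustration, not a format proposed in the paper: the `SignalRecord` fields, the `support_by_pair` helper and the placeholder identifiers are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalRecord:
    """One outcome of one method on one data source (hypothetical schema)."""
    drug: str      # drug identifier, e.g. a code from a drug terminology
    event: str     # event identifier, e.g. a code from an event terminology
    source: str    # data source the method was applied to
    method: str    # signal detection method that produced the score
    score: float   # method-specific statistic; not comparable across methods
    flagged: bool  # whether the method's own threshold was exceeded


def support_by_pair(records):
    """Group flagged outcomes by drug-event pair.

    Returns a dict mapping (drug, event) to the set of (source, method)
    combinations that flagged the pair.
    """
    support = defaultdict(set)
    for r in records:
        if r.flagged:
            support[(r.drug, r.event)].add((r.source, r.method))
    return dict(support)


# Placeholder values only; these are not real drugs, events or results.
records = [
    SignalRecord("DRUG_A", "EVENT_X", "SRS", "PRR", 3.1, True),
    SignalRecord("DRUG_A", "EVENT_X", "EHR", "LGPS", 1.9, True),
    SignalRecord("DRUG_A", "EVENT_Y", "SRS", "PRR", 0.8, False),
]
pairs = support_by_pair(records)
```

Because scores from different methods are not on a common scale, the sketch aggregates only the binary `flagged` decision; how to weight or reconcile heterogeneous scores is left open.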