Émeline Courtois1, Antoine Pariente2, Francesco Salvo2, Étienne Volatier1, Pascale Tubert-Bitter1, Ismaïl Ahmed1.
Abstract
Classical methods used for signal detection in pharmacovigilance rely on disproportionality analysis of counts aggregating spontaneous reports of a given adverse drug reaction. In recent years, alternative methods, such as penalized multiple logistic regression approaches, have been proposed to analyze individual spontaneous reports. These approaches address some well-known biases resulting from disproportionality methods. However, while penalization accounts for computational constraints due to high-dimensional data, it raises the issue of determining the regularization parameter and, eventually, that of an error-controlling decision rule. We present a new automated signal detection strategy for pharmacovigilance systems, based on propensity scores (PS) in high dimension. PSs are increasingly used to assess a given association in high-dimensional observational healthcare databases while accounting for confounding bias. Our main aim was to develop a method having the same advantages as multiple regression approaches in dealing with bias, while relying on the statistical multiple comparison framework as regards decision thresholds, by considering false discovery rate (FDR)-based decision rules. We investigate four PS estimation methods in high dimension: a gradient tree boosting (GTB) algorithm from machine learning and three variable selection algorithms. For each (drug, adverse event) pair, the PS is then applied as an adjustment covariate or by using two kinds of weighting: inverse probability of treatment weighting and matching weights. The different versions of the new approach were compared to a univariate approach, which is a disproportionality method, and to two penalized multiple logistic regression approaches, applied directly to spontaneous reporting data.
Performance was assessed through an empirical comparative study conducted on a reference signal set in the French national pharmacovigilance database (2000-2016) that was recently proposed for drug-induced liver injury. Multiple regression approaches performed better in terms of both true positives and false positives detected. Nonetheless, the performance of the PS-based methods using matching weights was very similar to that of multiple regression and better than that of the univariate approach. In addition to being able to control FDR statistical errors, the proposed PS-based strategy is an interesting alternative to multiple regression approaches.
Keywords: FDR; penalized multiple regression; pharmacovigilance; propensity score in high dimension; signal detection; spontaneous reports
Year: 2018 PMID: 30279658 PMCID: PMC6153352 DOI: 10.3389/fphar.2018.01010
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
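The two weighting schemes named in the abstract can be illustrated with a minimal sketch. It assumes the usual textbook definitions of inverse probability of treatment weights and matching weights for a binary exposure; the function names and the example propensity scores are hypothetical and are not taken from the paper.

```python
def _check_ps(ps):
    # A propensity score must lie strictly between 0 and 1 for the weights to exist.
    if not 0.0 < ps < 1.0:
        raise ValueError("propensity score must be in (0, 1)")


def iptw_weight(exposed, ps):
    """Inverse probability of treatment weight for one report.

    exposed: True if the report mentions the drug of interest (assumption).
    ps: estimated probability of being exposed given the covariates.
    """
    _check_ps(ps)
    return 1.0 / ps if exposed else 1.0 / (1.0 - ps)


def matching_weight(exposed, ps):
    """Matching weight: min(ps, 1 - ps) divided by the probability
    of the exposure status actually observed."""
    _check_ps(ps)
    prob_observed = ps if exposed else 1.0 - ps
    return min(ps, 1.0 - ps) / prob_observed


# Hypothetical (exposure, propensity score) pairs, for illustration only.
reports = [(True, 0.8), (True, 0.2), (False, 0.2), (False, 0.5)]
iptw = [iptw_weight(z, e) for z, e in reports]
mw = [matching_weight(z, e) for z, e in reports]
```

Under these definitions matching weights never exceed 1, whereas inverse probability of treatment weights grow without bound as the propensity score approaches 0 or 1.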
Main advantages and disadvantages of the compared signal detection methods.
| Method | Advantages | Disadvantages |
| --- | --- | --- |
| Univariate approach (disproportionality method) | Very fast computation time; detection threshold based on classical test theory ( | Does not account for multiple exposures |
| Penalized multiple logistic regression methods | Fast computation time; account for multiple exposures | Detection threshold not relying on classical test theory |
| Propensity-score based methods | Account for multiple exposures; detection threshold based on classical test theory ( | Long calculation time for the propensity score estimation step |
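Number of signals detected for each method.
| Method | Signals generated | Signals in reference set | True positives | PPV (%) | Sensitivity (%) | False positives | FDP (%) | Specificity (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Univ | 359 | 105 | 86 | 81.90 | 75.44 | 19 | 18.10 | 78.89 |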
| BIC-lasso | 173 | 66 | 64 | 96.97 | 56.14 | 2 | 3.03 | 97.78 |
| CISL-5% | 99 | 43 | 43 | 100.00 | 37.72 | 0 | 0.00 | 100.00 |
| CISL-10% | 109 | 48 | 48 | 100.00 | 42.11 | 0 | 0.00 | 100.00 |
| adjustPS-BIC | 308 | 96 | 81 | 84.38 | 71.05 | 15 | 15.62 | 83.33 |
| mwPS-BIC | 147 | 53 | 52 | 98.11 | 45.61 | 1 | 1.89 | 98.89 |
| iptwPS-BIC | 35 | 14 | 13 | 92.86 | 11.40 | 1 | 7.14 | 98.89 |
| adjustPS-CISL | 275 | 86 | 75 | 87.21 | 65.79 | 11 | 12.79 | 87.78 |
| mwPS-CISL | 121 | 50 | 49 | 98.00 | 42.98 | 1 | 2.00 | 98.89 |
| iptwPS-CISL | 63 | 17 | 14 | 82.35 | 12.28 | 3 | 17.65 | 96.67 |
| adjustPS-GTB | 273 | 85 | 74 | 87.06 | 64.91 | 11 | 12.94 | 87.78 |
| mwPS-GTB | 136 | 52 | 49 | 94.23 | 42.98 | 3 | 5.77 | 96.67 |
| iptwPS-GTB | 70 | 28 | 25 | 89.29 | 21.93 | 3 | 10.71 | 96.67 |
| adjustPS-hdPS | 310 | 93 | 83 | 89.25 | 72.81 | 10 | 10.75 | 88.89 |
| mwPS-hdPS | 139 | 54 | 53 | 98.15 | 46.49 | 1 | 1.85 | 98.89 |
| iptwPS-hdPS | 34 | 16 | 15 | 93.75 | 13.16 | 1 | 6.25 | 98.89 |
For the BIC-lasso, a signal is a positive association in the selected model; for CISL-5% and CISL-10%, a signal is a selected variable obtained according to a distribution quantile (5% or 10%) in the CISL methodology. For Univ and all the PS-based approaches, a signal is an FDR-significant association. The number of true positive signals is the number of positive controls detected by the method. The number of false positive signals is the number of negative controls detected by the method.
PPV: Positive Predictive Value.
FDP: False Discovery Proportion.
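The quantities in the table notes can be sketched in code. The block below is an illustration only: the source states that signals are "FDR significant" without naming the procedure here, so the Benjamini-Hochberg step-up rule is used as a stand-in assumption, and the function names and p-values are hypothetical. The PPV and FDP helper simply divides true and false positives by the number of detected reference signals.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    by the Benjamini-Hochberg step-up procedure at level alpha.
    (Stand-in FDR rule; not necessarily the one used in the paper.)"""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def ppv_fdp(true_pos, false_pos):
    """PPV and FDP, in percent, among detected reference signals."""
    detected = true_pos + false_pos
    if detected == 0:
        raise ValueError("no reference signal detected")
    ppv = round(100.0 * true_pos / detected, 2)
    fdp = round(100.0 * false_pos / detected, 2)
    return ppv, fdp


# Hypothetical p-values, for illustration only.
flags = benjamini_hochberg([0.01, 0.5, 0.02, 0.03])

# Counts taken from the Univ and BIC-lasso rows of the table.
univ = ppv_fdp(86, 19)
bic_lasso = ppv_fdp(64, 2)
```

With the counts from the table, the helper reproduces the PPV and FDP values reported for Univ and BIC-lasso.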
Figure 1. (A) Number of positive reference signals detected according to the number of signals generated by BIC-lasso, mwPS-BIC and Univ, where signals are ranked in ascending order by their associated p-values for BIC-lasso and by their adjusted p-values for mwPS-BIC and Univ. (B) Number of negative reference signals detected according to the number of signals generated by BIC-lasso, mwPS-BIC and Univ, with signals ranked in the same way.
Figure 2. (A) Distribution of the first 147 signals generated among Univ, BIC-lasso and mwPS-BIC. (B) Observed counts vs. expected counts of signals generated by Univ only, BIC-lasso only, mwPS-BIC only and by the three methods, considering their first 147 generated signals. Observed counts $n$ are the numbers of reports that involve the signal considered; expected counts $e$ are those expected if independence applies between the drug and the AE that form the signal considered. They are calculated as follows: for a signal (drug, AE), $e = \frac{N_{\text{drug}} \, N_{\text{AE}}}{N}$, where $N_{\text{drug}}$ and $N_{\text{AE}}$ are the observed counts of the drug and the AE respectively, and $N$ is the total number of observations.
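The expected count described in the Figure 2 caption can be written as a one-line function. This is a minimal sketch of the independence formula (drug count times AE count, divided by the total number of reports); the function name and the example counts are hypothetical and do not come from the database.

```python
def expected_count(n_drug, n_ae, n_total):
    """Expected number of reports for a (drug, AE) pair under independence.

    n_drug: number of reports mentioning the drug
    n_ae: number of reports mentioning the adverse event
    n_total: total number of reports in the database
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_drug * n_ae / n_total


# Hypothetical counts, for illustration only.
e = expected_count(n_drug=200, n_ae=500, n_total=100_000)
observed = 6
ratio = observed / e  # observed-to-expected ratio for this pair
```

An observed count well above the expected count is what a disproportionality method flags as a potential signal.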