Antoni F Z Wisniewski1, Andrew Bate2, Cedric Bousquet3,4, Andreas Brueckner5, Gianmario Candore6, Kristina Juhlin7, Miguel A Macia-Martinez8, Katrin Manlik9, Naashika Quarcoo10, Suzie Seabroke11, Jim Slattery6, Harry Southworth12, Bharat Thakrar13, Phil Tregunno11, Lionel Van Holle14, Michael Kayser15, G Niklas Norén7.
Abstract
Over a period of 5 years, the Innovative Medicines Initiative PROTECT (Pharmacoepidemiological Research on Outcomes of Therapeutics by a European ConsorTium) project has addressed key research questions relevant to the science of safety signal detection. The results of studies conducted into quantitative signal detection in spontaneous reporting, clinical trial and electronic health records databases are summarised and 39 recommendations have been formulated, many based on comparative analyses across a range of databases (e.g. regulatory, pharmaceutical company). The recommendations point to pragmatic steps that those working in the pharmacovigilance community can take to improve signal detection practices, whether in a national or international agency or in a pharmaceutical company setting. PROTECT has also pointed to areas of potentially fruitful future research and some areas where further effort is likely to yield less.
Year: 2016 PMID: 26951233 PMCID: PMC4871909 DOI: 10.1007/s40264-016-0405-1
Source DB: PubMed Journal: Drug Saf ISSN: 0114-5916 Impact factor: 5.606
| Recommendation | Rationale |
|---|---|
| For overall timeliness in quantitative signal detection, analysis can be performed at the MedDRA® PT level | The PROTECT study found no advantage in conducting signal detection at levels of MedDRA® above the PT level and indeed observed a net loss in timeliness of quantitative signal detection from replacing an analysis at the PT level with one at a higher level of the hierarchy [ |
| Future research should evaluate the false-positive burden for signal detection at each level of the terminology | The false-positive burden was out of scope for the PROTECT study [ |
| Future research should evaluate tighter custom-made groupings of MedDRA® PTs for signal detection | Neither PTs nor HLTs are universally ideal for quantitative signal detection. Gains in time by aggregating PTs were observed in the PROTECT study when the terms were very similar, in a clinical sense [ |
| Future research should evaluate simultaneous analysis at different levels of the terminology | Parallel analyses at different terminological levels could improve timeliness but have resource implications [ |
| Future research should explore a broader range of ADRs | The PROTECT study was restricted to a selection of 13 ADR categories [ |
| Recommendation | Rationale |
|---|---|
| Knowledge engineering techniques may be considered as an adjunct to the creation of custom groupings and SMQs designed for the selection and extraction of case reports in pharmacovigilance databases | The PROTECT studies show it is possible to propose relevant novel groupings when no predefined grouping is available in MedDRA® for a given safety topic (e.g. anaphylactic shock or upper gastric hemorrhages) [ |
| Additional research would be necessary to validate if novel groupings generated by knowledge engineering techniques can help in the design of appropriate groupings of MedDRA® PTs for use in signal detection or evaluation | Given the current state of research, the clinical accuracy of groupings generated by knowledge engineering is such that manual clinical review is still required and this still needs to be validated against existing signal detection methods [ |
| Consideration should be given to piloting the use of knowledge engineering in developing groupings in other ontologies for application to other vocabularies and their possible linkage | Given that it has been shown to be possible to generate relevant novel groupings in MedDRA®, it is reasonable to expect that it would also be possible in other ontologies, e.g. ICD10, SNOMED |
| Recommendation | Rationale |
|---|---|
| The structured database of ADRs for centrally authorised products may be used as a reference to enhance pharmacovigilance for these products. The database is available here: | The PROTECT database has been used to provide a reference in evaluating signal detection methods and also to identify known ADRs emerging from routine signal detection activity, hence reducing unnecessary investigation. Other potential uses have been identified but not yet tested |
| Structured databases of ADRs and their synonyms mapped to MedDRA® should be set up to cover other products | The current database does not address the majority of products authorised under mutual recognition or national processes. Further work is required if similar benefits are to be realised in pharmacovigilance systems covering these products |
| A standard minimum structure should be established for all SPC ADR databases. The PROTECT database provides a useful template for this structure | To maintain the utility of databases and allow combinations across databases, a standardised core structure will be essential although the appropriate structure will depend on the intended functions of the database. Thus, a coordinated approach with wide consultation of intended users would be needed. Co-ordination of such an effort could be undertaken by a large regulatory agency or a cross-industry organisation. For a description of the database structure see: |
| To facilitate signal detection, exact MedDRA® terms should be used to identify ADRs in SPC section 4.8 where feasible. When an ADR involves very large numbers of terms and requires an ad hoc name, mapping from this name to the relevant MedDRA® terms should be maintained | This is essential to facilitate the construction of machine-readable data sources that have a number of potential uses including the facilitation of signal detection. See Eudralex Vol. 2: |
| Computerised text processing to help in mapping non-standard descriptions of ADRs to MedDRA® codes should be considered both for efficiency and consistency of coding practice | The approximate matching system was used to find appropriate MedDRA® terms when non-standard terminology was used in the SPC. This was usually successful and also much more efficient than human intervention alone [ |
| In setting up a database of ADRs, a programme of maintenance should be established to reflect changes to the SPC from emerging safety issues or MedDRA® version changes | Around half the ADRs listed in SPCs are added as a result of post-authorisation activities and hence the database will require continuous attention to keep it up to date |
| Consideration should be given to establishing the value and feasibility of having direct links between databases of SPC data and other product information sources to prevent the need for duplicate data sources, or avoid repetition in the types of data collected in the different sources | Lists of product ADRs are currently maintained by regulators and by MAHs. These may conflict either in detail or in coding conventions. Even when they agree, it is not efficient to maintain independent sources of identical data |
| Recommendation | Rationale |
|---|---|
| Choice of a disproportionality statistic for signal detection should be primarily based on ease of implementation, interpretation and optimisation of resources | Several disproportionality statistics are currently used in data mining spontaneous report databases. All these can achieve similar overall performance by choice of appropriate signal detection algorithm. Thus, choice should be based on criteria other than signal detection performance. Factors that might be considered include the computing requirements to run the system, the ease of maintaining and adapting the system and whether the operation of the system can be easily communicated to non-statisticians [ |
| Consideration should be given to the choice of signal detection algorithm used with disproportionality statistics because these can have important effects on quantitative signal detection performance | In contrast to the choice of disproportionality statistic, the choice of signal detection algorithm to define a SDR can provide very different levels of quantitative signal detection performance in terms of sensitivity, precision and time to signal. Hence, these criteria must be carefully selected on the basis of empirical evidence [ |
| For moderate to large spontaneous report databases, the relative performance of a quantitative signal detection algorithm in one database can be predicted from research in other databases | In the PROTECT study, signal detection algorithms with good signaling properties (in terms of sensitivity and positive predictive value) compared to other signal detection algorithms in one spontaneous report database also had relatively good signaling properties in other spontaneous report databases. The databases were both regulatory and company based and ranged in size from about 500,000 to 5,000,000 reports. Hence, relative performance in moderately large databases can be reliably inferred from evaluations in other settings [ |
| Absolute performance of the selected quantitative signal detection algorithm must be validated in the target spontaneous report database | Although the relative performance of signal detection algorithms is similar in different spontaneous report databases, the absolute performance characteristics may vary substantially. Hence, it is advisable to test the chosen disproportionality statistic with a range of signal detection algorithms within the target database [ |
| Consideration should be given to the effect of reduced positive predictive value with time on the market | There appears to be a reduction in precision with time and hence it may be more productive to put additional effort into the evaluation of signals from newer products. This finding has been validated excluding ADRs identified prior to authorisation from the reference database but further work is ongoing to characterise this effect [ |
| Consideration should be given to carrying out comparisons of quantitative signal detection methods across spontaneous report databases matching at the drug-event combination level rather than averaging over all drug-event combinations | It is possible that some ADRs may be more easily found in some databases. This was not investigated in PROTECT |
| It would be useful to conduct research to establish empirically the best method for quantitative signal detection in combination products | Combination products and single substances are often treated as unrelated in signal detection systems; a question remains whether combining data from these products will provide more or less accurate detection of signals |
| Consideration should be given to establishing a framework for selecting the best quantitative signal detection algorithm to suit the organisational goals and resource available within a pharmacovigilance group | Our research has shown a predictable trade-off between sensitivity and precision as far as purely quantitative signal detection algorithms are concerned. However, the means of striking the correct balance between sensitivity and the concomitant burden of false positives for a given organisation requires careful consideration |
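
The distinction drawn in the table above between a disproportionality statistic and a signal detection algorithm can be sketched in Python. The example computes the proportional reporting ratio (PRR) from a 2×2 table of report counts and then applies a decision rule to declare a signal of disproportionate reporting (SDR). The function names, the cell labels `a` to `d`, and the thresholds (at least 3 reports, lower 95% confidence bound above 1) are illustrative assumptions, not the algorithms evaluated in PROTECT.

```python
import math

def prr_with_ci(a, b, c, d, z=1.96):
    """PRR and approximate 95% CI from a 2x2 table of report counts.

    a: reports with the drug and the event
    b: reports with the drug and other events
    c: reports with other drugs and the event
    d: reports with other drugs and other events
    """
    prr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(PRR), using the usual log-scale approximation.
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(prr) - z * se)
    upper = math.exp(math.log(prr) + z * se)
    return prr, lower, upper

def is_sdr(a, b, c, d, min_reports=3, lower_bound_threshold=1.0):
    """Hypothetical decision rule: enough reports AND lower CI bound above threshold."""
    if a < min_reports or c == 0:
        return False
    _, lower, _ = prr_with_ci(a, b, c, d)
    return lower > lower_bound_threshold

# Example with made-up counts.
prr, lower, upper = prr_with_ci(10, 90, 100, 9800)
flagged = is_sdr(10, 90, 100, 9800)
```

Changing `min_reports` or `lower_bound_threshold` changes the signal detection algorithm while leaving the statistic untouched, which is the lever the recommendations identify as having the larger effect on sensitivity and precision.
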
| Recommendation | Rationale |
|---|---|
| Subgroup analyses may be beneficial in routine first-pass signal detection and should be considered. Stratified/adjusted analyses are unlikely to provide added value | In spontaneous report databases with over 0.5 million reports with broad diversity of products, subgroup analyses tended to perform better than stratified/adjusted analyses in all spontaneous report databases. Stratified/adjusted analyses were not found to increase either sensitivity or precision beyond random variation [ |
| Subgroup analyses can be considered beneficial in large international spontaneous report databases with over 2 million reports. Smaller datasets, especially those with reports from only one country, may need to consider a likely trade-off between increased precision and some loss of sensitivity if subgroup analysis were to replace a crude or adjusted analysis | Subgroup analyses within the larger international datasets consistently showed benefits in both precision and sensitivity over crude analyses for two disproportionality methods/thresholds with differing performance characteristics. For the smaller spontaneous report databases, a gain in precision tended to result in some loss of sensitivity, particularly for the stricter disproportionality method/threshold and for the regulatory dataset with reports from only one country [ |
| Subgrouping by seriousness of ADR or routinely excluding legal cases is unlikely to provide benefits in signal detection in terms of increased sensitivity or precision | Subgrouping by seriousness of the ADR defined using the IME list (see footnote a) had little effect on sensitivity or precision in any spontaneous report database. An analysis excluding cases submitted by lawyers also had little effect in all spontaneous report databases apart from the largest international database, which showed an increase in sensitivity and precision when legal cases are excluded [ |
| Subgrouping by gender, reporter type and 5-yearly time points may provide modest improvement in precision in all, and sensitivity in some, spontaneous report databases | Subgrouping by gender, reporter type and 5-yearly time points showed a modest improvement in precision for all spontaneous report databases and improved sensitivity for larger and international databases. Implementation of these subgroup analyses into routine signal detection may provide some benefit [ |
| Subgrouping by age, country or continent of origin, or a combination of these variables, may confer improved precision in all and enhanced sensitivity in some spontaneous report databases | Subgrouping by age, country of origin, continent of origin and a combination of these variables showed the highest improvement of precision in all spontaneous report databases and sensitivity in the larger databases. Implementation of these subgroup analyses may be beneficial in optimising quantitative signal detection [ |
| Subgrouping by vaccines/non-vaccines should not be implemented without careful consideration of the desired effect | Subgrouping by vaccines/non-vaccines resulted in a decrease in both precision and sensitivity in all spontaneous report databases. This was almost exclusively driven by the vaccines subgroup. These effects were owing to the suppression of listed vaccine ADRs as a result of comparing vaccines to each other. This may be desirable for certain reactions, e.g. injection-site reactions, but undesirable for other more serious reactions, e.g. Guillain–Barré syndrome [ |
| Where subgrouping by variables with considerable missing data (e.g. age, gender) is undertaken, consideration should be given to including a stratum for unknown rather than excluding these cases | Including missing data in the subgroup analyses for age and gender increased sensitivity in all spontaneous report databases but tended to also decrease precision. In spontaneous report databases with higher levels of missing data (20+ %) the increase in sensitivity was greater than the decrease in precision [ |
| Subgrouping with a threshold based on number of reports may benefit from basing the threshold on the entire drug-event combination rather than within each individual stratum | Results for subgroup analyses that used an overall threshold of |
| Future research should evaluate the use of subgroup analysis in parallel with crude and/or adjusted analysis | Results for subgroup analyses that used an overall threshold of |
Footnote a: The EMA Important Medical Event Terms (IME) list (https://eudravigilance.ema.europa.eu/human/textforIME.asp)
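
A subgroup analysis of the kind described in the table above can be sketched as follows: the disproportionality statistic is computed separately within each stratum, and the drug-event combination is flagged if any stratum exceeds the threshold. In line with two of the recommendations, this sketch keeps an explicit `unknown` stratum for missing values and applies the minimum report count to the whole drug-event combination rather than to each stratum. The function names, the thresholds and the counts are hypothetical.

```python
import math

def prr_lower_bound(a, b, c, d, z=1.96):
    """Lower 95% confidence bound of the PRR for one 2x2 table."""
    if a == 0 or c == 0:
        # PRR or its standard error is undefined; this sketch treats it as no signal.
        return 0.0
    prr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return math.exp(math.log(prr) - z * se)

def subgroup_sdr(strata, min_reports=3, threshold=1.0):
    """Flag a drug-event combination if ANY stratum is disproportionate.

    strata maps a stratum label to its (a, b, c, d) counts. The minimum
    report count is checked on the total across strata (an assumption
    mirroring the recommendation on overall thresholds).
    """
    total_reports = sum(counts[0] for counts in strata.values())
    if total_reports < min_reports:
        return False
    return any(prr_lower_bound(*counts) > threshold for counts in strata.values())

# Made-up counts, subgrouped by gender with a stratum for missing values.
by_gender = {
    "female": (8, 92, 50, 4950),
    "male": (1, 99, 50, 4850),
    "unknown": (1, 49, 10, 990),
}
flagged = subgroup_sdr(by_gender)
```
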
| Recommendation | Rationale |
|---|---|
| Quantification of the masking effect of drugs on adverse reactions or adverse reactions on drugs could be used as a diagnostic tool of the extent of masking at two levels: | Results indicate that many drugs and adverse reactions are not affected by masking. Avoid complicating the analysis of data by adding an unmasking procedure when masking is not an issue. Formulas for assessing the effect of masking can be found in papers by Maignen et al. [ |
| If the masking effect of drugs on adverse reactions or adverse reactions on drugs is substantial, applying an unmasking algorithm should be considered | Reducing the effect of masking can increase the sensitivity of quantitative signal detection and, in principle, result in earlier identification of new drug-event associations [ |
| If false negatives are a major concern, unmasking of drugs and/or adverse reactions can be used in parallel with standard disproportionality analysis to improve sensitivity and timeliness but this benefit must be balanced against the cost in increased evaluation of false positives | If unmasking and standard disproportionality analyses are used in parallel, sensitivity will be equal to or higher than that of standard disproportionality analysis alone, but parallel analyses of the data also increase the false-positive rate, from spurious associations [ |
| Future research should explore the effectiveness of unmasking in terms of true/false positives revealed by an algorithm | In the absence of public health evidence from prospective studies on the benefits of removing the masking (or situations in which unmasking could be beneficial), the use of a particular algorithm should be directed by the rate of true signals/false positives revealed by the removal of the masking effect [ |
| Future research should compare disproportionality-based approaches for unmasking to other statistical approaches (e.g. logistic regression models) that could also be used to account for masking effects | This was outside of the scope of the PROTECT studies and there appears to be no published research on this topic |
| The use of simple unmasking algorithms as a means of reducing computation complexity and improving transparency should be explored in a future study | Results indicate that the performance of the simplified methods is comparable to that of more complex methods while the computational complexity is reduced and transparency improved, but further research is needed to fully explore this on datasets with different properties [ |
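
To illustrate what masking means in practice, the sketch below recomputes a PRR after removing the reports of a suspected masking drug from the background. This is only one simple way to show the idea and is not the method or the formulas from the cited papers; the function names and all counts are hypothetical.

```python
def prr(a, b, c, d):
    """PRR from a 2x2 table: a, b for the drug of interest (event, other events);
    c, d for all other drugs (event, other events)."""
    return (a / (a + b)) / (c / (c + d))

def unmasked_prr(a, b, c, d, masker_event, masker_other):
    """PRR after removing a suspected masking drug's reports from the background.

    masker_event: background reports of the event attributed to the masking drug
    masker_other: background reports of other events attributed to the masking drug
    """
    return prr(a, b, c - masker_event, d - masker_other)

# Made-up example: one drug contributes 900 of the 1000 background reports of the event.
crude = prr(5, 95, 1000, 9000)
unmasked = unmasked_prr(5, 95, 1000, 9000, masker_event=900, masker_other=100)
masking_ratio = unmasked / crude
```

A ratio close to 1 would suggest that masking is not an issue for this drug-event pair, in which case the extra unmasking step adds complexity without benefit.
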
| Recommendation | Rationale |
|---|---|
| Statistical interaction measures with additive baseline models should be preferred over those with multiplicative baseline models for detecting signals of DDIs in spontaneous report databases | Statistical interaction measures with additive baseline models provided better sensitivity and equal or better specificity for both established and emerging DDIs [ |
| Future research should explore how statistical interaction measures with additive baseline models can best be incorporated in broader predictive models of adverse drug interactions, and in routine signal detection | This was out of scope for the PROTECT study, but recent research has found that predictive models accounting for multiple aspects of strength of evidence perform better than statistical measures of interaction alone [ |
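
The difference between additive and multiplicative baseline models can be made concrete with a small sketch. Here `f00`, `f10`, `f01` and `f11` are assumed to be the relative reporting frequencies of the event among reports listing neither drug, only the first drug, only the second drug, and both drugs. The names and numbers are illustrative, and the measures are generic textbook forms rather than the specific statistics evaluated in PROTECT.

```python
def interaction_measures(f00, f10, f01, f11):
    """Compare the observed frequency with both drugs against two baselines.

    Additive baseline: the excess frequencies of the two drugs add up,
        expected = f10 + f01 - f00
    Multiplicative baseline: the frequency ratios of the two drugs multiply,
        expected = f10 * f01 / f00
    """
    expected_additive = f10 + f01 - f00
    expected_multiplicative = f10 * f01 / f00
    return {
        # > 0 suggests more reporting than an additive model predicts
        "additive_excess": f11 - expected_additive,
        # > 1 suggests more reporting than a multiplicative model predicts
        "multiplicative_ratio": f11 / expected_multiplicative,
    }

# Made-up frequencies: each drug alone raises the event frequency from 1% to 5%,
# and the combination shows 15%.
result = interaction_measures(f00=0.01, f10=0.05, f01=0.05, f11=0.15)
```

With these made-up numbers the additive baseline expects 9% and flags an excess, whereas the multiplicative baseline expects 25% and does not, which shows how the choice of baseline can change whether a potential interaction is highlighted.
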
| Recommendation | Rationale |
|---|---|
| Probabilistic record matching should be considered as an alternative to rule-based methods for duplicate detection in pharmacovigilance | Probabilistic record matching demonstrated a high predictive value above that of rule-based methods in our study, and is expected to improve efficiency and accuracy of duplicate management [ |
| Care should be taken to avoid case duplication during system changes/upgrades, considering both internal aspects and case transmission to external organisations | Our study showed that such changes on occasion resulted in very large numbers of duplicates [ |
| Rapid electronic re-transmission of spontaneous adverse drug reaction reports between databases can increase the number of duplicates to the extent that disproportionality statistics are significantly affected, emphasising the need for swift and robust duplicate detection and management processes in databases that employ electronic data exchange | There are a large number of duplicates in spontaneous reporting databases, which are shown to affect quantitative signal detection scores. Rapid transmission of cases by electronic systems exacerbates this issue, meaning that accurate (and ideally, non-burdensome) duplicate detection processes are required to mitigate this unwanted impact on disproportionality statistics [ |
| Further work should be undertaken to explore lowering the threshold for the tested probabilistic record matching method and methods in general to evaluate the balance of false positives and negatives | Our study showed very few false positives, so it should be possible to increase sensitivity while ensuring false positive rates are kept at a reasonable level [ |
| Further evaluation should be done to understand the impact of automatic exclusion of potential duplicates from quantitative signal detection algorithms | This was beyond the scope of the PROTECT study. If this approach proved successful manual duplicate detection activities could be eliminated resulting in time/resource savings [ |
| Approaches to de-identifying data (for example, scrambling dates and patient initials) in a way that permits duplicate detection should be pursued to allow for effective duplicate detection in databases that pool reports from different sources | This will reduce the negative impact of data privacy laws on duplicate detection in international databases [ |
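
The contrast between rule-based and probabilistic duplicate detection can be illustrated with a minimal scoring sketch in the style of classical probabilistic record linkage. Each field contributes a log-likelihood-ratio weight depending on whether two reports agree on it, and pairs scoring above a threshold are treated as suspected duplicates. The field names, the `m` and `u` probabilities and the threshold are all hypothetical, and this is not the specific method tested in PROTECT.

```python
import math

# Hypothetical per-field probabilities:
#   m = P(field agrees | the two reports are duplicates)
#   u = P(field agrees | the two reports are unrelated)
FIELD_PROBS = {
    "drug": (0.95, 0.10),
    "event": (0.90, 0.05),
    "age": (0.85, 0.02),
    "sex": (0.98, 0.50),
    "onset_date": (0.80, 0.01),
}

def match_score(report_1, report_2):
    """Sum of log2 likelihood ratios over fields present in both reports."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        value_1, value_2 = report_1.get(field), report_2.get(field)
        if value_1 is None or value_2 is None:
            continue  # missing information contributes no evidence either way
        if value_1 == value_2:
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score

def is_suspected_duplicate(report_1, report_2, threshold=10.0):
    """Hypothetical decision rule; lowering the threshold trades precision for sensitivity."""
    return match_score(report_1, report_2) >= threshold

r1 = {"drug": "drug A", "event": "rash", "age": 54, "sex": "F", "onset_date": "2012-03-01"}
r2 = dict(r1)
r3 = {"drug": "drug B", "event": "nausea", "age": 31, "sex": "M", "onset_date": "2011-07-15"}
```

The `threshold` parameter corresponds to the balance of false positives and false negatives that the recommendations suggest exploring further.
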
| Recommendation | Rationale |
|---|---|
| It may be possible to use the PRR at the early phase of the analysis of a new safety signal as an indicator of the likely strength of the association, should the signal be confirmed | The PRR observed before general awareness of an ADR shows a good correlation with the strength of the association in terms of relative risk or odds ratios later established by controlled studies. However, the PRR is not a direct estimator of the risk ratio and should be considered only in the absence of any more reliable evidence. The caveat ‘should the signal turn out to be confirmed’ must be observed. The study analysis does not compare the distribution of PRR values for ‘true’ and ‘false’ signals of disproportionate reporting and no inference can be made about whether the initial magnitude of PRR gives information about the nature of the association (causal or otherwise) [ |
| Following the initial detection of a signal of a specific drug-event association, PRR values based on clinical definitions of the adverse event may serve to provide an estimate of the likely size of clinical effect and be included among the criteria for initial prioritisation of its assessment | This study shows that, at least in this selected set of study cases, the underlying relative risk seemed to influence both the direction and magnitude of the PRR calculated with a similar case definition of the adverse event. Because the study sample comprises drug-event associations confirmed following assessments of diverse data sources and signal detection systems, the results may be applicable to PRRs calculated following both quantitative and traditional signal detection approaches [ |
| Consideration should be given to repeating these analyses in other ADR datasets to see whether they can be replicated and, if they can be, to establish the relevant scale factor | PRR values generated in different ADR datasets are unlikely to be the same. Other IMI PROTECT research has focussed on the performance of disproportionality statistics and of different signalling algorithms in different ADR datasets. However, to date no attention has been paid to describing and explaining differences in the calculated PRR values in different datasets |
| Consideration should be given to further exploring whether PRRs adjusted by subgroup variables improve the correlation with measures of association from studies | The findings from an IMI PROTECT study on sub-grouping and stratification [ |
| Consideration should be given to exploring whether PRRs calculated for single MedDRA® PTs as is in EudraVigilance monitoring behave in the same way as the clinically defined case definitions in terms of correlation with measures of association from studies | The medical concepts used in the studies to derive the estimates of relative risk often described broader medical concepts than the MedDRA® PT level used in EudraVigilance for the PRR screening analysis (see also Recommendations in relation to Timeliness of Quantitative Signal Detection using MedDRA® Terms and Groupings) |
| Recommendation | Rationale |
|---|---|
| Longitudinal observational data should be further explored as a complement to signal detection using individual case reports but cannot currently replace individual case reports for this purpose | Individual case reports of suspected harm from medicines have a proven value for safety signal detection. However, they are not optimal for detecting increased rates of multifactorial adverse drug reactions with high background incidence. Longitudinal observational data provide the basis for epidemiological evaluation of such associations and should in principle enable their initial identification. However, we lack evidence to suggest that signal detection in longitudinal observational data can match the performance of signal detection in individual case reports for all drugs and medical events. In our evaluation of historical safety signals from the EMA, none of the positive controls could be detected in the THIN database at an early stage, whereas this was possible in VigiBase for some of the signals, even when we considered only the subset of the UK individual case reports [ |
| Safety signal detection in longitudinal observational data should include clinical, pharmacological and epidemiological review of identified temporal associations | Clinical review of statistical signals is fundamental in evaluating signals arising from spontaneous report databases. In our study of structured assessment for prospective identification of safety signals in electronic health records, three out of four temporal associations identified in the initial screen could be dismissed from further evaluation after initial review. Without review, the majority of the highlighted associations would have been false positives [ |
| To the extent possible, temporal associations detected in longitudinal observational data should be further explored with statistical graphical methods | In our prospective identification study, in-depth review of the chronograph temporal patterns proved a valuable component of the expert review. Univariate measures of temporal association may over-simplify or obscure the underlying patterns in such rich, complex and often long records [ |
| Safety signal detection in longitudinal observational data should account for limitations of the underlying data and take measures to ensure appropriate interpretation. In selecting the data set for analysis, one should account for both its size and scope (which drugs and diagnoses it captures) and for the fact that effective review of identified temporal associations requires expert knowledge of the underlying data, which is particularly relevant for large heterogeneous data sets | Our retrospective evaluation against historical safety signals for European centrally authorised products showed that none of them could be detected in THIN with the method we used, prior to the initial signal at the EMA. In many cases (to be further specified once we have the data), this was because of the drug not being available on the UK market at the time, or the drug or medical event not being reliably captured in primary care |
| Future research should explore the relative merits of performing safety signal detection in longitudinal observational data for groups of medicinal products and medical events, instead of or in parallel with that of individual products and events | In our comparison to published epidemiological studies, a common discrepancy was that they performed analysis for all drugs in a class together and/or for a number of related medical events together, which improves power. However, our detailed review often found substantial and important differences among different drugs in the same class or among different medical events in the same category. |
| Recommendation | Rationale |
|---|---|
| If prior knowledge suggests data from a particular organ system should be monitored, consider extreme value modelling on data arising from each trial for the compound of interest. For example, if preclinical data suggested a potential liver issue, prepare to model ALT; if another compound in the class showed kidney signals, prepare to model creatinine | Extreme-value modelling has been demonstrated, in various examples, to provide useful predictions of drug toxicities from early-phase data. If it is possible to pre-specify the modelling and prediction exercise, the results have greater credibility than if they are data driven, and resources can be allocated up front to ensure the work is done to appropriate deadlines [ |
| Some analyses will be data driven, suggested by observed extremes in the data. These could also be subjected to extreme value modelling and the statistical evidence thus acquired interpreted in context | Not all potential safety issues are known in advance, so some analyses are necessarily data driven. It is inappropriate to consider such analysis illegitimate or to yield unreliable results provided they are interpreted in context. Statistical inference is only one part of the larger process of scientific inference [ |
| Extreme value modelling can commence as early as phase I; however, in most cases, phase II data need to be available for reliable inferences to be made | Experience suggests that phase I data may be sufficient for extreme value modelling to identify toxicity, but that sometimes the sample sizes are too small. Modelling and prediction have the most value to add when the volume of available data is low, so such exercises should be commenced as soon as possible [ |
| Properly trained, Independent Data Monitoring Committees or Safety Review Boards are likely to benefit from extreme value modelling of unblinded data | When an IDMC exists, there are sometimes reasons for additional monitoring. It follows that applying proven methodology to emerging data will provide the best chance of identifying and characterising the safety issue as soon as possible [ |
| When extreme value modelling does not find evidence of a safety signal in studies of short duration, extrapolation beyond observed durations of exposure is discouraged | It is reasonable to expect that some toxic effects of drugs will not manifest themselves until several weeks or months of exposure have occurred. If extreme values are not observed at relevant doses in short trials, proceed with caution, acknowledging that they could occur after longer durations of exposure |
| Multiplicity adjustment provides a useful tool to improve the positive predictive value in signal detection in clinical trial data. The use of multiplicity adjustment needs to be evaluated against the size of the available clinical trial database | The ability for ADR detection is highly influenced by ADR frequency in the source dataset. Thus, database size and event reporting frequency must be taken into consideration when the use of multiplicity adjustments for ADR candidate selection is considered [ |
| The use of Bayesian Hierarchical Models can improve the efficiency of signal detection through borrowing of strength from other relevant events in the clinical trial dataset. This must be weighed against the more complex computational requirements of Bayesian modelling | Bayesian Hierarchical Models provided the best performance with regard to positive predictive value, specificity, sensitivity and negative predictive value, mainly owing to their ability to “borrow strength” across similar terms [ |
| The use of more specific MedDRA® groupings can further improve signal detection in clinical trial data | The use of narrow-term groupings for analysis provided slightly better results for signal detection compared with the analysis based on MedDRA® PTs alone [ |
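
As an illustration of multiplicity adjustment when many adverse event terms are tested at once in clinical trial data, the sketch below applies the Benjamini-Hochberg procedure, which controls the false discovery rate. The table above does not state which adjustment was evaluated, so the choice of procedure here is an assumption, and the p-values are made up.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Sort the p-values, find the largest rank k with p_(k) <= (k / m) * alpha,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            largest_rank = rank
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[index] = True
    return rejected

# Made-up p-values for five adverse event terms compared between treatment arms.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
unadjusted = [p <= 0.05 for p in p_values]
adjusted = benjamini_hochberg(p_values)
```

In this made-up example four terms pass an unadjusted 0.05 cut-off but only two survive adjustment. That reduction in candidate signals is how adjustment can raise the positive predictive value, at a possible cost in sensitivity when the trial database is small.
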