INTRODUCTION: Duplicate case reports in spontaneous adverse event reporting systems make it harder for medical reviewers to perform individual and aggregate safety analyses efficiently. Duplicate cases can also bias data mining by generating spurious signals of disproportional reporting of product-adverse event pairs.
OBJECTIVE: We developed a probabilistic record linkage algorithm for identifying duplicate cases in the US Vaccine Adverse Event Reporting System (VAERS) and the US Food and Drug Administration Adverse Event Reporting System (FAERS).
METHODS: In addition to structured field data, the algorithm uses the unstructured narrative text of adverse event reports, examining clinical and temporal information extracted by the Event-based Text-mining of Health Electronic Records system, a natural language processing tool. The final component of the algorithm is a novel duplicate confidence value, calculated by a rule-based empirical approach that looks for similarities between two case reports across a number of criteria.
RESULTS: For VAERS, the algorithm identified 77% of known duplicate pairs with a precision (or positive predictive value) of 95%. For FAERS, it identified 13% of known duplicate pairs with a precision of 100%. The textual information did not improve the algorithm's automated classification for VAERS or FAERS. The empirical duplicate confidence value increased performance on both VAERS and FAERS, mainly by reducing the number of false positives.
CONCLUSIONS: The algorithm was effective at identifying pre-linked duplicate VAERS reports. The narrative text was not shown to be a key component in the automated detection evaluation; however, it is essential for supporting the semi-automated approach that is likely to be deployed at the Food and Drug Administration, where medical reviewers will manually review the most highly ranked reports identified by the algorithm.
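To make the idea of probabilistic record linkage more concrete, the sketch below scores a pair of reports with the classic Fellegi-Sunter log-likelihood ratio and computes precision and recall against a set of known duplicate pairs. It is an illustration only, not the authors' implementation: the abstract does not list the fields, weights, or thresholds used, so the field names, the `m`/`u` probabilities, and the helper names `match_weight` and `precision_recall` are all hypothetical.

```python
import math

# Hypothetical per-field probabilities (assumed for illustration, not from the paper):
#   m = P(field agrees | the two reports are duplicates)
#   u = P(field agrees | the two reports are not duplicates)
FIELD_PARAMS = {
    "age": (0.90, 0.05),
    "sex": (0.95, 0.50),
    "onset_date": (0.85, 0.01),
    "product": (0.95, 0.10),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum Fellegi-Sunter agreement/disagreement weights over shared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def precision_recall(predicted_pairs, known_pairs):
    """Precision and recall of predicted duplicate pairs against known pairs."""
    predicted, known = set(predicted_pairs), set(known_pairs)
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


if __name__ == "__main__":
    report_1 = {"age": 34, "sex": "F", "onset_date": "2015-03-02", "product": "MMR"}
    report_2 = {"age": 34, "sex": "F", "onset_date": "2015-03-02", "product": "MMR"}
    report_3 = {"age": 61, "sex": "M", "onset_date": "2014-11-20", "product": "HPV"}
    print("same-looking pair:", round(match_weight(report_1, report_2), 2))
    print("different pair:   ", round(match_weight(report_1, report_3), 2))
```

Pairs whose total weight exceeds a chosen threshold would be flagged as candidate duplicates. In the terms used in the abstract, the "percentage of known duplicate pairs identified" corresponds to `recall` here, and precision is the share of flagged pairs that are true duplicates.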