Benjamin Skov Kaas-Hansen, Davide Placido, Cristina Leal Rodríguez, Hans-Christian Thorsen-Meyer, Simona Gentile, Anna Pors Nielsen, Søren Brunak, Gesche Jürgens, Stig Ejdrup Andersen.
Abstract
We sought to craft a drug safety signalling pipeline that associates latent information in clinical free text with exposures to single drugs and drug pairs. Data arose from 12 secondary and tertiary public hospitals in two Danish regions, comprising approximately half the Danish population. Notes were operationalised with a fastText embedding, on the basis of which we trained 10 270 neural-network models (one for each distinct single-drug or drug-pair exposure) predicting the risk of exposure given an embedding vector. We included 2 905 251 admissions between May 2008 and June 2016, with 13 740 564 distinct drug prescriptions; the median number of prescriptions per admission was 5 (IQR: 3-9), and in 1 184 340 (41%) admissions patients used ≥5 drugs concomitantly. A total of 10 788 259 clinical notes were included, with 179 441 739 tokens retained after pruning. Of 345 single-drug signals reviewed, 28 (8.1%) represented possibly undescribed relationships, and 186 (54%) were clinically meaningful. Of the 115 drug-pair signals, 16 (14%) were possible interactions and two (1.7%) were known interactions. In conclusion, we built a language-agnostic pipeline for mining associations between free-text information and medication exposure without manual curation, predicting not only the likely outcomes of a range of exposures but also the likely exposures for outcomes of interest. Our approach may help overcome the limitations of text-mining methods that rely on curated data in English and can help leverage non-English free text for pharmacovigilance.
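The abstract describes the core idea only in outline: embed each note, then fit one classifier per single-drug or drug-pair exposure that predicts exposure from the embedding vector. The Python sketch below illustrates that idea under loudly stated assumptions and is not the authors' implementation. The word vectors are random stand-ins for a trained fastText model, notes are embedded by averaging their word vectors, each neural network is reduced to logistic regression (a single-layer network) fitted by gradient descent, and the corpus and drug names are hypothetical. Querying every model with the vector of a single term, to rank likely exposures for an outcome of interest, is likewise an assumed reading of the reverse use the abstract mentions.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # hypothetical embedding dimension

# Hypothetical stand-in for a trained fastText model: one random vector per token.
VOCAB = ["nausea", "rash", "bleeding", "headache", "fever", "cough"]
word_vec = {w: rng.normal(size=DIM) for w in VOCAB}

# Hypothetical toy corpus: (tokens in a note, drugs the patient was exposed to).
notes = [
    (["bleeding", "headache"], {"warfarin"}),
    (["bleeding", "fever"], {"warfarin"}),
    (["rash", "cough"], {"amoxicillin"}),
    (["rash", "fever"], {"amoxicillin"}),
    (["headache", "cough"], set()),
    (["bleeding", "rash"], {"warfarin", "amoxicillin"}),
]


def embed_note(tokens):
    """Average the vectors of known tokens; notes with no known tokens map to zeros."""
    vecs = [word_vec[t] for t in tokens if t in word_vec]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)


def exposures_of(drugs):
    """Single-drug exposures plus every drug pair, e.g. {'a', 'b', 'a+b'}."""
    pairs = {"+".join(p) for p in combinations(sorted(drugs), 2)}
    return set(drugs) | pairs


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_exposure_model(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y  # gradient of the log-loss w.r.t. the logits
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


X = np.stack([embed_note(tokens) for tokens, _ in notes])
labels = [exposures_of(drugs) for _, drugs in notes]
all_exposures = sorted(set().union(*labels))

# One independent model per distinct exposure, each estimating P(exposure | embedding).
models = {
    e: train_exposure_model(X, np.array([e in lab for lab in labels], dtype=float))
    for e in all_exposures
}


def rank_exposures(term):
    """Score every exposure model on a term's vector, highest probability first."""
    x = word_vec[term]
    scores = {e: float(sigmoid(x @ w + b)) for e, (w, b) in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for exposure, p in rank_exposures("bleeding"):
    print(f"{exposure:22s} {p:.3f}")
```

Training one independent binary model per exposure treats single drugs and drug pairs uniformly, at the cost of the number of models growing with the number of distinct exposures (10 270 in the study).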
Keywords: data mining; machine learning; pharmacovigilance; safety signal detection; safety signal refinement
Year: 2022 PMID: 35834334 PMCID: PMC9541191 DOI: 10.1111/bcpt.13773
Source DB: PubMed Journal: Basic Clin Pharmacol Toxicol ISSN: 1742-7835 Impact factor: 3.688
FIGURE 1. Schematic illustration of the end-to-end pipeline; see the sections with corresponding headings in the main text for details. The blue areas correspond to the operationalisation of clinical notes, the green to training the signal detection component and the purple to evaluating the safety signals. The red and blue areas illustrate data capture from a single patient.
FIGURE 4. Main UKU terms by domain. (A) Number of terms used in the congruence analysis (total = 116). (B) All 345 single-drug assessments (23 terms × 5 single-drug signals = 115; 23 terms × 5 drug-pair signals × 2 drugs per pair = 230). Light green: undocumented reaction possibly caused by single-drug (B) or drug-pair (D) exposure. Dark green: known reaction (B + D) or interaction (C). Dark grey: protopathic or indication bias. Light grey: spurious signal. Horizontal scales in panels B–D are counts.
FIGURE 2. Fingerprint plots of the 23 main UKU terms and their 571 single-drug signals. Inner circles: each wedge represents one drug, and its transparency encodes the signal score. Outer circles: colours represent anatomical drug classes (ATC level 1); see legend. A, alimentary tract and metabolism; B, blood and blood forming organs; C, cardiovascular system; D, dermatologicals; G, genito-urinary system and sex hormones; H, systemic hormonal preparations, excluding sex hormones and insulins; J, antiinfectives for systemic use; L, antineoplastic and immunomodulating agents; M, musculo-skeletal system; N, nervous system; P, antiparasitic products, insecticides and repellents; R, respiratory system; S, sensory organs; V, various. [Correction added on 8 August 2022, after first online publication: Figure 2 has been corrected.]
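In the outer circles, colour encodes ATC level 1, the anatomical main group given by the first character of an ATC code. The minimal Python sketch below shows how signals could be grouped by that class before plotting; the helper names and the example ATC codes are illustrative assumptions and are not taken from the study's code or data.

```python
from collections import Counter

# ATC level 1 (anatomical main groups), as listed in the Figure 2 legend.
ATC_LEVEL1 = {
    "A": "Alimentary tract and metabolism",
    "B": "Blood and blood forming organs",
    "C": "Cardiovascular system",
    "D": "Dermatologicals",
    "G": "Genito-urinary system and sex hormones",
    "H": "Systemic hormonal preparations, excluding sex hormones and insulins",
    "J": "Antiinfectives for systemic use",
    "L": "Antineoplastic and immunomodulating agents",
    "M": "Musculo-skeletal system",
    "N": "Nervous system",
    "P": "Antiparasitic products, insecticides and repellents",
    "R": "Respiratory system",
    "S": "Sensory organs",
    "V": "Various",
}


def atc_level1(code: str) -> str:
    """Map a full ATC code (e.g. 'B01AA03') to its anatomical main group."""
    group = code.strip().upper()[:1]
    if group not in ATC_LEVEL1:
        raise ValueError(f"not a recognised ATC code: {code!r}")
    return ATC_LEVEL1[group]


def signals_per_class(signal_codes):
    """Count signals per ATC level 1 class, e.g. to size or colour outer-ring segments."""
    return Counter(atc_level1(c) for c in signal_codes)


# Hypothetical ATC codes of drugs that raised signals for one UKU term.
print(signals_per_class(["B01AA03", "B01AC06", "J01CA04", "N02BE01"]))
```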
FIGURE 3. Mean-adjusted cosine similarities between signal pairs. Rows and columns show pairwise similarities between the signal profiles of specific terms. Dark blue squares signify agreement between blocks of terms, and red squares signify disagreement. Black and white margin bars represent UKU side-effect terms; columns/rows within the span of one bar are synonyms. The cosine similarity of two identical signals equals 1 (as on the diagonal). See the supporting information for a more detailed explanation.
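The caption leaves the exact definition of "mean-adjusted" to the supporting information. One common reading, assumed here and possibly different from the authors' definition, is to subtract each signal profile's own mean before taking the cosine, which makes the measure equivalent to the Pearson correlation between profiles. In this Python sketch each row of the hypothetical `profiles` array is one term's vector of signal scores across the same set of drugs.

```python
import numpy as np


def mean_adjusted_cosine(profiles: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows after centring each row on its own mean.

    profiles: array of shape (n_terms, n_drugs); row i is the signal profile of term i.
    Returns a symmetric (n_terms, n_terms) matrix with values in [-1, 1]
    (up to floating-point rounding).
    """
    profiles = np.asarray(profiles, dtype=float)
    centred = profiles - profiles.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centred, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # constant profiles become zero vectors instead of NaN
    unit = centred / norms
    return unit @ unit.T


# Hypothetical signal scores of three terms across four drugs.
profiles = np.array([
    [0.9, 0.1, 0.8, 0.2],  # e.g. "nausea"
    [0.8, 0.2, 0.9, 0.1],  # e.g. "vomiting", a near-synonym
    [0.1, 0.9, 0.2, 0.8],  # a term with the opposite profile
])
S = mean_adjusted_cosine(profiles)
print(np.round(S, 2))
```

Positive values (dark blue in the figure) indicate terms whose signal profiles rise and fall together, as expected for synonyms; negative values (red) indicate opposing profiles. A constant profile has zero norm after centring and gets similarity 0 here, even with itself; that choice only avoids division by zero.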