
Language-agnostic pharmacovigilant text mining to elicit side effects from clinical notes and hospital medication records.

Benjamin Skov Kaas-Hansen¹,²,³, Davide Placido², Cristina Leal Rodríguez², Hans-Christian Thorsen-Meyer⁴, Simona Gentile⁵, Anna Pors Nielsen², Søren Brunak², Gesche Jürgens¹, Stig Ejdrup Andersen¹.

Abstract

We sought to craft a drug safety signalling pipeline associating latent information in clinical free text with exposures to single drugs and drug pairs. Data arose from 12 secondary and tertiary public hospitals in two Danish regions, comprising approximately half the Danish population. Notes were operationalised with a fastText embedding, based on which we trained 10 270 neural-network models (one for each distinct single-drug/drug-pair exposure) predicting the risk of exposure given an embedding vector. We included 2 905 251 admissions between May 2008 and June 2016, with 13 740 564 distinct drug prescriptions; the median number of prescriptions was 5 (IQR: 3-9) and in 1 184 340 (41%) admissions patients used ≥5 drugs concomitantly. A total of 10 788 259 clinical notes were included, with 179 441 739 tokens retained after pruning. Of 345 single-drug signals reviewed, 28 (8.1%) represented possibly undescribed relationships; 186 (54%) signals were clinically meaningful. Sixteen (14%) of the 115 drug-pair signals were possible interactions, and two (1.7%) were known. In conclusion, we built a language-agnostic pipeline for mining associations between free-text information and medication exposure without manual curation, predicting not the likely outcome of a range of exposures but the likely exposures for outcomes of interest. Our approach may help overcome limitations of text mining methods relying on curated data in English and can help leverage non-English free text for pharmacovigilance.

Entities: Chemical

Keywords: data mining; machine learning; pharmacovigilance; safety signal detection; safety signal refinement


Year: 2022 PMID: 35834334 PMCID: PMC9541191 DOI: 10.1111/bcpt.13773

Source DB: PubMed Journal: Basic Clin Pharmacol Toxicol ISSN: 1742-7835 Impact factor: 3.688

INTRODUCTION

Pharmacovigilance usually operates with two qualifications of the common term side effect: adverse drug events (ADEs) and adverse drug reactions (ADRs). ADEs are (noxious) medical events occurring while using medicines without assuming causal relationships. ADRs are subsumed by ADEs and constitute outcomes believed or known to be caused by exposure to a given medicinal product. ADRs are usually classified in six groups, including dose‐related and not dose‐related. The latter are more unpredictable than the former and tend to be unrelated to the pharmacological effect, making them interesting from a safety signal detection perspective. ADR signal detection usually revolves around spontaneous case reports, collated nationally (e.g., Danish Medicines Agency) and internationally (e.g., EudraVigilance of the European Medicines Agency and VigiBase of the Uppsala Monitoring Centre). This system suffers from several shortcomings, including the inherent filtering of reports making it into central databases, causing, i.a., under‐reporting that may even be biassed or otherwise influenced by, for example, media hype or legislation. These weaknesses, and the ever‐expanding digitisation of patient data, have sparked much interest in leveraging complementary data sources and technologies for pharmacovigilance, including longitudinal clinical data and natural language processing (NLP), the branch of machine learning for making textual data compatible with statistical modelling. Text mining uses NLP methodologies to extract structured information from inherently unstructured textual data. Its applications in pharmacovigilance often hinge on hand‐curated reference sets for named‐entity recognition or entity extraction; for example, previous work brought about a Danish dictionary of side effects.
These tasks focus on assigning labels to free‐text terms so they can be codified and used as structured data akin to diagnostic codes recorded in national registers or adverse‐event databases. Creation and maintenance of such gold standards are costly and tedious, which likely explains the limited availability of tools and resources (including corpora) for non‐English textual data. For example, the official ADR vocabulary of the Danish Medicines Agency is MedDRA (Medical Dictionary for Regulatory Activities, in English), and submitters of case reports are encouraged to pick from English terms when submitting case reports. When non‐standard side effects are entered, these are manually mapped to the English MedDRA afterwards. Thus, it is near‐impossible to extract information across languages which would be useful for pharmacovigilant purposes. We posit that, to leverage clinical free text, complementing existing vocabulary‐based approaches to pharmacovigilant NLP with (semi‐)automatic information extraction deserves exploration and could facilitate vast screening of clinical free text. To this end, we report on the creation of one such complementary system: an end‐to‐end machine learning pipeline associating latent information in clinical free text with medication profiles to highlight potential adverse drug reactions to single drugs and drug pairs. We envision a system that accepts one of several free‐text side‐effect terms from the user and returns likely prominent exposures to undergo assessments akin to the evaluation of signals in spontaneous case reports.

METHODS AND MATERIALS

Data were obtained from electronic patient record (EPR) systems of 12 secondary and tertiary public hospitals in two Danish regions (Capital Region and Region Zealand), comprising approximately 2.6 million persons (about half the Danish population). We used data from a random sample of 500 000 adult (age ≥18 years) patients admitted between 1 January 2006 and 30 June 2016. The full analytic workflow is depicted schematically in Figure 1 and has five main components (detailed below): deriving doorstep medication profiles (red), training the embedding model (brown), operationalisation of clinical notes (blue), training the signal detection component (green) and evaluating the safety signals (purple).

FIGURE 1

Schematic illustration of the end‐to‐end pipeline; see sections with corresponding headings in main text for details: The blue areas correspond to operationalisation of clinical notes, the green to training the signal detection component and the purple to evaluating the safety signals. The red and blue areas illustrate data capture from a single patient.

Doorstep medication profiles

We considered only pre‐existing medication at start of admission and created one medication vector with one element per distinct single drug and drug pair in the full data set, using their respective anatomical therapeutic chemical (ATC) codes. Medication data were extracted from the electronic patient files and, so, reflect what was registered by a physician at time of admission. Elements corresponding to single drugs and drug pairs used by a given patient at doorstep were set to 1, the rest to 0. We only considered single drugs and drug pairs used in at least 1000 admissions.
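The construction of such a medication vector can be sketched as follows. This is a minimal illustration and not the authors' analytic code: the function names, the toy ATC codes and the lowered admission threshold are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def exposures(atc_codes):
    """Return single-drug and unordered drug-pair exposures for one admission."""
    drugs = sorted(set(atc_codes))
    singles = [(d,) for d in drugs]
    pairs = list(combinations(drugs, 2))
    return singles + pairs


def build_profiles(admissions, min_admissions=1000):
    """Map each admission to a 0/1 vector over sufficiently frequent exposures."""
    # Count in how many admissions each single drug or drug pair occurs.
    counts = Counter(e for codes in admissions for e in exposures(codes))
    vocabulary = sorted(e for e, n in counts.items() if n >= min_admissions)
    index = {e: i for i, e in enumerate(vocabulary)}
    vectors = []
    for codes in admissions:
        vec = [0] * len(vocabulary)
        for e in exposures(codes):
            if e in index:
                vec[index[e]] = 1
        vectors.append(vec)
    return vocabulary, vectors


# Toy data: three admissions with hypothetical doorstep ATC codes.
toy = [["A10BA02", "N05AH03"], ["A10BA02"], ["A10BA02", "N05AH03", "C07AB02"]]
vocab, vecs = build_profiles(toy, min_admissions=2)
```

With the toy threshold of two admissions, only the two frequent single drugs and their pair survive; the study itself used a threshold of 1000 admissions.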

Embedding model

An embedding packs high‐dimensional data into far fewer dimensions. Imagine, for example, one‐hot‐encoding words in a corpus of clinical notes that collectively contain 345 671 unique words: The presence of a word in a given note could be represented by a (very sparse) vector with 345 670 zeros and a single 1. Learning a 100‐dimensional embedding of the words, in contrast, enables us to represent each word by a 100‐element vector that also captures latent information in unstructured text. This vector will not be sparse (computationally convenient) and vectors of words with similar meanings will be similar even when lexicographically different (e.g., headache, sore head and neuralgia). Our embedding used tokens (one or several words together that collectively make up a term such as the three terms in italic in the previous sentence) and not only single words. See the supporting information for more detailed explanations. We used fastText to train the embedding model on the full corpus after slight pruning: Characters other than letters and numbers were removed, as were multiple white spaces. This yielded one white‐space separated string of words from each note. Hyperparameters were arbitrary but appropriate for the task at hand; for example, we used a 256‐dimensional embedding, sub‐word components were allowed to be between three and six characters long (minn and maxn fastText settings; ‘dys’ and ‘tonia’ are two sub‐word component examples of ‘dystonia’) and tokens were allowed to span up to three words to capture multi‐word signals (such as chest pain or sore head; wordNgrams fastText setting; N‐grams are tokens that consist of N words, where N is usually one, two and/or three: ‘tremor’ is a unigram, ‘idiopathic tremor’ a bigram, and ‘intermittent dystonic tremor’ a trigram). All settings can be found in the analytic code; see below.
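To make the sub‐word and N‐gram settings concrete, the sketch below enumerates the units they refer to. It is only an illustration in plain Python: fastText itself hashes these units internally, and the helper names `char_ngrams` and `word_ngrams` are hypothetical, not part of the fastText API.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of a word, with '<' and '>' marking word boundaries."""
    marked = f"<{word}>"
    grams = set()
    for n in range(minn, maxn + 1):
        for i in range(len(marked) - n + 1):
            grams.add(marked[i:i + n])
    return grams


def word_ngrams(words, max_n=3):
    """All contiguous word n-grams of up to max_n words, shortest first."""
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]


subwords = char_ngrams("dystonia")          # contains 'dys' and 'tonia'
tokens = word_ngrams(["intermittent", "dystonic", "tremor"])
```

Because a misspelt or unseen word still shares most of its character n‐grams with the correct form, its vector can be composed from sub‐word vectors that the model has already learnt.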

Operationalisation of clinical notes

The corpus comprised notes recorded within the first 48 h of admission; each note underwent five processing steps. First, the note was split into sentences. Second, within each sentence, we identified negations and for each of these excluded the subsequent five words or until end‐of‐sentence (heuristic based on Thomas et al.). Third, we removed special characters from these non‐negated words. Fourth, we retained the pruned words that were neither Danish stop words (using nltk.corpus) nor present in an in‐house list of almost 430 000 names used in Denmark. We forewent stemming and lemmatisation to let the model learn from natural words, to facilitate its downstream use (stemming and lemmatisation harmonise the corpus by transforming the words therein to [usually] shorter versions, i.e., their stems and lemmas). Finally, these retained tokens were concatenated by admission, essentially considering each admission one document (an oft‐used term in text‐mining and information retrieval literature). We computed the term‐frequency/inverse‐document‐frequency (TF‐IDF) as tf × log(N/[1 + df]) for retained tokens with 10 ≤ df ≤ 50 000 to omit tokens so common or rare that they were unlikely to contain information of interest. The final TF‐IDF values were not used to discard tokens at this step; that happened during training; see below. The final step of this component was converting tokens to their corresponding embedding vectors using the fastText model. This happened during training to avoid unnecessarily storing vectors for tokens, many of which were never used due to under‐sampling; see below.
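The negation heuristic and the TF‐IDF formula above can be sketched in a few lines of Python. Several details are assumptions made for illustration only: the negation cue list is hypothetical (and English rather than Danish), the cue itself is dropped along with the following words, and tf is taken as the raw token count within an admission.

```python
import math
from collections import Counter

NEGATIONS = {"no", "not"}  # hypothetical cue list, for illustration only


def drop_negated(sentence_words, window=5):
    """Drop each negation cue and up to `window` words after it.

    Operating on one sentence at a time makes the end of the sentence
    the natural upper limit of the negation scope.
    """
    kept, skip = [], 0
    for word in sentence_words:
        if word.lower() in NEGATIONS:
            skip = window
            continue
        if skip > 0:
            skip -= 1
            continue
        kept.append(word)
    return kept


def tf_idf(documents, min_df=10, max_df=50_000):
    """Per-document tf * log(N / (1 + df)) for tokens with min_df <= df <= max_df."""
    n_docs = len(documents)
    df = Counter(tok for doc in documents for tok in set(doc))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        scores.append({tok: tf[tok] * math.log(n_docs / (1 + df[tok]))
                       for tok in tf if min_df <= df[tok] <= max_df})
    return scores


kept = drop_negated("patient has no fever or nausea today but reports headache".split())
docs = [["tremor", "tremor", "nausea"], ["nausea"], ["nausea", "rash"], ["rash"]]
weights = tf_idf(docs, min_df=1, max_df=3)
```

Note that with the 1 + df denominator a token present in all but one document receives a weight of exactly zero, which is one more way very common tokens lose influence.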

Training the signal detection component

We constructed one multilayer perceptron (MLP, also called feed‐forward neural network) model with two hidden layers of 256 nodes for each of the 10 270 unique drugs and drug pairs in the medication profiles, setting the binary outcome to 1 if that drug (pair) was in the doorstep medication profile and 0 otherwise. Because of the imbalanced nature of the prediction task (Figure S1) and to obtain tolerable runtime, we used random 1:2 under‐sampling of the majority class to help the model focus on pertinent signals. We used all tokens for cases and the top 50 tokens based on TF‐IDF for controls. Then, the embedding vector for each token and its outcome became one observation for training the MLP model. We used sigmoid activation functions, the Adam optimiser and regularisation only in the form of early stopping based on area under the receiver operating characteristic curve (AUROC) in the internal validation set. The validation set came about by 80/20 random split‐sampling, deemed appropriate as this served solely for regularisation and not validation per se. Pertinence was operationalised as signals from well‐performing models with respect to discrimination and calibration‐in‐the‐small using the internal validation set. Discrimination was gauged by AUROCs, calibration‐in‐the‐small by the intercepts and slopes of linear regressions to the calibration curves of decile‐binned predicted probabilities and corresponding bin‐wise observed outcome proportions. Only models with intercepts in [−0.05, 0.05], slopes in [0.95, 1.05] and AUROCs ≥ 0.7 in the validation sets were considered to yield pertinent signals.
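The pertinence criteria can be illustrated with a small, dependency‐free Python sketch. It is one plausible reading of the description above rather than the authors' implementation: the function names are hypothetical, deciles are formed by sorting predictions into ten equally sized bins, and the line is fitted by ordinary least squares.

```python
def auroc(y_true, y_prob):
    """AUROC as the chance that a random case outranks a random control (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def calibration_line(y_true, y_prob, bins=10):
    """Intercept and slope of a least-squares line through the bin-wise
    (mean predicted probability, observed outcome proportion) points."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    size = len(order) / bins
    xs, ys = [], []
    for b in range(bins):
        idx = order[round(b * size):round((b + 1) * size)]
        if not idx:
            continue
        xs.append(sum(y_prob[i] for i in idx) / len(idx))
        ys.append(sum(y_true[i] for i in idx) / len(idx))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def is_pertinent(y_true, y_prob):
    """Apply the three thresholds stated in the text to one model's validation predictions."""
    intercept, slope = calibration_line(y_true, y_prob)
    return (abs(intercept) <= 0.05
            and 0.95 <= slope <= 1.05
            and auroc(y_true, y_prob) >= 0.7)
```

A model whose predicted probabilities are systematically half the observed proportions would discriminate just as well but yield a slope near 2, and so would be excluded by the calibration criterion alone.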

Evaluating safety signals

Congruence

To quantify the relevance of the signals, we compared the predicted odds with the odds in the background population and used these odds ratios as the signal scores. The congruence analysis served to qualitatively assess whether tokens with near‐identical or very similar clinical meanings (‘clinical cousins’) were assigned the same medication profiles regardless of lexicographical (dis)similarity. To this end, we used the terms in Figure 4 (their origin is explained below) and a list of clinical cousins for a total of 116 terms. Congruence was, then, assessed visually by plotting pairwise adjusted cosine distances between the signal profiles of all 116 terms, constructed as the union of all exposures in the top 50 of any of the terms.
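As a minimal sketch of the two quantities involved, the Python below computes a signal score as an odds ratio and the similarity between two signal profiles. The function names are hypothetical, and the similarity shown is the plain cosine similarity; the adjustment the authors applied is described in their supporting information and is not reproduced here.

```python
import math


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def signal_score(predicted_prob, background_prevalence):
    """Odds ratio of the predicted exposure versus the background population."""
    return odds(predicted_prob) / odds(background_prevalence)


def cosine_similarity(u, v):
    """Plain cosine similarity between two equally long signal profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# A term predicting 20% exposure for a drug used in 10% of admissions.
score = signal_score(0.2, 0.1)
```

In this toy case the odds are 0.25 against a background of about 0.11, so the drug is over‐represented for the term by a factor of 2.25.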

FIGURE 4

Main UKU terms by domain. (A) The number of terms used in congruence analysis (total = 116). (B) All 345 single‐drug assessments (23 terms × 5 single‐drug signals = 115; 23 terms × 5 drug‐pair signals × 2 drugs per pair = 230). Light green: undocumented reaction possibly caused by single‐drug (B) or drug‐pair (D) exposure. Dark green: known reaction (B + D) or interaction (C). Dark grey: protopathic or indication bias. Light grey: spurious signal. Horizontal scales in panels B–D are counts.

Relevance

We used a reference set to gauge the signals' relevance, that is, to what extent signals are meaningful from a clinical and pharmacovigilance point of view. From the several potential reference sets that exist, we chose the items in the UKU (Udvalg for Kliniske Undersøgelser, English: Committee for Clinical Investigations) side effect rating scale. We manually reviewed the top 5 single‐drug and top 5 drug‐pair signals for each reference‐set term, consulting three standard sources in clinical pharmacology in Denmark: www.pro.medicin.dk (side effects; identical side‐effect information as the official Danish summaries of product characteristics [SPCs, available at www.produktresume.dk] with few exceptions), DrugBank (drug‐drug interactions; publicly available information; www.drugbank.ca) and the Danish Interaction Database (drug‐drug interactions). We crafted a helper R package (promedreadr, doi: 10.5281/zenodo.5529817) to do the heavy lifting when collecting side‐effect information from www.pro.medicin.dk. DrugBank kindly made their data (v5.1.8) available to the first author for the purpose of this study. Each single‐drug signal was labelled, in this order, as (a) example of protopathic bias or bias‐by‐indication, (b) known side effect if reported for at least one product with that ATC code, (c) possible side effect (i.e., biologically plausible) or (d) spurious signal. For drug‐pair signals, we labelled each drug according to the single‐drug classification and further evaluated the signal from a drug‐drug interaction point of view on two axes: whether the two drugs are known to interact (is any interaction described in the Danish Interaction Database and/or DrugBank?) and relevance of signal (three options: known result of interaction, possible result of interaction or not caused by interaction).
BSKH, GJ and SEA undertook signal assessment: Each signal was evaluated independently by two assessors, and disagreement (quantified by Cohen's kappa) was resolved by consensus.
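For readers unfamiliar with Cohen's kappa, the sketch below computes it for two assessors' labels on the same signals. The function name and the toy labels are illustrative assumptions; the statistic itself is the standard one, i.e., observed agreement corrected for the agreement expected by chance.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Expected agreement if both raters labelled independently
    # according to their own marginal label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Toy example: two assessors label four signals.
assessor_1 = ["known", "known", "possible", "spurious"]
assessor_2 = ["known", "possible", "possible", "spurious"]
kappa = cohens_kappa(assessor_1, assessor_2)
```

Here the assessors agree on three of four signals (75%), but because about 31% agreement would be expected by chance, kappa is lower than the raw agreement.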

Ethics

This study is part of the BigTempHealth research programme for which approval was granted by the Danish Patient Safety Authority (3‐3013‐1723, then competent authority for ethical approval), the Danish Data Protection Agency (DT SUND 2016‐48, 2016‐50, 2017‐57) and the Danish Health Data Authority (FSEID 00003724). This report honours relevant items of the RECORD statement.

RESULTS

The final data set covered the period from 18 May 2008 through 30 June 2016 and comprised 2 905 251 inpatient visits (admissions) of which 1 559 685 (54%) were of women. The median age was 58 years (inter‐quartile range, IQR: 33–73) and stable throughout the study period. These admissions comprised 10 788 259 clinical notes (18% of these patients' 60 960 247 notes) recorded within 48 h of admission and 13 740 564 doorstep drug prescriptions; the median number of doorstep‐profile prescriptions was 5 (IQR: 3–9) and in 1 184 340 (41%) admissions patients used ≥5 drugs concomitantly, a common polypharmacy threshold. Pruning and filtering left 179 441 739 tokens (per‐admission median: 51 [IQR: 29–80]) for training the 10 270 neural‐network models of which 3945 (38%) yielded pertinent signals (see Figure S2). Figure S1 shows the relative frequency of all 571 single‐drug exposures and (correspondingly) the top 571 drug‐pair exposures. The dominant drug classes were those affecting the nervous system (N, including psychiatric drugs), the alimentary tract and metabolism (A) and the cardiovascular system (C). The same picture emerged from the drug‐pair exposures: The most prevalent drug pairs involved these same three drug classes (e.g., AA, AC and AN). We devised so‐called fingerprints for each main UKU term visualising single‐drug exposures (Figure 2). These fingerprint plots illustrate that general or vague terms (e.g., depression, nausea and weight gain) are relatively strongly associated with many drug exposures (many wedges in the inner circle are dark) and that fewer drugs, of appropriate drug classes, light up for more specific terms (e.g., amenorrhoea/galactorrhoea and tremor/dystonia/parkinsonism). Also, fingerprints of clinically related terms (e.g., tremor, parkinsonism and dystonia) are similar but clearly distinct from those of other terms.

FIGURE 2

Fingerprint plots of the 23 main UKU terms and their 571 single‐drug signals. Inner circles: Each wedge represents one drug and transparency the signal score. Outer circles: Colours represent anatomical drug classes (ATC level 1); see legend. A, alimentary tract and metabolism; B, blood and blood forming organs; C, cardiovascular system; D, dermatologicals; G, genito‐urinary system and sex hormones; H, systemic hormonal preparations, excluding sex hormones and insulins; J, antiinfectives for systemic use; L, antineoplastic and immunomodulating agents; M, musculo‐skeletal system; N, nervous system; P, antiparasitic products, insecticides and repellents; R, respiratory system; S, sensory organs; V, various. [Correction added on 8 August 2022, after first online publication: Figure 2 has been corrected.]

Congruence

We hypothesised that signal profiles of clinical cousins would be similar regardless of lexicographical (dis)similarity. Indeed, as Figure 3 illustrates, signal profiles agreed within UKU terms, within UKU domains and within the mental‐neurological spectrum. As expected, the terms in the Other domain did not agree well, likely because this domain comprises very different side effects not fitting in elsewhere. Agreement was imperfect, which can be seen from, e.g., the light stripes representing terms with signal profiles distinct from all other terms. Several UKU terms have synonyms identical to those of other UKU terms so these will of course show perfect congruence, even if across UKU domains.

FIGURE 3

Mean‐adjusted cosine similarities between signal pairs. Rows and columns show pairwise similarities between signal profiles for specific terms. Dark blue squares signify agreement between blocks of terms (red represents disagreement). Black and white margin bars represent UKU side‐effect terms, and columns/rows within the span of one bar are synonyms. The cosine similarity of two identical signals equals 1 (e.g., the diagonal). See the supporting information for more detailed explanation.

Relevance

Agreement between the three assessors (BSKH, GJ and SEA) was moderate, with four values of Cohen's kappa (κ): relevance of drug 1 (κ = 0.49), relevance of drug 2 (κ = 0.72), whether the two drugs were known to interact in any way (κ = 1.0) and relevance of interaction (κ = 0.73); see pairwise κ values in Figure S3. The consensus assessments in Figure 4 show that the method picked up pertinent information. There were 345 single‐drug/potential‐reaction pairs (Figure 4, caption). Of these, 28 (8.1%) represented possible relationships between drug exposure and the reaction in question (Figure 4B, light green). For 186 (54%) signals, the reactions were either possible, known or due to protopathic or indication bias, all clinically meaningful relationships (Figure 4B, green and dark grey). Sixteen (14%) of the 115 drug‐pair signals were possible interactions; two (1.7%) were known and the rest not attributable to the drugs interacting (Figure 4C). Table S1 contains a selection of clinically interesting signals of possibly undocumented relationships between exposures and reactions.

DISCUSSION

With a novel, language‐agnostic approach using word embeddings, we successfully built an end‐to‐end machine learning pipeline to elicit potential side effects of out‐of‐hospital drug exposure; the method may well complement existing safety signal detection and refinement. Using side effects from the psychiatric domain with (somewhat) well‐defined pharmacological properties, we illustrated that this method may offer genuine utility: manual review of signals for clinically relevant side effects illustrated the ability of the pipeline to highlight pertinent signals, with the ‘hit rate’ in the same order of magnitude as that of signal detection in spontaneous case reports. The novelty of our approach hinders direct comparisons with the published literature. Indeed, we try to fill a gap in the three‐axis categorisation of pharmacovigilance NLP: using non‐English text, overcoming the reliance on annotated data and leveraging EHR data. The number of published NLP applications in pharmacovigilance is growing: a review from 2012 included but seven studies, most of which used either simplistic keyword searches or more elaborate NLP methodologies (MediClass and MedLEE), predominantly in discharge summaries with relatively old data (1995 through 2008). More recently, a review from 2017 included 48 studies and emphasised the need for side‐effect detection methods to also handle polypharmacy‐related side effects, an issue intimately related to drug‐drug interactions. Side‐effect signal detection generally occurs in three types of data (spontaneous case reports, online forums including social media and longitudinal patient data) with the analytical approaches somewhere along two axes (modelling complexity and structuredness of the data). The long‐standing signal detection in spontaneous case reports rests on several large databases (e.g., FAERS, EudraVigilance and VigiBase) collecting reports from healthcare staff, patients and pharmaceutical companies across the globe.
The mainstay of this system has been disproportionality analysis with attempts at assessing drug‐drug interactions (DDIs), although NLP applications exist. Several attempts at leveraging online content for pharmacovigilance have come about, especially using Twitter posts, with examples of trying to disentangle temporality of exposure‐event pairs. Although pharmacovigilant text mining in non‐English corpora is not the norm, examples do exist. A Danish dictionary of side effects was created and used for mining psychiatric patient files, relying on ontologies against which terms found in the clinical text were compared and, thus, different in scope from ours. Oronoz et al. sought to create a gold standard from EMR notes in Spanish that had been annotated by pharmacologists and pharmacists, with particular focus on medicines and diagnoses, while Segura‐Bedmar and Martinez sought to extract drug effects, both beneficial and noxious, from a Spanish online health forum. Another study used Japanese online platforms to evaluate basic characteristics of medicine users, and Ujiie et al. used Japanese medical articles, published for post‐marketing surveillance and manually annotated by a medical engineer. Usui et al. devised a system to automatically assign ICD‐10 codes to Japanese free‐text patient complaints recorded by pharmacists when dispensing prescription medicines. These examples all share the foundational characteristic that they rely on curated ontologies for annotating their corpora. This eases evaluation as the curation process establishes a ground truth against which to compare the algorithm's output. Nevertheless, real‐life clinical corpora are moving targets, and the constant expansion and morphing of ontologies require continual and costly updating of annotation rules.
Our approach stands in contrast to this: It is an end‐to‐end pipeline that requires no annotation of specific documents but acts as a simple signal detection engine whose signals should then undergo expert review and can underpin evaluation of signals from other systems. With text embeddings at its core, the method allows for data augmentation without hand‐tuning; we did not, however, venture down this path. Data mining models generally carry no causal meaning, and an oft‐raised issue of NLP is the need for (often large) annotated corpora which requires much work and continuous updating to remain relevant, the very thing we attempted to circumvent by reversing the prediction direction. Others have used word embeddings to operationalise free text in a non‐annotated manner. For example, Workman et al. showed that word embeddings can help overcome the problems of misspelling in a pharmacovigilance application; the RedMed model was trained on Reddit posts to extract health entities therein and performed reasonably well in such consumer‐generated content; and combining pre‐trained word embeddings and conditional random fields could have flagged potential cutaneous adverse reactions to two chemotherapy classes in internet content before they were reported in the scientific literature. We trained one model per drug exposure for a total of 10 270 individual models. Although multi‐label architectures sometimes aid learning, we found this to drown pertinent signals in models with thousands of output nodes in a single network. This probably happens because the model can only optimise a single loss value, and we found no good way to automatically up‐ or down‐weight contributions from different outputs.
Further, in a multi‐label feed‐forward architecture, all weights are shared except those between the last hidden layer and the outputs, and there seems to be no good reason that predicting the risk of, say, exposure to metformin should be so intimately linked to that of olanzapine. One potentially viable alternative might have been a factorial‐like design in which each model had four mutually exclusive outcome nodes: exposure to none of the drugs, drug 1 only, drug 2 only and both drugs. As mentioned, several options exist for the reference sets in the relevance evaluation. Among these, we chose the UKU side effect rating scale for three principal reasons. First, the UKU items were originally developed in a Nordic setting, so English‐Danish translations are readily available. Second, the UKU items were developed to gauge the side‐effect load of psychotropics, and so their (somewhat) well‐defined pharmacological mechanisms aid the assessment of biological plausibility of signals. Third, our results are readily put in a scientific context because the UKU scale has been used for several years and in different contexts, ensuring transparency with respect to and confidence in the translations for readers unfamiliar with the Danish language. When designing our approach, we had institutional/regulatory pharmacovigilance in mind, but alternative use cases exist, such as patient‐level decision‐making support and drug repurposing research. Including patient characteristics (e.g., age, sex and comorbidities) would enable clinical staff to query the method for single drugs or drug combinations potentially explaining the symptoms of their patients. Instead of looking at drugs given disproportionately often for a given term, we could focus on those given more rarely (so with an odds ratio of <1), potentially eliciting interesting novel target conditions for existing treatments, similar in spirit to, e.g., Kessing et al.
Combinatorial explosion is a well‐known challenge for the study of DDIs: A person using seven different medicines is exposed to 21 two‐way drug combinations. This challenge is only exacerbated if higher order combinations are considered. So, instead of modelling this explicitly, one could consider higher order interactions (e.g., three‐ or four‐way) by piecing together two‐way combinations that yield predicted probabilities above a certain threshold when multiplied, i.e., using a simplistic approximation to the predicted joint probability. An alternative approach, and indeed research question, would have been to compare new in‐hospital exposures with terms in subsequent days for immediate side effects. To be feasible, this would likely require a much larger data set to have sufficient exposure‐outcome pairs. It might, however, be less unwieldy as such an approach could focus on new(er) drugs drastically reducing the number of labels (and, thus, models to be trained).
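The arithmetic behind this combinatorial explosion, and one possible reading of the product‐based approximation, can be sketched as follows. The function names and the toy probabilities are assumptions for illustration; in particular, multiplying the predicted probabilities of all constituent drug pairs is only one way to interpret the simplistic approximation suggested above.

```python
from itertools import combinations
from math import comb


def n_combinations(n_drugs, order=2):
    """Number of distinct drug combinations of a given order."""
    return comb(n_drugs, order)


def approx_joint(pair_probs, drugs):
    """Crude higher-order score: the product of the predicted probabilities
    of every two-way combination among `drugs`."""
    score = 1.0
    for pair in combinations(sorted(drugs), 2):
        score *= pair_probs[pair]
    return score


two_way = n_combinations(7, 2)    # pairs among seven medicines
three_way = n_combinations(7, 3)  # triples among seven medicines

# Hypothetical pairwise predicted probabilities for drugs A, B and C.
pairs = {("A", "B"): 0.5, ("A", "C"): 0.4, ("B", "C"): 0.5}
triple_score = approx_joint(pairs, ["C", "A", "B"])
```

Seven medicines already give 21 pairs and 35 triples, which is why reusing the pairwise models for higher‐order combinations is attractive compared with training a separate model for each.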

Strengths and limitations

Our approach has six principal strengths. First, its unsupervised nature drastically reduces the need for manual work. This sets it apart from most other published studies using NLP in pharmacovigilance that tend to hinge on manual curation. Second, the method is language‐agnostic owing to its unsupervised nature, so that it does not rely on a vocabulary for looking up words. This renders the approach potentially useful for pharmacovigilance also in smaller languages. Third, our corpus is quite large, a natural consequence of its non‐reliance on curated data. Fourth, skipgrams (i.e., using sub‐word information) also enable embedding of, i.a., word bigrams, misspellings and out‐of‐vocabulary words. Fifth, the crude and almost reductionist nature of our approach circumvents many difficulties posed by NLP because we break documents down to basic components and use them without modelling semantics and syntax. Finally, using the UKU side effect rating scale (i.e., a Nordic, translated, pharmacology‐based and widely used tool in Denmark) aids in contextualising the results. Even though the UKU side effect rating scale targets psychotropics, interesting signals emerged also for somatic drugs (Table S1). This study, however, is subject to several limitations. First, the apparently well‐defined temporality obtained using doorstep medication profiles does not necessarily guarantee that what is reported in the text occurred after start of exposure. This potential problem, and source of protopathic bias, is not unique to our approach but rather necessitates cautious interpretation of any signal detection method, in longitudinal and case‐report settings alike. Second, we do not actually have data on prescriptions from the primary sector but rely on the doorstep registration of pre‐existing medication. Physicians are obliged to record these doorstep medication profiles, and we expect them to be generally accurate despite occasional exceptions.
Third, we considered exposure a binary notion and, due to the nature of the data, do not have well‐defined start‐of‐exposure. Doses could be considered, perhaps on an ordinal scale, if the interest revolves around dose‐related ADRs; the lack of well‐defined exposure time could be mitigated if doorstep medication profiles were based on data from the Danish Drug Statistics Register (unavailable to us when conducting this study). Fourth, word embeddings are powerful but not magical: The method clearly links clinical terms with similar meanings (even if lexicographically very different) to similar medications profiles, but the embedding model has difficulties with i.a. rare variations. These yield different embedding vectors resulting in noisy signal profiles fitting poorly with clinical expectations. However, rarity of terms also hampers other kinds of association‐mining or disproportionality‐analytic techniques, and our method might even be less prone because few mentions could suffice to at least hint at relevant clinical cousins. Fifth, even if the doorstep medication profiles are correct, we have no records of exposure to over‐the‐counter and herbal drugs, and we have to assume patients be compliant, just as any study using secondary data. Finally, we only had data on inpatients who were not, generally, admitted due to side effects although this is common. , , , Inpatients are not representative of the general population, and so, with the data at our disposal, the safety signals might be somewhat conditional on frailty although this could be mitigated by focusing on specific sub‐populations (e.g., elderly or oncological patients).

CONCLUSION

Combining various flavours of machine learning and data scientific tools, we have built an end‐to‐end pipeline for mining associations between free‐text information and medication exposure without the need for manual curation. We achieve this by turning things upside down, predicting not the likely outcome of a range of exposures but the likely exposures for one or several outcomes of interest. The congruence analysis suggests that the method picks up pertinent information, even when supplied with synonyms, and with 8% of single‐drug and 14% of drug‐pair signals being possibly undocumented side effects, it provides a hit rate appropriate for its purpose: shortlisting few relevant signals from thousands of noisy ones. These shortlists would then undergo review by pharmacologists, pharmacists or other pharmacovigilance experts to elicit truly unknown side effects (safety signal detection) or to help substantiate or refute suspected side effects emerging from, e.g., spontaneous case reports (safety signal refinement). Our approach is original in the field of side effect detection and, being language‐agnostic, helps overcome many limitations of NLP methods relying on curated data. Crucially, this makes our method appealing in settings that must make sense of non‐English free text for pharmacovigilance while lending itself well to alternative use cases, e.g., patient‐level decision‐making support and drug repurposing.

CONFLICT OF INTEREST

SB reports ownerships in Intomics A/S, Hoba Therapeutics Aps, Novo Nordisk A/S, Lundbeck A/S, and managing board memberships in Proscion A/S and Intomics A/S outside the submitted work. All other authors report no competing interests.

SUPPORTING INFORMATION

Table S1. Examples of clinically interesting possible relationships between single‐drug and drug‐pair exposures and reactions.

Figure S1. Proportions of included visits with all 571 single drugs (top panel) and the top‐571 two‐way drug combinations (lower panel), by anatomical drug classes (ATC level 1). Colours in the lower panel have come about by additive mixing of the drug‐class colours used in the top panel. The vertical scale is pseudo‐log‐transformed (linear between 0% and 1%). A: Alimentary tract and metabolism. B: Blood and blood forming organs. C: Cardiovascular system. D: Dermatologicals. G: Genito‐urinary system and sex hormones. H: Systemic hormonal preparations, excluding sex hormones and insulins. J: Antiinfectives for systemic use. L: Antineoplastic and immunomodulating agents. M: Musculo‐skeletal system. N: Nervous system. P: Antiparasitic products, insecticides and repellents. R: Respiratory system. S: Sensory organs. V: Various.

Figure S2. Intercepts (x axis) and slopes (y axis) of linear regressions of the calibration curves in the internal validation sets. Colour represents AUROC (0.5 corresponds to random guessing, 1.0 to perfect discrimination). Models with intercept > 0 tend to have slopes < 1 and vice versa, as a compensatory mechanism. Models represented by points inside the rectangle yield pertinent signals.

Figure S3. Cohen's kappa for each rater pair (coloured bars) and overall (shaded, wide bar in the background), by item.

1. Language-agnostic pharmacovigilant text mining to elicit side effects from clinical notes and hospital medication records.

Authors: Benjamin Skov Kaas-Hansen; Davide Placido; Cristina Leal Rodríguez; Hans-Christian Thorsen-Meyer; Simona Gentile; Anna Pors Nielsen; Søren Brunak; Gesche Jürgens; Stig Ejdrup Andersen
Journal: Basic Clin Pharmacol Toxicol Date: 2022-07-26 Impact factor: 3.688
