Eva-Lisa Meldau, Shachi Bista, Emma Rofors, Lucie M Gattepaille.
Abstract
INTRODUCTION: Coding medicinal products described in adverse event (AE) reports to specific entries in standardised drug dictionaries, such as WHODrug Global, is a time-consuming step in case processing, despite its potential for automation. Many organisations already partially automate drug coding using text-processing methods and synonym lists; however, addressing challenges such as misspellings, abbreviations or ambiguous trade names requires more advanced methods. WHODrug Koda is a drug coding engine that uses text-processing algorithms, built-in coding rules and machine learning to code drug verbatims to WHODrug Global.
Year: 2022 PMID: 35579817 PMCID: PMC9114093 DOI: 10.1007/s40264-022-01162-7
Source DB: PubMed Journal: Drug Saf ISSN: 0114-5916 Impact factor: 5.228
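The abstract contrasts synonym-list matching with more advanced methods. As a minimal sketch of the former (not Koda's algorithm, and not necessarily how the paper's direct-match baseline is built), the Python below normalises a verbatim and looks it up in a synonym list. The `SYNONYMS` entries and the function names are hypothetical illustrations, not WHODrug Global records. Case and punctuation differences are absorbed, but a misspelling falls straight through.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical synonym list mapping normalised verbatims to dictionary entries.
# Illustrative only: these are not actual WHODrug Global records.
SYNONYMS = {
    "paracetamol": "Paracetamol",
    "acetaminophen": "Paracetamol",
    "humira": "Humira",
    "adalimumab": "Adalimumab",
}


def normalise(verbatim: str) -> str:
    """Strip accents, lower-case, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", verbatim)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def direct_match(verbatim: str) -> Optional[str]:
    """Return the entry for an exact match after normalisation, else None."""
    return SYNONYMS.get(normalise(verbatim))


print(direct_match("HUMIRA "))         # Humira
print(direct_match("Acetaminophen."))  # Paracetamol
print(direct_match("humria"))          # None: a misspelling is not caught
```

Exact lookup is fast and transparent, which is why it works as a baseline, but anything not already listed (typos, abbreviations, strengths appended to the name) stays uncoded.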
Fig. 1 Steps for the extraction of the evaluation dataset and the general data statistics after each step
Description of Koda input fields
| Field name | Field type | Possibly missing | Koda input requirement |
|---|---|---|---|
| Verbatim | Free text | No | Mandatory |
| Route of administration | Structured (70 values possible) | Yes | Optional |
| Indication | Free text | Yes | Optional |
| Country | Structured (251 valuesᵃ) | No | Optional |
ᵃ ISO 3166-1 alpha-3 country code or unknown
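The input fields above map naturally onto a small record type. The sketch below is a hypothetical Python representation, assuming these field names and validation rules; it is not Koda's actual interface. Only `verbatim` is required, and a missing country is represented as `None` to stand in for the "unknown" value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodingInput:
    """One drug to be coded; field names are hypothetical, only `verbatim` is mandatory."""

    verbatim: str                      # free text, never missing
    route: Optional[str] = None        # structured value, optional
    indication: Optional[str] = None   # free text, optional
    country: Optional[str] = None      # ISO 3166-1 alpha-3 code; None means unknown

    def __post_init__(self) -> None:
        # Reject empty verbatims, since the verbatim is the only mandatory input.
        if not self.verbatim.strip():
            raise ValueError("verbatim is mandatory and must be non-empty")
        # Accept only three upper-case letters as a country code.
        if self.country is not None and not (
            len(self.country) == 3 and self.country.isalpha() and self.country.isupper()
        ):
            raise ValueError(f"expected an ISO 3166-1 alpha-3 code, got {self.country!r}")


example = CodingInput("Zantac", route="Oral", country="USA")
print(example)
```

Making the record frozen keeps each input immutable once validated, which suits a pipeline that passes the same input through several coding steps.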
Top-5 reporting countries in the evaluation dataset
| Country | % |
|---|---|
| USA | 43 |
| Korea | 12 |
| Germany | 6 |
| UK | 6 |
| France | 4 |
| Rest of the world | 29 |
Top-5 reporter qualifications in the evaluation dataset
| Reporter qualification | % |
|---|---|
| Consumer | 39 |
| Physician | 24 |
| Other health professionals | 19 |
| Pharmacist | 13 |
| Lawyer | 3 |
| Unknown | 2 |
Top-5 reported drugs in the evaluation dataset
| Drug | % |
|---|---|
| Other anti-acne preparations for topical use [umbrella term] | 2.0 |
| Adapalene | 1.2 |
| Revlimid | 1.0 |
| Humira | 1.0 |
| Zantac | 0.9 |
Top-5 reported ATC codes in the evaluation dataset
| ATC level 2 | % |
|---|---|
| V91 Homeopathic preparation | 1.2 |
| V90 Unspecified herbal and traditional medicine | 0.5 |
| R01 Nasal preparations | 0.5 |
| D03 Preparations for treatment of wounds and ulcers | 0.2 |
| N05 Psycholeptics | 0.1 |
ATC Anatomical Therapeutic Chemical
Descriptive statistics of WHODrug records in the evaluation dataset
| Statistic | Number of unique Koda inputs per WHODrug record | Number of unique verbatims reported per WHODrug record | Number of unique routes reported per WHODrug record (total = 67) | Number of unique indications reported per WHODrug record (total = 26,379) |
|---|---|---|---|---|
| Minimum | 1 | 1 | 0 | 0 |
| 25th percentile | 1 | 1 | 1 | 1 |
| Median | 3 | 1 | 1 | 1 |
| 75th percentile | 11 | 2 | 2 | 4 |
| Maximum | 98,033 | 281 | 34 | 1487 |
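Statistics like those in the table can be derived by grouping the coding inputs by WHODrug record and counting distinct values. A minimal sketch, assuming the inputs are available as hypothetical `(record_id, verbatim)` pairs and that "unique" means distinct strings; the paper does not state its percentile method, so the inclusive linear interpolation of Python's `statistics.quantiles` is used here as an assumption.

```python
from collections import defaultdict
from statistics import quantiles


def unique_verbatims_per_record(rows):
    """rows: iterable of (record_id, verbatim); returns the count of distinct verbatims per record."""
    seen = defaultdict(set)
    for record_id, verbatim in rows:
        seen[record_id].add(verbatim)
    return [len(v) for v in seen.values()]


def summarise(counts):
    """Minimum, quartiles (inclusive interpolation) and maximum of the per-record counts."""
    q1, q2, q3 = quantiles(counts, n=4, method="inclusive")
    return {"min": min(counts), "p25": q1, "median": q2, "p75": q3, "max": max(counts)}


# Hypothetical toy data: record identifiers and verbatims are placeholders.
rows = [
    ("R1", "Humira"), ("R1", "humira pen"), ("R1", "Humira"),
    ("R2", "Zantac"),
    ("R3", "Revlimid"), ("R3", "lenalidomide"), ("R3", "Revlimid 10mg"),
    ("R4", "Adapalene"),
]
print(summarise(unique_verbatims_per_record(rows)))
# {'min': 1, 'p25': 1.0, 'median': 1.5, 'p75': 2.25, 'max': 3}
```

The same grouping works for routes or indications by swapping the second element of each pair; a heavily skewed distribution, as in the table, shows up as a maximum far above the 75th percentile.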
Fig. 2 a Koda’s automation level, showing the percentages of high-certainty encodings, suggested encodings and uncoded Koda inputs compared with the direct-match baseline. b Agreement between high-certainty Koda encodings and the gold standard. c Agreement between Koda suggestions and the gold standard
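One way to compute the quantities shown in Fig. 2 from per-input results is sketched below. It assumes each result is a hypothetical `(level, options, gold)` triple and, for suggestions, counts agreement when the gold-standard code appears among the offered options; the paper's exact definition of agreement for suggestions may differ.

```python
from collections import Counter

LEVELS = ("high", "suggested", "uncoded")


def automation_summary(results):
    """Share of inputs per certainty level, plus agreement with the gold standard.

    results: list of (level, options, gold) triples, where `options` holds the
    code(s) offered: one for "high", several for "suggested", none for "uncoded".
    Agreement counts a hit when the gold code is among the options (assumption).
    """
    total = len(results)
    counts = Counter(level for level, _, _ in results)
    shares = {lvl: counts[lvl] / total for lvl in LEVELS}
    agreement = {}
    for lvl in ("high", "suggested"):
        subset = [(opts, gold) for level, opts, gold in results if level == lvl]
        hits = sum(gold in opts for opts, gold in subset)
        agreement[lvl] = hits / len(subset) if subset else float("nan")
    return shares, agreement


# Hypothetical toy results; codes are placeholders, not WHODrug identifiers.
example = [
    ("high", ["A"], "A"),
    ("high", ["B"], "C"),
    ("suggested", ["D", "E"], "E"),
    ("uncoded", [], "F"),
]
shares, agreement = automation_summary(example)
print(shares)     # {'high': 0.5, 'suggested': 0.25, 'uncoded': 0.25}
print(agreement)  # {'high': 0.5, 'suggested': 1.0}
```

Agreement is computed separately per level, so a high share of uncoded inputs does not lower the agreement figures; the two panels answer different questions.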
Fig. 3 Number of options provided by Koda when coding as suggested
Centre and dispersion of the number of options for suggested encodings
| Min | Median | 75th pctl | 95th pctl | 99th pctl | Max |
|---|---|---|---|---|---|
| 1 | 1 | 2 | 4 | 6 | 42 |
Min minimum; max maximum; pctl percentile
Accuracy, and macro-average precision, recall and F1 score across all WHODrug records in the gold standard, together with the averages of these metrics per WHODrug record weighted by the classes’ prevalence in the gold standard, for WHODrug Koda and our direct-match baseline
| | Accuracy | Macro precision | Macro recall | Macro F1 | Weighted precision | Weighted recall | Weighted F1 |
|---|---|---|---|---|---|---|---|
| Koda | 86.0% | 91.0% | 86.7% | 87.6% | 94.9% | 86.0% | 88.2% |
| Direct-match baseline | 60.4% | 71.9% | 62.1% | 64.4% | 88.6% | 60.4% | 66.1% |
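The table reports accuracy alongside macro and prevalence-weighted averages. Weighted recall equals accuracy by construction: each class's recall TP_c / n_c is weighted by n_c / N, so the sum reduces to ΣTP_c / N, which is why the weighted-average recall matches the accuracy for both Koda (86.0%) and the baseline (60.4%). A plain-Python sketch of these metrics follows, assuming the classes are the WHODrug records present in the gold standard and that uncoded inputs are passed as `None` (a hypothetical convention) so they count as misses.

```python
from collections import Counter


def classification_metrics(gold, pred):
    """Accuracy plus macro- and prevalence-weighted precision, recall and F1."""
    labels = sorted(set(gold))  # classes = records present in the gold standard
    tp = Counter(g for g, p in zip(gold, pred) if g == p)
    pred_count = Counter(pred)
    support = Counter(gold)
    per_class = {}
    for c in labels:
        prec = tp[c] / pred_count[c] if pred_count[c] else 0.0  # 0 if never predicted
        rec = tp[c] / support[c]
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class[c] = (prec, rec, f1)
    n = len(gold)
    # Macro: unweighted mean over classes; weighted: mean weighted by class support.
    macro = tuple(sum(m[i] for m in per_class.values()) / len(labels) for i in range(3))
    weighted = tuple(sum(per_class[c][i] * support[c] for c in labels) / n for i in range(3))
    accuracy = sum(tp.values()) / n
    return accuracy, macro, weighted


# Hypothetical toy example; None marks an uncoded input.
gold = ["A", "A", "B", "C"]
pred = ["A", "B", "B", None]
acc, macro, weighted = classification_metrics(gold, pred)
print(round(acc, 3), [round(x, 3) for x in macro], [round(x, 3) for x in weighted])
# 0.5 [0.5, 0.5, 0.444] [0.625, 0.5, 0.5]
```

Macro averages treat rare and frequent records equally, whereas weighted averages are dominated by frequently reported drugs; reporting both shows whether performance holds up on the long tail.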
Fig. 4 Manual assessment results for Koda’s three confidence levels
Results of the masking experiment, masking various input fields
- WHODrug Koda is one of the first drug coding engines using artificial intelligence.
- Originally developed for use in clinical trials, Koda performs equally well on adverse event reports.
- Koda can automatically code a large proportion of drugs, including ambiguous drug names, using its internal coding rules and additional information about the drug, such as route, indication and country.
- Designed to code only when confident, Koda can identify challenging cases and leave these for manual coding while making helpful suggestions for a large proportion of inputs.
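These points describe using contextual fields to resolve ambiguous names and coding only when confident. Koda's internal rules and machine-learning model are not described here, so the following is only a toy illustration of that behaviour under stated assumptions: hypothetical candidate entries carry the countries and routes they apply to, optional context narrows the list, and the input is auto-coded only when exactly one candidate remains; otherwise the remaining candidates are returned as suggestions.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    """Hypothetical dictionary candidate for an ambiguous verbatim."""
    code: str
    countries: frozenset  # countries where this product applies (assumed field)
    routes: frozenset     # routes of administration (assumed field)


def code_or_suggest(candidates, country: Optional[str] = None, route: Optional[str] = None):
    """Toy rule: narrow candidates with optional context, auto-code only if one remains."""
    remaining = list(candidates)
    if country is not None:
        narrowed = [c for c in remaining if country in c.countries]
        remaining = narrowed or remaining  # ignore a filter that rules out everything
    if route is not None:
        narrowed = [c for c in remaining if route in c.routes]
        remaining = narrowed or remaining
    if not remaining:
        return ("uncoded", [])
    if len(remaining) == 1:
        return ("high", [remaining[0].code])
    return ("suggested", [c.code for c in remaining])


# Hypothetical candidates for one ambiguous trade name; codes are placeholders.
candidates = [
    Candidate("PROD-1", frozenset({"USA"}), frozenset({"Oral"})),
    Candidate("PROD-2", frozenset({"DEU"}), frozenset({"Oral", "Topical"})),
]
print(code_or_suggest(candidates, country="USA"))   # ('high', ['PROD-1'])
print(code_or_suggest(candidates))                  # ('suggested', ['PROD-1', 'PROD-2'])
print(code_or_suggest(candidates, route="Topical")) # ('high', ['PROD-2'])
```

Falling back to the unfiltered list when a field matches nothing is one design choice among several; a stricter engine might instead leave such inputs uncoded for manual review.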