Adam Spiro, Jonatan Fernández García, Chen Yanover.
Abstract
OBJECTIVES: Identifying new relations between medical entities, such as drugs, diseases, and side effects, is typically a resource-intensive task, involving experimentation and clinical trials. The increased availability of related data and curated knowledge enables a computational approach to this task, notably by training models to predict likely relations. Such models rely on meaningful representations of the medical entities being studied. We propose a generic feature vector representation that leverages co-occurrences of medical terms, linked with PubMed citations.
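As a minimal sketch of the co-occurrence idea described above, the Python snippet below counts, for each entity, how many citations it shares with each term in a fixed vocabulary. The function name `cooccurrence_features`, the toy citations, and the use of raw counts are illustrative assumptions; the abstract does not state how the counts are weighted or normalized.

```python
from collections import Counter, defaultdict


def cooccurrence_features(citations, entities, vocabulary):
    """Build one count vector per entity (hypothetical helper).

    citations  -- iterable of sets of terms annotated on one citation
    entities   -- set of terms to build vectors for (e.g., drugs)
    vocabulary -- ordered list of terms that define the vector positions
    """
    counts = defaultdict(Counter)
    for terms in citations:
        term_set = set(terms)
        # Every entity on this citation co-occurs with every other term on it.
        for entity in term_set & entities:
            for term in term_set:
                if term != entity:
                    counts[entity][term] += 1
    # Raw counts are an assumption; a real pipeline might normalize them.
    return {e: [counts[e][t] for t in vocabulary] for e in sorted(entities)}


# Toy data, invented for illustration only.
citations = [
    {"Aspirin", "Headache", "Gastrointestinal Hemorrhage"},
    {"Aspirin", "Myocardial Infarction"},
    {"Ibuprofen", "Headache"},
    {"Aspirin", "Headache"},
]
vocabulary = ["Gastrointestinal Hemorrhage", "Headache", "Myocardial Infarction"]
features = cooccurrence_features(citations, {"Aspirin", "Ibuprofen"}, vocabulary)
print(features)
```

Each resulting vector has one position per vocabulary term, so the same representation could in principle be fed to any downstream classifier.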
Keywords: MeSH headings; adverse drug reaction; drug repositioning; literature-based discovery; machine learning; medical informatics
Year: 2019 PMID: 31984370 PMCID: PMC6951958 DOI: 10.1093/jamiaopen/ooz022
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
MeSH term count per category
| Category | # Terms | % Terms |
|---|---|---|
| Chemicals and drugs | 10 086 | 32.7 |
| Diseases | 4711 | 15.3 |
| Organisms | 3678 | 11.9 |
| Analytical, diagnostic and therapeutic techniques, and equipment | 2885 | 9.4 |
| Phenomena and processes | 2227 | 7.2 |
| Anatomy | 1802 | 5.8 |
| Healthcare | 1687 | 5.5 |
| Psychiatry and psychology | 1042 | 3.4 |
| Anthropology, education, sociology, and social phenomena | 573 | 1.9 |
| Technology, industry, and agriculture | 565 | 1.8 |
| Disciplines and occupations | 403 | 1.3 |
| Geographicals | 385 | 1.2 |
| Information science | 338 | 1.1 |
| Named groups | 260 | 0.8 |
| Humanities | 171 | 0.6 |
Note. The full tree structure can be found on the NLM MeSH website (https://meshb.nlm.nih.gov/treeView).
Abbreviation: MeSH: Medical Subject Headings.
Drug count per MeSH term sub-category
| Sub-category | # Drugs | % Drugs |
|---|---|---|
| Organic chemicals | 1228 | 38.3 |
| Heterocyclic compounds | 933 | 29.1 |
| Polycyclic compounds | 257 | 8.0 |
| Amino acids, peptides, and proteins | 245 | 7.6 |
| Inorganic chemicals | 164 | 5.1 |
| Hormones, hormone substitutes, and hormone antagonists | 84 | 2.6 |
| Nucleic acids, nucleotides, and nucleosides | 71 | 2.2 |
| Carbohydrates | 67 | 2.1 |
| Lipids | 64 | 2.0 |
| Biological factors | 40 | 1.2 |
| Pharmaceutical preparations | 20 | 0.6 |
Note. Categories with fewer than 20 drugs are omitted.
Abbreviation: MeSH: Medical Subject Headings.
Figure 1. Drug and MeSH term co-occurrences: the number of drugs that co-occur with a given number of MeSH terms (left) and the number of MeSH terms that co-occur with a given number of drugs (right). MeSH: Medical Subject Headings.
Figure 2. Task-specific performance. Markers indicate the average per-ADR (left) and per-indication (right) precision within the top-K ranked drugs (x-axis). ADR: adverse drug reaction.
Figure 3. Overall prediction accuracy. Precision-recall curves plotted for the 3 trained model types in predicting ADRs (left) and indications (right); the PRAUC is shown in parentheses. The inset zooms in on the high-precision range. ADR: adverse drug reaction; PRAUC: precision-recall area under the curve.
Figure 4. Per-drug performance. Markers indicate the average per-drug precision within the top-K ranked ADRs (left) and indications (right). ADR: adverse drug reaction.
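The "precision within the top-K ranked" items reported in Figures 2 and 4 can be illustrated with a short sketch. The helper `precision_at_k`, the scores, and the labels below are hypothetical and invented for illustration; how the original study broke ties or chose K is not stated in the captions.

```python
def precision_at_k(scores, labels, k):
    """Fraction of true relations among the k highest-scored candidates."""
    if k <= 0 or not scores:
        raise ValueError("k must be positive and scores must be non-empty")
    # Rank candidates from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_k = ranked[:k]
    return sum(label for _, label in top_k) / len(top_k)


# Invented predictions for two ADRs: (scores, binary ground-truth labels).
per_adr = {
    "adr_a": ([0.9, 0.8, 0.4, 0.3, 0.1], [1, 0, 1, 0, 0]),
    "adr_b": ([0.7, 0.2], [0, 1]),
}

p_at_1 = precision_at_k(*per_adr["adr_a"], 1)
p_at_2 = precision_at_k(*per_adr["adr_a"], 2)

# Averaging over ADRs gives one marker per value of K, as in the captions.
mean_p_at_1 = sum(precision_at_k(s, l, 1) for s, l in per_adr.values()) / len(per_adr)
print(p_at_1, p_at_2, mean_p_at_1)
```

The same function applies to the per-drug view by ranking ADRs or indications for a fixed drug instead of ranking drugs for a fixed ADR or indication.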
Model performance for the OMOP data
| ADR | # of drugs with positive and negative relation | LR | GBM | Baseline |
|---|---|---|---|---|
| Acute kidney injury | 24 positive, 63 negative | 0.83 ± 0.08 | 0.81 ± 0.10 | 0.50 ± 0.22 |
| Chemical and drug induced liver injury | 81 positive, 36 negative | 0.91 ± 0.04 | 0.94 ± 0.02 | 0.85 ± 0.08 |
| Myocardial infarction | 36 positive, 65 negative | 0.66 ± 0.17 | 0.59 ± 0.24 | 0.51 ± 0.16 |
| Gastrointestinal hemorrhage | 24 positive, 67 negative | 0.77 ± 0.27 | 0.76 ± 0.13 | 0.56 ± 0.24 |
Note. For each ADR, the table shows the number of drugs in the OMOP data (mapped to MeSH) with a positive relation (the drug causes the ADR) and a negative relation (the drug does not cause the ADR), and the PRAUC scores for all single-task models.
Abbreviations: ADR: adverse drug reaction; GBM: gradient boosting machines; LR: logistic regression; MeSH: Medical Subject Headings; OMOP: Observational Medical Outcomes Partnership; PRAUC: precision-recall area under the curve.
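For readers unfamiliar with PRAUC, the sketch below computes average precision, one common step-wise estimator of the area under the precision-recall curve. It is a stand-in chosen for illustration: the record does not say which estimator the study used, and the function name, scores, and labels are hypothetical.

```python
def average_precision(scores, labels):
    """Step-wise estimate of the area under the precision-recall curve."""
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    # Rank candidates from highest to lowest predicted score (ties not handled specially).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision at the rank of each true positive.
            precision_sum += hits / rank
    return precision_sum / total_positives


# Invented scores for five drugs and one ADR; 1 = drug causes the ADR.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
ap = average_precision(scores, labels)
print(round(ap, 3))
```

A model that scores candidates at random has an expected value close to the fraction of positives, which is why the positive and negative counts per ADR matter when comparing rows against each other.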
PRAUC scores using subsets of the features
| Feature types used | # of features | PRAUC score ADRs | PRAUC score indications |
|---|---|---|---|
| All features | 28 320 | 0.48 ± 0.01 | 0.34 ± 0.03 |
| All except drugs and diseases | 13 523 | 0.48 ± 0.01 | 0.32 ± 0.03 |
| Drugs | 10 086 | 0.47 ± 0.01 | 0.32 ± 0.03 |
| Diseases without signs and symptoms | 4344 | 0.47 ± 0.01 | 0.35 ± 0.03 |
| Signs and symptoms | 367 | 0.46 ± 0.01 | 0.27 ± 0.03 |
| Baseline | 1 | 0.12 ± 0.01 | 0.20 ± 0.07 |
Note. For each feature group, the table shows the number of features and the corresponding PRAUC scores for the ADRs and indications tasks.
Abbreviations: ADR: adverse drug reaction; PRAUC: precision-recall area under the curve.