Neil S Zheng, V Eric Kerchberger, Victor A Borza, H Nur Eken, Joshua C Smith, Wei-Qi Wei.
Abstract
The MEDication-Indication (MEDI) knowledgebase has been used in research with electronic health records (EHRs) since its publication in 2013. To account for new drugs and terminology updates, we rebuilt MEDI for modern EHRs. Indications for prescribable medications were extracted using natural language processing and ontology relationships from six publicly available resources: RxNorm, Side Effect Resource (SIDER) 4.1, Mayo Clinic, WebMD, MedlinePlus, and Wikipedia. We compared the estimated precision and recall of the previous MEDI (MEDI-1) and the updated version (MEDI-2) by manual review. MEDI-2 contains 3031 medications and 186,064 indications. The MEDI-2 high precision subset (HPS) includes indications found within RxNorm or at least three other resources. MEDI-2 and MEDI-2 HPS contain 13% more medications and over triple the indications compared to MEDI-1 and MEDI-1 HPS, respectively. Manual review showed that MEDI-2 achieves the same precision (0.60) with better recall (0.89 vs. 0.79) compared to MEDI-1. Likewise, MEDI-2 HPS had the same precision (0.92) and better recall (0.65 vs. 0.55) compared to MEDI-1 HPS. The combination of MEDI-1 and MEDI-2 achieved a recall of 0.95. In updating MEDI, we present a more comprehensive medication-indication knowledgebase that can continue to facilitate applications and research with EHRs.
Year: 2021 PMID: 34556781 PMCID: PMC8460636 DOI: 10.1038/s41598-021-98579-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Flowchart outlining the process of building MEDI-2 and quantifying the medications and indications identified at each major step. RxCUI = RxNorm concept unique identifiers; HPS = high precision subset.
Figure 2. Weighted Venn diagram of distribution of medications within the six resources for MEDI-2. Each colored area represents a different resource. The larger number in each colored area represents the number of medications found in the combination of resources labeled by the smaller numbers in the parentheses. The numbers in the parentheses correspond with the numbers in the color legend. The circle area sizes are proportional to the number of medications–indications that were found within the corresponding resource(s).
Summary of counts of medications, ICD codes, and indications for MEDI-2.
| Resource | Medications (% of total) | ICD-9-CM (% of total) | ICD-10-CM (% of total) | ICD-9-CM Indications (% of total) | ICD-10-CM Indications (% of total) |
|---|---|---|---|---|---|
| RxNorm | 1922 (63.4) | 1302 (42.4) | 2125 (39.5) | 10,221 (13.6) | 14,830 (13.3) |
| Mayo Clinic | 1696 (56.0) | 1305 (42.5) | 2137 (39.8) | 14,027 (18.7) | 20,193 (18.2) |
| MedlinePlus | 1273 (42.0) | 1270 (41.3) | 2095 (39.0) | 15,631 (20.8) | 22,692 (20.4) |
| SIDER 4.1 | 1042 (34.4) | 1833 (59.7) | 3206 (59.7) | 15,765 (21.0) | 23,351 (21.0) |
| WebMD | 2289 (75.5) | 1484 (48.3) | 2500 (46.5) | 32,408 (43.2) | 44,627 (40.2) |
| Wikipedia | 1999 (66.0) | 2417 (78.7) | 4143 (77.1) | 20,624 (27.5) | 31,704 (28.5) |
| Union of all resources | 3031 | 3072 | 5373 | 74,971 | 111,093 |
Figure 3. Weighted Venn diagram of distribution of medication-indication pairs within the six resources for MEDI-2, stratified by ICD-9-CM (left) and ICD-10-CM (right). Each colored area represents a different resource. The larger number in each colored area represents the number of medications found in the combination of resources labeled by the smaller numbers in the parentheses. The numbers in the parentheses correspond with the numbers in the color legend. The circle area sizes are proportional to the number of medications–indications that were found within the corresponding resource(s). ICD = International Classification of Diseases.
Estimated precision of MEDI-2 for different resource combinations.
| Resource | Medications | Indication pairs | Total reviewed a | True positive | Precision |
|---|---|---|---|---|---|
| RxNorm | 1922 | 25,051 | 91 | 85 | 0.93 |
| Mayo Clinic | 1696 | 34,220 | 106 | 86 | 0.81 |
| MedlinePlus | 1273 | 38,323 | 105 | 82 | 0.78 |
| SIDER 4.1 | 1042 | 39,116 | 102 | 78 | 0.76 |
| WebMD | 2289 | 77,035 | 126 | 95 | 0.75 |
| Wikipedia | 1999 | 52,325 | 111 | 82 | 0.74 |
| 1 resource | 2892 | 135,787 | 174 | 87 | 0.50 |
| 2 resources | 1517 | 15,789 | 88 | 65 | 0.74 |
| 3 resources | 939 | 5863 | 63 | 56 | 0.89 |
| 4 resources | 510 | 2451 | 60 | 56 | 0.93 |
| 5 resources | 233 | 1123 | 57 | 52 | 0.91 |
| ≥ 1 resource | 2899 | 161,013 | | | 0.55 |
| ≥ 2 resources | 1621 | 25,226 | | | 0.80 |
| ≥ 3 resources | 1066 | 9437 | | | 0.90 |
| ≥ 4 resources | 602 | 3574 | | | 0.92 |
| MEDI-2 (any resource) | 3031 | 186,064 | | | 0.60 |
| MEDI-2 HPS b | 2000 | 34,488 | | | 0.92 |
a Indications that the reviewers deemed too ambiguous were excluded from analysis (e.g., ICD-10-CM R69 = Illness, unspecified).
b HPS: high precision subset = indications from RxNorm or ≥ 3 other resources.
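The arithmetic behind the table can be sketched in a few lines of Python. The class and function names below are illustrative, not part of MEDI. The per-row precision is simply true positives divided by pairs reviewed, and the HPS rule follows the footnote above. The pooled estimate is an assumption: this record does not state how the aggregate rows (e.g., "≥ 2 resources") were derived, and a pair-count-weighted mean of the per-stratum precisions is only one plausible reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    """One manually reviewed row of the precision table (illustrative type)."""
    label: str
    pairs: int          # medication-indication pairs in the stratum
    reviewed: int       # pairs reviewed after excluding ambiguous indications
    true_positive: int  # reviewed pairs judged to be correct indications

    @property
    def precision(self) -> float:
        return self.true_positive / self.reviewed


def is_high_precision(resources: set[str]) -> bool:
    """HPS rule from the footnote: RxNorm, or at least three other resources."""
    others = resources - {"RxNorm"}
    return "RxNorm" in resources or len(others) >= 3


def pooled_precision(strata: list[Stratum]) -> float:
    """ASSUMPTION: aggregate precision as a pair-count-weighted mean.

    The source does not describe its aggregation method; this is a guess.
    """
    total_pairs = sum(s.pairs for s in strata)
    return sum(s.pairs * s.precision for s in strata) / total_pairs


# Values copied from the table above.
rxnorm = Stratum("RxNorm", 25_051, 91, 85)
by_count = [
    Stratum("1 resource", 135_787, 174, 87),
    Stratum("2 resources", 15_789, 88, 65),
    Stratum("3 resources", 5_863, 63, 56),
    Stratum("4 resources", 2_451, 60, 56),
    Stratum("5 resources", 1_123, 57, 52),
]

if __name__ == "__main__":
    print(f"RxNorm precision: {rxnorm.precision:.2f}")
    print(f">= 2 resources (pooled): {pooled_precision(by_count[1:]):.2f}")
    print(is_high_precision({"WebMD", "Wikipedia"}))  # two non-RxNorm resources
```

Under the weighting assumption, the "2 resources" through "5 resources" rows pool to about 0.80 and all five rows pool to about 0.55, which agrees with the corresponding aggregate rows. That agreement is suggestive but does not confirm the authors' method.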
Figure 4. Weighted Venn diagram of the differences and overlap of medications included in MEDI-1, MEDI-1 HPS, MEDI-2, and MEDI-2 HPS. Each colored area represents a different resource. The larger number in each colored area represents the number of medications found in the combination of resources labeled by the smaller numbers in the parentheses. The numbers in the parentheses correspond with the numbers in the color legend. The circle area sizes are proportional to the number of medications found within the corresponding resource(s). HPS = high precision subset.
Estimated precision and recall of MEDI-1 and MEDI-2.
| Resource | Medications | Indication pairs a | Precision b | Recall |
|---|---|---|---|---|
| MEDI-1 | 2701 | 56,550 | 0.60 | 0.79 |
| MEDI-1 HPS | 1764 | 11,552 | 0.92 | 0.55 |
| MEDI-2 | 3031 | 186,064 | 0.60 | 0.89 |
| MEDI-2 HPS | 2000 | 34,488 | 0.92 | 0.65 |
| MEDI-C (MEDI-1 + MEDI-2) | 3752 | 223,153 | 0.60 | 0.95 |
| MEDI-C HPS (MEDI-1 HPS + MEDI-2 HPS) | 2359 | 39,100 | 0.92 | 0.67 |
a Indication pairs for MEDI-2 include both ICD-9-CM indications and ICD-10-CM indications, which may include some overlap.
b Estimated precision for MEDI-1 and MEDI-1 HPS is from Wei et al. [20].
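If MEDI-C is read as a plain set union of MEDI-1 and MEDI-2 entries, the counts in the table imply how much the two versions share, by inclusion-exclusion. The sketch below makes that assumption explicit. The function name is hypothetical, and the result is only as reliable as the assumption that all three counts were tallied the same way.

```python
def implied_overlap(size_a: int, size_b: int, size_union: int) -> int:
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.

    ASSUMPTION: size_union counts a plain set union of the two collections.
    """
    shared = size_a + size_b - size_union
    if shared < 0 or shared > min(size_a, size_b):
        raise ValueError("counts are inconsistent with a set union")
    return shared


# Counts copied from the table above (MEDI-1, MEDI-2, MEDI-C).
shared_medications = implied_overlap(2_701, 3_031, 3_752)
shared_pairs = implied_overlap(56_550, 186_064, 223_153)

if __name__ == "__main__":
    print(f"Medications in both versions: {shared_medications}")
    print(f"Indication pairs in both versions: {shared_pairs}")
```

Under this reading, 1980 medications and 19,461 indication pairs would appear in both versions, so most MEDI-2 pairs are new relative to MEDI-1.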