Jake Vasilakes1,2, Yadan Fan1, Rubina Rizvi1,2, Anusha Bompelli1, Olivier Bodenreider3, Rui Zhang1,2.
Abstract
The use of dietary supplements (DSs) is increasing in the U.S. As such, it is crucial for consumers, clinicians, and researchers to be able to find information about DS products. However, labeling regulations allow great variability in DS product names, which makes searching for this information difficult. Following the RxNorm drug name normalization model, we developed a rule-based natural language processing system to normalize DS product names using pattern templates. We evaluated the system on product names extracted from the Dietary Supplement Label Database. Our system generated 136 unique templates and obtained a coverage of 72%, a 32% increase over the existing RxNorm model. Manual review showed that our system achieved a normalization accuracy of 0.86. We found that the normalization of DS product names is feasible, but more work is required to improve the generalizability of the system.
Keywords: Dietary supplements; Natural Language Processing; RxNorm
Year: 2019 PMID: 31437955 PMCID: PMC6792000 DOI: 10.3233/SHTI190253
Source DB: PubMed Journal: Stud Health Technol Inform ISSN: 0926-9630
Figure 1–The study design.
Term types used in the product name normalization system.
| Term Type (Abbreviation) | Description | Example | Pattern Source |
|---|---|---|---|
| Animal Source (ANM) | The part of an animal from which the ingredient is derived. | Bone Marrow | TGA |
| Brand Name (BN) | Manufacturer’s name. | GNC | Annotation, rules |
| Certification (CERT) | Official certifications claimed by the product. | USP certified | TGA |
| Claim or Use (USE) | A description of the purported use of a dietary supplement. | Sleep aid | Annotation |
| Dose Form (DF) | The physical form of the product. | Capsule | TGA, RxNorm |
| Dose Form Group (DFG) | A grouping of dose forms related by route of administration. | Topical | TGA, RxNorm |
| Flavor (FLV) | The flavor of a supplement. | Strawberry | Annotation |
| Ingredient (IN) | Name of the dietary supplement ingredient. | Ginkgo Biloba | iDISK, rules |
| Plant Source (PLNT) | The part of a plant from which the ingredient is derived. | Leaf | TGA |
| Demographic or Population (POP) | The group of persons for whom the product is intended. | Children’s | TGA |
| Preparation (PREP) | A descriptor of how an ingredient is prepared. | Dried | TGA |
| Stop Word (STOP) | Uninformative words that are to be excluded from the normalized form. | With, Natural | Annotation |
| Strength (STR) | The quantity of the ingredient in a product. | 100 mg | TGA |
| Time of Use (TIM) | When the product is intended to be used. | Night time | TGA |
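
To illustrate how the term types above can turn a product name into a pattern template, here is a minimal sketch. It is not the authors' implementation: the `LEXICON` entries, the `UNITS` set, the `UNK` tag, and the function name `to_template` are all hypothetical, and the longest-match lookup is an assumed strategy chosen for simplicity.

```python
import re

# Hypothetical mini-lexicon mapping lowercase phrases to term type abbreviations.
# The entries are illustrative only; the paper's pattern sources are TGA, RxNorm,
# iDISK, manual annotation, and rules.
LEXICON = {
    "nutrabio": "BN",
    "optimum nutrition": "BN",
    "melatonin": "IN",
    "tribulus": "IN",
    "caps": "DF",
    "capsule": "DF",
    "powder": "DF",
    "with": "STOP",
    "natural": "STOP",
}

NUMBER = re.compile(r"^\d+(\.\d+)?$")
UNITS = {"mg", "mcg", "g", "iu"}  # assumed unit list


def to_template(name, max_len=3):
    """Tag a product name and return its template as a list of term types."""
    tokens = name.lower().split()
    tags = []
    i = 0
    while i < len(tokens):
        # Strength (STR): a number followed by a known unit, e.g. "625 mg".
        if NUMBER.match(tokens[i]) and i + 1 < len(tokens) and tokens[i + 1] in UNITS:
            tags.append("STR")
            i += 2
            continue
        # Longest-match lookup of up to max_len tokens in the lexicon.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                tag = LEXICON[phrase]
                if tag != "STOP":  # stop words are excluded from the template
                    tags.append(tag)
                i += n
                break
        else:
            # No lexicon entry matched; mark the token as unknown.
            tags.append("UNK")
            i += 1
    return tags


print(to_template("Optimum Nutrition Tribulus 625 MG Caps"))
print(to_template("NutraBio Melatonin"))
```

Under these assumptions, a name counts as fully matched when its template contains no `UNK` tag, and as partially matched otherwise.
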
The 5 most common templates and their product name coverage across the development and evaluation sets along with examples for each.
| Frequency ranked templates | Example product name |
|---|---|
| BN IN STR (32.0%) | Bronson Laboratories Vitamin E 200 IU |
| BN IN (21.3%) | NutraBio Melatonin |
| BN IN DF (3.4%) | TERRAVITA Potassium Citrate Powder |
| BN IN STR DF (3.0%) | Optimum Nutrition Tribulus 625 MG Caps |
| BN IN PLNT (1.9%) | Nature’s Answer Hawthorn Berry |
Frequencies of templates generated on the development and evaluation sets that match RxNorm term types, computed using the fully matched product names. We do not include the following RxNorm term types: Precise Ingredient (PIN), Multiple Ingredients (MIN), Generic Pack (GPCK), Brand Name Pack (BPCK) as they are not applicable to this study.
| RxNorm Term Type | Corresponding Template | Dev Frequency | Eval Frequency |
|---|---|---|---|
| Ingredient (IN) | IN | 1 (0.01%) | 1 (0.04%) |
| Semantic Clinical Drug Component (SCDC) | IN STR | 1 (0.01%) | 0 |
| Semantic Clinical Drug Form (SCDF) | IN DF | 2 (0.02%) | 0 |
| Semantic Clinical Dose Form Group (SCDG) | IN DFG | 0 | 0 |
| Semantic Clinical Drug (SCD) | IN STR DF | 3 (0.03%) | 0 |
| Brand Name (BN) | BN | 209 (2.11%) | 10 (0.40%) |
| Semantic Branded Drug Component (SBDC) | BN IN STR | 3353 (33.85%) | 812 (32.78%) |
| Semantic Branded Drug Form (SBDF) | BN IN DF | 370 (3.74%) | 80 (3.23%) |
| Semantic Branded Dose Form Group (SBDG) | BN DFG | 0 | 0 |
| Semantic Branded Drug (SBD) | BN IN STR DF | 325 (3.28%) | 85 (3.43%) |
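
The correspondence in the table above amounts to a lookup from template to RxNorm term type. The sketch below encodes that lookup and totals the evaluation-set counts from the table; the names `RXNORM_TERM_TYPE`, `EVAL_COUNTS`, and `rxnorm_term_type` are hypothetical and only serve the illustration.

```python
# Template -> RxNorm term type, transcribed from the table above.
RXNORM_TERM_TYPE = {
    "IN": "IN",
    "IN STR": "SCDC",
    "IN DF": "SCDF",
    "IN DFG": "SCDG",
    "IN STR DF": "SCD",
    "BN": "BN",
    "BN IN STR": "SBDC",
    "BN IN DF": "SBDF",
    "BN DFG": "SBDG",
    "BN IN STR DF": "SBD",
}

# Evaluation-set counts per template, transcribed from the same table.
EVAL_COUNTS = {
    "IN": 1,
    "IN STR": 0,
    "IN DF": 0,
    "IN DFG": 0,
    "IN STR DF": 0,
    "BN": 10,
    "BN IN STR": 812,
    "BN IN DF": 80,
    "BN DFG": 0,
    "BN IN STR DF": 85,
}


def rxnorm_term_type(template):
    """Return the RxNorm term type for a list of tags, or None if there is none."""
    return RXNORM_TERM_TYPE.get(" ".join(template))


# Number of fully matched evaluation names whose template has an RxNorm equivalent.
rxnorm_covered = sum(EVAL_COUNTS.values())

print(rxnorm_term_type(["BN", "IN", "STR", "DF"]))
print(rxnorm_term_type(["BN", "IN", "PLNT"]))
print(rxnorm_covered)
```

Templates such as BN IN PLNT have no RxNorm counterpart, so the lookup returns `None` for them.
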
Overall accuracy on the evaluation set, reported for all evaluation names, only those which were fully matched, and only those that were partially matched.
| Match Type | Accuracy |
|---|---|
| Full + Partial + None | 0.86 |
| Full match only | 0.95 |
| Partial match only | 0.65 |