Hegler Tissot, Richard Dobson.
Abstract
BACKGROUND: There is an increasing amount of unstructured medical data that can be analysed for different purposes. However, information extraction from free text data may be particularly inefficient in the presence of spelling errors. Existing approaches use string similarity methods to search for valid words within a text, coupled with a supporting dictionary. However, they are not rich enough to encode both typing and phonetic misspellings.
Keywords: Misspelt names of drugs; Phonetic similarity; Similarity search
Year: 2019 PMID: 31711534 PMCID: PMC6849162 DOI: 10.1186/s13326-019-0216-2
Source DB: PubMed Journal: J Biomed Semantics
Occurrence (#) of the 20 most cited drug names in a set of 4748 medical records written in Portuguese
| Drug name | Number of occurrences |
|---|---|
| Fluoxetina | 18624 |
| Paracetamol | 8697 |
| Diazepam | 8474 |
| Amitriptilina | 8463 |
| Omeprazol | 7825 |
| Dipirona | 7320 |
| Glicose | 5721 |
| Captopril | 5383 |
| Insulina | 5290 |
| Nimesulida | 4228 |
| Clorpromazina | 4226 |
| Enalapril | 4144 |
| Imipramina | 4135 |
| Sinvastatina | 3862 |
| Carbamazepina | 3853 |
| Amoxicilina | 3716 |
| Ibuprofeno | 3714 |
| Metformina | 3467 |
| Risperidona | 3464 |
| Atenolol | 3224 |
Fig. 1. A pseudo-code to find similarity thresholds
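
The pseudo-code of Fig. 1 is not reproduced in this record. As a rough illustration only, a threshold grid search of the kind the caption describes could look like the Python sketch below. The function names, the 0.1 grid step, the toy samples, and the rule that a candidate is accepted only when both similarities reach their thresholds are all assumptions made for this example, not details taken from the source.

```python
from itertools import product


def f1_score(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when it is undefined."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def grid_search(samples, steps=10):
    """Return (best_f1, phonetic_threshold, string_threshold).

    samples: iterable of (phonetic_similarity, string_similarity, is_match)
    tuples, with both similarities in [0, 1].
    """
    samples = list(samples)
    grid = [i / steps for i in range(steps + 1)]
    best = (-1.0, None, None)
    for t_phonetic, t_string in product(grid, repeat=2):
        tp = fp = fn = 0
        for phonetic, string, is_match in samples:
            # Assumed decision rule: both similarities must reach their threshold.
            accepted = phonetic >= t_phonetic and string >= t_string
            if accepted and is_match:
                tp += 1
            elif accepted:
                fp += 1
            elif is_match:
                fn += 1
        score = f1_score(tp, fp, fn)
        if score > best[0]:  # strict ">" keeps the first best pair found
            best = (score, t_phonetic, t_string)
    return best


# Hypothetical labelled candidates: (phonetic sim, string sim, true match?)
toy_samples = [
    (0.90, 0.90, True),
    (0.95, 0.85, True),
    (0.50, 0.90, False),
    (0.90, 0.40, False),
    (0.30, 0.30, False),
]
best_f1, best_phonetic, best_string = grid_search(toy_samples, steps=10)
print(best_f1, best_phonetic, best_string)
```

A finer grid (a larger `steps` value) would be needed to arrive at three-decimal thresholds; the cost grows quadratically with the number of grid points.
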
Best threshold values found by the grid search method
| | Parameter | Value |
|---|---|---|
| Training Set | Number of true positives | 417 |
| Training Set | Number of false positives | 31 |
| Training Set | Number of false negatives | 25 |
| Training Set | Precision | 0.931 |
| Training Set | Recall | 0.943 |
| Training Set | F1-score | 0.937 |
| Validation Set | Number of true positives | 477 |
| Validation Set | Number of false positives | 39 |
| Validation Set | Number of false negatives | 19 |
| Validation Set | Precision | 0.924 |
| Validation Set | Recall | 0.961 |
| Validation Set | F1-score | 0.942 |
| Thresholds | Phonetic similarity | 0.844 |
| Thresholds | String similarity | 0.831 |
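
The precision, recall and F1 values in this table follow from the true-positive, false-positive and false-negative counts. The short Python check below recomputes them using the standard definitions; the function and variable names are illustrative and not taken from the source.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions computed from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts copied from the table: (true positives, false positives, false negatives)
counts = {
    "training": (417, 31, 25),
    "validation": (477, 39, 19),
}

metrics = {name: precision_recall_f1(*c) for name, c in counts.items()}
for name, (p, r, f1) in metrics.items():
    print(f"{name}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

The recomputed values agree with the reported ones to within 0.001. The validation recall and F1 appear to be truncated rather than rounded to three decimals.
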
Drugs with the highest number of misspelt variations
| Drug name | Number of similar words | Inexact Phonetic Match | F1-Score when using only string match | |||||
|---|---|---|---|---|---|---|---|---|
| Precision | Recall | F1 | ||||||
| Propanolol | 52 | 0.960 | 0.979 | 0.310 | 0.819 | 0.945 | 0.955 | |
| Glibenclamida | 49 | 1.000 | 1.000 | 0.829 | 0.956 | 0.955 | 0.961 | |
| Anlodipino | 49 | 0.913 | 0.976 | 0.944 | 0.612 | 0.938 | 0.942 | |
| Medroxiprogesterona | 47 | 1.000 | 0.914 | 0.763 | 0.881 | 0.955 | 0.927 | |
| Metoclopramida | 46 | 1.000 | 0.977 | 0.750 | 0.977 | 0.965 | 0.964 | |
| Loratadina | 46 | 0.837 | 0.947 | 0.889 | 0.774 | 0.963 | 0.955 | |
| Dexametasona | 45 | 1.000 | 0.800 | 0.889 | 0.615 | 0.915 | 0.952 | |
| Furosemida | 43 | 0.963 | 1.000 | 0.981 | 0.844 | 0.976 | 0.961 | |
| Prednisona | 42 | 1.000 | 0.878 | 0.935 | 0.730 | 0.952 | 0.956 | |
| Hidroclorotiazida | 41 | 1.000 | 0.975 | 0.776 | 0.962 | 0.952 | 0.940 | |
| Diclofenaco | 41 | 0.923 | 0.947 | 0.935 | 0.812 | 0.914 | 0.912 | |
| Ciprofloxacino | 37 | 1.000 | 0.918 | 0.520 | 0.878 | 0.935 | 0.922 | |
| Espironolactona | 36 | 1.000 | 1.000 | 0.714 | 0.941 | 0.948 | 0.962 | |
| Salbutamol | 36 | 1.000 | 0.972 | 0.819 | 0.956 | 0.976 | 0.943 | |
| Clonazepam | 34 | 1.000 | 1.000 | 0.692 | 0.969 | 0.961 | 0.939 | |
| Beclometasona | 33 | 1.000 | 0.967 | 0.777 | 0.935 | 0.961 | 0.921 | |
| Dexclorfeniramina | 31 | 1.000 | 0.903 | 0.949 | 0.708 | 0.872 | 0.959 | |
| Metronidazol | 30 | 0.965 | 0.965 | 0.816 | 0.964 | 0.942 | 0.926 | |
| Prednisolona | 30 | 0.965 | 0.965 | 0.965 | 0.739 | 0.925 | 0.966 | |
| Isossorbida | 29 | 0.963 | 1.000 | 0.761 | 0.960 | 0.957 | 0.936 | |
| Average F1-score | 0.718 | 0.934 | 0.958 | 0.945 | ||||
The best F1 score is highlighted for each drug
Examples of misspelt variations for “Fluoxetina” (Fluoxetine) and the corresponding Edit Distance (ED) values
| Misspelt variation | ED |
|---|---|
| dfluoxetina | 1 |
| flluoxetina | 1 |
| floxetina | 1 |
| fluoexetina | 1 |
| fluoixetina | 1 |
| fluopxetina | 1 |
| fluoxertina | 1 |
| fluoxetiina | 1 |
| fluoxetijna | 1 |
| fluoxetin | 1 |
| fluoxetinas | 1 |
| fluoxetna | 1 |
| fluoxetona | 1 |
| fluoxettina | 1 |
| fluoxetuina | 1 |
| fluoxewtina | 1 |
| fluoxtina | 1 |
| fluozxetina | 1 |
| fluuoxetina | 1 |
| fluuoxetina | 1 |
| fluxetina | 1 |
| fluyoxetina | 1 |
| flhuoxetin | 2 |
| flluoxetin | 2 |
| flouxetina | 2 |
| fluoxeitna | 2 |
| fluoxetian | 2 |
| fluxoetina | 2 |
| fluxotina | 2 |
| fluloextina | 3 |
| fluoxetinaate | 3 |
| fluoxetinapor | 3 |
| flxtina | 3 |
| fluoxetinapara | 4 |
| infloexetina | 4 |
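
The ED values above can be checked with a standard Levenshtein distance, in which each insertion, deletion or substitution costs 1. The sketch below is a textbook dynamic-programming implementation, not code from the source. That the table uses plain Levenshtein distance is an assumption, although it is consistent with transposed letters such as "flouxetina" being listed with ED 2 rather than 1.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


# A few rows from the table, compared against the reference spelling.
for variant in ["floxetina", "flouxetina", "flxtina", "fluoxetinapara", "infloexetina"]:
    print(variant, edit_distance("fluoxetina", variant))
```

Only one row of the dynamic-programming table is kept at a time, so memory use is proportional to the length of the second string.
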