Josefien Van Olmen¹, Jens Van Nooten², Hilde Philips¹, Annet Sollie¹, Walter Daelemans².
Abstract
BACKGROUND: Electronic medical records have opened opportunities to analyze clinical practice at large scale. Structured registries and coding procedures such as the International Classification of Primary Care have further improved such analyses. However, a large part of the information about the state of the patient and the doctors' observations is still entered in free-text fields. The main function of those fields is to report the doctor's line of thought, to remind the doctor and his or her colleagues of follow-up actions, and to account for clinical decisions. These fields contain rich information that can complement the information in coded fields, but until now they have hardly been used for analysis.
Keywords: COVID-19; artificial intelligence; coding procedure; electronic medical records; feasibility study; natural language processing; precision model; prediction model; primary care; structured registry; text mining
Year: 2022; PMID: 35442903; PMCID: PMC9049643; DOI: 10.2196/37771
Source DB: PubMed; Journal: JMIR Med Inform
Final list with signs and symptoms to be coded from the free text.
| Final symptoms—coded | Explanation |
| Sᵃ1; SAᵇ1 | Cough |
| S100; SA100 | Upper respiratory tract infection complaints |
| S101; SA101 | Dyspnea and shortness of breath |
| S7; SA7 | Thoracic pain or chest pain |
| S102; SA102 | Loss of taste or smell |
| S10; SA10 | History of fever |
| S112 | Pain or stiffness in muscles, joints, or neck |
| S109 | Complaints of throat or voice |
| S12 | Fatigue |
| S15 | Headache |
| S103; SA103 | Gastrointestinal complaints |
| S104 | Significant acute event or change |
| S105 | Chronic pulmonary complaints; smoking; potentially worsening |
| S105 | Other comorbidities or being pregnant |
| S106 | Known cardiovascular diseases or hypertension or relevant medication |
| S107 | Known diabetes or diabetes medication |
| S108 | Medication: NSAIDᶜ or immunosuppressive drugs |
| S113 | Palpitations or dizziness |
| S110 | General complaints such as malaise and illness |
| S111 | Mental or sleeping problems |
| S63 | Close contact with a sick person (COVID-19 symptoms) or COVID-19–positive case |
| Oᵈ101 | Respiratory signs found during physical examination |
| O6 | Fever measured by health care staff |
| O102 | Ear-, nose-, or throat-positive signs during physical examination |
| O104 | Neurological symptoms |
| O103 | Circulatory positive signs: abnormal pulse rate, tension, or turgor of capillary refill |
| O19 | Impression of being ill |
ᵃS: subjective.
ᵇA: absence of the symptom.
ᶜNSAID: nonsteroidal anti-inflammatory drug.
ᵈO: objective.
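The code scheme in the table above (S for subjective, A for absence of the symptom, O for objective, followed by a number) can be decoded mechanically. The following is a minimal sketch, not the authors' implementation: the names `ParsedCode` and `parse_code` are hypothetical, and the pattern assumes that the absence marker A only follows S, as in the codes listed in the table.

```python
import re
from typing import NamedTuple


class ParsedCode(NamedTuple):
    """Hypothetical container for one decoded symptom code."""
    kind: str      # "subjective" or "objective"
    absent: bool   # True when the code records absence of the symptom
    number: int    # numeric part of the code


# Assumption: only the prefixes S, SA, and O occur, as in the table above.
_CODE_RE = re.compile(r"^(SA|S|O)(\d+)$")


def parse_code(code: str) -> ParsedCode:
    """Split a code such as 'SA101' into its kind, absence flag, and number."""
    match = _CODE_RE.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognized code: {code!r}")
    prefix, number = match.groups()
    kind = "objective" if prefix == "O" else "subjective"
    return ParsedCode(kind, prefix == "SA", int(number))


if __name__ == "__main__":
    # A table cell such as "S101; SA101" holds the presence and absence variants.
    for raw in "S101; SA101".split(";"):
        print(parse_code(raw))
```

Under these assumptions, `S101` and `SA101` decode to the same symptom number with opposite absence flags, which is how the table pairs the two variants in a single cell.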
Total number of entries, average number of tokens per entry, and total number of tokens for the training and test portions and for the entire data set.
| Portion | Entries, n (%) | Average tokens per entry, n | Total tokens, n |
| Train | 1966 (85) | 24 | 53,929 |
| Test | 347 (15) | 31 | 10,779 |
| Total | 2313 (100) | 28 | 64,708 |
Figure 1. Code distribution in the data set. Codes to the right of the threshold line were removed for the experiments where a frequency threshold was employed.
Figure 2. Distribution of the percentage of entries in the data set assigned to a particular number of codes.
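The quantities behind Figures 1 and 2 can be sketched as follows: count how often each code occurs, drop codes below a frequency threshold, and tabulate the percentage of entries carrying a given number of codes. This is an illustrative sketch on toy data, not the authors' code. The function names are hypothetical, and the choice to keep entries that lose all their codes after thresholding is an assumption, since the source does not say how such entries were handled.

```python
from collections import Counter


def code_frequencies(entries):
    """Count the number of entries in which each code occurs."""
    return Counter(code for codes in entries for code in set(codes))


def apply_frequency_threshold(entries, min_count):
    """Remove codes that occur in fewer than min_count entries.

    Assumption: entries left without any code are kept, not dropped.
    """
    freq = code_frequencies(entries)
    kept = {code for code, n in freq.items() if n >= min_count}
    return [[code for code in codes if code in kept] for codes in entries]


def codes_per_entry_distribution(entries):
    """Percentage of entries assigned to each number of distinct codes."""
    counts = Counter(len(set(codes)) for codes in entries)
    total = len(entries)
    return {k: 100.0 * n / total for k, n in sorted(counts.items())}


if __name__ == "__main__":
    # Toy entries; the real data set has 2313 entries and uses a threshold of 50.
    toy = [["S1", "S12"], ["S1"], ["S1", "O6"], []]
    print(code_frequencies(toy))
    print(apply_frequency_threshold(toy, min_count=2))
    print(codes_per_entry_distribution(toy))
```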
Average results for the different models on test data with a frequency threshold for the codes (codes occurring at least 50 times).
| Method | Weighted precision | Weighted specificity | Weighted recall | Weighted F1 |
| Binary Relevance (SGDᵃ classifier) | 0.69 | 0.93 | 0.52 | 0.59 |
| BERTje | 0.77 | 0.97 | 0.68 | 0.70 |
| BERTje (domain adaptation) | 0.74 | 0.96 | 0.62 | 0.67 |
ᵃSGD: stochastic gradient descent.
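To make the column headers of the results table concrete, the sketch below computes weighted precision, specificity, recall, and F1 for a multilabel prediction. It assumes that "weighted" means averaging the per-code scores with weights proportional to each code's support (the number of entries that truly carry the code), which is the common convention. The source does not state the exact weighting, in particular for specificity, so this is an assumption, and `weighted_metrics` is a hypothetical name.

```python
def weighted_metrics(y_true, y_pred):
    """Support-weighted per-code metrics.

    y_true, y_pred: lists of sets of codes, one set per entry.
    Assumption: every metric, including specificity, is weighted by the
    number of entries that truly carry the code.
    """
    labels = sorted(set().union(*y_true, *y_pred))
    totals = {"precision": 0.0, "specificity": 0.0, "recall": 0.0, "f1": 0.0}
    total_support = 0
    for label in labels:
        tp = fp = fn = tn = 0
        for truth, pred in zip(y_true, y_pred):
            in_truth, in_pred = label in truth, label in pred
            if in_truth and in_pred:
                tp += 1
            elif in_pred:
                fp += 1
            elif in_truth:
                fn += 1
            else:
                tn += 1
        support = tp + fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / support if support else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        totals["precision"] += support * precision
        totals["specificity"] += support * specificity
        totals["recall"] += support * recall
        totals["f1"] += support * f1
        total_support += support
    if total_support == 0:
        return {name: 0.0 for name in totals}
    return {name: value / total_support for name, value in totals.items()}


if __name__ == "__main__":
    # Toy example with two codes and four entries.
    y_true = [{"S1"}, {"S1", "S12"}, {"S12"}, set()]
    y_pred = [{"S1"}, {"S1"}, {"S1", "S12"}, set()]
    for name, value in weighted_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.2f}")
```

Because most codes are absent from most entries, true negatives dominate the per-code counts, which is one reason specificity is typically much higher than recall in this kind of table.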