Jocelyn Dunstan, Fabián Villena, Jorge Pérez, René Lagos
Abstract
BACKGROUND: In Chile, a patient who needs a specialty consultation or surgery must first be referred by a general practitioner and is then placed on a waiting list. The Explicit Health Guarantees program (GES, from the Spanish Garantías Explícitas en Salud) sets by law the maximum time within which 85 health problems must be resolved. A health professional usually verifies manually whether each referral, written in natural language, corresponds to a GES-covered disease. A misclassification is catastrophic for patients, because it places them on a non-prioritized waiting list characterized by prolonged waiting times.
Keywords: Decision support systems; Machine learning; Natural language processing; Neural networks (computer); Waiting lists
Year: 2021 PMID: 34210317 PMCID: PMC8252255 DOI: 10.1186/s12911-021-01565-z
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Diagram of the classification process. The input is the whole waiting list (WL). After the GES cases marked by the human are removed, the classification platform checks the remaining non-GES WL to make sure it contains no GES cases, which must be prioritized by law. The left panel shows the frontend and the right panel the backend.
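The two-step check in Fig. 1 can be sketched as a filter. First, the cases a human already marked as GES are dropped. Then every remaining non-GES referral the model scores as GES is flagged as a conflict. This is a minimal illustration, not the platform's code: `Referral`, `find_conflicts`, the `ges_probability` callable and the 0.5 `threshold` are all hypothetical stand-ins for the trained classifier and its decision rule.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Referral:
    # Hypothetical record for one waiting-list (WL) entry.
    referral_id: str
    text: str             # free-text referral written by the general practitioner
    human_says_ges: bool  # label assigned by the health professional


def find_conflicts(
    waiting_list: List[Referral],
    ges_probability: Callable[[str], float],  # hypothetical: stands in for the trained model
    threshold: float = 0.5,                   # hypothetical decision threshold
) -> List[Referral]:
    """Return entries the human left as non-GES but the model scores as GES."""
    # Step 1: remove the cases the human already marked as GES.
    non_ges = [r for r in waiting_list if not r.human_says_ges]
    # Step 2: re-check the remaining non-GES list with the classifier.
    return [r for r in non_ges if ges_probability(r.text) >= threshold]


if __name__ == "__main__":
    # Toy keyword scorer used only to exercise the filter; not a real model.
    toy_scorer = lambda text: 0.9 if "cataract" in text else 0.1
    wl = [
        Referral("1", "cataract surgery", True),
        Referral("2", "cataract in left eye", False),
        Referral("3", "knee pain", False),
    ]
    print([r.referral_id for r in find_conflicts(wl, toy_scorer)])  # → ['2']
```

Only the conflicting entries reach the user. This matches the platform's aim of catching GES cases that ended up on the non-prioritized list.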
Extrinsic evaluation of Word2vec embeddings with identical hyperparameters but different training corpora
| Training corpus | Vocabulary size (tokens) | ROC AUC |
|---|---|---|
| General dataset | 57,112 | 0.94 |
| Biomedical literature | 183,766 | 0.90 |
| General Spanish language | 1,000,653 | 0.90 |
Performance of machine learning models
| Model | ROC AUC (SD) |
|---|---|
| Logistic regression | 0.91 (7.8e-4) |
| Support vector machine | 0.95 (5.4e-4) |
| Random forest | 0.96 (5.2e-4) |
| Multilayer perceptron | 0.95 (5.9e-4) |
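ROC AUC, the metric in the two tables above, equals the probability that a randomly chosen positive (GES) referral receives a higher score than a randomly chosen negative one, with ties counting as one half. The sketch below computes it directly from that pairwise definition. It illustrates the metric only and is not the authors' evaluation code. Libraries such as scikit-learn (`sklearn.metrics.roc_auc_score`) compute the same quantity more efficiently.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. This is O(n_pos * n_neg), fine for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = GES, 0 = non-GES) and model scores; not data from the study.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # → 0.75
```

In the toy example, three of the four positive/negative pairs are ordered correctly, which gives 0.75.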
Best hyperparameters for the random forest, together with the hyperparameter grid searched
| Hyperparameter | Best value | Hyperparameter grid |
|---|---|---|
| Number of estimators | 1600 | 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000 |
| Minimum samples split | 5 | 2, 5, 10 |
| Max features | sqrt | sqrt, log2 |
| Max depth | 100 | 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110 |
| Bootstrap | True | True, False |
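The grid above can be written as a parameter dictionary and enumerated to see how many configurations it contains. The keys use scikit-learn `RandomForestClassifier` parameter names (`n_estimators`, `min_samples_split`, `max_features`, `max_depth`, `bootstrap`). That mapping is an assumption, because the source does not name the library. The source also does not say whether the full grid was searched exhaustively or sampled.

```python
from itertools import product

# Grid copied from the table; keys assume scikit-learn RandomForestClassifier names.
param_grid = {
    "n_estimators": [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2"],
    "max_depth": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110],
    "bootstrap": [True, False],
}

# Best configuration reported in the table.
best = {"n_estimators": 1600, "min_samples_split": 5, "max_features": "sqrt",
        "max_depth": 100, "bootstrap": True}


def all_configs(grid):
    """Yield every combination of the grid values as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(all_configs(param_grid))
print(len(configs))  # 10 * 3 * 2 * 11 * 2 = 1320
assert best in configs
```

In scikit-learn, passing such a dictionary to `GridSearchCV` fits every combination for each cross-validation fold. `RandomizedSearchCV` instead samples a fixed number of combinations, which is often preferred for grids of this size.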
Performance of the random forest classifier on the testing dataset and on the ground-truth dataset constructed from human classifications
| Dataset | Class | Precision | Recall | F1-score | Number of examples |
|---|---|---|---|---|---|
| Testing | GES | 0.67 | 0.90 | 0.77 | 37,502 |
| Testing | non-GES | 0.98 | 0.91 | 0.94 | 173,011 |
| Testing | Weighted average | 0.92 | 0.91 | 0.91 | 210,513 |
| Ground truth | GES | 0.92 | 0.55 | 0.69 | 260 |
| Ground truth | non-GES | 0.85 | 0.98 | 0.91 | 681 |
| Ground truth | Weighted average | 0.87 | 0.86 | 0.85 | 941 |
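The per-class F1-scores and the "Weighted average" rows can be checked from the other columns. F1 is the harmonic mean of precision and recall. Each weighted average weights the per-class values by the number of examples in that class. The sketch below uses the ground-truth rows of the table and reproduces the reported 0.69, 0.87, 0.86 and 0.85. The helper names are illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def weighted_average(values, supports):
    """Average per-class values weighted by the number of examples in each class."""
    total = sum(supports)
    return sum(v * n for v, n in zip(values, supports)) / total


# Ground-truth rows from the table, ordered as (GES, non-GES).
precision = [0.92, 0.85]
recall = [0.55, 0.98]
support = [260, 681]

per_class_f1 = [f1(p, r) for p, r in zip(precision, recall)]
print(round(per_class_f1[0], 2))                          # GES F1 → 0.69
print(round(weighted_average(precision, support), 2))     # → 0.87
print(round(weighted_average(recall, support), 2))        # → 0.86
print(round(weighted_average(per_class_f1, support), 2))  # weighted F1 → 0.85
```

The weighted F1-score averages the per-class F1-scores. It is not the F1 of the weighted precision and recall, so it can fall below both, as it does here. The low GES recall (0.55) pulls it down.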
Expert performance on the ground-truth dataset
| Expert | Weighted precision | Weighted recall | Weighted F1-score |
|---|---|---|---|
| 1 | 0.96 | 0.96 | 0.96 |
| 2 | 0.95 | 0.94 | 0.94 |
| 3 | 0.95 | 0.95 | 0.95 |
| Average | 0.95 | 0.95 | 0.95 |
Fig. 2 ROC curves for the human classification and for the best machine learning model on the testing dataset and on the ground-truth dataset. The area under the curve (AUC) is also shown in the figure.
Fig. 3 Distribution of undetected GES cases (false negatives) in the ground-truth dataset
Fig. 4 Wireframe representation of the platform. (1) Webpage for uploading the spreadsheet in Microsoft Excel format; this spreadsheet contains both the GES and the non-GES waiting lists. (2) Webpage showing the spreadsheet currently being processed by the backend. (3) When conflicts are found, the user can resolve each one manually by indicating whether the user or the machine is right. At the end of this stage, the user can download the corrected spreadsheet.
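Step (3) of the wireframe, in which the user settles each conflict and then downloads a corrected spreadsheet, can be sketched as applying one decision per flagged referral. Referral IDs stand in for spreadsheet rows here, and no Excel file is read. The `Decision` enum and `apply_decisions` function are hypothetical illustrations, not the platform's API.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Decision(Enum):
    HUMAN_RIGHT = "human"      # keep the referral on the non-GES waiting list
    MACHINE_RIGHT = "machine"  # move the referral to the GES waiting list


def apply_decisions(
    non_ges_ids: List[str],
    ges_ids: List[str],
    decisions: Dict[str, Decision],
) -> Tuple[List[str], List[str]]:
    """Return corrected (non_ges, ges) ID lists after the user resolves each conflict."""
    moved = {rid for rid, d in decisions.items() if d is Decision.MACHINE_RIGHT}
    new_non_ges = [rid for rid in non_ges_ids if rid not in moved]
    new_ges = ges_ids + [rid for rid in non_ges_ids if rid in moved]
    return new_non_ges, new_ges


# Toy example: the user sides with the machine on "2" and with the human on "3".
print(apply_decisions(["2", "3", "4"], ["1"],
                      {"2": Decision.MACHINE_RIGHT, "3": Decision.HUMAN_RIGHT}))
# → (['3', '4'], ['1', '2'])
```

Referrals that were never flagged, such as "4" in the toy example, stay where the human placed them. Only conflicts the user attributes to the machine change lists.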