| Literature DB >> 35040795 |
Adrian Ahne (1,2), Guy Fagherazzi (3), Xavier Tannier (4), Thomas Czernichow (2), Francisco Orchard (2).
Abstract
BACKGROUND: The amount of available textual health data, such as scientific and biomedical literature, is constantly growing, making it increasingly challenging for health professionals to properly summarize these data and practice evidence-based clinical decision making. Moreover, exploring unstructured health text data is difficult for professionals without computer science knowledge because of limited time, resources, and skills. Current tools for exploring text data lack ease of use, require high computational effort, and struggle to incorporate domain knowledge and to focus on topics of interest.
Keywords: active learning; classification; clinical decision making; clinical decision support; digital health; evidence-based medicine; hierarchical clustering; medical informatics; memory consumption; natural language processing; transparency
Year: 2022 PMID: 35040795 PMCID: PMC8808347 DOI: 10.2196/27434
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Overview of user interaction with the visual interface. SVM: support vector machine.
Figure 2. Iterative user interaction via the user interface, following the 3 steps of exploring, annotating, and reiterating. For simplicity, no further classifiers are created after iteration 1; in a real-case scenario, a user usually defines several classifiers in the first iterations.
Figure 3. Classification and clustering tree after several iterations.
Figure 4. Real clustering example of a node, showing the headwords of its children. For each child, 3 sample abstract titles are provided. Note: due to limited space, only the titles and not the entire abstracts are shown.
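The headwords shown for each child cluster in Figure 4 can be approximated by ranking terms by TF-IDF within the cluster. The sketch below is an illustrative, plain-Python approximation (whitespace tokenization, no stop-word handling) and is not the authors' implementation; the function name and toy titles are assumptions:

```python
import math
from collections import Counter

def tfidf_headwords(cluster_docs, all_docs, k=3):
    """Return the k most characteristic terms (headwords) of a cluster,
    ranked by term frequency in the cluster times inverse document
    frequency over the whole collection."""
    n = len(all_docs)
    # Document frequency of each term over the whole collection
    df = Counter()
    for doc in all_docs:
        df.update(set(doc.lower().split()))
    # Term frequency inside the cluster
    tf = Counter()
    for doc in cluster_docs:
        tf.update(doc.lower().split())
    total = sum(tf.values())
    scores = {w: (c / total) * math.log(n / df[w]) for w, c in tf.items()}
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]]

# Toy abstract titles; the first two form one child cluster
titles = [
    "diabetic foot ulcer management",
    "diabetic retinopathy screening study",
    "insulin pump therapy trial",
]
print(tfidf_headwords(titles[:2], titles))
```

Terms shared across the whole collection (here "diabetic") get a low IDF and are pushed out of the headword list, so the headwords highlight what distinguishes the cluster.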
Figure 5. Visual user interface, where colored circles represent user-defined topics (classifiers). Clicking on a node zooms into it and shows its documents on the bottom left. The headwords of each node are shown in the white circles.
Figure 6. In the active learning strategy, the positive tree is the subtree under the classifier node "type 1," and the negative tree is the subtree under its clustering sibling. On the left side, a sample of the document selection process is shown.
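The document selection in Figure 6 follows the standard uncertainty-sampling idea: among unlabeled documents, pick those whose classifier score lies closest to the decision boundary. A minimal generic sketch (not the authors' code; the margin scores are assumed to come from a linear classifier such as the SVM of Figure 1):

```python
def uncertainty_sample(margin_scores, k):
    """Return the indices of the k documents whose signed distance to the
    decision boundary (margin score) is smallest in absolute value, i.e.
    the documents the classifier is least certain about."""
    return sorted(range(len(margin_scores)), key=lambda i: abs(margin_scores[i]))[:k]

# Toy margin scores, e.g. SVM decision-function outputs for 5 documents
scores = [-2.1, 0.08, 1.7, -0.02, 0.9]
print(uncertainty_sample(scores, 2))  # -> [3, 1]: the two most ambiguous documents
```

Annotating these boundary documents typically improves the classifier faster than annotating randomly chosen ones, which is what the Random versus uncertainty-sampling columns of the results table compare.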
Diabetes-related MeSHa codes with the number of documents per MeSH code.
| MeSH term (code) | N |
| --- | --- |
| Diabetes mellitus (C19.246) | |
| Diabetes complications (C19.246.099) | 5000 |
| Diabetic angiopathies (C19.246.099.500) | 3026 |
| Diabetic foot (C19.246.099.500.191) | 4424 |
| Diabetic retinopathy (C19.246.099.500.382) | 5000 |
| Diabetic cardiomyopathies (C19.246.099.625) | 386 |
| Diabetic coma (C19.246.099.750) | 97 |
| Hyperglycemic hyperosmolar nonketotic coma (C19.246.099.750.490) | 97 |
| Diabetic ketoacidosis (C19.246.099.812) | 1308 |
| Diabetic nephropathies (C19.246.099.875) | 5000 |
| Diabetic neuropathies (C19.246.099.937) | 3662 |
| Diabetic foot (C19.246.099.937.250) | 4424 |
| Fetal macrosomia (C19.246.099.968) | 1282 |
| Diabetes, gestational (C19.246.200) | 5000 |
| Diabetes mellitus, experimental (C19.246.240) | 5000 |
| Diabetes mellitus, type 1 (C19.246.267) | 5000 |
| Wolfram syndrome (C19.246.267.960) | 228 |
| Diabetes mellitus, type 2 (C19.246.300) | 5000 |
| Diabetes mellitus, lipoatrophic (C19.246.300.500) | 85 |
| Donohue syndrome (C19.246.537) | 39 |
| Latent autoimmune diabetes in adults (C19.246.656) | 16 |
| Prediabetic state (C19.246.774) | 1261 |
aMeSH: Medical Subject Headings.
Weighted average of active learning performance over all Medical Subject Headings codes.
| # training data | Random | | | | Uncertainty sampling | | | | Feedback Explorer | | | | CNNa Zhang | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | Accb | Precc | Recd | F1e | Acc | Prec | Rec | F1 | Acc | Prec | Rec | F1 | Acc | Prec | Rec | F1 |
| 50 | 0.87 | 0.62 | 0.57 | 0.51 | 0.83 | 0.56 | 0.60 | 0.50 | 0.88 | 0.63 | 0.44 | 0.49 | 0.81 | 0.24 | 0.31 | 0.20 |
| 100 | 0.86 | 0.62 | 0.51 | 0.49 | 0.88 | 0.68 | 0.64 | 0.62 | 0.90 | 0.71 | 0.51 | 0.56 | 0.86 | 0.39 | 0.59 | 0.42 |
| 150 | 0.88 | 0.68 | 0.46 | 0.47 | 0.90 | 0.75 | 0.62 | 0.63 | 0.90 | 0.75 | 0.59 | 0.60 | 0.88 | 0.52 | 0.72 | 0.55 |
| 200 | 0.89 | 0.62 | 0.43 | 0.45 | 0.91 | 0.77 | 0.53 | 0.61 | 0.91 | 0.71 | 0.58 | 0.62 | 0.90 | 0.58 | 0.79 | 0.63 |
aCNN: convolutional neural network.
bAcc: accuracy.
cPrec: precision.
dRec: recall.
eF1: F1 score.
Figure 7. Memory consumption and execution times per volume of documents.