Yuqun Zeng, Xusheng Liu, Yanshan Wang, Feichen Shen, Sijia Liu, Majid Rastegar-Mojarad, Liwei Wang, Hongfang Liu.
Abstract
BACKGROUND: Self-management is crucial to diabetes care, and providing expert-vetted content for answering patients' questions is crucial in facilitating patient self-management.
Keywords: education materials; information retrieval; patients; questions; recommendation
Year: 2017 PMID: 29038097 PMCID: PMC5662791 DOI: 10.2196/jmir.7754
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. The workflow of this study.
Figure 2. The workflow of the semantic group-based model. CHEM: chemicals and drugs; CONC: concepts and ideas; CUI: concept unique identifier; DISO: disorders; QDP: questions from diabetic patients; PEM: patient educational materials.
An overview of the two corpora.
| Corpus | Number of documents | Total word count (mapping rate^a) | Word count per document, mean (SD) | Unique word count | Unique UMLS concepts |
| Questions from diabetic patients | 7510 | 829,893 (91.18%) | 110 (36) | 41,820 | 19,616 |
| Patient educational materials | 144 | 139,463 (93.31%) | 968 (115) | 8952 | 7924 |
a Mapping rate is the proportion of the total word count that was mapped to the Unified Medical Language System (UMLS). The difference in mapping rate between the two corpora was statistically significant (P<.001).
Figure 3. Venn diagram of the words in the two corpora. Of the unique words in each corpus, 35,112 (83.96%) appeared only in the questions from diabetic patients (QDP) corpus and 2244 (25.06%) appeared only in the patient educational materials (PEM) corpus.
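The counts behind a two-set Venn diagram like this one can be derived with plain set operations. The sketch below is illustrative only: the function name `vocabulary_overlap` and the toy vocabularies are hypothetical, and the study's actual tokenization and normalization steps are not described in this record.

```python
def vocabulary_overlap(vocab_a, vocab_b):
    """Count words exclusive to each vocabulary and words shared by both.

    Percentages are relative to the size of each vocabulary, matching the
    way the caption reports corpus-specific words.
    """
    only_a = vocab_a - vocab_b
    only_b = vocab_b - vocab_a
    shared = vocab_a & vocab_b
    return {
        "only_a": (len(only_a), round(100 * len(only_a) / len(vocab_a), 2)),
        "only_b": (len(only_b), round(100 * len(only_b) / len(vocab_b), 2)),
        "shared": len(shared),
    }

# Toy vocabularies (hypothetical, not taken from the study data).
qdp_vocab = {"insulin", "pump", "like", "know"}
pem_vocab = {"insulin", "glucose", "meal"}
overlap = vocabulary_overlap(qdp_vocab, pem_vocab)
```

Applied to the reported figures, the same arithmetic implies 41,820 - 35,112 = 6708 words shared by both corpora, which agrees with 8952 - 2244 = 6708 from the PEM side.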
The top 20 words in the two corpora.
| Rank | QDP word | QDP frequency | PEM word | PEM frequency |
| 1 | Diabetes | 9062 | Blood | 3081 |
| 2 | Insulin | 5369 | Insulin | 2504 |
| 3 | Type | 4657 | Glucose | 2074 |
| 4 | Like | 4620 | Diabetes | 1385 |
| 5 | Get | 4457 | Child | 1166 |
| 6 | Time | 4107 | Meal | 1047 |
| 7 | Know | 3875 | Childs | 815 |
| 8 | Pump | 3428 | Care | 801 |
| 9 | Now | 3421 | Health | 797 |
| 10 | Blood | 3388 | Dose | 782 |
| 11 | Day | 3317 | Test | 738 |
| 12 | People | 2789 | Sugar | 728 |
| 13 | First | 2395 | Help | 671 |
| 14 | Sugar | 2383 | Provider | 638 |
| 15 | Go | 2309 | Day | 635 |
| 16 | Back | 2290 | High | 627 |
| 17 | See | 2219 | Evening | 583 |
| 18 | Think | 2148 | Take | 583 |
| 19 | High | 2088 | Time | 571 |
| 20 | Use | 2036 | Eat | 547 |
Category and topic distribution of the two corpora.
| Category/topic^a | n (%) | |
| Questions from diabetic patients (categories) | | |
| Type 2 | 454 (6.0) | |
| Type 1 and LADA | 1609 (21.4) | |
| TuDiabetes website | 97 (1.3) | |
| Treatment | 507 (6.8) | |
| Mental and emotional wellness | 92 (1.2) | |
| Healthy living | 187 (2.5) | |
| Food | 321 (4.3) | |
| Diabetes technology | 1903 (25.4) | |
| Diabetes complications and other conditions | 211 (2.8) | |
| Diabetes and pregnancy | 117 (1.6) | |
| Diabetes advocacy | 253 (3.4) | |
| Community | 1759 (23.4) | |
| Patient educational materials (topics) | | |
| PEM1 | 6 (4.2) | |
| PEM2 | 5 (3.5) | |
| PEM3 | 13 (9.0) | |
| PEM4 | 6 (4.2) | |
| PEM5 | 15 (10.4) | |
| PEM6 | 10 (6.9) | |
| PEM7 | 3 (2.1) | |
| PEM8 | 11 (7.6) | |
| PEM9 | 5 (3.5) | |
| PEM10 | 7 (4.9) | |
| PEM11 | 9 (6.3) | |
| PEM12 | 9 (6.3) | |
| PEM13 | 6 (4.2) | |
| PEM14 | 8 (5.6) | |
| PEM15 | 3 (2.1) | |
| PEM16 | 3 (2.1) | |
| PEM17 | 6 (4.2) | |
| PEM18 | 7 (4.9) | |
| PEM19 | 5 (3.5) | |
| PEM20 | 7 (4.9) | |
a The categories of the questions from diabetic patients corpus are the labels provided by the website, and the topics of the patient educational material (PEM) corpus were generated using latent Dirichlet allocation (LDA) topic modeling. Each document was assigned to the topic with the highest probability in its topic distribution, and the topic proportion is the share of documents assigned to that topic.
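As a minimal sketch of the footnote's "maximum distribution" rule, the snippet below assigns each document to its highest-probability topic and reports the count and percentage per topic. The function name `topic_proportions` and the example distributions are hypothetical. In practice the per-document topic distributions would come from a fitted LDA model, which is not reproduced here.

```python
from collections import Counter

def topic_proportions(doc_topic_dists):
    """Assign each document to its dominant topic and summarize.

    doc_topic_dists: list of per-document topic probability lists.
    Returns {topic_index: (document_count, percent_of_documents)}.
    """
    # Index of the largest probability in each document's distribution.
    dominant = [max(range(len(d)), key=d.__getitem__) for d in doc_topic_dists]
    counts = Counter(dominant)
    total = len(doc_topic_dists)
    return {t: (n, round(100 * n / total, 1)) for t, n in sorted(counts.items())}

# Hypothetical distributions: 4 documents over 3 topics.
example = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
result = topic_proportions(example)
```

With 144 documents, a topic that is dominant in 6 of them gets 100 × 6 / 144 ≈ 4.2%, which matches the PEM1 row.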
Sample topics in the patient educational materials (PEM) corpus.
| PEM group | Top 20 most prominent words (corresponding weight) | Topic |
| PEM2 | Disease (0.071), kidney (0.043), risk (0.037), heart (0.031), health (0.023), pressure (0.021), care (0.018), provider (0.017), factors (0.017), people (0.017), kidneys (0.015), cholesterol (0.012), high (0.011), lifestyle (0.010), levels (0.010), protein (0.010), control (0.009), body (0.008), urine (0.008), medications (0.008) | Complication-kidney |
| PEM8 | Food (0.039), fruit (0.024), cup (0.022), foods (0.022), eat (0.020), sugar (0.020), fat (0.019), carbohydrate (0.017), meal (0.016), plan (0.015), milk (0.015), protein (0.014), carbohydrates (0.013), snack (0.013), vegetables (0.013), grams (0.011), meals (0.011), make (0.011), calories (0.010), serving (0.010) | Food |
| PEM13 | Care (0.024), feet (0.023), problems (0.022), provider (0.020), pain (0.020), health (0.017), term (0.017), symptoms (0.015), peripheral (0.015), website (0.014), nerves (0.013), legs (0.012), system (0.012), neuropathy (0.012), stroke (0.012), walking (0.011), figure (0.011), shoes (0.011), infections (0.009), brain (0.009) | Complication-foot |
Figure 4. Heat map of questions from diabetic patients categories and patient educational materials topics based on cosine similarity of word vectors weighted using TF-IDF or topic word distribution. The clustering is based on Euclidean distance.
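The similarity measure named in this caption can be illustrated with a small standard-library sketch. It assumes a simple TF-IDF variant (relative term frequency times the natural log of inverse document frequency). The weighting scheme actually used in the study is not given in this record, and the helper names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {word: weight} vector per tokenized document.

    Assumed weighting: (count / doc length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        term_counts = Counter(doc)
        vectors.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in term_counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy tokenized documents (hypothetical).
docs = [
    ["insulin", "pump", "dose"],
    ["insulin", "dose", "meal"],
    ["kidney", "risk", "heart"],
]
v0, v1, v2 = tfidf_vectors(docs)
sim_related = cosine(v0, v1)    # documents sharing two words
sim_unrelated = cosine(v0, v2)  # documents sharing no words
```

Documents with no words in common score exactly 0, and a document compared with itself scores 1. A heat map like the one described would be a matrix of such scores between category vectors and topic vectors.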
Figure 5. Distribution of 10 clinical semantic groups in the two corpora: questions from diabetic patients (QDP) and patient educational materials (PEM). ANAT: anatomy; CHEM: chemicals and drugs; DEVI: devices; DISO: disorders; GENE: genes and molecular sequences; LIVB: living beings; OBJC: objects; PHEN: phenomena; PHYS: physiology; PROC: procedures.
Figure 6. (A) Network formed using the topic modeling-based model (TMB) with topic frequency cutoff 1, (B) network formed based on the semantic group-based model (SGB) with semantic group frequency cutoff 1, and (C) a combined network by linking the two networks (TMB+SGB) based on questions.
Figure 7. Precision at rank 1 to 20 for topic modeling-based (TMB), semantic group-based (SGB), and vector space model (VSM) models.
Performance comparison of topic modeling-based, semantic group-based, and vector space model (VSM) models, reported as mean precision at rank k (Pk).
| Model | P1 | P2 | P3 | P4 | P5 | P10 | P20 |
| Topic modeling-based | 0.670 | 0.622 | 0.596 | 0.588 | 0.596 | 0.579 | 0.572 |
| Semantic group-based | 0.628 | 0.606 | 0.585 | 0.582 | 0.581 | 0.564 | 0.547 |
| VSM | 0.543 | 0.532 | 0.532 | 0.529 | 0.528 | 0.528 | 0.531 |
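Precision at rank k, the metric reported in this table, is the fraction of the top k retrieved items that are relevant, averaged over all queries. The sketch below shows the standard computation. The relevance judgments are made up for illustration, and how relevance was assessed in the study is not described in this record.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k ranked items judged relevant.

    relevance: list of booleans ordered by rank (rank 1 first).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k

def mean_precision_at_k(all_relevance, k):
    """Average precision at k over a collection of ranked result lists."""
    return sum(precision_at_k(r, k) for r in all_relevance) / len(all_relevance)

# Hypothetical relevance judgments for two questions, five results each.
judgments = [
    [True, False, True, True, False],
    [False, True, True, False, False],
]
p1 = mean_precision_at_k(judgments, 1)
p3 = mean_precision_at_k(judgments, 3)
```

Here `p1` averages 1/1 and 0/1, and `p3` averages 2/3 and 2/3. The table values read the same way: a P1 of 0.670 means the top-ranked material was relevant for about two thirds of the evaluated questions.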