Yingjie Lu, Pengzhu Zhang, Jingfang Liu, Jia Li, Shasha Deng.
Abstract
Recently, health-related social media services, especially online health communities, have rapidly emerged. Patients with various health conditions participate in online health communities to share their experiences and exchange healthcare knowledge. Exploring hot topics in online health communities helps us better understand patients' needs and interest in health-related knowledge. However, the statistical topic analysis employed in previous studies is becoming impractical for processing the rapidly increasing amount of online data. Automatic topic detection based on document clustering is an alternative approach for extracting health-related hot topics in online communities. In addition to the keyword-based features used in traditional text clustering, we integrate medical domain-specific features to represent the messages posted in online health communities. Three disease discussion boards from an online health community, devoted to lung cancer, breast cancer and diabetes, are used to test the effectiveness of topic detection. Experimental results demonstrate that health-related hot topics primarily include symptoms, examinations, drugs, procedures and complications. Further analysis reveals significant differences among the hot topics discussed on different types of disease discussion boards.
Year: 2013 PMID: 23457530 PMCID: PMC3574139 DOI: 10.1371/journal.pone.0056221
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Data collection statistics.
| Disease type | Messages | Members | Messages per member | Time span |
| Lung cancer | 4,728 | 1,928 | 2.45 | March 2004–March 2012 |
| Breast cancer | 65,856 | 16,100 | 4.09 | March 2004–March 2012 |
| Diabetes | 25,509 | 8,169 | 3.12 | March 2004–March 2012 |
Figure 1. The design of the topic analysis method.
The UMLS semantic types used.
| Abbr. | Semantic Types | Abbr. | Semantic Types |
| aapp | Amino Acid, Peptide, or Protein | lbpr | Laboratory Procedure |
| acab | Acquired Abnormality | imft | Immunologic Factor |
| anab | Anatomical Abnormality | inpo | Injury or Poisoning |
| bdsy | Body System | mobd | Mental or Behavioral Dysfunction |
| blor | Body Location or Region | neop | Neoplastic Process |
| bmod | Biomedical Occupation or Discipline | orch | Organic Chemical |
| bpoc | Body Part, Organ, or Organ Component | patf | Pathologic Function |
| diap | Diagnostic Procedure | phsu | Pharmacologic Substance |
| dsyn | Disease or Syndrome | sosy | Sign or Symptom |
| horm | Hormone | topp | Therapeutic or Preventive Procedure |
Features adopted in the method.
| Feature category | Features |
| Keyword-based | Frequency of unigram words |
| | Frequency of bigram words |
| | Frequency of trigram words |
| Medical domain-specific | Frequency of medical terms |
| | Frequency of the semantic types |
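The two feature categories in the table can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the function names and the tiny `TOY_LEXICON` are hypothetical, and a real system would map terms to UMLS semantic types with a dedicated tool rather than a hand-written dictionary. The semantic type abbreviations are taken from the UMLS table above.

```python
from collections import Counter
import re

# Hypothetical toy lexicon: medical term -> UMLS semantic type abbreviation.
# Illustrative entries only; not the resource used in the paper.
TOY_LEXICON = {
    "chest pain": "sosy",
    "cough": "sosy",
    "biopsy": "diap",
    "chemotherapy": "topp",
    "pneumonia": "dsyn",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(tokens, n):
    """Count contiguous n-grams, with the words of each n-gram joined by spaces."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def extract_features(text, lexicon=TOY_LEXICON):
    """Return keyword-based and medical domain-specific frequency features."""
    tokens = tokenize(text)
    features = {
        "unigram": ngram_counts(tokens, 1),
        "bigram": ngram_counts(tokens, 2),
        "trigram": ngram_counts(tokens, 3),
    }

    # Medical terms: any 1- to 3-gram that appears in the lexicon.
    terms = Counter()
    for n in (1, 2, 3):
        for gram, count in ngram_counts(tokens, n).items():
            if gram in lexicon:
                terms[gram] += count
    features["medical_term"] = terms

    # Semantic types: term frequencies aggregated by semantic type.
    semantic_types = Counter()
    for term, count in terms.items():
        semantic_types[lexicon[term]] += count
    features["semantic_type"] = semantic_types
    return features


if __name__ == "__main__":
    message = "Chest pain and cough after chemotherapy; the cough got worse."
    for name, counts in extract_features(message).items():
        print(name, dict(counts))
```

Each message then yields five frequency vectors, one per row of the table, which could be concatenated into a single representation before clustering.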
Figure 2. Log-likelihood versus iteration number for the clustering of lung cancer, diabetes and breast cancer.
Key phrases extracted from lung cancer discussion boards.
| Cluster | Label | Key phrases | UMLS semantic types |
| 1 | Symptom | pain, symptoms, cough, breathless, chest pain, painful, shortness of breath, coughing up blood, short of breath, wheezing, nausea | sosy |
| 2 | Complication | pneumonia, infection, tuberculosis, bronchitis, asthma, COPD, pleural effusion, emphysema, atelectasis, collapsed lung | dsyn, patf |
| 3 | Examination | cat scan, biopsy, X-ray, pet scan, chest X-ray, scans, MRI, bronchoscopy, imaging, biopsy needle | diap |
| 4 | Procedure | chemo, radiation, chemotherapy, lobectomy, operation, therapy, surgery, removal, radiation therapy, wedge resection | topp |
| 5 | Drug | silicas, tarceva, morphine, chantix, carboplatin, coumadin, alimta, advil, taxol, dilaudid | phsu |
Key phrases extracted from breast cancer discussion boards.
| Cluster | Label | Key phrases | UMLS semantic types |
| 1 | Examination | biopsy, mammogram, ultrasound, MRI, BI-RADS, biopsy needle, core biopsy, cat scan, imaging, screening | diap, lbpr |
| 2 | Procedure | chemo, radiation, mastectomy, lumpectomy, chemotherapy, implant, removal, operation, radiotherapy, surgical | topp |
| 3 | Symptom | pain, painful, sore, nipple discharge, breast pain, itching, itchy, tingling, hot flashes, nausea | sosy |
| 4 | Drug | tamoxifen, arimidex, femara, taxol, taxotere, effexor, carboplatin, raloxifene, valium, docetaxel | phsu |
| 5 | Complication | infection, lymph edema, rash, fibrocystic breast, mastitis, IDC, eczema, complex cyst, complex cysts, paget’s disease, neuropathy, fibrocystic disease, fibrocystic breast disease | dsyn |
Key phrases extracted from diabetes discussion boards.
| Cluster | Label | Key phrases | UMLS semantic types |
| 1 | Drug | insulin, lantus, metformin, januvia, glucophage, actos, marihuana, avandia, glipizide, amaryl | phsu |
| 2 | Complication | hypoglycaemia, low blood sugar, infection, DKA, PCOS, BGs, coma, kidney disease, obesity, diabetic neuropathy | dsyn, patf |
| 3 | Symptom | pain, tired, thirsty, nausea, fatigue, tingling, frequent urination, hungry, sore, dizzy, itchy | sosy |
| 4 | Examination | blood test, fasting test, glucose test, fasting blood sugar, hemoglobin A1c test, glucose tolerance test, cat scan, GTTS, MRI | lbpr, diap |
| 5 | Procedure | infusion, injection, transplant, therapy, dialysis, CDE, RX, amputation, insulin injection, ect | topp |
Performance measures using different feature sets.
| Disease | Feature Set | Rand | Jaccard | FM |
| Lung Cancer | F1 | 0.762 | 0.284 | 0.460 |
| | F1+F2 | | | |
| Breast Cancer | F1 | 0.741 | 0.220 | 0.361 |
| | F1+F2 | | | |
| Diabetes | F1 | 0.752 | 0.246 | 0.395 |
| | F1+F2 | | | |
(Note: F1 denotes the keyword-based features and F2 the medical domain-specific features.)
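For readers unfamiliar with the three measures, the sketch below computes them from pair counts. It assumes the standard pair-counting definitions of the Rand index, the Jaccard coefficient and the Fowlkes-Mallows (FM) index, and that reference topic labels are available for each message; the record itself does not spell out the formulas, and the function name `pair_counting_scores` is hypothetical.

```python
from itertools import combinations
from math import sqrt


def pair_counting_scores(true_labels, pred_labels):
    """Return (rand, jaccard, fm) for two labelings of the same items.

    Every pair of items falls into one of four groups:
      ss: same class in the reference, same cluster in the prediction
      sd: same class, different clusters
      ds: different classes, same cluster
      dd: different classes, different clusters
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")

    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            ss += 1
        elif same_true:
            sd += 1
        elif same_pred:
            ds += 1
        else:
            dd += 1

    total = ss + sd + ds + dd
    rand = (ss + dd) / total if total else 0.0
    jaccard = ss / (ss + sd + ds) if (ss + sd + ds) else 0.0
    fm_denominator = sqrt((ss + sd) * (ss + ds))
    fm = ss / fm_denominator if fm_denominator else 0.0
    return rand, jaccard, fm


if __name__ == "__main__":
    reference = [0, 0, 0, 1, 1, 1]
    predicted = [0, 0, 1, 1, 1, 1]
    print(pair_counting_scores(reference, predicted))
```

All three measures lie between 0 and 1 and increase as the clustering agrees more closely with the reference labels. The Rand index also credits pairs that are correctly kept apart, which is one reason it tends to be higher than the other two when there are several clusters.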
Figure 3. The distribution of hot topics for the discussion boards of lung cancer, breast cancer and diabetes.