Shaodian Zhang1, Tian Kang1, Lin Qiu2, Weinan Zhang2, Yong Yu2, Noémie Elhadad1.
Abstract
A large number of patients discuss treatments in online health communities (OHCs). One research question of interest to health researchers is whether treatments discussed in OHCs are eventually used by community members in their real lives. In this paper, we rely on machine learning methods to automatically identify the attribution of treatment mentions in an online autism community, where parents exchange support for the care of their children with autism spectrum disorder. Our methods distinguish discussions of treatments associated with patients, caregivers, and others, and identify whether a treatment was actually taken. We investigate treatments that are not just discussed but also used by patients through two types of content analysis, cross-sectional and longitudinal. The treatments identified through our content analysis help create a catalogue of real-world treatments. The results of this study lay the foundation for future research comparing real-world drug usage with established clinical guidelines.
Keywords: Autism; Conditional Random Fields; Natural Language Processing; Online Health Community; Treatment
Year: 2017 PMID: 28736777 PMCID: PMC5516208 DOI: 10.1145/3038912.3052661
Source DB: PubMed Journal: Proc Int World Wide Web Conf
Descriptive statistics of the ASD dataset
| Number of sub-forums | 16 |
| Number of threads | 61,817 |
| Number of posts | 551,029 |
| Number of authors | 10,210 |
Attribution labels for treatment mentions, their descriptions, and their counts in the dataset.
| Attribution label | Description | Count |
|---|---|---|
| pt | Mention of a treatment that indicates actual usage or usage history by the patients of interest, usually children of the users in this particular study. | 1830 |
| pt-gen | Mention of a treatment tied to the patient that does not indicate actual usage. | 434 |
| cg | Mention of a treatment tied to the caregiver of the patient, usually the user herself. | 95 |
| others | Mention of a treatment tied to specific individuals other than the caregiver or the patient. These can be other members of the community or other people in the author’s real life. | 210 |
| gen | Mention not tied to a specific individual. | 1635 |
System performance for binary treatment mention detection with different types of features.
| Features | Precision | Recall | F |
|---|---|---|---|
| Baseline | 78.2 | 56.5 | 65.6 |
| lexical | 82.1 | 83.6 | 82.9 |
| lexical+semantic | 81.4 | 83.8 | 82.6 |
| lexical+semantic+syntactical | 81.0 | 83.5 | 82.2 |
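
The F column in the table above can be sanity-checked from the precision and recall columns. The sketch below assumes that F is the balanced F score (the harmonic mean of precision and recall); the source does not state the exact variant, and the function name `f_score` is ours.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F score: harmonic mean of precision and recall.

    Assumption: the table reports this balanced variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) copied from the table above.
rows = {
    "Baseline": (78.2, 56.5),
    "lexical": (82.1, 83.6),
    "lexical+semantic": (81.4, 83.8),
    "lexical+semantic+syntactical": (81.0, 83.5),
}

# Recompute F for each row, rounded to one decimal like the table.
recomputed = {name: round(f_score(p, r), 1) for name, (p, r) in rows.items()}
```

Because the inputs are already rounded to one decimal place, a recomputed value can differ from the reported one by about 0.1, as happens for the lexical row.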
System performance (F score) for joint treatment detection and attribution classification with different types of features. The baseline systems use dictionary matching for term identification, and logistic regression with corresponding features for attribution classification. The baseline+ systems use CRF for term identification, and logistic regression with corresponding features for attribution classification.
| Features | micro | cg | gen | others | pt | pt-gen |
|---|---|---|---|---|---|---|
| baseline (lexical) | 33.4 | 9.8 | 31.2 | 29.1 | 45.0 | 13.4 |
| baseline+ (lexical) | 52.7 | 17.6 | 54.4 | 34.8 | 60.8 | 15.6 |
| crf (lexical) | 55.4 | 18.2 | 56.0 | 37.0 | 61.6 | 19.2 |
| baseline (lexical+semantic) | 34.1 | 10.2 | 31.1 | 29.0 | 45.6 | 14.5 |
| baseline+ (lexical+semantic) | 52.9 | 17.5 | 55.1 | 35.4 | 60.9 | 17.0 |
| crf (lexical+semantic) | 56.1 | 18.1 | 57.4 | 36.8 | 61.7 | 20.9 |
| baseline (lexical+semantic+syntax) | 35.0 | 14.6 | 29.9 | 31.2 | 44.1 | 15.4 |
| baseline+ (lexical+semantic+syntax) | 60.3 | 17.9 | 62.8 | 45.5 | 62.9 | 30.0 |
| crf (lexical+semantic+syntax) | 62.3 | 18.2 | 64.1 | 51.6 | 66.8 | 34.1 |
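
To make the dictionary-matching step of the baseline systems concrete, here is a minimal sketch of greedy longest-match lookup over tokens. It is an illustration under our own assumptions, not the authors' implementation: the mini-dictionary `TREATMENT_TERMS` (terms borrowed from the tables in this document), the simple regex tokenizer, and the `B-TRT`/`I-TRT`/`O` tag names are all hypothetical.

```python
import re

# Hypothetical mini-dictionary; a real system would use a much larger lexicon.
TREATMENT_TERMS = ["speech therapy", "cod liver oil", "melatonin",
                   "probiotics", "chelation"]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def dictionary_match(tokens, terms):
    """Tag tokens with B-TRT/I-TRT/O using greedy longest-match lookup."""
    # Try longer terms first so "cod liver oil" wins over any shorter entry.
    candidates = sorted((t.split() for t in terms), key=len, reverse=True)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for cand in candidates:
            n = len(cand)
            if tokens[i:i + n] == cand:
                tags[i] = "B-TRT"
                for j in range(i + 1, i + n):
                    tags[j] = "I-TRT"
                i += n
                break
        else:
            # No dictionary term starts at this token.
            i += 1
    return tags


tokens = tokenize("We started melatonin and speech therapy last year.")
tags = dictionary_match(tokens, TREATMENT_TERMS)
```

Exact matching like this misses misspellings and unseen variants, which is one plausible reason a dictionary baseline would show lower recall than a learned sequence model.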
System performance (CRF) for mentions with pt attribution with different types of features, when all other types of attributions are merged into one as non-pt.
| Features | Prec. (pt) | Rec. (pt) | F (pt) |
|---|---|---|---|
| lexical | 61.0 | 56.6 | 58.7 |
| lexical+semantic | 63.0 | 58.9 | 60.9 |
| lexical+semantic+syntax | 68.7 | 64.8 | 66.7 |
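
The merged evaluation above can be illustrated with a small sketch: every label other than pt is collapsed into non-pt before precision and recall are computed for the pt class. The toy gold and predicted labels are invented for illustration, and the sketch assumes gold and predicted mentions are already aligned one to one, which sidesteps the mention detection step evaluated in the source.

```python
def merge_to_binary(label):
    """Collapse the five attribution labels into pt versus non-pt."""
    return "pt" if label == "pt" else "non-pt"


def precision_recall(gold, pred, positive="pt"):
    """Precision and recall for the positive class over aligned label lists."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy labels, one per aligned treatment mention.
gold = ["pt", "pt-gen", "cg", "pt", "gen", "others"]
pred = ["pt", "pt", "gen", "pt-gen", "gen", "others"]

gold_bin = [merge_to_binary(label) for label in gold]
pred_bin = [merge_to_binary(label) for label in pred]
precision, recall = precision_recall(gold_bin, pred_bin)
```

After merging, confusions among the non-pt labels (such as cg predicted as gen) no longer count as errors; only mistakes across the pt boundary do.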
Top 10 treatments, with number of mentions, for each of the five attribution classes, identified in the entire data set.
| Term | Frequency | Term | Frequency |
|---|---|---|---|
| chelation | 4935 | chelation | 1259 |
| probiotics | 2498 | probiotics | 389 |
| zinc | 2011 | chelating | 210 |
| enzymes | 1705 | speech therapy | 99 |
| melatonin | 1425 | probiotic | 98 |
| special education | 1287 | activated charcoal | 77 |
| antibiotics | 1283 | nystatin | 75 |
| speech therapy | 1245 | melatonin | 73 |
| early intervention | 1061 | calcium | 70 |
| magnesium | 889 | early intervention | 66 |

| Term | Frequency | Term | Frequency |
|---|---|---|---|
| chelation | 16 | probiotics | 424 |
| progesterone | 7 | chelation | 408 |
| probiotics | 7 | probiotic | 163 |
| cod liver oil | 5 | chelating | 150 |
| chelator | 4 | melatonin | 121 |
| cab | 4 | enzymes | 117 |
| molybdenum glycinate chelate | 4 | zinc | 114 |
| sensory integration | 4 | risperdal | 80 |
| aloe vera | 3 | charcoal | 77 |
| pyridoxine hydrochloride | 3 | homeopathy | 76 |

| Term | Frequency |
|---|---|
| chelation | 8341 |
| vitamin | 1418 |
| early intervention | 1268 |
| probiotics | 1267 |
| special education | 1153 |
| chelator | 910 |
| vitamins | 886 |
| melatonin | 877 |
| homeopathy | 862 |
| thimerosal | 801 |
Figure 1. Distribution of the number of users by number of treatments used. The x axis is the number of used treatments identified, and the y axis is the number of users.
Top 10 treatments by number of users, identified in the ASD data set.
| Term | Number of users |
|---|---|
| probiotics | 819 |
| speech therapy | 565 |
| chelation | 520 |
| early intervention | 475 |
| special education | 395 |
| melatonin | 391 |
| antibiotics | 381 |
| enzymes | 352 |
| zinc | 332 |
| vitamins | 283 |
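
The user-level counts in Figure 1 and in the table above can be derived from labeled mentions roughly as sketched below. This is an illustration under assumptions: the `mentions` records are invented, and we assume that a treatment counts as used by a user when that user has at least one mention of it labeled pt, counted once per user.

```python
from collections import Counter, defaultdict

# Invented records: (user id, treatment term, attribution label).
mentions = [
    ("u1", "probiotics", "pt"),
    ("u1", "probiotics", "pt"),
    ("u1", "melatonin", "pt"),
    ("u2", "probiotics", "pt"),
    ("u2", "chelation", "gen"),
    ("u3", "speech therapy", "pt"),
]

# Distinct treatments each user is assumed to have used (pt mentions only).
used_by_user = defaultdict(set)
for user, term, label in mentions:
    if label == "pt":
        used_by_user[user].add(term)

# Number of distinct users per treatment (the quantity ranked in the table).
users_per_treatment = Counter(
    term for terms in used_by_user.values() for term in terms
)

# Number of users per count of used treatments (the quantity in Figure 1).
users_by_num_treatments = Counter(len(terms) for terms in used_by_user.values())
```

Using sets means repeated mentions by the same user do not inflate the user count, which is why these numbers can rank treatments differently from raw mention frequencies.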
Figure 2. Changes in frequency (mentions per post) of the top five treatments in autism communities, as a function of time since members joined the community. Two separate x axes show views in weeks (right) and in days (left). Variables (measure names) ending with “all” represent the total frequency of mentions of the corresponding treatment, regardless of attribution type. Variables ending with “pt” represent the frequency of mentions with attribution type pt.
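
A mentions-per-post series of the kind plotted in Figure 2 could be computed along the following lines. This is a hedged sketch, not the authors' procedure: the `posts` and `join_date` data are invented, the function name `frequency_by_week` is ours, and we assume each post is bucketed by whole weeks elapsed since its author joined.

```python
from collections import defaultdict
from datetime import date

# Invented data: when each user joined, and their posts with labeled mentions.
join_date = {"u1": date(2010, 1, 1), "u2": date(2010, 3, 1)}
posts = [
    ("u1", date(2010, 1, 2), [("melatonin", "pt")]),
    ("u1", date(2010, 1, 5), []),
    ("u1", date(2010, 1, 10), [("melatonin", "gen"), ("melatonin", "pt")]),
    ("u2", date(2010, 3, 3), [("melatonin", "gen")]),
]


def frequency_by_week(posts, join_date, term, label=None):
    """Mentions of `term` per post, bucketed by weeks since the author joined.

    With label=None all attribution types are counted (the "all" series);
    with label="pt" only pt mentions are counted (the "pt" series).
    """
    n_posts = defaultdict(int)
    n_mentions = defaultdict(int)
    for author, day, found in posts:
        week = (day - join_date[author]).days // 7
        n_posts[week] += 1
        n_mentions[week] += sum(
            1 for t, l in found if t == term and (label is None or l == label)
        )
    return {week: n_mentions[week] / n_posts[week] for week in sorted(n_posts)}


all_freq = frequency_by_week(posts, join_date, "melatonin")
pt_freq = frequency_by_week(posts, join_date, "melatonin", label="pt")
```

Switching the bucket from `// 7` to plain days would give the day-level view described in the caption.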