Steven S Doerstling, Dennis Akrobetu, Matthew M Engelhard, Felicia Chen, Peter A Ubel.
Abstract
BACKGROUND: Web-based crowdfunding has become a popular method to raise money for medical expenses, and there is growing research interest in this topic. However, crowdfunding data are largely composed of unstructured text, thereby posing many challenges for researchers hoping to answer questions about specific medical conditions. Previous studies have used methods that either failed to address major challenges or were poorly scalable to large sample sizes. To enable further research on this emerging funding mechanism in health care, better methods are needed.
Keywords: GoFundMe; crowdfunding; health care costs; named entity recognition; natural language processing
Year: 2022 PMID: 35727610 PMCID: PMC9257615 DOI: 10.2196/32867
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 7.076
Figure 1. A schematic diagram of the disease identification algorithm. This figure shows how the study's algorithm determines which disease categories are present in a hypothetical example that is representative of web-based medical crowdfunding text. Medical conditions are identified in the text by using a named entity recognition model to identify diagnoses and keyword searches to identify treatments and procedures. Diagnoses identified by the named entity recognition model are mapped to the best-matching ICD-10-CM codes by an entity resolution model and grouped according to the disease category definitions outlined in the Methods section. Treatments and procedures are used to indicate the presence of the corresponding disease categories (defined in Table 1). GU: genitourinary; ICD-10-CM: International Classification of Diseases, 10th Revision, Clinical Modification.
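The flow in Figure 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the NER and entity resolution models are replaced by a list of already-resolved ICD-10-CM codes, keyword-derived categories are passed in directly, and the category definitions below are a hypothetical approximation built from ICD-10-CM chapter ranges (the study's own definitions are in its Methods section and may differ). The names `ICD10_CM_RANGES`, `category_for_code`, and `identify_categories` are illustrative.

```python
from typing import Iterable, Optional, Set

# HYPOTHETICAL category definitions based on ICD-10-CM chapter ranges;
# the study's actual definitions may group codes differently.
ICD10_CM_RANGES = [
    ("A00", "B99", "Infections"),
    ("C00", "D49", "Neoplasms"),
    ("E00", "E89", "Endocrine diseases"),
    ("F01", "F99", "Mental health disorders"),
    ("G00", "G99", "Nervous system diseases"),
    ("I00", "I99", "Cardiovascular diseases"),
    ("J00", "J99", "Respiratory diseases"),
    ("K00", "K95", "Gastrointestinal diseases"),
    ("M00", "M99", "Musculoskeletal diseases"),
    ("N00", "N99", "Genitourinary diseases"),
    ("S00", "T88", "Injuries and external causes"),
    ("V00", "Y99", "Injuries and external causes"),
]


def category_for_code(code: str) -> Optional[str]:
    """Map an ICD-10-CM code (e.g. 'C83.30') to a disease category, or None."""
    # Compare only the three-character category. Plain string comparison
    # orders codes correctly because digits sort before letters in ASCII.
    prefix = code.strip().upper()[:3]
    for start, end, category in ICD10_CM_RANGES:
        if start <= prefix <= end:
            return category
    return None


def identify_categories(
    resolved_codes: Iterable[str], keyword_categories: Iterable[str]
) -> Set[str]:
    """Union of categories from resolved diagnoses and from treatment keywords.

    `resolved_codes` stands in for the NER + entity resolution output;
    `keyword_categories` stands in for the keyword search results.
    """
    from_diagnoses = {c for c in map(category_for_code, resolved_codes) if c}
    return from_diagnoses | set(keyword_categories)
```

For example, `identify_categories(["C83.30", "N18.6"], ["Gastrointestinal diseases"])` yields neoplasms and genitourinary diseases from the diagnoses plus gastrointestinal diseases from the keyword search, while a code outside every range (such as `D50.9`) contributes nothing.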
Table 1. The keywords used to identify additional disease categories in campaign descriptions.
| Disease category^a | Keywords searched in campaign descriptions^b | Representative examples from campaign descriptions |
|---|---|---|
| Injuries and external causes | | “[He] got into a serious accident in October. All four extremities were injured but the most severe were his legs.” |
| Cardiovascular diseases | | “His cardiologist has informed him that a heart transplant is [his] only hope for survival.” |
| Neoplasms | | “The chemotherapy did not stabilize the lymphoma so we were unable to move forward with the transplant.” |
| Genitourinary diseases | | “This disease resulted in my kidneys failing and having to start dialysis.” |
| Gastrointestinal diseases | | “...the cirrhosis is incurable without a complete liver transplant.” |
| Respiratory diseases | | “Her desire to live life...will only be possible with the double lung transplant.” |
^a A disease category was marked as present in a campaign if the campaign description contained any of the corresponding keywords.
^b Keywords were selected during exploratory reading of crowdfunding campaigns as indicators of a disease category in campaigns that did not specify a diagnosis.
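The presence rule in footnote a (a category counts as present if any of its keywords appears) can be sketched as a case-insensitive regular-expression search. The keyword lists below are hypothetical placeholders loosely drawn from the example quotes, because the study's actual keywords are not available in this copy of the table; whole-phrase matching with word boundaries is likewise an assumed design choice, not necessarily what the study did.

```python
import re
from typing import Dict, List, Pattern, Set

# HYPOTHETICAL keyword lists for illustration only; not the study's keywords.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "Injuries and external causes": ["accident"],
    "Cardiovascular diseases": ["heart transplant"],
    "Neoplasms": ["chemotherapy", "radiation therapy"],
    "Genitourinary diseases": ["dialysis", "kidney transplant"],
    "Gastrointestinal diseases": ["liver transplant"],
    "Respiratory diseases": ["lung transplant"],
}


def compile_keywords(keywords: List[str]) -> Pattern[str]:
    """Build one pattern that matches any keyword as a whole word or phrase."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


PATTERNS = {cat: compile_keywords(kws) for cat, kws in CATEGORY_KEYWORDS.items()}


def keyword_categories(description: str) -> Set[str]:
    """Return every category with at least one keyword in the description."""
    return {cat for cat, pattern in PATTERNS.items() if pattern.search(description)}
```

With these placeholder lists, the neoplasm quote in Table 1 is flagged through "chemotherapy", while its bare "the transplant" matches nothing, since every transplant keyword here names an organ.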
Figure 2. The relative contributions of the NER model and word search to detecting disease categories. All campaigns in which the disease identification algorithm detected the disease categories on the y-axis are included. The colored bars show the percentage of those campaigns in which each disease category was detected by the NER model only (blue), by both the NER model and word search (orange), or by word search only (green). NER: named entity recognition.
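The breakdown in Figure 2 can be computed from two sets per campaign: categories found by the NER pathway and categories found by word search. The sketch below assumes that input shape (a hypothetical `(ner_categories, word_categories)` pair per campaign); for each category it counts only campaigns where either method detected it, then splits them into the three mutually exclusive groups shown by the bars.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

SOURCES = ("NER only", "NER and word search", "Word search only")


def detection_source_shares(
    campaigns: Iterable[Tuple[Set[str], Set[str]]],
) -> Dict[str, Dict[str, float]]:
    """Percentage of detections per category attributable to each source.

    Each campaign is a (ner_categories, word_categories) pair (assumed shape).
    """
    counts: Dict[str, Counter] = {}
    for ner, words in campaigns:
        # Only categories detected by at least one method are counted.
        for category in ner | words:
            if category in ner and category in words:
                source = SOURCES[1]
            elif category in ner:
                source = SOURCES[0]
            else:
                source = SOURCES[2]
            counts.setdefault(category, Counter())[source] += 1
    # Counter returns 0 for sources that never occurred, so all three keys exist.
    return {
        category: {s: 100 * c[s] / sum(c.values()) for s in SOURCES}
        for category, c in counts.items()
    }
```

Because the three groups are mutually exclusive, each category's percentages sum to 100, matching the stacked bars.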
Figure 3. The co-occurrence of disease categories identified by the NER model and word search. The heat map values represent the percentage of campaigns containing the disease category in each row (identified by the NER model) that also contain the disease category in each column (identified via word search). NER: named entity recognition.
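The heat map value for a row category R and column category C is the share of campaigns whose NER output contains R that also have C from word search. A minimal sketch, assuming the same hypothetical `(ner_categories, word_categories)` pair per campaign; rows with no NER detections are set to NaN rather than dividing by zero, which is an illustrative choice.

```python
from typing import Dict, Iterable, List, Set, Tuple


def cooccurrence_percent(
    campaigns: Iterable[Tuple[Set[str], Set[str]]], categories: List[str]
) -> Dict[str, Dict[str, float]]:
    """matrix[row][col] = % of campaigns with `row` (NER) that also have `col` (word search)."""
    campaign_list = list(campaigns)
    matrix: Dict[str, Dict[str, float]] = {}
    for row in categories:
        # Word-search categories of every campaign whose NER output contains `row`.
        matched = [words for ner, words in campaign_list if row in ner]
        matrix[row] = {
            col: (
                100 * sum(col in words for words in matched) / len(matched)
                if matched
                else float("nan")
            )
            for col in categories
        }
    return matrix
```

The matrix is not symmetric: each row is normalized by the number of campaigns in which the NER model found that row's category.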
Classification performance of the disease identification algorithm^a.
| Disease category | Campaigns in the reference set that mention the disease category, n | Precision (95% CI) | Recall (95% CI) | F1 score | Accuracy (95% CI) |
|---|---|---|---|---|---|
| Cardiovascular diseases | 82 | 0.92 (0.86-0.99) | 0.74 (0.65-0.84) | 0.82 | 0.94 (0.91-0.96) |
| Endocrine diseases | 19 | 0.75 (0.54-0.96) | 0.63 (0.41-0.85) | 0.69 | 0.97 (0.96-0.99) |
| Gastrointestinal diseases | 18 | 0.56 (0.33-0.79) | 0.56 (0.33-0.79) | 0.56 | 0.96 (0.94-0.98) |
| Genitourinary diseases | 35 | 0.97 (0.90-1.03) | 0.8 (0.67-0.93) | 0.88 | 0.98 (0.97-0.99) |
| Infections | 30 | 0.56 (0.41-0.71) | 0.77 (0.62-0.92) | 0.65 | 0.94 (0.91-0.96) |
| Injuries and external causes | 53 | 0.69 (0.58-0.80) | 0.92 (0.85-1.00) | 0.79 | 0.94 (0.91-0.96) |
| Mental health disorders | 20 | 0.48 (0.30-0.66) | 0.7 (0.50-0.90) | 0.57 | 0.95 (0.93-0.97) |
| Musculoskeletal diseases | 45 | 0.64 (0.48-0.80) | 0.51 (0.37-0.66) | 0.57 | 0.91 (0.88-0.94) |
| Neoplasms | 162 | 0.95 (0.91-0.98) | 0.98 (0.96-1.00) | 0.96 | 0.97 (0.95-0.99) |
| Nervous system diseases | 66 | 0.88 (0.76-0.99) | 0.42 (0.31-0.54) | 0.57 | 0.90 (0.86-0.93) |
| Respiratory diseases | 29 | 0.92 (0.81-1.03) | 0.76 (0.60-0.91) | 0.83 | 0.98 (0.96-0.99) |
^a The average precision, recall, F1 score, and accuracy values are 0.83, 0.77, 0.78, and 0.95, respectively. Classification performance is based on a comparison with 400 campaigns annotated by a team of expert coders. The averages are weighted by the number of campaigns in the reference set that mention each disease category.
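The metrics in this table can be derived from per-category confusion counts against the 400-campaign reference set. The sketch below assumes normal-approximation (Wald) 95% intervals, which is consistent with upper bounds above 1.00 such as 1.03 but is not stated in this section. The counts used in the tests (28 true positives, 1 false positive, 7 false negatives, 364 true negatives) are inferred for illustration; rounded to two decimals they reproduce the genitourinary row. The helper names are hypothetical, and each denominator is assumed to be nonzero.

```python
import math
from typing import Dict, Sequence, Tuple

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def wald_ci(successes: int, trials: int, z: float = Z_95) -> Tuple[float, float, float]:
    """Proportion with a normal-approximation (Wald) interval.

    The interval is not clipped to [0, 1], so the upper bound can exceed 1.
    """
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return p, p - half_width, p + half_width


def category_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, object]:
    """Precision, recall, F1, and accuracy for one disease category."""
    return {
        "precision": wald_ci(tp, tp + fp),
        "recall": wald_ci(tp, tp + fn),
        # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
        "f1": 2 * tp / (2 * tp + fp + fn),
        "accuracy": wald_ci(tp + tn, tp + fp + fn + tn),
    }


def weighted_average(values: Sequence[float], weights: Sequence[int]) -> float:
    """Average of per-category values weighted by campaigns mentioning each category."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

Applying `weighted_average` to the precision column with the n column as weights gives approximately 0.830, which agrees with the 0.83 reported in the footnote.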