Ming Huang, Omar ElTayeby, Maryam Zolnoori, Lixia Yao.
Abstract
BACKGROUND: Society always has limited resources to expend on health care, or anything else. What are the unmet medical needs? How do we allocate limited resources to maximize the health and welfare of the people? These challenging questions might be re-examined systematically within an infodemiological frame on a much larger scale, leveraging the latest advances in information technology and data science.
Keywords: Reuters; news; public policy; research priority; sentiment analysis; text mining; topic modeling; unmet medical need
Year: 2018 PMID: 29739741 PMCID: PMC5964307 DOI: 10.2196/10047
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Workflow for mining news media data. Concat: concatenate; dx: disease; PheWAS: phenome-wide association study.
Summary statistics of Reuters historical news data.
| Data set | Date range | Articles, total n | Articles mentioning diseases, n (%) | Mapped PheWAS^a codes, n |
| RCV1^b | 8/20/1996 to 8/19/1997 | 806,791 | 3516 (0.44) | 342 |
| TRC2^c | 1/1/2008 to 12/31/2008 | 1,546,350 | 8835 (0.57) | 311 |
| RC16^d | 1/1/2016 to 12/31/2016 | 1,182,761 | 9633 (0.81) | 375 |
^a PheWAS: phenome-wide association study.
^b RCV1: Reuters Corpus Volume 1.
^c TRC2: Thomson Reuters Text Research Collection.
^d RC16: Reuters Corpus 2016.
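The percentages in the summary table follow directly from the article counts. As a quick check, a minimal Python sketch can recompute them; the counts are copied from the table, while the dictionary layout and the `disease_share` helper are illustrative names and not part of the study's pipeline.

```python
# Article counts copied from the summary table: (total articles, articles mentioning diseases).
corpora = {
    "RCV1": (806_791, 3_516),
    "TRC2": (1_546_350, 8_835),
    "RC16": (1_182_761, 9_633),
}

def disease_share(total, with_disease):
    """Percentage of articles mentioning a disease, rounded to 2 decimals."""
    return round(100 * with_disease / total, 2)

# Hypothetical helper applied to each corpus.
shares = {name: disease_share(total, hits) for name, (total, hits) in corpora.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

The recomputed shares agree with the tabulated values of 0.44%, 0.57%, and 0.81%.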
Figure 2. Coverage percentages and sentiments of the top 53 phenome-wide association study disease concepts. Blue, white, and red in a diverging color map denote the most negative (–1.0), neutral (0.0), and most positive (1.0) sentiments, respectively. The 53 phenome-wide association study disease concepts are grouped into 3 buckets based on coverage percentage for better resolution and comparison. From top to bottom, the buckets contain 10, 17, and 26 phenome-wide association study disease concepts, with mean coverage ranges of 34% to 5%, 5% to 2%, and 2% to 1%, respectively. ADHD: attention-deficit/hyperactivity disorder; ASCVD: atherosclerotic cardiovascular disease; COPD: chronic obstructive pulmonary disease; dx: disease; mal neo: malignant neoplasm; NOS: not otherwise specified; other nerv sys d/o: other and unspecified disorders of the nervous system; unc behave: uncertain behaviour; unkn orig: unknown origin; unsp: unspecified; syn: syndrome.
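The grouping by mean coverage described in the Figure 2 caption can be sketched as a simple threshold rule. This is only an illustration: the caption gives the ranges (34% to 5%, 5% to 2%, 2% to 1%) but not how boundary values are assigned, so the inclusive lower bounds, the `coverage_bucket` function, and the example values below are all assumptions.

```python
def coverage_bucket(mean_coverage_pct):
    """Return 1, 2, or 3 for the top, middle, or bottom bucket; None if below 1%.

    Assumption: each bucket includes its lower bound (5%, 2%, 1%).
    """
    if mean_coverage_pct >= 5.0:
        return 1
    if mean_coverage_pct >= 2.0:
        return 2
    if mean_coverage_pct >= 1.0:
        return 3
    return None  # below the 1% floor of the plotted ranges

# Hypothetical coverage values, for illustration only (not taken from the study).
examples = {"concept_a": 12.0, "concept_b": 3.1, "concept_c": 1.4, "concept_d": 0.3}
buckets = {name: coverage_bucket(pct) for name, pct in examples.items()}
print(buckets)
```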
Figure 3. Top 30 topic keywords associated with the top 6 disease concepts in 3 decades. The rings represent the study periods of 1996/1997 (inner circle), 2008 (middle circle), and 2016 (outer circle). The size of the circles located on each ring denotes the permille of each term or phrase in a topic. dx: disease; mal neo: malignant neoplasm.
The permille of tobacco-related topic terms associated with other malignant neoplasm.
| Topic term | 1996/1997 | 2008 | 2016 |
| Tobacco | 1.5 | 0.0 | 0.0 |
| Smoking | 0.8 | 0.0 | 0.0 |
| Cigarette | 0.7 | 0.0 | 0.0 |