Adam Sadilek1, Yulin Hswen2,3, John S Brownstein3,4, Evgeniy Gabrilovich1, Shailesh Bavadekar1, Tomer Shekel1.
Abstract
Lyme disease is the most common tick-borne disease in the Northern Hemisphere. Existing estimates of Lyme disease spread are delayed by a year or more. We introduce Lymelight, a new method for monitoring the incidence of Lyme disease in real time. We use a machine-learned classifier of web search sessions to estimate the number of individuals who searched for possible Lyme disease symptoms in a given geographical area over two years, 2014 and 2015. We evaluate Lymelight against the official case count data from the CDC and find a 92% correlation (p < 0.001) at the county level. Importantly, using web search data allows us not only to assess the incidence of the disease, but also to examine the appropriateness of the treatments that users subsequently searched for. Public health implications of our work include monitoring the spread of vector-borne diseases in a timely and scalable manner, complementing existing approaches through real-time detection, which can enable more timely interventions. Our analysis of treatment searches may also help reduce misdiagnosis of the disease.
Keywords: Computational science; Epidemiology; Infectious diseases
Year: 2020 PMID: 32047861 PMCID: PMC7000681 DOI: 10.1038/s41746-020-0222-x
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Searches for drugs associated with Lyme disease sessions.
| Drug searches | Lymelight-positive cases (%) | Lymelight-negative cases (%) | Chi-square | p-value |
|---|---|---|---|---|
| #1 Doxycycline* | 26.29 | 0.51 | 2,663,557 | <0.001 |
| #2 Amoxicillin* | 5.71 | 0.97 | 65,301 | <0.001 |
| #3 Penicillin* | 2.56 | 0.53 | 23,698 | <0.001 |
| #4 Metronidazole+ | 2.24 | 0.58 | 16,373 | <0.001 |
| #5 Ceftriaxone* | 2.20 | 0.14 | 70,896 | <0.001 |
| #6 Ivermectin+ | 1.94 | 0.18 | 41,860 | <0.001 |
| #7 Prednisone# | 1.93 | 1.05 | 6,057 | <0.001 |
| #8 Cefuroxime* | 1.65 | 0.06 | 83,976 | <0.001 |
| #9 Trimethoprim/sulfamethoxazole+ | 1.56 | 0.58 | 7,705 | <0.001 |
| #10 Rifampicin+ | 1.51 | 0.04 | 116,892 | <0.001 |
| #11 Clindamycin+ | 1.21 | 0.41 | 6,571 | <0.001 |
| #12 Ciprofloxacin+ | 1.16 | 0.56 | 4,227 | <0.001 |
| #13 Hydroxychloroquine# | 1.06 | 0.12 | 18,039 | <0.001 |
| #14 Permethrin+ | 1.05 | 0.15 | 14,008 | <0.001 |
| #15 Clarithromycin* | 0.97 | 0.07 | 27,183 | <0.001 |
| #16 Tinidazole+ | 0.95 | 0.02 | 87,125 | <0.001 |
| #17 Cefalexin+ | 0.94 | 0.41 | 3,828 | <0.001 |
| #18 Amoxicillin/clavulanic acid* | 0.85 | 0.27 | 4,917 | <0.001 |
| #19 Fluconazole+ | 0.85 | 0.30 | 4,309 | <0.001 |
| #20 Hash Oil+ | 0.83 | 0.21 | 3,991 | <0.001 |
Percentage figures show the probability of searching for the drug within each group. The “*” symbol denotes a recommended treatment for Lyme disease (per Clinical Practice Guidelines), the “+” symbol denotes a non-recommended treatment for Lyme disease, and the “#” symbol denotes a recommended treatment for arthritis.
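The chi-square column compares how often a drug was searched for in Lymelight-positive versus Lymelight-negative cases. The sketch below shows one common way such a statistic can be computed: a Pearson chi-square test on a 2×2 table without continuity correction. The source does not state which variant was used or give the group sizes, so the function names and the `n_pos` / `n_neg` values are assumptions for illustration and will not reproduce the figures in the table.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
        [[a, b],   # group 1: searched, did not search
         [c, d]]   # group 2: searched, did not search
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_1dof(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def drug_search_chi_square(pct_pos, pct_neg, n_pos, n_neg):
    """Build the 2x2 table from percentages and group sizes, then test it."""
    searched_pos = round(n_pos * pct_pos / 100)
    searched_neg = round(n_neg * pct_neg / 100)
    return chi_square_2x2(
        searched_pos, n_pos - searched_pos,
        searched_neg, n_neg - searched_neg,
    )


# Hypothetical group sizes: the table reports percentages only.
stat = drug_search_chi_square(26.29, 0.51, n_pos=10_000, n_neg=10_000)
print(f"chi-square = {stat:.1f}, p = {p_value_1dof(stat):.3g}")
```

With group sizes as large as those implied by the reported statistics, the p-value underflows toward zero, which is consistent with every row being reported as <0.001.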
Fig. 2 Precision-recall plot for the Lymelight query classification model.
Top 50 classifier features, ranked by information gain.
| Feature | Information gain (in bits of information) |
|---|---|
| Lyme | 1.10E−03 |
| Lyme disease | 1.08E−03 |
| Tick | 6.90E−04 |
| Ticks | 6.60E−04 |
| Of lyme | 6.40E−04 |
| Disease | 6.20E−04 |
| [Lyme disease] (KG concept) | 5.50E−04 |
| A tick | 5.10E−04 |
| [Tick] (KG concept) | 4.70E−04 |
| Parasites | 4.50E−04 |
| Tick borne | 4.40E−04 |
| Tick bite | 4.30E−04 |
| Tick bites | 3.80E−04 |
| [Pathogenic bacteria] (KG concept) | 3.80E−04 |
| Borrelia | 3.70E−04 |
| For lyme | 3.50E−04 |
| Conditions lyme | 3.50E−04 |
| Diseases | 3.40E−04 |
| Bite | 3.30E−04 |
| Borne | 3.30E−04 |
| Burgdorferi | 3.30E−04 |
| cdc | 3.30E−04 |
| [Disease vectors] (KG concept) | 3.20E−04 |
| [Disease] (KG concept) | 3.20E−04 |
| Borrelia burgdorferi | 3.20E−04 |
| Disease cdc | 3.20E−04 |
| Disease is | 3.10E−04 |
| [Infectious diseases] (KG concept) | 3.10E−04 |
| Ticks are | 3.10E−04 |
| Ticks and | 2.80E−04 |
| The tick | 2.80E−04 |
| [Disease or medical conditions] (KG concept) | 2.80E−04 |
| Symptoms | 2.70E−04 |
| Blacklegged | 2.60E−04 |
| Of ticks | 2.50E−04 |
| Disease symptoms | 2.50E−04 |
| The bite | 2.40E−04 |
| Of tick | 2.40E−04 |
| Disease lyme | 2.40E−04 |
| Lyme disease | 2.40E−04 |
| Health | 2.30E−04 |
| Infection | 2.20E−04 |
| Bites | 2.20E−04 |
| Treatment | 2.20E−04 |
| Infected | 2.20E−04 |
| Rash | 2.20E−04 |
| Transmitted | 2.20E−04 |
| About lyme | 2.20E−04 |
| With lyme | 2.10E−04 |
| Deer ticks | 2.00E−04 |
KG concepts are those found in the Google Knowledge Graph.
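Information gain measures how many bits of uncertainty about the label are removed by knowing whether a feature is present. The sketch below computes it for a single binary feature against a binary label from four counts. It illustrates the standard definition, H(label) − H(label | feature), and is not the authors' pipeline; the function names and count arguments are assumptions.

```python
import math


def entropy(counts):
    """Shannon entropy, in bits, of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(n11, n10, n01, n00):
    """Information gain of a binary feature about a binary label.

    n11: feature present, label positive
    n10: feature present, label negative
    n01: feature absent,  label positive
    n00: feature absent,  label negative
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("at least one example is required")
    h_label = entropy([n11 + n01, n10 + n00])
    present, absent = n11 + n10, n01 + n00
    # Conditional entropy: label entropy within each feature value,
    # weighted by how often that value occurs.
    h_conditional = (
        (present / total) * entropy([n11, n10])
        + (absent / total) * entropy([n01, n00])
    )
    return h_label - h_conditional


# A feature that perfectly separates a balanced label carries one full bit;
# a feature independent of the label carries none.
print(information_gain(50, 0, 0, 50))
print(information_gain(25, 25, 25, 25))
```

Values as small as those in the table are what one would expect when each feature appears in only a small fraction of queries, since a rare feature can remove little uncertainty on average even if it is highly indicative when present.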
Fig. 1 Task definition for obtaining human judgements on queries.
The same template was used to solicit labels from non-medical professionals as well as from medical doctors.
Fig. 3 Rank coverage by LL2015 counties.
The plot shows the ranks at which LL2015 counties appear in the list of all counties, sorted in decreasing order of incidence rate according to the CDC. We observe near-uniform coverage between ranks 240 and 1055.
Ranking of counties according to CDC and according to Lymelight.
| Rank | CDC | Lymelight |
|---|---|---|
| 1 | New Haven County, Connecticut | Fairfield County, Connecticut |
| 2 | Montgomery County, Pennsylvania | New Haven County, Connecticut |
| 3 | Chester County, Pennsylvania | Chester County, Pennsylvania |
| 4 | Fairfield County, Connecticut | Suffolk County, New York |
| 5 | Middlesex County, Massachusetts | Middlesex County, Massachusetts |
| 6 | Essex County, Massachusetts | Allegheny County, Pennsylvania |
| 7 | Hartford County, Connecticut | Essex County, Massachusetts |
| 8 | Montgomery County, Maryland | Westchester County, New York |
| 9 | New York County, New York | Hartford County, Connecticut |
| 10 | Suffolk County, New York | Montgomery County, Pennsylvania |
| 11 | Hennepin County, Minnesota | Suffolk County, Massachusetts |
| 12 | Fairfax County, Virginia | Fairfax County, Virginia |
| 13 | Westchester County, New York | Hennepin County, Minnesota |
| 14 | Allegheny County, Pennsylvania | Montgomery County, Maryland |
| 15 | Suffolk County, Massachusetts | New York County, New York |
| 16 | Kings County, New York | Philadelphia County, Pennsylvania |
| 17 | Philadelphia County, Pennsylvania | Nassau County, New York |
| 18 | Queens County, New York | Wake County, North Carolina |
| 19 | Nassau County, New York | Kings County, New York |
| 20 | DuPage County, Illinois | DuPage County, Illinois |
| 21 | Wake County, North Carolina | Oakland County, Michigan |
| 22 | Cook County, Illinois | Queens County, New York |
| 23 | Orange County, Florida | Cook County, Illinois |
| 24 | Santa Clara County, California | Santa Clara County, California |
| 25 | Broward County, Florida | King County, Washington |
| 26 | Oakland County, Michigan | San Diego County, California |
| 27 | Miami-Dade County, Florida | Orange County, Florida |
| 28 | Travis County, Texas | Travis County, Texas |
| 29 | King County, Washington | Miami-Dade County, Florida |
| 30 | San Diego County, California | Los Angeles County, California |
| 31 | Harris County, Texas | Tarrant County, Texas |
| 32 | Tarrant County, Texas | Broward County, Florida |
| 33 | Los Angeles County, California | Harris County, Texas |
Ordering of LL2015 counties according to CDC and according to Lymelight, in decreasing order of incidence rate computed by each source.
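One way to quantify how closely two such orderings agree is Spearman's rank correlation. The sketch below is an illustration only: the source does not say this statistic was computed on the table, and the helper name `spearman_rho` is an assumption. The example uses five counties that appear near the top of both columns, keeping their relative order from each source.

```python
def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two orderings of the same items.

    Each argument lists the items from rank 1 downwards, with no ties.
    """
    if len(order_a) != len(order_b) or set(order_a) != set(order_b):
        raise ValueError("both orderings must contain the same items")
    if len(set(order_a)) != len(order_a):
        raise ValueError("items must be unique within an ordering")
    n = len(order_a)
    if n < 2:
        raise ValueError("at least two items are required")
    rank_b = {item: rank for rank, item in enumerate(order_b, start=1)}
    # Sum of squared rank differences over all items.
    d_squared = sum(
        (rank_a - rank_b[item]) ** 2
        for rank_a, item in enumerate(order_a, start=1)
    )
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Five-county subset taken from the table, in each source's relative order.
cdc = ["New Haven", "Chester", "Fairfield", "Middlesex", "Essex"]
lymelight = ["Fairfield", "New Haven", "Chester", "Middlesex", "Essex"]
print(round(spearman_rho(cdc, lymelight), 3))  # 0.7 for this subset
```

The same function accepts the full 33-county lists, since both columns contain the same set of counties.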