John Caskey1, Iain L McConnell1, Madeline Oguss1, Dmitriy Dligach2, Rachel Kulikoff3, Brittany Grogan3, Crystal Gibson3, Elizabeth Wimmer4, Traci E DeSalvo4, Edwin E Nyakoe-Nyasani4, Matthew M Churpek1, Majid Afshar1.
Abstract
BACKGROUND: In Wisconsin, COVID-19 case interview forms contain free-text fields that need to be mined to identify potential outbreaks for targeted policy making. We developed an automated pipeline to ingest the free text into a pretrained neural language model to identify businesses and facilities as outbreaks.
Keywords: COVID-19; contact tracing; digital health; digital surveillance tool; disease surveillance; electronic surveillance; named entity recognition; natural language processing; neural language model; outbreaks; public health; public health informatics
Year: 2022 PMID: 35144241 PMCID: PMC8906835 DOI: 10.2196/36119
Source DB: PubMed Journal: JMIR Public Health Surveill ISSN: 2369-2960
Figure 1. Process map for the NER tool and location-mapping tool. ETL: extract, transform, and load; NER: named entity recognition.
Figure 2. Location-mapping example for a cluster of COVID-19 cases in Dane County as of October 2021. The gray dots show incident cases for a possible or known cluster/outbreak. The white dot shows the calculated centroid point of this cluster. The black dot shows the obfuscated centroid latitude/longitude point that is submitted to the Google Places API, which is shown by the larger gray circle. API: Application Program Interface.
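The centroid and obfuscation steps described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the arithmetic-mean centroid, and the random bearing/distance offset are all assumptions, since the caption does not say how the centroid is obfuscated.

```python
import math
import random

METERS_PER_DEGREE_LAT = 111_320.0  # approximate length of one degree of latitude


def centroid(points):
    """Arithmetic mean of (lat, lon) pairs; a reasonable approximation at county scale."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (sum(lats) / len(lats), sum(lons) / len(lons))


def obfuscate(point, max_offset_m, rng=random):
    """Shift a point by a random bearing and a random distance up to max_offset_m.

    Hypothetical obfuscation scheme, used here only for illustration.
    """
    lat, lon = point
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    distance = rng.uniform(0.0, max_offset_m)
    dlat = distance * math.cos(bearing) / METERS_PER_DEGREE_LAT
    dlon = distance * math.sin(bearing) / (
        METERS_PER_DEGREE_LAT * math.cos(math.radians(lat))
    )
    return (lat + dlat, lon + dlon)


# Fictitious case coordinates, not real surveillance data.
cases = [(43.0, -89.0), (43.2, -89.4)]
center = centroid(cases)
query_point = obfuscate(center, max_offset_m=50.0, rng=random.Random(0))
```

Only `query_point` would be sent to an external service in this sketch, so the exact centroid of the case locations never leaves the local system.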
Figure 3. Framework for the extended location-mapping algorithm. If a named entity is not found within the initial search radius, additional search radii are created extending outward from the original search by creating a series of interlocking equilateral triangles, where each vertex of the triangle is a new API starting search point. The extended search stops when at least 1 match is found or the maximum distance is reached. API: Application Program Interface.
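One way to picture the extended search is as rings of points on a triangular lattice around the original search point, since the vertices of interlocking equilateral triangles form such a lattice. The sketch below works under that assumption. Offsets are in meters relative to the original point, `search_fn` is a hypothetical stand-in for a place-search API call, and the triangle side length and maximum distance are illustrative parameters not given in the source.

```python
import math

# Axial-coordinate steps between neighboring vertices of a triangular lattice.
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def ring_offsets(ring, side_m):
    """Return (x, y) offsets in meters for the lattice vertices on a given ring."""
    q, r = -ring, ring  # start at one corner of the ring
    offsets = []
    for dq, dr in DIRECTIONS:
        for _ in range(ring):
            x = side_m * (q + r / 2.0)
            y = side_m * (math.sqrt(3.0) / 2.0) * r
            offsets.append((x, y))
            q, r = q + dq, r + dr
    return offsets


def extended_search(search_fn, side_m, max_distance_m):
    """Search ring by ring; stop at the first match or when the maximum distance is reached."""
    ring = 1
    while ring * side_m <= max_distance_m:
        for offset in ring_offsets(ring, side_m):
            matches = search_fn(offset)  # hypothetical API wrapper returning a list
            if matches:
                return matches
        ring += 1
    return []


# Fake search function: only points farther than 150 m from the origin "match".
def fake_search(offset):
    return ["match"] if math.hypot(*offset) > 150.0 else []


result = extended_search(fake_search, side_m=100.0, max_distance_m=300.0)
```

In this example the first ring (all vertices 100 m from the origin) yields nothing, so the search widens to the second ring and stops at its first matching vertex.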
Example summary report for contact tracers for the county health department [a].
| Named entity [b] | Type | Frequency [c] | Predicted probability 1 [d] | Case IDs [e] | Outbreak entity [f] | Address [g] | Predicted probability 2 [h] |
|---|---|---|---|---|---|---|---|
| Sun Prairie | Place | 12 | 0.67 | 12345, 12346 | Sun Prairie | — [i] | 100.0 |
| Local retailer | Organization | 7 | 0.54 | 12347, 12349, 22221 | Retailer 001 | Keys and Things, 21 Science Dr., Madison, WI | 95.2 |
| Big-box store | Organization | 3 | 0.45 | 13347, 18349, 22221 | Boxstore 08 | Circuited City, 1561 Rocky Rd., Verona, WI | 87.1 |
| Fast-food place | Organization | 2 | 0.71 | 17247, 18149, 29121 | — | Burger Time, 1234 State St., Madison, WI | 88.2 |
[a] The example is based on fictitious data and is not sourced from the original Wisconsin Electronic Disease Surveillance System (WEDSS) data because of privacy restrictions.
[b] Named entity: result from the named entity recognition (NER) pipeline. Named entities qualified as cluster outbreaks only if they had >2 case IDs associated with them.
[c] Frequency: unique mentions of the named entity across available case IDs from the reporting period.
[d] Predicted probability 1: average predicted probability from the classifier for the type of named entity.
[e] Case IDs: unique case IDs for lookup by the contact tracer.
[f] Outbreak entity: known outbreak exposures.
[g] Address: address matched to the named entity through the Google Places Application Program Interface (API), using the longitude/latitude from k-means clustering.
[h] Predicted probability 2: predicted probability from the location-mapping tool.
[i] No result from the NER or location-mapping tool.
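The aggregation behind such a report can be sketched as follows: group entity mentions, collect the unique case IDs, average the classifier probability, and keep only entities with more than 2 case IDs. The input tuple layout, the function name, and the reading of "frequency" as a count of mentions are assumptions made for illustration; the data are fictitious.

```python
from collections import defaultdict


def summarize(mentions, min_case_ids=3):
    """Aggregate (case_id, entity, entity_type, probability) tuples into report rows."""
    grouped = defaultdict(lambda: {"type": None, "case_ids": set(), "probs": []})
    for case_id, entity, entity_type, prob in mentions:
        group = grouped[entity]
        group["type"] = entity_type
        group["case_ids"].add(case_id)
        group["probs"].append(prob)

    rows = []
    for entity, group in grouped.items():
        # Keep only entities linked to more than 2 distinct case IDs.
        if len(group["case_ids"]) < min_case_ids:
            continue
        rows.append({
            "named_entity": entity,
            "type": group["type"],
            "frequency": len(group["probs"]),  # assumed: number of mentions
            "mean_probability": sum(group["probs"]) / len(group["probs"]),
            "case_ids": sorted(group["case_ids"]),
        })
    rows.sort(key=lambda row: row["frequency"], reverse=True)
    return rows


# Fictitious mentions, mirroring the style of the example report.
mentions = [
    ("12347", "Local retailer", "Organization", 0.50),
    ("12349", "Local retailer", "Organization", 0.60),
    ("22221", "Local retailer", "Organization", 0.52),
    ("12345", "Sun Prairie", "Place", 0.70),
    ("12346", "Sun Prairie", "Place", 0.64),
]
report = summarize(mentions)
```

Here only "Local retailer" survives the filter, because "Sun Prairie" is linked to just 2 case IDs in this toy input.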
Characteristics of COVID-19 cases and noncases in Dane County, Wisconsin, between July 1, 2020, and June 30, 2021.
| Individual characteristics | Negative cases (N=323,424) | Probable/confirmed cases (N=46,902) | Total (N=370,326) |
|---|---|---|---|
| Age (years), median (IQR) | 32 (20-51) | 30 (20-47) | 31 (20-51) |
| Sex, n (%) | | | |
| Male | 152,852 (47.26) | 23,506 (50.12) | 176,358 (47.62) |
| Female | 165,482 (51.17) | 23,314 (49.71) | 188,796 (50.98) |
| Unknown | 5090 (1.57) | 82 (0.17) | 5172 (1.40) |
| Race/ethnicity, n (%) | | | |
| Non-Hispanic White | 199,629 (61.72) | 30,423 (64.87) | 230,052 (62.12) |
| Non-Hispanic Black | 14,302 (4.42) | 3266 (6.96) | 17,568 (4.74) |
| Hispanic | 23,878 (7.38) | 6662 (14.20) | 30,540 (8.25) |
| Other | 85,615 (26.47) | 6551 (13.97) | 92,166 (24.89) |
| Occupation, n (%) | | | |
| Not recorded | 311,809 (96.41) | 37,083 (79.06) | 348,892 (94.21) |
| Nonuniversity student | 3099 (0.96) | 2391 (5.10) | 5490 (1.48) |
| University student | 1161 (0.36) | 903 (1.93) | 2064 (0.56) |
| Retired | 573 (0.18) | 468 (1.00) | 1041 (0.28) |
| Unemployed | 502 (0.16) | 429 (0.91) | 931 (0.25) |
| Other | 6280 (1.94) | 5628 (12.00) | 11,908 (3.22) |
| City, n (%) | | | |
| Madison | 159,983 (49.47) | 23,949 (51.06) | 183,932 (49.67) |
| Sun Prairie | 22,667 (7.01) | 3722 (7.94) | 26,389 (7.13) |
| Fitchburg | 16,104 (4.98) | 2983 (6.36) | 19,087 (5.15) |
| Middleton | 15,991 (4.94) | 1838 (3.92) | 17,829 (4.81) |
| Verona | 15,224 (4.71) | 1745 (3.72) | 16,969 (4.58) |
| Other | 93,455 (28.90) | 12,665 (27.00) | 106,120 (28.66) |
[a] Multiple responses were possible.
Figure 4. Trend over time of COVID-19 cases and noncases in Dane County, Wisconsin, between January 1, 2020, and October 31, 2021. Data were retrieved from the WEDSS. Line graphs of categories are aggregated by day and averaged over a moving 7-day window between January 2020 and October 2021. The CDC declared COVID-19 a global pandemic on March 11, 2020. The Public Health Madison and Dane County initial mask mandate (Order 1) went into effect on May 13, 2020, and was updated and modified until June 2, 2021. Mandate 2 (Face-Covering Emergency Order) went into effect on August 19, 2021. The vertical dashed lines are demarcations for health policy changes. The gray-shaded area represents the validation period for our pipeline analysis. CDC: Centers for Disease Control and Prevention; WEDSS: Wisconsin Electronic Disease Surveillance System.
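The daily aggregation and 7-day smoothing mentioned in the caption can be illustrated with a short sketch. It assumes a trailing window (the caption does not say whether the window is trailing or centered) and averages over the days available so far at the start of the series; the function name and sample counts are illustrative only.

```python
def moving_average(daily_counts, window=7):
    """Trailing moving average over daily counts; early days use the days available so far."""
    averages = []
    running_total = 0.0
    for i, count in enumerate(daily_counts):
        running_total += count
        if i >= window:
            # Drop the day that has just fallen out of the window.
            running_total -= daily_counts[i - window]
        averages.append(running_total / min(i + 1, window))
    return averages


# Fictitious daily case counts.
smoothed = moving_average([10, 12, 9, 15, 20, 18, 14, 30])
```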
Results from the NER [a] tool by month for Dane County, Wisconsin, between July 1, 2020, and June 30, 2021.
| Month | Cases, N | Confirmed outbreaks, n (%) | Total outbreaks identified by Automated Public Outbreak Localization through Lexical Operations (APOLLO), n (%) | Precision (95% CI) | Recall (95% CI) | F1 score |
|---|---|---|---|---|---|---|
| July 2020 | 508 | 137 (27.0) | 251 (49.4) | 0.51 (0.40-0.62) | 0.28 (0.23-0.34) | 0.37 |
| August 2020 | 783 | 133 (17.0) | 350 (44.7) | 0.34 (0.25-0.44) | 0.23 (0.17-0.29) | 0.27 |
| September 2020 | 2693 | 256 (9.5) | 889 (33.0) | 0.51 (0.46-0.56) | 0.56 (0.51-0.59) | 0.53 |
| October 2020 | 4619 | 459 (9.9) | 1267 (27.4) | 0.64 (0.60-0.67) | 0.69 (0.67-0.71) | 0.66 |
| November 2020 | 7129 | 564 (7.9) | 1906 (26.7) | 0.62 (0.59-0.65) | 0.70 (0.68-0.72) | 0.66 |
| December 2020 | 3772 | 308 (8.2) | 1078 (28.6) | 0.58 (0.54-0.62) | 0.70 (0.67-0.73) | 0.64 |
| January 2021 | 3361 | 241 (7.2) | 1062 (31.6) | 0.53 (0.49-0.58) | 0.72 (0.69-0.75) | 0.61 |
| February 2021 | 2339 | 157 (6.7) | 899 (38.4) | 0.42 (0.36-0.47) | 0.56 (0.51-0.61) | 0.48 |
| March 2021 | 1513 | 134 (8.9) | 647 (42.8) | 0.37 (0.31-0.43) | 0.52 (0.46-0.57) | 0.43 |
| April 2021 | 1460 | 161 (11.0) | 639 (43.8) | 0.46 (0.39-0.53) | 0.50 (0.45-0.55) | 0.48 |
| May 2021 | 410 | 81 (19.8) | 233 (56.8) | 0.41 (0.28-0.52) | 0.33 (0.23-0.40) | 0.36 |
| June 2021 | 149 | 21 (14.1) | 88 (59.1) | 0.30 (0.12-0.52) | 0.29 (0.11-0.44) | 0.30 |
[a] NER: named entity recognition.
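The last column of the table above is consistent with the harmonic mean of precision and recall (the F1 score). The sketch below shows that calculation using the standard definitions; the function names are illustrative. Because the table reports precision and recall rounded to 2 decimals, recomputing from them can differ from the reported value in the last digit.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Rounded values taken from the October 2020 and September 2020 rows.
october_f1 = round(f1_score(0.64, 0.69), 2)
september_f1 = round(f1_score(0.51, 0.56), 2)
```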