Lu Wei, Xiaojing Li, Zhongbo Jing, Zhidong Liu.
Abstract
With the recurrence of infectious diseases caused by coronaviruses, which pose a significant threat to human health, there is an urgent need for an effective method to identify and assess who is most at risk of contracting these diseases. China has successfully controlled the spread of COVID-19 through the disclosure of track data belonging to diagnosed patients. This paper proposes a novel approach to measuring individual infection risk based on textual track data. The approach has three steps. First, track features are extracted from track data to build a general portrait of COVID-19 patients. Second, based on the extracted track features, we construct an infection risk indicator system to calculate the infection risk index (IRI). Finally, individuals are divided into infection risk categories based on their IRI values. The approach can thus determine an individual's risk of contracting COVID-19, which facilitates the identification of high-risk populations and supports risk prevention and control of COVID-19. In the empirical analysis, we collected 9455 pieces of track data from 20 January 2020 to 30 July 2020, covering 32 provincial-level administrative regions in China. The empirical results show that Chinese COVID-19 patients have six key features that indicate infection risk: place, region, close-contact person, contact manner, travel mode, and symptom. The IRI values for all 9455 patients range from 0 to 43.19. Individuals are classified into five infection risk categories: low, moderate-low, moderate, moderate-high, and high risk.
Keywords: COVID-19; infection risk; risk measurement; text mining; track data
Year: 2022 PMID: 35568692 PMCID: PMC9348336 DOI: 10.1111/risa.13944
Source DB: PubMed Journal: Risk Anal ISSN: 0272-4332 Impact factor: 4.302
FIGURE 1. The specific steps of using the proposed approach to assess personal infection risk
FIGURE 2. Value‐at‐risk (VaR) of loss distribution at the α confidence level
FIGURE 3. The number of daily cumulative diagnosed patients in China during 2020
The number of diagnosed patients’ track data in each provincial‐level administrative region
| Region | Number of track data | Region | Number of track data |
|---|---|---|---|
| Heilongjiang | 1641 | Jilin | 152 |
| Henan | 1208 | Shanxi | 124 |
| Zhejiang | 1147 | Guizhou | 105 |
| Chongqing | 622 | Guangxi Zhuang Autonomous Region | 96 |
| Guangdong | 536 | Gansu | 87 |
| Hunan | 515 | Shaanxi | 62 |
| Shandong | 503 | Yunnan | 57 |
| Beijing | 396 | Inner Mongolia Autonomous Region | 34 |
| Hainan | 353 | Ningxia Hui Autonomous Region | 34 |
| Sichuan | 345 | Jiangxi | 17 |
| Hebei | 335 | Qinghai | 17 |
| Fujian | 271 | Shanghai | 9 |
| Anhui | 232 | Xinjiang Uygur Autonomous Region | 5 |
| Liaoning | 201 | Macao Special Administrative Region | 3 |
| Tianjin | 188 | Hong Kong Special Administrative Region | 3 |
| Jiangsu | 155 | Taiwan | 2 |
FIGURE 4. Age distribution of COVID‐19 diagnosed patients
FIGURE 5. The proportions of high‐frequency words classified into six common features
FIGURE 6. Word clouds of the six common features identified from track data of COVID‐19 patients
FIGURE 7. The general portrait of COVID‐19 diagnosed patients in China
Differences in high‐frequency places by region
| Place type | Regions (proportion) |
|---|---|
| Public transportation | Xinjiang (60.00%), Macao (45.45%), Shaanxi (40.66%), Guangdong (36.83%), Yunnan (34.81%) |
| Living | Shanghai (100.00%), Henan (97.44%), Jiangsu (96.62%), Beijing (96.54%), Hunan (96.27%), Chongqing (96.23%), Anhui (94.64%), Guangxi (94.47%), Hebei (94.34%), Jiangxi (92.41%), Sichuan (90.00%) |
| Entertainment | Xinjiang (40%), Ningxia (24.18%), Gansu (18.26%), Hainan (17.83%), Yunnan (17.04%), Heilongjiang (14.12%), Qinghai (10.53%) |
Differences in high‐frequency travel modes by region
| Travel mode | Regions (proportion) |
|---|---|
| Short‐distance transportation (taxi, bus, subway, etc.) | Beijing (100.00%), Shanghai (100.00%), Liaoning (100.00%), Chongqing (82.28%), Xinjiang (71.43%), Ningxia (65.00%), Gansu (63.45%), Shaanxi (57.02%), Shanxi (55.28%), Inner Mongolia (52.17%) |
| Long‐distance transportation (plane, train, ferry, etc.) | Macao (100.00%), Jiangsu (100.00%), Qinghai (100.00%), Zhejiang (100.00%), Tianjin (93.62%), Fujian (92.95%), Shandong (82.55%), Henan (83.26%), Guangdong (80.38%), Jilin (80.34%), Guizhou (80.00%), Guangxi (78.85%), Sichuan (77.85%), Hainan (77.21%), Yunnan (75.36%), Hunan (75.00%), Hebei (65.88%), Anhui (63.64%), Heilongjiang (63.01%) |
Differences in high‐frequency contact manners by region
| Contact manner | Regions (proportion) |
|---|---|
| Gathering entertainment | Gansu (73.08%), Shanghai (72.73%), Inner Mongolia (67.50%), Chongqing (60.90%) |
| Travelling | Yunnan (15.22%), Guangxi (9.59%), Zhejiang (7.74%), Guangdong (7.38%) |
| Going to work and meetings | Beijing (31.53%), Shanghai (27.27%), Shaanxi (14.29%), Tianjin (10.94%), Jiangsu (10.77%), Liaoning (10.47%), Sichuan (10.05%) |
| Visiting relatives | Qinghai (50.00%), Hebei (29.76%), Ningxia (27.45%), Guangxi (19.18%), Guangdong (17.51%), Henan (14.10%) |
Differences in high‐frequency close‐contact persons by region
| Region | Close‐contact persons (proportion) |
|---|---|
| Anhui, Heilongjiang, Guizhou, Chongqing, Liaoning, Shanxi, Fujian, Shanghai, Hebei, Tianjin, Guangdong | Relatives and family members: 90.99, 95.39, 95.06, 94.65, 93.99, 93.98, 93.85, 93.55, 93.53, 93.44, and 93.10%, respectively. |
| Beijing | Relatives and family members (31.77%), operators and suppliers (64.71%) |
| Yunnan | Relatives and family members (43.24%), tour guides (21.62%), passengers (13.51%), company staff (10.81%) |
| Zhejiang | Relatives and family members (50.00%), tourists (30%), overseas students (10%) |
Differences in high‐frequency regions between different provinces
| Province | High‐frequency regions (proportion) |
|---|---|
| Macao | Philippines (50%) |
| Fujian | Philippines (6.96%) |
| Guangdong | London, UK (3.75%) |
| Heilongjiang | Vladivostok, Russia (20.48%), Moscow, Russia (14.67%), Russia (8.01%) |
| Liaoning | Tokyo, Japan (4.21%) |
| Tianjin | Paris, France (3.67%), Russia (3.21%) |
| Zhejiang | Moscow, Russia (8.51%) |
The word frequency and proportion of the top 10 high‐frequency words for each feature category
| Feature category | Category word frequency | High‐frequency word | Word frequency | Proportion |
|---|---|---|---|---|
| Place | 28,076 | Hospital | 7807 | 27.81% |
| | | Residential district | 1426 | 5.08% |
| | | Centre for disease control | 1277 | 4.55% |
| | | Supermarket | 1250 | 4.45% |
| | | COVID‐19 designated hospital | 1230 | 4.38% |
| | | Port | 1085 | 3.86% |
| | | Airport | 897 | 3.19% |
| | | Hotel | 845 | 3.01% |
| | | Clinic | 512 | 1.82% |
| | | Market | 466 | 1.66% |
| | | Top 10 accumulative total | 16,795 | 59.82% |
| Region | 15,074 | Wuhan | 1814 | 12.03% |
| | | Suifenhe | 1732 | 11.49% |
| | | Vladivostok | 1010 | 6.70% |
| | | Moscow | 897 | 5.95% |
| | | Russia | 479 | 3.18% |
| | | Beijing | 373 | 2.47% |
| | | Hubei | 337 | 2.24% |
| | | Haikou | 327 | 2.17% |
| | | Chongqing | 307 | 2.04% |
| | | Wanzhou district (Chongqing city) | 267 | 1.77% |
| | | Top 10 accumulative total | 7543 | 50.04% |
| Close‐contact person | 9893 | Relatives | 732 | 7.40% |
| | | Husband | 714 | 7.22% |
| | | Family member | 583 | 5.89% |
| | | Wife | 510 | 5.16% |
| | | Parents | 491 | 4.96% |
| | | Son | 471 | 4.76% |
| | | Household | 466 | 4.71% |
| | | Infected person | 459 | 4.64% |
| | | Mother | 446 | 4.51% |
| | | Father | 388 | 3.92% |
| | | Top 10 accumulative total | 5260 | 53.17% |
| Contact manner | 17,638 | Gathering | 1997 | 11.32% |
| | | Close contact | 1852 | 10.50% |
| | | Shopping | 1277 | 7.24% |
| | | Dining together | 1038 | 5.89% |
| | | In contact with others | 615 | 3.49% |
| | | Riding together | 544 | 3.08% |
| | | Going to work | 518 | 2.94% |
| | | Dining out | 422 | 2.39% |
| | | Having meals | 367 | 2.08% |
| | | Grocery shopping | 301 | 1.71% |
| | | Top 10 accumulative total | 8931 | 50.63% |
| Travel mode | 21,013 | Self‐driving | 2634 | 12.54% |
| | | Plane | 2258 | 10.75% |
| | | Walking | 1703 | 8.10% |
| | | Private vehicle | 1462 | 6.96% |
| | | Taxi | 1056 | 5.03% |
| | | Bus | 773 | 3.68% |
| | | Coach | 687 | 3.27% |
| | | Train | 368 | 1.75% |
| | | High‐speed rail | 365 | 1.74% |
| | | Online car‐hailing | 296 | 1.41% |
| | | Top 10 accumulative total | 11,602 | 55.21% |
| Symptom | 13,172 | Fever | 3717 | 28.22% |
| | | Cough | 1002 | 7.61% |
| | | Abnormal temperature | 589 | 4.47% |
| | | Indisposition | 559 | 4.24% |
| | | Asymptomatic | 480 | 3.64% |
| | | Fatigue | 383 | 2.91% |
| | | Medication | 256 | 1.94% |
| | | Expectoration | 190 | 1.44% |
| | | Dry cough | 175 | 1.33% |
| | | Sore throat | 156 | 1.18% |
| | | Top 10 accumulative total | 7507 | 56.99% |
| Total | 104,866 | Top 60 accumulative total | 57,638 | 54.96% |
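
The proportions in this table can be checked by simple division: each word's share is its frequency over the category total, and the top-60 share is the sum of the six top-10 subtotals over the grand total. The sketch below is only an arithmetic check on figures copied from the table; the helper name `proportion` and the dictionary names are illustrative, not taken from the paper.

```python
# Category totals and top-10 subtotals copied from the word-frequency table.
CATEGORY_TOTAL = {
    "Place": 28076,
    "Region": 15074,
    "Close-contact person": 9893,
    "Contact manner": 17638,
    "Travel mode": 21013,
    "Symptom": 13172,
}
TOP10_TOTAL = {
    "Place": 16795,
    "Region": 7543,
    "Close-contact person": 5260,
    "Contact manner": 8931,
    "Travel mode": 11602,
    "Symptom": 7507,
}


def proportion(count, total):
    """Return count / total as a percentage (hypothetical helper)."""
    return 100.0 * count / total


# Share of the word "Hospital" (frequency 7807) within the Place category.
hospital_share = proportion(7807, CATEGORY_TOTAL["Place"])

# Share of the ten most frequent words within each category.
top10_share = {
    name: proportion(TOP10_TOTAL[name], CATEGORY_TOTAL[name])
    for name in CATEGORY_TOTAL
}

# Grand totals across the six categories.
grand_total = sum(CATEGORY_TOTAL.values())
top60_total = sum(TOP10_TOTAL.values())
top60_share = proportion(top60_total, grand_total)

print(f"Hospital share of Place: {hospital_share:.2f}%")
print(f"Top 60 share of all words: {top60_share:.2f}%")
```

The six category totals add up to the reported grand total, and the six top-10 subtotals add up to the reported top-60 total, so the table is internally consistent.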
COVID‐19 infection risk indicator system based on textual track data
| Feature | Indicator | Word frequency | Weight |
|---|---|---|---|
| Place | Hospital | 7807 | 0.1357 |
| | Residential district | 1426 | 0.0248 |
| | Centre for disease control | 1277 | 0.0222 |
| | Supermarket | 1250 | 0.0217 |
| | COVID‐19 designated hospital | 1230 | 0.0214 |
| | Port | 1085 | 0.0189 |
| | Airport | 897 | 0.0156 |
| | Hotel | 845 | 0.0147 |
| | Clinic | 512 | 0.0089 |
| | Market | 466 | 0.0081 |
| Travel mode | Self‐driving | 2634 | 0.0458 |
| | Plane | 2258 | 0.0392 |
| | Walking | 1703 | 0.0296 |
| | Private vehicle | 1462 | 0.0254 |
| | Taxi | 1056 | 0.0183 |
| | Bus | 773 | 0.0134 |
| | Coach | 687 | 0.0119 |
| | Train | 368 | 0.0064 |
| | High‐speed rail | 365 | 0.0064 |
| | Online car‐hailing | 296 | 0.0052 |
| Contact manner | Gathering | 1997 | 0.0347 |
| | Close contact | 1852 | 0.0322 |
| | Shopping | 1277 | 0.0222 |
| | Dining together | 1038 | 0.0180 |
| | In contact with others | 615 | 0.0107 |
| | Riding together | 544 | 0.0095 |
| | Going to work | 518 | 0.0090 |
| | Dining out | 422 | 0.0073 |
| | Having meals | 367 | 0.0064 |
| | Grocery shopping | 301 | 0.0052 |
| Symptom | Fever | 3717 | 0.0646 |
| | Cough | 1002 | 0.0174 |
| | Abnormal temperature | 589 | 0.0102 |
| | Indisposition | 559 | 0.0097 |
| | Asymptomatic | 480 | 0.0083 |
| | Fatigue | 383 | 0.0067 |
| | Medication | 256 | 0.0044 |
| | Expectoration | 190 | 0.0033 |
| | Dry cough | 175 | 0.0030 |
| | Sore throat | 156 | 0.0027 |
| Close‐contact person | Relatives | 732 | 0.0127 |
| | Husband | 714 | 0.0124 |
| | Family member | 583 | 0.0101 |
| | Wife | 510 | 0.0089 |
| | Parents | 491 | 0.0085 |
| | Son | 471 | 0.0082 |
| | Household | 466 | 0.0081 |
| | Infected person | 459 | 0.0080 |
| | Mother | 446 | 0.0078 |
| | Father | 388 | 0.0068 |
| Region | Wuhan | 1804 | 0.0313 |
| | Suifenhe | 1722 | 0.0299 |
| | Vladivostok | 1000 | 0.0174 |
| | Moscow | 887 | 0.0154 |
| | Russia | 469 | 0.0081 |
| | Beijing | 363 | 0.0063 |
| | Hubei | 327 | 0.0057 |
| | Haikou | 317 | 0.0055 |
| | Chongqing | 297 | 0.0052 |
| | Wanzhou District (Chongqing city) | 267 | 0.0046 |
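
The last column of this table appears consistent with a frequency-share weighting, that is, each indicator's word frequency divided by the summed frequency of all 60 indicators. This is an inference from the numbers rather than a formula stated in this record, so the following is a minimal sketch under that assumption; the names `FREQ` and `weights` are illustrative.

```python
# Word frequencies copied from the indicator table, in table order.
FREQ = {
    "Place": [7807, 1426, 1277, 1250, 1230, 1085, 897, 845, 512, 466],
    "Travel mode": [2634, 2258, 1703, 1462, 1056, 773, 687, 368, 365, 296],
    "Contact manner": [1997, 1852, 1277, 1038, 615, 544, 518, 422, 367, 301],
    "Symptom": [3717, 1002, 589, 559, 480, 383, 256, 190, 175, 156],
    "Close-contact person": [732, 714, 583, 510, 491, 471, 466, 459, 446, 388],
    "Region": [1804, 1722, 1000, 887, 469, 363, 327, 317, 297, 267],
}

# Summed frequency over all 60 indicators.
total = sum(sum(freqs) for freqs in FREQ.values())

# ASSUMPTION: weight = frequency / total, rounded to four decimals.
weights = {
    feature: [round(f / total, 4) for f in freqs]
    for feature, freqs in FREQ.items()
}

print(total)                  # summed frequency of the 60 indicators
print(weights["Place"][0])    # weight of "Hospital"
print(weights["Symptom"][0])  # weight of "Fever"
```

Under this assumption the computed value agrees with the tabulated weight for most indicators; a few entries (for example high‐speed rail, online car‐hailing, and father) differ by one unit in the fourth decimal, so the authors' exact rounding or normalization may differ from this sketch.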
Risk categories of activities in the United States as classified by the Texas Medical Association
| Risk category | Activity | Risk level |
|---|---|---|
| High risk | Going to a bar | 9 |
| | Attending a religious service with 500+ worshipers | 9 |
| | Going to a sports stadium | 9 |
| | Attending a large music concert | 9 |
| | Going to a movie theater | 8 |
| | Going to an amusement park | 8 |
| | Working out at a gym | 8 |
| | Eating at a buffet | 8 |
| Moderate‐high | Hugging or shaking hands when greeting a friend | 7 |
| | Playing football | 7 |
| | Playing basketball | 7 |
| | Traveling by plane | 7 |
| | Attending a wedding or funeral | 7 |
| | Eating in a restaurant (inside) | 7 |
| | Going to a hair salon or barbershop | 7 |
| Moderate risk | Visiting an elderly relative or friend in their home | 6 |
| | Swimming in a public pool | 6 |
| | Working a week in an office building | 6 |
| | Sending kids to school, camp, or day care | 6 |
| | Shopping at a mall | 5 |
| | Going to a beach | 5 |
| | Attending a backyard barbecue | 5 |
| | Having dinner at someone else's house | 5 |
| Moderate‐low | Spending an hour at a playground | 4 |
| | Walking in a busy downtown | 4 |
| | Eating in a restaurant (outside) | 4 |
| | Going to a library or museum | 4 |
| | Sitting in a doctor's waiting room | 4 |
| | Staying at a hotel for two nights | 4 |
| | Playing golf | 3 |
| | Going for a walk, run, or bike ride with others | 3 |
| | Grocery shopping | 3 |
| Low risk | Going camping | 2 |
| | Playing tennis | 2 |
| | Pumping gasoline | 2 |
| | Getting restaurant takeout | 2 |
| | Opening the mail | 1 |
FIGURE 8. Risk categories of 46 risk events in China based on the track data of diagnosed patients
FIGURE 9. The probability density distribution of infection risk. Note: The part of the curve that extends below 0 on the horizontal axis does not correspond to real observations, because the IRI takes only nonnegative values
Descriptive statistics for infection risk distribution
| Statistics | Value |
|---|---|
| Minimum | 0.00 |
| Maximum | 43.19 |
| Mean | 2.67 |
| Median | 2.27 |
| Standard deviation | 0.03 |
| Skewness | 2.40 |
| Kurtosis | 20.68 |
The number of patients in various age groups under different risk index ranges
| Age group | Age (years) | | | | Total |
|---|---|---|---|---|---|
| Young | 0–9 | 0 | 0 | 0 | 0 |
| | 10–19 | 0 | 0 | 0 | 0 |
| | 20–29 | 0 | 1 | 10 | 11 |
| Middle‐aged | 30–39 | 1 | 2 | 19 | 22 |
| | 40–49 | 0 | 0 | 12 | 12 |
| | 50–59 | 0 | 4 | 13 | 17 |
| Elderly | 60–69 | 3 | 1 | 8 | 12 |
| | 70–79 | 0 | 0 | 0 | 0 |
| | ≥80 | 0 | 0 | 0 | 0 |
FIGURE 10. Five categories of individual infection risk of COVID‐19
The clustering results of the infection risk index based on the K‐means approach
| Category | Low risk | Moderate‐low risk | Moderate risk | Moderate‐high risk | High risk |
|---|---|---|---|---|---|
| Sample number | 4853 | 3502 | 1034 | 64 | 2 |
| Index range | [0.00, 2.32] | [2.32, 5.34] | [5.34, 10.79] | [10.79, 23.42] | [23.42, 43.19] |
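
Once the cluster boundaries are known, assigning a new IRI value to a category reduces to an interval lookup. The sketch below assumes that the five index ranges, in ascending order, correspond to the five categories named in the abstract (low through high), and that a value lying exactly on a shared boundary belongs to the higher category; the table's closed intervals leave that choice open. The function name `risk_category` is hypothetical.

```python
from bisect import bisect_right

# Upper bounds of the first four index ranges, copied from the table.
UPPER_BOUNDS = [2.32, 5.34, 10.79, 23.42]

# ASSUMPTION: ascending ranges map to the abstract's categories in order.
CATEGORIES = ["low", "moderate-low", "moderate", "moderate-high", "high"]


def risk_category(iri):
    """Map an infection risk index (IRI) value to a risk category.

    ASSUMPTION: a value equal to a shared boundary (e.g. 2.32) is placed
    in the higher of the two adjacent categories.
    """
    if iri < 0:
        raise ValueError("IRI values are nonnegative")
    return CATEGORIES[bisect_right(UPPER_BOUNDS, iri)]


# The reported minimum, mean, and maximum IRI values.
for value in (0.00, 2.67, 43.19):
    print(value, risk_category(value))
```

Values above the observed maximum of 43.19 fall into the highest category under this lookup; whether the authors would treat such values differently is not stated here.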
FIGURE 11. Perplexities with different numbers of topics
FIGURE C1. Word clouds of 25 automatic classifications obtained using the LDA model
FIGURE 12. The proportion of words classified into each of six identified common features
FIGURE C2. Word clouds of six common features identified based on the LDA model
The confusion matrix and the results of precision and recall rates of the LDA model
| | Place | Travel mode | Contact manner | Symptom | Close‐contact person | Region | Diagonal / row total |
|---|---|---|---|---|---|---|---|
| Place | 27,525 | 3435 | 675 | 326 | 195 | 2788 | 78.77% |
| Travel mode | 928 | 17,509 | 427 | 98 | 78 | 1475 | 85.35% |
| Contact manner | 837 | 2040 | 10,668 | 89 | 90 | 1659 | 69.35% |
| Symptom | 5053 | 703 | 108 | 8865 | 32 | 956 | 56.40% |
| Close‐contact person | 617 | 1583 | 368 | 56 | 1300 | 358 | 30.36% |
| Region | 1575 | 2299 | 464 | 277 | 88 | 9322 | 66.47% |
| Diagonal / column total | 75.34% | 63.51% | 83.93% | 91.29% | 72.93% | 56.30% | 64.45% (mean of row rates) |
| Mean of column rates | 73.88% | | | | | | |
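
The percentages in this table can be recomputed from the counts: each entry in the last column is the diagonal count divided by its row total, each entry in the bottom row is the diagonal count divided by its column total, and 64.45% and 73.88% are the unweighted means of those two sets of rates. The sketch below checks this arithmetic. Which set is precision and which is recall depends on whether rows hold the true or the predicted class, which this record does not state; under the common convention of true classes in rows, the row rates would be recall and the column rates precision.

```python
LABELS = ["Place", "Travel mode", "Contact manner",
          "Symptom", "Close-contact person", "Region"]

# Counts copied from the confusion matrix, one list per table row.
MATRIX = [
    [27525, 3435, 675, 326, 195, 2788],
    [928, 17509, 427, 98, 78, 1475],
    [837, 2040, 10668, 89, 90, 1659],
    [5053, 703, 108, 8865, 32, 956],
    [617, 1583, 368, 56, 1300, 358],
    [1575, 2299, 464, 277, 88, 9322],
]

n = len(MATRIX)
row_totals = [sum(row) for row in MATRIX]
col_totals = [sum(MATRIX[i][j] for i in range(n)) for j in range(n)]

# Diagonal count over row total, and over column total, per class.
row_rates = [MATRIX[i][i] / row_totals[i] for i in range(n)]
col_rates = [MATRIX[j][j] / col_totals[j] for j in range(n)]

# Unweighted (macro) averages of the per-class rates.
mean_row_rate = sum(row_rates) / n
mean_col_rate = sum(col_rates) / n

for label, r, c in zip(LABELS, row_rates, col_rates):
    print(f"{label}: row rate {100 * r:.2f}%, column rate {100 * c:.2f}%")
print(f"mean row rate {100 * mean_row_rate:.2f}%, "
      f"mean column rate {100 * mean_col_rate:.2f}%")
```

The counts sum to 104,866, the same total word frequency reported for the six feature categories.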