Data interpretation and visualization of COVID-19 cases using R programming.

Yagyanath Rimal1, Saikat Gochhait2, Aakriti Bisht3.   

Abstract

BACKGROUND: Data analysis and visualization are essential for exploring and communicating medical research findings, especially when working with COVID records.
RESULTS: Data on COVID-19 diagnosed cases and deaths from December 2019 is collected automatically from www.statista.com, datahub.io, and the Multidisciplinary Digital Publishing Institute (MDPI). We have developed an application for data visualization and analysis of several indicators to follow the SARS-CoV-2 epidemic using Statista, Data Hub, and MDPI data from densely populated countries like the United States, Japan, and India using R programming.
CONCLUSIONS: The COVID19-World online web application systematically produces daily updated country-specific data visualization and analysis of the SARS-CoV-2 epidemic worldwide. The application will help with a better understanding of the SARS-CoV-2 epidemic worldwide.
© 2021 The Authors.

Keywords:  Coronavirus; Covid-19; Data visualization; Machine learning; Open data map

Year:  2021        PMID: 34485681      PMCID: PMC8404394          DOI: 10.1016/j.imu.2021.100705

Source DB:  PubMed          Journal:  Inform Med Unlocked        ISSN: 2352-9148


Introduction

The first cases of COVID were reported in China between Dec 29, 2019 and Jan 3, 2020, when fifty people developed pneumonia-like symptoms [1,2]. Wuhan is a significant transport hub, particularly in the period before the Chinese New Year [3,4]. New COVID infections were reported daily after that, and the World Health Organization eventually declared COVID-19 a pandemic [5]. Although 43% of hospitalized patients showed fever symptoms, more than 80% of COVID patients showed fever in hospital and quarantine facilities. The disease may therefore go undetected when diagnosis relies on symptoms alone [6,7]. COVID-19 primarily manifests as diarrhea, cough, and shortness of breath [9], so we cannot assume that patients without fever are free of COVID infection. Early symptoms of diarrhea may occur in 3%–5% of patients [8]. Those with mild symptoms must isolate themselves for 14 days from the first infection, while patients who experience difficulty breathing or have chest infections should see a doctor right away [10]. During this time, doctors encourage patients to rest and rebuild their immune system in quarantine [11]. Only 5% of patients suffer from ARDS (Acute Respiratory Distress Syndrome), a condition in which the air sacs of the lungs fill with fluid, making breathing and the oxygenation of blood more difficult [12]. In cases of severe lung damage, an oxygenation process that draws blood from the patient and filters it before returning it may be necessary [13]. Patients with heart disease, kidney disease, or diabetes, and long-term users of medication, experience more severe symptoms than younger patients [14]. Beyond individual treatment, the main concern is halting the spread at the community level.
Flattening the curve means slowing the spread so that the hospital system can cope with cases as they arrive, rather than being overwhelmed within a short period, much like shops running out of toilet paper [15,16]. Table 1 shows that the total number of confirmed COVID cases in India reached 720346 in the 06/07/2020 records, with about 22510 new cases per day. Overall, there have been 1704 COVID-related deaths in India, a number that represents 4% of all active cases in India. The records show 613 deaths per day among 8944 patients, out of 260022 active cases. India has tested 9969662 people out of a population of about 1.3 billion.
Table 1

India state data statistics.

States    | India     | Gujarat | Uttar Pradesh | Maharashtra | Delhi    | Tamil Nadu
Status    | 22,500    | 735     | 929           | 5,368       | 1,379    | 3,827
Confirmed | 7,20,346  | 36858   | 28636         | 2,11,987    | 1,00,823 | 1,14,978
Active    | 2,59,926  | 8574    | 8718          | 87682       | 25620    | 46836
Status    | 15,256    | 423     | 348           | 3,522       | 749      | 3,793
Recovery  | 4,40,150  | 26323   | 19109         | 1,15,262    | 72088    | 66571
Status    | 473       | 17      | 24            | 204         | 48       | 61
Death     | 473       | 17      | 24            | 204         | 48       | 61
Status    | 10 M      | 6.3K    | 25.9K         | 22.6K       | 13.9K    | 34.8K
Total     | 138019747 | 418.5K  | 890K          | 1.1 M       | 657.4K   | 1.4 M
An interactive map of the Table 1 data is freely available at https://www.statista.com/statistics/1103458/india-novel-coronavirus-covid-19-cases-by-state/, and it conveys the records more effectively than either the table or a static chart. Such a graph shows at a glance which states have the highest or lowest number of new COVID cases. Likewise, trend and pattern line graphs and charts provide more information than tabular statistics such as those in Table 1. Compared with tables and static figures, map visualization is one of the most effective tools for presenting data. The theory behind flattening the curve is to keep a balance between hospital ventilators, quarantine and medicine systems, and human control measures in every country. The hospital system can accommodate everyone if the spread is gradual [4]. During the 1918 influenza and pneumonia outbreak in Philadelphia, the mortality rate rose sharply in October [17], which shows what happens when the curve is not flattened. For now, social distancing is considered the primary control for COVID-19 [18]. Flattening the curve therefore consists of 20 s of handwashing, 6 feet of social distancing, and wearing a mask when necessary. Health workers and those who suffer from symptoms should also wear a mask. Vaccines and antiviral medicines are being developed, and several clinical trials of candidate vaccines are currently underway. Until then, the best way to stop the spread of the disease is to distance socially, to flatten the curve and manage health care systems globally [19]. According to the WHO, twenty-eight vaccines are currently in the testing phase [20]. Since vaccines are given to healthy individuals, vaccination must be especially precise. Vaccines must also be very safe to use, and they require a considerable amount of time to produce.
As part of the development process, the medication is evaluated on 20 to 80 people in the first phase, followed by 100–300 healthy participants in the second phase [21]. If no harmful effects are detected, a license should be granted for production, with appropriate precautions for refrigeration. This process requires time, and production is then scheduled for vaccinating the population [22]. For example, the production of the Ebola vaccine took four years. Tests are conducted on mice, rabbits, and monkeys, and later on people. The substance must not cause harm, and it generates messenger ribonucleic acid (mRNA). If all of the trials prove successful, the World Health Organization (WHO) hopes to shorten the process. COVID-19 is the viral infection caused by SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), which is similar to the SARS coronavirus that caused an outbreak in 2002 [23]. According to reports, bats in Wuhan, China, were the primary source of human infection, which later spread to other parts of the world. There were 903826 cases worldwide and 45335 deaths in April 2020, so around 5% of the total cases were fatal [9]. In addition, many cases were asymptomatic carriers of the coronavirus, which results in a death rate of 0.7% of the total number of cases, according to the study [24]. Reports from the CDC expect the number of people affected by the disease to grow, with up to 25% of the population showing symptoms. In a curve plot, a level on the y-axis represents the full capacity of a health care system, with proper staffing and protective equipment, to combat COVID-19. The data presented in various official sources show only raw records, which need visualization to support better future action.
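The fatality share quoted above follows directly from the two April 2020 totals. A minimal sketch of the arithmetic in R, using only the figures given in the text (the variable names are illustrative):

```r
# Worldwide totals for April 2020 as quoted in the text
cases  <- 903826
deaths <- 45335

# Case fatality ratio: deaths as a share of reported cases
cfr <- deaths / cases
cfr_percent <- round(100 * cfr, 1)
print(cfr_percent)
```

The result rounds to about 5%, in line with the figure cited from [9]. This ratio counts only reported cases, which is why the estimate that includes asymptomatic carriers is much lower.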

Methodology and data analysis using R programming

COVID-19 records have been updated every 24 h since the pandemic was declared. This study focuses on the interactive presentation of these figures and data. The data sources for this research include https://datahub.io/core/covid-19#resources-covid-19_zip [25] and https://www.mdpi.com/2079-9292/9/5/827/htm [26]. The workflow manages and organizes the raw open-source records and generates graphical outputs from them. After loading the libraries leaflet, tidyverse, ggmap, htmltools, leaflet.extras, maps, ggplot2, mapproj, mapdata, and spData in the R console, the world city data, with 15493 records on 11 variables, is stored in the variable data. The data is converted to the tidier tibble format with the tbl_df function. The more modern readr package provides several functions (read_delim(), read_tsv(), and read_csv()) that are faster than the base R functions and import data into R as a tbl_df (pronounced "tibble diff"). The city data, containing city name, ASCII name, latitude, longitude, and population of the capital city, is available as open data at https://datahub.io/core/world-cities. To access one variable in a data set, use the dollar sign "$"; a new table can also be created by combining multiple vectors. The output of str(data) shows the variables, their data types, and the first few records, so the structure command describes the properties of each variable. In the tidyverse, the pipe operator (typed with Ctrl+Shift+M in RStudio) passes the data to the dplyr filter function, which keeps only the United States records or those of whichever country is required. In the same way, the world map records with longitude, latitude, group, and region are stored in the variable w as w = map_data('world').
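The inspection and filtering steps described above can be sketched as follows. This is a minimal illustration, not the authors' script: the small data frame stands in for the world-cities file, and its column names and values are assumptions made for the example.

```r
library(dplyr)

# Hypothetical stand-in for the world-cities table; the real file has
# 15493 records on 11 variables, and these column names are assumed.
data <- data.frame(
  city       = c("New York", "Tokyo", "Mumbai", "Chicago"),
  country    = c("United States", "Japan", "India", "United States"),
  lat        = c(40.71, 35.68, 19.08, 41.88),
  lng        = c(-74.01, 139.69, 72.88, -87.63),
  population = c(8.4e6, 13.9e6, 12.4e6, 2.7e6),  # illustrative values
  stringsAsFactors = FALSE
)

str(data)              # variables, data types, and the first records
city_names <- data$city  # "$" extracts a single variable

# Pipe the table into filter() to keep one country's records
us <- data %>% filter(country == "United States")
print(us)
```

With the real file, the data frame would instead come from a call such as readr::read_csv() on the downloaded CSV, and the filter condition would use whatever the country column is actually called.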
The map data represent the world by country and its subregions. To plot the map of specific countries, select them with k = map_data('world', region = c('Nepal', 'China')). Built-in ggplot2 functions then plot the three-country map. An R Markdown file was used to test a choropleth plot on the Japan map from the mapdata package; these maps include regions only, not subregions. The ggplot command maps longitude and latitude by group, fills each polygon by region name, and draws each polygon with a black border. For the U.S., each state's data is taken from the state-level records and then matched and grouped with the geographical map. ggplot2, one of the core members of the tidyverse, plots these data sets.
The system provides a user-friendly web-based interface for viewing COVID-19 data and metrics. World, country, and regional maps are color-coded to represent various selectable infection attributes at those locations. Clicking on any given location brings up a set of pages that provide details about that location, from raw statistics to charts to advanced metrics and commentary. The user interface provides several ways to navigate, such as Map View, Trend View, Stats View, and Hotspots View (see Fig. 1, Fig. 2).
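The country map step described above can be sketched with ggplot2 and the maps package. This is an illustrative sketch rather than the original script; India is added to the region vector here as an assumption, to match the three countries named in the Fig. 1 caption.

```r
library(ggplot2)
library(maps)

# Polygon outlines for the selected countries
# ("India" is an assumption based on the Fig. 1 caption)
k <- map_data("world", region = c("Nepal", "China", "India"))
str(k)  # long, lat, group, order, region, subregion

# One polygon per group, filled by country, with a black border
p <- ggplot(k, aes(x = long, y = lat, group = group, fill = region)) +
  geom_polygon(colour = "black") +
  coord_quickmap()
print(p)
```

The group aesthetic matters here: without it, ggplot2 would join the outlines of separate islands and countries into one polygon.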
Fig. 1

Country maps of China, India, and Nepal.

Fig. 2

U.S. State maps.

With the libraries ggfortify, mapdata, maps, and ggplot2 loaded, Fig. 3 depicts the map of Japan from the world2 data set, obtained with jpg <- ggplot2::map_data('world2', 'japan') and plotted with ggplot2. The data contain 1097 observations of six geographic variables (see Fig. 4, Fig. 5, Fig. 6).
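A minimal sketch of the Japan map step, assuming the maps and ggplot2 packages are installed. The variable name jp is illustrative, and the row count will depend on the installed version of the map database, so it is printed rather than assumed.

```r
library(ggplot2)
library(maps)

# Japan outlines from the Pacific-centred "world2" database
jp <- map_data("world2", "Japan")
str(jp)          # six variables: long, lat, group, order, region, subregion
print(nrow(jp))  # number of boundary points in this version of the database

p_jp <- ggplot(jp, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey90", colour = "black") +
  coord_quickmap()
print(p_jp)
```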
Fig. 3

Japan country map.

Fig. 4

Covid data in the U.S.

Fig. 5

Covid maps with respective data of the U.S.

Fig. 6

Interactive geographic map of U.S.

The above maps illustrate the COVID data of country and city records. In addition to the open-source COVID-19 data, a zip file [25] with data projections for country-wise and state-wise records was also available. The COVID data is read into a data frame that holds 528282 records on 13 variables from 58 states in the U.S. Where needed, variables are converted from their original format with qualifying digits to numeric variables, after converting longitude and latitude and rounding their values. To prepare the map data set, the records of a single country, such as the United States (U.S.), are filtered, summarised by Province.State using the case count, and arranged. The U.S. COVID records for 06/07/2020 can thus be viewed with the filter and grouping commands; they include the counts for 56 states in the U.S. and the corresponding COVID cases in decreasing order. Because the map table lists state names in lower case, the Province.State names are converted to lower case with the tolower function. The state map table provides, for each state, the longitude, latitude, group, order, region, and subregion.
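The filter, summarise, arrange, and lower-case steps described above can be sketched as follows. The tiny data frame and its values are hypothetical, the column names follow the Province.State naming used in the text, and summing a Confirmed column is an assumption about how the count was computed.

```r
library(dplyr)

# Hypothetical extract of the confirmed-cases table (illustrative values)
covid <- data.frame(
  Country.Region = c("US", "US", "US", "US", "Japan"),
  Province.State = c("New York", "New York", "Texas", "Ohio", NA),
  Confirmed      = c(10, 20, 15, 5, 7),
  stringsAsFactors = FALSE
)

cc <- covid %>%
  filter(Country.Region == "US") %>%        # keep one country
  group_by(Province.State) %>%              # one group per state
  summarise(count = sum(Confirmed)) %>%     # total cases per state (assumed)
  arrange(desc(count)) %>%                  # largest counts first
  mutate(Province.State = tolower(Province.State))  # match map region names

print(cc)
```

Lower-casing the state names is what later allows them to be matched against the region column of the state map data, which is stored in lower case.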
To complete the merge, the map information and the COVID cases are combined on a common key, the state name (region), once both tables use lower case. The command data1 = merge(ss, cc, by.x = 'region', by.y = 'Province.State') combines region, longitude, latitude, group, order, and subregion with the COVID counts. Only the fields essential for the map are then selected, eliminating unnecessary data: region, longitude, latitude, group, and count, giving 15537 observations on five variables for map plotting. The data is then plotted on the U.S. map with the ggplot2 library. The resulting map shades each state by its number of COVID cases, and the legend scale shows the color range from the highest to the lowest state. This output presents the statistics more effectively than tabular data such as India's records in Table 1. According to the legend on the left, there are a total of 79 cases with fading colors. Interactive maps of COVID records can also be visualized easily using clusters. The leaflet package connects to online world maps from multiple map providers. Marker clusters, for example, add circular markers with names and colors on top of the map tiles. Using leaflet, addTiles with circle markers placed at each state's location plots a map of COVID cases in each state, and the state aggregates can be viewed by zooming in on the country's map. By highlighting the boundaries on maps, planners can better judge what measures should be considered.
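The merge and choropleth steps described above can be sketched as follows. The names ss, cc, and data1 follow the text; the three-state count table is hypothetical, and the colour scale is an arbitrary choice for the illustration.

```r
library(ggplot2)
library(maps)

ss <- map_data("state")  # lower-case state names in the `region` column

# Hypothetical per-state counts (illustrative values only)
cc <- data.frame(
  Province.State = c("new york", "texas", "ohio"),
  count          = c(300, 200, 100),
  stringsAsFactors = FALSE
)

# Join map outlines and counts on the state name
data1 <- merge(ss, cc, by.x = "region", by.y = "Province.State")

# merge() reorders rows, so restore the polygon drawing order
# and keep only the fields needed for plotting
data1 <- data1[order(data1$order),
               c("region", "long", "lat", "group", "count")]

p <- ggplot(data1, aes(x = long, y = lat, group = group, fill = count)) +
  geom_polygon(colour = "black") +
  scale_fill_gradient(low = "white", high = "red") +
  coord_quickmap()
print(p)
```

Re-sorting by the order column after the merge is worth keeping in any variant of this code, because polygons drawn from shuffled boundary points appear as scrambled shapes.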
Using the zoom-in and zoom-out features, leaflet with added tiles can plot total COVID cases state-wise and city-wise, with border and location detail from the OpenStreetMap base maps. In this manner, COVID data from many states and cities can be shared interactively.
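An interactive map of this kind can be sketched with the leaflet package. The state table below is hypothetical: the coordinates are rough state centres and the counts are illustrative, and the marker radius scaling is an arbitrary choice.

```r
library(leaflet)

# Hypothetical state-level summary (approximate centres, illustrative counts)
states <- data.frame(
  state = c("New York", "Texas", "Ohio"),
  lat   = c(43.0, 31.0, 40.4),
  lng   = c(-75.5, -99.0, -82.9),
  count = c(300, 200, 100),
  stringsAsFactors = FALSE
)

m <- leaflet(states) %>%
  addTiles() %>%                         # default OpenStreetMap base tiles
  addCircleMarkers(
    lng = ~lng, lat = ~lat,
    radius = ~sqrt(count),               # marker size grows with case count
    label  = ~paste0(state, ": ", count),
    clusterOptions = markerClusterOptions()  # merge nearby markers when zoomed out
  )
m  # prints the widget in an interactive session
```

With clustering enabled, markers that overlap at a low zoom level are merged into a single cluster marker and separate again as the user zooms in.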

Discussion

The COVID-19 pandemic is challenging our society and economy in an unprecedented way. Overall, the U.S. response to containing COVID-19 may not have been as effective as that of other countries, like Japan and India. This may have been due to insufficient or delayed testing and a lack of alternative monitoring tools near the pandemic's beginning [27]. Early warning and detection may represent a critical opportunity for India, Japan, and the U.S. to track the rate of respiratory illness and quickly institute policies to prevent or at least mitigate a future outbreak. The study recommends that COVID-19 diagnosis and prognosis models adhere to transparent and open-source reporting methods to reduce bias and encourage real-time application. Secondly, the system provides a user-friendly web-based interface for viewing COVID-19 data and metrics. World, country, and regional maps are color-coded to represent various selectable attributes of the infection at those locations, with latitude and longitude values for countries like India, Japan, and the U.S. Real-time epidemiologic data is critical to managing different aspects of a pandemic. For instance, this data can help public health authorities forecast demand and surge models, thereby allowing public or private organizations to reposition resources or reallocate personnel quickly. These are corroborating data that should be considered in combination with other indicators, such as the officially reported number of newly positive laboratory tests, disease-related hospitalizations, and disease-related deaths.

Conclusion

In this paper, we have studied COVID data interpretation and visualization using open-data sources for larger countries like the USA and India, to better understand how COVID is spreading nationally and internationally. COVID19-World is a tool for updating country-specific analysis and visualizing epidemiological indicators of the COVID-19 epidemic, and it aims to fill a gap in the field by presenting a set of valuable tools for the current global epidemic. The web application supports a better understanding of the epidemiological development in each country and can help with country-specific surveillance. Despite promising results, some additional steps could enhance the performance of the algorithms and make them more useful, for example, strengthening the datasets of several health centers throughout the country or ensuring that all screening details are filled in accurately.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
  18 in total

1.  Bivalent Vaccine Effectiveness Against Anal Human Papillomavirus Positivity Among Female Sexually Transmitted Infection Clinic Visitors in the Netherlands.

Authors:  Petra J Woestenberg; Audrey J King; Birgit H B Van Benthem; Suzan Leussink; Marianne A B Van der Sande; Christian J P A Hoebe; Johannes A Bogaards
Journal:  J Infect Dis       Date:  2020-03-28       Impact factor: 5.226

Review 2.  Controlling hospital-acquired infection: focus on the role of the environment and new technologies for decontamination.

Authors:  Stephanie J Dancer
Journal:  Clin Microbiol Rev       Date:  2014-10       Impact factor: 26.132

3.  Failing the Test - The Tragic Data Gap Undermining the U.S. Pandemic Response.

Authors:  Eric C Schneider
Journal:  N Engl J Med       Date:  2020-05-15       Impact factor: 91.245

4.  Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study.

Authors:  Joseph T Wu; Kathy Leung; Gabriel M Leung
Journal:  Lancet       Date:  2020-01-31       Impact factor: 79.321

5.  A rapid advice guideline for the diagnosis and treatment of 2019 novel coronavirus (2019-nCoV) infected pneumonia (standard version).

Authors:  Ying-Hui Jin; Lin Cai; Zhen-Shun Cheng; Hong Cheng; Tong Deng; Yi-Pin Fan; Cheng Fang; Di Huang; Lu-Qi Huang; Qiao Huang; Yong Han; Bo Hu; Fen Hu; Bing-Hui Li; Yi-Rong Li; Ke Liang; Li-Kai Lin; Li-Sha Luo; Jing Ma; Lin-Lu Ma; Zhi-Yong Peng; Yun-Bao Pan; Zhen-Yu Pan; Xue-Qun Ren; Hui-Min Sun; Ying Wang; Yun-Yun Wang; Hong Weng; Chao-Jie Wei; Dong-Fang Wu; Jian Xia; Yong Xiong; Hai-Bo Xu; Xiao-Mei Yao; Yu-Feng Yuan; Tai-Sheng Ye; Xiao-Chun Zhang; Ying-Wen Zhang; Yin-Gao Zhang; Hua-Min Zhang; Yan Zhao; Ming-Juan Zhao; Hao Zi; Xian-Tao Zeng; Yong-Yan Wang; Xing-Huan Wang
Journal:  Mil Med Res       Date:  2020-02-06

6.  Comparing the socio-economic implications of the 1918 Spanish flu and the COVID-19 pandemic in India: A systematic review of literature.

Authors:  Aadya Sharma; Dibyashree Ghosh; Neha Divekar; Manisha Gore; Saikat Gochhait; S S Shireshi
Journal:  Int Soc Sci J       Date:  2021-03-11

7.  Rapid asymptomatic transmission of COVID-19 during the incubation period demonstrating strong infectivity in a cluster of youngsters aged 16-23 years outside Wuhan and characteristics of young patients with COVID-19: A prospective contact-tracing study.

Authors:  Lei Huang; Xiuwen Zhang; Xinyue Zhang; Zhijian Wei; Lingli Zhang; Jingjing Xu; Peipei Liang; Yuanhong Xu; Chengyuan Zhang; Aman Xu
Journal:  J Infect       Date:  2020-04-10       Impact factor: 6.072

8.  Epidemiological and clinical characteristics of 161 discharged cases with coronavirus disease 2019 in Shanghai, China.

Authors:  Sheng Lin; Hao Pan; Huanyu Wu; Xiao Yu; Peng Cui; Ruobing Han; Chenyan Jiang; Dechuan Kong; Yaxu Zheng; Xiaohuan Gong; Wenjia Xiao; Shenghua Mao; Bihong Jin; Yiyi Zhu; Xiaodong Sun
Journal:  BMC Infect Dis       Date:  2020-10-20       Impact factor: 3.090

9.  Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases From the Chinese Center for Disease Control and Prevention.

Authors:  Zunyou Wu; Jennifer M McGoogan
Journal:  JAMA       Date:  2020-04-07       Impact factor: 56.272

10.  Breadth of concomitant immune responses prior to patient recovery: a case report of non-severe COVID-19.

Authors:  Irani Thevarajan; Thi H O Nguyen; Marios Koutsakos; Julian Druce; Leon Caly; Carolien E van de Sandt; Xiaoxiao Jia; Suellen Nicholson; Mike Catton; Benjamin Cowie; Steven Y C Tong; Sharon R Lewin; Katherine Kedzierska
Journal:  Nat Med       Date:  2020-04       Impact factor: 87.241
