Junaid Shuja1,2,3, Eisa Alanazi4,3, Waleed Alasmary2,3, Abdulaziz Alashaikh5.
Abstract
In December 2019, the novel coronavirus disease COVID-19 emerged in the city of Wuhan, China. In early 2020, the virus spread to every continent except Antarctica, causing widespread infections and deaths owing to its high contagiousness and the absence of a medically proven treatment. The COVID-19 pandemic has been termed the most consequential global crisis since the World Wars. The first line of defense against the spread of COVID-19 is non-pharmaceutical measures such as social distancing and personal hygiene. The pandemic, which has affected billions of lives economically and socially, has motivated the scientific community to develop solutions based on computer-aided digital technologies for the diagnosis, prevention, and estimation of COVID-19. Some of these efforts focus on statistical and Artificial Intelligence-based analysis of the available COVID-19 data. All of these efforts require that the data used for analysis be open source, to promote extension, validation, and collaboration in the fight against the global pandemic. Our survey is motivated by open source efforts that can be broadly categorized as (a) COVID-19 diagnosis from CT scans, X-ray images, and cough sounds, (b) COVID-19 case reporting, transmission estimation, and prognosis from epidemiological, demographic, and mobility data, (c) COVID-19 emotion and sentiment analysis from social media, and (d) knowledge-based discovery and semantic analysis from the collection of scholarly articles covering COVID-19. We survey and compare research works in these directions that are accompanied by open source data and code. Future research directions for data-driven COVID-19 research are also discussed. We hope that the article will give the scientific community a starting point for open source, extensible, and transparent research in the collective fight against the COVID-19 pandemic.
© Springer Science+Business Media, LLC, part of Springer Nature 2020.
Keywords: Artificial intelligence; COVID-19; Coronavirus; Data sets; Machine learning; Open source; Pandemic
Year: 2020 PMID: 34764552 PMCID: PMC7503433 DOI: 10.1007/s10489-020-01862-6
Source DB: PubMed Journal: Appl Intell (Dordr) ISSN: 0924-669X Impact factor: 5.086
Fig. 1 Taxonomy of COVID-19 open source data sets
Fig. 2 A generic workflow of AI/ML based COVID-19 diagnosis
Fig. 3 A generic workflow of social media based ML and NLP applications [83]
Fig. 4 A workflow of speech based COVID-19 diagnosis
Comparison of COVID-19 medical image data sets
| Study | Application | Data type | Machine learning | Link |
|---|---|---|---|---|
| Cohen et al. | COVID-19 diagnosis | X-ray and CT scan | Proposed deep and transfer learning | |
| Zhao et al. | COVID-19 diagnosis | CT scans | Deep convolutional network | |
| Wang et al. | COVID-19 diagnosis | CT scans | Deep convolutional network, transfer learning | |
| Shan+ et al. | COVID-19 infected area segmentation | Segmented CT scans | Deep convolutional network | NA |
| Jun et al. | COVID-19 infected area segmentation | Segmented CT scans | NA | |
| Medical segmentation | COVID-19 infected area segmentation | Segmented CT scans | U-Net model | |
| Coronacases Initiative | COVID-19 diagnosis | 3D CT scans | NA | |
| BSTI | COVID-19 diagnosis and reference | Miscellaneous | NA | |
| SIRM | COVID-19 diagnosis and reference | Miscellaneous | NA | |
| Radiopaedia | COVID-19 diagnosis and reference | Miscellaneous | NA | |
| Wang and Wong | COVID-19 diagnosis | X-ray images | Deep convolutional network, transfer learning | |
| Hemdan et al. | COVID-19 diagnosis | X-ray | Deep learning | |
| Apostolopoulos and Mpesiana | COVID-19 diagnosis | X-ray | CNN and transfer learning | |
| Apostolopoulos et al. | COVID-19 diagnosis, extract biomarkers | X-ray | CNN and transfer learning | |
| Narin et al. | COVID-19 diagnosis | X-ray | CNN | |
| Sethy and Behera | COVID-19 diagnosis | X-ray | CNN + SVM | |
| Afshar et al. | COVID-19 diagnosis | X-ray | Capsule network + transfer learning | |
| Hussain and Khan | COVID-19 diagnosis | X-ray | CNN + SVM | Cohen et al. |
| El-Shafa et al. | COVID-19 data set augmentation | X-ray and CT scan | NA | |
| Chowdhury et al. | COVID-19 diagnosis | X-ray | CNN | |
| Born et al. | COVID-19 diagnosis | Ultrasound | CNN | |
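Most studies in the table above train a binary classifier (COVID-19 vs. non-COVID-19) on medical images. As a minimal sketch of how such a classifier's predictions are commonly scored, the following computes sensitivity, specificity, and accuracy from a confusion matrix. The function name `diagnostic_metrics` and the label lists are hypothetical and made up for illustration; they are not taken from any of the surveyed studies.

```python
def diagnostic_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary labels.

    Assumption: 1 = COVID-19 positive, 0 = negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(pairs) if pairs else float("nan")
    return sensitivity, specificity, accuracy


# Made-up labels and predictions (illustrative only, not real study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec, acc = diagnostic_metrics(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

Reporting sensitivity and specificity separately, rather than accuracy alone, matters for small and imbalanced data sets, where a classifier that always predicts the majority class can still score a high accuracy.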
Comparison of COVID-19 case report data sets
| Study | Application | Data type | Statistical method | Link |
|---|---|---|---|---|
| Dong et al. | Reporting global cases | COVID-19 cases | NA | |
| Dey et al. | COVID-19 visual analysis | COVID-19 statistics | Exploratory data analysis | WHO + Johns Hopkins + Chinese Center for Disease Control and Prevention |
| Liu et al. | COVID-19 city-wise case analysis in China | COVID-19 statistics | NA | |
| Xu et al. | Reporting China cases | Location and epidemiological data | NA | |
| Killeen et al. | US county-level data | 348 socioeconomic parameters | Proposed ML for epidemiological analysis | |
| Kucharski et al. | Estimating new cases | COVID-19 cases | Stochastic transmission dynamics | |
| Benvenuto et al. | COVID-19 spread | COVID-19 statistics | ARIMA | |
| Lachmann et al. | Correcting under-reported cases | Reported cases and world demographics | Statistical | |
| Kraemer et al. | Mobility-transmission analysis | Mobility and epidemiological data | Statistical | |
| Anzai et al. | Cases exported from China | Epidemiological data set | Statistical | |
| Lai et al. | Effect of NPI on COVID-19 in China | Location and epidemiological data | NA | |
| Flaxman et al. | Effect of NPI on COVID-19 in Europe | Location and epidemiological data | Semi-mechanistic Bayesian hierarchical model | |
| Wells et al. | International travel control analysis | COVID-19 statistics, flight data | Statistical | |
| Tian et al. | COVID-19 transmission control analysis | COVID-19 statistics | Regression analysis | |
| Tindale et al. | Community transmission | COVID-19 cases | Expectation-maximization | |
| Du et al. | Community transmission | COVID-19 cases (dates) | Maximum likelihood fitting and the Akaike information criterion | |
| Nishiura et al. | Community transmission | COVID-19 cases (dates) | Bayesian approach | |
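Several entries in the table above apply regression or other statistical fits to case counts. As a minimal sketch of that kind of analysis, and not the method of any specific study listed, the following fits a log-linear least-squares regression to cumulative case counts to estimate a daily growth rate and doubling time. The function name `fit_exponential_growth` is hypothetical and the counts are synthetic.

```python
import math


def fit_exponential_growth(cumulative_cases):
    """Fit log(cases) = a + r * day by ordinary least squares.

    Returns (r, doubling_time_days). Days with zero cases are skipped
    because log(0) is undefined.
    """
    points = [(day, math.log(c)) for day, c in enumerate(cumulative_cases) if c > 0]
    if len(points) < 2:
        raise ValueError("need at least two days with non-zero counts")

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    rate = sxy / sxx  # slope of the log-linear fit, per day
    doubling = math.log(2) / rate if rate > 0 else math.inf
    return rate, doubling


# Synthetic counts that double every 3 days (illustrative only, not real data).
counts = [10 * 2 ** (d / 3) for d in range(15)]
rate, doubling = fit_exponential_growth(counts)
print(f"growth rate per day: {rate:.3f}, doubling time: {doubling:.1f} days")
```

A pure exponential fit is only reasonable for the early phase of an outbreak; once interventions or saturation bend the curve, the models named in the table (for example ARIMA or Bayesian hierarchical models) are better suited.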
Comparison of COVID-19 social media and scholarly article data sets
| Study | Application | Data Type | Statistical method | Link |
|---|---|---|---|---|
| Kleinberg et al. | Measuring emotions | Textual data | Statistical analysis (correlation and regression) | |
| Banda et al. | Social dynamics data | Tweets | Statistical analysis | |
| Chen et al. | Conversation dynamics | Tweets | NA | |
| Alqurashi et al. | Societal issues | Tweets (Arabic) | NA | |
| Yu | Government and media tweets | Tweets | NA | |
| Lopez et al. | Perception and policies | Tweets | Proposed NLP, data mining | |
| Zarei et al. | Fake news identification | Instagram posts | NA | |
| Sarker et al. | COVID-19 symptoms identification | Tweets | Data mining | |
| Wang et al. | Collecting published articles on COVID-19 | Published articles | Proposed data extraction, retrieval mining | |
| Adhikari et al. | Analyzing published articles on COVID-19 | Published articles | Statistical analysis | |
| Wynants et al. | Systematic review of COVID-19 diagnosis articles | Published articles | CHARMS and PROBAST tools | |
| COVID Scholar | NLP based search portal | Published articles | NLP | |
Comparison of COVID-19 Mobility and NPI data sets
| Type | Organization | Application | Source | Coverage | Format | Link |
|---|---|---|---|---|---|---|
| Mobility | Google | Analyze response to the pandemic | Google location service | Global | CSV and dashboard | |
| Mobility | Apple | Analyze mobility patterns in the pandemic | Apple location service | Global | CSV and dashboard | |
| Mobility | GeoDS lab | Investigate travel changes at U.S. county level | Descartes Labs and SafeGraph | U.S. | Dashboard | |
| Mobility | Baidu Inc. | Investigate migration changes in China | Baidu location service | China | Dashboard | |
| NPI | Oxford University | Investigate NPI stringency | Media and gov. reports | Global | CSV and dashboard | |
| NPI | A volunteer group | Investigate effectiveness of NPI | Our World in Data | Global | CSV and dashboard | |
| NPI | ACAPS | Investigate NPI | Media and gov. reports | Global | CSV and dashboard | |
Comparison of COVID-19 Speech data sets
| Study | Application | Data type | ML method | Sample size | Link |
|---|---|---|---|---|---|
| Imran et al. | Cough based COVID-19 diagnosis | Voice data | Deep and ML classifiers | NA | NA |
| Brown et al. | Cough and breath based COVID-19 diagnosis | Voice data | Logistic regression, gradient boosting trees, and SVM | 7000 | |
| Sharma et al. | Cough, breath, and speech based COVID-19 diagnosis | Voice data | NA | approx. 1000 | |
| Virufy | Cough based COVID-19 diagnosis | Voice data | NA | 16 | |
| Faezipour and Abuzneid | Breath based COVID-19 diagnosis transmission | Voice data | NA | NA | NA |
| Trivedy et al. | Lung disease classification | Breath samples | Stacked autoencoders, long short-term memory network, and CNN | 150 | NA |
| Han et al. | COVID-19 speech analysis | Voice data | SVM with linear kernel | 52 | NA |