The accuracy of machine learning approaches using non-image data for the prediction of COVID-19: A meta-analysis.

Kuang-Ming Kuo¹, Paul C Talley², Chao-Sheng Chang³.

Abstract

OBJECTIVE: COVID-19 is a novel, highly contagious disease with an enormous negative impact on humanity as well as the world economy. An expeditious, feasible tool for detecting COVID-19 remains elusive. Recently, there has been a surge of interest in applying machine learning techniques to predict COVID-19 using non-image data. We have therefore undertaken a meta-analysis to quantify the diagnostic performance of machine learning models for the prediction of COVID-19.
MATERIALS AND METHODS: A comprehensive electronic database search for the period between January 1st, 2020 and December 3rd, 2021 was undertaken in order to identify eligible studies relevant to this meta-analysis. Summary sensitivity, specificity, and the area under receiver operating characteristic curves were used to assess diagnostic accuracy. Risk of bias was assessed by means of the revised Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool.
RESULTS: A total of 30 studies, including 34 models, met all of the inclusion criteria. Summary sensitivity, specificity, and area under receiver operating characteristic curves were 0.86, 0.86, and 0.91, respectively. The purpose of machine learning models, class imbalance, and feature selection are significant covariates useful in explaining the between-study heterogeneity, in terms of both sensitivity and specificity.
CONCLUSIONS: Our study findings show that non-image data can be used to predict COVID-19 with acceptable performance. Further, handling class imbalance and applying feature selection are suggested whenever building models for the prediction of COVID-19, so as to further improve diagnostic performance.

Keywords: COVID-19; Diagnostic test accuracy; Machine learning approach; Meta-analysis

Year: 2022 PMID: 35594810 PMCID: PMC9098530 DOI: 10.1016/j.ijmedinf.2022.104791

Source DB: PubMed Journal: Int J Med Inform ISSN: 1386-5056 Impact factor: 4.730

Introduction

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1], has posed a tremendous challenge since the pandemic was declared in March 2020 [2]. As of December 23, 2021, more than 276 million cases have been confirmed, including over 5 million deaths [3]. Currently, widely accepted management strategies for minimizing the spread of COVID-19 include forced lockdowns, travel restrictions, quarantines, social distancing, isolation, and infection-control measures [4]. For infected individuals, supportive care is the primary treatment available since effective and curative therapeutics remain elusive [4]. Adverse outcomes of COVID-19 commonly include hospitalization, transfer to intensive care units, and death [5], [6]. Further, advanced COVID-19 presents with heterogeneous clinical features [5]. A large number of those infected remain asymptomatic due to the nature of COVID-19 symptomatology [7]. Efficient diagnosis of COVID-19 is therefore difficult to achieve. The lack of optimal sensitivity and specificity in clinical detection methods has been shown to be a significant reason behind the rapid spread of COVID-19 [8]. Real-time reverse transcription polymerase chain reaction (rRT-PCR) is presently the diagnostic gold standard used to confirm COVID-19 infection [1]. Materials required for this assay, however, are reportedly in short supply, leading to possible delays in diagnostic results throughout the pandemic period [9]. In view of such complex circumstances, a rapid and early diagnostic tool, or a system that can readily identify infected individuals, plays a vital role in managing the COVID-19 pandemic.
Compared with traditional model-building approaches, machine learning techniques can model data without strict statistical assumptions [10], and there have been significant advances in modeling clinical data for predicting the diagnosis or prognosis of various diseases [11], [12], [13], COVID-19 being no exception. Two types of clinical data are usually employed to establish COVID-19 predictive models by means of machine learning techniques: structured data and/or image data. Structured data may include demographic information [14], [15], [16], [17], [18], [19], [20], [21], medical histories [22], [23], [24], symptoms [14], [25], [26], [27], [28], vital signs [20], [25], [29], [30], [31], [32], or laboratory tests [6], [15], [16], [17], [18], [19], [32]; image data may include lung computed tomography (CT) scans [33], [34] and/or chest X-rays (CXR) [35], [36]. Image data, especially CT scans, have been found to be the most accurate diagnostic modality for SARS-CoV-2 according to several case reports, reviews, and meta-analyses [8], [37], [38], [39], [40], [41], [42]. Compared with image data, however, structured clinical data can be easily obtained after patients’ first encounters with frontline healthcare professionals, and thus can be incorporated into machine learning models to predict COVID-19 more efficiently. Because machine learning techniques have been emerging as a potential tool for healthcare professionals to accelerate their decision-making and to improve diagnostic accuracy, a meta-analysis investigating the potential of using machine learning to identify COVID-19 is both essential and timely. To date, a systematic understanding of how machine learning techniques using non-image data can be used to predict COVID-19 is still lacking, not to mention a thorough meta-analysis of their diagnostic accuracy.
To fill this research gap, the objectives of this study are as follows: 1) to meta-analyze the accuracy of the diagnosis and prognosis of COVID-19 based on non-image data via machine learning techniques; and, 2) to compare the diagnostic accuracy across plausible covariates that can account for the heterogeneity found among selected studies. Hence, two research questions (RQ) were proposed: 1) RQ1: What is the diagnostic accuracy of machine learning models, based on non-image data, for the diagnosis and prognosis of COVID-19 patients? and, 2) RQ2: What covariates may contribute to the heterogeneity between these selected studies? The remainder of the article is organized as follows. The “Material and methods” section describes the research method used in this study. The “Results” section presents the analytical results, the “Discussion” section discusses the significance of the findings, and the “Conclusions” section summarizes the findings of the current study.

Material and methods

This section describes the search strategy and selection process for the literature used to make up this study, as well as the method of extraction for required information taken from that literature. In this section, we also describe the tools for quality assessment and statistical techniques employed.

Search strategy and selection process

A comprehensive search of the electronic databases PubMed, ScienceDirect, and SpringerLink was carried out for the period between 1st January 2020 and 3rd December 2021, using keyword combinations of COVID-19, machine learning, deep learning, and artificial intelligence. We did not search other databases such as Scopus or Web of Science for two principal reasons: 1) PubMed primarily focuses on medicine and biomedical sciences, which is more specific to this study, while Scopus and Web of Science cover multi-disciplinary fields [43]; and, 2) PubMed is free and easier to use than Scopus and Web of Science [43]. In addition, we used Google Scholar as a supplementary source to search for articles. Although Google Scholar is not recommended as a stand-alone source for literature searches [44], its article coverage is sufficiently stable compared with that of Scopus or Web of Science [45]. Detailed search queries for each database are shown in Table 1. Studies were considered relevant if they met the following criteria: 1) studies must have investigated the predictive accuracy of COVID-19; 2) studies should have leveraged structured data as features; 3) studies should have used artificial intelligence to predict the spread of the COVID-19 disease; 4) studies should have provided sufficient outcomes of predictive models; and, 5) studies must have been written in English and peer-reviewed. Studies meeting any of the following criteria were excluded: 1) studies using images (CT or CXR), image-associated reports, or unstructured data as predictive features; 2) studies irrelevant to our research goal; and, 3) studies whose full texts were unavailable for examination. Based on the stated inclusion and exclusion criteria, the retrieved studies were first assessed by the first author (K.M.) and cross-checked by the third author (C.S.).
Any discrepancies were resolved by the two authors through a consensus discussion to ensure accuracy and consistency. Finally, we located 30 studies including 34 models that predicted COVID-19 (see Fig. 1). Among the 30 studies, 17 studies including 20 models were extracted from PubMed, ScienceDirect, and SpringerLink, and 13 studies [6], [14], [17], [19], [21], [26], [27], [28], [29], [46], [47], [48], [49] including 14 models were identified through Google Scholar. Furthermore, three studies [21], [30], [32] included more than one model.

Table 1

Search strategy for each database.

Database	Search strategy
PubMed	COVID-19[Title/Abstract] AND ((machine learning[Title/Abstract]) OR (deep learning[Title/Abstract]) OR (artificial intelligence[Title/Abstract]))
ScienceDirect	Title, abstract or author-specified keywords: COVID-19 AND ((machine learning) OR (deep learning) OR (artificial intelligence))
SpringerLink	"COVID-19" AND ("machine learning" OR "deep learning" OR "artificial intelligence")
Google Scholar	COVID-19 machine learning deep learning artificial intelligence

Fig. 1

PRISMA flow diagram.

Data extraction

From each of the included studies, the following information was extracted: purposes of the predictive models, types of prognostic predictive models, types of features for establishing predictive models, geographic areas of the samples used to build predictive models, machine learning techniques adopted to build the predictive models, whether class imbalance issues were handled substantively, and whether extra feature selection strategies were adopted. We extracted or calculated the original true/false positives and true/false negatives from each study to derive summary outcome measures.
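The derivation of outcome measures from the extracted 2x2 counts can be sketched as follows. This is a minimal illustration in Python, not the authors' code: the `ConfusionCounts` container, the function names, and the example counts are all hypothetical, and no continuity correction is applied, so a zero cell would cause a division by zero.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """2x2 counts for one predictive model (illustrative container)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def accuracy_measures(c: ConfusionCounts) -> dict:
    """Standard diagnostic accuracy measures from a 2x2 table."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = lr_pos / lr_neg                      # diagnostic odds ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


def counts_from_rates(sensitivity: float, specificity: float,
                      n_pos: int, n_neg: int) -> ConfusionCounts:
    """Back-calculate counts when a study reports only rates and group sizes."""
    tp = round(sensitivity * n_pos)
    tn = round(specificity * n_neg)
    return ConfusionCounts(tp=tp, fp=n_neg - tn, fn=n_pos - tp, tn=tn)


# Hypothetical model: 100 diseased and 100 non-diseased patients.
example = ConfusionCounts(tp=86, fp=14, fn=14, tn=86)
for name, value in accuracy_measures(example).items():
    print(f"{name}: {value:.2f}")
```

`counts_from_rates` covers the "calculated" case mentioned above, where a primary study reports sensitivity and specificity together with group sizes rather than raw counts; rounding to whole patients is an assumption of this sketch.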

Methodological analysis

Diagnostic accuracy studies are often at risk of bias arising from differences in methodology, sample recruitment, or data collection [50]. We therefore assessed the quality of studies according to the revised Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) guidelines, which comprise four domains: patient selection, index test, reference standard, and flow and timing [51].

Statistical analysis

We meta-analyzed the diagnostic accuracy using the lme4 [52], mada [53], and meta [54] packages for R. Sensitivity and specificity were pooled in accordance with a bivariate model [55]. Area under receiver operating characteristic curve (AUROC), diagnostic odds ratio (DOR), positive likelihood ratio (LR+), and negative likelihood ratio (LR-) were also estimated. Forest plots were created to show heterogeneity among the included models. Moreover, a summary receiver operating characteristic curve with 95% confidence interval (CI) and 95% prediction interval (PI) was employed to assess the existence of a threshold effect among the models. According to prior suggestions about possible sources of heterogeneity between the selected studies [50], meta-regression was undertaken with three plausible types of covariates: 1) model purpose related covariates: including purpose of predictive models, and types of prognosis predicted; 2) sample related covariates: including feature type (e.g., demographic data, vital signs, laboratory data, medical history), and geographic areas of the patients; and, 3) machine learning related covariates: including types of artificial intelligence adopted, strategies for class imbalance, and feature selection strategies. The Institutional Review Board of E-Da Hospital approved the study protocol (EMRP-109-158).
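To give a feel for what pooling on the logit scale involves, the sketch below pools a set of proportions with a univariate DerSimonian-Laird random-effects model. This is a deliberate simplification and an assumption of this example: the study itself used a bivariate model [55] fitted in R, which models logit sensitivity and logit specificity jointly together with their correlation. The function names and the example counts are hypothetical.

```python
import math


def inv_logit(x: float) -> float:
    """Map a logit back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_proportions(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    events[i] / totals[i] is one model's sensitivity (TP / (TP + FN)) or
    specificity (TN / (TN + FP)). Returns (pooled, ci_low, ci_high, tau2).
    """
    logits, variances = [], []
    for e, n in zip(events, totals):
        # A 0.5 continuity correction is applied to every model so that the
        # logit stays finite even when a cell count is zero.
        a, b = e + 0.5, (n - e) + 0.5
        logits.append(math.log(a / b))
        variances.append(1.0 / a + 1.0 / b)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))

    # Method-of-moments estimate of the between-model variance tau^2.
    k = len(logits)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights and pooled logit with a 95% Wald interval.
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se), tau2


# Hypothetical true positives and diseased totals for three models.
pooled, low, high, tau2 = pool_logit_proportions([90, 50, 70], [100, 100, 100])
print(f"pooled sensitivity {pooled:.2f} (95% CI {low:.2f}, {high:.2f}), tau^2 = {tau2:.2f}")
```

Pooling sensitivity and specificity separately in this way ignores the trade-off between them across thresholds, which is the main reason a bivariate model is preferred for diagnostic accuracy data.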

Results

In this section, we report the characteristics of included studies, as well as the results of the quality assessment made. Subsequently, we report the summary diagnostic accuracy of included studies and potential covariates used for explaining between-study heterogeneity.

General study characteristics

Among the 34 predictive models examined in this study, 14 models (41.18%) aim to diagnose COVID-19 while 20 models (58.82%) aim to predict the prognosis of COVID-19 patients (see Table 2). Among the prognostic models, 7, 12, and 1 models predict critical care, mortality, and hospitalization, respectively. Twenty-four models used only laboratory data or combined laboratory data with other clinical data (such as demographic information, symptoms, vital signs, or history) to predict COVID-19, while 10 models used only clinical data without any laboratory data. Most samples (75.76%) come from a Western context, such as American or European. Thirty-two models (94.12%) were based on machine learning techniques (e.g., random forest, support vector machine, or XGBoost), and only 2 models used deep learning techniques [14], [23]. Twelve models (35.29%) leveraged extra approaches (e.g., increasing the weight of the minority class or random under/over sampling) to deal with class imbalance issues. Finally, 9 models (26.47%) used additional strategies, such as recursive feature elimination or the least absolute shrinkage and selection operator (LASSO), to select important features before training the predictive model [6], [16], [23], [27], [28], [48], [49], [56], [57], [58].

Table 2

Characteristics of included models (n = 34).

Characteristics	Values	Frequency	%
Purpose	Diagnosis	14	41.18
Purpose	Prognosis	20	58.82
Prognosis (n = 20)	Critical care	7	35.00
	Mortality	12	60.00
	Hospitalization	1	5.00
Feature type	Laboratory data included	24	70.59
Feature type	No laboratory data	10	29.41
Geographic area (n = 33)	Eastern	8	24.24
Geographic area (n = 33)	Western	25	75.76
AI techniques	Machine learning	32	94.12
AI techniques	Deep learning	2	5.88
Class imbalance processed	Yes	12	35.29
Class imbalance processed	No	22	64.71
Feature selection	Yes	9	26.47
Feature selection	No	25	73.53

Note: One study may be designed to predict more than one COVID-19 outcome.

Quality assessment

We assessed the quality of the selected studies based on QUADAS-2 [51] (see Fig. 2). Regarding risk of bias, 10 models (29.41%) raised some concerns and 24 models (70.59%) were at low risk in the patient selection domain. All 34 models were considered to be at low risk of bias in the index test and reference standard domains. Further, 13 models (38.24%) raised some concerns and 21 models (61.76%) were at low risk of bias in the flow and timing domain. As for applicability, 11 models (32.35%) raised some concerns and 23 models (67.65%) were of low concern in the patient selection domain. Finally, all 34 models were considered to be of low concern regarding applicability in the index test and reference standard domains.

Fig. 2

Methodological assessment by QUADAS-2.

RQ1: Diagnostic accuracy of non-image predictive models based on machine learning

The effect size pooled by traditional univariate meta-analysis can sometimes be misleading [59]. We therefore pooled the effect sizes based on the bivariate model [55]. As shown in Table 3, the overall pooled area under receiver operating characteristic curve for machine learning to predict COVID-19 is about 0.91. Moreover, the pooled sensitivity, specificity, diagnostic odds ratio, positive likelihood ratio, and negative likelihood ratio were 0.86, 0.86, 37.93, 6.20, and 0.16, respectively. Fig. 3 and Fig. 4 show, respectively, the forest plot of sensitivity and specificity and the summary receiver operating characteristic curves with 95% confidence interval and prediction interval for the 34 predictive models. Two χ2 tests were conducted to test for equality of sensitivity and of specificity, and both showed significant results, χ2(33) = 1090.94, p < 0.001 and χ2(33) = 113615.20, p < 0.001, indicating that significant between-study heterogeneity existed in terms of both sensitivity and specificity.
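A test for equality of proportions across k models can be illustrated with a Pearson chi-square statistic for homogeneity, which has k − 1 degrees of freedom (33 for 34 models, matching the degrees of freedom reported above). The sketch below is a generic textbook version with hypothetical counts; whether the authors' software computed the statistic in exactly this way is an assumption.

```python
def chi_square_equal_proportions(events, totals):
    """Pearson chi-square statistic for H0: all proportions are equal.

    For sensitivity, events[i] = TP and totals[i] = TP + FN of model i;
    for specificity, events[i] = TN and totals[i] = TN + FP.
    Returns (statistic, degrees_of_freedom). The pooled proportion must lie
    strictly between 0 and 1, otherwise an expected count is zero.
    """
    pooled = sum(events) / sum(totals)
    stat = 0.0
    for e, n in zip(events, totals):
        expected_yes = n * pooled        # expected events under H0
        expected_no = n * (1 - pooled)   # expected non-events under H0
        stat += (e - expected_yes) ** 2 / expected_yes
        stat += ((n - e) - expected_no) ** 2 / expected_no
    return stat, len(events) - 1


# Hypothetical true positives and diseased totals for three models.
stat, df = chi_square_equal_proportions([90, 50, 70], [100, 100, 100])
print(f"chi-square({df}) = {stat:.2f}")
```

A p-value would come from the chi-square survival function (for example `scipy.stats.chi2.sf(stat, df)`), which is omitted here to keep the sketch free of third-party dependencies. Because the statistic grows with sample size, very large values such as those reported for specificity can arise when some models were evaluated on very large samples.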

Table 3

Performance of predicting COVID-19 disease by artificial intelligence.

Metrics	Performance (95% CI)
AUROC	0.91
Sensitivity	0.86 (0.81, 0.90)
Specificity	0.86 (0.79, 0.91)
DOR	37.93 (21.96, 53.90)
LR+	6.20 (3.78, 8.63)
LR-	0.16 (0.12, 0.21)

Note: AUROC: Area under receiver operating characteristic curve, DOR: Diagnostic odds ratio, LR+: Positive likelihood ratio, LR-: Negative likelihood ratio, CI: confidence interval.
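One way to read the pooled likelihood ratios in Table 3 is to convert a pre-test probability into a post-test probability through odds. The sketch below uses the pooled LR+ and LR- from Table 3; the 10% pre-test probability is a purely hypothetical value chosen for illustration and does not come from the study.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: post-test odds = pre-test odds x likelihood ratio."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


LR_POS = 6.20    # pooled positive likelihood ratio (Table 3)
LR_NEG = 0.16    # pooled negative likelihood ratio (Table 3)
PRE_TEST = 0.10  # hypothetical pre-test probability, not from the study

after_positive = post_test_probability(PRE_TEST, LR_POS)
after_negative = post_test_probability(PRE_TEST, LR_NEG)
print(f"after a positive prediction: {after_positive:.3f}")
print(f"after a negative prediction: {after_negative:.3f}")
```

Under this assumed 10% pre-test probability, a positive prediction raises the probability to roughly 41%, and a negative prediction lowers it to roughly 2%. The pooled values are also internally consistent: 0.86 / (1 − 0.86) ≈ 6.1 and (1 − 0.86) / 0.86 ≈ 0.16, close to the reported LR+ and LR-.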

Fig. 3

Forest plot of sensitivity and specificity in this study.

Fig. 4

Summary receiver operating characteristic curves for collected studies.

RQ2: Plausible covariates explaining between-study heterogeneity

Due to the significant between-study heterogeneity for both sensitivity and specificity, we conducted sub-group analysis by means of meta-regression to further identify potential covariates that might influence the performance of the COVID-19 predictive models. As shown in Table 4 and Fig. 5 (a), the sensitivity was significantly (p = 0.002) higher for the 14 models designed to diagnose COVID-19 (0.92; 95% CI, 0.88–0.95) than for the 20 models designed to predict the prognosis of COVID-19 (0.79; 95% CI, 0.71–0.86). The corresponding specificity of the 14 models for COVID-19 diagnosis (0.80; 95% CI, 0.67–0.89) was lower than that of the 20 models for COVID-19 prognosis (0.89; 95% CI, 0.82–0.94), but the difference did not reach statistical significance (p = 0.144). Among the models for COVID-19 prognosis, the sensitivity of the 7 models for predicting critical care (i.e., patients getting transferred to intensive care units or using ventilation apparatus) (0.73; 95% CI, 0.49–0.88) was lower than that of the 12 models for predicting mortality (0.81; 95% CI, 0.73–0.87), but the difference did not reach statistical significance (p = 0.255), as shown in Table 4 and Fig. 5 (b). The corresponding specificity was similar between the 7 models for critical care and the 12 models for mortality (0.88 vs. 0.90, p = 0.689). Because only one model predicted hospitalization due to COVID-19, we did not include that model in the sub-group analysis.

Table 4

Summary estimates for sensitivity and specificity with covariates.

Type of covariates	Covariates	Values	Metrics	Summary estimates	95% CI	P value
	Overall (n = 34)		Sensitivity	0.86	[0.81, 0.90]	< 0.001
	Overall (n = 34)		Specificity	0.86	[0.79, 0.91]	< 0.001
Model purpose related	Purpose	Diagnosis (n = 14)	Sensitivity	0.92	[0.88, 0.95]	0.002
		Diagnosis (n = 14)	Specificity	0.80	[0.67, 0.89]	0.144
		Prognosis (n = 20)	Sensitivity	0.79	[0.71, 0.86]	[Reference]
		Prognosis (n = 20)	Specificity	0.89	[0.82, 0.94]	[Reference]
	Prognosis	Critical care (n = 7)	Sensitivity	0.73	[0.49, 0.88]	0.255
		Critical care (n = 7)	Specificity	0.88	[0.73, 0.95]	0.689
		Mortality (n = 12)	Sensitivity	0.81	[0.73, 0.87]	[Reference]
		Mortality (n = 12)	Specificity	0.90	[0.79, 0.96]	[Reference]
Sample related	Data type	Lab data included (n = 24)	Sensitivity	0.88	[0.83, 0.92]	0.154
		Lab data included (n = 24)	Specificity	0.86	[0.76, 0.93]	0.754
		Lab data not included (n = 10)	Sensitivity	0.80	[0.66, 0.90]	[Reference]
		Lab data not included (n = 10)	Specificity	0.87	[0.80, 0.92]	[Reference]
	Geographic area	Western (n = 25)	Sensitivity	0.86	[0.79, 0.90]	0.650
		Western (n = 25)	Specificity	0.83	[0.75, 0.88]	0.107
		Eastern (n = 8)	Sensitivity	0.88	[0.74, 0.95]	[Reference]
		Eastern (n = 8)	Specificity	0.93	[0.76, 0.98]	[Reference]
Machine learning related	AI techniques	Machine learning (n = 32)	Sensitivity	0.85	[0.79, 0.89]	0.090
		Machine learning (n = 32)	Specificity	0.86	[0.79, 0.91]	0.780
		Deep learning (n = 2)	Sensitivity	0.99	[0.32, 1.00]	[Reference]
		Deep learning (n = 2)	Specificity	0.86	[0.81, 0.90]	[Reference]
	Class imbalance processed	Yes (n = 12)	Sensitivity	0.74	[0.60, 0.84]	0.001
		Yes (n = 12)	Specificity	0.92	[0.83, 0.96]	0.076
		No (n = 22)	Sensitivity	0.90	[0.87, 0.93]	[Reference]
		No (n = 22)	Specificity	0.82	[0.72, 0.89]	[Reference]
	Feature selection	Yes (n = 9)	Sensitivity	0.88	[0.77, 0.95]	0.520
		Yes (n = 9)	Specificity	0.95	[0.83, 0.99]	0.022
		No (n = 25)	Sensitivity	0.85	[0.79, 0.90]	[Reference]
		No (n = 25)	Specificity	0.82	[0.74, 0.88]	[Reference]

Note: CI denotes confidence interval.

Fig. 5

Pooled sensitivity and specificity with 95% confidence interval for different covariates.

The 24 models that used laboratory data with or without other data (e.g., demographic information, symptoms, vital signs, or history) as features achieved a higher sensitivity (0.88; 95% CI, 0.83–0.92 vs. 0.80; 95% CI, 0.66–0.90), although the difference did not reach statistical significance (p = 0.154), and a nearly identical specificity (0.86; 95% CI, 0.76–0.93 vs. 0.87; 95% CI, 0.80–0.92; p = 0.754) compared with models incorporating only data other than laboratory test results (e.g., demographic information, symptoms, vital signs, or history), as depicted in Table 4 and Fig. 5 (c). The models that used patients from western contexts, as demonstrated in Table 4 and Fig. 5 (d), achieved a lower sensitivity (0.86; 95% CI, 0.79–0.90 vs. 0.88; 95% CI, 0.74–0.95) and specificity (0.83; 95% CI, 0.75–0.88 vs. 0.93; 95% CI, 0.76–0.98) than models using patients from eastern contexts, but neither difference reached statistical significance (p = 0.650 and p = 0.107). The 2 models that adopted deep learning techniques had a higher sensitivity (0.99; 95% CI, 0.32–1.00 vs. 0.85; 95% CI, 0.79–0.89) and the same specificity (0.86; 95% CI, 0.81–0.90 vs. 0.86; 95% CI, 0.79–0.91) compared with the remaining 32 models that adopted machine learning techniques, but neither difference reached statistical significance (p = 0.090 and p = 0.780), as illustrated in Table 4 and Fig. 5 (e). The 12 models that adopted extra strategies to deal with class imbalance, as depicted in Table 4 and Fig. 5 (f), had a significantly lower sensitivity (p = 0.001) than models without extra strategies for handling class imbalance (0.74; 95% CI, 0.60–0.84 vs. 0.90; 95% CI, 0.87–0.93).
The specificity of the models that adopted extra strategies to deal with class imbalance was, however, higher than that of the remaining models without such strategies (0.92; 95% CI, 0.83–0.96 vs. 0.82; 95% CI, 0.72–0.89), although the difference was not statistically significant (p = 0.076). Finally, the 9 models that leveraged feature selection strategies before building predictive models had a higher sensitivity than models without feature selection strategies (0.88; 95% CI, 0.77–0.95 vs. 0.85; 95% CI, 0.79–0.90), but the difference was not statistically significant (p = 0.520). However, the models that leveraged feature selection strategies showed a significantly higher specificity (p = 0.022) than models without feature selection strategies (0.95; 95% CI, 0.83–0.99 vs. 0.82; 95% CI, 0.74–0.88), as shown in Table 4 and Fig. 5 (g).

Discussion

The global COVID-19 pandemic is a growing public health concern requiring unprecedented efforts in nearly every field of endeavor. Effective coping strategies for this disease are, however, still under development. Machine learning has the potential to play a key role in the fight against the COVID-19 pandemic. However, meta-analyses focused on the diagnostic accuracy of COVID-19 prediction based on non-image data have been lacking. Our study therefore investigated the performance of machine learning approaches based on non-image data for predicting COVID-19 by means of a bivariate meta-analysis. The results demonstrate strong diagnostic performance, with a pooled sensitivity of 0.86, a pooled specificity of 0.86, and an AUROC of 0.91. A prior meta-analysis [42] showed that the pooled sensitivity and specificity of artificial intelligence applied to CT scans were 0.90 and 0.91, respectively, which are higher than those of artificial intelligence based on non-image data. Nonetheless, non-image data are often far easier to obtain than image data in hospitals with limited material resources. The purpose of predictive models, type of prognosis, feature type, geographic area, type of AI technique, whether class imbalance issues were dealt with, and whether extra feature selection strategies were implemented were further included in bivariate meta-regression to account for potential heterogeneity among the primary studies. The findings demonstrated that sensitivity was significantly dependent on the purpose of predictive models and on whether class imbalance issues were handled, whereas specificity was significantly dependent on whether extra strategies were used to select features before training predictive models.
In terms of model purpose, the sensitivity of models for diagnosis is significantly higher than that of models for prognosis (0.92 vs. 0.79), while the specificity of models for diagnosis is lower than that of models for prognosis (0.80 vs. 0.89). There was, however, no significant difference in specificity between models for diagnosis and prognosis. Models for diagnosis may have had a higher sensitivity because infection by COVID-19 can be confirmed by comparison with rRT-PCR testing results [60]. The prognosis of COVID-19, however, is more complicated since it relates to various factors such as age, gender, obesity, comorbidities, or time of anti-viral treatment [61], [62]. For example, there is prior evidence [62] that COVID-19 patients with diabetes, hypertension, or other comorbidities such as cardiovascular disease, chronic obstructive pulmonary disease, and cancer are more likely to have adverse outcomes. Further, the time span between onset of severe outcomes and anti-viral treatment application for COVID-19 is a major factor influencing the prognosis [62]. We further analyzed the 19 models for predicting two types of prognoses: critical care and mortality. The pooled sensitivity and specificity are 0.73 and 0.88 when the models are used to predict critical care for COVID-19 patients. These two figures are lower than those of models used to predict mortality (0.81 and 0.90), but no statistically significant difference was confirmed. A plausible reason that mortality predictive models reached a higher sensitivity and specificity than critical care models is that the situations of patients close to mortality are simpler to define than those of critical care patients. Critical care, by contrast, covers various patient situations, such as being transferred to intensive care units or using ventilation apparatus.
Especially during the peak of the epidemic, the number of patients exceeded the service capacity of most primary healthcare facilities; as such, the criteria for defining critically ill patients would differ from those used during more stable periods of the pandemic. Furthermore, the SARS-CoV-2 virus continues to mutate [63], making it difficult to estimate its exact impact on current patient loads. Available evidence [64] suggests that the incidence of SARS-CoV-2 should be closely monitored since patients from different locations have already shown different mutated sequences. In addition to the rRT-PCR test, laboratory data (e.g., transaminases, lymphocytes, eosinophils, calcium, and aspartate aminotransferase) with or without other demographic and clinical data (e.g., symptoms, vital signs, or medical histories) were used to predict COVID-19 in 24 models, while the remaining models used only demographic/clinical data as features. Our meta-analysis showed that models that included laboratory data had a higher sensitivity than models without laboratory data (0.88 vs. 0.80), whereas models without laboratory data slightly outperformed models with laboratory data in terms of specificity (0.87 vs. 0.86). Neither difference, however, reached statistical significance. Previous studies [65], [66], [67] found that laboratory data can provide useful information for COVID-19 diagnostics. For example, prior evidence [67] has found that the platelet count can dynamically reflect pathophysiological changes in COVID-19 patients. Other evidence [68], however, found that some laboratory results of COVID-19 patients differ between pregnant women, children, and the general population.
Including and testing a wider variety of laboratory data may be required to achieve more stable prediction of COVID-19. Further, before the widely acknowledged rRT-PCR test [69] becomes universally available, other data such as laboratory tests that have shown potential should be identified in order to broaden the prediction of COVID-19. It would be more helpful if machine learning models could incorporate routinely available laboratory tests to correctly predict COVID-19, which would streamline the diagnosis and treatment of COVID-19 patients and save considerable time in decision-making. Since the first case of COVID-19 was reported in Wuhan, China, and then in the rest of the world, knowledge about COVID-19 has changed and expanded. Hence, the question remains whether the performance of predictive models based on eastern countries differs from that of western-based predictive models owing to a variety of circumstances (e.g., transparency of research procedures, availability of data, reliability of findings, geo-political considerations). We therefore conducted a sub-group analysis based on samples from different geographic areas. The sub-group analysis showed that models using samples from western contexts had a lower sensitivity (0.86 vs. 0.88) and specificity (0.83 vs. 0.93) than models using samples taken from eastern contexts. The reason for this result is likely complex; we suspect it may be due to the algorithms adopted by these models. In the eastern category, the eight models adopted only two major types of algorithms: ensemble learning and deep learning. With appropriate configurations, these two types of machine learning models are generally considered to have better predictive performance than other algorithms [70], [71].
On the other hand, the western group, consisting of 25 models, applied seven different types of algorithms. Such variety may contribute to greater variation in the models’ predictive performance, which may explain why the pooled sensitivity and specificity of the western group appeared lower than those of the eastern group.

Pooled sensitivity and specificity were 0.85 and 0.86 for machine learning techniques and 0.99 and 0.86 for deep learning techniques, respectively. Deep learning outperformed machine learning in terms of sensitivity and tied with it in specificity; however, there was no significant difference between the two techniques. Although the sensitivity of deep learning was quite high (0.99) in our study, its 95% confidence interval was also very wide (0.32–1), which reflects the small number of deep learning models included (n = 2). More deep learning studies are required to verify its true performance in predicting COVID-19 based on non-image data.

In classification tasks, the receiver operating characteristic (ROC) plot and the AUROC delineate how varying the decision threshold changes two types of errors: false positives and false negatives [72]. However, the ROC curve and AUROC are only partially informative when used with imbalanced data [72]. The explainability, traceability, and interpretability of performance measures will therefore be of greater importance in dealing with imbalanced data. Problems relevant to class imbalance are often dealt with by use of various strategies, such as the synthetic minority over-sampling technique [73]. Our study demonstrates that the pooled sensitivity for models without extra strategies for class imbalance is significantly higher than that of models with such strategies (0.90 vs. 0.74).
By contrast, the pooled specificity for models with extra strategies for class imbalance was higher than that of models without them (0.92 vs. 0.82), but the difference was not statistically significant. Ramezankhani, Pournik, Shahrabi, Azizi, Hadaegh and Khalili [74] adopted an over-sampling strategy to deal with class imbalance when predicting type 2 diabetes. They found that the original training dataset yielded a higher sensitivity and a lower specificity than a balanced training dataset, indicating that such a strategy does not guarantee better performance. Our study, however, showed a lower sensitivity and a higher specificity, which may be because the re-sampling strategies described by Ramezankhani et al. [74] were applied only in the training dataset. The performance data that our study collected were mainly taken from test datasets, which may indicate that class-imbalance handling strategies cannot guarantee performance on the test dataset.

To enhance the performance of predictive models, pre-processing such as feature selection can be applied before training machine learning models [10]. Our study showed that the 9 models with feature selection had a higher sensitivity (0.88 vs. 0.85) and specificity (0.95 vs. 0.82) than the 25 models without feature selection, but only the difference in specificity reached statistical significance. Based on the findings of our meta-analysis, the importance of feature selection should not be overlooked during model building.

Our findings identified some gaps in the current state of the art and challenges for future research. First, although more studies are using machine learning with non-image data to predict COVID-19, such studies remain fewer than those utilizing image data.
More studies are thus required to better investigate the potential of non-image data for predicting COVID-19. Second, studies using deep learning techniques for non-image data are scarce, plausibly because deep learning is considered better suited to image data than other machine learning techniques. Future studies can leverage various deep learning techniques for predicting COVID-19 based on non-image data. Third, the diagnostic accuracy achieved by machine learning based on non-image data is still lower than that based on image data. Future studies may combine image and non-image data to establish a model that achieves better diagnostic performance.

Our findings may have important implications for medical practice as well. First, hospitals that are short of material and staff resources can adopt machine learning models based on routinely available non-image data to assist in identifying possible COVID-19 patients. By doing so, the risk of COVID-19 infection through contact, arising from a lack of rRT-PCR or CT testing, may be diminished. Second, developers of machine learning models can consider adopting strategies for feature selection and class-imbalance handling when planning and building models. By doing so, a predictive model with better performance may be established to support informed decision-making by healthcare professionals. Further, models based on machine learning techniques may be applied to predict other epidemics or diseases in the future. To achieve this, features can be carefully selected for the specific pandemic or disease, and different types of machine learning algorithms can then be compared. In this way, the best-performing algorithm can be determined based on demonstrated learning capability.
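
The point made above about class-imbalance handling and test-set performance can be illustrated with a short sketch. It assumes hypothetical toy data and uses a deliberately simple random over-sampler (a stand-in, not the synthetic minority over-sampling technique [73] and not the method of any included study). The function names and all numbers are illustrative assumptions. Only the training labels are balanced; sensitivity and specificity are then computed on an untouched, still-imbalanced test set.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until both classes are equal.

    Assumes both classes occur at least once in `labels`.
    """
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if len(pos) < len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(rows) + extra, list(labels) + [minority_label] * len(extra)


# Hypothetical training data: 2 positives, 6 negatives (one feature per row).
train_rows = [[0.9], [0.8], [0.2], [0.1], [0.3], [0.25], [0.15], [0.05]]
train_labels = [1, 1, 0, 0, 0, 0, 0, 0]

# Re-sampling touches the training data only.
bal_rows, bal_labels = random_oversample(train_rows, train_labels)

# Hypothetical test labels and model predictions; the test set is not re-sampled.
test_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
test_preds = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(test_labels, test_preds)

print(bal_labels.count(1), bal_labels.count(0))
print(round(sens, 3), round(spec, 3))
```

Keeping the re-sampling step out of the test data is the design choice that matters here: the reported sensitivity and specificity then reflect the class distribution a model would actually face, which is the setting in which the pooled estimates in this meta-analysis were mainly collected.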

Conclusions

This study meta-analyzed the diagnostic test accuracy of artificial intelligence techniques used to confront the COVID-19 pandemic. A search of multiple electronic databases identified 30 studies, comprising 34 predictive models, for inclusion in this meta-analysis. A bivariate meta-analysis of diagnostic test accuracy was conducted to estimate sensitivity, specificity, and the summary receiver operating characteristic curve. The included models showed strong diagnostic performance. These findings may indicate that machine learning models that use non-image data can be implemented in hospital settings, especially in resource-limited locations, to effectively predict COVID-19. These models have the potential to become more accurate and more representative as data sets increase in size. Furthermore, covariates including diagnosis purpose, whether class-imbalance issues were handled, and whether feature-selection strategies were adopted were found to partially explain the heterogeneity among the primary studies evaluated.
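
As a rough illustration of how per-study sensitivities can be combined, the sketch below pools proportions on the logit scale with fixed-effect inverse-variance weights. This is a simplified univariate stand-in and not the bivariate model used in this meta-analysis, which estimates sensitivity and specificity jointly; the function name, the 0.5 continuity correction, and all study counts are assumptions made for illustration.

```python
import math


def pool_logit_proportion(events, totals, z=1.96):
    """Fixed-effect inverse-variance pooling of proportions on the logit scale.

    events[i] / totals[i] is, for example, TP / (TP + FN) for study i.
    A 0.5 continuity correction is added to every cell so that studies
    with zero false negatives still have a finite logit and variance.
    Returns (pooled estimate, lower bound, upper bound).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for e, n in zip(events, totals):
        a = e + 0.5          # corrected count of events
        b = n - e + 0.5      # corrected count of non-events
        logit = math.log(a / b)
        variance = 1.0 / a + 1.0 / b
        weight = 1.0 / variance
        weighted_sum += weight * logit
        weight_total += weight
    pooled = weighted_sum / weight_total
    se = math.sqrt(1.0 / weight_total)

    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return inv_logit(pooled), inv_logit(pooled - z * se), inv_logit(pooled + z * se)


# Hypothetical true-positive counts and numbers of diseased patients per study.
tp = [86, 45, 170]
diseased = [100, 50, 200]
est, lower, upper = pool_logit_proportion(tp, diseased)
print(round(est, 2), round(lower, 2), round(upper, 2))

# With very little data the interval widens considerably.
small = pool_logit_proportion([9], [10])
large = pool_logit_proportion([90], [100])
print(small[2] - small[1] > large[2] - large[1])
```

The last comparison mirrors the remark in the discussion about deep learning: when only a handful of small studies contribute, a high point estimate can sit inside a very wide confidence interval.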

Summary points

What was already known on the topic?

COVID-19 has had a serious impact on human lives and economic livelihoods; however, a quick and feasible tool for detecting COVID-19 remains elusive.

Real-time reverse transcription polymerase chain reaction is currently the “gold standard” for diagnosing COVID-19, but it has a long turnaround time.

Pulmonary computed tomography and chest radiography can be used to complement the diagnosis of COVID-19.

What did this study add to our knowledge?

Strong diagnostic test accuracy for COVID-19 can be achieved by using non-image data.

Non-image data, used as predictive features, can help hospitals with limited financial and human resources to identify cases of COVID-19.

Class-imbalance and feature-selection strategies should be considered when building predictive models for diagnosing COVID-19.

63 in total

1. Critical Supply Shortages - The Need for Ventilators and Personal Protective Equipment during the Covid-19 Pandemic.

Authors: Megan L Ranney; Valerie Griffeth; Ashish K Jha
Journal: N Engl J Med Date: 2020-03-25 Impact factor: 91.245

2. Covid-19: identifying and isolating asymptomatic people helped eliminate virus in Italian village.

Authors: Michael Day
Journal: BMJ Date: 2020-03-23

3. Machine learning predicts mortality in septic patients using only routinely available ABG variables: a multi-centre evaluation.

Authors: Bernhard Wernly; Behrooz Mamandipoor; Philipp Baldia; Christian Jung; Venet Osmani
Journal: Int J Med Inform Date: 2020-10-24 Impact factor: 4.046

4. Chest CT in COVID-19 pneumonia: A review of current knowledge.

Authors: C Jalaber; T Lapotre; T Morcet-Delattre; F Ribet; S Jouneau; M Lederlin
Journal: Diagn Interv Imaging Date: 2020-06-11 Impact factor: 4.026

5. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China.

Authors: Chaolin Huang; Yeming Wang; Xingwang Li; Lili Ren; Jianping Zhao; Yi Hu; Li Zhang; Guohui Fan; Jiuyang Xu; Xiaoying Gu; Zhenshun Cheng; Ting Yu; Jiaan Xia; Yuan Wei; Wenjuan Wu; Xuelei Xie; Wen Yin; Hui Li; Min Liu; Yan Xiao; Hong Gao; Li Guo; Jungang Xie; Guangfa Wang; Rongmeng Jiang; Zhancheng Gao; Qi Jin; Jianwei Wang; Bin Cao
Journal: Lancet Date: 2020-01-24 Impact factor: 79.321

6. A Machine Learning Prediction Model of Respiratory Failure Within 48 Hours of Patient Admission for COVID-19: Model Development and Validation.

Authors: Douglas Barnaby; Theodoros P Zanos; Siavash Bolourani; Max Brenner; Ping Wang; Thomas McGinn; Jamie S Hirsch
Journal: J Med Internet Res Date: 2021-02-10 Impact factor: 5.428

7. Clinical and inflammatory features based machine learning model for fatal risk prediction of hospitalized COVID-19 patients: results from a retrospective cohort study.

Authors: Xin Guan; Bo Zhang; Ming Fu; Mengying Li; Xu Yuan; Yaowu Zhu; Jing Peng; Huan Guo; Yanjun Lu
Journal: Ann Med Date: 2021-12 Impact factor: 4.709

8. Machine learning methods to predict mechanical ventilation and mortality in patients with COVID-19.

Authors: Limin Yu; Alexandra Halalau; Bhavinkumar Dalal; Amr E Abbas; Felicia Ivascu; Mitual Amin; Girish B Nair
Journal: PLoS One Date: 2021-04-01 Impact factor: 3.240

9. A predictive tool for identification of SARS-CoV-2 PCR-negative emergency department patients using routine test results.

Authors: Rohan P Joshi; Vikas Pejaver; Noah E Hammarlund; Heungsup Sung; Seong Kyu Lee; Al'ona Furmanchuk; Hye-Young Lee; Gregory Scott; Saurabh Gombar; Nigam Shah; Sam Shen; Anna Nassiri; Daniel Schneider; Faraz S Ahmad; David Liebovitz; Abel Kho; Sean Mooney; Benjamin A Pinsky; Niaz Banaei
Journal: J Clin Virol Date: 2020-06-10 Impact factor: 3.168


2 in total