Use of Machine Learning and Artificial Intelligence to predict SARS-CoV-2 infection from Full Blood Counts in a population.

Abhirup Banerjee¹, Surajit Ray², Bart Vorselaars³, Joanne Kitson⁴, Michail Mamalakis⁵, Simonne Weeks⁶, Mark Baker⁷, Louise S Mackenzie⁸.

Abstract

Since December 2019 the novel coronavirus SARS-CoV-2 has been identified as the cause of the COVID-19 pandemic. Early symptoms overlap with those of other common conditions such as the common cold and influenza, making early screening and diagnosis crucial goals for health practitioners. The aim of the study was to use machine learning (ML), an artificial neural network (ANN) and a simple statistical test to identify SARS-CoV-2 positive patients from full blood counts without knowledge of symptoms or history of the individuals. The dataset included in the analysis and training contains anonymized full blood count results from patients seen at the Hospital Israelita Albert Einstein, in São Paulo, Brazil, who had samples collected to perform the SARS-CoV-2 rt-PCR test during a visit to the hospital. Patient data were anonymised by the hospital, and clinical data were standardized to have a mean of zero and a unit standard deviation. These data were made public with the aim of allowing researchers to develop ways to enable the hospital to rapidly predict and potentially identify SARS-CoV-2 positive patients. We find that, with full blood counts, random forest, shallow learning and a flexible ANN model predict SARS-CoV-2 patients with high accuracy both among patients on regular wards (AUC = 94-95%) and among those not admitted to hospital or in the community (AUC = 80-86%). Here, AUC is the Area Under the receiver operating characteristics Curve, a measure of model performance. Moreover, a simple linear combination of 4 blood counts achieves an AUC of 85% for patients within the community. The normalised data of different blood parameters from SARS-CoV-2 positive patients exhibit a decrease in platelets, leukocytes, eosinophils, basophils and lymphocytes, and an increase in monocytes.
SARS-CoV-2 positive patients exhibit a characteristic immune response profile and changes in different parameters of the full blood count that can be detected with simple and rapid blood tests. While symptoms at an early stage of infection are known to overlap with other common conditions, parameters of the full blood counts can be analysed to distinguish the viral type at an earlier stage than current rt-PCR tests for SARS-CoV-2 allow. This new methodology has the potential to greatly improve initial screening for patients where PCR based diagnostic tools are limited.

Keywords: Artificial Neural Network (ANN); Full blood count; Leukocytes; Machine Learning; Monocytes; SARS-CoV-2; Screening

Year: 2020 PMID: 32652499 PMCID: PMC7296324 DOI: 10.1016/j.intimp.2020.106705

Source DB: PubMed Journal: Int Immunopharmacol ISSN: 1567-5769 Impact factor: 4.932

Introduction

The World Health Organization (WHO) characterized COVID-19 as a pandemic on 11th March 2020 [1]. The symptoms of COVID-19, induced by the novel pathogen SARS-CoV-2, are difficult to differentiate from those of other common infections in a large proportion of those infected. It is estimated that about 40% of cases will experience mild disease (cough, fever), 40% moderate disease (bilateral pneumonia), 15% severe disease and 5% critical disease [2]. A recommendation by the WHO has been to conduct early screening of patients to identify, isolate and track those infected and prevent transmission [2]. Reverse transcription Polymerase Chain Reaction (rt-PCR) based methodologies have been the gold standard in confirming that the individual presenting with COVID-19 has active viral shedding of SARS-CoV-2 [3], [4]. However, the ability to conduct wide scale testing of patients has been limited by a number of factors, including the availability of suitable resources for rt-PCR based testing for the presence of SARS-CoV-2. In addition, the standard test used has an 80% accuracy (compared to chest CT scan results) [5], which may depend on the specific level of viral shedding by any individual at the time the sample is taken. Moreover, the time from sample collection to informing the patient is estimated to range from many hours to several days, depending on the systems in place. These complex issues, together with the wide-ranging symptoms presenting in patients and the discrepant results between chest CT, symptoms and rt-PCR [5], indicate that testing for the direct presence of virus requires significant improvement. While highly specific tests for SARS-CoV-2 are under development using CRISPR [6] and biosensors [7], [8], these innovative applications will require highly specialised equipment and resources. This will negatively affect less affluent areas, which will be unable to access these technologies within the short time frame needed to contain the pandemic.
Therefore, there is a global challenge to enable screening without the need for sophisticated equipment and resources. Machine learning (ML) and artificial intelligence (AI) approaches are rapidly being developed to aid clinical procedures in the current pandemic, such as predicting the specificity of new therapies [9] and diagnosing COVID-19 patients from radiographic patterns on CT scans [10]. In the current study we focused on predicting whether a person is SARS-CoV-2 positive or negative in the early stage of the disease. The approach taken is motivated by the reported limited capacity to conduct rt-PCR tests and the need to quickly differentiate between individuals presenting with similar symptoms, as seen with COVID-19 and other common infections. The dataset used here comes from a public challenge by mindstream-ai [11] to use AI to predict the test result for SARS-CoV-2 (positive/negative) solely from full blood counts. The Hospital Israelita Albert Einstein is located in the state of São Paulo, Brazil, which, with a population of 12 million people, had 477 confirmed cases of COVID-19 and 30 associated deaths by the 23rd March 2020. The hospital publicly released full blood counts [12] from 598 anonymised patients, along with the rt-PCR result for SARS-CoV-2 and their age quantile (symptoms or patient history were not released). Here we present evidence that patients having SARS-CoV-2 can be identified by random forest, ML and artificial neural networks (ANN), both among patients not admitted to hospital (community) and among patients in a regular ward setting, through recognition of the altered immune cell profile. This will allow for rapid early screening based purely on the blood profile.

Methods

Patient dataset

All data processed in this study are published on a public forum [12], as part of a challenge to accelerate approaches to combat the spread of SARS-CoV-2. This dataset contains anonymized data from patients seen at the Hospital Israelita Albert Einstein, in São Paulo, Brazil, who had samples collected to perform the SARS-CoV-2 rt-PCR and additional laboratory tests during a visit to the hospital. All data were anonymized following the best international practices and recommendations [11]. All clinical data were standardized to have a mean of zero and a unit standard deviation. Data provided included age (percentile group), outcome from rt-PCR SARS-CoV-2 test and standard full blood count: hematocrit, haemoglobin, platelets, mean platelet volume (MPV), red blood cells (RBC), lymphocytes, mean corpuscular haemoglobin concentration (MCHC), leukocytes, basophils, neutrophils, mean corpuscular haemoglobin (MCH), eosinophils, mean corpuscular volume (MCV), monocytes and red blood cell distribution width (RBCDW) [13]. The full dataset released included 5,644 individual patients tested between the 28th of March 2020 and 3rd of April 2020, of which 598 full blood count results were used for statistical analysis. The remaining 5,046 results were not used due to lack of full blood count data. Without in-depth data on the timing of the tests within the course of the infection, analysis was performed on the basis of severity according to the location of the patient within the hospital system. The blood counts are from four classifications of patients: Community (patients not admitted to hospital), Regular Ward, Semi Intensive Unit and Intensive Care Unit (ICU) (Table 1).
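The standardization mentioned above (zero mean, unit standard deviation) can be illustrated with a minimal sketch. The platelet values below are hypothetical, and the use of the population standard deviation is an assumption, since the source does not say which estimator the hospital applied.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit standard deviation (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; the source does not specify
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]


# Hypothetical raw platelet counts (10^9/L), for illustration only.
platelets = [150.0, 220.0, 310.0, 180.0, 240.0]
z = standardize(platelets)
print([round(v, 2) for v in z])
```

Because each parameter is rescaled independently, a standardized value only says how far a patient lies from the dataset mean for that parameter, not what the absolute count was.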

Table 1

Sao Paulo Dataset groups; only patients with full blood counts and an rt-PCR SARS-CoV-2 outcome are included. Pathogen tests were conducted on a subset of 356 of the 598 patients tested for SARS-CoV-2.

Number of patients	Community	Regular Ward	Semi Intensive Unit	Intensive Care Unit (ICU)	Total
SARS-CoV-2 negative	431(92%)	31(54%)	34(81%)	21(72%)	517(86%)
SARS-CoV-2 positive	39(8%)	26(46%)	8(19%)	8(28%)	81(14%)
Diagnosis of other pathogens (non SARS-CoV-2)	149(32%)	12(21%)	17(40%)	12(41%)	188(31%)

Patients in the semi-intensive unit and ICU were excluded from our modelling, so as to ensure prediction is based on early indicators. Neutrophils are also not included, since they are not reported for all 598 patients. Furthermore, we exclude age from our modelling. There is a large imbalance of positive (8%) vs negative (92%) SARS-CoV-2 patients in the community. As a result, it is more informative to test separately for the specificity (% of negative patients correctly identified as negative) and the sensitivity (% of positive patients correctly identified as positive) rather than solely the total accuracy. Of the 598 complete counts analysed, 367 patients had also been tested for other pathogens: Adenovirus, Bordetella pertussis, Chlamydophila pneumoniae, Coronavirus 229E, Coronavirus HKU1, Coronavirus NL63, Coronavirus OC43, Influenza A H1N1 2009, Influenza A, Influenza B, Metapneumovirus, Parainfluenza 1, Parainfluenza 2, Parainfluenza 3, Parainfluenza 4, Respiratory Syncytial Virus or Rhinovirus Enterovirus.

Model definitions: random forest and Lasso based regularized generalized linear models and artificial neural network

For our 2-class (SARS-CoV-2 positive vs negative) classification we employ several ML models and an ANN model. For the ML models we apply random forest [14], [15], [16] and Lasso-elastic-net regularized generalized linear (glmnet) models for classification. Random forest as a classifier is based on several decision trees. To classify a new object, each decision tree provides a classification for the input data and the random forest uses the mode of those classifications to decide on the class. Glmnet, on the other hand, fits a traditional logistic model, but instead of using all the predictors it uses a regularization path to select the most important variables and uses only those in the logistic regression. For both these methods, we present results for 10-fold cross-validation and their corresponding variable importance plots [17]. Furthermore, an ANN [13] model is defined with 14 input parameters and three hidden layers and trained for 100 epochs. The classification performance of the ANN model is evaluated with stratified 10-fold cross-validation.
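The stratified 10-fold cross-validation mentioned above keeps the positive/negative ratio roughly constant in every fold, which matters when positives are rare. The following is a minimal sketch of one way to build such folds; the function name `stratified_folds`, the round-robin assignment and the fixed seed are illustrative assumptions, not the authors' implementation. The class counts are those of the community group in Table 1.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Community group from Table 1: 39 positive (1) and 431 negative (0) patients.
labels = [1] * 39 + [0] * 431
folds = stratified_folds(labels, k=10)

for i, test_idx in enumerate(folds):
    # Each fold serves once as the test set; the other nine form the training set.
    train_idx = [j for f in folds if f is not test_idx for j in f]
    n_pos = sum(labels[j] for j in test_idx)
    print(f"fold {i}: train={len(train_idx)} test={len(test_idx)} positives={n_pos}")
```

With 39 positives spread over 10 folds, each test fold contains only 3 or 4 positive patients, which helps explain why per-fold AUC values can vary noticeably.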

Model performance measures

The performance of each model is expressed in terms of AUC, sensitivity, specificity and accuracy. They are defined as follows: sensitivity is the fraction of SARS-CoV-2 positive patients correctly identified as such; specificity is the fraction of SARS-CoV-2 negative patients correctly identified as such; accuracy is the fraction of all patients correctly identified. By lowering the threshold for detecting SARS-CoV-2 positive patients, the sensitivity can be increased at the expense of specificity. Hence we also look at the commonly employed AUC. This is the area under the receiver operating characteristic curve, i.e. the curve obtained by plotting sensitivity vs (1-specificity) while varying the threshold. The AUC, also known as the Wilcoxon-Mann-Whitney statistic, is the probability that a randomly chosen SARS-CoV-2 positive patient is ranked higher than a randomly chosen SARS-CoV-2 negative patient. A higher AUC generally implies a better performing model. We note that the drawback of using accuracy alone is that, in an unbalanced set of mainly SARS-CoV-2 negative patients (as is the case in the community dataset), the accuracy can be high even with zero sensitivity.
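The definitions above can be made concrete with a short sketch. The helper names `confusion_rates` and `auc` are illustrative, and both assume that each class is present. The AUC is computed directly from its Wilcoxon-Mann-Whitney interpretation (ties counted as one half), and the example uses the community class balance from Table 1 to show how accuracy alone can mislead.

```python
def confusion_rates(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


def auc(y_true, scores):
    """Probability that a positive outranks a negative; ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Community class balance from Table 1: 39 positive, 431 negative.
y_true = [1] * 39 + [0] * 431
all_negative = [0] * len(y_true)  # a "model" that never predicts positive

sens, spec, acc = confusion_rates(y_true, all_negative)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.3f}")
print(f"AUC of constant scores = {auc(y_true, [0.0] * len(y_true)):.2f}")
```

A classifier that labels everyone negative reaches about 92% accuracy here while detecting no positive patient, and its constant scores give an AUC of 0.5, which is why the AUC, sensitivity and specificity are reported alongside accuracy.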

Results

Statistical analysis

Significant differences (p < 0.05) in 9 of the 15 blood count parameters were found between patients in a regular ward setting who tested positive and those who tested negative for SARS-CoV-2 (Fig. 1). In order of importance (lower p), the significantly increased values for SARS-CoV-2 positive patients are: MPV > RBC > lymphocytes > hematocrit > hemoglobin. The decreased values are eosinophils > leukocytes > platelets. For community patients we found statistically significant differences (p < 0.05) in 6 blood count parameters; the increased values are monocytes > MPV, while leukocytes > platelets > neutrophils > eosinophils show a significant decrease for SARS-CoV-2 positive patients.

Fig. 1

Box plots showing median and 1st/3rd quartile of individual parameters of the full blood counts categorized by whether tested positive (red box) or negative (blue box) by the rt-PCR test for SARS-CoV-2 and by whether they remained in the community or were admitted in the regular ward. MPV; mean platelet volume, RBC; red blood cells, RBCDW; red blood cell distribution width. The p-values are tests of equality of population using the Wilcoxon rank sum test, where p < 0.05 implies statistically significant difference between the populations. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Modelling

Among the 598 patients with full blood count profiles, 57 were admitted to the regular ward (26 tested rt-PCR positive for SARS-CoV-2, and 31 negative). Furthermore, a total of 470 patients were not admitted to the hospital (39 tested positive for SARS-CoV-2 and 431 negative). We will report model predictions for both sets of patients. The defined ANN model for the regular ward patients produces an average classification accuracy of 90% over stratified 10-fold cross-validation. The receiver operating characteristic (ROC) curves for all 10 folds and the values of area under the curve (AUC) are presented in Fig. 2a, along with the average AUC (0.95 ± 0.08). The normalized confusion matrix corresponding to fold 2 (the worst AUC value) is presented in Fig. 2b.

Fig. 2

a. The ROC curve of the defined ANN model over patients admitted to regular ward; b. The normalized confusion matrix, corresponding to fold 2 (the worst value of AUC). ROC: Receiver operating characteristic; AUC: area under the curve.

The defined ANN model for the community patients produces an average classification accuracy of 89% over stratified 10-fold cross-validation. The ROC curves for all 10 folds produce an average AUC of 0.77 ± 0.08, while the sensitivity and specificity are estimated as 0.28 and 0.95, respectively. Since the imbalance between the two classes in the dataset (positive/negative = 0.09) degrades the performance of the defined ANN model, the Synthetic Minority Oversampling Technique (SMOTE) [18] is adapted for balancing the two classes in the training dataset. The stratified 10-fold cross-validation is again applied and repeated 10 times. The average accuracy, AUC, sensitivity, and specificity are estimated as 0.87, 0.80, 0.43, and 0.91, respectively. The ROC curves and the values of AUC for one repetition of the stratified 10-fold cross-validation are presented in Fig. 3a, along with the average AUC (0.80 ± 0.05). The normalized confusion matrix, corresponding to fold 8, is presented in Fig. 3b.
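The core idea of SMOTE is to create synthetic minority-class samples by interpolating between an existing minority sample and one of its nearest minority neighbours. The sketch below shows the basic technique only; the function name `smote`, the choice of `k` and the toy two-feature data are assumptions, and it is not the adapted variant used by the authors. As in the text, oversampling would be applied to the training folds only, never to the test fold.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority feature vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical standardized two-feature minority samples, for illustration only.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
for point in new_points:
    print([round(c, 3) for c in point])
```

Every synthetic point lies on a line segment between two real minority samples, so the method enlarges the minority class without simply duplicating patients.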

Fig. 3

a. The ROC curve of the defined ANN model over patients not admitted to the hospital (community); b. The normalized confusion matrix, corresponding to fold 8 (the worst value of AUC). ROC: Receiver operating characteristic; AUC: area under the curve.

The implementation of random forest and glmnet on the 57 patients in the regular ward gives an average AUC of 94% over 10-fold cross-validation, while for the patients in the community the AUC is 84–86%. The full array of sensitivity, specificity and accuracy values for optimal choices of cutoff for these two classifiers are given in Table 2 and Table 3. Compared with the ANN, both random forest and glmnet provide more insight into the most important variables (Fig. 4(a) and (b)) and a clear indication of how the decision has been obtained. Additionally, glmnet performs variable selection, providing a much smaller stable set of variables among the 14 highly correlated predictors.

Table 2

Model predictions for patients in regular ward testing for SARS-CoV-2 positive.

Variables	Classifier	Sensitivity	Specificity	Accuracy	AUC
14 different blood counts	ANN	0.85	0.94	0.90	0.95
14 different blood counts	RF	0.82	0.90	0.91	0.94
14 different blood counts	glmnet	0.92	0.93	0.91	0.94
y = monocytes - leukocytes - eosinophils - platelets	LR	0.85	0.77	0.81	0.81
y = monocytes - leukocytes - eosinophils	LR	0.85	0.71	0.77	0.79
y = monocytes - leukocytes	LR	0.65	0.58	0.61	0.65

Table 3

Model predictions for patients not admitted to hospital (community) testing for SARS-CoV-2 positive.

Variables	Classifier	Sensitivity	Specificity	Accuracy	AUC
14 different blood counts	ANN	0.43	0.91	0.87	0.80
14 different blood counts	RF	0.60	0.88	0.82	0.86
14 different blood counts	glmnet	0.65	0.81	0.81	0.84
y = monocytes - leukocytes - eosinophils - platelets	LR	0.82	0.78	0.79	0.85
y = monocytes - leukocytes - eosinophils	LR	0.82	0.79	0.79	0.84
y = monocytes - leukocytes	LR	0.74	0.77	0.77	0.81

Fig. 4

Variable importance plot of (a) random forest and (b) glmnet classification of SARS-CoV-2 positive patients who are in the regular hospital ward. The plot shows the importance of variables in building the respective predictive model. MPV; mean platelet volume, RBC; red blood cells, MCHC; mean corpuscular haemoglobin concentration, MCH; mean corpuscular haemoglobin, MCV; mean corpuscular volume, RBCDW; red blood cell distribution width. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Analysis of the top four variables according to glmnet for patients in the community (Supplementary Fig. 1) and the regular ward (Supplementary Fig. 2) indicated a clear recognisable pattern. Of particular note in patients in the regular ward are the decreased overall pool of leukocytes, the increase in red blood cells and, in particular, a significant decrease in eosinophils. Community patients having SARS-CoV-2 have distinctively high levels of monocytes and low levels of leukocytes. In order to enable a rapid prediction model for clinics [12] where clinicians may want to choose only two (Fig. 5a), three (Fig. 5b) or four parameters (Fig. 5c), the monocytes/leukocytes/eosinophils/platelets trends in Supplementary Figs. 1 and 2 and Fig. 1 were analysed by adding monocytes and subtracting leukocytes, eosinophils and platelets. This difference indicates little overlap between patients who test positive and negative for SARS-CoV-2; however, it must be noted that this is a simple additive formula based on normalised data.
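The additive score described above can be sketched as follows. The patient values and the decision threshold are hypothetical and chosen only for illustration; the inputs are assumed to be standardized blood counts, as in the released dataset.

```python
def mlep_score(monocytes, leukocytes, eosinophils, platelets):
    """Derived variable y = monocytes - leukocytes - eosinophils - platelets."""
    return monocytes - leukocytes - eosinophils - platelets


# Hypothetical standardized values for two patients (not taken from the dataset).
patients = [
    {"monocytes": 1.2, "leukocytes": -0.8, "eosinophils": -0.5, "platelets": -0.6},
    {"monocytes": -0.3, "leukocytes": 0.4, "eosinophils": 0.2, "platelets": 0.5},
]

scores = [mlep_score(**p) for p in patients]

THRESHOLD = 0.0  # hypothetical cutoff; a real one would be tuned on training data
flags = [score > THRESHOLD for score in scores]

for score, flag in zip(scores, flags):
    print(f"y = {score:+.2f} -> {'screen positive' if flag else 'screen negative'}")
```

Since a logistic regression on a single variable is a monotonic function of that variable, ranking patients by the fitted probability is equivalent to ranking them by y itself (assuming a positive coefficient), so the AUC depends only on how well y separates the two groups.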

Fig. 5

Box plots of blood characteristics: a. monocytes - leukocytes (m-l) b. monocytes - leukocytes - eosinophils (m-l-e) and c. monocytes - leukocytes - eosinophils - platelets (m-l-e-p); all normalized values, categorized by whether tested negative (blue box) or positive (red box) to rt-PCR SARS-CoV-2 test, and whether they remained in the community or were admitted in the regular ward. All p-values are tests of equality of population using Wilcoxon rank sum test and suggest statistically significant difference. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

For example, a simple logistic regression (LR) with derived variable y = monocytes - leukocytes - eosinophils - platelets shows that this blood characteristic can predict the SARS-CoV-2 test outcome with an average AUC = 85% over 10-fold cross-validation among the patients in the community, and AUC = 81% for patients in the regular ward. The model predictions from ANN, random forest, glmnet and the formulas monocytes - leukocytes - eosinophils (m-l-e) and m-l-e-p are summarized in Table 2 and Table 3.

Of the patients with both SARS-CoV-2 and full blood count tests, 366 patients were tested for other pathogens, of which 188 patients were diagnosed with other infections, notably Rhinovirus and Influenza B (Table 4). Collectively, 51% tested positive for: Respiratory Syncytial Virus, Influenza A, Influenza B, Parainfluenza 1, Coronavirus NL63, Rhinovirus Enterovirus, Coronavirus HKU1, Parainfluenza 3, Chlamydophila pneumoniae, Adenovirus, Parainfluenza 4, Coronavirus 229E, Coronavirus OC43, Influenza A H1N1 2009, Bordetella pertussis, Metapneumovirus and Parainfluenza 2. The similar presentation of many of these infections to COVID-19 may, however, be differentiated by the clear difference in immune response to SARS-CoV-2 using the ANN model of full blood count analysis.

Table 4

Pathogen	Community	Regular ward	Semi Intensive Unit	ICU	Total per pathogen
Adenovirus	0	0	0	0	0
Bordetella pertussis	18	1	1	1	21
Chlamydophila pneumoniae	1	1	1	0	3
Coronavirus 229E	1	0	0	0	1
Coronavirus HKU1	0	0	0	0	0
Coronavirus NL63	9	1	1	1	12
Coronavirus OC43	3	0	0	0	3
Influenza A H1N1 2009	2	0	0	0	2
Influenza A	3	0	0	0	3
Influenza B	20	2	2	1	25
Metapneumovirus	0	0	0	0	0
Parainfluenza 1	0	0	0	0	0
Parainfluenza 2	4	1	0	0	5
Parainfluenza 3	4	0	1	0	5
Parainfluenza 4	1	0	0	0	1
Respiratory Syncytial Virus	3	0	2	4	9
Rhinovirus Enterovirus	78	6	9	5	98
Total	147	12	17	12	188

Sao Paulo Dataset: number of patients who tested positive for pathogens other than SARS-CoV-2, by patient admission (all patients had been tested for SARS-CoV-2 and had full blood count results). Of the 598 patients, only one tested positive for SARS-CoV-2 and for at least one other pathogen; that patient also tested positive for Influenza B and for Coronavirus NL63 and was in ICU.

In addition to the changing profile of immune cells, a change in red blood cells and platelets was noted. In order to describe the profile of cells, the mean changes were plotted for patients in the Regular Ward (Supplementary Fig. 3).

Discussion

We developed multiple independent models (statistical, random forest and shallow learning) that can predict SARS-CoV-2 with an AUC of up to 86% for community and 95% for regular ward patients, using only data collected from their normalized full blood counts. This provides an initial screen separating SARS-CoV-2 positive from negative patients using biomarkers at an early stage in the disease presentation. This screen has been conducted on a set of data based on severity judged by the location of the patient in hospital (admitted to the regular ward compared to not admitted to hospital; ICU patients were excluded). Hence the models are able to distinguish SARS-CoV-2 positive patients from those with altered blood profiles who were later diagnosed with other pathogens. It is well recognised that the symptoms of COVID-19 are accompanied by a significant change in immune response [19], with a decreased population of leukocytes, lymphocytes [20], [21] and eosinophils [19], [20], [21] found throughout all stages of infection. Indeed, it was suggested in one case report from Wuhan that eosinopenia together with lymphopenia may be a potential indicator for diagnosis [22]. Other early reports used similar predictive patterns found in full blood count parameters, suggesting that an elevated neutrophil-to-lymphocyte ratio could be used as part of the diagnosis [23]. That study investigated the change in blood parameters in a total of 93 patients (severe and non-severe combined), using ratios commonly employed in the diagnosis of viral respiratory diseases: neutrophil-to-lymphocyte ratio (NLR), derived NLR (d-NLR, neutrophil count divided by the result of WBC count minus neutrophil count), platelet-to-lymphocyte ratio and lymphocyte-to-monocyte ratio. However, these parameters commonly alter in response to other viral infections, and we initiated research to identify new ratios to distinguish SARS-CoV-2 from other pathogens.
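For reference, the four ratios named above can be written out as a short sketch. The function name `blood_ratios` and the input values are hypothetical. Note that these ratios require absolute cell counts, so they cannot be computed from the standardized values in the dataset analysed here.

```python
def blood_ratios(neutrophils, lymphocytes, monocytes, platelets, wbc):
    """Common full blood count ratios, computed from absolute counts."""
    return {
        "NLR": neutrophils / lymphocytes,
        # Derived NLR: neutrophils over (white blood cell count minus neutrophils).
        "d-NLR": neutrophils / (wbc - neutrophils),
        "PLR": platelets / lymphocytes,
        "LMR": lymphocytes / monocytes,
    }


# Hypothetical absolute counts (10^9/L), for illustration only.
ratios = blood_ratios(
    neutrophils=4.0, lymphocytes=2.0, monocytes=0.5, platelets=250.0, wbc=7.0
)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```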
Here we show that a new simple calculation based on leukocytes, monocytes, eosinophils and platelets (normalized monocytes - leukocytes - eosinophils - platelets) can be used to predict with 85% AUC the presence of SARS-CoV-2 for early-stage community patients. Further validation will be required to determine whether our model can distinguish fully from other pathogens, although initial small numbers indicate a trend that is positive. Leukocytes are a family of white blood cells that includes neutrophils, lymphocytes, monocytes, eosinophils and basophils. Specific pathogens induce specific responses, such as the increase in circulating neutrophils and monocytes that typically occurs during sepsis as part of the immune defence mechanism, and several studies have indicated that ratios between different blood cells can be used to predict outcome. The ratio of monocyte: neutrophil can be used to determine sepsis severity [24], and platelet: monocyte aggregation has been reported to be increased in patients with Influenza A (H1N1) [25]. Indeed, the relationship between different parameters of the full blood count is of great clinical value and interest because the parameters are simple and readily measurable. While not yet in common clinical use, it has been shown that neutrophil: lymphocyte and mean platelet volume: platelet count could predict the outcome for critically ill patients with peritonitis and pancreatitis (bacteremia) [26]. Our results presented here clearly demonstrate that an early prediction of those infected with SARS-CoV-2 may be made from the relationship between monocytes, eosinophils and leukocytes, differentiating them from other pathogen-induced infections. The data analysed in this report focussed on the clear early stages in order to differentiate between patients presenting with common symptoms prior to the need to be in ICU.
The changes in parameters of the full blood counts are easily distinguishable from those of other pathogen-induced infections, with an apparent decrease in leukocyte population consisting of a large proportion of monocytes. This observation is in keeping with other tools used to diagnose the severity of COVID-19 by the measurement of IL6, which significantly increases throughout the progression of disease. The cytokine IL6 is predominantly synthesised by monocytes and macrophages (which derive from monocytes), and is partly responsible for driving the immune response towards a deleterious cytokine storm. In order to best understand the immune response to SARS-CoV-2 it is useful to compare it to other similar coronaviruses. There are two main examples: the closest known is the SARS-CoV virus (SARS), which infected 8,096 people and killed 774 in 2002–2003, and the other is the Middle East Respiratory Syndrome coronavirus (MERS-CoV), which in 2013 infected 2,102 people and killed 780. The majority of all other coronaviruses infect the upper respiratory tract and cause mild respiratory and gastrointestinal infections. In contrast, the highly infectious and highly pathogenic coronaviruses SARS and MERS were noted to have similar effects on the immune response, which overlap with the pathogenesis now witnessed with SARS-CoV-2 infections. The SARS virus directly infects human monocytes, which then produce the cytokines that attract neutrophils, macrophages and activated T lymphocytes [27]; MERS was shown to increase monocytes and their IL6 production [28]. In parallel, this study indicates that the pathogenesis of SARS-CoV-2 may be linked to monocytes and the production of IL6. In our study, the analysis of monocyte involvement is crucial to the prediction of SARS-CoV-2 infection.
Additionally, the use of AI and ML to recognise the altered pattern of key blood parameters will be a useful tool with the emergence of any future coronavirus that is equally pathogenic and contagious. The decrease in platelets in patients testing positive for SARS-CoV-2 in the regular ward is opposite to that observed in separate reports of patients with Influenza A [29]. A subset of the patients included in this analysis were tested for other pathogens, and a proportion of those negative for SARS-CoV-2 were positive for other pathogens including Coronaviruses (NL63, HKU1 and 229E), and a few had Influenza A or B. This suggests that platelet count is a good indicator in this predictive model, and may be a good way to differentiate SARS-CoV-2 infection from Influenza A. It also suggests a potential decrease in platelets in patients, which is of concern due to its link to thrombocytopenia and increased internal bleeding. This supports other reports of an increase in thrombocytopenia being associated with higher mortality in COVID-19 patients [21]. The condition of thrombocytopenia may be the reason for the recently noted rashes observed in patients, especially young children [30]. However, there have been reports that clotting is increased in COVID-19 patients, and the data we show are normalised and may be misleading. These normalised data also indicate an increase in platelet size (MPV), which suggests that there is rapid platelet production by the bone marrow. Indeed, it is recommended that COVID-19 patients are administered antiplatelet therapy to protect against thrombosis [31]. Overall, this highlights the difficulty of interpreting mean normalised data. This report is the first to use primary patient data of full blood counts to train and test an ANN to predict, among patients in a regular ward as well as those in the community, who will test positive for SARS-CoV-2.
This preliminary model will be further trained and adapted with the aim to address the shortfall in direct SARS-CoV-2 testing methods in hospitals. This will enable a prediction that allows health care providers to conduct rapid cheap screening to separate patients into those who are most likely to have SARS-CoV-2 and those who do not. Early screening allows segregation of patients and early treatment intervention.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
