Matjaž Kukar, Gregor Gunčar, Tomaž Vovko, Simon Podnar, Peter Černelč, Miran Brvar, Mateja Zalaznik, Mateja Notar, Sašo Moškon, Marko Notar.
Abstract
Physicians taking care of patients with COVID-19 have described different changes in routine blood parameters. However, these changes make it difficult for them to diagnose COVID-19. We constructed a machine learning model for COVID-19 diagnosis that was built and cross-validated on the routine blood tests of 5333 patients with various bacterial and viral infections and 160 COVID-19-positive patients. We selected the operational ROC point at a sensitivity of 81.9% and a specificity of 97.9%. The cross-validated AUC was 0.97. The five most useful routine blood parameters for COVID-19 diagnosis according to the feature importance scoring of the XGBoost algorithm were MCHC, eosinophil count, albumin, INR, and prothrombin activity percentage. t-SNE visualization showed that the blood parameters of patients with a severe COVID-19 course resemble those of a bacterial infection more than those of a viral infection. The reported diagnostic accuracy is at least comparable and probably complementary to RT-PCR and chest CT studies. Patients with fever, cough, myalgia, and other symptoms can now have initial routine blood tests assessed by our diagnostic tool. All patients with a positive COVID-19 prediction would then undergo standard RT-PCR studies to confirm the diagnosis. We believe that our results represent a significant contribution to improvements in COVID-19 diagnosis.
Year: 2021 PMID: 34031483 PMCID: PMC8144373 DOI: 10.1038/s41598-021-90265-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. A flow chart of patients included in the model building and validation process.
Demographic features of included patient groups.
| Training group | COVID-19 negative | COVID-19 positive |
|---|---|---|
| Number | 5333 (2971 viral infections, 2362 bacterial infections) | 160 (38 with acute respiratory failure (ARF); 10 died, 9 of them with ARF) |
| Median age (years) | 57 | 55.5 |
| Female sex [number/%] | 2155/40% | 67/42% |
Figure 2. Blood parameters sorted by their XGBoost importance score. More important parameters are shown on the left. Group median values and IQRs of the blood parameters used in model building are shown, centered and scaled to reference intervals. The median bar for C-reactive protein in bacterial infections is off the scale at 38 mg/L. Differences between groups (COVID-19/other virus/bacteria) were evaluated with the Anderson–Darling test. The significance levels (0.05 or 0.01) of the test results are depicted at the bottom of the figure.
Figure 3. Visualization of the bacteria/virus/COVID-19 parameter space with the t-SNE method. Each dot represents a patient or, more specifically, an embedding of the patient's blood parameters into a two-dimensional space; its color represents the group. Blue dots represent patients with viral infections other than COVID-19, orange dots patients with bacterial infections, and red dots patients with COVID-19. Green dots represent COVID-19 patients who died (10 patients) in panel (a) and COVID-19 patients diagnosed with acute respiratory failure (38 patients) in panel (b). Medoids of the bacteria/virus/COVID-19/"COVID-19 death" groups in panel (a) and of the bacteria/virus/COVID-19/"COVID-19 ARF" groups in panel (b) are also marked.
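As an aside on the medoids marked in Figure 3: a medoid is the group member with the smallest total distance to all other members of its group. The following is a minimal sketch only. It assumes Euclidean distance in the two-dimensional embedding, which the caption does not specify, and the function name `medoid` and the toy coordinates are hypothetical, not patient data.

```python
import math

def medoid(points):
    """Return the member of `points` whose summed Euclidean distance
    to all other members is smallest (assumed metric, see lead-in)."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

# Toy 2-D embeddings per group (illustrative values, not study data).
groups = {
    "virus": [(0.0, 0.0), (4.0, 0.0), (2.0, 1.0), (2.0, 5.0)],
    "bacteria": [(10.0, 10.0), (11.0, 10.0), (12.0, 10.0)],
}

medoids = {name: medoid(pts) for name, pts in groups.items()}
for name, point in medoids.items():
    print(name, point)
```

Unlike a centroid, a medoid is always an actual data point, so in a plot like Figure 3 it corresponds to a real patient's embedding.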
Confusion matrix for the cross-validated training group.
| | Positive | Negative |
|---|---|---|
| Predicted positive | 131 | 112 |
| Predicted negative | 29 | 5221 |
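The reported sensitivity and specificity can be recomputed from the four counts in this confusion matrix. The sketch below does so in plain Python and also derives precision and the F2 score from the same counts. The function name `confusion_metrics` is illustrative and not from the paper.

```python
# Counts taken from the confusion matrix above.
TP, FP = 131, 112   # predicted positive: COVID-19 positive / COVID-19 negative
FN, TN = 29, 5221   # predicted negative: COVID-19 positive / COVID-19 negative

def confusion_metrics(tp, fp, fn, tn, beta=2.0):
    """Compute sensitivity, specificity, precision and F-beta from counts."""
    sensitivity = tp / (tp + fn)          # recall on COVID-19 positives
    specificity = tn / (tn + fp)          # recall on COVID-19 negatives
    precision = tp / (tp + fp)            # positive predictive value
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_beta": f_beta,
    }

metrics = confusion_metrics(TP, FP, FN, TN)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, sensitivity is 131/160 ≈ 0.819 and specificity is 5221/5333 ≈ 0.979, matching the operational point quoted in the abstract. Precision in this cross-validated sample is about 0.539, which is consistent with the proposal to confirm positive predictions by RT-PCR.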
Figure 4. ROC, PR (precision-recall), and F2 curves for COVID-19 diagnosis calculated from the training data using ten-fold stratified cross-validation. Vertical and horizontal dashed lines connect the F2 (gray) max point with the PR curve (orange) and the ROC curve (blue) in order to obtain the operational ROC point with sensitivity = 0.819, specificity = 0.979 (depicted with red dots), and AUC = 0.97.
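The idea of choosing the operational point at the F2 maximum can be sketched as a threshold sweep over predicted scores. This is a minimal illustration under stated assumptions, not the authors' pipeline: the labels and scores are toy values standing in for cross-validated predictions, and the names `f_beta` and `pick_operating_point` are hypothetical.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from counts; beta=2 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def pick_operating_point(y_true, scores, beta=2.0):
    """Sweep every observed score as a threshold (predict positive when
    score >= threshold) and keep the one with the highest F-beta."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = None
    for thr in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        fn, tn = n_pos - tp, n_neg - fp
        candidate = (f_beta(tp, fp, fn, beta), thr, tp / n_pos, tn / n_neg)
        if best is None or candidate[0] > best[0]:
            best = candidate
    score, thr, sensitivity, specificity = best
    return {"threshold": thr, "f_beta": score,
            "sensitivity": sensitivity, "specificity": specificity}

# Toy labels (1 = COVID-19 positive) and model scores, not study data.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]

point = pick_operating_point(y_true, scores)
print(point)
```

Because F2 penalizes missed positives more than false alarms, the selected threshold tends to favor sensitivity, which suits a screening tool whose positive predictions are confirmed by a second test.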