CT-Based COVID-19 triage: Deep multitask learning improves joint identification and severity quantification.

Mikhail Goncharov1, Maxim Pisov2, Alexey Shevtsov3, Boris Shirokikh2, Anvar Kurmukov3, Ivan Blokhin4, Valeria Chernina4, Alexander Solovev5, Victor Gombolevskiy4, Sergey Morozov4, Mikhail Belyaev6.   

Abstract

The current COVID-19 pandemic overloads healthcare systems, including radiology departments. Though several deep learning approaches were developed to assist in CT analysis, nobody considered study triage directly as a computer science problem. We describe two basic setups: Identification of COVID-19 to prioritize studies of potentially infected patients to isolate them as early as possible; Severity quantification to highlight patients with severe COVID-19, thus direct them to a hospital or provide emergency medical care. We formalize these tasks as binary classification and estimation of affected lung percentage. Though similar problems were well-studied separately, we show that existing methods could provide reasonable quality only for one of these setups. We employ a multitask approach to consolidate both triage approaches and propose a convolutional neural network to leverage all available labels within a single model. In contrast with the related multitask approaches, we show the benefit from applying the classification layers to the most spatially detailed feature map at the upper part of U-Net instead of the less detailed latent representation at the bottom. We train our model on approximately 1500 publicly available CT studies and test it on the holdout dataset that consists of 123 chest CT studies of patients drawn from the same healthcare system, specifically 32 COVID-19 and 30 bacterial pneumonia cases, 30 cases with cancerous nodules, and 31 healthy controls. The proposed multitask model outperforms the other approaches and achieves ROC AUC scores of 0.87±0.01 vs. bacterial pneumonia, 0.93±0.01 vs. cancerous nodules, and 0.97±0.01 vs. healthy controls in Identification of COVID-19, and achieves 0.97±0.01 Spearman Correlation in Severity quantification. We have released our code and shared the annotated lesions masks for 32 CT images of patients with COVID-19 from the test dataset.
Copyright © 2021 The Authors. Published by Elsevier B.V. All rights reserved.

Keywords:  COVID-19; Chest computed tomography; Convolutional neural network; Triage

Year:  2021        PMID: 33932751      PMCID: PMC8015379          DOI: 10.1016/j.media.2021.102054

Source DB:  PubMed          Journal:  Med Image Anal        ISSN: 1361-8415            Impact factor:   8.545


Introduction

During the first months of 2020, COVID-19 infection spread worldwide and affected millions of people (Li et al., 2020b). Though virus-specific reverse transcription-polymerase chain reaction (RT-PCR) testing remains the gold standard (World Health Organization et al., 2020), chest imaging, including computed tomography (CT), is helpful in diagnosis and patient management (Bernheim et al., 2020; Akl et al., 2020; Rubin et al., 2020). Moreover, compared to RT-PCR, CT has higher sensitivity (98% compared to 71%) for some cohorts (Fang et al., 2020). The Fleischner Society has addressed the role of thoracic imaging in COVID-19, providing recommendations intended to guide medical practitioners; one of the scenarios covers medical triage of patients with moderate-to-severe clinical features and a high pretest probability of disease (Rubin et al., 2020). Radiology departments can respond to the pandemic by dividing into four areas (contaminated, semi-contaminated, buffer, and clean) and by applying strict disinfection and management criteria (Huang et al., 2020b). The International Society of Radiology surveyed current practices in managing patients with COVID-19 in 50 radiology departments representing 33 countries across all continents. In symptomatic patients with suspected COVID-19, imaging was performed in 89% of cases, and chest CT in 34% of cases. Faster results than molecular tests (51%) and easy access (39%) were the main reasons for imaging use (Blažić et al., 2020).
The pandemic dramatically increased the need for medical care and resulted in the overloading of healthcare systems (Tanne et al., 2020). Many classification and segmentation algorithms were developed to assist radiologists in COVID-19 identification and severity quantification, see Section 1.1.1.
However, little research has been conducted to investigate automatic image analysis for triage, i.e., ranking of CT studies. During an outbreak, many CT scans require rapid decision-making to sort patients into those who need care right now and those who will need scheduled care (Mei et al., 2020). Therefore, study list triage is relevant and may shorten the report turnaround time by increasing the priority of CT scans with suspected pathology for faster interpretation by a radiologist compared to other studies, see Fig. 1.
Fig. 1

A schematic representation of the automatic triage process. Left: the chronological order of the studies. Center: re-prioritized order to highlight findings requiring radiologist’s attention (P denotes COVID-19 Identification probability). Right: accompanying algorithm-generated X-ray-like series to assist the radiologist in fast decision making (color bar from green to red denotes Severity of local COVID-19-related changes).

Triage differs from other medical image analysis tasks in that the automatic program provides the first reading and the radiologist then provides the second one. Technically, many of the developed methods may provide a priority score for triage, e.g., the output probability of a classifier or the total lesion volume extracted from a binary segmentation mask. However, these scores must be properly used. We assume that there are two different triage problems:
Identification. The first challenging task is to identify studies of patients with COVID-19 and prioritize them so the physician can isolate potentially infected patients as early as possible (Sverzellati et al., 2020).
Severity quantification. Second, within COVID-19 patients, a triage algorithm must prioritize those who will require emergency medical care (Kherad et al., 2020).
Binary classification provides a direct way to formalize Identification, but the optimal computer science approach to estimate Severity is not as obvious. It was shown that human-based quantitative analysis of chest CT helps assess the clinical severity of COVID-19. Colombi et al. (2020) quantified affected pulmonary tissue and established a high correlation between the healthy pulmonary tissue volume and the outcomes (transfer to an intensive care unit or death). The threshold value for the volume of healthy pulmonary tissue was 73%. This result and similar ones motivate clinical recommendations in several countries: COVID-19 patients need to be sorted based on quantitative evaluation of lung lesions.
In particular, the Russian Federation adopted the following approach (Morozov et al., 2020c): the volume ratio of lesions in each lung is calculated separately, and the maximal ratio is treated as the overall severity score. However, manual binary segmentation of the affected lung tissue is extremely time-consuming and may take several hours (Shan et al., 2020). For this reason, a visual semi-quantitative scale was implemented rather than a fully quantitative one. The original continuous score is split into five categories, from CT-0 to CT-4, with a 25% step, so that CT-0 corresponds to the normal cohort and CT-4 to 75%-100% of damaged lung tissue. Patients with CT-3 (severe pneumonia) are hospitalized, and those with CT-4 (critical pneumonia) are admitted to an intensive care unit. The scale is based on a visual evaluation of approximate lesion volume in both lungs (regardless of postoperative changes). A retrospective study (Morozov et al., 2020b) analyzed the CT 0–4 scores and lethal outcomes in 13,003 COVID-19 patients. The chance of a lethal outcome increased from CT-0 to CT-4 by 38% on average (95% CI 17.1–62.6%). Another retrospective analysis (Petrikov et al., 2020) found a significant correlation between an increase of CT grade and clinical condition deterioration.
These two triage strategies, Identification and Severity quantification, are not mutually exclusive, and their priority may change depending on the patient population structure and the current epidemiological situation. An outpatient hospital in an area with a small number of infected patients may rely solely on Identification. An infectious diseases hospital may use Severity quantification to predict the need for artificial pulmonary ventilation and intensive care units. Finally, an outpatient hospital during an outbreak needs both systems to identify and isolate COVID-19 patients as well as to quantify disease severity and route severe cases accordingly.
This paper explores the automation of both Identification and Severity quantification with the aim of creating a robust system for all scenarios, see Fig. 2.
Fig. 2

An example of joint COVID-19 identification and severity estimation by the proposed method for several studies.


Related work

CT Analysis for COVID-19 identification and severity estimation

As briefly discussed above, we consider two problems: COVID-19 identification and severity quantification in chest CTs. In both cases, researchers usually calculate a continuous score of COVID-19 presence or severity, depending on their task. An overview of the existing indices can be found in Tab. 1. Below, we present only some of the existing CT-based algorithms; for a more comprehensive review, we refer the reader to Shi et al. (2020a).
Table 1

Overview of continuous output indices proposed in previous works. The Type column denotes the score type: COVID-19 identification (Iden.), COVID-19 severity (Sev.), or both. The classes against which COVID-19 is identified are given in brackets: P - Pneumonia, NP - non-Pneumonia, HC - Healthy controls, N - Nodules, C - Cancer. The Metric column contains reported ROC AUC values unless otherwise indicated. Remarks: [1] Accuracy, because ROC AUC was not reported. [2] The metric was provided for the identification problem only. [3] Pearson correlation. [4] The average volume error, measured in cm³. [5] The paper does not provide a score; the Dice score for the output masks is reported.

Paper | Ranking score description | Type | Metric
Bai et al. (2020) | Probabilities of 2.5D EfficientNet | Iden. (P) | 0.95
Kang et al. (2020) | Probabilities of a NN for radiomics | Iden. (P) | Acc. [1] 0.96
Shi et al. (2020b) | Probabilities of RF for radiomics | Iden. (P) | 0.94
Li et al. (2020a) | Probabilities of 2.5D ResNet-50 | Iden. (P, NP) | 0.96
Wang et al. (2020a) | Probabilities of a 3D ResNet-based NN | Iden. (HC, P) | 0.97
Han et al. (2020) | Probabilities of a 3D CNN | Iden. (HC, P) | 0.99
Jin et al. (2020b) | Probabilities of ResNet-50 | Iden. (HC, P, N) | 0.99
Jin et al. (2020a) | Custom aggregation of 2D CNN predictions | Iden. (HC, P) | 0.97
Gozes et al. (2020a) | Fractions of affected slices (by 2D ResNet) | Iden. (HC, C) | 0.99
Amine et al. (2020) | Probabilities of 3D U-Net (encoder part) | Iden. (HC, P) | 0.97
Wang et al. (2020b) | Probabilities of a 3D CNN | Iden. (HC) | 0.96
Chen et al. (2020) | 2D bounding boxes + post-processing | Iden. (other disease) | Acc. [1] 0.99
Gozes et al. (2020b) | A score based on 2D ResNet attention | Both (fever) | 0.95 [2]
Chaganti et al. (2020) | Affected lung percentage, a combined score | Sev. | Corr. [3] 0.95
Huang et al. (2020a) | Affected lung percentage by 2D U-Net | Sev. | N/A
Shen et al. (2020) | Affected lung percentage by non-trainable CV | Sev. | Corr. [3] 0.81
Shan et al. (2020) | Volume of segm. masks by a 3D CNN | Sev. | Vol. [4] 10.7
Fan et al. (2020) | Segmentation mask | Sev. | Dice [5] 0.60
Tang et al. (2020) | Random Forest probabilities | Sev. | 0.91
The majority of reviewed works use a pre-trained network for lung extraction or bounding box estimation as a necessary preprocessing step. We will skip the description of this step below for all works.
Binary classification. Researchers usually treat the problem of identification as binary classification, e.g., COVID-19 versus all other studies. Arguably the most direct way to classify CT images with varying slice thicknesses is to train well-established 2D convolutional neural networks. For example, Jin et al. (2020b) train ResNet-50 (He et al., 2016a) to classify images using the obtained lung mask. An interesting and interpretable way to aggregate slice predictions into whole-study predictions is proposed in (Gozes et al., 2020a), where the number of affected slices is used as the final output of the model. This work also employs Grad-CAM (Selvaraju et al., 2017) to visualize network attention. A custom aggregation of slice-level predictions is proposed in (Jin et al., 2020a) to filter out false positives.
The need for post-training aggregation of slice predictions can be avoided by using 3D convolutional networks (Han et al., 2020; Wang et al., 2020b). Wang et al. (2020a) propose a two-headed architecture based on 3D ResNet. This approach yields a hierarchical classification: the first head is trained to classify CTs with and without pneumonia, whereas the second one aims to distinguish COVID-19 from other types of pneumonia. Alternatively, slice aggregation may be inserted into the network architecture to obtain an end-to-end pipeline, as proposed in (Li et al., 2020a; Bai et al., 2020). Within this setup, all slices are processed by a 2D backbone (ResNet-50 for (Li et al., 2020a), EfficientNet (Tan and Le, 2019) for (Bai et al., 2020)), while the final classification layers operate on a pooled version of the feature maps from all slices.
Segmentation. The majority of papers tackling severity estimation are segmentation-based. For example, the total absolute volume of involved lung parenchyma can be used as a severity score (Shan et al., 2020). Relative volume (i.e., normalized by the total lung volume) is a more robust approach, as it takes into account the normal variation of lung sizes. Affected lung percentage is estimated in several ways, including a non-trainable computer vision algorithm (Shen et al., 2020), 2D U-Net (Huang et al., 2020a), and 3D U-Net (Chaganti et al., 2020). Alternatively, an algorithm may predict the severity directly, e.g., with a Random Forest based on a set of radiomics features (Tang et al., 2020) or with a neural network.
Multitask approach. As discussed above, many papers address either COVID-19 identification or severity estimation. However, little research has been conducted to study both tasks simultaneously. Gozes et al. (2020b) propose an original Grad-CAM-based approach to calculate a single attention-based score. Though the authors mention both identification and severity quantification in the paper, they do not provide direct quality metrics for the latter. Amine et al. (2020) propose a multi-head architecture to solve both segmentation and classification problems in an end-to-end manner. They use a 2D U-Net backbone with an additional classification head after the encoder part, which takes a latent feature map from the bottom of U-Net as input. Even though they do not tackle the problem of severity quantification, they demonstrate that solving two tasks jointly could benefit both. However, they report metrics only for classification and segmentation of 2D axial slices and do not propose a way to apply their method to a whole 3D CT series.

Deep learning for triage

As mentioned above, we define triage as a process of ordering studies to be examined by a radiologist. There are two major scenarios where such an approach could be useful:
Studies with a high probability of dangerous findings must be prioritized. The most important example is triage within emergency departments, where minutes of acceleration may save lives (Faita, 2020), but it may be useful for other departments as well. For example, Annarumma et al. (2019) estimate the average reporting delay in chest radiographs as 11.2 days for critical imaging findings and 7.6 days for urgent imaging findings.
The majority of studies do not contain critical findings. This is a common situation for screening programs, e.g., CT-based lung cancer screening (Team, 2011). In this scenario, triage systems aim to exclude studies with the smallest probability of important findings to reduce radiologists’ workload.
Medical imaging may provide detailed information useful for automatic patient triage, as shown in several studies. Annarumma et al. (2019) propose a deep learning-based algorithm to estimate the urgency of imaging findings on adult chest radiographs. The dataset includes 470388 studies annotated in an automated way via text report mining. The Inception v3 architecture (Szegedy et al., 2016) is used to model clinical priority as ordinal data by solving several binary classification problems, as proposed in (Lin and Li, 2012). In a simulation on historical data, the average reporting delay is reduced to 2.7 and 4.1 days for critical and urgent imaging findings, respectively.
A triage system for screening mammograms, another 2D image modality, has been developed in (Yala et al., 2019). The authors focus on reducing the radiologist’s workload by maximizing system recall. The underlying architecture is ResNet-18 (He et al., 2016a), which is trained on 223109 screening mammograms. The model achieves 0.82 ROC AUC on the whole test population and demonstrates the capability to reduce workload by 20% while preserving the same level of diagnostic accuracy.
Prior research confirms that deep learning may assist in the triage of more complex images such as volumetric CT. A deep learning-based system for rapid diagnosis of acute neurological conditions caused by stroke or traumatic brain injury is proposed in (Titano et al., 2018). A 3D adaptation of ResNet-50 (Korolev et al., 2017) analyzes head CT images to predict critical findings. To train the model, the authors utilize 37236 studies; labels are also generated by text report mining. The classifier’s output probabilities serve as ranks for triage, and the system achieves ROC AUC of 0.73-0.88.
Stronger supervision is investigated in (Chang et al., 2018), where the authors use 3D masks of all hemorrhage subtypes for 10159 non-contrast CT studies. The detection and quantification of 5 subtypes of hemorrhages are based on a modified Mask R-CNN (He et al., 2017) extended by pyramid pooling to map 3D input to 2D feature maps (Lin et al., 2017). More detailed and informative labels combined with an accurately designed method provide reliable performance: ROC AUC varies from 0.85 to 0.99 depending on hemorrhage type and size. A similar finding is reported in (De Fauw et al., 2018) for optical coherence tomography (OCT). The authors employ a two-stage approach. First, a 3D U-Net (Çiçek et al., 2016) is trained on 877 studies with dense 21-class segmentation masks. Then the output maps for another 14884 cases are processed by a 3D version of DenseNet (Huang et al., 2017) to identify urgent cases. The resulting combination of two networks provides excellent performance, achieving 0.99 ROC AUC.

Contribution

First, we highlight the need for triage systems of two types: for COVID-19 identification and severity quantification. We study existing approaches and demonstrate that a system trained for one task shows low performance in the other. Second, we have developed a multitask learning-based approach to create a single neural network which achieves top results in both triage tasks. In contrast to common multitask architectures, classification layers take the spatially detailed 3D feature map as input and return the single probability for the whole CT series. Finally, we provide a framework for reproducible comparison of various models (see the details below).

Reproducible research

A critical meta-review (Wynants et al., 2020) of machine learning models for COVID-19 diagnosis highlights low reliability and a high risk of biased results for all 27 reviewed papers, mostly due to a non-representative selection of control patients and poor analysis of results, including possible model overfitting. The authors use PROBAST (Prediction model Risk Of Bias Assessment Tool; Wolff et al., 2019), a systematic approach to validate the performance of machine learning-based approaches in medicine, and identify the following issues:
Poor patient structure of the validation set, including several studies where control studies were sampled from different populations.
Unreliable annotation protocol, where only one rater assessed each study without subsequent quality control, or where the model output influenced annotation.
Lack of comparison with other well-established methods for similar tasks.
Low reproducibility due to several factors, such as unclear model description and incorrect validation approaches (e.g., slice-level prediction rather than study-level prediction).
The authors conclude the paper with a call to share data and code to develop an established system for validating and comparing different models collaboratively. Though (Wynants et al., 2020) is an early review and does not include many of the properly peer-reviewed papers mentioned above, we agree that current COVID-19 algorithmic research lacks reproducibility. We aim to follow the best practices of reproducible research and address these issues in the following way:
We selected a fully independent test dataset and retrieved all COVID-19 positive and COVID-19 negative cases from the same population and the same healthcare system, see details in Section 3.5.
Two raters annotated the test data independently. If the raters’ contours were not aligned, the meta-rater requested annotation correction, see Section 3.5.
We carefully selected several key ideas from the related works and implemented them within the same setup as our method, see Section 2.
We publicly released the code to share technical details of the compared architectures.
Finally, we use solely open CT images for training and testing. We also annotate and release the lesion masks for the COVID-19 positive cases from the test set, see details in Section 3.5. Therefore, our experiments are reproducible, as they rely on open data.

Method

As discussed in Section 1, the method should solve two tasks: identification of COVID-19 cases and ranking them in descending order of severity. Therefore, we organize Section 2 as follows. In Section 2.1, we describe lung segmentation as a common preprocessing step for all methods. In Section 2.2, we tackle the severity quantification task: we describe methods that predict a segmentation mask of lesions caused by COVID-19 and provide a severity score based on it. In Section 2.3, we discuss two straightforward baselines for the identification task. The first is to use the segmentation results and identify patients with non-empty lesion masks as COVID-19 positive. The second is to use a separate neural network to classify patients as COVID-19 positive or negative. However, as we show in Section 5, these methods yield poor identification quality, especially due to false positive alarms in patients with bacterial pneumonia. In Section 2.4, we propose a multitask model that achieves better COVID-19 identification results than the baselines. In particular, as we show in Section 5, this model successfully distinguishes between COVID-19 and bacterial pneumonia cases. In Section 2.5, we introduce quality metrics for both the identification and severity quantification tasks to formalize the comparison of the methods.

Lungs segmentation

We segment the lungs in two steps. First, we predict a single binary mask for both lungs, including pathological findings, e.g., ground-glass opacity, consolidation, nodules and pleural effusion. Then we split the obtained mask into separate masks for the left and right lungs. Binary segmentation is performed via a fully-convolutional neural network in a standard fashion. Details of the architecture and training setup are given in Section 4.2. In the second step, voxels within the lungs are clustered using the k-means algorithm (k = 2) with Euclidean distance as the metric between voxels. Then we treat the resulting clusters as separate lungs.
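The second step can be illustrated with a minimal NumPy sketch. It assumes two clusters (one per lung), clusters the integer voxel coordinates with Euclidean distance, and uses a plain Lloyd iteration. The function name `split_lungs`, the initialization from the two extreme voxels, and the iteration cap are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def split_lungs(lung_mask, n_iter=20):
    """Split a binary mask of both lungs into two masks via 2-means on voxel coordinates."""
    coords = np.argwhere(lung_mask)              # (n_voxels, ndim) integer coordinates
    if len(coords) < 2:
        raise ValueError("need at least two lung voxels")
    pts = coords.astype(float)
    # Assumed initialization: the two voxels that are extreme along the axis of largest spread.
    axis = int(np.argmax(pts.var(axis=0)))
    centers = np.stack([pts[pts[:, axis].argmin()], pts[pts[:, axis].argmax()]])
    for _ in range(n_iter):
        # Euclidean distance from every voxel to both cluster centers.
        dist = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.stack([pts[labels == k].mean(axis=0) for k in range(2)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    masks = []
    for k in range(2):
        mask_k = np.zeros(lung_mask.shape, dtype=bool)
        mask_k[tuple(coords[labels == k].T)] = True
        masks.append(mask_k)
    return masks
```

The sketch returns the two clusters in no anatomically meaningful order; deciding which one is the left lung would require the image orientation, which is outside its scope.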

COVID-19 Severity quantification

To quantify COVID-19 severity, we solve a COVID-19-specific lesion segmentation task. Using the predicted lung and lesion masks, we calculate the lesion-to-lung volume ratio for each lung and use the maximum of the two ratios as the final severity score for triage, according to the recommendations discussed in Section 1.
Threshold-based. As a baseline for lesion segmentation, we choose a thresholding-based method. As pathological tissues are denser than healthy ones, the corresponding CT voxels have greater intensities in Hounsfield units. The method consists of three steps. In the first step, we apply thresholding: voxels within the lung mask whose intensity falls between a lower and an upper threshold are assigned to the positive class. In the second step, we apply a Gaussian blur to the resulting binary mask and reassign the positive class to voxels with values greater than 0.5. Finally, we remove 3D binary connected components whose volume is below a minimum volume. The hyperparameters of these steps are chosen via a grid search to maximize the average Dice score between the predicted and ground truth lesion masks for the series from the training dataset.
U-Net. The de facto standard approach for medical image segmentation is the U-Net model (Ronneberger et al., 2015). We trained two U-Net-based architectures for lung parenchyma involvement segmentation, which we refer to as 2D U-Net and 3D U-Net. 2D U-Net independently processes the axial slices of the input 3D series. 3D U-Net processes 3D sub-patches of a fixed size and then stacks the predictions for individual sub-patches to obtain a prediction for the whole input 3D series. Thus, we do not need to downsample the input image to fit GPU memory restrictions. For each model, we replace plain 2D and 3D convolutional layers with 2D and 3D residual convolutional blocks (He et al., 2016b), respectively. Both models were trained using the standard binary cross-entropy loss (see other details in Section 4.3).
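The severity score described at the start of this section can be sketched in a few lines of NumPy. The sketch assumes boolean masks on a common voxel grid, so that voxel counts can stand in for volumes (the voxel size cancels in the ratio). The function name `severity_score` and the choice to skip an empty lung mask are illustrative assumptions.

```python
import numpy as np

def severity_score(lesion_mask, left_lung_mask, right_lung_mask):
    """Maximum over the two lungs of the lesion-to-lung volume ratio."""
    lesion_mask = np.asarray(lesion_mask, dtype=bool)
    ratios = []
    for lung_mask in (left_lung_mask, right_lung_mask):
        lung_mask = np.asarray(lung_mask, dtype=bool)
        lung_volume = lung_mask.sum()            # voxel count as a proxy for volume
        if lung_volume == 0:
            continue                             # assumption: ignore a lung that was not segmented
        affected_volume = np.logical_and(lesion_mask, lung_mask).sum()
        ratios.append(affected_volume / lung_volume)
    return float(max(ratios)) if ratios else 0.0
```

Only lesion voxels that fall inside a lung mask are counted, so spurious predictions outside the lungs do not affect the score.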

COVID-19 Identification

We formalize the COVID-19 identification task as a binary classification of 3D CT series. CT series of patients with verified COVID-19 form the positive class. CT series of patients with other lung diseases, e.g., bacterial pneumonia, non-small cell lung cancer, etc., as well as of normal patients form the negative class.
Segmentation-based. One possible approach is to base the decision rule on the segmentation results: classify a series as positive if the segmentation-based severity score exceeds some threshold. We show that this leads to a trade-off between severity quantification and identification quality: models which yield the best ranking results perform worse in terms of classification, and vice versa. Moreover, although some segmentation-based methods accurately classify COVID-19 positive and normal cases, all of them yield a significant number of false positives in patients with bacterial pneumonia (see Section 5.1).
ResNet-50. Another approach is to tackle the classification task separately from segmentation and explicitly predict the probability that a given series is COVID-19 positive. The advantage of this strategy is that we only need weak labels for model training, which are much more readily available than ground truth segmentations. To assess the performance of this approach, we follow (Li et al., 2020a; Bai et al., 2020) and train a ResNet-50 (He et al., 2016b), which takes a series of axial slices as input and independently extracts a feature vector for each slice. After that, the feature vectors are combined via a pyramid max-pooling operation (He et al., 2014) along all the slices. The resulting vector is passed into two fully connected layers followed by a sigmoid activation, which predicts the final COVID-19 probability for the whole series. In our paper, we denote this architecture as ResNet-50 (see other details in Section 4.4).
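The aggregation of per-slice feature vectors can be illustrated with a minimal NumPy sketch of pyramid max-pooling along the slice axis. It covers only the pooling step, not the backbone or the fully connected layers. The pyramid levels `(1, 2, 4)` and the function name `pyramid_max_pool` are assumptions made for illustration; the paper's actual configuration is not given in this section.

```python
import numpy as np

def pyramid_max_pool(slice_features, levels=(1, 2, 4)):
    """Pool an (n_slices, n_features) array into a fixed-length vector.

    For every pyramid level the slice axis is cut into `n_bins` contiguous bins,
    each bin is max-pooled, and all pooled vectors are concatenated.
    """
    feats = np.asarray(slice_features, dtype=float)
    n_slices = feats.shape[0]
    pooled = []
    for n_bins in levels:
        edges = np.linspace(0, n_slices, n_bins + 1)
        for b in range(n_bins):
            start = int(np.floor(edges[b]))
            stop = max(int(np.ceil(edges[b + 1])), start + 1)   # keep every bin non-empty
            pooled.append(feats[start:stop].max(axis=0))
    return np.concatenate(pooled)
```

The output length is `n_features * sum(levels)` regardless of the number of slices, which is what allows fully connected layers to follow series of varying length.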

Multitask

The baselines for the identification task described in Section 2.3 do not perform well, as we show in Section 5. Therefore, we propose to solve the identification task simultaneously with the segmentation task via a single two-headed convolutional neural network. The segmentation part of the architecture is a slice-wise 2D U-Net model. As before, its output is used to evaluate the severity score. The classification head shares a common intermediate feature map (per slice) with the segmentation part. These feature maps are stacked and aggregated into a feature vector via a pyramid pooling layer (He et al., 2014). Finally, two fully connected layers followed by a sigmoid activation transform the feature vector into the COVID-19 probability. Following (Amine et al., 2020), the shared feature maps can be the outputs of the U-Net’s encoder, which have no explicit spatial structure in the axial plane. We refer to this approach as Multitask-Latent. In contrast, we argue that the identification task is connected to the segmentation task and that the classification model can benefit from the spatial structure of the input features. Therefore, we propose to share the feature map from the very end of the U-Net architecture, as shown in Fig. 3. We refer to the resulting architecture as Multitask-Spatial-1. More generally, shared feature maps can be taken from the l-th upper level of the U-Net’s decoder. Together they form a 3D spatial feature map, which is aligned with the input 3D series downsampled in the axial plane by a factor that depends on l. We denote this approach as Multitask-Spatial-l. Since the 2D U-Net architecture has 7 levels, l can vary from 1 to 7.
Fig. 3

Schematic representation of the Multitask-Spatial-1 model. Identification score is the probability of being a COVID-19 positive series; Severity score is calculated using predicted lesions’ mask and precomputed lungs’ masks.

As a loss function, we optimize a weighted combination of binary cross-entropies for segmentation and classification (see other details in Section 4).
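A weighted combination of the two binary cross-entropy terms can be written down as a small NumPy sketch. The weight `cls_weight=0.1`, the function names, and the averaging over voxels and over series are hypothetical choices for illustration; the actual weighting and training setup are described in Section 4 and are not reproduced here.

```python
import numpy as np

def binary_cross_entropy(prob, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)   # avoid log(0)
    target = np.asarray(target, dtype=float)
    losses = -(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
    return float(np.mean(losses))

def multitask_loss(seg_prob, seg_target, cls_prob, cls_target, cls_weight=0.1):
    """Segmentation BCE plus a weighted classification BCE (the weight is a hypothetical value)."""
    seg_loss = binary_cross_entropy(seg_prob, seg_target)   # per-voxel lesion mask term
    cls_loss = binary_cross_entropy(cls_prob, cls_target)   # per-series COVID-19 label term
    return seg_loss + cls_weight * cls_loss
```

Setting `cls_weight` to zero reduces the sketch to plain segmentation training, which makes the role of the weight easy to check in isolation.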

Metrics

To assess the quality of classification of patients into positive, i.e., infected by COVID-19, and negative, i.e., with other lung pathologies or normal, we use areas under the ROC curves (ROC AUC) calculated on several subsamples of the test sample described in Section 3.5. The first subsample contains only COVID-19 positive and healthy subjects, while studies with other pathological findings are excluded (ROC AUC COVID-19 vs. Normal). The second subsample contains only patients infected by COVID-19 or bacterial pneumonia (ROC AUC COVID-19 vs. Bac. Pneum.). The third subsample contains COVID-19 positive patients and patients with lung nodules typical for non-small cell lung cancer (ROC AUC COVID-19 vs. Nodules). The last ROC AUC is calculated on the whole test sample (ROC AUC COVID-19 vs. All others). ROC curves are obtained by thresholding the predicted probabilities for ResNet-50 and the multitask models, and by thresholding the predicted severity score for segmentation-based methods.
We evaluate the quality of ranking studies in order of descending COVID-19 severity on the test subsample that contains only COVID-19 positive patients. As a quality metric, we use Spearman’s rank correlation coefficient (Spearman’s $\rho$) between the severity scores $y_{true}$ calculated for ground truth segmentations and the predicted severity scores $y_{pred}$. It is defined as
$$\rho = \frac{\operatorname{cov}\big(\operatorname{rg}(y_{true}), \operatorname{rg}(y_{pred})\big)}{\sigma\big(\operatorname{rg}(y_{true})\big) \cdot \sigma\big(\operatorname{rg}(y_{pred})\big)},$$
where $\operatorname{cov}(\cdot, \cdot)$ is a sample covariance, $\sigma(\cdot)$ is a sample standard deviation and $\operatorname{rg}(y)$ is the vector of ranks, i.e., the resulting indices of the elements of $y$ after sorting them in descending order.
To evaluate the COVID-19 lesion segmentation quality, we use the Dice score coefficient between the predicted and the ground truth segmentation masks. Similar to Spearman’s $\rho$, we evaluate the mean Dice score only for COVID-19 positive cases.
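Both ranking metrics can be sketched in plain Python without external libraries. Spearman's coefficient is computed as the Pearson correlation of ranks (rank 1 for the largest value, tied values sharing their average rank), and ROC AUC is computed through its pairwise interpretation: the fraction of positive/negative pairs in which the positive study gets the higher score, with ties counted as one half. The function names and the tie handling are assumptions of this sketch, not details from the paper.

```python
def average_ranks(values):
    """Ranks in descending order (1 = largest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(y_true, y_pred):
    """Pearson correlation of the rank vectors (undefined if either input is constant)."""
    rx, ry = average_ranks(y_true), average_ranks(y_pred)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    # The 1 / (n - 1) normalization of covariance and variances cancels in the ratio.
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def roc_auc(scores_pos, scores_neg):
    """Probability that a positive study is ranked above a negative one (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(scores_pos) * len(scores_neg))
```

Restricting `scores_neg` to one control group at a time (healthy, bacterial pneumonia, or nodules) gives the per-subsample ROC AUC values described above.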

Data

We use several public datasets in our experiments: NSCLC-Radiomics and LUNA16 to create a robust lung segmentation model; Mosmed-1110, MedSeg-29 and NSCLC-Radiomics to train and validate all triage models; and Mosmed-Test as a hold-out test set for the final evaluation of all models.

Mosmed-1110

Mosmed-1110 contains 1110 CT scans collected from 1st of March, 2020 to 25th of April, 2020 at outpatient computed tomography centers in Moscow, Russia (Morozov et al., 2020a). Scans were performed on Canon (Toshiba) Aquilion 64 units with standard scanner protocols and, in particular, a 0.8 mm inter-slice distance. However, the public version of the dataset contains only every 10th slice of the original study, so the effective inter-slice distance is 8 mm. The quantification of COVID-19 severity in CT was performed with the visual semi-quantitative scale adopted in the Russian Federation and Moscow in particular (Morozov et al., 2020c). According to this grading, the dataset contains 254 images without COVID-19 symptoms. The rest are split into 4 categories: CT1 (affected lung percentage 25% or below, 684 images), CT2 (from 25% to 50%, 125 images), CT3 (from 50% to 75%, 45 images) and CT4 (75% and above, 2 images). Radiologists performed an initial reading of CT scans in clinics, after which experts from the advisory department of the Center for Diagnostics and Telemedicine (CDT) independently conducted the second reading as part of a total audit targeting all CT studies with suspected COVID-19. Additionally, 50 CT scans were annotated with binary masks depicting regions of interest (ground-glass opacity and consolidation).

MedSeg-29

The MedSeg web-site shares 2 publicly available datasets of annotated volumetric CT images. The first dataset consists of 9 volumetric CT scans from a web-site that were converted from JPG to NIfTI format. The annotations of this dataset include lung masks and COVID-19 masks segmented by a radiologist. The second dataset consists of 20 volumetric CT scans shared by Jun et al. (2020). The left and right lungs and the infections were labeled by two radiologists and verified by an experienced radiologist.

NSCLC-Radiomics

The NSCLC-Radiomics dataset (Kiser et al., 2020; Aerts et al., 2015) represents a subset of The Cancer Imaging Archive NSCLC Radiomics collection (Clark et al., 2013). It contains left and right lung segmentations annotated on 3D thoracic CT series of 402 patients with diseased lungs. Pathologies (lung cancerous nodules, atelectasis and pleural effusion) are included in the lung volume masks. Pleural effusion and cancerous nodules are also delineated separately, when present. Automatic approaches for lung segmentation often perform inconsistently for patients with diseased lungs, although this is usually the main case of interest. Thus, we use NSCLC-Radiomics to create a lung segmentation algorithm that is robust to pathological cases. Other pathologies that are not represented in NSCLC-Radiomics, e.g. pneumothorax, could also lead to poor lung segmentation performance. However, such pathologies are extremely rare among COVID-19 cases, as reported for pneumothorax by Zantah et al. (2020). Therefore, we ignore the possible presence of other pathologies while training and evaluating our algorithm.

LUNA16

LUNA16 (Jacobs et al., 2016) is a public dataset for cancerous lung nodules segmentation. It includes 888 annotated 3D thoracic CT scans from the LIDC/IDRI database (Armato III et al., 2011). Scans widely differ by scanner manufacturers (17 scanner models), slice thicknesses (from 0.6 to 5.0 mm), in-plane pixel resolution (from 0.461 to 0.977 mm), and other parameters. Annotations for every image contain binary masks for the left and right lungs, the trachea and main stem bronchi, and the cancerous nodules. The lung and trachea masks were originally obtained using an automatic algorithm (van Rikxoort et al., 2009) and the lung nodules were annotated by 4 radiologists (Armato III et al., 2011). We also exclude 7 cases with absent or completely broken lung masks and extremely noisy scans.

Mosmed-Test

We ensure the following properties of the test dataset: all cases are full CT series without missing slices or missing metadata fields (e.g., information about original Hounsfield units); data for all classes come from the same population and the same healthcare system to avoid domain shifts within the test data.

COVID-19 positive. This is a subsample of Mosmed-20, 42 CT studies collected from 20 patients in an infectious diseases hospital during the second half of February 2020, at the beginning of the Russian outbreak. We remove 5 cases with artifacts related to patients' movements while scanning. The remaining 37 series were independently assessed by two raters (radiologists with 2 and 5 years of experience), who annotated regions of interest (ground-glass opacities and consolidation) in each series using the MedSeg annotation tool. During the annotation process, 5 of the 37 images were found to have no radiomic signs of COVID-19, so we remove these images from the list of COVID-19 positives. Then, we iteratively verify the annotations based on two factors: the Dice Score between the two raters, and large connected components of the mask missed by one of the raters. The discrepancies between the two raters were analyzed until consensus was reached, as measured by the Dice Score over the 32 COVID-19 infected cases. We publicly release the final version of the COVID-19 positive dataset, including both images and annotated lesion masks. Note that Mosmed-20 was collected at inpatient clinics, whereas Mosmed-1110 is a subset of the Moscow outpatient clinics database created two to six weeks later, which guarantees that studies are not duplicated.

Bacterial pneumonia. We use 30 randomly selected cases from a dataset (Korb et al., 2021) of 75 chest CT studies with radiological signs of community-acquired bacterial pneumonia acquired in 2019.

Lung nodules. We use a subset of MoscowRadiology-CTLungCa-500, a public dataset containing 500 chest CT scans randomly selected from patients over 50 years of age. We randomly selected 30 cases among those with radiologically verified lung nodules.

Normal controls. The dataset of healthy patients consists of two parts: the 5 Mosmed-20 cases mentioned above without radiomic signs of COVID-19, and 26 cases from MoscowRadiology-CTLungCa-500 without lung nodules larger than 5 mm or other lung pathologies.
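The case counts above can be checked with a few lines of arithmetic, which also give the sizes of the four evaluation subsamples used for the ROC AUC metrics. This is only a bookkeeping sketch: the dictionary keys and the helper `subsample_sizes` are hypothetical names, while the numbers are taken from the dataset description.

```python
# Counts taken from the Mosmed-Test description.
collected = 42
after_motion_removal = collected - 5        # 37 series annotated by two raters
covid_positive = after_motion_removal - 5   # 32 series with COVID-19 lesions

groups = {
    "covid": covid_positive,
    "bac_pneum": 30,
    "nodules": 30,
    "normal": 5 + 26,  # Mosmed-20 negatives + MoscowRadiology-CTLungCa-500 cases
}


def subsample_sizes(groups):
    """Size of each evaluation subsample: the positives plus the chosen negatives."""
    positives = groups["covid"]
    sizes = {
        f"covid_vs_{name}": positives + count
        for name, count in groups.items()
        if name != "covid"
    }
    sizes["covid_vs_all_others"] = sum(groups.values())
    return sizes


sizes = subsample_sizes(groups)
```

The whole test set contains 32 + 30 + 30 + 31 = 123 studies, which matches the number reported in the abstract.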

Experiments

We design our experiments to objectively compare all the triage models described in Section 2. As shown in Tab. 2, all the methods are trained on the same datasets and evaluated using the mean values and standard deviations of the same quality metrics, defined in Section 2.5, on the same hold-out test dataset described in Section 3.5. We believe that the experimental design for training the triage neural networks described in Sections 4.3 and 4.4 excludes overfitting. All computational experiments were conducted on the Zhores supercomputer (Zacharov et al., 2019).
Table 2

Training, validation and test data splits for all triage models. For each method, the cells under the training datasets give the optimized training objectives, and the cells under the Mosmed-Test dataset give the metrics calculated on the corresponding test subset. Remarks. 1. "pos. with mask" and "pos." mean COVID-19 positive cases with and without lesion masks, respectively, and "neg." means COVID-19 negative cases. 2. DSC means Dice Score. 3. AUCs means ROC AUC COVID-19 vs. All, vs. Normal, vs. Bac. Pneum. and vs. Nodules. 4. Seg. BCE and Class. BCE mean segmentation and classification Binary Cross-Entropy, respectively. 5. ρ means Spearman's ρ. 6. Multitask-Latent, Multitask-Spatial-4, Multitask-Spatial-1.

The first five data columns are training and validation datasets; the last four are subsets of Mosmed-Test.

| | Mosmed-1110 | Mosmed-1110 | Mosmed-1110 | MedSeg-29 | NSCLC-Radiomics | COVID-19 pos. | Bac. Pneum. | Nodules | Normal |
|---|---|---|---|---|---|---|---|---|---|
| Ground truth (1) | pos. with mask | pos. | neg. | pos. with mask | neg. | pos. with mask | neg. | neg. | neg. |
| Num. of images | 50 | 806 | 254 | 29 | 402 | 32 | 30 | 30 | 31 |
| Thresholding | DSC (2) | - | - | DSC | - | AUCs (3), ρ, DSC | AUCs | AUCs | AUCs |
| 2D U-Net, 3D U-Net | Seg. BCE (4) | - | - | Seg. BCE | - | AUCs, ρ, DSC | AUCs | AUCs | AUCs |
| 2D U-Net+ | Seg. BCE | - | - | Seg. BCE | Seg. BCE | AUCs, ρ (5), DSC | AUCs | AUCs | AUCs |
| ResNet-50 | - | Class. BCE (4) | Class. BCE | - | Class. BCE | AUCs | AUCs | AUCs | AUCs |
| Multitask models (6) | Seg. BCE | Class. BCE | Class. BCE | Seg. BCE | Class. BCE | AUCs, ρ, DSC | AUCs | AUCs | AUCs |

Preprocessing

In all our experiments we use the same preprocessing applied separately to each axial slice: rescaling to a fixed pixel spacing and intensity normalization to a fixed range. In our COVID-19 identification and segmentation experiments we crop the input series to the bounding box of the lungs' mask predicted by our lung segmentation network. We further show (Section 5) that this preprocessing is sufficient for all the models. Despite the diversity of the training dataset, all the models successfully adapt to the test dataset.

For the lung segmentation task we choose a basic U-Net (Ronneberger et al., 2015) architecture with 2D convolutional layers, applied individually to each axial slice of an incoming series. The model was trained on the NSCLC-Radiomics and LUNA16 datasets for 16k batches of size 30. We use the Adam (Kingma and Ba, 2014) optimizer with default parameters and an initial learning rate of 0.001, which was decreased to 0.0001 after 8k batches. We assess the model's performance using 3-fold cross-validation and additionally using the MedSeg-29 dataset as a hold-out set. We compute the cross-validation Dice Score both on LUNA16 and NSCLC-Radiomics together and on NSCLC-Radiomics alone; the latter result confirms that our model is robust to cases with pleural effusion. The Dice Score on the MedSeg-29 dataset shows the robustness of our model to COVID-19 cases.
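The preprocessing steps and the learning-rate schedule described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the helper names are hypothetical, the intensity window passed to `normalize_intensity` is left to the caller, and the window used in the toy example (-1000 to 0) is an arbitrary placeholder rather than a value from the paper.

```python
import numpy as np


def normalize_intensity(volume, lo, hi):
    """Clip intensities to [lo, hi] and scale them linearly to [0, 1]."""
    clipped = np.clip(np.asarray(volume, dtype=np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)


def crop_to_mask(volume, mask):
    """Crop a volume to the tight bounding box of a non-empty binary mask."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("empty mask")
    coords = np.argwhere(mask)
    start = coords.min(axis=0)
    stop = coords.max(axis=0) + 1
    box = tuple(slice(int(a), int(b)) for a, b in zip(start, stop))
    return np.asarray(volume)[box]


def learning_rate(batch_index):
    """Step schedule from the text: 0.001 for the first 8k batches, then 0.0001."""
    return 1e-3 if batch_index < 8000 else 1e-4


# Toy example with a placeholder intensity window (hypothetical values).
toy_volume = np.arange(4 * 5 * 6).reshape(4, 5, 6)
toy_mask = np.zeros((4, 5, 6), dtype=bool)
toy_mask[1:3, 2:4, 0:5] = True
toy_crop = crop_to_mask(toy_volume, toy_mask)
toy_norm = normalize_intensity([-2000, -1000, -500, 0, 500], lo=-1000, hi=0)
```

Cropping to the lungs' bounding box reduces the input size and removes irrelevant context such as the scanner table, which is presumably why it precedes both identification and segmentation.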

Lesions segmentation

We use all 79 available images of COVID-19 positive patients with annotated lesion masks (50 images from Mosmed-1110 and 29 images from MedSeg-29) to train the threshold-based, 2D U-Net and 3D U-Net models. Additionally, we train the 2D U-Net architecture on the same 79 cases along with 402 images from the NSCLC-Radiomics dataset. These 402 images were acquired long before the COVID-19 pandemic, therefore we assume that the ground truth segmentations for them are zero masks. While training this model we resample series such that batches contain approximately equal numbers of COVID-19 positive and negative cases. We refer to this model as 2D U-Net+. 2D U-Net and 2D U-Net+ were trained for 15k batches using the Adam (Kingma and Ba, 2014) optimizer with default parameters and an initial learning rate of 0.0003. Each batch contains 5 series of axial slices. 3D U-Net was optimized via plain stochastic gradient descent for 10k batches using a learning rate of 0.01. Each batch consists of 16 3D patches. To estimate the mean values and standard deviations of the models' quality metrics defined in Section 2.5, each segmentation network was trained 3 times with different random seeds. The resulting networks were evaluated on the hold-out test dataset described in Section 3.5.
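The resampling used for 2D U-Net+ can be illustrated with a simple sampler that draws each batch element from the positive or the negative pool with equal probability. This is one possible way to obtain approximately balanced batches and is an assumption, since the text does not specify the exact scheme; the function name and the series identifiers are hypothetical.

```python
import random


def balanced_batches(positive_ids, negative_ids, batch_size, n_batches, seed=0):
    """Yield batches whose elements come from either pool with probability 0.5.

    Classes are therefore balanced on average, regardless of pool sizes.
    """
    rng = random.Random(seed)
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            pool = positive_ids if rng.random() < 0.5 else negative_ids
            batch.append(rng.choice(pool))
        yield batch


# 79 COVID-19 series with masks, 402 NSCLC-Radiomics series, 5 series per batch.
positives = [f"covid_{i}" for i in range(79)]
negatives = [f"nsclc_{i}" for i in range(402)]
batches = list(balanced_batches(positives, negatives, batch_size=5, n_batches=2000))
positive_share = sum(
    series.startswith("covid_") for batch in batches for series in batch
) / (5 * 2000)
```

Without such resampling, uniform sampling over the 481 series would make roughly five out of six batch elements negative.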

ResNet-50 and multitask models

The remaining 806 positive images without ground truth segmentations, the 254 negative images from Mosmed-1110 and the 402 negative images from NSCLC-Radiomics were split 5 times in a stratified manner into a training set and a validation set. Each of the 5 validation sets contains 30 random images. For each split we train ResNet-50 and the classification heads of the Multitask-Latent, Multitask-Spatial-1 and Multitask-Spatial-4 models on the corresponding training set, while the segmentation heads of the multitask models are trained on the same 79 images as 2D U-Net (see Section 4.3). For each network, at each training epoch we evaluate the ROC AUC between the predicted COVID-19 probabilities and the ground truth labels on the validation set. We save the network weights that result in the highest validation ROC AUC during training. For all the multitask models as well as for ResNet-50, the top validation ROC AUCs exceeded 0.9 for all splits. We train all networks for 30k batches using the Adam (Kingma and Ba, 2014) optimizer with default parameters and an initial learning rate that is reduced after 24k batches. Each batch contains 5 series of axial slices. While training the multitask models we resample examples such that batches contain approximately equal numbers of examples used to penalize the classification head and the segmentation head. However, we multiply the loss for the classification head by 0.1, because this resulted in better validation ROC AUCs. For each of the 5 splits, we evaluate each trained network on the hold-out test dataset described in Section 3.5. We report the resulting mean values and standard deviations of the quality metrics in Section 5.
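The weighting of the two objectives can be sketched as below. The snippet assumes that each training example carries either a lesion mask or a class label, computes the corresponding binary cross-entropy, and scales the classification term by the factor 0.1 reported above. The function names and the plain-Python formulation are illustrative assumptions, not the released training code.

```python
import math

CLASS_WEIGHT = 0.1  # factor applied to the classification loss, as stated in the text


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def multitask_loss(seg_probs=None, seg_targets=None, cls_prob=None, cls_target=None):
    """Sum the losses for whichever labels the example carries."""
    loss = 0.0
    if seg_targets is not None:
        # Voxel-wise segmentation loss over the flattened mask.
        loss += bce(seg_probs, seg_targets)
    if cls_target is not None:
        # Series-level classification loss, down-weighted.
        loss += CLASS_WEIGHT * bce([cls_prob], [cls_target])
    return loss


seg_only = multitask_loss(seg_probs=[0.5, 0.5], seg_targets=[1, 0])
cls_only = multitask_loss(cls_prob=0.5, cls_target=1)
```

With uninformative predictions of 0.5, the segmentation example contributes ln 2 to the loss while the classification example contributes only 0.1 ln 2.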

Results

In this section we report and discuss the results of the experiments described in Section 4. In Tab. 3 we compare all the methods described in Section 2 using quality metrics defined in Section 2.5 and evaluated on the test dataset described in Section 3.5.
Table 3

Quantitative comparison of all the methods discussed in Section 2. A trade-off between the quality of COVID-19 identification and the quality of ranking by severity is observed for segmentation-based methods. The proposed Multitask-Spatial-1 model yields the best identification results. Results are given as mean ± std.

| Method | ROC AUC vs. All others | ROC AUC vs. Normal | ROC AUC vs. Bac. Pneum. | ROC AUC vs. Nodules | Spearman's ρ | Dice Score |
|---|---|---|---|---|---|---|
| Thresholding | 0.51 ± 0.00 | 0.68 ± 0.00 | 0.46 ± 0.00 | 0.45 ± 0.00 | 0.92 ± 0.00 | 0.42 ± 0.00 |
| 3D U-Net | 0.76 ± 0.02 | 0.89 ± 0.02 | 0.59 ± 0.01 | 0.79 ± 0.03 | 0.97 ± 0.01 | 0.65 ± 0.00 |
| 2D U-Net | 0.78 ± 0.01 | 0.93 ± 0.01 | 0.62 ± 0.01 | 0.79 ± 0.00 | 0.97 ± 0.00 | 0.63 ± 0.00 |
| 2D U-Net+ | 0.86 ± 0.01 | 0.98 ± 0.01 | 0.68 ± 0.02 | 0.91 ± 0.01 | 0.80 ± 0.03 | 0.59 ± 0.01 |
| ResNet-50 | 0.62 ± 0.19 | 0.67 ± 0.21 | 0.55 ± 0.13 | 0.65 ± 0.22 | N/A | N/A |
| Multitask-Latent | 0.79 ± 0.06 | 0.84 ± 0.05 | 0.73 ± 0.06 | 0.80 ± 0.07 | 0.97 ± 0.00 | 0.61 ± 0.02 |
| Multitask-Spatial-4 | 0.89 ± 0.03 | 0.94 ± 0.03 | 0.83 ± 0.05 | 0.91 ± 0.03 | 0.98 ± 0.00 | 0.61 ± 0.02 |
| Multitask-Spatial-1 | 0.93 ± 0.01 | 0.97 ± 0.01 | 0.87 ± 0.01 | 0.93 ± 0.00 | 0.97 ± 0.01 | 0.61 ± 0.02 |

Segmentation-based methods

In this subsection we discuss the performance of four methods: the threshold-based baseline, 3D U-Net, 2D U-Net and 2D U-Net+. We expected two major weaknesses of the threshold-based method: False Positive (FP) predictions on the vessels and bronchi, and an inability to distinguish COVID-19 related lesions from other pathological findings. Both are evident from the extremely low ROC AUC scores (Tab. 3). One can also notice massive FP predictions even in healthy cases (Fig. 4, column B). However, the method often provides a reasonable segmentation of the lesion area (Fig. 4, column A).
Fig. 4

Examples of axial CT slices from the test dataset along with ground truth annotations (first row) and predicted masks (second row) of COVID-19-specific lesions. Column A: COVID-19 positive case; Column B: normal case; Column C: case with bacterial pneumonia. Lesions’ masks are represented by the contours of their borders for clarity.

Neural networks considerably outperform the threshold-based baseline in terms of every quality metric. We observe neither a quantitative (Tab. 3) nor a qualitative (Fig. 4) significant difference between the performance of 2D U-Net and 3D U-Net. They yield accurate severity scores within the COVID-19 positive population (Spearman's ρ of 0.97, Tab. 3). However, severity scores quantified for the whole test dataset do not allow one to accurately distinguish between COVID-19 positive cases and cases with other pathological findings (ROC AUC COVID-19 vs. Bac. Pneum. of 0.59 to 0.62 and ROC AUC COVID-19 vs. Nodules of 0.79, Tab. 3) due to FP segmentations (Fig. 4, columns B and C). As one could expect, training on images with non-small cell lung cancer tumors from the NSCLC-Radiomics dataset improves ROC AUC vs. Nodules (0.91 for 2D U-Net+ compared to 0.79 for 2D U-Net). Interestingly, in this experiment we observe a degradation in Spearman's ρ for the ranking of COVID-19 positive cases (0.80 for 2D U-Net+ compared to 0.97 for 2D U-Net). We conclude that one should account for this trade-off and use an appropriate training setup depending on the task. All the segmentation-based models perform poorly at classification into COVID-19 and bacterial pneumonia (ROC AUC COVID-19 vs. Bac. Pneum. of at most 0.68, Tab. 3). This motivates the discussion of the other methods.

ResNet-50

Although the validation ROC AUCs of all the trained ResNet-50 networks exceed 0.9, their performance on the test dataset is extremely unstable: ROC AUC COVID-19 vs. All varies from 0.43 to 0.85 (see also the high standard deviations for all tasks in Tab. 3).

Multitask models

In this subsection we discuss the performance of the Multitask-Latent, Multitask-Spatial-4 and the proposed Multitask-Spatial-1 models on the identification, segmentation and severity quantification tasks in comparison to each other, to ResNet-50 and to the segmentation-based methods. As seen from the mean values and standard deviations of the ROC AUC scores in Tab. 3, the Multitask-Latent model yields better and more stable identification results than ResNet-50. Both of these models classify the latent representations of the input images. We show that sharing these features with the segmentation head, i.e. the decoder of the U-Net architecture, improves the classification quality. Moreover, one can see in Tab. 3 that this effect is enhanced by sharing the spatial feature maps from the upper levels of the U-Net's decoder. The proposed Multitask-Spatial-1 architecture (see Fig. 3), with shallow segmentation and classification heads directly sharing the same spatial feature map, shows the top classification results. In particular, it most accurately distinguishes between COVID-19 and other lung diseases (ROC AUC COVID-19 vs. Bac. Pneum. of 0.87 and ROC AUC COVID-19 vs. Nodules of 0.93, Tab. 3). As seen in Tab. 3 and Fig. 4, there is no significant difference in segmentation and severity quantification quality between the multitask models and the neural networks trained for the single segmentation task. Therefore, the single proposed Multitask-Spatial-1 model can be applied to both triage problems: identification of COVID-19 patients followed by their ranking according to severity. In Fig. 5 we visualize these two steps of the triage pipeline for the test dataset described in Section 3.5. One can see several false positive alarms in cases with non-COVID-19 pathological findings. We discuss possible ways to resolve them in Section 6. The overall triage pipeline, including preprocessing, lung segmentation and multitask inference, takes 8 s and 20 s on NVIDIA V100 and GTX 980 GPUs, respectively.
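The two triage steps, identification followed by ranking, can be sketched as a short function. The sketch assumes that each study already has a predicted COVID-19 probability and a predicted severity score; the function name, the tuple layout, the decision threshold of 0.5 and the toy values are hypothetical and are not taken from the paper.

```python
def triage(studies, threshold=0.5):
    """Flag studies as COVID-19 positive and order them by descending severity.

    studies: list of (study_id, covid_probability, severity_score) tuples.
    Returns the ids of the flagged studies, most severe first.
    """
    flagged = [s for s in studies if s[1] >= threshold]
    flagged.sort(key=lambda s: s[2], reverse=True)
    return [s[0] for s in flagged]


# Toy worklist (made-up values for illustration only).
toy_studies = [
    ("a", 0.95, 0.10),
    ("b", 0.20, 0.40),
    ("c", 0.80, 0.55),
    ("d", 0.60, 0.02),
]
toy_queue = triage(toy_studies)
```

Study "b" is not flagged despite its high predicted lesion fraction, which illustrates why a severity score alone is not sufficient for identification.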
Fig. 5

COVID-19 triage: identification of COVID-19 positive patients (left) and ranking them in descending order of severity (right) via the proposed single Multitask-Spatial-1 model. In the right plot, bars correspond to the ranked studies. Absolute values of the predicted affected lung fractions are represented by the bars' lengths. The bars' colors denote ground truth labels.

Discussion

We have highlighted two important scores, COVID-19 Identification and Severity, and discussed their priorities in different clinical scenarios. We have shown that these two scores are not well aligned. Existing methods operate with either Identification or Severity and demonstrate deteriorated performance on the other task. We have presented a new method for the joint estimation of the COVID-19 Identification and Severity scores and showed that the proposed multitask architecture achieves top quality metrics for both tasks simultaneously. Finally, we have released the code and used public data for training, so our results are fully reproducible.

Besides classification between COVID-19 and healthy patients, we evaluate classification between COVID-19 and other lung abnormalities: bacterial pneumonia and cancerous nodules. As shown in Fig. 5, our method yields false positive alarms, mainly in patients with bacterial pneumonia. However, we find this result promising, given that we do not use any explicit training dataset with bacterial pneumonia patients. The proposed multitask model can be trained with the addition of bacterial and/or viral (non-COVID-19) pneumonia cases, which can partially reduce the classification error. However, there is also an irreducible classification error in cases where radiomic features do not allow one to distinguish between COVID-19 and non-COVID-19 pneumonia. Fortunately, in practice, the use of an automated triage system always implies a second reading, so the model's false positives are assumed to be resolved by a radiologist, while the most controversial cases can be resolved by RT-PCR testing. Thus, we conclude that the identification part of our triage system may be used as a highly sensitive first reading tool.

The role of the Severity Quantification part is more straightforward. As we mentioned in Section 1, radiologists perform the severity classification into groups from CT0 (no COVID-19 related lesions) and CT1 (up to 25% of lungs affected) to CT4 (more than 75%) in a visual semi-quantitative fashion. We believe that such estimation may be highly subjective and may contain severe discrepancies. To validate this assumption, we additionally analyzed Mosmed-1110, which includes not only 50 segmentation masks but also 1110 multiclass labels CT0-CT4. Within our experiments, we binarized these labels and thus effectively removed information about COVID-19 severity. We examined the mask predictions for the remaining 1050 cases, excluding healthy patients (CT0 group), and grouped the predictions by these weak labels, as shown in Fig. 6. An expert radiologist analyzed the most extreme mismatches visualized in Fig. 6 and confirmed the correctness of our model's predictions. As we see, the severity of many studies was highly underestimated during the visual semi-quantitative analysis. This result suggests that deep-learning-based medical image analysis algorithms, including the proposed method, can serve as intelligent assistants to radiologists for the fast and reliable estimation of time-consuming biomarkers such as COVID-19 severity.
Fig. 6

The comparison of visual subjective estimation and automatic segmentation for weakly annotated cases from the Mosmed-1110 dataset. Each distribution corresponds to a set of cases with the same Severity group according to the radiologist’s subjective judgment. The left y-axis shows the automatically estimated Severity by our method; the right one denotes expected Severity ranges that are [0; 25) for CT-1, [25; 50) for CT-2, [50; 75) for CT-3, [75; 100] for CT-4. The colored arrows denote the correspondence between some visually underestimated cases and their representative axial slices. Note the inconsistency of manual estimation.
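The expected Severity ranges listed in the caption can be written as a small lookup that also flags cases where the visual grade is lower than the group implied by the automatically estimated percentage. This is a sketch for illustration only: the function names are hypothetical and the boundaries follow the half-open intervals given in the caption.

```python
GROUPS = ["CT-1", "CT-2", "CT-3", "CT-4"]


def severity_group(affected_percent):
    """Map an affected-lung percentage to the expected group from the caption.

    [0; 25) -> CT-1, [25; 50) -> CT-2, [50; 75) -> CT-3, [75; 100] -> CT-4.
    """
    if not 0 <= affected_percent <= 100:
        raise ValueError("percentage must be within [0, 100]")
    if affected_percent < 25:
        return "CT-1"
    if affected_percent < 50:
        return "CT-2"
    if affected_percent < 75:
        return "CT-3"
    return "CT-4"


def is_underestimated(visual_group, affected_percent):
    """True if the visual grade is lower than the automatically estimated one."""
    estimated = severity_group(affected_percent)
    return GROUPS.index(visual_group) < GROUPS.index(estimated)
```

For example, a study graded CT-1 visually but with an estimated 40% of the lungs affected would be flagged as underestimated.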


CRediT authorship contribution statement

Mikhail Goncharov: Writing - original draft, Formal analysis, Investigation. Maxim Pisov: Visualization, Writing - original draft, Investigation. Alexey Shevtsov: Investigation, Data curation, Software. Boris Shirokikh: Writing - original draft, Formal analysis, Investigation. Anvar Kurmukov: Software, Writing - review & editing. Ivan Blokhin: Writing - review & editing, Data curation. Valeria Chernina: Writing - original draft. Alexander Solovev: Data curation. Victor Gombolevskiy: Conceptualization, Validation. Sergey Morozov: Supervision. Mikhail Belyaev: Conceptualization, Methodology, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: M. Belyaev is a founder and CEO of IRA Labs ltd, a medical image processing company. The company didn’t support the study which was conducted on open-sourced datasets; the paper code is also public.

References (10 of 40 shown)

1.  Automatic lung segmentation from thoracic computed tomography scans using a hybrid approach with error detection.

Authors:  Eva M van Rikxoort; Bartjan de Hoop; Max A Viergever; Mathias Prokop; Bram van Ginneken
Journal:  Med Phys       Date:  2009-07       Impact factor: 4.071

2.  Diagnosis of Coronavirus Disease 2019 (COVID-19) With Structured Latent Multi-View Representation Learning.

Authors:  Hengyuan Kang; Liming Xia; Fuhua Yan; Zhibin Wan; Feng Shi; Huan Yuan; Huiting Jiang; Dijia Wu; He Sui; Changqing Zhang; Dinggang Shen
Journal:  IEEE Trans Med Imaging       Date:  2020-05-05       Impact factor: 10.048

3.  Inf-Net: Automatic COVID-19 Lung Infection Segmentation From CT Images.

Authors:  Deng-Ping Fan; Tao Zhou; Ge-Peng Ji; Yi Zhou; Geng Chen; Huazhu Fu; Jianbing Shen; Ling Shao
Journal:  IEEE Trans Med Imaging       Date:  2020-08       Impact factor: 10.048

4.  Accurate Screening of COVID-19 Using Attention-Based Deep 3D Multiple Instance Learning.

Authors:  Zhongyi Han; Benzheng Wei; Yanfei Hong; Tianyang Li; Jinyu Cong; Xue Zhu; Haifeng Wei; Wei Zhang
Journal:  IEEE Trans Med Imaging       Date:  2020-08       Impact factor: 10.048

5.  A Weakly-Supervised Framework for COVID-19 Classification and Lesion Localization From Chest CT.

Authors:  Xinggang Wang; Xianbo Deng; Qing Fu; Qiang Zhou; Jiapei Feng; Hui Ma; Wenyu Liu; Chuansheng Zheng
Journal:  IEEE Trans Med Imaging       Date:  2020-08       Impact factor: 10.048

6.  A Deep Learning Model to Triage Screening Mammograms: A Simulation Study.

Authors:  Adam Yala; Tal Schuster; Randy Miles; Regina Barzilay; Constance Lehman
Journal:  Radiology       Date:  2019-08-06       Impact factor: 11.105

7.  Integrated Radiologic Algorithm for COVID-19 Pandemic.

Authors:  Nicola Sverzellati; Gianluca Milanese; Francesca Milone; Maurizio Balbi; Roberta E Ledda; Mario Silva
Journal:  J Thorac Imaging       Date:  2020-07       Impact factor: 3.000

8.  The use of imaging in COVID-19-results of a global survey by the International Society of Radiology.

Authors:  Ivana Blažić; Boris Brkljačić; Guy Frija
Journal:  Eur Radiol       Date:  2020-09-17       Impact factor: 5.315

9.  Artificial Intelligence Augmentation of Radiologist Performance in Distinguishing COVID-19 from Pneumonia of Other Origin at Chest CT.

Authors:  Harrison X Bai; Robin Wang; Zeng Xiong; Ben Hsieh; Ken Chang; Kasey Halsey; Thi My Linh Tran; Ji Whae Choi; Dong-Cui Wang; Lin-Bo Shi; Ji Mei; Xiao-Long Jiang; Ian Pan; Qiu-Hua Zeng; Ping-Feng Hu; Yi-Hui Li; Fei-Xian Fu; Raymond Y Huang; Ronnie Sebro; Qi-Zhi Yu; Michael K Atalay; Wei-Hua Liao
Journal:  Radiology       Date:  2021-04       Impact factor: 11.105

10.  Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal

Authors:  Laure Wynants; Ben Van Calster; Gary S Collins; Richard D Riley; Georg Heinze; Ewoud Schuit; Marc M J Bonten; Darren L Dahly; Johanna A A Damen; Thomas P A Debray; Valentijn M T de Jong; Maarten De Vos; Paul Dhiman; Maria C Haller; Michael O Harhay; Liesbet Henckaerts; Pauline Heus; Michael Kammer; Nina Kreuzberger; Anna Lohmann; Kim Luijken; Jie Ma; Glen P Martin; David J McLernon; Constanza L Andaur Navarro; Johannes B Reitsma; Jamie C Sergeant; Chunhu Shi; Nicole Skoetz; Luc J M Smits; Kym I E Snell; Matthew Sperrin; René Spijker; Ewout W Steyerberg; Toshihiko Takada; Ioanna Tzoulaki; Sander M J van Kuijk; Bas van Bussel; Iwan C C van der Horst; Florien S van Royen; Jan Y Verbakel; Christine Wallisch; Jack Wilkinson; Robert Wolff; Lotty Hooft; Karel G M Moons; Maarten van Smeden
Journal:  BMJ       Date:  2020-04-07