Rui Zhang1, Chao Cheng2, Xuehua Zhao1, Xuechen Li3
1 Department of Digital Media, Shenzhen Institute of Information Technology, Shenzhen, Guangdong, China.
2 Department of Nuclear Medicine, Changhai Hospital, Shanghai, People's Republic of China.
3 College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, People's Republic of China.
Abstract
Positron emission tomography (PET) imaging is one of the most capable methods for the diagnosis of various malignancies, such as lung tumor. However, as the use of PET scans increases, radiologists are considerably overburdened. Consequently, "computer-aided diagnosis" is being explored as a way to reduce this heavy workload. In this article, we propose a multiscale Mask Region-Based Convolutional Neural Network (Mask R-CNN)-based method that uses PET imaging for the detection of lung tumor. First, we produced 3 Mask R-CNN models for lung tumor candidate detection. These 3 models were generated by fine-tuning Mask R-CNN on training data consisting of images at 3 different scales. Each training data set included 594 slices with lung tumor. The 3 Mask R-CNN models were then integrated using a weighted voting strategy to reduce false-positive outcomes. A total of 134 PET slices were used as the test set in this experiment. The precision, recall, and F score of the proposed method were 0.90, 1, and 0.95, respectively. The experimental results support the effectiveness of this method in detecting lung tumors, along with its capability of identifying a healthy chest pattern and reducing incorrect identification of tumors to a large extent.
Lung tumor is one of the most life-threatening ailments, with rapidly increasing incidence and mortality all over the world. Survival and quality of life in patients with lung cancer depend heavily on early diagnosis and treatment. The 5-year survival of patients diagnosed early is approximately 54%, whereas it is only 4% for those first diagnosed at stage 4.[1] Imaging technology is of paramount importance for the evaluation of lung tumors,[2] since it facilitates early diagnosis and treatment by discovering tumors at early stages. Positron emission tomography (PET) is an important 3-dimensional imaging technique for lung tumor detection.[3] However, as imaging becomes more widely used, the number of images requiring interpretation has increased rapidly, escalating the workload of radiologists. Consequently, radiologists have been calling for a new diagnostic approach, computer-aided diagnosis (CAD or CADx), to lessen this burden.
Computer-aided diagnosis is a popular research topic in medical imaging and diagnostic radiology. The concept of CAD was first proposed by the University of Chicago in the mid-1980s to provide a computer output as a “second opinion” to assist radiologists in interpreting images. Such a tool improves the accuracy and consistency of radiological diagnosis and reduces image interpretation time.[4-6] Since then, a large body of research has developed customized CAD schemes for the detection and classification of numerous abnormalities, such as breast diseases,[7-11] lung diseases,[12-15] and other pathologies in different organs.[16-20]
In recent years, various computer-aided detection (CADe) methods for lung tumor detection using PET imaging have been developed. 
A common automatic method relies on dynamically defined threshold values to isolate the lesion.[21,22] Ying et al[23] proposed a novel approach to enhance the effectiveness of lung tumor detection using PET images. This method processed 3-dimensional images through segmentation, multithreshold creation with a volume criterion, and heuristics-based tumor candidate ranking. Gifford et al[24] proposed a support vector machine (SVM)-based visual-search model for tumor detection using PET imaging. Liu et al[25] presented a segmentation algorithm for detecting lung cancer in PET images using pseudo color and context awareness. Kopriva et al[26] suggested an advanced method for single-channel blind separation of non-overlapping sources and applied it, for the first time, to automatic segmentation of lung tumors in PET images. Feng et al[27] developed an iterative threshold method for lung tumor delineation on 18F-FDG (fluorodeoxyglucose, a radiopharmaceutical) PET images that can eliminate the influence of the heart on the imaging. Later, Hao et al[28] presented a novel image processing method capable of automatically detecting and ranking tumor candidates in the lungs using full-body PET images. Kano[15] proposed a detection method that could identify malignant tumors in the lung area of a given FDG-PET/computed tomography (CT) image. This method first extracts tumor candidates by binarizing the PET image and then rejects false positives by constructing an “eigenspace” (the space spanned by the eigenvectors corresponding to the same eigenvalue). Sawada[17] reported a single-class classifier that could distinguish between true malignant tumors and false-positive results.
Several other methods employ both PET and CT imaging for lung tumor detection and diagnosis. 
Teramoto[29] proposed a novel lung tumor detection method that uses active contour filters to detect nodules that were deemed “difficult” in previous CAD schemes. Guo et al[30] and Cuiying et al[31] applied SVM to train on feature vectors, including heterogeneity, extracted from PET images and CT texture, so as to improve the diagnosis and staging of lung cancer. Punithavathy et al[32] proposed a fuzzy C-means clustering-based method for automatic detection of lung cancer from PET/CT images. In 2015, this research group[33] designed an artificial neural network (ANN) for the detection of lung cancer that combined textural and fractal features extracted from PET/CT imaging. Wang presented a deep learning method based on a backpropagation ANN to classify mediastinal lymph node metastasis of non-small cell lung cancer using PET/CT imaging.[34] Ding et al[35] proposed a novel pulmonary nodule detection approach based on a deep convolutional neural network.
In summary, research on lung tumor detection in PET imaging using deep learning is significant but rare. This is because the low resolution and oversimplified appearance of PET images give rise to a large number of false-positive results, despite the high sensitivity of PET imaging for lung tumor detection. In this article, we propose a novel deep learning-based method using multiscale Mask Region–Based Convolutional Neural Network (Mask R-CNN) to address these issues in detecting lung tumor in PET imaging. In the proposed method, we first produced 3 models of Mask R-CNN, a state-of-the-art object detection and segmentation model, for lung tumor candidate detection. All 3 models were fine-tuned on data sets consisting of images at 3 different scales. Then, these 3 Mask R-CNN models were integrated using a weighted voting strategy to reduce false-positive outcomes. The framework of our method is illustrated in Figure 1.
Imaging Characteristics of Lung Tumors in PET Scan
Functional imaging obtained by PET, which depicts the spatial distribution of metabolic or biochemical activity in the body, is vital in diagnosing a tumor. During a PET scan for cancer inspection, tracers such as FDG (a radiolabeled glucose analog) are administered intravenously to the patient. The γ-rays emitted from the patient due to the injected radiopharmaceutical are recorded by the nuclear imaging system. The PET images show the level of FDG uptake (standardized uptake value [SUV]) throughout the body. Analyzing the different levels of FDG uptake by tissues and lesions makes it possible to distinguish between normal and pathological regions.[36]
In the PET scans used here, the lung tumor area showed a higher SUV than the other tissues of the chest cavity. According to Wang et al,[37] the maximum SUV values of squamous cell carcinoma, small-cell carcinoma, adenocarcinoma, and benign lesions were 12.57 ± 4.34, 10.6 ± 2.90, and 8.19 ± 6.01, respectively, which are evidently higher than the SUVs of normal chest tissues. However, inflammatory lesions and heart tissue also tend to absorb a large amount of FDG and therefore show a higher SUV than the surrounding areas, which results in false positives. These need to be distinguished from true positives by a radiologist or by CAD.
Related Work
Mask R-CNN
Mask R-CNN[38] is a deep neural network for instance segmentation and classification and is among the most effective recent deep learning models for this task. Mask R-CNN extends Faster R-CNN[39] by adding a branch for predicting segmentation masks on each region of interest (ROI), in parallel with the existing branch for classification and bounding box regression. The mask branch is a small fully convolutional network applied to each ROI, predicting a segmentation mask in a pixel-to-pixel manner. Mask R-CNN is easy to implement and train because the Faster R-CNN framework accommodates a wide range of flexible architecture designs. Additionally, the mask branch adds only a small computational overhead, enabling a fast system and rapid experimentation.
Ensemble Learning
Ensemble learning, as its name implies, combines multiple individual learners so that they perform a learning task together. It is often referred to as a “multiclassifier system” or “committee-based learning.” The individual learners are combined with a certain strategy so that they complement each other with their own strengths and yield better performance together. The combination strategy for individual learners usually includes the following:
(1) Major (majority) voting: Every classifier has the same weight, the minority is subordinate to the majority, and the class that obtains more than half of the votes is taken as the classification result.
(2) Weighted voting: Each classifier has a different weight. The vote of each weak learner is multiplied by its weight, and the weighted votes of each class are then totaled. The class with the maximum total, or the class whose total exceeds a certain threshold value, is taken as the final result.
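The two combination strategies can be illustrated with a minimal Python sketch. The function names, the class labels, and the example weights are hypothetical and chosen only for illustration; they are not taken from the experiments in this article.

```python
from collections import defaultdict


def majority_vote(predictions):
    """Return the class chosen by more than half of the classifiers, else None."""
    counts = defaultdict(int)
    for label in predictions:
        counts[label] += 1  # every classifier carries the same weight of 1
    label, votes = max(counts.items(), key=lambda kv: kv[1])
    return label if votes > len(predictions) / 2 else None


def weighted_vote(predictions, weights, threshold=None):
    """Total the weights per class and return the class with the largest total.

    If a threshold is given, the winning class is returned only when its
    total exceeds that threshold; otherwise None is returned.
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    label, total = max(totals.items(), key=lambda kv: kv[1])
    if threshold is not None and total <= threshold:
        return None
    return label


# Hypothetical example: three classifiers disagree about one candidate.
preds = ["tumor", "background", "background"]
weights = [0.9, 0.3, 0.4]  # illustrative weights only

print(majority_vote(preds))           # two of three votes decide
print(weighted_vote(preds, weights))  # 0.9 for "tumor" vs 0.7 for "background"
```

In this example the two strategies disagree: major voting follows the two classifiers that agree, whereas weighted voting follows the single classifier with the larger weight.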
High sensitivity and a low false-positive rate are vital for lung tumor candidate detection using CAD. However, due to the ambiguity in PET imaging, excessive false positives remain the main barrier in lung tumor detection using PET, even after a deep learning algorithm is used for feature extraction. To address this problem, we propose a novel method based on multiscale Mask R-CNN, in which Mask R-CNN models at 3 different scales are used together to detect lung tumor candidates from PET images.
Images at 3 different scales were used to produce 3 training data sets: PET images with resolutions of 512 × 512, 768 × 768, and 1024 × 1024, respectively. All the PET images used in this study were obtained from Changhai Hospital PET/CT Center, and the data were stored in the DICOM format. Every training data set included 594 slices from 62 patients with lung cancer. The test data consisted of 134 slices from 18 cases, of which 74 slices from 8 cases were from patients with lung cancer and 60 slices from 10 cases were from healthy individuals, as shown in Table 1.
Table 1. The Number of Training Data and Test Data.

| Data Type | Number of Slices | Number of Patients/Cases |
|---|---|---|
| Training data | 594 | 62 |
| Test data (abnormal) | 74 | 8 |
| Test data (normal) | 60 | 10 |
| Total | 728 | 80 |
The PET scan system used was a Siemens Biograph 64 HD PET-CT (Siemens, Knoxville, Tennessee), located in Changhai Hospital. The image size was 168 × 168 pixels, and each full-body PET image consisted of 274 slices. We took the PET slices from the 40th to the 120th layer, which corresponded to the location of the thoracic cavity. The abnormal image data used in this study were confirmed as lung tumor by pathological examination.
All the training images were labeled using the “Labelme” software under the guidance of 2 certified radiologists. “Labelme” is an image annotation tool developed by the Massachusetts Institute of Technology (MIT; download link: http://labelme.csail.mit.edu/Release3.0/). The training set input included the original PET image and the segmented lung tumor mask image. Each training image was marked as 2 parts: lung tumor and background (see Figure 2).
Figure 2.
Examples of training image. (Left is original PET image. Middle is the mask of the lung tumor. Right is the fusion image of original PET and mask of lung tumor). PET indicates positron emission tomography.
To fit the size of lung tumors in PET images with different resolutions, we set the scales of the 5 anchors in each model as follows: 4, 8, 16, 32, and 64 for resolution 512 × 512; 8, 16, 32, 64, and 128 for resolution 768 × 768; and 16, 32, 64, 128, and 256 for resolution 1024 × 1024. The batch size was 8, with 50 steps per epoch, 300 epochs, and a learning rate of 0.0001. The network was trained on 2 GPUs (GeForce GTX TITAN X, 12 GB RAM).
Models with different scales could extract features of lung tumors at different scales, which provides more comprehensive information to aid lung tumor detection. In the next step, the lung tumor candidates extracted by the 3 models were analyzed by ensemble learning for false-positive reduction, finally achieving suitable lung tumor detection.
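The per-model settings described above can be collected in a small Python sketch. The numeric values are those reported in the text; the dictionary layout, the key names, and the `model_config` helper are hypothetical and do not correspond to the configuration API of any particular Mask R-CNN implementation.

```python
# Training hyperparameters reported in the text (key names are illustrative).
TRAINING = {
    "batch_size": 8,
    "steps_per_epoch": 50,
    "epochs": 300,
    "learning_rate": 1e-4,
    "gpu_count": 2,
}

# Anchor scales reported for each input resolution.
ANCHOR_SCALES = {
    512: (4, 8, 16, 32, 64),
    768: (8, 16, 32, 64, 128),
    1024: (16, 32, 64, 128, 256),
}


def model_config(resolution):
    """Build a plain-dict configuration for one of the 3 scale-specific models."""
    return {
        "name": f"Model-{resolution}",
        "image_size": (resolution, resolution),
        "anchor_scales": ANCHOR_SCALES[resolution],
        **TRAINING,
    }


configs = [model_config(r) for r in (512, 768, 1024)]
for cfg in configs:
    print(cfg["name"], cfg["image_size"], cfg["anchor_scales"])

# Number of training steps per model: steps per epoch times epochs.
total_steps = TRAINING["steps_per_epoch"] * TRAINING["epochs"]
print("training steps per model:", total_steps)
```

Keeping the 3 configurations in one table makes it easy to see that only the input size and the anchor scales differ between the models, while the training schedule is shared.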
Ensemble Model–Based False-Positive Reduction
In this step, an ensemble model was proposed to combine the Mask R-CNN models of different scales so as to reduce the false-positive results. This ensemble model consisted of 2 parts: (1) matching and labeling operation and (2) weighted voting.
(1) Matching and labeling operation
Figure 3 shows the diagram of the matching and labeling operation. As shown in Figure 3B, the first mask in Model-512, recorded as $M_1^{512}$, was used to identify the same mask in Model-768 and Model-1024. If the overlap of $M_1^{512}$ and a mask in Model-768 was more than 50%, then both masks were identified as one, and the mask in Model-768 was recorded as $M_1^{768}$. $M_1^{1024}$ could also be identified using the same criteria. Then, $M_2^{512}$ was used to match the unlabeled masks in Model-768 and Model-1024, and $M_2^{768}$ and $M_2^{1024}$ could be derived. All masks in Model-512, Model-768, and Model-1024 could be matched and labeled by analogy. The details of matching and labeling are shown in Figure 3A-D.
Figure 3.
Matching and labeling operation. (A) A test result of Model-512, Model-768, and Model-1024. (B) Masks of Model-512 were matched and labeled against Model-768 and Model-1024, respectively. (C) The matching and labeling operation was performed between Model-768 and Model-1024. (D) The remaining unlabeled masks of Model-1024 were labeled in the final labeling operation.
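The matching and labeling operation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: masks are represented as sets of (row, column) pixels already rescaled to a common grid, and "overlap" is assumed to mean intersection over union, since the text does not specify the overlap measure. The names `overlap`, `match_and_label`, and `square` are hypothetical.

```python
def overlap(a, b):
    """Intersection over union of two masks given as sets of (row, col) pixels.

    Assumption: the text only says "overlap more than 50%"; IoU is one
    possible reading of that criterion.
    """
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_and_label(detections, min_overlap=0.5):
    """Group masks from several models that cover the same candidate.

    detections: dict mapping model name -> list of pixel-set masks.
    Returns a dict mapping label -> {model name: index of the matched mask}.
    """
    order = list(detections)               # e.g. Model-512, Model-768, Model-1024
    labeled = {m: set() for m in order}    # indices already labeled per model
    groups = {}
    next_label = 1
    for pos, model in enumerate(order):
        for idx, mask in enumerate(detections[model]):
            if idx in labeled[model]:
                continue                   # already matched by an earlier model
            label, next_label = next_label, next_label + 1
            labeled[model].add(idx)
            groups[label] = {model: idx}
            # Look for the best unlabeled match in each later model.
            for other in order[pos + 1:]:
                best_idx, best = None, min_overlap
                for j, cand in enumerate(detections[other]):
                    if j in labeled[other]:
                        continue
                    score = overlap(mask, cand)
                    if score > best:
                        best_idx, best = j, score
                if best_idx is not None:
                    labeled[other].add(best_idx)
                    groups[label][other] = best_idx
    return groups


def square(top, left, size):
    """Hypothetical helper: a filled square mask for the toy example."""
    return {(r, c) for r in range(top, top + size) for c in range(left, left + size)}


detections = {
    "Model-512": [square(10, 10, 10)],
    "Model-768": [square(11, 10, 10), square(40, 40, 6)],
    "Model-1024": [square(10, 11, 10), square(41, 40, 6), square(70, 70, 5)],
}
groups = match_and_label(detections)
for label, members in groups.items():
    print(label, members)
```

In the toy example, the first candidate is found by all 3 models, the second only by Model-768 and Model-1024, and the third only by Model-1024, which mirrors the three stages shown in Figure 3B-D.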
(2) Weighted voting
The second step was the weighted voting process. The confidence of each mask generated by Mask R-CNN was regarded as its weight, and the masks generated by all 3 models were voted on using this confidence value in order to reduce the number of false positives. The confidences of masks with the same label were summed and reassigned to the mask. The mask was considered a false-positive result if its final confidence was less than a certain threshold value. The details of weighted voting are shown in Figure 4.
Figure 4.
The diagram of weighted voting.
As shown in Figure 4, the masks with the same tag “i” were viewed as the mask of the same lung tumor candidate, which is represented as $M_i$; $C_i$ represents the confidence of $M_i$. For example, $C_i^{512}$ represents the confidence of $M_i$ for Model-512. The values of $C_i^{512}$, $C_i^{768}$, and $C_i^{1024}$ are in the range of [0,1], where 0 means that no matching mask has been found in the model. The details of the voting operation are given as follows:
$$C_i = C_i^{512} + C_i^{768} + C_i^{1024}$$
If $C_i \ge T$, where $T$ is a certain threshold, $M_i$ is a true positive;
else $M_i$ is a false positive.
End
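The voting rule can be sketched in a few lines of Python. This is an illustration under stated assumptions: the threshold used in the experiments is not reported in the text, so the value 1.5 below is purely hypothetical, as are the example confidences and the function name `vote_on_candidates`.

```python
MODELS = ("Model-512", "Model-768", "Model-1024")


def vote_on_candidates(candidates, threshold):
    """Sum the per-model confidences of each labeled candidate.

    candidates: dict mapping label -> {model name: confidence in [0, 1]}.
    A model that found no matching mask contributes a confidence of 0.
    Returns a dict mapping label -> (summed confidence, is_true_positive).
    """
    decisions = {}
    for label, confidences in candidates.items():
        total = sum(confidences.get(model, 0.0) for model in MODELS)
        # A candidate whose summed confidence is below the threshold is
        # treated as a false positive.
        decisions[label] = (total, total >= threshold)
    return decisions


# Hypothetical confidences for three labeled candidates.
candidates = {
    1: {"Model-512": 0.95, "Model-768": 0.90, "Model-1024": 0.85},
    2: {"Model-768": 0.80},
    3: {"Model-768": 0.60, "Model-1024": 0.70},
}

decisions = vote_on_candidates(candidates, threshold=1.5)  # threshold is an assumption
for label, (total, keep) in decisions.items():
    print(label, round(total, 2), "true positive" if keep else "false positive")
```

A candidate detected with high confidence by all 3 models easily passes the threshold, whereas a candidate found by only 1 model cannot exceed a summed confidence of 1 and is rejected whenever the threshold is set above that value.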
Experimental Results and Analysis
Evaluation Criteria
The F score, precision, and recall were used as the evaluation metrics. They can be calculated using the equations:
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
where the values of TP (number of true positives), FP (number of false positives), and FN (number of false negatives) were computed according to the definitions proposed in previous work.[40]
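As a quick arithmetic check, the standard definitions of these metrics can be written in Python and applied to the precision and recall values reported in this article. The function names are illustrative, and the TP/FP/FN counts in the last lines are hypothetical numbers used only to exercise the functions.

```python
def precision(tp, fp):
    """Fraction of detected positives that are true positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Reported values: ensemble precision 0.90 and recall 1;
# Model-512 precision 0.60 and recall 1.
f_ensemble = f_score(0.90, 1.0)
f_model_512 = f_score(0.60, 1.0)
print(round(f_ensemble, 2), round(f_model_512, 2))

# Hypothetical counts, for illustration only.
print(precision(tp=9, fp=1), recall(tp=9, fn=0))
```

With a recall of 1, the reported precision of 0.90 gives an F score of about 0.95, and the precision of 0.60 for Model-512 gives 0.75, which agrees with the 0.2 difference in F score stated in the evaluation.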
Evaluation of Ensemble Mask R-CNN Model
We evaluated the ensemble framework by comparing it with the performance of each single model. The F score, precision, and recall of the 3 single models and the ensemble model are shown in Figure 5. The recall of all 3 single models was 1, which indicates that every single model was sensitive enough to detect the lung tumors in the test set. Precision is the ratio of the number of true positives to the number of all detected positives. The precision values of Model-512, Model-768, and Model-1024 were 0.60, 0.53, and 0.59, respectively, which shows that each single model still produced many false positives. The ensemble model yielded more accurate results: its precision and F score were 0.90 and 0.95, which were 0.30 and 0.20 higher, respectively, than those of Model-512, the best-performing single model. The recall of the ensemble model was 1, equal to that of the single models. Compared with a single model, the ensemble model drew on features extracted at several scales and used the weighted voting strategy, which made it more effective and accurate in reducing false positives.
Figure 5.
Comparative histograms of precision, recall, and F-score between single model and ensemble model.
Figure 6 shows the P-R curves of the ensemble model and the single models. The more a P-R curve bulges toward the top-right corner, the better the corresponding model. From Figure 6, we can observe that the ensemble model combined the advantages of all 3 single models and demonstrated the best overall performance. Among the single models, the tumor candidates extracted by Model-512 showed a trend in which the confidence of true positives was, on the whole, higher than that of false positives; this trend weakened in Model-768 and Model-1024, in turn. Therefore, among the single models, Model-512 performed best, followed by Model-768 and Model-1024. At the same time, although many false-positive results were produced by each single model, the false positives produced by different models were staggered in their spatial distribution, as shown in Figure 7. Therefore, the 3 models were integrated, and tumor candidates were evaluated comprehensively from the perspective of both spatial distribution and confidence, so as to reduce the number of false positives and achieve suitable detection of lung tumor.
Figure 6.
P-R curves for single model and ensemble model.
Figure 7.
Staggered spatial distribution of false positives. Top left: a false positive was extracted in Model-512 but not in Model-768 and Model-1024. Top right: a false positive was detected in Model-768 but not in Model-512 and Model-1024. Bottom left and bottom right: the test results of the same slice for Model-768 and Model-1024, showing that a false positive was detected in Model-768 and in Model-1024, respectively, but not in Model-512.
Evaluation of Weighted Voting Strategy
We compared the 2 ensemble strategies: major voting and weighted voting. Major voting did not use confidence as the weight; instead, every mask had a weight of 1. If the overlap rate between 2 masks was more than 50%, the 2 masks were identified as 1 unified mask, and their weights were added. If the mask obtained 3 votes, it was considered a positive result; otherwise, it was considered a negative result. The comparison of precision, recall, and F score between major voting and weighted voting is shown in Figure 8. The values of precision, recall, and F score obtained by the weighted voting strategy were higher than those of major voting. The weighted voting strategy introduced confidence as an important indicator for judging tumors, which made tumor detection more flexible and effective and, in turn, improved the precision and F score. This confirmed that the weighted voting strategy was more effective in reducing false positives.
Figure 8.
Comparative histograms of precision, recall, and F score between major voting and weighted voting.
However, this study has a limitation: the amount of training data is insufficient. In future work, we will try to increase the accuracy of this method through further cooperation with hospitals to obtain sufficient image data for experiments.
Conclusion
In this article, we propose a novel lung tumor detection method based on Mask R-CNN. This method incorporates multiscale models based on Mask R-CNN in order to detect lung tumor candidates from PET axial slices. Weighted voting from ensemble learning was used to reduce false positives. Experimental results demonstrate that the proposed method could effectively and precisely detect lung tumors while avoiding most incorrect detections. This method could therefore help radiologists interpret PET images by providing auxiliary diagnostic information, supporting accuracy and consistency in radiological diagnosis, reducing image interpretation time, and enabling timely diagnosis of patients with lung tumors.