
A Supervised ML Applied Classification Model for Brain Tumors MRI.

Zhengyu Yu1,2, Qinghu He3, Jichang Yang3, Min Luo1,3.   

Abstract

A brain tumor originates from abnormal cells that grow uncontrollably. Magnetic resonance imaging (MRI) generates high-quality images and provides extensive information for medical research. Machine learning algorithms can improve the diagnostic value of MRI by enabling automated and accurate classification of MRI images. In this research, we propose a training and testing model based on supervised machine learning to classify and analyze features of brain tumor MRI, evaluated in terms of accuracy, precision, sensitivity and F1 score. The results show that the model achieves more than 95% accuracy, and that it classifies features more accurately than other existing methods.
Copyright © 2022 Yu, He, Yang and Luo.


Keywords:  automation; brain tumor; classification; machine learning algorithms; magnetic resonance imaging

Year:  2022        PMID: 35462901      PMCID: PMC9024329          DOI: 10.3389/fphar.2022.884495

Source DB:  PubMed          Journal:  Front Pharmacol        ISSN: 1663-9812            Impact factor:   5.988


Introduction

The brain is a complex organ of the human body. A brain tumor forms when uncontrolled cell division produces an abnormal mass of cells in the brain (Logeswari and Karnan, 2010). These abnormal cells destroy healthy cells and disrupt normal brain activity. Brain tumors fall into two classes: benign and malignant. Benign tumors grow slowly and originate in the brain; they are considered non-progressive, or non-cancerous, and do not spread to other organs of the body. In contrast, malignant tumors are progressive and cancerous, and they grow unpredictably. Primary malignant tumors originate in the brain itself, whereas other malignant tumors arise in other organs of the body and spread to the brain.

MRI is an imaging technology that generates high-quality images of human anatomy and provides extensive information for medical diagnosis and research (Zhang et al., 2011). Automated and accurate classification of MRI images has dramatically improved the diagnostic value of MRI (Scapaticci et al., 2012). However, a single type of MRI image cannot fully characterize brain tumors, which contain many different tissues (Sudharani et al., 2016), so images with different weightings are combined for brain tumor segmentation. Three weighted MRI images (T1, T2 and FLAIR; Figure 1) are used for image segmentation of the skull on different axial slices (Vannier et al., 1988; Clark et al., 1994; Dou et al., 2007).
FIGURE 1

Comparison of T1, T2 and FLAIR brain tumor MRI images (Clark et al., 2013; Scarpace et al., 2022).

MRI is one of the best imaging methods, and researchers use it to analyze the progression of brain tumors during detection and treatment. Because MRI provides high resolution, it captures detailed information on brain structure, such as brain tissue abnormalities. MRI therefore plays a major role in automated medical image analysis (Zacharaki et al., 2009; Litjens et al., 2017). Since medical images can be scanned and loaded into a computer, researchers have proposed various automated methods that use brain MRI images to detect and classify brain tumors (Litjens et al., 2017).

These methods fall into two categories. The first is unsupervised classification, such as fuzzy c-means and self-organizing feature maps (Ibrahim et al., 2013). The second is supervised classification, such as K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) (Cocosco et al., 2003; Chaplot et al., 2006). In terms of classification accuracy, supervised classification has outperformed unsupervised classification (Zhang and Wu, 2008; Ibrahim et al., 2013). Nevertheless, most reported classification accuracies are below 95% (Yeh and Fu, 2008). Over the past decades, SVM and Neural Network (NN) methods have become popular because of their outstanding performance in detecting and classifying brain tumors (Ibrahim et al., 2013). More recently, deep learning has introduced a new modeling paradigm in machine learning: deep architectures can represent complex relationships effectively without needing as many nodes as shallow architectures such as SVM and KNN. As a result, deep learning methods have rapidly become state-of-the-art in many health research fields, such as medical image analysis, medical informatics and bioinformatics (Pan et al., 2015; Ravì et al., 2016; Litjens et al., 2017).

Materials and Methods

A classification method based on supervised machine learning algorithms is proposed to classify whether cysts are detected in brain tumor MRI. Figure 2 illustrates the workflow of the training and testing models of the method. The process is summarized below:
FIGURE 2

Workflow diagram for training model and testing model.

1) Extract the brain tumor MRI data sets. In this research, the data sets come from the Repository of Molecular Brain Neoplasia Data (REMBRANDT) (Clark et al., 2013; Scarpace et al., 2022).
2) Extract features. As Table 1 shows, 30 features are extracted from the brain tumor MRI, comprising 21 categorical features and 9 numerical features. Feature 8, Cyst(s), is selected as the target feature, and the remaining features are used as attributes.
TABLE 1

Data features extracted from brain tumors MRI.

| Number | Feature | Type |
| 1 | Tumor Location | Categorical |
| 2 | Side of Tumor Epicenter | Categorical |
| 3 | Eloquent Brain | Categorical |
| 4 | Enhancement Quality | Categorical |
| 5 | Proportion Enhancing | Numerical |
| 6 | Proportion nCET | Numerical |
| 7 | Proportion Necrosis | Numerical |
| 8 | Cyst(s) | Categorical |
| 9 | Multifocal or Multicentric | Categorical |
| 10 | T1/FLAIR Ratio | Categorical |
| 11 | Thickness of enhancing margin | Categorical |
| 12 | Definition of the enhancing margin | Categorical |
| 13 | Definition of the non-enhancing margin | Categorical |
| 14 | Proportion of Edema | Numerical |
| 15 | Edema Crosses Midline | Categorical |
| 16 | Hemorrhage | Categorical |
| 17 | Diffusion | Categorical |
| 18 | Pial invasion | Categorical |
| 19 | Ependymal invasion | Categorical |
| 20 | Cortical involvement | Categorical |
| 21 | Deep WM invasion | Categorical |
| 22 | nCET tumor Crosses Midline | Categorical |
| 23 | Enhancing tumor Crosses Midline | Categorical |
| 24 | Satellites | Categorical |
| 25 | Calvarial remodeling | Categorical |
| 26 | Extent of resection of enhancing tumor | Numerical |
| 27 | Extent of resection of nCET | Numerical |
| 28 | Extent of resection of vasogenic edema | Numerical |
| 29 and 30 | Lesion Size | Numerical |
3) Compare machine learning classifiers. Supervised machine learning classifiers, namely Decision Tree (DT), SVM, KNN and NN, are compared by estimating the performance of each training model. 80% of the data set is used for training, and cross-validation with different numbers of folds is used to reduce the risk of overfitting. The results indicate that the DT model is the most accurate.
4) Evaluate the testing model on the remaining 20% of the data set. At this stage, feature 8 is again the target feature and the remaining features are the attributes. The DT model trained with 30-fold cross-validation performs best.
5) Report the final model. After evaluation, the accuracy of the final model is 95.9%.
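Steps 2 to 4 separate the target from the attributes and hold out 20% of the data for testing. A minimal sketch of that preparation in plain Python follows. It assumes each record is a list of the 30 features in Table 1 order, so that feature 8, Cyst(s), sits at 0-based index 7; the helper names `split_target` and `holdout_split` are hypothetical, and because the article does not state whether the split was shuffled or stratified, a simple seeded shuffle is used here.

```python
import random
from typing import Any, List, Sequence, Tuple

# Assumption: columns follow Table 1 order, so feature 8 (Cyst(s)) is at 0-based index 7.
TARGET_INDEX = 7


def split_target(rows: List[Sequence[Any]], target_index: int = TARGET_INDEX
                 ) -> Tuple[List[List[Any]], List[Any]]:
    """Separate the target column from the attribute columns of each row."""
    X = [[value for j, value in enumerate(row) if j != target_index] for row in rows]
    y = [row[target_index] for row in rows]
    return X, y


def holdout_split(X: List[Any], y: List[Any], test_fraction: float = 0.2, seed: int = 0):
    """Shuffle once, keep (1 - test_fraction) for training and the rest for testing."""
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)  # fixed seed for a reproducible split
    n_test = round(len(y) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq: List[Any], idx: List[int]) -> List[Any]:
        return [seq[i] for i in idx]

    return pick(X, train_idx), pick(y, train_idx), pick(X, test_idx), pick(y, test_idx)


# Usage on a toy table of 10 rows x 30 columns (placeholder values, not REMBRANDT data).
rows = [[f"f{j + 1}_row{i}" for j in range(30)] for i in range(10)]
X, y = split_target(rows)
X_train, y_train, X_test, y_test = holdout_split(X, y)
print(len(X[0]), len(X_train), len(X_test))  # 29 8 2
```

When the cyst and non-cyst classes are imbalanced, a stratified split, which keeps the class proportions equal in both parts, is a common refinement of this simple shuffle.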

Datasets

The data set used in this research is REMBRANDT (Scarpace et al., 2022), accessed from The Cancer Imaging Archive (TCIA) (Clark et al., 2013). REMBRANDT was designed to explore the link between genomic characterization data and clinical information. It contains pre-surgical MRI of 130 patients, comprising 174 studies, 1,483 series and 110,020 images. Table 1 lists the 30 features extracted from the brain tumor MRI, including 21 categorical and 9 numerical features.

Training Algorithms Methods

The DT classifier is a supervised machine learning technique that makes decisions in a multistage way. The fundamental idea of a decision tree is to break a complicated decision down into a group of simpler decisions, so that the final result approximates the desired solution (Hastie et al., 2009). The DT technique is a widely used data mining method for classifying observations based on multiple covariates or for predicting a target variable. A decision tree splits the data into branch-like segments that form an inverted tree consisting of a root node, internal nodes and leaf nodes. Because it is non-parametric, the decision tree algorithm can efficiently handle large, complicated data sets. When the data set is large, it is separated into training and validation sets: the training set is used to build the decision tree model, whereas the validation set is used to choose an appropriate tree size for the optimal final model (Boser et al., 1992; Song and Lu, 2015).

SVM is a commonly used machine learning method for classification problems in data mining, valued for its relative flexibility and simplicity (Hearst et al., 1998). SVMs have been used in a wide variety of biomedical applications. For instance, SVMs can automatically classify microarray gene expression profiles derived from peripheral fluid or tumor samples to support diagnosis or prognosis. In brain disease research, SVMs are commonly used for multivoxel pattern analysis because they are relatively unlikely to overfit when processing high-dimensional images. Recently, SVMs have been applied to predict diagnosis and prognosis in research on brain disorders (Orrù et al., 2012).

KNN is an effective, high-performance technique for classifying and clustering large-scale data in big data applications (Zhang et al., 2017). The original KNN technique typically fixes a value of K and classifies each sample according to its K nearest training samples. To select the K nearest samples, KNN computes the similarity between the sample and every training sample (Guo et al., 2003), so the algorithm consumes considerable memory and processing time on large data sets. Nevertheless, KNN is one of the top techniques in data mining because of its strong performance (Deng et al., 2016).

NN has emerged as a vital classification tool in recent research. NNs are non-linear and self-adaptive: they are flexible in complex data environments and can adapt to the data without an explicit specification of the classification function (Cybenko, 1989; Hornik, 1991). Moreover, NNs can estimate posterior probabilities, which provides the basis for establishing classification rules and performing statistical analysis (Richard and Lippmann, 1991; Zhang, 2000).
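The KNN procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in this study: it assumes numeric feature vectors, uses Euclidean distance as the similarity measure and a majority vote among the K neighbors, and the function name `knn_predict` and the default K = 3 are our own choices. Categorical MRI features such as those in Table 1 would first need a numeric encoding, for example one-hot encoding.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Classify `query` by majority vote among its k nearest training samples."""
    # Distance from the query to *every* training sample: this full scan is
    # why plain KNN is costly in time and memory on large data sets.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Counter.most_common orders equal counts by first occurrence, so a tied
    # vote goes to the label of the closer neighbor.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy two-feature example (values are illustrative only).
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["no cyst", "no cyst", "no cyst", "cyst", "cyst", "cyst"]
print(knn_predict(train_X, train_y, [0.5, 0.5]))  # no cyst
print(knn_predict(train_X, train_y, [5.5, 5.5]))  # cyst
```

Because nothing is learned in advance, all the work happens at prediction time; the choice of K trades noise sensitivity (small K) against blurring of class boundaries (large K).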

Results and Discussion

A confusion matrix is used to compute the accuracy, precision, sensitivity and F1 score that measure the performance of each classifier. Table 2 shows the confusion matrix.
TABLE 2

Confusion matrix for the classifier method.

| Predicted class | Actual positive class | Actual negative class |
| Positive class | True Positive (TP) | False Positive (FP) |
| Negative class | False Negative (FN) | True Negative (TN) |
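The accuracy, precision, sensitivity and F1 score are calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$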
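Given the confusion-matrix counts in Table 2, the standard definitions of accuracy, precision, sensitivity and F1 score can be computed directly. The sketch below uses a hypothetical helper name, `classification_metrics`, and returns 0.0 instead of raising when a ratio's denominator is zero, which is one of several reasonable conventions.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, sensitivity (recall) and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard the ratios against empty denominators (0.0 is a convention, not a rule).
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "f1": f1}


# Hypothetical counts for illustration only (not taken from this study).
m = classification_metrics(tp=8, fp=2, fn=1, tn=9)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.85, 'precision': 0.8, 'sensitivity': 0.889, 'f1': 0.842}
```

As a cross-check against the reported results, the F1 formula applied to the DT precision of 97.3% and sensitivity of 98.6% at 30 folds gives about 97.9%, the value listed in Table 3.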

DT Classifier

After training, the DT classifier reaches its highest accuracy, 96.2%, with 30-fold cross-validation. Table 3 and Figure 3 present the accuracy, precision, sensitivity and F1-score for each number of cross-validation folds. With 30 folds, the classifier achieves 96.2% accuracy, 97.3% precision, 98.6% sensitivity and a 97.9% F1-score.
TABLE 3

Performance of DT classifier.

| Folds | Accuracy (%) | Precision (%) | Sensitivity (%) | F1-Score (%) |
| 5 | 91.1 | 95.9 | 94.6 | 95.3 |
| 10 | 94.2 | 96.3 | 97.6 | 96.9 |
| 15 | 93.7 | 94.8 | 98.6 | 96.7 |
| 20 | 91.1 | 93.5 | 97.3 | 95.4 |
| 25 | 94.9 | 96.1 | 98.6 | 97.3 |
| 30 | 96.2 | 97.3 | 98.6 | 97.9 |
FIGURE 3

Comparison diagram for the performance of DT classifier.

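Tables 3 to 6 report each classifier at 5, 10, 15, 20, 25 and 30 cross-validation folds. The sketch below shows what k-fold cross-validation computes, under stated assumptions: accuracy is averaged over folds (pooling all out-of-fold predictions is an equally common choice, and the article does not say which it used), the `train_and_predict` callback is a hypothetical interface into which a DT, SVM, KNN or NN classifier would be plugged, and `majority_class` is only a placeholder model for the demonstration.

```python
import random
from collections import Counter
from typing import Any, Callable, List


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle the sample indices and deal them into k folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(X: List[Any], y: List[Any], k: int,
                       train_and_predict: Callable[[List[Any], List[Any], List[Any]], List[Any]],
                       seed: int = 0) -> float:
    """Mean validation accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(y)) if i not in held]
        predictions = train_and_predict([X[i] for i in train_idx],
                                        [y[i] for i in train_idx],
                                        [X[i] for i in held_out])
        correct = sum(pred == y[i] for pred, i in zip(predictions, held_out))
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def majority_class(train_X, train_y, test_X):
    """Placeholder classifier: always predicts the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_X)


# The same sweep over fold counts as in Tables 3 to 6, here on toy data.
X = [[i] for i in range(40)]
y = ["cyst"] * 40
for k in (5, 10, 15, 20, 25, 30):
    print(k, cross_val_accuracy(X, y, k, majority_class))
```

With more folds, each model trains on a larger share of the data but is validated on fewer samples per fold, which is one reason the reported accuracies fluctuate across fold counts.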

SVM Classifier

For the SVM classifier, Table 4 and Figure 4 show that the highest accuracy, 94.9%, is reached with 5, 15, 20 and 30 folds. Each of these achieves 94.9% accuracy, 94.9% precision, 100% sensitivity and a 97.4% F1-score.
TABLE 4

Performance of SVM classifier.

| Folds | Accuracy (%) | Precision (%) | Sensitivity (%) | F1-Score (%) |
| 5 | 94.9 | 94.9 | 100 | 97.4 |
| 10 | 93.7 | 93.7 | 100 | 96.7 |
| 15 | 94.9 | 94.9 | 100 | 97.4 |
| 20 | 94.9 | 94.9 | 100 | 97.4 |
| 25 | 93.7 | 93.7 | 100 | 96.7 |
| 30 | 94.9 | 94.9 | 100 | 97.4 |
FIGURE 4

Comparison diagram for the performance of SVM classifier.


KNN Classifier

For the KNN classifier, Table 5 and Figure 5 show that the highest accuracy, 93.7%, is reached with 10, 20, 25 and 30 folds. Each of these achieves 93.7% accuracy, 94.8% precision, 98.6% sensitivity and a 96.6% F1-score.
TABLE 5

Performance of KNN classifier.

| Folds | Accuracy (%) | Precision (%) | Sensitivity (%) | F1-Score (%) |
| 5 | 92.4 | 94.7 | 97.3 | 95.9 |
| 10 | 93.7 | 94.8 | 98.6 | 96.6 |
| 15 | 92.4 | 94.7 | 97.3 | 95.9 |
| 20 | 93.7 | 94.8 | 98.6 | 96.6 |
| 25 | 93.7 | 94.8 | 98.6 | 96.6 |
| 30 | 93.7 | 94.8 | 98.6 | 96.6 |
FIGURE 5

Comparison diagram for the performance of KNN classifier.


NN Classifier

For the NN classifier, Table 6 and Figure 6 show that the highest accuracy, 92.4%, is reached with 10-fold cross-validation, together with 94.7% precision, 97.3% sensitivity and a 95.9% F1-score.
TABLE 6

Performance of NN classifier.

| Folds | Accuracy (%) | Precision (%) | Sensitivity (%) | F1-Score (%) |
| 5 | 86.1 | 95.7 | 89.2 | 92.3 |
| 10 | 92.4 | 94.7 | 97.3 | 95.9 |
| 15 | 88.6 | 95.8 | 91.9 | 93.8 |
| 20 | 89.9 | 95.8 | 93.2 | 94.5 |
| 25 | 88.6 | 94.5 | 93.2 | 93.8 |
| 30 | 83.5 | 94.2 | 87.8 | 90.9 |
FIGURE 6

Comparison diagram for the performance of NN classifier.


Testing Model

All classifiers were trained as described in the previous sections. The DT model trained with 30-fold cross-validation, which reached 96.2% accuracy, is selected because it is the most accurate of all the trained models. The testing model then evaluates this model on the remaining 20% of the data set to verify its performance. As Table 7 shows, the testing accuracy of the DT classifier with 30-fold cross-validation is 95.9%. Although this is slightly lower than the training accuracy, which suggests some overfitting, it remains the best-performing model in this study.
TABLE 7

Performance of Testing model.

| DT model | Training accuracy (%) | Testing accuracy (%) |
| 30 folds | 96.2 | 95.9 |
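The model-selection rule in the testing stage amounts to an argmax over the cross-validation accuracies in Tables 3 to 6. The snippet below is a small illustration using values transcribed from those tables; the dictionary layout and the `select_best` helper are our own, and the final lines compute the gap between the training (cross-validation) and testing accuracy of the chosen model from Table 7.

```python
# Cross-validation accuracy (%) per classifier and number of folds,
# transcribed from Tables 3-6 of this article.
cv_accuracy = {
    "DT":  {5: 91.1, 10: 94.2, 15: 93.7, 20: 91.1, 25: 94.9, 30: 96.2},
    "SVM": {5: 94.9, 10: 93.7, 15: 94.9, 20: 94.9, 25: 93.7, 30: 94.9},
    "KNN": {5: 92.4, 10: 93.7, 15: 92.4, 20: 93.7, 25: 93.7, 30: 93.7},
    "NN":  {5: 86.1, 10: 92.4, 15: 88.6, 20: 89.9, 25: 88.6, 30: 83.5},
}


def select_best(results):
    """Return (classifier, folds, accuracy) with the highest cross-validation accuracy."""
    candidates = ((clf, folds, acc)
                  for clf, by_folds in results.items()
                  for folds, acc in by_folds.items())
    return max(candidates, key=lambda item: item[2])


best = select_best(cv_accuracy)
print(best)  # ('DT', 30, 96.2)

test_accuracy = 95.9  # Table 7, DT with 30 folds
print(f"gap: {best[2] - test_accuracy:.1f} percentage points")  # gap: 0.3 percentage points
```

Selecting on cross-validation accuracy and then reporting a separate held-out score, as done here, keeps the test set out of the selection step; a small drop on the test set is expected.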

Conclusion

This article proposes a supervised machine learning classification model for brain tumor MRI. The model is developed to achieve higher classification performance, in terms of accuracy, precision, sensitivity and F1 score, when classifying features of brain tumor MRI. The optimized model, which gives the most accurate result, is obtained by comparing different supervised machine learning algorithms across different numbers of cross-validation folds, and its performance is then confirmed on held-out test data. This classification model can also be applied to other features of brain tumor MRI to obtain accurate results.
References

1. Chris A Cocosco, Alex P Zijdenbos, Alan C Evans. A fully automatic and robust brain MRI tissue classification method. Med Image Anal. 2003-12.

2. Graziella Orrù, William Pettersson-Yeo, Andre F Marquand, Giuseppe Sartori, Andrea Mechelli. Using Support Vector Machine to identify imaging biomarkers of neurological and psychiatric disease: a critical review. Neurosci Biobehav Rev. 2012-01-28.

3. M W Vannier, T K Pilgram, C M Speidel, L R Neumann, D L Rickman, L D Schertz. Validation of magnetic resonance imaging (MRI) multispectral tissue classification. Comput Med Imaging Graph. 1991 Jul-Aug.

4. Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen A W M van der Laak, Bram van Ginneken, Clara I Sánchez. A survey on deep learning in medical image analysis. Med Image Anal. 2017-07-26.

5. Shichao Zhang, Xuelong Li, Ming Zong, Xiaofeng Zhu, Ruili Wang. Efficient kNN Classification With Different Numbers of Nearest Neighbors. IEEE Trans Neural Netw Learn Syst. 2017-04-12.

6. Kenneth Clark, Bruce Vendt, Kirk Smith, John Freymann, Justin Kirby, Paul Koppel, Stephen Moore, Stanley Phillips, David Maffitt, Michael Pringle, Lawrence Tarbox, Fred Prior. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J Digit Imaging. 2013-12.

7. Jocelyn Wong. Brain tumor grading based on Neural Networks and Convolutional Neural Networks. Conf Proc IEEE Eng Med Biol Soc. 2015-08.

8. Evangelia I Zacharaki, Sumei Wang, Sanjeev Chawla, Dong Soo Yoo, Ronald Wolf, Elias R Melhem, Christos Davatzikos. Classification of brain tumor type and grade using MRI texture and shape in a machine learning scheme. Magn Reson Med. 2009-12.

9. Daniele Ravi, Charence Wong, Fani Deligianni, Melissa Berthelot, Javier Andreu-Perez, Benny Lo, Guang-Zhong Yang. Deep Learning for Health Informatics. IEEE J Biomed Health Inform. 2016-12-29.

10. Yan-Yan Song, Ying Lu. Decision tree methods: applications for classification and prediction. Shanghai Arch Psychiatry. 2015-04-25.
