
A Systematic Review of Artificial Intelligence Techniques in Cancer Prediction and Diagnosis.

Yogesh Kumar1, Surbhi Gupta2, Ruchi Singla3, Yu-Chen Hu4.   

Abstract

Artificial intelligence has aided in the advancement of healthcare research. The availability of open-source healthcare statistics has prompted researchers to create applications that aid cancer detection and prognosis. Deep learning and machine learning models provide a reliable, rapid, and effective solution to deal with such challenging diseases in these circumstances. PRISMA guidelines were used to select the articles published on Web of Science, EBSCO, and EMBASE between 2009 and 2021. In this study, we performed an efficient search and included the research articles that employed AI-based learning approaches for cancer prediction. A total of 185 papers are considered impactful for cancer prediction using conventional machine and deep learning-based classifications. In addition, the survey also deliberates on the work done by different researchers, highlights the limitations of the existing literature, and performs a comparison using various parameters such as prediction rate, accuracy, sensitivity, specificity, dice score, detection rate, area under the curve, precision, recall, and F1-score. Five investigations have been designed, and solutions to those were explored. Although multiple techniques recommended in the literature have achieved great prediction results, cancer mortality has still not been reduced. Thus, more extensive research is required to deal with the challenges in the area of cancer prediction. © CIMNE, Barcelona, Spain 2021.

Entities:  

Year:  2021        PMID: 34602811      PMCID: PMC8475374          DOI: 10.1007/s11831-021-09648-w

Source DB:  PubMed          Journal:  Arch Comput Methods Eng        ISSN: 1134-3060            Impact factor:   8.171


Introduction

The word cancer comes from the ancient Greek karkinos (καρκίνος), which means crab and tumor. Cancer was introduced to the medical world in the 1600s and is associated with abnormally growing cells that can invade or spread to other parts of the body [136]. The uncontrolled growth of cells starts from a site in the human body and further spreads to other body parts, a process known as cancer metastasis [43, 172]. Cancer cells are categorized into benign and malignant cells. Benign cells do not spread to other parts, while malignant cells metastasize and are considered more destructive. Due to the high mortality and recurrence rate, the treatment process is very long and costly. There is a need to diagnose cancer accurately and early to enhance patients' survival rates. It is a genetic disease triggered by mutations in the genes that control our cells' function, especially how they grow and divide. As the tumor cells continue to grow, additional changes will occur. In a nutshell, cancer cells have more genetic changes, such as mutations in DNA, than normal cells [110, 116]. Though the immune system generally discards damaged or abnormal cells from the body, a few cancer cells can hide from the immune system. The tumor also uses the immune system to grow and stay alive [179]. The name of the cancer type is based on the site where the tumor cells first grow; for example, cancer that arises in the lungs and spreads to the liver is still called lung cancer. Cancer diagnosis includes three predictive tasks: cancer risk assessment, cancer recurrence prediction, and cancer survivability prediction. Initially, the probability of cancer occurrence is assessed, followed by the second step, predicting cancer recurrence. The last step is to predict aspects like progression, life expectancy, tumor-drug sensitivity, and survivability [95].

Motivation

The motivation behind this research is the rapid growth in cancer incidence and mortality cases worldwide [10]. The reasons are complex but reflect both aging and growth of the population and changes in the prevalence and distribution of the main risk factors for cancer. Figure 1 depicts the cancer incidence cases and death statistics reported by the American Cancer Society and other reliable resources.
Fig. 1

Estimated number of new cases and deaths in 2020 for common cancer types (www.cancer.net)

Multiple investigations have been done in cancer research; for example, Rong et al. [142] conducted a mortality and survival study by gender. Dolatkhah et al. [49] presented an investigation that reported survival data and trend analysis of breast cancer in Iran. Goodarzi et al. [65] presented an assessment based on descriptive cross-sectional cancer studies. Azamjah et al. [13] aimed to determine the 25-year breast cancer mortality rate in 7 super regions defined by the Institute for Health Metrics and Evaluation (IHME). Momenimovahed et al. [115] presented a study that determined that breast cancer incidence varies significantly with race and ethnicity and is higher in developed countries. Haggar et al. [66] presented an examination of the incidence, mortality, and survival rates for colorectal cancer, with attention paid to regional variations and changes over time. Zhang et al. [184] led an investigation to gather colorectal cancer incidence data from the Cancer Incidence in Five Continents database. Wong et al. [174] observed a positive correlation between incidence and country-specific socio-economic development. Nguyen et al. [124] summarized the diagnosis and treatment of thyroid cancer, with recommendations from the American Thyroid Association regarding thyroid nodules and differentiated thyroid cancer. Lee et al. [176] stated that from March 18 to April 26, 2020, 800 patients with a diagnosis of cancer and symptomatic COVID-19 were analyzed; 412 (52%) patients had a mild COVID-19 disease course, 226 (28%) patients died, and the risk of death was significantly associated with advancing patient age. Al-Zhou et al. [6] evaluated the demographic characteristics and histological trends of skin cancer in southern areas of Yemen.
Artificial Intelligence (AI) is one of the exceptional achievements of computer science, conceived around the 1940s [5, 130]. AI has marked its significance in advanced clinical diagnostics by providing unique opportunities to incorporate its tools into the healthcare area [4, 131]. AI aims to analyze the associations between treatment techniques and patient outcomes. In cancer research, AI has proved its potential to affect several facets of cancer therapy, improved the accuracy and speed of diagnosis, and provided more reliable clinical decisions, leading to better health outcomes [182, 183]. AI provides an unprecedented level of cancer prediction accuracy, higher than that of a general statistical expert [152, 180]. Thus, AI-based cancer detection models can assist health centers and help medical experts confirm their medical verdicts without obstruction. Hence, this article aims to highlight the contributions made by researchers in the field of artificial intelligence techniques for the early detection and diagnosis of cancer.

Contribution and Organization of Paper

We conducted an extensive survey of the conventional machine and deep learning models proposed in cancer research. The paper presents a comparative analysis of the existing research works that use AI-based techniques, medical imaging, and automated analysis for cancer diagnosis. Most of the techniques proposed in the different papers were based on the deep learning framework and provided appreciable prediction outcomes. The paper provides a description of cancer complications and clinical applications, cancer classification using AI-based techniques, the role of deep learning in cancer research, limitations of cancer prediction using automated learning, multiple investigations, and challenges corresponding to cancer research using AI-based techniques. The rest of the paper is organized as follows. Section 2 elaborates the research methodology; this section discusses the approach used for selecting the literature. Section 3 highlights the cancer complications and clinical applications. Section 4 presents the reported work, which covers the deep learning perspective in cancer; this section further discusses the comparative analysis, including the challenges of the current work and performance evaluation using various parameters. Section 5 delivers a thorough discussion; all the investigations are discussed in this section. Section 6 concludes the paper and discusses future directions.

Research Methodology

We conducted this systematic review under the PRISMA guidelines [40]. We performed an efficient search for research articles on three different electronic databases, i.e., Web of Science, EBSCO, and EMBASE. These are all openly available web indexes that list the full text or metadata of academic writings. The articles were selected using the query ((Artificial Intelligence) or (Cancer Diagnosis) or (Early Detection) or (Machine Learning) or (Deep Learning)). The exclusion and inclusion criteria used to select the articles are discussed in Sect. 2.1. Figure 2 presents the PRISMA flowchart depicting the detailed screening of the collected papers.
Fig. 2

PRISMA flow chart

The articles published from 2009 to April 2021 have been included in this study. A total of 350 studies were initially retrieved, and after removing duplicates, 275 studies remained. Subsequently, studies focused on diseases other than cancer, on treatment and surgery, or written in a language other than English were excluded, leaving 210 papers. After this phase, the complete articles were evaluated, and the research articles that used methods other than AI-based techniques were also excluded from further analysis. Finally, the 185 selected articles were analyzed in the study.

Investigations

Investigation 1: Which learning approach has provided appreciable prediction outcomes extensively?
Investigation 2: Which cancer site and training data have been explored most extensively?
Investigation 3: In which year have most of the cancer prediction studies been published?
Investigation 4: Which sorts of images have attained the highest prediction accuracy?
Investigation 5: What are the challenges faced by the researchers in the construction of AI-based prediction models?

Cancer Complications and Clinical Applications

The DNA present inside a cell is packaged into a vast number of individual genes and contains instructions that direct the cell's functions [15]. DNA mutations are the reason for cancer development. Normally functioning cells ultimately turn cancerous due to errors introduced in a multistage process [104, 185]. Figure 3 shows different factors that affect the spread of cancers. Tobacco, alcohol, improper diet, and physical inactivity are the leading cancer risk factors worldwide. Some chronic infections are also risk factors for cancer and have major significance in low- and middle-income countries.
Fig. 3

Causes of cancers [26]


Cancer Complications

While undergoing cancer treatment, a patient can experience many complications that affect their health. Not all cancers are painful, but patients may still experience some pain during treatment; there are, however, medications and other approaches that help treat cancer-related pain [129, 184]. During cancer, one can experience fatigue and many other symptoms, but usually it is manageable [3]. Tiredness happens because of radiation therapy or chemotherapy treatments; however, it is generally short-term. Breathing difficulty is another complication caused by cancer or cancer treatment [120], although treatments may bring relief. Some types of cancer and some cancer treatments can lead to nausea [34]. Cancerous cells deprive normal cells of required nutrients, which may ultimately cause a loss in weight. Even if nutrients are provided artificially via tubes into the vein or stomach, this often does not halt the weight loss [21, 169]. Cancer can also cause severe complications by upsetting the body's normal chemical balance. Frequent urination, confusion, excessive thirst, and constipation might be signs and symptoms of such chemical imbalances [46]. In some instances, cancer can affect the body's immune system, causing it to attack normal, healthy cells. Paraneoplastic syndrome, a very uncommon reaction, can bring on several signs and symptoms such as difficulty walking and seizures [7]. Cancer can immensely affect the functioning of a body part by pressing on nearby nerves; if it involves the brain, it can cause headaches, stroke-like signs and symptoms, and weakness on one side of the body [47]. Even a successfully defeated cancer may offer only temporary relief, because cancer survivors always remain at risk of recurrence [36]. So, the patient needs to hear from the doctor about the precautions.

Clinical Applications

Doctors can develop a follow-up plan consisting of scans and examinations at regular fixed intervals (months or years) after the patient's treatment. In radiation treatment, cancerous cells are targeted [30, 54]. A significant fraction of cancer cases and deaths could be prevented through an excellent epidemiological and mechanistic understanding of environmental and behavioral risk factors. Cancer therapeutics presently have the lowest clinical trial success rate of any major disease. Due to the scarcity of successful anti-cancer drugs, malignant growth will be a leading source of mortality in developed nations. As a disease embedded in the fundamentals of our biology, cancer presents difficult challenges that would benefit from joining specialists from a wide cross-section of related and unrelated fields [55]. Along with causes, there are factors for identifying the initial stage of cancer. Diagnosing cancer at an early stage ultimately leads to higher survival rates, less morbidity, and less expensive treatment [27]. Three essential steps need to be taken in a well-timed way: (1) alertness and preventive care; (2) medical evaluation, analysis, and staging; (3) access to therapeutics. The relevance of early diagnosis is high in every situation and for most cancers. Programs can be formulated to lessen delays in and obstructions to care, letting patients gain treatment well in time [31].

Current methodologies applied in the medical sector for cancer prediction

This section describes the clinical practices currently applied in the medical sector for cancer prediction and treatment. The methodologies are as follows:
Screening: Screening aims to find people with a particular cancer or pre-cancer who have not yet developed any symptoms and direct them quickly for analysis and treatment. For specific types of cancer, screening can be effective when tests are used according to the need and stage [149]. Moreover, screening is a more complicated process to follow than early diagnosis, and it is of utmost necessity for an accurate diagnosis [10]. Each type of cancer needs a unique treatment schedule that includes one or more modalities, such as chemotherapy, surgical procedures, and radiotherapy [16]. The main aim is to treat the tumor and significantly extend lifespan, because improving a patient's quality of life is also an essential target [28].
Chemotherapy: The main aim of chemotherapy is to kill cancerous cells with the help of medications that target rapidly dividing cells. The drugs used to shrink tumors can have dangerous side effects [71].
Hormone therapy: Hormone therapy works by changing the levels of certain hormones in the body. Hormones play a substantial role among people suffering from prostate or breast cancers [53].
Immunotherapy: Immunotherapy aims to strengthen the body's immune system to fight against cancerous cells. Checkpoint inhibitors and adoptive cell transfer are some examples of immunotherapy [150].
Personalized medication: Personalized medication is a newly developed approach that uses genetic testing to determine a suitable treatment for a specific cancer. However, it is yet to be proven whether personalized medication can treat all kinds of cancers [24].
Radiation treatment: Radiation therapy kills cancerous cells or slows their growth by damaging their DNA. Medical experts often recommend this treatment to shrink tumors or minimize cancer symptoms before surgery [89].
Stem cell transplant: Stem cell transplant is helpful for cancers related to blood, such as leukemia or lymphoma. The process involves replacing red blood cells (RBCs) and white blood cells (WBCs) that have been destroyed by chemotherapy [34].
Surgery: Surgery is primarily performed to remove cancerous cells. It is also used to limit the spread of the disease by removing lymph nodes [48].
Targeted therapies: Targeted therapies are used to avoid the spread of cancer and improve immunity. Small-molecule drugs and monoclonal antibodies are examples of targeted therapies [90].

Related Work

Over the last couple of years, artificial intelligence has captured society's imagination and created interest in its potential to improve our lives [91]. The usage of AI has been increasing rapidly to improve disease recognition, disease management, and the evaluation of therapies, because of the growing number of patients diagnosed with cancer and the ample amount of data gathered during the treatment process [77, 119]. This leads to the need for AI to improve oncologic care, as cancer prediction can diminish the mortality rate [57, 118]. This section covers cancer diagnosis based on deep learning methods, medical imaging for cancer, the mortality rate for different cancers, cancer datasets, and automated and semi-automated methods for cancer detection.

Artificial Intelligence in Medical Imaging for Cancer Diagnosis

In clinical imaging, computer-aided detection (CADe) and computer-aided diagnosis (CADx) are system-based frameworks that help specialists make decisions rapidly [70]. Medical imaging provides the image data that clinical specialists need to assess and examine for abnormality within a short timeframe [182, 183]. Clinical images processed with AI strategies can advance the accuracy of identifying various cancer growth stages [121]. In this way, clinical imaging is a robust method for early malignancy determination and recognition. Without a doubt, clinical imaging has been widely utilized for early malignancy discovery, monitoring, and follow-up after treatment [44, 101, 102]. Figure 4 shows different kinds of scans used for cancer diagnosis. A computed tomography (CT) scan can help doctors diagnose cancer and determine the shape and size of the tumor. Nuclear medicine scans can help medical experts determine cancer metastasis; the most common nuclear scans are bone scans, PET (positron emission tomography) scans, thyroid scans, MUGA (multigated acquisition) scans, and gallium scans. MRI assists specialists in discovering malignancy in the body and searching for signs that it has spread, and it can also help specialists plan cancer therapy, such as surgery or radiation. Mammograms are low-dose x-rays that can help discover breast disease. Detection of cancer usually includes radiological imaging that examines the extent of cancer and improvement after treatment. Oncological imaging is constantly becoming more wide-ranging and precise [95]. Suberi et al. [162] proposed an image-based computer-aided system for cancer immunotherapy. The proposed approach enhanced the preparation of the vaccine with Dendritic Cell (DC) immunotherapy. The study incorporated various image-based algorithms into the system with low computational time.
Fig. 4

Types of imaging for cancer test

Nirupama and Damodhar [126] predicted lung cancer using MRI scans (DICOM images). Win et al. [171] developed a computer-aided decision system to detect cancer cells in cytological pleural effusion images. Initially, median filtering and intensity adjustment were applied to enhance the quality of the picture. They used a hybrid segmentation method to extract cell nuclei based on simple linear iterative clustering and K-means clustering. In the K-means clustering algorithm, the error of each data point is computed using the Euclidean distance between the data point and the nearest centroid, and the total sum of squared errors is then computed, as shown in Eq. (1): $J = \sum_{j=1}^{k}\sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2$. In Eq. (1), $J$, $k$, and $n$ represent the objective function, the number of clusters, and the number of cases, respectively. Also, $x_i^{(j)}$ represents case $i$ of cluster $j$, and $c_j$ is the centroid for cluster $j$. Another distance metric used in K-means clustering is cosine similarity, expressed mathematically in Eq. (2): $\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$. In Eq. (2), $\|A\|$ and $\|B\|$ are the Euclidean norms of vector $A$ and vector $B$, respectively. Rosalidar et al. [140] presented the asymmetrical thermal distribution on breast thermograms using computer-assisted technology. The reported work has shown that current neural learning models have increased the classification accuracy of breast cancer thermograms. Taher et al. [165] worked on a CAD system to diagnose lung cancer. They used a database of 100 sputum color images of different patients collected from the Tokyo Centre of Lung Cancer. The new CAD system processed the sputum images and classified them into benign or cancerous cells. Another factor observed in the study was the superior performance of Bayesian classification over rule-based heuristic classification. The Bayesian algorithm works by computing posterior probabilities as shown in Eq. (3): $P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$. In Eq. (3), $P(c)$ and $P(x)$ are the prior probabilities of the class and the predictor, respectively, while $P(c \mid x)$ and $P(x \mid c)$ denote the posterior probability of target $c$ given predictor $x$ and the probability of $x$ given $c$.
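The quantities in Eqs. (1)–(3) can be sketched in a few lines of Python. This is a minimal illustration, not code from the surveyed papers; the function names and the argument conventions are ours.

```python
import math

def squared_euclidean(x, c):
    """Squared Euclidean distance between a data point x and a centroid c."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def kmeans_sse(points, centroids, assignment):
    """Objective J of Eq. (1): total sum of squared errors of every
    point measured against its assigned cluster centroid."""
    return sum(squared_euclidean(p, centroids[assignment[i]])
               for i, p in enumerate(points))

def cosine_similarity(a, b):
    """Eq. (2): dot product divided by the product of the Euclidean norms."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    return dot / (math.sqrt(sum(ai * ai for ai in a)) *
                  math.sqrt(sum(bi * bi for bi in b)))

def bayes_posterior(prior_c, likelihood, evidence):
    """Eq. (3): posterior P(c|x) = P(x|c) * P(c) / P(x)."""
    return likelihood * prior_c / evidence
```

For instance, two points assigned to one centroid midway between them each contribute their squared distance to $J$, and two orthogonal vectors have cosine similarity 0.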
Naeem et al. [117] introduced machine learning (ML) strategies for liver cancer classification using a fused dataset of two-dimensional (2D) computed tomography (CT) and magnetic resonance imaging (MRI). A combination of the MRI and CT-scan datasets produced the fused, optimized hybrid-feature dataset. The MLP showed a promising accuracy of 99% among all the deployed classifiers. Kalaiselvi et al. [80] have also proposed a fuzzy c-means method to detect brain tumors automatically from T2-weighted MRI brain images using the principle of modified minimum error thresholding (MET). Lee et al. [99] surveyed the most widely recognized disease types, particularly breast cancer, prostate cancer, lung cancer, and skin cancer. A newly proposed distributed computing structure has motivated specialists to build on current work in image-based disease investigation and develop a more flexible CAD framework for detection. The authors of [87] introduced an edge technique for segmenting mammographic images to identify breast malignancy in its early phases. The authors of [127] evaluated a computer-aided diagnosis (CADx) system for lung nodule classification. The retrospective study combined hand-crafted imaging features with machine learning algorithms and compared support vector machine (SVM) and gradient tree boosting (XGBoost) classifiers. Boosting classifiers work by first computing the weighted error of each weak learner over the misclassified instances, as shown in Eq. (4), and then increasing the weight of the misclassified instances in the next round: $\epsilon_t = \sum_{i=1}^{s} w_i \, I\!\left(h_t(x_i) \neq y_i\right)$, for $t = 1, \dots, T$. Here, $\epsilon_t$ denotes the error, $w_i$ is the weight associated with each instance, $s$ is the size of the dataset, and $T$ denotes the number of weak learners. The hypothesis $h_t$ for each of the $s$ instances is evaluated under the condition function $I(\cdot)$.
The weight update formula is given in Eq. (5): $w_i \leftarrow w_i \exp\!\left(\alpha_t \, I\!\left(h_t(x_i) \neq y_i\right)\right)$, where $\alpha_t$ is the coefficient assigned to weak learner $t$.
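The survey does not specify the exact boosting variant, so the sketch below assumes the standard AdaBoost form of the error and weight-update steps: one round computes the weighted error of a weak learner's predictions and then up-weights the misclassified instances for the next round. The function name and the final renormalization are our additions.

```python
import math

def boosting_round(weights, predictions, labels):
    """One AdaBoost-style round: weighted error of the weak learner
    (the Eq. (4) step), then exponential up-weighting of misclassified
    instances (the Eq. (5) step), followed by renormalization."""
    miss = [int(p != y) for p, y in zip(predictions, labels)]
    # Weighted error of the current weak learner.
    err = sum(w * m for w, m in zip(weights, miss)) / sum(weights)
    alpha = 0.5 * math.log((1 - err) / err)  # learner coefficient
    # Increase the weight of every misclassified instance.
    new_w = [w * math.exp(alpha * m) for w, m in zip(weights, miss)]
    total = sum(new_w)
    return [w / total for w in new_w], alpha
```

With uniform starting weights and one of four instances misclassified, the error is 0.25 and the misclassified instance carries a larger share of the weight in the next round.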

Deep learning methods for cancer detection

Deep learning is a sub-field of machine learning, which falls under artificial intelligence. Deep learning is a technique that learns features from data such as text, pictures, or sound, and it is one of the most significant branches of AI [101, 102]. Traditional machine learning methodologies require several steps to achieve the classification task, including pre-processing, feature extraction, careful selection of features, learning, and classification [113]. The performance of these systems is strongly dependent on the chosen features, which may not be the right features to separate the classes. In contrast, deep learning enables automated learning of the feature sets for different tasks instead of the standard hand-engineering approach, and it can achieve learning and classification in one shot [114]. Figure 5 shows the deep learning methods for cancer diagnosis and detection by analyzing medical imaging in different steps. This section discusses the use of various deep learning models, such as auto-encoders, transfer learning, Convolutional Neural Networks, Gradient Descent, Generative Adversarial Networks, and Boltzmann Machines, for cancer diagnosis and detection. Yu et al. [178] built a knowledge-based discovery technique that utilized deep learning strategies for lincRNA detection and DNA genome analysis [82], then validated the annotated lincRNA transcription sites and tested the performance of the deep learning strategy by comparison with conventional procedures. For the primary objective, the auto-encoder method accomplished a 100% rate.
Fig. 5

Deep learning process for cancer diagnosis [1]

An auto-encoder strategy consists of three primary steps, as demonstrated in Fig. 6: building, pre-training, and validating. The fundamental design, including an input layer, hidden layer, and activation functions, is built in the first step. Then, the encoder and the decoder are trained layer by layer following the prepared cycles. Thirdly, fine-grained training/validation is performed through the whole model. In other words, the first step develops the fundamental structure of the deep neural network, the second trains the layer-wise nodes, and the last one passes through all layers for validation. Brosch et al. [35] described a method that learned from 3D brain images using a deep belief network; their approach took low computational time and less memory. Kadam et al. [79] also proposed feature ensemble learning based on Sparse Auto-encoders and Softmax Regression for classification of breast cancer into benign (non-cancerous) and malignant (cancerous). An auto-encoder consists of an encoder part and a decoder part: an artificial neural network trained using unsupervised learning that applies the back-propagation approach. A Sparse Auto-encoder (SA) is an auto-encoder with sparseness constraints imposed on all hidden nodes through a sparse penalty term. The cost function for training a Sparse Auto-encoder, given by Eq. (6), includes three terms: $J_{\mathrm{sparse}} = \frac{1}{N}\sum_{i=1}^{N}\left\|x_i - \hat{x}_i\right\|^2 + \lambda\|W\|^2 + \beta\sum_{j}\mathrm{KL}\!\left(\rho \,\middle\|\, \hat{\rho}_j\right)$. The first term is the mean squared error, which measures the discrepancy between the input and its reconstruction over the whole training data; the second is a weight-decay term; and the third penalizes hidden activations $\hat{\rho}_j$ that deviate from the target sparsity $\rho$.
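A minimal numerical sketch of such a three-term sparse auto-encoder cost follows, assuming the standard form (reconstruction error plus weight decay plus a KL-divergence sparsity penalty); the parameter names `lam`, `beta`, and `rho` and their default values are illustrative, not taken from the paper.

```python
import math

def sparse_ae_cost(x, x_hat, weights, rho_hat, lam=1e-3, beta=3.0, rho=0.05):
    """Three-term sparse auto-encoder cost: reconstruction MSE,
    L2 weight decay, and a KL-divergence sparsity penalty over the
    average hidden activations rho_hat."""
    n = len(x)
    mse = sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat)) / n
    decay = lam * sum(w * w for w in weights)
    sparsity = beta * sum(
        rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
        for r in rho_hat)
    return mse + decay + sparsity
```

When the reconstruction is exact, the weights are zero, and every hidden unit's average activation equals the target sparsity, all three terms vanish and the cost is zero.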
Fig. 6

Working of auto-encoder method [126]

Mean Squared Error computes the average squared difference between the predicted and the actual values. MSE is expressed mathematically in Eq. (7): $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$, where $y$ and $\hat{y}$ are the vectors of observed and predicted values, respectively. Li [100] also proposed a practical and self-interpretable invasive cancer diagnosis solution for the diagnosis of breast cancer. Also, Krithiga et al. [88] carried out a systematic review on breast cancer that focused on the call for specific action in the diagnostic processes. Similarly, Bulten et al. [32] and Sajja et al. [145] proposed a deep neural network based on GoogleNet with a maximum dropout ratio to moderate the processing time for detection of lung cancer using CT scan images. In the proposed approach, 60% of the neurons at the fully connected layer are dropped, which is a higher drop rate than in the existing GoogleNet. Experiments were conducted using three pre-trained CNN architectures, AlexNet, GoogleNet, and ResNet50, on the pre-processed LIDC dataset. ResNet50 produced higher accuracy than the other pre-trained architectures and the state-of-the-art methods. The main components working behind a deep learning architecture are the "neurons". As shown in Eq. (8), each neuron computes a weighted sum of its input vector, where $q$ denotes the column vector of weights: $z = q^{T}x$. Further, a bias $b$, which gets updated with each iteration, is added to adjust the output, as shown in Eq. (9): $z = q^{T}x + b$. The functioning of layer $k$ is explained in Eq. (10), where $g$ is the non-linear activation function: $a^{(k)} = g\!\left(W^{(k)} a^{(k-1)} + b^{(k)}\right)$. The output of each neuron is then computed by applying the activation, as shown in Eq. (11): $a = g(z)$. Kassani et al. [78] proposed a successful deep learning-based technique utilizing a DCNN descriptor and pooling operations to characterize breast malignancy. The authors also utilized diverse data augmentation strategies to boost the classification performance and explored the impact of various stain normalization strategies.
The proposed approach using the pre-trained Xception model accomplished 92.50% classification accuracy. Chen et al. [37] proposed a transfer learning-based snapshot ensemble (TLSE) strategy by integrating snapshot ensemble learning with transfer learning in a unified and coordinated manner. Snapshot ensembling provides ensemble benefits within a single model training procedure, while transfer learning addresses the small-sample problem in cervical cell classification. Figure 7 portrays the transfer learning-based snapshot ensemble strategy for cervical cell classification. The TLSE technique was assessed on a pap-smear dataset called the Herlev dataset and was demonstrated to have several superiorities over existing strategies. It shows that TLSE can improve accuracy with just one training procedure for small samples in fine-grained cervical cell classification. Alzubaidi et al. [9] introduced a hybrid deep convolutional neural network to classify hematoxylin–eosin-stained breast biopsy images into four classes: invasive carcinoma, in-situ carcinoma, benign tumor, and normal tissue. The model combined two ideas: parallel convolutions with various filter sizes and residual connections. The foundational design of the proposed model has as prominent attributes a superior feature representation and the combination of features at multiple levels. This study achieved a precision of 90% in predicting breast cancer. Sasikala et al. [151] performed the detection of skin cancer lesions as malignant (melanoma) or benign using a CNN. The system's performance was evaluated using the accuracy and error rate with varying learning rates. Hosny et al. [76] introduced an automated skin lesion classification framework with a higher classification rate utilizing transfer learning and a pre-trained deep neural network. Transfer learning was applied to AlexNet in various ways, including replacing the classification layer with a softmax layer.
The performance of the framework was measured on the ISIC dataset, achieving 93% precision. Nivaashini and Soundariya [128] proposed a system that uses a Deep Boltzmann Machine (DBM) to find an efficient set of features; a Deep Neural Network (DNN) classifier is then used to classify the tumor into benign or malignant breast cancer groups. The proposed system obtained a higher detection rate, 99.73%, than the conventional machine learning models.
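Returning to the building blocks above, the MSE of Eq. (7) and the per-neuron weighted sum, bias, and activation of Eqs. (8)–(11) can be sketched in a few lines of Python. ReLU is chosen here only as a representative non-linearity $g$; the survey does not fix a particular activation, and the function names are ours.

```python
def mse(y_true, y_pred):
    """Eq. (7): average squared difference between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def relu(z):
    """A representative non-linear activation g."""
    return max(0.0, z)

def dense_layer(x, W, b, g=relu):
    """Eqs. (8)-(11): each neuron computes the weighted sum of its
    inputs (one row of W), adds its bias, and applies the activation g."""
    return [g(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]
```

Stacking `dense_layer` calls, each feeding the previous layer's outputs forward, yields the layer recursion of Eq. (10).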
Fig. 7

Transfer learning-based snapshot ensemble method [37]

Figure 8 shows a typical deep learning segmentation pipeline: a convolutional neural network (CNN)-based model first compresses the source image with a stack of convolution, activation, and pooling layers; the inverse operation then expands the compressed latent representation. The network remains end-to-end trainable, and at test time a single forward pass yields the segmentation labels. Altaf et al. [1] and Gomez et al. [59] also proposed a CNN-based breast disease diagnosis technique using thermal images. The authors showed that a well-delimited dataset split method is required to reduce bias and overfitting during training, and presented studies on the DMR-IR dataset. Experimental results confirmed that the dataset split approach limits overfitting and bias during training. The authors also evaluated state-of-the-art benchmark CNN models, such as ResNet, SeResNet, VGG16, Inception, InceptionResNetV2, and Xception, on the DMR-IR dataset. Albahar [8] proposed a prediction model that classified skin lesions as benign or malignant based on a novel regularization method. The proposed model achieved a standard accuracy of 97.49%, indicating its superiority over other state-of-the-art approaches. The performance of the CNN, in terms of AUC-ROC with the embedded novel regularizer, was tested on various use cases; the area under the curve (AUC) achieved for nevus versus melanoma lesions was 77%. Ragab et al. [135] proposed a computer-aided diagnosis (CAD) framework for classifying benign and malignant mass tumors in breast mammography images. A deep convolutional neural network (DCNN) is used for feature extraction.
A well-known DCNN architecture, AlexNet, is used and fine-tuned to classify two classes instead of 1,000. The last fully connected layer is connected to a support vector machine (SVM) classifier to improve accuracy. The results were obtained using the following publicly available datasets: (1) the digital database for screening mammography (DDSM) and (2) the Curated Breast Imaging Subset of DDSM (CBIS-DDSM). The linear, polynomial, and radial basis function (RBF) kernels are expressed in Eqs. (12), (13), and (14), respectively:

K(xi, xj) = xi · xj (12)

K(xi, xj) = (xi · xj + c)^d (13)

K(xi, xj) = exp(−γ ‖xi − xj‖²) (14)

Here, xi and xj are n-dimensional inputs, c is a constant, d is the degree of the polynomial, and γ is a free parameter.
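For illustration, the three kernel functions of Eqs. (12)-(14) can be written directly in Python. The constants `c`, `d`, and `gamma` below are illustrative defaults, not values reported by Ragab et al. [135].

```python
import math

def linear_kernel(x, y):
    # Eq. (12): K(xi, xj) = xi . xj
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, c=1.0, d=2):
    # Eq. (13): K(xi, xj) = (xi . xj + c)^d
    return (linear_kernel(x, y) + c) ** d

def rbf_kernel(x, y, gamma=0.1):
    # Eq. (14): K(xi, xj) = exp(-gamma * ||xi - xj||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, y))         # 11.0
print(polynomial_kernel(x, y))     # (11 + 1)^2 = 144.0
print(round(rbf_kernel(x, y), 4))  # exp(-0.1 * 8) = 0.4493
```

The choice of kernel determines the shape of the decision boundary the SVM can learn: linear kernels give hyperplanes, while polynomial and RBF kernels allow non-linear separation of the mammogram features.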
Fig. 8

Deep learning-based CNN model for segmentation of MRI imaging [1]

Saraf and Kalpana [148] presented work on classifying benign and malignant thyroid nodules in ultrasound images. The authors performed pre-processing, segmentation, feature extraction, and classification for thyroid detection; edge detection techniques were used for segmentation, and malignant nodules were detected using an ANN. Similarly, Dov et al. [51] presented work on predicting thyroid malignancy from ultra-high-resolution whole-slide cytopathology images. A deep learning-based algorithm was used to assist the cytopathologist in diagnosing the slides. The proposed algorithm assigns local malignancy scores to the relevant image regions, which are aggregated into a global malignancy score. The reported output of the presented work using the MIL method is 0.87 area under the curve (AUC) and 0.743 average precision (AP). Ma et al. [106] also proposed a CNN for diagnosing thyroid-related diseases using SPECT images. The proposed method used a modified DenseNet architecture and an improved training method, achieving accuracies of 99.08% for Graves' disease, 99.25% for Hashimoto's disease, and 99.67% for subacute disease. Sokoutil et al. [161] presented work on detecting tumors in the thyroid gland, using image processing techniques and a simple intelligent system, the hill-climbing algorithm. Malathi et al. [107] presented a CNN method for the segmentation of brain tumors and achieved high prediction accuracy. Poudel et al. [132] compared three segmentation algorithms and proposed a Random Forest (RF) classifier and a convolutional neural network; RF and CNN yielded average Dice coefficients (DC) of 0.862 and 0.876, respectively. The RF classification method computes the information gain for a split using entropy (E), expressed mathematically in Eq. (15):

E = −Σ pn log2(pn), summed over the N classes (15)
Here, N is the number of classes (binary or multi-class) and pn is the likelihood that an instance belongs to class n. Image processing techniques have been widely used in various health sectors, especially for detecting and diagnosing cancer early. Huidrom et al. [75] used a fully automated lung segmentation method with juxta-pleural nodule inclusion, consisting of two main stages: in the first stage, the lung region was extracted (lung field extraction); in the second stage, the lungs were segmented using boundary analysis and segmentation techniques. Their proposed method was observed to yield better results than existing ones. Asideu et al. [12] proposed a technique in which features were automatically extracted and classified for acetic acid and Lugol's iodine cervigrams. The study employed various techniques for combining the features in cervigrams and used a support vector machine model to classify them. Cheng et al. [38] used a CAD system to detect and classify breast cancer in four stages: pre-processing, segmentation, feature extraction, and feature classification. Patil et al. [131] presented an automated system to build a mammogram breast detection model with improved hybrid classifiers; image processing, tumor segmentation, feature extraction, and diagnosis are the designed steps of the proposed breast cancer detection. Nasrullah et al. [122] introduced an automated multi-strategy-based lung nodule detection and classification system, which targets false-positive reduction at the early stages. Cui et al. [41] proposed a strategy to recognize lung nodules in chest CT images and improved the DICOM window display. In this experiment, nodule recognition achieved 92.65% sensitivity with 0.2468 FPs/scan.
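The entropy-based information gain that the RF classifier uses to score a candidate split (Eq. 15) can be sketched in a few lines of Python; the class labels below are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    # Eq. (15): E = -sum_n pn * log2(pn), where pn is the fraction of
    # instances belonging to class n.
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, left, right):
    # Gain of a split = parent entropy minus the size-weighted
    # entropy of the two child nodes.
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = ["benign"] * 5 + ["malignant"] * 5
left, right = ["benign"] * 5, ["malignant"] * 5  # a perfectly pure split
print(entropy(parent))                        # 1.0 bit
print(information_gain(parent, left, right))  # 1.0 (maximal gain)
```

A random forest evaluates many such candidate splits per node and keeps the one with the highest gain, which is why pure child nodes (entropy 0) are the ideal outcome of a split.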

Comparative Analysis

The comparative analysis section highlights the studies of different researchers on cancer detection using AI techniques. The prediction outcomes are classified on the basis of parameters such as accuracy, sensitivity/recall, precision, specificity, Dice score, and area under the curve. Figure 9 describes the multiple evaluation parameters.
Fig. 9

Evaluation parameters

Table 1 comprises the comparative analysis based on multiple evaluation parameters for various cancer types.
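As a reference for the parameters reported in Table 1, the common evaluation metrics can be derived from a binary confusion matrix; the counts below are illustrative. Note that for binary classification the Dice score coincides with the F1-score.

```python
def metrics(tp, fp, tn, fn):
    # Evaluation parameters commonly reported in the surveyed studies,
    # computed from true/false positives and negatives.
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)          # recall / true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    precision = tp / (tp + fp)            # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    dice = 2 * tp / (2 * tp + fp + fn)    # equals F1 for binary labels
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "f1": f1, "dice": dice}

m = metrics(tp=8, fp=1, tn=9, fn=2)
print(m["accuracy"], m["sensitivity"], m["specificity"])  # 0.85 0.8 0.9
```

AUC is the exception: it is computed from the ranking of prediction scores across all thresholds rather than from a single confusion matrix, which is why studies report it separately.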
Table 1

Comparative analysis using AI techniques for different cancers

| Authors | Cancer type | Training data | Techniques | Challenges | Reported outcomes |
|---|---|---|---|---|---|
| Sudharani et al. [163] | Brain | MRI images | Fuzzy C-means | Small and unstructured data were not used, restricting generality and clinical applicability | Accuracy = 89.2%; Sensitivity = 88.9%; Specificity = 90% |
| Mohsen et al. [112] | Brain | Brain MR images | DNN with PCA and DWT (discrete wavelet transform) | The technique is complex, requiring a large number of processors to execute the data | Prediction rate = 96.7%; Precision = 97% |
| Dong et al. [50] | Brain | BRATS 2015 | Deep CNN | Can be improved by adding multi-institutional and longitudinal datasets | Complete tumor region = 88% |
| Sobhaninia et al. [156] | Brain | Brain MR images | CNN | Can be extended with instance segmentation for detecting the tumor in the image | Dice score = 79% |
| Malathi et al. [107] | Brain | BRATS 2015 | CNN with TensorFlow | New methodologies are needed to segment tumor images and perform accurate delineation in radiotherapy | Dice coefficient = 0.73; Advancing tumor = 0.76; Sensitivity = 0.82 |
| Alam et al. [17] | Brain | MRI images | Template-based K-means | The features used for enhancing accuracy and detection can be improved | Tumor detection = 97.43% |
| Devi et al. [45] | Brain | MRI images | Radial basis function network (RBFN) | Cannot predict the progressive growth of tumor cells | Energy = 0.1743; Homogeneity = 0.9300; Contrast = 0.2450 |
| Kalaiselvi et al. [80] | Brain | MRI brain images | Modified MET (minimum error thresholding) | Can be improved by incorporating more datasets | Predictive accuracy (PA) = 97.6%; Dice coefficient (DC) = 67.9% |
| Al-Ayyoub et al. [18] | Brain | MRI images | Neural network; J48; Naïve Bayes; Lazy-IBk | Failed to predict complex features | Accuracy = 66.6% (NN), 59.2% (J48), 59.2% (Naïve Bayes), 62.9% (Lazy-IBk) |
| Kaur et al. [82] | Breast | Mammogram breast images | SVM; deep neural network; K-means clustering | Accuracy can be improved by working on large-scale deep learning internal layers, helping radiologists validate data in less time | Accuracy = 92%; Specificity = 90%; Sensitivity = 93%; F-score = 96% |
| Bidard et al. [29] | Breast | Mammogram breast images | CTC cell search system | Can be enhanced by developing and validating new bio-clinical prognostic indices through pooled future trials | Sensitivity = 55%; Specificity = 81%; Accuracy = 77% |
| Patil et al. [131] | Breast | Mammogram images | CRNN; FC-CSO | Did not work with blurred images; a Wiener filter could improve this | Accuracy = 98.4%; Specificity = 99.9%; F1-score = 74.5% |
| Eleyan et al. [52] | Breast | Wisconsin Breast Cancer Datasets | KNN | Failed to work with large datasets | Accuracy = 97.51% |
| Nallamala et al. [123] | Breast | Mammogram images | CNN; logistic regression | Can be improved by working on a larger number of datasets | Precision = 98.5% |
| Assiri et al. [14] | Breast | Wisconsin Breast Cancer Dataset | Multilayer perceptron; logistic regression; stochastic gradient descent | Failed to perform accurate segmentation; semantic or instance segmentation could help | Accuracy = 99.42%; Precision = 0.9940 |
| Saha et al. [144] | Breast | DCE-MR images | Multivariate machine learning models | The algorithms need image-controlled conditions with uniform scanning and contrast protocols | AUC = 0.771 |
| Abdallah et al. [2] | Breast | Mammography images | Segmentation techniques | The segmentation techniques should be escalated to improve accuracy | Matching ratio = 96.3 ± 8.5 |
| Mejia et al. [111] | Breast | Mammography images | KNN | Classification accuracy should be enhanced | Accuracy = 94.44% |
| Qayyum et al. [133] | Breast | Digital mammograms | SVM; gray-level co-occurrence matrix (GLCM) features | Interpreting the final model is challenging because of the high-dimensionality matrix | Accuracy = 96.55%; Sensitivity = 96.97%; Specificity = 96.29% |
| Ragab et al. [135] | Breast | Mammography images | SVM | Accuracy should be improved by working with a larger number of datasets | Accuracy = 87.2%; AUC = 94% |
| Win et al. [171] | Cervical | Pap smear images | Bagging ensemble | Produces false-negative results because it fails to detect specific abnormalities in Pap smear images | Accuracy = 98.27% |
| Wu et al. [173] | Cervical | Pathological images | CNN | Accuracy can be improved by incorporating more training data | Accuracy = 93.33% |
| Alyafeai et al. [20] | Cervical | Cervigram images | CNN | Accuracy should be improved to increase efficiency | AUC score = 0.82; Accuracy = 0.68 |
| Gupta et al. [62] | Cervical | Pap smear images | ANN | Accuracy can be improved further | Accuracy = 78% |
| Kurnianingsih et al. [92] | Cervical | Herlev Pap smear dataset | R-CNN | Requires high processing power; should be extended with a deeper network to improve performance | Accuracy = 95%; Sensitivity = 96% |
| Rudra et al. [141] | Cervical | Pap smear images | K-nearest neighbor | Failed to classify and detect abnormalities in the image | Accuracy = 98.31% |
| Sajenna et al. [143] | Cervical | Pap smear images | SVM | The classification technique did not include high-dimensional data | Accuracy = 93.78%; Sensitivity = 98.96%; Specificity = 96.69% |
| Hoerter et al. [73] | Colorectal | ImageNet database | CNN | Restricted to detecting polyps smaller than 10 mm | Per-polyp sensitivity = 71% |
| Shin et al. [159] | Colorectal | Polyp images and videos | Deep CNN | Detection processing time should be improved | Detection processing time = 0.39 s |
| Figueiredo et al. [56] | Colorectal | PillCam COLON2 capsule-based images and videos | Image processing | Worked with a limited number of videos and frames, which should be increased for better prediction outcomes | P-value higher than 500 |
| Godkhindi et al. [58] | Colorectal | CT images | CNN | Polyp detection accuracy needs to be improved | Polyp detection accuracy = 88% |
| Zhang et al. [182, 183] | Colorectal | Endoscopic images | CNN | The RoI of each polyp was selected manually and should be automated | Accuracy = 85.9%; Precision = 87.3%; Recall = 87.6% |
| Yamada et al. [175] | Colorectal | Polyp images and videos | Deep learning | Lacked robustness, limiting the utility of a computer-aided diagnosis system | Specificity = 97.3% |
| Santini et al. [147] | Kidney | KiTS19 | CNN | New training strategies and an additional stage for more detailed local features are needed | Mean Dice score = 0.96 |
| Tabibu et al. [164] | Kidney | Renal cell carcinoma | CNN | Had data-imbalance issues | Accuracy = 92.61% |
| Ali et al. [19] | Kidney | miRNA dataset | LSTM | Further clinical studies must validate the effectiveness of the selected miRNAs | Accuracy = 95% |
| Han et al. [67] | Kidney | Renal cell carcinoma | DNN | Accuracy, sensitivity, and specificity should be improved | Accuracy = 85%; Sensitivity = 64% to 98%; Specificity = 83% to 93% |
| Skalski [160] | Kidney | CT images | Vascular tree (RUSBoost and decision trees) | Feature selection and image segmentation need improvement | Accuracy = 92.1% |
| Chlebus et al. [40] | Liver | CT images | Deep CNN | More work is needed to match the performance of human experts | Detection rate = 77% |
| Le et al. [96] | Brain | BraTS 2018 | CNN; random forest regression | More datasets should be added to increase the prediction rate | Predicts the survival rate |
| Wang et al. [67] | Liver | CT images | RT-PCR (polymerase chain reaction) | Sensitivity and specificity can be improved further | AUC = 80.3%; Sensitivity = 75%; Specificity = 75% |
| Das et al. [42] | Liver | CT images | DNN | Failed to calculate the lesion's volumetric size, hampering efficiency | Accuracy = 99.38% |
| Raj et al. [137] | Liver | CT images | SVM | Restricted access to large datasets hindered efficiency | Accuracy more than 80% |
| Rajkumar et al. [138] | Liver | CT images | SVM | Accuracy, precision, computational speed, automation, and reduction of manual interaction remain to be improved | Accuracy = 98% |
| Bach et al. [25] | Liver | CT images | LDCT | Uncertainty remains about the potential harms of screening and the generalizability of results | Accuracy = 80% |
| Kang et al. [81] | Liver | CT images | Neural network; fuzzy neural network | Lacked sufficient accuracy for clinical application | Accuracy = 79.19% |
| Gruber et al. [60] | Liver | CT liver images | DNN | An accurate minimization strategy for the joint loss function and an improved deep learning classification algorithm are needed | Accuracy = 99.9% |
| Shakeel et al. [154] | Lung | CT images | Improved DNN | Can be improved by adding more datasets | Accuracy = 96.2%; Specificity = 98.4%; Precision = 97.4% |
| Asuntha and Srinivasan [20] | Lung | CT images | CNN; fuzzy particle swarm optimization (FPSO) | Failed to classify the disease as benign or malignant | Accuracy = 94.97%; Sensitivity = 96.68%; Specificity = 95.89% |
| Riquelme et al. (2020) | Lung | CT images | DBN | Improved convolutional architectures could enhance the efficiency of lung cancer detection | Sensitivity = 0.734; Specificity = 0.822 |
| Ausawalaithong et al. [23] | Lung | Chest X-ray dataset | CNN | More features are required to enhance accuracy, specificity, and sensitivity | Accuracy = 84.02%; Specificity = 85.34%; Sensitivity = 82.71% |
| Nasrullah et al. [122] | Lung | LIDC-IDRI datasets | CNN; MixNet | Should incorporate shifted additions to reduce data redundancy | Sensitivity = 94%; Specificity = 91% |
| Senthil et al. [153] | Lung | CT scan images | Guaranteed convergence particle swarm optimization (GCPSO) | Additional optimization algorithms are needed to enhance precision | Accuracy = 95.89% |
| Bur et al. [33] | Oral | NCDB dataset | Tumor depth of invasion (DOI) model | Improved predictive algorithms are needed to enhance accuracy in detecting oral cancer | Sensitivity = 86.6% |
| Lavanya and Chandra [93] | Oral | Oral leukoplakia dataset | Decision tree | Accuracy needs to be improved | Accuracy = 83.703% |
| Liu et al. [98] | Prostate | MRI images | CNN | Worked on a limited dataset, which should be increased | AUC = 0.84 |
| Yoo et al. [177] | Prostate | MRI images | CNN | Should be extended with 3D CNNs and recurrent neural networks | AUC = 0.87; Confidence level = 95% |
| Zhang et al. [181] | Skin | DermIS Digital, Dermaquest databases | CNN/WOA method | The optimization technique needs better exploration ability | Sensitivity = 95%; Specificity = 92%; PPV = 84%; NPV = 95%; Accuracy = 91% |
| Mane et al. [108] | Skin | Dermoscopy images | SVM (linear kernel) | The present approach is invasive, painful, and time-consuming | Sensitivity = 90%; Specificity = 90.90%; Accuracy = 90.47% |
| Hasan et al. [67, 68] | Skin | Dermoscopy images | CNN | Shrank the image size, leading to loss of information | Accuracy = 89.5% |
| Marka et al. [109] | Skin | Dermoscopy images | Machine learning, computer-aided design | The models' viability should be tested in a clinical setting | AUC = 0.832 |
| Hasan et al. [68, 69] | Skin | PH2 dataset | ANN | Can be improved using larger datasets | Accuracy = 95% |
| Khan et al. [84] | Skin | DERMIS dataset | SVM | Failed to classify the data accurately | Accuracy = 96%; Sensitivity = 97% |
| Radu et al. [134] | Skin | Clinical images | CNN | Classification accuracy is maximized, but computation is too high because of the system's complexity | Accuracy = 81%; Sensitivity = 72%; Specificity = 89% |
| Udrea et al. [167] | Skin | Dermoscopy images | ANN, generative adversarial network | Can be improved by enlarging the training and testing data of skin lesion images | Accuracy = 92% |
| Kloeckner et al. [86] | Stomach | Gastric cancer images | CNN | Limited by the selection and classification of gastric images | ROC curves above 0.9 |
| Khryashchev et al. [85] | Stomach | Endoscopic images | CNN | More endoscopic image datasets are needed to increase generalizing ability | mAP metric = 0.875 |
| Shibata et al. [158] | Stomach | Endoscopic images | RNN | Should incorporate image data such as screening endoscopic images and videos | Dice index = 71%; Sensitivity = 96% |
| Hirasawa et al. [72] | Stomach | Endoscopic images | CNN | Worked on limited, high-quality training data | Sensitivity = 92.2% |
| Leon et al. [97] | Stomach | Histopathological samples | Deep CNN | Needs more samples for better classification | Detection accuracy = 89.72% |
| Sakai et al. [146] | Stomach | Endoscopic images | CNN | Detection accuracy can be improved further | Detection accuracy = 82.8% |
| Thapa et al. [166] | Stomach | Gastroscopy samples | Random forest | A minimal sample size affected the model's validity | Sensitivity = 86%; Specificity = 79% |
| Dov et al. [51] | Thyroid | Cytopathology images | Multiple-instance learning (MIL) | Limited memory prevented access to a large database | AUC score = 0.87; Average precision = 0.743 |
| Ma et al. [106] | Thyroid | SPECT images | CNN | Can be improved by adding more SPECT images of the thyroid | Accuracy = 99.08%; Precision = 98.82%; Specificity = 99.61% |
| Poudel et al. [132] | Thyroid | 3D thyroid images | CNN | Only a limited dataset was used; more training data should be added | Dice coefficient = 0.876 |
| Sokoutil et al. [161] | Thyroid | MRI images | Hill climbing | The system was largely manual and could be automated with improved algorithms | Accuracy = 98.96% |
| Guan et al. [61] | Thyroid | Ultrasound images | CNN, Inception v3 | Can be extended to Doppler images in future work | Sensitivity = 93.3%; Specificity = 87.4%; Confidence interval = 95% |
| Hu et al. [74] | Breast | Breast magnetic resonance imaging | Dichotomous technique | Can be enhanced by adding more MRI images for breast cancer screening | Confidence interval (CI) = 95% |
| Song et al. [157] | Neuroendocrine tumors | Contrast-enhanced (CE) MRI | Logistic regression analysis | Diagnostic accuracy can be improved further using clinical decision support | AUC = 0.900; Validation cohort = 0.978; Confidence interval = 95% |
| Chillakuru et al. [39] | Lung | Chest CT | Neural network | Performance on ground glass can be improved for lung cancer detection | Precision = 0.962; Recall = 0.573 |
| Iuga et al. [105] | Tumors | Lymph nodes (LNs) in computed tomography (CT) | CNN | Quantitative LN features can be improved to accelerate diagnosis | Detection rate = 76.9%; Detection rate = 91.6% |
| Weng et al. [170] | Lung | Magnetic resonance imaging | CNN | Deep learning-based lung segmentation from MRI is time-consuming; the segmentation time can be reduced | Mean difference in lung = 0.032 ± 0.048 L |
| Gupta et al. [63, 64] | Cervical | Pap smear images | Stacking model | The oversampling technique used may lead to over-fitting | AUC = 99.7% |
| Gupta et al. [63, 64] | Breast | Wisconsin Breast Cancer Dataset | Neural ensemble stacking | Neural ensemble stacking performed the best prediction | Accuracy = 99.8% |
As shown in the comparative analysis, many research works on cancer diagnosis and detection using conventional machine learning and deep learning methods have been analyzed. Most of the deep learning techniques performed well and achieved high accuracy in terms of the prediction scores obtained. Most of the research articles were published recently (2020), and most of the studies focused on the diagnosis of breast cancer.

Discussion

In the current review, we have presented recently published research studies that employed AI-based learning techniques for predicting malignancy. This study highlights research works related to cancer diagnosis prediction and to predicting the post-operative life expectancy of cancer patients using AI-based learning techniques. Research works published between 2009 and April 2021 are selected in this review article. Figure 13 demonstrates the distribution of the articles by publication year. Most of the research works were published in 2020 (35), 2019 (32), and 2018 (30). There are few papers from 2021, as we could only extract papers published up to April 2021. Based on the analysis of Fig. 13, we can conclude that the number of research studies has increased gradually in recent years. Although AI-based techniques have marked their significance in the field of cancer prediction research, there are still many challenges faced by researchers that need to be addressed.
Fig. 13

Year-wise distribution of papers

Investigation 1: Which learning approach has provided appreciable prediction outcomes extensively? AI-based techniques have contributed significantly to the field of cancer research. The research works mentioned in the literature have focused mainly on deep learning techniques, and deep learning classifiers have dominated machine learning models in cancer research. Among deep learning models, the Convolutional Neural Network (CNN) has been used most commonly for cancer prediction; approximately 41% of studies have used CNNs to classify cancer. Neural networks (NN) and deep neural networks (DNN) have also been used extensively in the literature. Apart from deep learning approaches, ensemble learning techniques (random forest classifiers, weighted voting, gradient boosting machines) and support vector machines (SVM) are primarily used in the literature. The distribution of the literature based on AI-based prediction models is shown in Fig. 10.
Fig. 10

AI-Based Prediction Models

Investigation 2: Which cancer site and training data have been explored most extensively? Most of the research papers explored in this review focused on the automated diagnosis of cancer. The most extensively explored site is the breast (22), followed by the kidney (17). Apart from breast and kidney, most researchers have worked on brain, colorectal, cervical, and prostate cancer prediction. Figure 11 depicts the distribution of the research works based on cancer sites.
Fig. 11

Cancer site-wise distribution of papers

The type of data used to train the prediction model significantly affects its performance. The reliability and the prediction outcomes depend on the data used to train the classification model. Most of the research studies reviewed in this paper have used magnetic resonance imaging (MRI). The second most commonly used data type is computed tomography (CT) scan images. Other image types, such as dermoscopic, mammographic, endoscopic, and pathological images, were also used in the literature. Figure 12 highlights the distribution of papers based on the type of data used to train the prediction model.
Fig. 12

Distribution of papers based on the type of training data

Investigation 3: In which year were most of the cancer prediction studies published? As shown in the year-wise distribution (Fig. 13), most of the works were published in 2020 (35), followed by 2019 (32) and 2018 (30). Investigation 4: Which sorts of images have attained the highest prediction accuracy? Most of the studies have used MRI images for cancer diagnosis prediction. Approximately 23% of the literature has used computed tomography scans for training the model. Many studies have also employed mammographic, endoscopic, and pathological images. Low contrast in CT scan images makes the classification task difficult, as it becomes hard to differentiate the object from the background. Some cancers, such as prostate cancer and certain liver cancers, are hardly detected using a CT scan. In such scenarios, Digital Imaging and Communications in Medicine (DICOM) images generated from MRI can help achieve the purpose with greater prediction accuracy. Regarding the specificity of the classification models used for specific cancers: convolutional neural network models have been used to predict almost every type of cancer, such as brain, colorectal, skin, thyroid, and lung. Most of the studies that explored breast cancer diagnosis prediction used hybrid models or novel approaches. Also, neural networks have been applied to almost all breast and cervical cancer datasets. For stomach cancer, only convolutional neural networks have been used. Support vector machines have been used for the prediction of liver and breast cancer. In a nutshell, convolutional neural networks can be applied to different datasets, and ensemble learners have been used with almost every kind of cancer. Investigation 5: Challenges faced by the researchers in the construction of AI-based prediction models.
Year-wise distribution of papers Limited Data size The most common challenge faced by most of the studies was insufficient data to train the model. A small sample size implies a smaller training set which does not authenticate the efficiency of the proposed approaches. Good sample size can train the model better than the limited one. High dimensionality Another data-related issue faced in cancer research is high dimensionality. High dimensionality is referred to a vast number of features as compared to cases. However, multiple dimensionality reduction techniques [155] are available to deal with this issue. However, the requirement of a generic approach to handle this issue is there. Class imbalance problem A leading challenge faced by medical data sets, especially cancer data, is the uneven distribution of classes. Class imbalance arises due to a miss-match of the sample size of each class. Classification models tend to be biased towards the class with a majority of samples. Most of the existing techniques handle the imbalance well on binary classes but fail in multi-class patterns. Computational time About 90% of studies have endorsed deep learning approaches to predict cancer using medical images than other techniques. However, the deep learning-based approaches are highly complex. About 41% of the studies have used the CNN classifier, which has performed significantly but at the cost of high computational time and space. Efficient feature selection technique Many studies have achieved exceptional prediction outcomes. However, the requirement of a computationally effective feature selection method is still there to eradicate the data cleaning procedures while generating high cancer prediction accuracy. Model Generalizability A shift in research towards improving the generalizability of the model is required. Most of the studies have proposed a prediction model that is validated on a single site. 
There is a need to validate the models on multiple sites that can help improve the model's generalizability. Clinical Implementation AI-based models have proved their dominance in cancer research; still, the practical implementation of the models in the clinics is not incorporated. These models need to be validated in a clinical setting to assist the medical practitioner in affirming the diagnosis verdicts.
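As a concrete illustration of the class-imbalance challenge above, the simplest remedy is random oversampling: duplicating minority-class records until every class matches the majority-class count. A minimal sketch in plain Python, using a hypothetical toy dataset (6 benign vs. 2 malignant records), not any specific study's data:

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Balance a dataset by duplicating minority-class samples
    until every class matches the majority-class count."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        out_x.extend(xs)
        out_y.extend([y] * len(xs))
        # Duplicate randomly chosen minority samples up to the target.
        for _ in range(target - len(xs)):
            out_x.append(rng.choice(xs))
            out_y.append(y)
    return out_x, out_y

# Toy imbalanced dataset: 6 benign (0) vs. 2 malignant (1) records.
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
Xb, yb = random_oversample(X, y)
print(Counter(yb))  # both classes now have 6 samples
```

Note that naive duplication does not add information and can encourage overfitting, which is one reason the multi-class imbalance case noted above remains hard; synthetic-sample methods such as SMOTE are a common refinement.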

Conclusions and Future Directions

This review study attempts to summarize the various research directions for AI-based cancer prediction models. AI has marked its significance in healthcare, especially cancer prediction. The paper provides a critical and analytical examination of current state-of-the-art cancer diagnosis and detection approaches, including a thorough examination of the machine and deep learning models used for early cancer detection with medical imaging. AI techniques play a significant role in early cancer prognosis and detection, using machine and deep learning methods to extract and classify disease features. Our study concluded that most previous works employed deep learning techniques, especially convolutional neural networks. Another significant finding is that most studies have worked on breast cancer data. We also observed that when deep learning models are applied to pre-processed and segmented medical images, they perform better on classification metrics such as AUC, sensitivity, Dice coefficient, and accuracy. There is scope to work on early detection of head and neck cancers, as fewer studies have been conducted on these types. Federated learning can also be applied to cancer detection on distributed datasets; hence, we intend to use a federated learning model for cancer detection by creating a decentralized training model for cancer datasets at remote sites. This study highlights the challenges faced by researchers in the construction of AI-based prediction models. Although multiple pieces of research have displayed significant results, there is still a need to address the remaining challenges in cancer research.
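The federated learning direction mentioned above is commonly realized as federated averaging (FedAvg): each site trains on its local data and only model parameters, never patient records, are aggregated centrally, weighted by each site's sample count. A minimal sketch of the aggregation step, with hypothetical site parameters and sample counts invented for illustration:

```python
def federated_average(client_weights, client_sizes):
    """FedAvg aggregation: combine per-client model parameters into
    a global model, weighting each client by its local sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    global_w = [0.0] * n_params
    for w, n in zip(client_weights, client_sizes):
        for i, wi in enumerate(w):
            global_w[i] += wi * (n / total)
    return global_w

# Three hypothetical hospital sites, each with locally trained
# parameters and a local training-set size.
site_weights = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
site_sizes = [100, 300, 600]
global_model = federated_average(site_weights, site_sizes)
print(global_model)  # weighted average, dominated by the largest site
```

In a full system this averaging step would repeat over many communication rounds, with the global model redistributed to sites for further local training; this sketch shows only the core aggregation.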
References (89 in total)

1.  Thyroid cancer: zealous imaging has increased detection and treatment of low risk tumours.

Authors:  Juan P Brito; John C Morris; Victor M Montori
Journal:  BMJ       Date:  2013-08-27

2.  Computer-aided diagnosis systems for lung cancer: challenges and methodologies.

Authors:  Ayman El-Baz; Garth M Beache; Georgy Gimel'farb; Kenji Suzuki; Kazunori Okada; Ahmed Elnakib; Ahmed Soliman; Behnoush Abdollahi
Journal:  Int J Biomed Imaging       Date:  2013-01-29

3.  Manifold learning of brain MRIs by deep learning.

Authors:  Tom Brosch; Roger Tam
Journal:  Med Image Comput Comput Assist Interv       Date:  2013

4.  A study of association of Oncotype DX recurrence score with DCE-MRI characteristics using multivariate machine learning models.

Authors:  Ashirbani Saha; Michael R Harowicz; Weiyao Wang; Maciej A Mazurowski
Journal:  J Cancer Res Clin Oncol       Date:  2018-02-09       Impact factor: 4.553

5.  Single circulating tumor cell detection and overall survival in nonmetastatic breast cancer.

Authors:  F-C Bidard; C Mathiot; S Delaloge; E Brain; S Giachetti; P de Cremoux; M Marty; J-Y Pierga
Journal:  Ann Oncol       Date:  2009-10-22       Impact factor: 32.976

6.  Automatic polyp detection in pillcam colon 2 capsule images and videos: preliminary feasibility report.

Authors:  Pedro N Figueiredo; Isabel N Figueiredo; Surya Prasath; Richard Tsai
Journal:  Diagn Ther Endosc       Date:  2011-05-22

7.  Global Trend of Breast Cancer Mortality Rate: A 25-Year Study.

Authors:  Nasrindokht Azamjah; Yasaman Soltan-Zadeh; Farid Zayeri
Journal:  Asian Pac J Cancer Prev       Date:  2019-07-01

8.  Lung Cancer Detection Using Image Segmentation by means of Various Evolutionary Algorithms.

Authors:  K Senthil Kumar; K Venkatalakshmi; K Karthikeyan
Journal:  Comput Math Methods Med       Date:  2019-01-08       Impact factor: 2.238

9.  Lysine-specific histone demethylase 1A (LSD1) in cervical cancer.

Authors:  Daniel Beilner; Christina Kuhn; Bernd P Kost; Julia Jückstock; Doris Mayr; Elisa Schmoeckel; Christian Dannecker; Sven Mahner; Udo Jeschke; Helene Hildegard Heidegger
Journal:  J Cancer Res Clin Oncol       Date:  2020-07-28       Impact factor: 4.553

10.  Deep learning with convolutional neural networks for identification of liver masses and hepatocellular carcinoma: A systematic review.

Authors:  Samy A Azer
Journal:  World J Gastrointest Oncol       Date:  2019-12-15
Cited By (5 in total)

1.  Prediction Performance of Deep Learning for Colon Cancer Survival Prediction on SEER Data.

Authors:  Surbhi Gupta; S Kalaivani; Archana Rajasundaram; Gaurav Kumar Ameta; Ahmed Kareem Oleiwi; Betty Nokobi Dugbakie
Journal:  Biomed Res Int       Date:  2022-06-16       Impact factor: 3.246

2.  Molecular Markers of Pediatric Solid Tumors-Diagnosis, Optimizing Treatments, and Determining Susceptibility: Current State and Future Directions.

Authors:  Joanna Trubicka; Wiesława Grajkowska; Bożenna Dembowska-Bagińska
Journal:  Cells       Date:  2022-04-06       Impact factor: 6.600

3.  Application of Artificial Intelligence Methods for Imaging of Spinal Metastasis.

Authors:  Wilson Ong; Lei Zhu; Wenqiao Zhang; Tricia Kuah; Desmond Shi Wei Lim; Xi Zhen Low; Yee Liang Thian; Ee Chin Teo; Jiong Hao Tan; Naresh Kumar; Balamurugan A Vellayappan; Beng Chin Ooi; Swee Tian Quek; Andrew Makmur; James Thomas Patrick Decourcy Hallinan
Journal:  Cancers (Basel)       Date:  2022-08-20       Impact factor: 6.575

4.  Deep learning techniques for cancer classification using microarray gene expression data.

Authors:  Surbhi Gupta; Manoj K Gupta; Mohammad Shabaz; Ashutosh Sharma
Journal:  Front Physiol       Date:  2022-09-30       Impact factor: 4.755

5.  A novel interpretable machine learning algorithm to identify optimal parameter space for cancer growth.

Authors:  Helena Coggan; Helena Andres Terre; Pietro Liò
Journal:  Front Big Data       Date:  2022-09-12
