Amina Adadi, Safae Adadi, Mohammed Berrada.
Abstract
Machine learning has evolved from a purely statistical tool into one of the main drivers of modern medicine. In gastroenterology, it is motivating a growing number of studies that rely on these methods to address critical issues in clinical practice. Given this burgeoning research, a systematic review of the literature is timely. In this work, we present the results of a systematic review of prominent gastroenterology literature using machine learning techniques. Based on the analysis of 88 journal articles, we delimit the scope of application, discuss current limitations, including bias, lack of transparency, accountability, and data availability, and put forward future avenues.
Year: 2019 PMID: 31065266 PMCID: PMC6466966 DOI: 10.1155/2019/1870975
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Overview of ML techniques.
| Algorithm | Overview |
|---|---|
| Artificial neural networks (ANN) | ANN is inspired by interconnections between neurons in biological neural networks. It consists of a set of nodes configured in layers (input, hidden, and output), connected to one another via weighted edges. Input feature vectors are processed sequentially by every layer in the net via non-linear transformations, before an output is generated upon reaching the final layer. During the training process, if the output of the ANN is incorrect, an algorithm known as backpropagation distributes the error term back up through the layers, by modifying the weights at each edge. ANN can be supervised or unsupervised. More recently, there has been a resurgence of interest in multi-layered ANNs or Deep Learning (DL), given their ability to work well with complex and high-dimensional data sets. Convolutional Neural Network (CNN), a variation of DL, is a useful technique used in image classification. |
| Support vector machine (SVM) | SVM is a discriminative classifier formally defined by a separating hyperplane. In other words, given labelled training data, the algorithm outputs an optimal hyperplane that categorizes new examples. |
| Decision Tree (DT) | DT is the simplest tree-based supervised ML model. The aim is to recursively construct a tree structure in which each internal node represents a condition on which the tree splits into branches (edges). The end of a branch that does not split any further is the decision (leaf). |
| k-Nearest neighbours (KNN) | KNN is a supervised algorithm that classifies new data by a majority vote of its neighbours: each data point is assigned to the class most common among its k nearest neighbours, as measured by a distance function. |
| Logistic regression (LR) | LR is a traditional statistical method for solving binary classification problems (problems with two class values). It predicts the probability of occurrence of an event by fitting data to a logistic function. |
| K-means clustering (KM) | KM is a popular unsupervised ML algorithm. The algorithm works iteratively to partition data into k clusters in which each object belongs to the cluster with the nearest mean. This technique produces exactly k different clusters of greatest possible distinction. The number of clusters k that leads to the greatest separation (distance) is not known a priori and must be computed from the data. |
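As an illustration of one technique from the table above, the KNN majority vote can be sketched in a few lines of Python. This is a minimal sketch rather than an implementation from any of the reviewed studies: the function name `knn_predict`, the Euclidean distance, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration; Euclidean distance is an assumption,
    any distance function could be substituted.
    """
    # Distance from the query to every labelled training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up data: two features per case, labels are illustrative only
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["normal", "normal", "normal",
                "diseased", "diseased", "diseased"]

print(knn_predict(train_points, train_labels, (1.1, 1.0), k=3))
print(knn_predict(train_points, train_labels, (5.1, 5.0), k=3))
```

The choice of k and of the distance function both affect the result, so in practice they would be tuned on held-out data.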
Overview of performance metrics.
| Metric | Formula | Description |
|---|---|---|
| Sensitivity (SV) | $\frac{TP}{TP + FN}$ | The proportion of positives that are correctly identified (a performance measure over the whole positive part of a dataset) |
| Specificity (SP) | $\frac{TN}{TN + FP}$ | The proportion of negatives that are correctly identified (a performance measure over the whole negative part of a dataset) |
| Positive Predictive Value (PPV) | $\frac{TP}{TP + FP}$ | The ratio of correctly diagnosed positives to the total of identified positives |
| Negative Predictive Value (NPV) | $\frac{TN}{TN + FN}$ | The ratio of correctly diagnosed negatives to the total of identified negatives |
| Accuracy (ACC) | $\frac{TP + TN}{TP + TN + FP + FN}$ | The ratio of correctly diagnosed cases to the total diagnosed cases (the overall performance measure) |
| Area under the receiver operating characteristics curve (AUC-ROC) | Graphical plot | In a Receiver Operating Characteristics (ROC) curve, the sensitivity is plotted as a function of the false positive rate (100 - Specificity) for different cut-off points of a parameter. Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The area under the ROC curve (AUC-ROC) measures how well a parameter can distinguish between two diagnostic groups (diseased/normal) |
TP: true positive (number of positive cases correctly detected).
TN: true negative (number of negative cases correctly detected).
FP: false positive (number of negative cases incorrectly detected as positive).
FN: false negative (number of positive cases incorrectly detected as negative).
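Using the TP, TN, FP, and FN counts defined above, the first five metrics can be computed with their standard definitions. The sketch below is illustrative only: the function name `diagnostic_metrics` and the example counts are assumptions, and it assumes every denominator is non-zero.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Return the standard confusion-matrix metrics as fractions in [0, 1].

    Hypothetical helper for illustration; assumes all denominators are non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),          # correctly identified positives
        "specificity": tn / (tn + fp),          # correctly identified negatives
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Made-up counts for a test set of 100 cases
metrics = diagnostic_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity depend only on the classifier, whereas PPV and NPV also depend on how common the disease is in the evaluated population, which is one reason studies often report several of these metrics together.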
ML applications in medical domains.
| Medicine domain | ML applications | References |
|---|---|---|
| Radiology | Radiological imaging tasks such as: | [ |
| Pathology | Digital pathological image analysis notably: | [ |
| Oncology | Early cancer diagnosis and prognosis: | [ |
| Cardiology | Early detection of cardiovascular diseases based on: | [ |
| Neurology | Neurological disorders identification and prediction: | [ |
Figure 1. Key enabling technologies in GE practice.
Research questions.
| Research question | Objective |
|---|---|
| RQ1: What are the main machine learning techniques that have been applied to gastroenterology? | Identifying techniques currently in use and studying their characteristics and outcomes in terms of learning class, sources of data, and performance. |
| RQ2: Which sub-fields of gastroenterology has machine learning been applied to? | Identifying where ML is making changes and hence |
| RQ3: How will ML impact gastroenterology practice? | Drawing conclusions about the current research efforts and the main research directions |
Figure 2. The process of search and selection of papers.
Search terms used for query.
| ML related terms | GE related terms |
|---|---|
| Artificial Intelligence – Machine learning – Data mining – Neural network – Deep learning – Algorithms | Oesophagus – Stomach – Gallbladder – Liver – Pancreas – Biliary bowel – Colon – Intestine – Anus – Gut – Rectum – Gastroenterology – Hepatology – Proctology – Endoscopy – Digestive |
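One plausible way to combine the two term lists above into a single database query is sketched below. The combination logic is an assumption, not something stated in the source: terms within a column are joined with OR, and the two columns are joined with AND. The helper names `or_group` and `build_query` are hypothetical, and the exact query syntax accepted by any given database may differ.

```python
# Term lists copied from the table above
ML_TERMS = [
    "Artificial Intelligence", "Machine learning", "Data mining",
    "Neural network", "Deep learning", "Algorithms",
]
GE_TERMS = [
    "Oesophagus", "Stomach", "Gallbladder", "Liver", "Pancreas",
    "Biliary bowel", "Colon", "Intestine", "Anus", "Gut", "Rectum",
    "Gastroenterology", "Hepatology", "Proctology", "Endoscopy", "Digestive",
]


def or_group(terms):
    """Join quoted terms with OR inside one parenthesised group."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(ml_terms, ge_terms):
    """Assumed logic: any ML-related term AND any GE-related term."""
    return or_group(ml_terms) + " AND " + or_group(ge_terms)


query = build_query(ML_TERMS, GE_TERMS)
print(query)
```

Under this assumed logic, a paper matches only if it mentions at least one ML-related term and at least one GE-related term.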
Figure 3. Distribution of the included studies by year of publication.
Figure 4. Breakdown of the included studies by country.
Figure 5. Main GE disorders covered by the identified studies.
GE activities using ML.
| Aim of study | Number of studies | Application/Description |
|---|---|---|
| Disease classification and discrimination | 27 | ML is used very frequently for disease classification. Because ML systems can analyze large volumes of patient data, they can efficiently and accurately correlate patient features with a disease state. This is particularly useful for difficult-to-diagnose diseases, such as celiac disease, which involves multiple clinical presentations and symptoms shared with other diseases. The ability of ML to accurately classify disease states (present/absent), etiology, and subtype allows subsequent investigations, treatments, and interventions to be delivered in an efficient and targeted manner. |
| Risk stratification | 17 | The accurate assessment of a patient's risk of adverse events remains a mainstay of clinical care; machine learning techniques (MLTs) form an attractive platform on which to build risk metrics because they can easily incorporate disparate pieces of data, yielding classifiers with improved performance. |
| Endoscopic imaging examination | 16 | Endoscopic procedures generate a large number of images in a single examination of a patient. It is hard for clinicians to set aside uninterrupted time to examine the full set of endoscopic images. Thus, the use of ML to assist in endoscopic imaging examination tasks responds to the urgent need for new technologies to supplement existing imaging techniques. |
| Early detection of cancer | 7 | Early identification of cancer is challenging because symptoms are non-specific (or absent) and overlap with symptoms of other diseases. That is why ML has emerged as a promising technique for handling complex interactions in high-dimensional medical data related to oncology tasks. |
| Survival prediction | 7 | Survival probability prediction is an important problem in medical studies when the primary endpoint of interest is time to an event. Accurate survival probability prediction can provide a useful tool for selecting prevention and treatment strategies. Thus, a considerable number of studies in the reviewed literature have introduced MLTs as a rapid and reliable technique to predict survival. |
| Other tasks | 14 | Other applications of MLTs that have been studied in the literature with promising results include drug development and treatment planning (6 studies), endoscopy or surgery candidate selection (4 studies), and surgical/clinical outcomes prediction (4 studies). |
Figure 6. Distribution of GE activities using ML classified by GE disorders.
Figure 7. The main MLTs used.
Figure 8. Data source typology.
Figure 9. The main performance metrics used.
Figure 10. Technical description of all included studies.