Athanasios Siouras1,2, Serafeim Moustakidis3, Archontis Giannakidis4, Georgios Chalatsis5, Ioannis Liampas6, Marianna Vlychou7, Michael Hantes5, Sotiris Tasoulis1, Dimitrios Tsaopoulos2.
Abstract
Improved treatment of knee injuries critically relies on accurate and cost-effective detection. In recent years, deep-learning-based approaches have dominated knee injury detection in MRI studies. The aim of this paper is to present the findings of a systematic literature review of papers that use deep learning to detect knee (anterior cruciate ligament, meniscus, and cartilage) injuries. The systematic review was carried out following the PRISMA guidelines on several databases, including PubMed, Cochrane Library, EMBASE, and Google Scholar. Appropriate metrics were chosen to interpret the results. The prediction accuracy of the deep-learning models for the identification of knee injuries ranged from 72.5% to 100%. Deep learning has the potential to match human-level performance in decision-making tasks related to the MRI-based diagnosis of knee injuries. The limitations of the present deep-learning approaches include data imbalance, limited model generalizability across different centers, verification bias, a lack of related classification studies with more than two classes, and ground-truth subjectivity. There are several possible avenues of further exploration of deep learning for improving MRI-based knee injury diagnosis. Explainability and lightweightness of the deployed deep-learning systems are expected to become crucial enablers for their widespread use in clinical practice.
Keywords: ACL; deep learning; knee injury; machine learning; meniscus
Year: 2022 PMID: 35204625 PMCID: PMC8871256 DOI: 10.3390/diagnostics12020537
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Figure 1. Examples of typical machine-learning and deep-learning pipelines.
Brief presentation of the feature extraction techniques, as well as the ML and DL models, and the main procedures that were reported in the papers of our survey.
| Category | Models | Description |
|---|---|---|
| Feature extraction | Histogram of oriented gradients (HOG) | This is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. |
| | Generalized search tree (GIST) | The GIST descriptor represents holistic spatial scene properties (spatial envelope) of an image. It summarizes gradient information on different spatial scales and orientations by splitting the image into a grid of cells on several scales and convolving each cell using a Gabor filter bank from different perspectives. |
| | Gray-level co-occurrence matrix (GLCM) | GLCM is a way of extracting second-order statistical texture features. In particular, the texture of an image is estimated by calculating how often pairs of pixels with specific values and a certain spatial relationship occur. |
| Traditional Machine Learning | k-nearest neighbor (K-NN) | The K-NN algorithm is a simple, easy-to-implement supervised ML algorithm that can be used to solve both classification and regression problems. It works by (i) finding the distances between a query and all the examples in the data, (ii) selecting the K nearest neighbors of the query, and (iii) voting for the most frequent label (in the case of classification) or averaging the labels (in the case of regression). |
| | Support vector machines (SVMs) | An SVM is a supervised method that identifies a hyperplane that best divides the data into two classes. To separate the two clouds of data points, there are many possible hyperplanes that could be chosen. The objective of the SVM algorithm is to find a slab that has the maximum thickness, i.e., the maximum distance between data points of the different classes. |
| | Shallow artificial neural networks (ANNs) | An ANN vaguely simulates the way the human brain analyzes and processes information. It consists of sequential layers: input, hidden and output layers. The hidden layer processes and transmits the input information to the output layer. |
| Deep Learning | Convolutional neural networks (CNNs) | This is a class of DL algorithms commonly used in computer vision and pattern recognition. CNNs are a specific type of neural network that is generally composed of the following layers: (i) input layer, (ii) convolution layers, (iii) pooling layers and (iv) fully connected layers. The convolution layers use filters that perform convolution operations as they scan the input with respect to its dimensions. Pooling is a down-sampling operation, which is typically applied after a convolution layer. The fully connected layers operate on a flattened input where each input is connected to all neurons in the next layer; they are usually found towards the end of CNN architectures to optimize objectives such as class scores. |
| | Region-based convolutional neural networks (R-CNNs) | The method of detecting and classifying objects in an image is known as object detection. R-CNN (regions with convolutional neural networks) is a deep learning technique that blends rectangular area proposals with convolutional neural network functionality. The R-CNN algorithm is a two-stage detection method. |
| | Deep residual networks | A residual neural network (ResNet) is an ANN variant that uses residual mapping and shortcut connections to tackle the problem of vanishing and exploding gradients that is characteristic of deep CNNs. As a consequence, deep residual networks achieve better performance than plain very deep networks, and their training is easier as well. Typical ResNet models are implemented with double- or triple-layer skips that contain nonlinearities such as rectified linear units (ReLUs) and batch normalization in between. |
| | 3D-CNNs | A 3D CNN is simply the 3D generalization of 2D CNNs. It takes as input a 3D volume or a sequence of 2D frames (e.g., slices in an MRI scan). Kernels then move through the 3 dimensions of the data, producing 3D activation maps. Overall, they learn powerful representations of volumetric data. |
| | Computer Vision Transformers | When data is modeled as a sequence of embeddings, the Transformer model is a basic yet scalable technique that can be used for any type of data. Even without typical convolutional pipelines, transformers can be utilized to provide SOTA results in Computer Vision. It is a DL network that extracts inherent properties of the domain of interest via the self-attention technique. |
| Procedure | Training | The standard procedure involves a dataset of paired images and labels (x, y) for training and testing, an optimizer (e.g., stochastic gradient descent, Adam [ |
| | Data augmentation | Data augmentation is a strategy that artificially generates more training samples to increase the diversity of the training data. This can be done by applying affine transformations (e.g., rotation, scaling), flipping or cropping to the original labeled samples. |
| | Dropout | Dropout is a regularization method that randomly drops some units from the neural network during training, encouraging the network to learn a sparse representation. It is used to reduce overfitting. |
| | Loss function | The metric used to assess the discrepancy between model predictions and labels is called the loss function. The gradients of the loss function are used to update the weights of the neural network. |
| | Transfer learning | This aims to transfer knowledge from one task to a different but related target task. This is often achieved by reusing the weights of a pre-trained model to initialize the weights in a new model for the target task. Transfer learning can help to decrease the training time and achieve lower generalization error. |
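The three K-NN steps listed in the table (distance computation, neighbor selection, and voting or averaging) can be illustrated with a minimal sketch. This is not code from any of the surveyed papers: the function name `knn_predict`, the two-dimensional feature vectors, and the labels `"intact"` and `"torn"` are hypothetical and chosen only for illustration. Euclidean distance is assumed as the distance measure.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3, task="classification"):
    """Predict a label for `query` from its k nearest training examples."""
    # (i) Euclidean distance between the query and every training example.
    distances = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    # (ii) Keep the k examples with the smallest distance.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    labels = [label for _, label in nearest]
    # (iii) Majority vote for classification, mean of labels for regression.
    if task == "classification":
        return Counter(labels).most_common(1)[0][0]
    return sum(labels) / len(labels)


# Hypothetical toy data: two well-separated clusters of feature vectors.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["intact", "intact", "intact", "torn", "torn", "torn"]
train_scores = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # numeric targets for regression

print(knn_predict(train_x, train_y, (0.15, 0.15), k=3))
print(knn_predict(train_x, train_scores, (1.0, 0.95), k=3, task="regression"))
```

In practice the feature vectors would come from a descriptor such as HOG, GIST, or GLCM rather than hand-written tuples, and K would be tuned on validation data.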
Figure 2. Quality assessment outcomes using the MINORS tool.
Figure 3. Flow chart presenting the design of the literature search.
Results of studies.
| No. | Author | Year | AI Model Used | Pretrained CNN | MRI (T) | Localization Technique | Validation | Performance (Accuracy/AUC) | Application Domain |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Awan et al. | 2021 | CNN | ResNet-14 | 1.5 T | Standard localization based on a region of interest (ROI) | 5-fold cross-validation | 92%/(healthy tear = 0.98, partial tear = 0.97 and fully ruptured tear = 0.99) | ACL tear |
| 2 | Jeon et al. | 2021 | 3D CNN | VGGNet, AlexNet, and SqueezeNet | 3 T & 1.5 T | Custom localization technique | 5-fold cross-validation | N/A/0.983 and 0.980 on the | ACL tear |
| 3 | Rizk et al. | 2021 | 3D CNN | CNN-based localization model | 1 T (54%)–1.5 T (9.7%)–3 T (36.3%) | Custom localization technique | 10-fold cross-validation | Medial = N/A/0.93, Lateral = N/A/0.84 | Meniscus tear |
| 4 | Dai et al. | 2021 | TransMed | N/A | 3 T & 1.5 T | N/A | 120 exams | ACL tear = 94.9%/0.98, Abnormality = 91.8%/0.976, Meniscus tear = 85.3%/0.95 | ACL tear, meniscus tear, abnormalities |
| 5 | Astuto et al. | 2021 | 3D CNN | N/A | 3 T | V-Net | Hold out (15% of sample) | N/A/from 0.83 to 0.93 | ACL tear, meniscus tear, cartilage lesion |
| 6 | Fritz et al. | 2020 | DCNN | N/A | 1.5 T (64%)–3 T (36%) | To visually localize the tear, the software computes the class activation map (CAM) of the last convolution layer in the CNN and maps it to an axial knee image | Hold out (10% of sample) | Medial = (86%/0.88), Lateral = (84%/0.78), Overall = (N/A/0.96) | Meniscus tear |
| 7 | Namiri et al. | 2020 | CNN | N/A | 3 T | 3D V-Net | Hold out (10% of sample) | 3D model = (89%/sensitivity of 89% and specificity of 88%), 2D model = (92%/sensitivity of 93% and specificity of 90%) | ACL tear |
| 8 | Zhang et al. | 2020 | CNN | 3D DenseNet, VGG16, ResNet | 1.5 T (74%)–3 T (26%) | - | Hold out (20% of sample) | Custom = (95.7%/0.96), ResNet = (N/A/0.95), VGG16 = (N/A/0.86) | ACL tear |
| 9 | Germann et al. | 2020 | DCNN | N/A | 1.5 T–3 T | Manual cropping | Out of the 5802 MRI studies, 4802 were used for training, 500 for validation, and 500 for initial testing | N/A/0.94 | ACL tear |
| 10 | Azcona et al. | 2020 | CNN | MRNet, ResNet18, ResNet50 and ResNet152, ImageNet | 3 T (56.6%)–1.5 T (43.4%) | - | N/A | N/A/0.96; N/A/0.91; N/A/0.94 | ACL tear, meniscus tear, abnormalities |
| 11 | Chang et al. | 2019 | CNN | ResNet | 1.5 T–3 T | The object localization CNN was implemented as a fully convolutional network based on the U-Net architecture | 5-fold cross-validation | 96.7%/0.97 | ACL tear |
| 12 | Liu et al. | 2019 | CNN | LeNet-5, DenseNet, VGG16, AlexNet | N/A | YOLO object detection | 50-subject test set (14% of the sample) | N/A/0.98 | ACL tear |
| 13 | Couteaux et al. | 2019 | CNN | ResNet-101, ConvNet, R-CNN | N/A | To localize both menisci and identify tears in each meniscus, they used the Mask R-CNN framework | 54 cases; the model with the highest validation accuracy was selected | N/A/0.90 | Meniscus tear |
| 14 | Pedoia et al. | 2019 | 2D U-Net, CNN | N/A | 3 T | - | Hold out (20% of sample) | Sensitivity of 89.81% and specificity of 81.98% | Meniscus tear |
| 15 | Roblot et al. | 2019 | CNN | AlexNet, MRNet | N/A | Fast R-CNN and Faster R-CNN object detection | External validation on a test dataset of 700 images | 72.5%/0.85 | Meniscus tear |
| 16 | Bien et al. | 2018 | CNN | AlexNet, MRNet | 3 T (56.6%)–1.5 T (43.4%) | - | 120 exams | 86.7%/0.97; 72.5%/0.85; N/A/0.94 | ACL tear, meniscus tear, abnormalities |
| 17 | Liu et al. | 2018 | CNN | VGG16 | 3 T | - | Fellowship-trained musculoskeletal radiologist (R.K., with 15 years of clinical experience) | N/A/0.92 | Cartilage lesion |
| 18 | Stajduhar et al. | 2017 | HOG + linSVM, HOG + RF, GIST + rbfSVM, GIST + RF | N/A | 1.5 T | Manual extraction of a rectangular ROI | 10-fold cross-validation | (Injury detection problem, complete rupture) = (N/A/0.89, N/A/0.94), (N/A/0.88, N/A/0.94), (N/A/0.889, N/A/0.91), (N/A/0.88, N/A/0.90) for the four models, respectively | ACL tear |
| 19 | Mazlan et al. | 2017 | SVM | N/A | N/A | Cropping technique | Hold out (10% of sample) | 100%/N/A | ACL tear |
| 20 | Zarandi et al. | 2016 | IT2FCM, PNN | N/A | N/A | - | Hold out (20% of sample) | 0 and 1 mode: 90%/N/A | Meniscus tear |
| 21 | Fu et al. | 2013 | SVM | N/A | N/A | Active Contours without Edges method. This method combines Active Contours with Level Sets and is called ACLS | 5-fold cross-validation | SVM model: N/A/0.73 | Meniscus tear |
| 22 | Abdullah et al. | 2013 | BP ANN, K-NN | N/A | N/A | - | 5-fold and 6-fold | BP ANN: 94.44%/N/A | ACL tear |
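The performance column above reports accuracy, AUC, sensitivity, and specificity. As a reminder of how these quantities relate to a binary classifier's outputs, the following is a minimal sketch under stated assumptions: the labels and scores are invented toy values (not data from any surveyed study), the 0.5 decision threshold is arbitrary, and the function names are hypothetical. AUC is computed here through its pairwise interpretation, i.e., the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def summary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy example: 1 = injury present, 0 = injury absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # arbitrary 0.5 threshold

metrics = summary_metrics(y_true, y_pred)
print(metrics, auc(y_true, scores))
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds, which is one reason the two kinds of figures are not directly interchangeable when comparing rows of the table.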
Figure 4. Temporal evolution chart depicting the number of ML papers per category published each year since 2013.
Figure 5. A typical DL pipeline for ACL detection.