O S Albahri¹, A A Zaidan², A S Albahri³, B B Zaidan¹, Karrar Hameed Abdulkareem⁴, Z T Al-Qaysi⁵, A H Alamoodi¹, A M Aleesa⁶, M A Chyad¹, R M Alesa⁶, L C Kem¹, Muhammad Modi Lakulu¹, A B Ibrahim¹, Nazre Abdul Rashid¹.
Abstract
This study presents a systematic review of artificial intelligence (AI) techniques used in the detection and classification of coronavirus disease 2019 (COVID-19) medical images, with a focus on their evaluation and benchmarking. Five reliable databases, namely, IEEE Xplore, Web of Science, PubMed, ScienceDirect and Scopus, were searched for relevant studies on the topic. The 36 studies obtained were screened in several filtering and scanning stages according to the inclusion/exclusion criteria, and only 11 studies met the criteria. A taxonomy was then constructed, and the 11 studies were classified into two categories, namely, review and research studies. Next, a deep analysis and critical review were performed to highlight the challenges and critical gaps in the academic literature on the subject. Results showed that no relevant study has evaluated and benchmarked the AI techniques used in classification tasks (i.e. binary, multi-class, multi-labelled and hierarchical classification) of COVID-19 medical images. Any such evaluation and benchmarking will face three challenges, namely, multiple evaluation criteria within each classification task, trade-offs amongst these criteria and the relative importance of each criterion. Given these challenges, evaluating and benchmarking the AI techniques used to classify COVID-19 medical images is a complex multi-attribute problem. Thus, multi-criteria decision analysis (MCDA) is an essential and effective approach to tackling this complexity. As a future direction, this study also proposes a detailed methodology for the evaluation and benchmarking of AI techniques in all classification tasks of COVID-19 medical images; the methodology comprises three sequential phases.
Firstly, the procedure for constructing four decision matrices, namely, binary, multi-class, multi-labelled and hierarchical, is presented; each matrix is formed by the intersection of the evaluation criteria of one classification task with the AI classification techniques. Secondly, an MCDA approach for benchmarking AI classification techniques is developed by integrating the analytic hierarchy process (AHP) and VlseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR) methods. Lastly, objective and subjective validation procedures are described to validate the proposed benchmarking solutions.
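The second phase can be illustrated with a minimal VIKOR ranking over a decision matrix. This is a sketch of the standard VIKOR indices (group utility S, individual regret R and compromise index Q), not the authors' implementation; the function name `vikor`, the decision-matrix values, the criterion weights (which in the proposed methodology would come from AHP) and the strategy weight `v = 0.5` are all hypothetical. VIKOR's acceptance conditions (acceptable advantage and acceptable stability) are omitted for brevity.

```python
from typing import List, Sequence, Tuple


def vikor(
    matrix: Sequence[Sequence[float]],
    weights: Sequence[float],
    benefit: Sequence[bool],
    v: float = 0.5,
) -> Tuple[List[float], List[float], List[float]]:
    """Return the VIKOR indices (S, R, Q) for each row (technique) of `matrix`.

    `benefit[j]` is True when larger values of criterion j are better
    (e.g. accuracy) and False for cost criteria (e.g. error rate).
    A lower Q means a better compromise rank.
    """
    rows, cols = len(matrix), len(weights)
    best: List[float] = []
    worst: List[float] = []
    for j in range(cols):
        column = [matrix[i][j] for i in range(rows)]
        hi, lo = max(column), min(column)
        best.append(hi if benefit[j] else lo)
        worst.append(lo if benefit[j] else hi)

    s_vals: List[float] = []
    r_vals: List[float] = []
    for i in range(rows):
        terms = []
        for j in range(cols):
            span = best[j] - worst[j]
            # Normalised distance from the best value (0 if all techniques tie).
            gap = 0.0 if span == 0 else (best[j] - matrix[i][j]) / span
            terms.append(weights[j] * gap)
        s_vals.append(sum(terms))  # group utility
        r_vals.append(max(terms))  # individual regret

    s_min, s_max = min(s_vals), max(s_vals)
    r_min, r_max = min(r_vals), max(r_vals)
    q_vals = []
    for s, r in zip(s_vals, r_vals):
        qs = 0.0 if s_max == s_min else (s - s_min) / (s_max - s_min)
        qr = 0.0 if r_max == r_min else (r - r_min) / (r_max - r_min)
        q_vals.append(v * qs + (1 - v) * qr)
    return s_vals, r_vals, q_vals


# Hypothetical DM: rows = techniques T1..T3; columns = average accuracy,
# error rate (a cost criterion) and macro F score. Weights are hypothetical.
dm = [[0.90, 0.10, 0.85],
      [0.95, 0.05, 0.80],
      [0.85, 0.15, 0.90]]
S, R, Q = vikor(dm, weights=[0.5, 0.2, 0.3], benefit=[True, False, True])
ranking = sorted(range(len(Q)), key=Q.__getitem__)
print(ranking)  # [1, 0, 2], i.e. T2, T1, T3
```

Marking error rate as a cost criterion matters here: without the `benefit` flag, a technique with a higher error rate would wrongly be rewarded.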
Keywords: Artificial intelligence; Benchmarking; COVID-19; Decision-making; Evaluation; MCDA; Medical image
Year: 2020 PMID: 32646771 PMCID: PMC7328559 DOI: 10.1016/j.jiph.2020.06.028
Source DB: PubMed Journal: J Infect Public Health ISSN: 1876-0341 Impact factor: 3.718
Fig. 1. Method of the systematic literature review (SLR) of the study topic.
Fig. 2. Statistics of the included studies by databases and countries.
Fig. 3. Taxonomy of research literature on AI techniques used in the detection and classification of COVID-19 medical images.
Summary of the perspectives of works described in research cluster studies.
| Ref. | Type of datasets: primary data | Type of datasets: secondary data | AI techniques: traditional machine learning | AI techniques: deep learning | Case study |
|---|---|---|---|---|---|
| [ | ✗ | ✗ | ✗ | ✓ | CT scan |
| [ | ✓ | ✗ | ✓ | ✗ | CT scan |
| [ | ✓ | ✓ | ✓ | ✓ | CT scan |
| [ | ✓ | ✓ | ✓ | ✓ | X-ray |
| [ | ✗ | ✓ | ✗ | ✓ | X-ray |
| [ | ✗ | ✓ | ✗ | ✓ | X-ray |
| [ | ✗ | ✓ | ✗ | ✓ | X-ray |
| [ | ✗ | ✓ | ✓ | ✗ | X-ray |
| [ | ✗ | ✓ | ✓ | ✓ | X-ray |
| [ | ✓ | ✓ | ✓ | ✓ | X-ray |
Fig. 4. Proposed methodology for the evaluation and benchmarking of AI techniques for the binary, multi-class, multi-labelled and hierarchical classification of COVID-19 medical images.
Evaluation criteria of binary, multi-class, multi-labelled and hierarchical AI classification techniques.
| Evaluation criteria (binary classification) | Formula | Description |
|---|---|---|
| Accuracy | | Overall effectiveness of a classifier |
| Precision | | Class agreement of the data labels with the positive labels given by the classifier |
| Recall (sensitivity) | | Effectiveness of a classifier in identifying positive labels |
| F score | | Relations between data positive labels and those given by a classifier |
| Specificity | | How effectively a classifier identifies negative labels |
| AUC | | Classifier’s ability to avoid false classification |
tp = true positive; tn = true negative; fp = false positive; fn = false negative; AUC = area under the curve; μ = micro-averaging; M = macro-averaging; I = indicator function; Li = set of class labels; … = subclasses assigned by a classifier.
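The binary criteria can be computed directly from tp, tn, fp and fn. This sketch assumes that the F score is the balanced F1 (β = 1) and that AUC is computed from a single operating point as ½(sensitivity + specificity); both are assumptions, because the table's formulas are not reproduced here. A threshold-free ROC AUC would instead need the classifier's scores (for example, via scikit-learn's `roc_auc_score`). The `Confusion` class, the `binary_metrics` function and the counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary classifier on the test samples."""
    tp: int
    tn: int
    fp: int
    fn: int


def _ratio(num: float, den: float) -> float:
    # Guard against empty denominators (e.g. no predicted positives).
    return num / den if den else 0.0


def binary_metrics(c: Confusion) -> Dict[str, float]:
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)  # sensitivity
    specificity = _ratio(c.tn, c.tn + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "precision": precision,
        "recall": recall,
        "f_score": _ratio(2 * precision * recall, precision + recall),  # F1 (beta = 1)
        "specificity": specificity,
        # Area under the one-point ROC curve (0,0) -> (1 - specificity, recall) -> (1,1).
        "auc": (recall + specificity) / 2,
    }


# Hypothetical counts for one technique on its test samples.
print(binary_metrics(Confusion(tp=40, tn=45, fp=5, fn=10)))
```

Each technique's dictionary supplies one row of the binary decision matrix (Av, Pv, Rv, FSv, Sv, AUCv).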
Decision matrix (DM) of COVID-19 AI binary classification techniques.
| AI COVID-19 classification technique | Accuracy | Precision | Recall (sensitivity) | F score | Specificity | Area under the curve |
|---|---|---|---|---|---|---|
| Technique 1 | Av (T1/TS) | Pv (T1/TS) | Rv (T1/TS) | FSv (T1/TS) | Sv (T1/TS) | AUCv (T1/TS) |
| Technique 2 | Av (T2/TS) | Pv (T2/TS) | Rv (T2/TS) | FSv (T2/TS) | Sv (T2/TS) | AUCv (T2/TS) |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| Technique n | Av (Tn/TS) | Pv (Tn/TS) | Rv (Tn/TS) | FSv (Tn/TS) | Sv (Tn/TS) | AUCv (Tn/TS) |
T = classification technique; Av = accuracy value; Pv = precision value; Rv = recall (sensitivity) value; FSv = F score value; Sv = specificity value; AUCv = area under the curve value; TS = test samples; n = number of AI classification techniques.
DM of COVID-19 AI multi-class classification techniques.
| AI COVID-19 classification technique | Average accuracy | Error rate | Precision (μ) | Recall (μ) | F score (μ) | Precision (M) | Recall (M) | F score (M) |
|---|---|---|---|---|---|---|---|---|
| Technique 1 | AAv (T1/TS) | ERv (T1/TS) | Pμv (T1/TS) | Rμv (T1/TS) | FSμv (T1/TS) | PMv (T1/TS) | RMv (T1/TS) | FSMv (T1/TS) |
| Technique 2 | AAv (T2/TS) | ERv (T2/TS) | Pμv (T2/TS) | Rμv (T2/TS) | FSμv (T2/TS) | PMv (T2/TS) | RMv (T2/TS) | FSMv (T2/TS) |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| Technique n | AAv (Tn/TS) | ERv (Tn/TS) | Pμv (Tn/TS) | Rμv (Tn/TS) | FSμv (Tn/TS) | PMv (Tn/TS) | RMv (Tn/TS) | FSMv (Tn/TS) |
T = classification technique; AAv = average accuracy value; ERv = error rate value; Pμv, Rμv and FSμv = micro-averaged (μ) precision, recall and F score values; PMv, RMv and FSMv = macro-averaged (M) precision, recall and F score values; TS = test samples; n = number of AI classification techniques.
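The multi-class criteria can be sketched from one-vs-rest confusion counts for each class. Here, average accuracy and error rate are averaged over classes, micro-averaged (μ) scores pool the counts before dividing, and macro-averaged (M) scores average per-class precision and recall. F score (M) is assumed to be the harmonic mean of macro precision and macro recall, which is one common convention (averaging per-class F scores is another). The function name and class labels are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0


def _f1(p: float, r: float) -> float:
    return _ratio(2 * p * r, p + r)


def multiclass_metrics(y_true: Sequence[Hashable],
                       y_pred: Sequence[Hashable]) -> Dict[str, float]:
    n = len(y_true)
    classes = sorted(set(y_true) | set(y_pred), key=str)
    # One-vs-rest confusion counts (tp, tn, fp, fn) for every class.
    counts: List[Tuple[int, int, int, int]] = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        counts.append((tp, n - tp - fp - fn, fp, fn))
    k = len(classes)
    tp_sum = sum(tp for tp, _, _, _ in counts)
    fp_sum = sum(fp for _, _, fp, _ in counts)
    fn_sum = sum(fn for _, _, _, fn in counts)
    p_mu, r_mu = _ratio(tp_sum, tp_sum + fp_sum), _ratio(tp_sum, tp_sum + fn_sum)
    p_macro = sum(_ratio(tp, tp + fp) for tp, _, fp, _ in counts) / k
    r_macro = sum(_ratio(tp, tp + fn) for tp, _, _, fn in counts) / k
    return {
        "average_accuracy": sum((tp + tn) / n for tp, tn, _, _ in counts) / k,
        "error_rate": sum((fp + fn) / n for _, _, fp, fn in counts) / k,
        "precision_micro": p_mu,
        "recall_micro": r_mu,
        "f_score_micro": _f1(p_mu, r_mu),
        "precision_macro": p_macro,
        "recall_macro": r_macro,
        # Harmonic mean of macro P and R (not the mean of per-class F scores).
        "f_score_macro": _f1(p_macro, r_macro),
    }


# Hypothetical three-class test labels.
y_true = ["covid", "covid", "normal", "pneumonia", "normal", "pneumonia"]
y_pred = ["covid", "normal", "normal", "pneumonia", "normal", "covid"]
m = multiclass_metrics(y_true, y_pred)
print(m)
```

Error rate and average accuracy are complementary here; error rate is a cost criterion, so it must be minimised when the decision matrix is ranked.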
DM of COVID-19 AI multi-labelled classification techniques.
| AI COVID-19 classification technique | Exact match ratio | Labelling F score | Retrieval F score | Hamming loss |
|---|---|---|---|---|
| Technique 1 | EMv (T1/TS) | LFv (T1/TS) | RFv (T1/TS) | HLv (T1/TS) |
| Technique 2 | EMv (T2/TS) | LFv (T2/TS) | RFv (T2/TS) | HLv (T2/TS) |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| Technique n | EMv (Tn/TS) | LFv (Tn/TS) | RFv (Tn/TS) | HLv (Tn/TS) |
T = classification technique; EMv = exact match ratio value; LFv = labelling F score value; RFv = retrieval F score value; HLv = Hamming loss value; TS = test samples; n = number of AI classification techniques.
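The multi-labelled criteria can be sketched with one set of labels per test sample. Exact match ratio and Hamming loss follow their usual definitions; labelling F score is assumed to be the example-based F score (averaged over test samples) and retrieval F score the label-based F score (averaged over labels). These interpretations, the convention that two empty sets score 1.0, and all names and labels are assumptions for illustration.

```python
from typing import Dict, Sequence, Set


def _f(overlap: int, size_a: int, size_b: int) -> float:
    # Dice/F1 overlap; two empty sets count as perfect agreement (assumption).
    return 2 * overlap / (size_a + size_b) if size_a + size_b else 1.0


def multilabel_metrics(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]],
                       labels: Sequence[str]) -> Dict[str, float]:
    n, m = len(y_true), len(labels)
    pairs = list(zip(y_true, y_pred))
    exact = sum(t == p for t, p in pairs) / n
    # Fraction of (sample, label) slots where truth and prediction disagree.
    hamming = sum(len(t ^ p) for t, p in pairs) / (n * m)
    # Example-based F score: averaged over test samples.
    labelling_f = sum(_f(len(t & p), len(t), len(p)) for t, p in pairs) / n
    # Label-based F score: averaged over labels.
    retrieval_f = sum(
        _f(sum(lab in t and lab in p for t, p in pairs),
           sum(lab in t for t, _ in pairs),
           sum(lab in p for _, p in pairs))
        for lab in labels
    ) / m
    return {"exact_match_ratio": exact, "labelling_f_score": labelling_f,
            "retrieval_f_score": retrieval_f, "hamming_loss": hamming}


labels = ["covid", "pneumonia", "effusion"]  # hypothetical label set
y_true = [{"covid"}, {"covid", "effusion"}, {"pneumonia"}]
y_pred = [{"covid"}, {"covid"}, {"pneumonia", "effusion"}]
scores = multilabel_metrics(y_true, y_pred, labels)
print(scores)
```

Hamming loss is the only cost criterion in this matrix; the other three are benefit criteria.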
DM of COVID-19 AI hierarchical classification techniques.
| AI COVID-19 classification technique | Precision↓ | Recall↓ | F score↓ | Precision↑ | Recall↑ | F score↑ |
|---|---|---|---|---|---|---|
| Technique 1 | P↓v (T1/TS) | R↓v (T1/TS) | FS↓v (T1/TS) | P↑v (T1/TS) | R↑v (T1/TS) | FS↑v (T1/TS) |
| Technique 2 | P↓v (T2/TS) | R↓v (T2/TS) | FS↓v (T2/TS) | P↑v (T2/TS) | R↑v (T2/TS) | FS↑v (T2/TS) |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| Technique n | P↓v (Tn/TS) | R↓v (Tn/TS) | FS↓v (Tn/TS) | P↑v (Tn/TS) | R↑v (Tn/TS) | FS↑v (Tn/TS) |
T = classification technique; P↓v = precision↓ value; R↓v = recall↓ value; FS↓v = F score↓ value; P↑v = precision↑ value; R↑v = recall↑ value; FS↑v = F score↑ value; TS = test samples; n = number of AI classification techniques.
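The ↓ and ↑ criteria can be illustrated with set-based hierarchical precision, recall and F score. This sketch assumes that the ↓ measures augment each true and predicted class with its subclasses (descendants) and the ↑ measures augment them with its superclasses (ancestors), with overlaps pooled over test samples; the source's formulas for these criteria are not preserved, so this mapping is an assumption. The `PARENT` hierarchy, the function names and the labels are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical class hierarchy: child -> parent (None marks a top-level class).
PARENT: Dict[str, Optional[str]] = {
    "normal": None,
    "abnormal": None,
    "pneumonia": "abnormal",
    "covid": "pneumonia",
    "bacterial": "pneumonia",
}


def ancestors(c: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """The class itself plus all of its superclasses."""
    out: Set[str] = set()
    node: Optional[str] = c
    while node is not None:
        out.add(node)
        node = parent[node]
    return out


def descendants(c: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """The class itself plus all of its subclasses."""
    return {x for x in parent if c in ancestors(x, parent)}


def _prf(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> Tuple[float, float, float]:
    # Pooled over samples: total overlap divided by total set sizes.
    overlap = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    p_den = sum(len(p) for p in pred_sets)
    r_den = sum(len(t) for t in true_sets)
    prec = overlap / p_den if p_den else 0.0
    rec = overlap / r_den if r_den else 0.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f


def hierarchical_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                         parent: Dict[str, Optional[str]]) -> Dict[str, float]:
    down = _prf([descendants(t, parent) for t in y_true],
                [descendants(p, parent) for p in y_pred])
    up = _prf([ancestors(t, parent) for t in y_true],
              [ancestors(p, parent) for p in y_pred])
    return {"precision_down": down[0], "recall_down": down[1], "f_score_down": down[2],
            "precision_up": up[0], "recall_up": up[1], "f_score_up": up[2]}


# The first prediction stops at the internal node "pneumonia".
h = hierarchical_metrics(["covid", "normal", "bacterial"],
                         ["pneumonia", "normal", "bacterial"], PARENT)
print(h)
```

Stopping at an internal node is penalised differently by the two families: the ancestor-based scores keep precision high, while the descendant-based scores keep recall high.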
Fig. 5. Pairwise comparison example.
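A pairwise comparison matrix such as the one in Fig. 5 is what AHP turns into criterion weights. The sketch below uses the row geometric-mean method as an approximation to Saaty's principal-eigenvector method; this is a swapped-in prioritisation technique, since the source does not state which one is used. Consistency is checked with CR = CI/RI, where CI = (λmax - n)/(n - 1), and CR below 0.1 is the conventional acceptance threshold. The judgement values and the function name are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Saaty's random consistency index (RI) for matrix orders 1-10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(pairwise: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Return (criterion weights, consistency ratio) for a reciprocal matrix.

    pairwise[i][j] states how much more important criterion i is than j on
    Saaty's 1-9 scale, with pairwise[j][i] == 1 / pairwise[i][j].
    """
    n = len(pairwise)
    # Geometric mean of each row, normalised to sum to 1.
    geo = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo)
    weights = [g / total for g in geo]
    # Estimate the principal eigenvalue from A·w ≈ λmax·w.
    aw = [sum(pairwise[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    return weights, (ci / ri if ri else 0.0)


# Hypothetical judgements: accuracy vs. recall vs. specificity.
judgements = [[1, 3, 5],
              [1 / 3, 1, 3],
              [1 / 5, 1 / 3, 1]]
w, cr = ahp_weights(judgements)
print([round(x, 3) for x in w], round(cr, 3))
```

The resulting weights would then be passed to a VIKOR-style ranking of the decision matrix; a CR of 0.1 or more usually means the judgements should be revisited.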