Steven A Hicks, Inga Strümke, Vajira Thambawita, Malek Hammou, Michael A Riegler, Pål Halvorsen, Sravanthi Parasa.
Abstract
Clinicians and software developers need to understand how proposed machine learning (ML) models could improve patient care. No single metric captures all the desirable properties of a model, which is why several metrics are typically reported to summarize a model's performance. Unfortunately, these measures are not easily understandable by many clinicians. Moreover, comparison of models across studies in an objective manner is challenging, and no tool exists to compare models using the same performance metrics. This paper looks at previous ML studies done in gastroenterology, provides an explanation of what different metrics mean in the context of binary classification in the presented studies, and gives a thorough explanation of how different metrics should be interpreted. We also release an open source web-based tool that may be used to aid in calculating the most relevant metrics presented in this paper so that other researchers and clinicians may easily incorporate them into their research.
Year: 2022 PMID: 35395867 PMCID: PMC8993826 DOI: 10.1038/s41598-022-09954-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Table 1. The reported metrics of the selected studies.
| STUDY | SET | REPORTED | EVALUATION | TOTAL | POS | NEG | TP | TN | FP | FN | ACC | PREC | REC | F1 | SPEC | MCC | NPV | TS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | In-text | Per-finding | – | 338 | – | 337 | – | – | 1 | – | – | 1.00 | – | – | – | – | – |
| 2 | 1 | T2.R1 | Per-frame | 210 | 143 | 67 | – | – | – | – | 0.71 | 0.89 | 0.68 | 0.75 | – | – | – | – |
| 2 | 1 | T2.R2 | Per-frame | 210 | 143 | 67 | – | – | – | – | 0.77 | 0.81 | 0.88 | 0.83 | – | – | – | – |
| 2 | 1 | T2.R3 | Per-frame | 210 | 143 | 67 | – | – | – | – | 0.87 | 0.91 | 0.84 | 0.87 | – | – | – | – |
| 2 | 2 | T3.R1 | Per-frame | 48 | 13 | 35 | – | – | – | – | – | 0.65 | 0.85 | 0.73 | – | – | – | – |
| 2 | 2 | T3.R2 | Per-frame | 48 | 35 | 13 | – | – | – | – | – | 0.94 | 0.83 | 0.88 | – | – | – | – |
| 2 | 2 | T3.R3 | Per-frame | – | – | – | – | – | – | – | 0.83 | 0.86 | 0.83 | 0.84 | – | – | – | – |
| 3 | 1 | T1 | Per-frame | 106 | – | – | 65 | 33 | 7 | 1 | 0.94 | 0.90 | 0.98 | – | 0.83 | – | 0.97 | – |
| 4 | 1 | T2.R1 | Per-frame | 6000 | 6000 | 0 | 5663 | 0 | 251 | 337 | – | – | 0.94 | – | – | – | – | – |
| 4 | 1 | T2.R2 | Per-frame | 1414 | 1414 | 0 | 1296 | 0 | 41 | 118 | – | – | 0.92 | – | – | – | – | – |
| 4 | 1 | T2.R3 | Per-frame | 21,572 | 0 | 21,572 | 0 | 20,691 | 1004 | 0 | – | – | – | – | 0.96 | – | – | – |
| 4 | 2 | T2.R4 | Per-frame | – | – | – | 570 | 0 | 42 | 76 | – | 0.88 | – | – | – | – | – | – |
| 4 | 3 | In-text | Per-frame | 60,914 | – | – | – | – | – | – | – | – | 0.92 | – | – | – | – | – |
| 4 | 4 | In-text | Per-frame | 1,072,483 | 0 | 1,072,483 | 0 | – | – | 0 | – | – | – | – | 0.95 | – | – | – |
| 5 | 1 | T1 | Per-frame | – | – | – | 3723 | 4735 | 262 | 930 | 0.88 | – | 0.80 | – | 0.95 | – | – | – |
The STUDY column identifies each of the five studies selected for metric recalculation. The SET column groups the metrics calculated on the same set of data. The REPORTED column indicates where the metrics were reported in the respective study: T refers to the table number and R to the row number in that paper. The EVALUATION column is the method used to generate the metrics. The TOTAL column is the total number of samples used in the metric calculations. The POS and NEG columns give the total number of positive and negative samples, respectively. The remaining columns correspond to the metric acronyms described in the main text.
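Several rows above give only class totals and rounded metrics, leaving the TP, TN, FP and FN cells empty. The following is a minimal sketch of how such counts might be estimated from POS, NEG, REC and PREC. It is an illustration under stated assumptions, not necessarily the procedure the authors used: the function name `counts_from_reported` is hypothetical, the reported values are assumed to be rounded to two decimals, and precision is assumed to be greater than zero.

```python
def counts_from_reported(pos, neg, recall, precision):
    """Estimate confusion-matrix counts from class totals and rounded metrics.

    Assumes REC = TP / POS and PREC = TP / (TP + FP), with precision > 0.
    Rounding in the reported metrics means the result is an estimate.
    """
    tp = round(recall * pos)                      # TP = REC * POS
    fn = pos - tp                                 # remaining positives were missed
    fp = round(tp * (1 - precision) / precision)  # solve PREC = TP / (TP + FP) for FP
    tn = neg - fp                                 # remaining negatives were correct
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Study 2, row T2.R1: POS = 143, NEG = 67, REC = 0.68, PREC = 0.89
counts = counts_from_reported(pos=143, neg=67, recall=0.68, precision=0.89)
print(counts)
```

For this row the estimate is TP = 97, TN = 55, FP = 12 and FN = 46, which agrees with the counts listed for the same row among the recalculated metrics. When the reported values are heavily rounded or the sample is small, neighbouring integer counts can fit equally well, so the result should be checked against every metric the study reports.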
Figure 1. A visualization of the study selection process.
Table 2. The recalculated metrics of the selected papers.
| STUDY | SET | REPORTED | EVALUATION | TOTAL | POS | NEG | TP | TN | FP | FN | ACC | PREC | REC | F1 | SPEC | MCC | NPV | TS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | In-text | Per-polyp | 16,900,000 | 338 | 16,899,662 | 337 | 16,730,662 | 169,000 | 1 | 0.99 | 0.00 | 1.00 | 0.00 | 0.99 | 0.04 | 1.00 | 0.00 |
| 1 | 1 | Calculated | Per-frame | 16,900,000 | 84,500 | 16,815,500 | 84,500 | 16,646,500 | 169,000 | 250 | 0.99 | 0.33 | 1.00 | 0.50 | 0.99 | 0.57 | 1.00 | 0.33 |
| 2 | 1 | T2.R1 | Per-frame | 210 | 143 | 67 | 97 | 55 | 12 | 46 | 0.71 | 0.89 | 0.68 | 0.77 | 0.82 | 0.47 | 0.55 | 0.63 |
| 2 | 1 | T2.R2 | Per-frame | 210 | 143 | 67 | 116 | 40 | 27 | 27 | 0.77 | 0.81 | 0.88 | 0.81 | 0.59 | 0.40 | 0.59 | 0.68 |
| 2 | 1 | T2.R3 | Per-frame | 210 | 143 | 67 | 130 | 54 | 13 | 13 | 0.87 | 0.91 | 0.84 | 0.91 | 0.81 | 0.72 | 0.81 | 0.83 |
| 2 | 2 | T3.R1 | Per-frame | 48 | 13 | 35 | 11 | 29 | 6 | 2 | 0.84 | 0.65 | 0.85 | 0.74 | 0.83 | 0.63 | 0.94 | 0.58 |
| 2 | 2 | T3.R2 | Per-frame | 48 | 35 | 13 | 29 | 11 | 2 | 6 | 0.84 | 0.94 | 0.83 | 0.88 | 0.86 | 0.64 | 0.65 | 0.79 |
| 2 | 2 | T3.R3 | AVG | – | – | – | – | – | – | – | 0.84 | 0.80 | 0.84 | 0.81 | 0.84 | 0.63 | 0.79 | 0.69 |
| 2 | 2 | Calculated | WAVG | – | – | – | – | – | – | – | 0.84 | 0.86 | 0.84 | 0.84 | 0.85 | 0.64 | 0.73 | 0.73 |
| 3 | 1 | T1 | Per-frame | 106 | 66 | 40 | 65 | 33 | 7 | 1 | 0.92 | 0.90 | 0.98 | 0.94 | 0.83 | 0.84 | 0.97 | 0.89 |
| 4 | 1 | T2.R1 | Per-frame | 6000 | 6000 | 0 | 5663 | 0 | 251 | 337 | 0.94 | 0.96 | 0.94 | 0.95 | | −0.05 | 0 | 0.91 |
| 4 | 1 | T2.R2 | Per-frame | 1414 | 1414 | 0 | 1296 | 0 | 41 | 118 | 0.92 | 0.97 | 0.92 | 0.94 | | −0.05 | 0 | 0.89 |
| 4 | 1 | T2.R3 | Per-frame | 21,572 | 0 | 21,695 | 0 | 20,691 | 1004 | 0 | 0.95 | 0 | | 0 | 0.95 | | 1 | 0 |
| 4 | 1 | Calculated | Combined | 27,572 | 6000 | 21,695 | 5663 | 20,691 | 1255 | 337 | 0.95 | 0.82 | 0.94 | 0.88 | 0.95 | 0.84 | 0.98 | 0.78 |
| 4 | 1 | Calculated | Biased POS | 27,572 | 6000 | 21,695 | 6000 | 0 | 21,695 | 0 | 0.22 | 0.22 | 1 | 0.36 | 0 | | | 0.22 |
| 4 | 1 | Calculated | Biased NEG | 27,572 | 6000 | 21,695 | 0 | 21,695 | 0 | 6000 | 0.78 | | 0 | 0 | 1 | | 0.78 | 0 |
| 4 | 2 | T2.R4 | Per-frame | 646 | 646 | 42 | 570 | 0 | 42 | 76 | 0.83 | 0.93 | 0.88 | 0.91 | 0 | −0.09 | 0 | 0.83 |
| 4 | 3 | In-text | Per-frame | 60,914 | – | – | – | – | – | – | – | – | 0.92 | – | – | – | – | – |
| 4 | 4 | In-text | Per-frame | 1,072,483 | 0 | 1,072,483 | 0 | 1,023,149 | 49,334 | 0 | 0.95 | 0 | | 0 | 0.95 | | 1 | 0 |
| 5 | 1 | T1 | Per-frame | 9650 | 4653 | 4997 | 3723 | 4735 | 262 | 930 | 0.88 | 0.93 | 0.80 | 0.86 | 0.95 | 0.76 | 0.84 | 0.76 |
Columns represent the same as described in Table 1.
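The recalculated columns can be reproduced from the four confusion-matrix counts. The sketch below uses the standard definitions of each metric and assumes that TS denotes the threat score, TP / (TP + FP + FN). The helper names `safe_div` and `binary_metrics` are hypothetical, and returning `None` for a zero denominator is a choice made for this illustration; it is not the behaviour of the authors' web tool.

```python
from math import sqrt


def safe_div(num, den):
    """Return num / den, or None when the denominator is zero (metric undefined)."""
    return num / den if den else None


def binary_metrics(tp, tn, fp, fn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": safe_div(tp + tn, tp + tn + fp + fn),  # accuracy
        "PREC": safe_div(tp, tp + fp),                # precision
        "REC": safe_div(tp, tp + fn),                 # recall (sensitivity)
        "F1": safe_div(2 * tp, 2 * tp + fp + fn),     # F1 score
        "SPEC": safe_div(tn, tn + fp),                # specificity
        "MCC": safe_div(tp * tn - fp * fn, mcc_den),  # Matthews correlation coefficient
        "NPV": safe_div(tn, tn + fn),                 # negative predictive value
        "TS": safe_div(tp, tp + fp + fn),             # threat score (assumed meaning of TS)
    }


# Study 3, row T1: TP = 65, TN = 33, FP = 7, FN = 1
study3 = binary_metrics(tp=65, tn=33, fp=7, fn=1)

# Study 4, "Biased POS": every sample is predicted positive
biased_pos = binary_metrics(tp=6000, tn=0, fp=21695, fn=0)
```

For the Study 3 counts this gives ACC ≈ 0.92, F1 ≈ 0.94, MCC ≈ 0.84 and TS ≈ 0.89, in line with the corresponding row above. For the all-positive classifier, recall is 1 while MCC and NPV come out undefined, which illustrates why a single metric can give a misleading picture of a model.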