Tomoyuki Noguchi, Yumi Matsushita, Yusuke Kawata, Yoshitaka Shida, Akihiro Machitori.
Abstract
PURPOSE: The increased use of deep learning (DL) in medical imaging diagnosis has led to more frequent use of 10-fold cross-validation (10-CV) to evaluate DL performance. To eliminate some of the repeated (10-fold) processing in 10-CV, we proposed a "generalized fitting method in conjunction with every possible coalition of N-combinations (G-EPOC)" to estimate the range of the mean accuracy of 10-CV using fewer than 10 results of 10-CV.
Keywords: appendicitis; computer simulation; cross validation; learning curve; machine learning; neural networks (computer)
Year: 2021 PMID: 34820029 PMCID: PMC8607834 DOI: 10.5114/pjr.2021.110309
Source DB: PubMed Journal: Pol J Radiol ISSN: 1733-134X
Figure 1. Distribution of the scores of the 10 samples in the simulation dataset. We set the distribution to follow the sigmoid function 1/(1 + exp(–αx)) with α = 1.28 and x = 0.05 to 0.95 in increments of 0.1, adjusted so that the maximal and minimal scores were 100 and 0, respectively
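The construction described in the Figure 1 caption can be sketched as follows. This is a minimal sketch, not the authors' code: the caption does not say how the scores were adjusted to the 0 and 100 limits, so a min-max rescaling is assumed here, and the function name `simulated_scores` is hypothetical.

```python
from math import exp

ALPHA = 1.28  # value of alpha given in the Figure 1 caption


def simulated_scores(alpha=ALPHA):
    """Return 10 simulated scores following a rescaled sigmoid curve."""
    # x runs from 0.05 to 0.95 in steps of 0.1, as stated in the caption.
    xs = [0.05 + 0.1 * i for i in range(10)]
    raw = [1.0 / (1.0 + exp(-alpha * x)) for x in xs]
    lo, hi = min(raw), max(raw)
    # ASSUMPTION: min-max rescaling so the smallest score is 0 and the
    # largest is 100; the caption does not specify the adjustment used.
    return [100.0 * (r - lo) / (hi - lo) for r in raw]


scores = simulated_scores()
```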
Figure 2. The process for making the estimation ranges from N samples. After we made the (2^N − 1) possible coalitions from N samples of the 10 results of 10-CV, we applied them to 3 types of probability distribution fitting and then to 3 types of generalized linear model fitting as the secondary processing. We then obtained the standard estimated range and the wide estimated range from the upper, mean, and lower limits of the 95% CIs, with N assigned as 2 to 9
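The coalition step in the Figure 2 caption can be illustrated with a short sketch. It assumes that "every possible coalition" means every non-empty subset of the N available results, which gives 2^N − 1 coalitions; the function names and the accuracy values are hypothetical, and the distribution fitting and generalized linear model fitting that follow in G-EPOC are not reproduced here.

```python
from itertools import combinations
from statistics import mean


def coalitions(samples):
    """Return every non-empty subset (coalition) of the given samples."""
    return [
        subset
        for size in range(1, len(samples) + 1)
        for subset in combinations(samples, size)
    ]


def coalition_means(samples):
    """Mean accuracy of each coalition, as a possible input to later fitting."""
    return [mean(subset) for subset in coalitions(samples)]


# Hypothetical accuracies (%) from the first N = 3 folds of a 10-CV run.
first_three = [82.0, 85.0, 79.0]
all_coalitions = coalitions(first_three)
means = coalition_means(first_three)
```

With N = 9, the largest value used in the caption, this enumeration yields 511 coalitions, so the cost of the step grows quickly with N but stays small in absolute terms.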
Figure 3. Representative examples of CT images in the appendicitis and control groups. A) Contrast-enhanced abdominal CT (CECT) of a 32-year-old male in the appendicitis group shows an enlarged appendix with thickened wall enhancement (arrowheads) in conjunction with fluid and an appendicolith inside (arrow) at the right lower quadrant of the abdomen. B) The CECT of a 54-year-old female in the control group shows a normal gas-filled appendix with a non-enhanced thin wall (arrowheads) at the apex of the cecum
Results of the simulation test
| Number of samplings from 10 samples in simulation dataset | Distribution model used in the primary statistical processing | Distribution model used in the secondary statistical processing | Use of wide estimated range | Successful estimation rate (%) | Average of successfully estimated range width (%) |
|---|---|---|---|---|---|
| 2 | Binomial distribution | Binomial distribution | Yes | 100 | 88.8 |
| 3 | Normal distribution | Binomial distribution | Yes | 100 | 73.3 |
| 4 | Normal distribution | Binomial distribution | Yes | 100 | 58.7 |
| 5 | Normal distribution | Binomial distribution | Yes | 100 | 46.2 |
| 6 | Normal distribution | Binomial distribution | Yes | 100 | 36.2 |
| 7 | Normal distribution | Binomial distribution | Yes | 99 | 28.6 |
| 8 | Normal distribution | Binomial distribution | Yes | 98 | 22.9 |
| 9 | Poisson distribution | Normal distribution | No | 100 | 13.6 |
Results of the practical test
| Number of samplings from 10 samples in simulation dataset | Successful estimation rate (mean/range) | Average of successfully estimated range width (mean/range) |
|---|---|---|
| 2 | 100.0% (98-100) | 87.6% (85.5-91.3) |
| 3 | 100.0% (100-100) | 73.6% (67.6-79.8) |
| 4 | 99.7% (94-100) | 60.8% (49.7-76.4) |
| 5 | 98.8% (92-100) | 51.0% (36.8-76.1) |
| 6 | 98.0% (88-100) | 43.9% (30.4-83.8) |
| 7 | 98.0% (90-100) | 39.2% (26.1-89.5) |
| 8 | 98.5% (89-100) | 36.2% (21-95.6) |
| 9 | 99.0% (72-100) | 13.8% (8.7-16.4) |
Figure 4. The mean accuracy line for the 5 practical datasets and the estimation range areas (ERAs) by G-EPOC. The mean accuracy line is completely covered by the ERAs by G-EPOC
Figure 5. The mean accuracy line and the 95% CI range areas with t-distribution of the 5 practical datasets, and the midpoint lines for each first-N sampling trial. With N assigned as 8 and 9, the midpoints between the upper and lower limits of the first sampling by G-EPOC fell within the 95% CI range with t-distribution for all 5 practical datasets. However, some of the midpoints with N assigned as 5 to 7, and all of those with N assigned as 2 to 4, fell outside the 95% CI range with t-distribution
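The 95% CI with t-distribution that Figure 5 uses as its reference can be computed from the 10 fold accuracies of a full 10-CV run. The following is a minimal sketch: the fold accuracies are hypothetical, the function name `t_interval_10cv` is an assumption, and the critical value 2.262 (two-sided 95%, 9 degrees of freedom) is hard-coded to keep the example dependency-free.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical value of Student's t with 9 degrees of freedom.
T_CRIT_DF9 = 2.262


def t_interval_10cv(accuracies):
    """Return (lower, upper) limits of the 95% CI of the mean of 10 folds."""
    if len(accuracies) != 10:
        raise ValueError("expected the 10 results of a 10-fold CV")
    centre = mean(accuracies)
    # Standard error of the mean, using the sample standard deviation.
    half_width = T_CRIT_DF9 * stdev(accuracies) / sqrt(10)
    return centre - half_width, centre + half_width


# Hypothetical fold accuracies (%) for illustration only.
folds = [78, 80, 82, 84, 86, 79, 81, 83, 85, 82]
lower, upper = t_interval_10cv(folds)


def midpoint_is_covered(midpoint):
    """Check whether an estimated midpoint lies inside the 95% CI."""
    return lower <= midpoint <= upper
```

A midpoint from a G-EPOC estimation range could then be compared against this interval in the way the figure describes.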
Acquisition time (sec) of the first N sampling of G-EPOC
Columns give the pair number of positive and negative images; values are the acquisition time (sec) of the first N sampling of G-EPOC.
| Number of the first sampling from 10 samples in 10-CV for practice dataset | 124 | 250 | 374 | 500 | 624 |
|---|---|---|---|---|---|
| 2 | 0.28 | 0.29 | 0.26 | 0.26 | 0.29 |
| 3 | 0.38 | 0.35 | 0.34 | 0.34 | 0.37 |
| 4 | 0.39 | 0.41 | 0.39 | 0.38 | 0.39 |
| 5 | 0.49 | 0.48 | 0.48 | 0.49 | 0.5 |
| 6 | 0.66 | 0.68 | 0.67 | 0.69 | 0.69 |
| 7 | 1.03 | 1.05 | 1.06 | 1.09 | 1.12 |
| 8 | 1.79 | 1.85 | 1.86 | 1.92 | 1.92 |
| 9 | 3.28 | 3.38 | 3.45 | 3.54 | 3.64 |
Expected time-saving (%) of the first N of the result datasets of 10-CV
Columns give the pair number of positive and negative images; values are the expected time-saving (%) of the first N of the result datasets of 10-CV.
| Number of the first sampling from 10 samples in 10-CV for practice dataset | 124 | 250 | 374 | 500 | 624 |
|---|---|---|---|---|---|
| 2 | 80% | 81% | 79% | 81% | 80% |
| 3 | 69% | 74% | 67% | 74% | 69% |
| 4 | 61% | 63% | 52% | 60% | 61% |
| 5 | 48% | 55% | 36% | 51% | 48% |
| 6 | 35% | 46% | 30% | 41% | 35% |
| 7 | 28% | 36% | 21% | 32% | 28% |
| 8 | 18% | 23% | 14% | 19% | 18% |
| 9 | 8% | 17% | 9% | 6% | 8% |
Results of the random dataset test
Columns give the pair number of positive and negative images; values are the averaged upper and lower limits and midpoints of the estimation ranges (midpoint [upper limit – lower limit / width]).
| Number of sampling from 10 samples in 10-CV for practical dataset | 124 | 250 | 374 | 500 | 624 |
|---|---|---|---|---|---|
| 2 | 56% | 58% | 58.4% | 58.4% | 57.9% |
| 3 | 62.9% | 68.3% | 67% | 69.6% | 69.3% |
| 4 | 68.7% | 75.1% | 73.3% | 77.1% | 77% |
| 5 | 73.4% | 79.6% | 77.4% | 81.6% | 81.2% |
| 6 | 76.9% | 82.3% | 79.5% (65.1-93.9/28.9) | 84.2% | 83.8% |
| 7 | 79.2% | 83.7% | 81% | 85.5% | 85.2% |
| 8 | 80.8% | 84.5% | 81.4% | 86.3% | 85.9% |
| 9 | 77.9% | 85.8% | 82.4% | 87.3% | 86.8% |
| 10 (= total acquisition) | 76.7% | 85.6% | 82.3% | 87.3% | 86.8% |
Acquisition time (sec) of the first N sampling of the practical datasets
Columns give the pair number of positive and negative images; values are the acquisition time (sec) of the first N sampling of the practical dataset.
| Number of the first sampling from 10 samples in 10-CV for practice dataset | 124 | 250 | 374 | 500 | 624 |
|---|---|---|---|---|---|
| 2 | 363 | 670 | 1044 | 1535 | 1642 |
| 3 | 639 | 920 | 1672 | 2174 | 2532 |
| 4 | 755 | 1284 | 2413 | 3293 | 3191 |
| 5 | 946 | 1572 | 3240 | 4043 | 4220 |
| 6 | 1155 | 1879 | 3559 | 4903 | 5249 |
| 7 | 1402 | 2243 | 4019 | 5652 | 5862 |
| 8 | 1452 | 2702 | 4366 | 6734 | 6661 |
| 9 | 1652 | 2896 | 4628 | 7779 | 7459 |
| 10 (= total acquisition) | 1824 | 3506 | 5060 | 8270 | 8118 |
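The relation between the acquisition-time tables and the expected time-saving table can be sketched as follows. The formula is an assumption inferred from the tabulated values, not one stated in this record: time-saving = 1 − (time for the first N results of 10-CV + G-EPOC time for N) / (time for all 10 results). The sketch uses the 250-pair column, and the names are hypothetical.

```python
# Acquisition times (sec) for the 250-pair dataset, copied from the tables above.
TRAIN_TIME_FIRST_N = {2: 670, 3: 920, 4: 1284, 5: 1572,
                      6: 1879, 7: 2243, 8: 2702, 9: 2896}
GEPOC_TIME_FIRST_N = {2: 0.29, 3: 0.35, 4: 0.41, 5: 0.48,
                      6: 0.68, 7: 1.05, 8: 1.85, 9: 3.38}
TOTAL_10CV_TIME = 3506  # time (sec) to acquire all 10 results of 10-CV


def expected_time_saving(n):
    """Percentage of the full 10-CV time saved by stopping after n results.

    ASSUMPTION: the G-EPOC processing time is added to the time already
    spent on the first n results; the record does not state the formula.
    """
    spent = TRAIN_TIME_FIRST_N[n] + GEPOC_TIME_FIRST_N[n]
    return 100.0 * (1.0 - spent / TOTAL_10CV_TIME)


savings = {n: round(expected_time_saving(n)) for n in sorted(TRAIN_TIME_FIRST_N)}
```

Under this assumption the rounded results agree with the 250-pair column of the expected time-saving table, and the G-EPOC processing time (at most a few seconds) is negligible next to the training time.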