Jianfang Cao1, Lichao Chen2, Min Wang2, Hao Shi2, Yun Tian1.
Abstract
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.
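As a rough illustration of how Adaboost can assemble weak classifiers into a strong one, the following minimal Python sketch uses the standard weighted-vote form of the algorithm. The error rates, labels, and function names (`classifier_weight`, `strong_classify`) are hypothetical and are not taken from the paper, which may use a different multi-class variant and combines 15 BP neural networks rather than three.

```python
import math
from collections import defaultdict


def classifier_weight(error_rate):
    """Standard AdaBoost weight: a lower error rate gives a larger vote."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


def strong_classify(predictions, error_rates):
    """Combine weak-classifier labels for one image by weighted vote."""
    votes = defaultdict(float)
    for label, err in zip(predictions, error_rates):
        votes[label] += classifier_weight(err)
    # The label with the largest accumulated weight wins.
    return max(votes, key=votes.get)


# Hypothetical outputs of three weak classifiers for a single image.
weak_labels = ["cat", "dog", "cat"]
weak_errors = [0.30, 0.20, 0.35]
result = strong_classify(weak_labels, weak_errors)
print(result)
```

A weak classifier with a 50% error rate receives zero weight in this form, so it contributes nothing to the vote.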
Year: 2016 PMID: 27905520 PMCID: PMC5131302 DOI: 10.1038/srep38201
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The classification model for massive image datasets.
Figure 2. A snapshot of the automated image classification system user interface.
Classification accuracy (%) of different approaches based on the Pascal VOC2007 dataset.
| Image category | RF | SVM | Adaboost-BP | The method of Shi | Parallel BP | The proposed approach |
|---|---|---|---|---|---|---|
| plane | 81.9 | 82.5 | 82.6 | 87.2 | 90.0 | 94.1 |
| bike | 80.0 | 79.9 | 79.8 | 83.3 | 86.2 | 90.5 |
| bird | 78.2 | 78.4 | 78.4 | 83.9 | 87.1 | 92.6 |
| boat | 79.9 | 80.1 | 80.5 | 85.0 | 87.6 | 94.3 |
| btl | 55.0 | 54.2 | 54.6 | 57.1 | 63.2 | 73.6 |
| bus | 72.8 | 73.2 | 73.1 | 79.6 | 82.9 | 87.3 |
| car | 81.5 | 80.8 | 81.9 | 86.4 | 89.8 | 95.7 |
| cat | 83.1 | 83.2 | 83.6 | 86.8 | 89.4 | 94.5 |
| chair | 65.4 | 64.9 | 65.8 | 70.5 | 72.6 | 83.6 |
| cow | 66.2 | 66.5 | 67.0 | 71.4 | 74.0 | 86.3 |
| table | 60.1 | 60.0 | 60.3 | 65.2 | 70.5 | 81.6 |
| dog | 84.3 | 84.1 | 84.7 | 88.4 | 90.4 | 96.5 |
| horse | 81.9 | 82.1 | 82.4 | 87.3 | 91.3 | 95.9 |
| moto | 79.0 | 79.2 | 79.3 | 83.8 | 87.2 | 92.8 |
| pers | 86.5 | 86.9 | 87.3 | 93.5 | 95.7 | 97.1 |
| plant | 58.3 | 59.1 | 60.2 | 66.1 | 71.9 | 78.6 |
| sheep | 73.6 | 74.1 | 74.8 | 79.9 | 82.6 | 90.0 |
| sofa | 63.5 | 63.9 | 65.7 | 68.3 | 73.8 | 79.9 |
| train | 81.0 | 81.4 | 81.7 | 85.9 | 89.1 | 94.7 |
| TV | 73.3 | 73.9 | 74.8 | 78.4 | 84.2 | 89.1 |
| Mean | 74.3 | 74.4 | 74.9 | 79.4 | 83.0 | 89.4 |
| Standard deviation | 9.314 | 9.377 | 9.213 | 9.400 | 8.576 | 6.634 |
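The summary rows can be checked directly from the per-category values. The sketch below recomputes the mean and standard deviation for the proposed approach column; it suggests that the reported standard deviation is the population form (`statistics.pstdev`) rather than the sample form, although the source does not state which was used. The variable names are illustrative.

```python
import statistics

# Per-category accuracy (%) of the proposed approach, copied from the table.
proposed = [
    94.1, 90.5, 92.6, 94.3, 73.6, 87.3, 95.7, 94.5, 83.6, 86.3,
    81.6, 96.5, 95.9, 92.8, 97.1, 78.6, 90.0, 79.9, 94.7, 89.1,
]

mean_acc = statistics.mean(proposed)
pop_std = statistics.pstdev(proposed)    # divides by n
sample_std = statistics.stdev(proposed)  # divides by n - 1

print(f"mean={mean_acc:.1f} population_std={pop_std:.3f} sample_std={sample_std:.3f}")
```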
Average accuracy (%) of different approaches based on the Caltech256 dataset and SUN dataset.
| Image dataset | Classification approach | Min (%) | Max (%) | Average accuracy (%) |
|---|---|---|---|---|
| Caltech256 dataset | RF | 51.7 | 68.3 | 59.8 |
| | SVM | 51.4 | 70.6 | 60.1 |
| | Adaboost-BP | 52.9 | 71.1 | 60.3 |
| | The method of Shi | 60.8 | 79.5 | 68.7 |
| | Parallel BP | 72.3 | 86.1 | 78.7 |
| | The proposed approach | 82.2 | 95.9 | 86.3 |
| SUN dataset | RF | 54.2 | 70.0 | 61.9 |
| | SVM | 53.7 | 71.6 | 62.1 |
| | Adaboost-BP | 55.1 | 71.9 | 63.4 |
| | The method of Shi | 63.3 | 81.7 | 72.5 |
| | Parallel BP | 73.6 | 90.5 | 82.1 |
| | The proposed approach | 83.9 | 96.1 | 89.2 |
Figure 3. Comparison of the average accuracy (%) of the different approaches based on different datasets.
Average accuracy (%) of different approaches based on different numbers of images and image categories.
| Image categories | Classification approach | 500 images | 1,000 images | 2,000 images | 5,000 images | 10,000 images | 20,000 images |
|---|---|---|---|---|---|---|---|
| 5 | RF | 96.1 | 95.0 | 91.9 | 85.8 | 77.8 | 67.9 |
| | SVM | 96.3 | 95.1 | 92.2 | 86.5 | 78.6 | 69.3 |
| | Adaboost-BP | 96.3 | 95.0 | 92.6 | 87.4 | 80.1 | 71.7 |
| | The method of Shi | 96.3 | 95.3 | 93.5 | 90.0 | 85.5 | 78.3 |
| | Parallel BP | 100.0 | 99.6 | 99.4 | 99.2 | 97.5 | 94.6 |
| | Parallel Adaboost-BP | 100.0 | 100.0 | 99.9 | 99.8 | 98.0 | 97.1 |
| 10 | RF | 95.0 | 93.5 | 88.9 | 82.6 | 74.3 | 63.8 |
| | SVM | 95.1 | 93.9 | 89.7 | 83.8 | 76.1 | 65.2 |
| | Adaboost-BP | 95.1 | 94.2 | 91.4 | 85.5 | 77.2 | 67.0 |
| | The method of Shi | 95.5 | 94.7 | 92.6 | 89.3 | 84.5 | 78.0 |
| | Parallel BP | 99.7 | 99.5 | 99.3 | 99.1 | 96.8 | 94.0 |
| | Parallel Adaboost-BP | 100.0 | 100.0 | 99.8 | 99.7 | 97.3 | 96.3 |
| 20 | RF | 93.1 | 92.0 | 85.9 | 77.9 | 69.1 | 58.8 |
| | SVM | 93.5 | 92.0 | 86.7 | 79.3 | 70.4 | 60.5 |
| | Adaboost-BP | 94.0 | 92.1 | 88.3 | 81.9 | 72.6 | 62.0 |
| | The method of Shi | 94.0 | 94.0 | 91.6 | 87.2 | 80.5 | 71.8 |
| | Parallel BP | 99.5 | 99.4 | 99.0 | 97.3 | 94.1 | 90.8 |
| | Parallel Adaboost-BP | 99.8 | 99.6 | 99.5 | 99.0 | 97.1 | 95.3 |
| 50 | RF | 92.7 | 89.4 | 82.8 | 75.0 | 66.6 | 56.8 |
| | SVM | 92.8 | 90.1 | 83.9 | 76.9 | 68.1 | 58.3 |
| | Adaboost-BP | 92.8 | 90.3 | 85.6 | 79.1 | 70.7 | 60.6 |
| | The method of Shi | 93.3 | 93.0 | 91.0 | 86.6 | 78.5 | 68.9 |
| | Parallel BP | 98.8 | 98.7 | 98.0 | 97.0 | 93.8 | 89.6 |
| | Parallel Adaboost-BP | 99.7 | 99.0 | 98.5 | 97.4 | 96.0 | 93.4 |
| 100 | RF | 89.7 | 87.6 | 80.1 | 73.7 | 64.1 | 54.1 |
| | SVM | 90.3 | 88.0 | 81.9 | 74.2 | 65.8 | 56.0 |
| | Adaboost-BP | 90.9 | 88.4 | 83.7 | 77.5 | 68.4 | 58.3 |
| | The method of Shi | 92.3 | 92.1 | 89.0 | 83.2 | 69.7 | 59.4 |
| | Parallel BP | 98.9 | 97.6 | 95.0 | 93.9 | 90.2 | 83.1 |
| | Parallel Adaboost-BP | 99.5 | 99.0 | 98.1 | 94.8 | 92.3 | 90.0 |
| 200 | RF | 87.0 | 83.6 | 76.9 | 68.8 | 58.1 | 46.0 |
| | SVM | 87.1 | 84.3 | 77.8 | 70.0 | 60.2 | 49.2 |
| | Adaboost-BP | 87.2 | 85.1 | 79.5 | 72.3 | 62.4 | 51.0 |
| | The method of Shi | 90.3 | 89.5 | 87.6 | 81.2 | 68.7 | 58.3 |
| | Parallel BP | 98.5 | 97.1 | 93.9 | 90.7 | 85.3 | 78.2 |
| | Parallel Adaboost-BP | 99.5 | 98.5 | 98.1 | 93.4 | 90.7 | 86.1 |
The accuracy declines (%) of the different approaches as the number of image categories increases from 5 to 200, for each number of images.
| Classification approaches | 500 images | 1,000 images | 2,000 images | 5,000 images | 10,000 images | 20,000 images |
|---|---|---|---|---|---|---|
| RF | 9.1 | 11.4 | 15.0 | 17.0 | 19.7 | 21.9 |
| SVM | 9.2 | 10.8 | 14.4 | 16.8 | 18.4 | 20.1 |
| Adaboost-BP | 9.1 | 9.9 | 13.1 | 15.1 | 17.7 | 20.7 |
| The method of Shi | 6.0 | 5.8 | 5.9 | 8.8 | 16.8 | 20.0 |
| Parallel BP | 1.5 | 2.5 | 5.5 | 8.5 | 12.2 | 16.4 |
| Parallel Adaboost-BP | 0.5 | 1.5 | 1.8 | 6.4 | 7.3 | 11.0 |
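The decline values follow from the preceding accuracy table: each entry is the accuracy with 5 image categories minus the accuracy with 200 image categories, at the same number of images. A minimal sketch for two of the approaches (the dictionary layout and function name are illustrative only):

```python
# Accuracy (%) at 500, 1,000, 2,000, 5,000, 10,000 and 20,000 images,
# copied from the rows for 5 and 200 image categories.
accuracy = {
    "RF": {
        5: [96.1, 95.0, 91.9, 85.8, 77.8, 67.9],
        200: [87.0, 83.6, 76.9, 68.8, 58.1, 46.0],
    },
    "Parallel Adaboost-BP": {
        5: [100.0, 100.0, 99.9, 99.8, 98.0, 97.1],
        200: [99.5, 98.5, 98.1, 93.4, 90.7, 86.1],
    },
}


def declines(rows):
    """Accuracy drop when going from 5 to 200 image categories."""
    return [round(few - many, 1) for few, many in zip(rows[5], rows[200])]


for approach, rows in accuracy.items():
    print(approach, declines(rows))
```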
Figure 4. (a) The accuracy decline curve of different approaches with 5 images in each category. (b) The accuracy decline curve of different approaches with 10 images in each category. (c) The accuracy decline curve of different approaches with 20 images in each category. (d) The accuracy decline curve of different approaches with 50 images in each category. (e) The accuracy decline curve of different approaches with 100 images in each category. (f) The accuracy decline curve of different approaches with 200 images in each category.
The number of correctly classified images and incorrectly classified images in the test sets of the three datasets.
| Classification approaches | The number of correctly classified images | The number of incorrectly classified images |
|---|---|---|
| RF | 18,725 | 11,205 |
| SVM | 18,795 | 11,135 |
| Adaboost-BP | 18,998 | 10,932 |
| The method of Shi | 21,442 | 8,488 |
| Parallel BP | 24,133 | 5,797 |
| Parallel Adaboost-BP | 26,300 | 3,630 |
Figure 5. The results of the chi-square test.
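The source does not describe how the chi-square test was set up. One plausible reading, shown here only as an assumption, is a 2x2 test of independence between approach and outcome (correct or incorrect), using the counts in the table above. The sketch compares Parallel Adaboost-BP with Adaboost-BP using the standard Pearson statistic without continuity correction; the function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# (correctly classified, incorrectly classified), copied from the table.
proposed = (26300, 3630)   # Parallel Adaboost-BP
baseline = (18998, 10932)  # Adaboost-BP

chi2 = chi_square_2x2(*proposed, *baseline)

# Critical value of the chi-square distribution for 1 degree of freedom
# at the 0.05 significance level.
CRITICAL_05_DF1 = 3.841
significant = chi2 > CRITICAL_05_DF1
print(f"chi2={chi2:.1f} significant={significant}")
```

Both rows sum to the same test-set size, so the comparison is between two approaches evaluated on the same number of images.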
Running times for the different approaches.
(a) Training time (s)
| Classification approach | Image categories | 1,000 images | 5,000 images | 15,000 images |
|---|---|---|---|---|
| RF | 10 | 56 | 373 | 5,874 |
| | 30 | 58 | 388 | 6,137 |
| | 100 | 67 | 404 | 6,581 |
| SVM | 10 | 55 | 371 | 5,869 |
| | 30 | 58 | 386 | 6,130 |
| | 100 | 68 | 402 | 6,573 |
| Adaboost-BP | 10 | 56 | 372 | 5,872 |
| | 30 | 59 | 388 | 6,135 |
| | 100 | 67 | 403 | 6,579 |
| The method of Shi | 10 | 55 | 370 | 5,431 |
| | 30 | 59 | 391 | 5,955 |
| | 100 | 68 | 401 | 6,417 |
| Parallel BP | 10 | 12 | 47 | 139 |
| | 30 | 14 | 50 | 149 |
| | 100 | 19 | 54 | 155 |
| Parallel Adaboost-BP | 10 | 11 | 45 | 129 |
| | 30 | 14 | 49 | 133 |
| | 100 | 18 | 52 | 148 |
(b) Testing time (s)
| Classification approach | Image categories | 1,000 images | 5,000 images | 15,000 images |
|---|---|---|---|---|
| RF | 10 | 4 | 6 | 10 |
| | 30 | 4 | 7 | 13 |
| | 100 | 6 | 11 | 16 |
| SVM | 10 | 4 | 6 | 9 |
| | 30 | 4 | 7 | 12 |
| | 100 | 6 | 10 | 15 |
| Adaboost-BP | 10 | 4 | 6 | 9 |
| | 30 | 4 | 7 | 12 |
| | 100 | 6 | 10 | 16 |
| The method of Shi | 10 | 4 | 5 | 9 |
| | 30 | 5 | 7 | 11 |
| | 100 | 5 | 10 | 14 |
| Parallel BP | 10 | 1 | 2 | 4 |
| | 30 | 1 | 2 | 4 |
| | 100 | 1 | 3 | 4 |
| Parallel Adaboost-BP | 10 | 1 | 2 | 3 |
| | 30 | 1 | 2 | 3 |
| | 100 | 1 | 2 | 4 |
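To make the gap in running time concrete, the sketch below divides the training times of the traditional Adaboost-BP by those of Parallel Adaboost-BP, using the rows for 10 image categories in part (a). This is a plain ratio of reported times, not the speedup metric of the paper, which is measured against the number of cluster nodes. The variable names are illustrative.

```python
# Training time in seconds for 10 image categories, keyed by number of images.
adaboost_bp_s = {1000: 56, 5000: 372, 15000: 5872}
parallel_adaboost_bp_s = {1000: 11, 5000: 45, 15000: 129}

# How many times longer the traditional approach takes to train.
ratios = {
    n_images: adaboost_bp_s[n_images] / parallel_adaboost_bp_s[n_images]
    for n_images in adaboost_bp_s
}

for n_images, ratio in ratios.items():
    print(f"{n_images} images: {ratio:.1f}x")
```

The ratio grows with the number of images, which is consistent with the parallel approach paying off most on larger datasets.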
Figure 6. Speedup comparison.
Figure 7. Sizeup comparison.
Figure 8. Scaleup comparison.
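The source names speedup, sizeup and scaleup without defining them. The sketch below uses the definitions common in the parallel data mining literature, which are assumed here rather than taken from the paper: speedup fixes the data and grows the cluster, sizeup fixes the cluster and grows the data, and scaleup grows both by the same factor. All timings are hypothetical.

```python
def speedup(time_one_node, time_m_nodes):
    """Same dataset, m nodes versus one node. Ideal value is m."""
    return time_one_node / time_m_nodes


def sizeup(time_base_data, time_m_times_data):
    """Same cluster, dataset m times larger. Lower is better."""
    return time_m_times_data / time_base_data


def scaleup(time_one_node_base_data, time_m_nodes_m_times_data):
    """Nodes and data both grow m-fold. Ideal value is 1."""
    return time_one_node_base_data / time_m_nodes_m_times_data


# Hypothetical running times in seconds, for illustration only.
print(speedup(time_one_node=800.0, time_m_nodes=250.0))            # 4 nodes
print(sizeup(time_base_data=250.0, time_m_times_data=900.0))       # 4x data
print(scaleup(time_one_node_base_data=800.0,
              time_m_nodes_m_times_data=1000.0))                   # 4 nodes, 4x data
```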
Figure 9. MapReduce implementation process.
Figure 10. The parallel model of the proposed algorithm.
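The following plain-Python simulation gives a rough idea of how a weighted vote over weak classifiers could be split into Map and Reduce tasks. It does not use the Hadoop API and does not reproduce the task design of the paper, which is not detailed in this record; the function names, classifier weights, and image identifiers are all hypothetical.

```python
from collections import defaultdict


def map_task(weight, predictions):
    """Map: emit (image_id, (label, weight)) for one weak classifier."""
    for image_id, label in predictions.items():
        yield image_id, (label, weight)


def reduce_task(image_id, values):
    """Reduce: sum the weights per label and keep the heaviest label."""
    votes = defaultdict(float)
    for label, weight in values:
        votes[label] += weight
    return image_id, max(votes, key=votes.get)


def run_job(weak_outputs):
    """Run map, group by key (the shuffle step), then reduce."""
    grouped = defaultdict(list)
    for weight, predictions in weak_outputs:
        for key, value in map_task(weight, predictions):
            grouped[key].append(value)
    return dict(reduce_task(key, values) for key, values in grouped.items())


# Hypothetical (weight, predictions) pairs from three weak classifiers.
weak_outputs = [
    (0.42, {"img1": "cat", "img2": "bus"}),
    (0.69, {"img1": "dog", "img2": "bus"}),
    (0.31, {"img1": "cat", "img2": "car"}),
]
labels = run_job(weak_outputs)
print(labels)
```

In a real Hadoop job the grouping step is performed by the framework between the Map and Reduce phases; here it is a dictionary for the sake of a self-contained example.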