Khairun Saddami, Khairul Munadi, Yuwaldi Away, Fitri Arnia.
Abstract
Document image binarization is a challenging task because a single document often suffers from several types of degradation at once. In this study, a new binarization method is proposed for ancient documents with combined degradation. The proposed method comprises four stages: histogram analysis, contrast enhancement, local adaptive thresholding, and artifact removal. In histogram analysis, a new approach is applied to establish a uniform background. Next, the image contrast is enhanced using a new contrast enhancement technique, and the document is then binarized using a novel local adaptive thresholding technique. Artifacts introduced by the binarization process are removed in the artifact removal stage. Finally, an experiment is conducted on one private and four public datasets, running the proposed method both with and without contrast enhancement. The results show that the proposed method is faster and more effective than other state-of-the-art procedures for binarizing ancient documents.
Keywords: Computer science; Degradation combination; Document image binarization; Local adaptive thresholding; Uniform histogram
Year: 2019; PMID: 31687493; PMCID: PMC6820306; DOI: 10.1016/j.heliyon.2019.e02613
Source DB: PubMed; Journal: Heliyon; ISSN: 2405-8440
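The four stages named in the abstract can be illustrated with a minimal sketch. This page does not give the authors' formulas, so every stage below is a stand-in chosen for illustration, not the proposed method: histogram-peak clipping for background uniformity, a linear min-max stretch for contrast enhancement, a Sauvola-style rule for local adaptive thresholding, and isolated-pixel removal for artifact cleanup. All function names and the parameters `r`, `k`, and `R` are assumptions.

```python
from statistics import mean, pstdev

def flatten_background(img):
    """Stage 1 stand-in: clip pixels brighter than the histogram peak."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    peak = hist.index(max(hist))  # most frequent gray level, assumed to be background
    return [[min(v, peak) for v in row] for row in img]

def stretch_contrast(img):
    """Stage 2 stand-in: linear min-max stretch to the range 0..255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [row[:] for row in img]
    return [[round(255 * (v - lo) / (hi - lo)) for v in row] for row in img]

def window(img, y, x, r):
    """Values in the (2r+1) x (2r+1) neighbourhood of (y, x), cropped at borders."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]

def sauvola_threshold(img, r=2, k=0.2, R=128.0):
    """Stage 3 stand-in: Sauvola rule T = m * (1 + k * (s / R - 1)); 1 marks text."""
    out = []
    for y, row in enumerate(img):
        out_row = []
        for x, v in enumerate(row):
            win = window(img, y, x, r)
            t = mean(win) * (1 + k * (pstdev(win) / R - 1))
            out_row.append(1 if v < t else 0)
        out.append(out_row)
    return out

def remove_isolated(binary):
    """Stage 4 stand-in: drop text pixels that have no 8-connected text neighbour."""
    # The 3x3 window sum includes the centre pixel, hence the "> 1" test.
    return [[1 if v and sum(window(binary, y, x, 1)) > 1 else 0
             for x, v in enumerate(row)]
            for y, row in enumerate(binary)]

def binarize(img):
    """Chain the four stand-in stages on a grayscale image (list of rows, 0..255)."""
    return remove_isolated(sauvola_threshold(stretch_contrast(flatten_background(img))))
```

The pure-Python loops keep the sketch dependency-free; on real scans the window statistics would normally be computed with integral images or an array library.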
Figure 1. Examples of ancient documents suffering from combinations of degradation types. (a) Document suffering from a combination of ink bleed-through, faint text, and yellowing background. (b) Ancient Jawi document suffering from a combination of faint text, low brightness, water-spilling, and non-uniform illumination. (c) Document suffering from a combination of low contrast, faint text, and uneven object.
Figure 2. Pipeline of the proposed method.
Figure 3. Illustration of histogram analysis: (a) a document with a combination of degradation types, (b) after applying Eq. (2), and (c) after applying Eq. (3).
Figure 4. Binarization result of the image in Fig. 1a using the following methods: (a) GT; (b) Proposed I; (c) Proposed II; (d) MG-Sauvola; (e) Lu; (f) Bataineh; (g) Su; (h) Howe; (i) Ramirez; (j) FAIR; (k) Nafchi; and (l) WH16.
Figure 5. Binarization result of the image in Fig. 1c using the following methods: (a) GT; (b) Proposed I; (c) Proposed II; (d) MG-Sauvola; (e) Lu; (f) Bataineh; (g) Su; (h) Howe; (i) Ramirez; (j) FAIR; (k) Nafchi; and (l) WH16.
Figure 6. Binarization result of the image in Fig. 1b using the following methods: (a) GT; (b) Proposed I; (c) Proposed II; (d) MG-Sauvola; (e) Lu; (f) Bataineh; (g) Su; (h) Howe; (i) Ramirez; (j) FAIR; (k) Nafchi; and (l) WH16.
Comparison of the performance of different methods on the DIBCO 2013 dataset.
| Methods | FM | FMps | PSNR | DRD | MPM |
|---|---|---|---|---|---|
| MG-Sauvola | 84.10 | 89.33 | 17.89 | 4.85 | 1.98 |
| Lu | 87.08 | 88.03 | 18.75 | 4.27 | 3.2 |
| Bataineh | 77.81 | 81.27 | 15.21 | 15.09 | 18.33 |
| Howe | 91.79 | 3.55 | |||
| Su | 87.70 | 88.15 | 19.59 | 4.21 | 3.02 |
| Ramirez | 90.43 | 92.94 | 19.32 | 3.91 | 3.32 |
| FAIR | 90.78 | 91.47 | 20.54 | 3.59 | 3.35 |
| Nafchi | 90.41 | 90.99 | 19.44 | 3.47 | 2.08 |
| WH16 | 91.26 | 91.82 | 21.21 | 3.53 | |
| Proposed I | 89.73 | 93.89 | 18.94 | 3.50 | 1.57 |
| Proposed II | 89.41 | 18.87 | 3.51 |
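For readers unfamiliar with the column headings, the sketch below computes two of them, FM (F-measure) and PSNR, using their usual definitions in binarization benchmarks: FM is the harmonic mean of pixel-level precision and recall, and PSNR is 10·log10(C²/MSE) with C = 1 for images coded as 0/1. The function names and the 0/1 image encoding (1 = text) are assumptions made for illustration; FMps, DRD, and MPM need additional inputs and are not covered here.

```python
import math

def f_measure(pred, gt):
    """F-measure in percent between a predicted and a ground-truth binary image."""
    tp = fp = fn = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            if p and g:
                tp += 1          # text pixel correctly detected
            elif p and not g:
                fp += 1          # background marked as text
            elif g and not p:
                fn += 1          # text pixel missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100 * 2 * precision * recall / (precision + recall)

def psnr(pred, gt):
    """PSNR in dB for 0/1 images, i.e. with peak difference C = 1."""
    n = sum(len(row) for row in gt)
    sq_err = sum((p - g) ** 2
                 for p_row, g_row in zip(pred, gt)
                 for p, g in zip(p_row, g_row))
    mse = sq_err / n
    return math.inf if mse == 0 else 10 * math.log10(1 / mse)
```

Higher FM and PSNR indicate closer agreement with the ground truth, whereas lower values are better for the DRD and MPM columns.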
Comparison of the performance of different methods on the HDIBCO 2014 dataset.
| Methods | FM | FMps | PSNR | DRD | MPM |
|---|---|---|---|---|---|
| MG-Sauvola | 87.70 | 90.90 | 18.04 | 4.04 | 0.72 |
| Lu | 91.08 | 91.64 | 19.71 | 3.08 | 0.96 |
| Bataineh | 87.32 | 89.02 | 17.75 | 4.57 | 2.32 |
| Howe | 97.38 | 1.08 | 0.33 | ||
| Su | 94.38 | 95.94 | 20.31 | 1.95 | 0.33 |
| Ramirez | 92.26 | 94.38 | 19.72 | 2.61 | 0.36 |
| FAIR | 96.14 | 96.73 | 21.88 | 1.25 | |
| Nafchi | 93.35 | 96.05 | 19.45 | 2.19 | – |
| WH16 | 96.38 | 22.11 | |||
| Proposed I | 93.54 | 95.70 | 20.25 | 2.01 | 0.90 |
| Proposed II | 93.11 | 96.03 | 19.51 | 2.07 | 0.77 |
Comparison of the performance of different methods on the HDIBCO 2016 dataset.
| Methods | FM | FMps | PSNR | DRD | MPM |
|---|---|---|---|---|---|
| MG-Sauvola | 87.54 | 90.74 | 17.96 | 4.74 | 4.80 |
| Lu | 84.44 | 92.04 | 17.33 | 0.12 | 2.29 |
| Bataineh | 82.08 | 84.08 | 15.47 | 10.00 | 7.92 |
| Howe | 87.47 | 92.28 | 18.05 | 5.35 | 9.30 |
| Su | 84.75 | 88.94 | 17.64 | 5.64 | 4.61 |
| Ramirez | 88.23 | 18.44 | 4.17 | 3.58 | |
| FAIR | 88.50 | 92.51 | 18.31 | 4.27 | 3.32 |
| Nafchi | 88.11 | 91.17 | 18.00 | 4.38 | – |
| WH16 | 87.61 | 92.40 | 18.11 | 5.21 | 8.70 |
| Proposed I | 92.36 | ||||
| Proposed II | 90.78 | 92.43 | 19.13 | 3.46 | 1.96 |
Comparison of the performance of different methods on the PHIBD dataset.
| Methods | FM | FMps | PSNR | DRD | MPM |
|---|---|---|---|---|---|
| MG-Sauvola | 87.21 | 89.09 | 18.72 | 9.20 | 5.21 |
| Lu | 87.95 | 91.07 | 18.35 | 4.61 | 0.96 |
| Bataineh | 82.82 | 84.53 | 16.67 | 12.64 | 9.17 |
| Howe | 90.97 | 92.91 | 19.29 | 3.03 | 1.94 |
| Su | 88.21 | 88.82 | 18.27 | 5.44 | 2.65 |
| Ramirez | 3.07 | ||||
| FAIR | 71.38 | 72.39 | 14.73 | 11.25 | 2.34 |
| Nafchi | 92.26 | 94.00 | 20.15 | 4.08 | – |
| WH16 | 89.84 | 91.56 | 19.09 | 3.15 | 2.39 |
| Proposed I | 91.47 | 93.00 | 19.64 | 2.08 | |
| Proposed II | 91.39 | 92.88 | 19.60 | 2.94 | 2.08 |
Comparison of the performance of different methods on the Jawi dataset.
| Methods | FM | FMps | PSNR | DRD | MPM |
|---|---|---|---|---|---|
| MG-Sauvola | 90.31 | 92.50 | 15.35 | 4.98 | 8.80 |
| Lu | 85.17 | 88.13 | 13.81 | 6.17 | 3.70 |
| Bataineh | 87.04 | 90.95 | 14.37 | 7.15 | 8.14 |
| Howe | 83.43 | 84.88 | 13.03 | 7.21 | 7.75 |
| Su | 85.98 | 84.78 | 14.04 | 7.27 | 6.32 |
| Ramirez | 77.55 | 79.22 | 13.96 | 9.75 | 7.15 |
| FAIR | 80.89 | 80.90 | 11.69 | 9.00 | 9.97 |
| Nafchi | 85.80 | 89.25 | 13.64 | 7.20 | 15.86 |
| WH16 | 80.10 | 81.63 | 12.84 | 7.48 | 14.98 |
| Proposed I | 3.12 | ||||
| Proposed II | 91.08 | 93.44 | 15.63 | 3.34 |
Comparison of the average results of different methods across the datasets.
| Methods | FM | FMps | PSNR | DRD | MPM |
|---|---|---|---|---|---|
| MG-Sauvola | 87.37 | 90.51 | 17.59 | 5.56 | 4.30 |
| Lu | 86.93 | 90.85 | 17.50 | 4.61 | 1.95 |
| Bataineh | 83.41 | 85.97 | 15.89 | 9.89 | 9.18 |
| Howe | 89.94 | 91.85 | 3.97 | 4.57 | |
| Su | 88.20 | 89.32 | 17.97 | 4.90 | 3.39 |
| Ramirez | 88.32 | 90.79 | 18.34 | 4.70 | 3.23 |
| FAIR | 85.54 | 86.80 | 17.43 | 5.87 | 3.85 |
| Nafchi | 88.85 | 90.77 | 17.98 | 4.23 | 3.41 |
| WH16 | 89.46 | 91.37 | 18.66 | 4.12 | 4.42 |
| Proposed I | 1.91 | ||||
| Proposed II | 91.03 | 93.63 | 18.49 | 3.05 |
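The averages in this table appear to be plain arithmetic means over the five datasets. As a check, the sketch below recomputes the MG-Sauvola row from the per-dataset values listed in the tables above; the variable names are illustrative only. Other rows are not recomputed because several of their cells are missing from this page.

```python
# Per-dataset MG-Sauvola results copied from the tables above,
# in the column order FM, FMps, PSNR, DRD, MPM.
METRICS = ["FM", "FMps", "PSNR", "DRD", "MPM"]
mg_sauvola = {
    "DIBCO 2013":  [84.10, 89.33, 17.89, 4.85, 1.98],
    "HDIBCO 2014": [87.70, 90.90, 18.04, 4.04, 0.72],
    "HDIBCO 2016": [87.54, 90.74, 17.96, 4.74, 4.80],
    "PHIBD":       [87.21, 89.09, 18.72, 9.20, 5.21],
    "Jawi":        [90.31, 92.50, 15.35, 4.98, 8.80],
}

# Arithmetic mean of each metric across the datasets, rounded to two decimals.
averages = {
    name: round(sum(row[i] for row in mg_sauvola.values()) / len(mg_sauvola), 2)
    for i, name in enumerate(METRICS)
}
print(averages)
```

The result matches the MG-Sauvola row of the average table (87.37, 90.51, 17.59, 5.56, 4.30).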
Comparison of the running times of different binarization methods on the DIBCO, PHIBD, and Jawi datasets.
| Methods | DIBCO | PHIBD | Jawi | Average |
|---|---|---|---|---|
| Bataineh | 59.72 | 99.28 | 10.33 | 56.44 |
| Su | 27.51 | 26.22 | 3.04 | 18.92 |
| Nafchi | 209.77 | 189.51 | 20.55 | 139.94 |
| Howe | 250.02 | 239.14 | 25.39 | 171.52 |
| Proposed I | ||||
| Proposed II | 26.75 | 24.96 | 2.67 | 18.12 |