Darko Brodic, Dragan R Milivojevic, Zoran N Milivojevic.
Abstract
The paper introduces a testing framework for the evaluation and validation of text line segmentation algorithms. Text line segmentation is a key step for correct optical character recognition. Many tests for the evaluation of text line segmentation algorithms use text databases as reference templates. Because of the mismatch, a reliable testing framework is required. Hence, a new approach to a comprehensive experimental framework for the evaluation of text line segmentation algorithms is proposed. It consists of synthetic multi-line text samples as well as real handwritten text. Although the tests are mutually independent, the results are cross-linked. The proposed method can be used for different types of scripts and languages. Furthermore, two different procedures for the evaluation of algorithm efficiency, both based on the obtained error type classification, are proposed. The first is based on the segmentation line error description, while the second incorporates well-known signal detection theory. Each has different capabilities and conveniences, but they can be used together to make the evaluation process efficient. Overall, the proposed procedure based on the segmentation line error description has some advantages, characterized by five measures that describe measurement procedures.
Keywords: algorithms; document image processing; experiments framework; signal detection theory; testing; text line segmentation
Year: 2011 PMID: 22164106 PMCID: PMC3231474 DOI: 10.3390/s110908782
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Schematic procedure of the experiments framework.
Figure 2. Multi-line straight text: (a) Reference line definition. (b) Text over reference line. (c) English text. (d) Bengali text.
Figure 3. Multi-line waved text: (a) Reference line definition. (b) Text over reference line. (c) English text. (d) Bengali text.
Figure 4. Multi-line fractured text: (a) Reference line definition. (b) Text over reference line. (c) English text. (d) Bengali text.
Figure 5. Multi-line handwritten text fragments: (a) Serbian Latin text. (b) Serbian Cyrillic text. (c) Cyrillic text. (d) English text.
Figure 6. Text line segmentation: (a) Original text. (b) Original text with reference objects. (c) Correctly segmented text lines. (d) Over-segmented text lines. (e) Under-segmented text lines. (f) Text lines with mutually inserted words from different text lines.
Confusion matrix.
| Reality on Signal | Yes | No |
| Present | Hit | Miss |
| Absent | False Alarm | Correct Rejection |
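The confusion matrix entries feed the usual signal detection measures. The following is a minimal sketch, not the authors' implementation: the function name `detection_measures` is hypothetical, and it assumes that over-segmented lines are counted as false alarms and under-segmented lines as misses, a mapping that is consistent with the precision, recall, and f-measure values tabulated in this record.

```python
from typing import Optional, Tuple

Measure = Optional[float]


def detection_measures(hits: int, false_alarms: int, misses: int) -> Tuple[Measure, Measure, Measure]:
    """Return (precision, recall, f_measure) in percent.

    A measure is None when its denominator is zero, which corresponds to
    the dash entries in the result tables.
    """
    # Precision: share of detected lines that are correct (assumed mapping:
    # false alarms = over-segmented lines).
    precision = 100.0 * hits / (hits + false_alarms) if hits + false_alarms else None
    # Recall: share of reference lines that were found (assumed mapping:
    # misses = under-segmented lines).
    recall = 100.0 * hits / (hits + misses) if hits + misses else None
    # F-measure: harmonic mean of precision and recall, undefined if either
    # is undefined or both are zero.
    if precision is None or recall is None or precision + recall == 0:
        f_measure = None
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Counts from one column of the straight-text results: 88 correct lines,
# 6 over-segmented, 2 under-segmented.
p, r, f = detection_measures(88, 6, 2)
print(round(p, 2), round(r, 2), round(f, 2))
```

With those counts the sketch gives 93.62, 97.78, and 95.65, the same triple reported for that parameter setting in the straight-text table.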
Figure 7. Illustration of the water flow algorithm in the left-to-right direction (black regions represent text objects, i.e., three letters I).
Figure 8. Text line segmentation by the water flow algorithm with water flow angle α: (a) Initial text containing three letters I. (b) Unwetted areas made by water flow from left to right. (c) Unwetted areas made by water flow from right to left. (d) United unwetted areas.
Figure 9. Water flow algorithm applied to the text sample.
Figure 10. Algorithm based on anisotropic Gaussian kernel applied to the text sample.
Multi-line straight text segmentation test results.
| 84 | 68 | 60 | |
| 12 | 28 | 36 | |
| 0 | 0 | 0 | |
| 0 | 0 | 0 |
Multi-line handwritten text segmentation test results.
| 144 | 96 | 88 | |
| 76 | 124 | 132 | |
| 0 | 0 | 0 | |
| 0 | 0 | 0 |
Multi-line straight text segmentation test results.
| 87.50 | 70.83 | 62.50 | |
| 12.50 | 29.17 | 37.50 | |
| 0.00 | 0.00 | 0.00 | |
| 0.00 | 0.00 | 0.00 | |
| 0.50 | 0.65 | 0.79 |
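The percentage rows above follow from the line counts in the corresponding count table by simple normalization. The sketch below illustrates this under stated assumptions: the function name `segmentation_rates` and the label `mixed` are hypothetical, the first three rows are taken to be SLHR, OSLHR, and USLHR (names used in the comparative table captions), and the total number of lines is taken as the sum of the four counts. The RMSE row is not reproduced because it needs per-line reference data that this record does not include.

```python
from typing import Dict


def segmentation_rates(correct: int, over: int, under: int, mixed: int) -> Dict[str, float]:
    """Convert per-category line counts into percentages of all evaluated lines."""
    total = correct + over + under + mixed
    if total == 0:
        raise ValueError("at least one text line is required")
    # "mixed" is a placeholder label for lines with words from other lines.
    counts = {"SLHR": correct, "OSLHR": over, "USLHR": under, "mixed": mixed}
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


# First column of the straight-text count table: 84 correct, 12 over-segmented.
print(segmentation_rates(84, 12, 0, 0))
```

For the first column this yields 87.5 and 12.5 percent, in agreement with the tabulated 87.50 and 12.50.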
Multi-line handwritten text segmentation test results.
| 65.45 | 43.64 | 40.00 | |
| 34.55 | 56.36 | 60.00 | |
| 0.00 | 0.00 | 0.00 | |
| 0.00 | 0.00 | 0.00 | |
| 0.078 | 0.141 | 0.167 |
Multi-line straight text segmentation test results.
| 87.50 | 70.83 | 62.50 | |
| 100.00 | 100.00 | 100.00 | |
| 93.33 | 82.93 | 76.92 |
Multi-line handwritten text segmentation test results.
| 65.45 | 43.64 | 40.00 | |
| 100.00 | 100.00 | 100.00 | |
| 79.12 | 60.76 | 57.14 |
Multi-line straight text segmentation test results.
| 78 | 88 | 92 | 92 | 82 | 70 | 78 | 62 | 56 | |
| 18 | 6 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 0 | 2 | 2 | 4 | 14 | 26 | 18 | 34 | 40 | |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Multi-line handwritten text segmentation test results.
| 12 | 24 | 64 | 72 | 88 | 128 | 84 | 132 | 124 | |
| 208 | 196 | 156 | 148 | 132 | 86 | 136 | 76 | 72 | |
| 0 | 0 | 0 | 0 | 0 | 6 | 0 | 12 | 24 | |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Multi-line straight text segmentation test results.
| 81.25 | 91.67 | 95.83 | 95.83 | 85.42 | 72.92 | 81.25 | 64.58 | 58.33 | |
| 18.75 | 6.25 | 2.08 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 0.00 | 2.08 | 2.08 | 4.17 | 14.58 | 27.08 | 18.75 | 35.42 | 41.67 | |
| 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 0.61 | 0.29 | 0.20 | 0.20 | 0.38 | 0.52 | 0.43 | 0.60 | 0.65 |
Multi-line handwritten text segmentation test results.
| 5.45 | 10.91 | 29.09 | 32.73 | 40.00 | 58.18 | 38.18 | 60.00 | 56.36 | |
| 94.55 | 89.09 | 70.91 | 67.27 | 60.00 | 39.09 | 61.82 | 34.55 | 32.73 | |
| 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2.73 | 0.00 | 5.45 | 10.91 | |
| 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 0.763 | 0.442 | 0.266 | 0.237 | 0.178 | 0.118 | 0.202 | 0.102 | 0.125 |
Multi-line waved text segmentation test results.
| 0.00 | 0.00 | 6.25 | 6.25 | 62.50 | 95.83 | 58.33 | 100.00 | 100.00 | |
| 100.00 | 100.00 | 93.75 | 93.75 | 37.50 | 4.17 | 41.67 | 0.00 | 0.00 | |
| 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 3.49 | 3.11 | 2.46 | 2.61 | 0.66 | 0.20 | 0.85 | 0.00 | 0.00 |
Multi-line straight text segmentation test results.
| 81.25 | 93.62 | 97.87 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | |
| 100.00 | 97.78 | 97.87 | 95.83 | 85.42 | 72.92 | 81.25 | 64.58 | 58.33 | |
| 89.66 | 95.65 | 97.87 | 97.87 | 92.13 | 84.34 | 89.66 | 78.48 | 73.68 |
Multi-line handwritten text segmentation test results.
| 5.45 | 10.91 | 29.09 | 32.73 | 40.00 | 59.81 | 38.18 | 63.46 | 63.27 | |
| 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 95.52 | 100.00 | 91.67 | 83.78 | |
| 10.34 | 19.67 | 45.07 | 49.32 | 57.14 | 73.56 | 55.26 | 75.00 | 72.09 |
Comparative results for SLHR (%) measurement (α is the algorithm parameter).
| 87.50 | 72.92 | 87.50 | 65.45 | |
| 70.83 | 64.58 | 85.42 | 43.64 | |
| 62.50 | 47.92 | 77.08 | 40.00 |
Comparative results for RMSE measurement (α is the algorithm parameter).
| 0.50 | 0.52 | 0.35 | 0.08 | |
| 0.65 | 0.78 | 0.38 | 0.14 | |
| 0.79 | 0.88 | 0.69 | 0.17 |
Comparative results for SLHR (%) in favor of α.
| ≤14° | ≤12° | ≤14° | <12° | |
| ≤14° | ≤12° | ≤14° | <12° | |
| ≤12° | ≤10° | ≤14° | − | |
| ≤10° | − | ≤12° | − | |
| − | − | − | − |
Comparative results for SLHR (%) measurement (K and λ are the parameter pair).
| 85.42 | 62.50 | 75.00 | 40.00 | |
| 72.92 | 95.83 | 87.50 | 58.18 | |
| 64.58 | 100.00 | 83.33 | 60.00 | |
| 58.33 | 100.00 | 81.25 | 56.36 |
Comparative results for RMSE measurement (K and λ are the parameter pair).
| 0.38 | 0.66 | 0.61 | 0.178 | |
| 0.52 | 0.20 | 0.35 | 0.118 | |
| 0.60 | 0.00 | 0.41 | 0.102 | |
| 0.65 | 0.00 | 0.43 | 0.125 |
Comparative results for SLHR (%) in favor of pair (K, λ).
| (8,4), (8,5), (10,4), (10,5) | (8,4), (8,5), (10,4), (10,5) | (8,4), (8,5), (10,4), (10,5) | (8,5), (10,4), (10,5) | |
| (8,4), (8,5), (10,4) | (8,4), (8,5), (10,4), (10,5) | (8,4), (8,5), (10,4), (10,5) | (10,4) | |
| (8,4), (8,5) | (8,5), (10,4), (10,5) | (8,4), (8,5), (10,4), (10,5) | − | |
| (8,4) | (8,5), (10,4), (10,5) | (8,5), (10,4), (10,5) | − | |
| − | (8,5), (10,4), (10,5) | − | − |
Comparative results of the algorithms for the SLHR (%) measure.
| 87.50 | 72.92 | 87.50 | 65.45 | |
| 64.58 | 100.00 | 83.33 | 60.00 |
Comparative results of the algorithms for the RMSE measure.
| 0.50 | 0.52 | 0.35 | 0.08 | |
| 0.60 | 0.00 | 0.41 | 0.102 |
Comparative results of the algorithms for the OSLHR (%) measure.
| 12.50 | 14.58 | 2.08 | 34.55 | |
| 0.00 | 0.00 | 0.00 | 34.55 |
Comparative results of the algorithms for the USLHR (%) measure.
| 0.00 | 12.50 | 10.42 | 0.00 | |
| 35.42 | 0.00 | 16.67 | 5.45 |
Figure 11. SLHR (%) comparison between testing algorithms.
Comparative results of the algorithms for precision.
| 87.50 | 83.33 | 97.67 | 65.45 | |
| 100.00 | 100.00 | 100.00 | 63.46 |
Comparative results of the algorithms for f-measure.
Comparative results of the algorithms for recall.
Figure 12. F-measure comparison between testing algorithms.
Text line segmentation test results (Example #1).
| 33.33 | 33.33 | |
| 66.66 | 66.66 | |
| 0.00 | 0.00 | |
| 0.00 | 0.00 | |
| 1.20 | 0.47 |
Text line segmentation test results (Example #1).
| 33.33 | 33.33 | |
| 100.00 | 100.00 | |
| 50.00 | 50.00 |
Text line segmentation test results (Example #2).
| 66.66 | 33.33 | |
| 0.00 | 0.00 | |
| 33.33 | 0.00 | |
| 0.00 | 66.66 | |
| 0.58 | 0.82 |
Text line segmentation test results (Example #2).
| 100.00 | 100.00 | |
| 66.66 | 33.33 | |
| 80.00 | 50.00 |
Multi-line waved text segmentation test results.
| 70 | 62 | 46 | |
| 14 | 32 | 50 | |
| 12 | 2 | 0 | |
| 0 | 0 | 0 |
Multi-line fractured text segmentation test results.
| 84 | 82 | 74 | |
| 2 | 6 | 20 | |
| 10 | 8 | 2 | |
| 0 | 0 | 0 |
Multi-line waved text segmentation test results.
| 72.92 | 64.58 | 47.92 | |
| 14.58 | 33.33 | 52.08 | |
| 12.50 | 2.08 | 0.00 | |
| 0.00 | 0.00 | 0.00 | |
| 0.52 | 0.78 | 0.88 |
Multi-line fractured text segmentation test results.
| 87.50 | 85.42 | 77.08 | |
| 2.08 | 6.25 | 20.83 | |
| 10.42 | 8.33 | 2.08 | |
| 0.00 | 0.00 | 0.00 | |
| 0.35 | 0.38 | 0.69 |
Multi-line waved text segmentation test results.
| 83.33 | 65.96 | 47.92 | |
| 85.37 | 96.88 | 100.00 | |
| 84.34 | 78.48 | 64.79 |
Multi-line fractured text segmentation test results.
| 97.67 | 93.18 | 78.72 | |
| 89.36 | 91.11 | 97.37 | |
| 93.33 | 92.13 | 87.06 |
Multi-line waved text segmentation test results.
| 0 | 0 | 6 | 6 | 60 | 92 | 56 | 96 | 96 | |
| 96 | 96 | 90 | 90 | 36 | 4 | 40 | 0 | 0 | |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Multi-line fractured text segmentation test results.
| 0 | 0 | 0 | 6 | 72 | 84 | 54 | 80 | 78 | |
| 94 | 92 | 92 | 86 | 16 | 0 | 32 | 0 | 0 | |
| 2 | 4 | 4 | 4 | 8 | 12 | 10 | 16 | 18 | |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Multi-line fractured text segmentation test results.
| 0.00 | 0.00 | 0.00 | 6.25 | 75.00 | 87.50 | 56.25 | 83.33 | 81.25 | |
| 97.92 | 95.83 | 95.83 | 89.58 | 16.67 | 0.00 | 33.33 | 0.00 | 0.00 | |
| 2.08 | 4.17 | 4.17 | 4.17 | 8.33 | 12.50 | 10.42 | 16.67 | 18.75 | |
| 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 4.07 | 4.01 | 3.18 | 3.42 | 0.61 | 0.35 | 1.34 | 0.41 | 0.43 |
Multi-line waved text segmentation test results.
| 0.00 | 0.00 | 6.25 | 6.25 | 62.50 | 95.83 | 58.33 | 100.00 | 100.00 | |
| − | − | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | |
| − | − | 11.76 | 11.76 | 76.92 | 97.87 | 73.68 | 100.00 | 100.00 |
Multi-line fractured text segmentation test results.
| 0.00 | 0.00 | 0.00 | 6.52 | 81.82 | 100.00 | 62.79 | 100.00 | 100.00 | |
| 0.00 | 0.00 | 0.00 | 60.00 | 90.00 | 87.50 | 84.38 | 83.33 | 81.25 | |
| − | − | − | 11.76 | 85.71 | 93.33 | 72.00 | 90.91 | 89.66 |
Comparative results for OSLHR (%) measurement (α is the algorithm parameter).
| 12.50 | 14.58 | 2.08 | 34.55 | |
| 29.17 | 33.33 | 6.25 | 56.36 | |
| 37.50 | 52.08 | 20.83 | 60.00 |
Comparative results for USLHR (%) measurement (α is the algorithm parameter).
| 0.00 | 12.50 | 10.42 | 0.00 | |
| 0.00 | 2.08 | 8.33 | 0.00 | |
| 0.00 | 0.00 | 2.08 | 0.00 |
Comparative results for OSLHR (%) measurement (K and λ are the parameter pair).
| 0.00 | 37.50 | 16.67 | 60.00 | |
| 0.00 | 4.17 | 0.00 | 39.09 | |
| 0.00 | 0.00 | 0.00 | 34.55 | |
| 0.00 | 0.00 | 0.00 | 32.73 |
Comparative results for USLHR (%) measurement (K and λ are the parameter pair).
| 14.58 | 0.00 | 8.33 | 0.00 | |
| 27.08 | 0.00 | 12.50 | 2.73 | |
| 35.42 | 0.00 | 16.67 | 5.45 | |
| 41.67 | 0.00 | 18.75 | 10.91 |