Needs for Estimating Page Quality
Estimating page quality for any given image would be beneficial for
several applications:
- Controlling adaptive image processing for OCR. An automatic
way to evaluate the quality of any given image would be essential
for an adaptive image-enhancement algorithm. The algorithm would
iteratively produce an image to be graded by the page quality
estimator, which in turn would feed back the noise type or the degree
of noise present in the image, so that the algorithm can generate the
next (better) iteration of the image.
- Adaptive OCR algorithms. An image quality estimator would
be essential to the operation of adaptive OCR algorithms since it
could set the parameters for the OCR engine according to the quality
of the page that it is about to process (Figure 1.3).
Figure 1.3: Adaptive OCR Algorithms Architecture
- Reducing rekeying costs. As will be shown in this work,
page/image quality is a direct cause of OCR errors. Therefore,
estimating page quality can also provide an estimate of OCR
accuracy. The minimum acceptable OCR accuracy for large-scale OCR
operations is in the range of 95%-98% [4]. Correcting the
errors on a page with less than 95% accuracy is more costly than
retyping the page from scratch. A hypothetical "OCR-accuracy
estimator" would act as a filter, classifying pages and filtering out
those that would be better off rekeyed manually. In large-scale OCR
environments, such a filter would yield substantial cost savings,
since the whole process is often automated and manually rekeying a
page after it has been processed disrupts the normal flow of the
entire system.
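
The filtering idea in the last item can be illustrated with a minimal sketch. It assumes some function that maps a page to a predicted OCR accuracy between 0 and 1; no such estimator is defined at this point in the text, so `estimate_accuracy`, `route_pages`, and `REKEY_THRESHOLD` are hypothetical names chosen only for illustration. The threshold is taken from the lower end of the 95%-98% range quoted above.

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: use the lower end of the 95%-98% range as the cut-off.
REKEY_THRESHOLD = 0.95


def route_pages(
    pages: Iterable[str],
    estimate_accuracy: Callable[[str], float],
    threshold: float = REKEY_THRESHOLD,
) -> Tuple[List[str], List[str]]:
    """Split pages into (send to OCR, send to manual rekeying).

    `estimate_accuracy` is a hypothetical stand-in for an OCR-accuracy
    estimator: it returns the predicted accuracy of a page in [0, 1].
    """
    to_ocr: List[str] = []
    to_rekey: List[str] = []
    for page in pages:
        # Pages predicted to fall below the threshold skip OCR entirely,
        # so they never enter (and later disrupt) the automated flow.
        if estimate_accuracy(page) < threshold:
            to_rekey.append(page)
        else:
            to_ocr.append(page)
    return to_ocr, to_rekey
```

In this sketch a page is diverted before OCR is attempted, which is the point of the filter: the decision is made from the predicted accuracy alone, not from the OCR output.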