Difficulty in Predicting OCR Accuracy

Predicting OCR accuracy is a very complex task. To the best of the author's knowledge, no previous work has been done in this area, so no references can be cited.

OCR algorithms appear to be affected by a myriad of problems. However, three general "problem groups" can be identified: image defects, typographical problems, and linguistic problems.

Image defects constitute the bulk of the problems associated with OCR algorithms (see Chapter 2 of this thesis). The focus of this work is therefore on the detection of image problems. By better understanding image defects and subsequently implementing OCR algorithms that are sensitive to these types of problems, it could be possible to achieve acceptable accuracy (95%-98% [4]) for most printed pages. Achieving near-perfect recognition (99.5%-100%), however, would also require addressing typographical and linguistic problems.