Assumptions





To limit the scope of this project, the research addresses only a subset of the documents normally processed by OCR devices. This subset nevertheless accounts for a major portion of typical OCR pages, and the results of this work can be extended to handle a much more varied set of pages.

The pages for which the classifier will be designed have the following characteristics:

This work will consider a page to be "Good" if its median OCR accuracy (the median of the accuracies obtained from a set of different OCR devices) is 90% or higher. Conversely, a page will be labeled "Bad" if its median accuracy falls below this 90% threshold.
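The labeling rule can be illustrated with a minimal Python sketch. The function name `label_page`, the constant `GOOD_THRESHOLD`, and the representation of accuracies as percentages in a list are assumptions made for illustration only; they are not taken from this work's implementation.

```python
from statistics import median

# Threshold stated in the text: a median accuracy of 90% or higher is "Good".
GOOD_THRESHOLD = 90.0  # percent


def label_page(accuracies):
    """Label a page from its per-device OCR accuracies (in percent).

    Hypothetical helper: `accuracies` holds one accuracy value per OCR
    device for the same page.
    """
    if not accuracies:
        raise ValueError("at least one OCR accuracy is required")
    # The median is used so that a single outlying device does not
    # decide the label on its own.
    return "Good" if median(accuracies) >= GOOD_THRESHOLD else "Bad"
```

For example, under these assumptions a page with accuracies 95.0, 91.2, and 72.0 has a median of 91.2 and would be labeled "Good", while a page with accuracies 89.9, 95.0, and 60.0 has a median of 89.9 and would be labeled "Bad".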