Justification for Using Simple Features

In this project, only simple image features are used to design the classifier. The reasons behind this constraint are as follows:

Cost. The classifier will act as a filter that pre-processes pages in a large-scale OCR production environment. The filter must therefore be fast enough that it does not become the bottleneck of the system. Restricting the features to simple measurements keeps the cost of feature extraction low, so the filter is expected to run at an adequate speed.
Independence from OCR Technology. The focus of this research is on detecting ``image defects'' rather than ``character recognition'' defects. Ideally, the set of features used by the page quality classifier would be orthogonal to those used by OCR algorithms. Using only simple metrics as features ensures that the classifier does not mimic an OCR device, since the latter requires much more complex features.
No Previous Work. Since there has been no previous work on this problem, simple features are chosen so that the classifier's behavior can be understood as fully as possible, thus laying the groundwork for future research in the area.