Justification for Using Simple Features
In this project, only simple image features are used to design
the classifier. The reasons behind this constraint are as follows:
- Cost. The classifier will act as a filter for
pre-processing pages in a large-scale OCR production
environment. The filter must therefore be fast enough that it does not
become the bottleneck of the system. Restricting the features to
simple measurements keeps its processing speed adequate for this role.
- Independence from OCR Technology. The focus of this
research is to detect "image defects" rather than
"character recognition" defects. Ideally, the set of features used
by the page quality classifier would be orthogonal to those used by OCR
algorithms. Using only simple metrics as features ensures that the
classifier does not mimic an OCR device, since OCR requires much
more complex features.
- No Previous Work. Since there are no previous approaches
to this problem, simple features were chosen so that the
classifier's behavior can be understood as fully as possible,
laying the groundwork for future research in the area.