Statistical Pattern Recognition
Instead of a heuristic approach, a more traditional statistical
approach could be used. Issues to explore in such a case would be:
- More training data. In this research, only 24 pages were
used as the training dataset. A statistical classifier would need
considerably more data for both the design and training phases.
- Risk concept. A risk model could be introduced and the
thresholds updated according to the relevant specifications. This
would be the proper way to handle the two kinds of misclassification
(B -> G and G -> B) with two different weights. In this thesis, the
B -> G misclassification has been determined to be of higher risk
than the other type of misclassification; this determination has
driven the design of the classifier. However, to support higher
classifier accuracy, a better risk model is required.
- Confidence. Instead of producing just a single binary
output, it would be interesting to attach a degree of confidence to
each classifier response. A response of ``Good'' or ``Bad'' would be
followed by a degree of ``certainty''.
- Quantized output. Instead of reporting ``good'' or
``bad'', the classifier could report the degree of each image defect,
which would be more convenient. In such a model, a rate of 0 would
mean, for instance, ``no defect'' while a rate of 1 would mean
``severe defect''. Rates for touchiness and brokenness should be
reported separately, as both can be present on the same page.
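
As an illustration of how the risk and confidence items could fit together, the following is a minimal sketch of a minimum-expected-risk decision rule. It is not the classifier developed in this thesis. It assumes that some statistical classifier supplies a posterior probability that a page is bad (`p_bad`), and it reads B -> G as "a bad page reported as good". The function names and the cost values 5.0 and 1.0 are hypothetical, chosen only to show the asymmetry.

```python
def risk_threshold(cost_bg, cost_gb):
    """Posterior P(bad) above which reporting "Bad" has the lower expected risk.

    cost_bg: assumed cost of a B -> G error (bad page reported as good).
    cost_gb: assumed cost of a G -> B error (good page reported as bad).
    """
    if cost_bg <= 0 or cost_gb <= 0:
        raise ValueError("costs must be positive")
    return cost_gb / (cost_bg + cost_gb)


def min_risk_decision(p_bad, cost_bg=5.0, cost_gb=1.0):
    """Return (label, confidence) for one page.

    p_bad is the posterior probability that the page is bad, as produced
    by some hypothetical statistical classifier. The confidence returned
    is simply the posterior probability of the reported label.
    """
    if not 0.0 <= p_bad <= 1.0:
        raise ValueError("p_bad must lie in [0, 1]")
    # Expected loss of each possible response.
    risk_if_good = cost_bg * p_bad          # wrong only when the page is bad
    risk_if_bad = cost_gb * (1.0 - p_bad)   # wrong only when the page is good
    if risk_if_bad <= risk_if_good:
        return "Bad", p_bad
    return "Good", 1.0 - p_bad


if __name__ == "__main__":
    for p in (0.05, 0.30, 0.80):
        label, confidence = min_risk_decision(p)
        print(f"P(bad)={p:.2f} -> {label} (confidence {confidence:.2f})")
```

With these assumed costs the decision threshold falls at 1/6 rather than 1/2, so a page is reported as ``Bad'' even when the classifier considers it more likely good than bad. Changing the two costs moves the threshold without retraining the underlying classifier.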