The boundaries of the broken chars zone were defined as shown in Figure 3.11, where each percentage value is expressed relative to the value of the reference point on that axis.
This zone is subdivided into cells, one cell per pixel in each direction, and each connected component is allocated to a cell according to its width and height.
Figure 3.11: Broken Chars Zone Coordinates Definition
From observations on the training data, a broken chars zone that is 70% or more filled is a very strong indicator of the presence of too many broken characters in the page, and thus of poor OCR accuracy.
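The cell allocation and the 70% criterion can be illustrated with a minimal sketch. It rests on several assumptions not stated in the text: the zone is passed in as inclusive pixel ranges `w_range` and `h_range` (the actual boundaries come from Figure 3.11 and are not reproduced here), a cell counts as "filled" when at least one connected component has exactly that width and height, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

Size = Tuple[int, int]   # (width, height) of a connected component, in pixels
Range = Tuple[int, int]  # inclusive (min, max) bounds in pixels; assumed form


def broken_zone_fill_ratio(sizes: Iterable[Size],
                           w_range: Range,
                           h_range: Range) -> float:
    """Fraction of cells in the zone that hold at least one component."""
    w_min, w_max = w_range
    h_min, h_max = h_range
    # One cell per pixel in each direction.
    total_cells = (w_max - w_min + 1) * (h_max - h_min + 1)
    # A component falls into the cell indexed by its (width, height);
    # components outside the zone are ignored.
    occupied = {
        (w, h)
        for w, h in sizes
        if w_min <= w <= w_max and h_min <= h <= h_max
    }
    return len(occupied) / total_cells


def too_many_broken_chars(sizes: Iterable[Size],
                          w_range: Range,
                          h_range: Range,
                          threshold: float = 0.70) -> bool:
    """Flag a page whose zone is filled to the threshold or beyond."""
    return broken_zone_fill_ratio(sizes, w_range, h_range) >= threshold
```

For example, with a hypothetical 2 by 2 pixel zone, components of sizes (1, 1), (1, 2) and (2, 1) fill three of the four cells, so the page would be flagged; duplicates of the same size and components larger than the zone do not change the ratio.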