N measurements defined by a particular technique are combined to create an N-dimensional feature vector. When classifying a normalized version of the glyph, some distance between its feature vector and the vectors for all known characters in the defined alphabet is calculated. This value is used to determine which alphabet character the glyph being examined is “closest” to.
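
The nearest-vector comparison described above can be sketched as follows. This is only an illustrative sketch: the Euclidean distance is one possible choice of distance, and the function names, the three-dimensional feature vectors and the alphabet entries are all hypothetical values invented for the example.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def closest_character(features, alphabet):
    """Return (label, distance) of the alphabet entry nearest to `features`.

    `alphabet` maps each known character to its N-dimensional feature vector.
    """
    label = min(alphabet, key=lambda c: euclidean_distance(features, alphabet[c]))
    return label, euclidean_distance(features, alphabet[label])

# Hypothetical feature vectors, invented for illustration only.
ALPHABET = {
    "I": [0.1, 0.9, 0.1],
    "O": [0.8, 0.8, 0.7],
    "L": [0.2, 0.9, 0.6],
}

label, dist = closest_character([0.25, 0.85, 0.55], ALPHABET)
print(label, round(dist, 3))
```

Any other distance measure could be substituted for `euclidean_distance` without changing the structure of the search.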

Mori, Nishida et al. (1999) present two examples of statistical methods in their work. The first is called the cross-correlation method. The feature examined in this case is the projection of the normalized glyph, defined by f(x,y)/|f| for all x and y in the domain of the image. The function f represents the glyph in its image space and |f| is the magnitude (or norm) of f. The cosine of the angle between f and a known alphabet glyph g is found by calculating the projection of f(x,y)/|f| on g(x,y)/|g|. The following equation captures this relationship.

Figure 25: Projection Equation

$$\cos\theta = \frac{f \cdot g}{|f|\,|g|}$$

where f • g is the inner product of f and g

This leads to the following equation used to implement the cross-correlation method.

Figure 26: Equation For Cross-Correlation

$$S(f) = \frac{\iint_R f(x,y)\,g(x,y)\,dx\,dy}{\sqrt{\iint_R f(x,y)^2\,dx\,dy\;\iint_R g(x,y)^2\,dx\,dy}}$$

where R is the domain of the glyph's image space

f(x,y) is a normalized representation of the glyph being analysed and is equal to 1 where the point (x,y) belongs to the text character, 0 otherwise

g(x,y) is the glyph of a known character and is equal to 1 where the point (x,y) belongs to the text character, 0 otherwise
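
For a digitized glyph the integrals over R become sums over pixels. The following is a minimal sketch of that discrete form, assuming the glyphs are stored as equal-sized grids of 0/1 values; the function name `cross_correlation` and the 3x3 example glyphs are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def cross_correlation(f, g):
    """Discrete analogue of S(f): sum(f*g) / sqrt(sum(f^2) * sum(g^2)).

    f and g are equal-sized 2-D lists of 0/1 pixels (1 = text pixel).
    """
    if len(f) != len(g) or any(len(rf) != len(rg) for rf, rg in zip(f, g)):
        raise ValueError("glyphs must have the same dimensions")
    num = sum(a * b for rf, rg in zip(f, g) for a, b in zip(rf, rg))
    f_sq = sum(a * a for row in f for a in row)
    g_sq = sum(b * b for row in g for b in row)
    if f_sq == 0 or g_sq == 0:
        return 0.0  # assumption: an empty glyph matches nothing
    return num / math.sqrt(f_sq * g_sq)

# Hypothetical 3x3 glyphs, far smaller than any realistic normalized glyph.
GLYPH_T = [[1, 1, 1],
           [0, 1, 0],
           [0, 1, 0]]
GLYPH_I = [[0, 1, 0],
           [0, 1, 0],
           [0, 1, 0]]

print(cross_correlation(GLYPH_T, GLYPH_T))  # identical glyphs
print(cross_correlation(GLYPH_T, GLYPH_I))  # 3 shared pixels: 3 / sqrt(5 * 3)
```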

The function S obeys the relationship 0 ≤ S ≤ 1: the lower bound holds because all values of f and g are non-negative, and the upper bound follows from the Cauchy-Schwarz inequality. As f(x,y) approaches g(x,y) the function S approaches 1, so S depends on the Euclidean distance between f and g. For the glyph represented by f to be classified as the character represented by g, S(f) must exceed a defined threshold and be the maximum value of S(f) over all possible glyphs g in the alphabet. The cross-correlation method is an example of a global threshold technique. In these types of methods character classification decisions are made based on information from the whole glyph. Global methods provide high recognition rates.
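
The decision rule (the score must exceed a threshold and be the maximum over the alphabet) can be sketched as follows. This is an illustrative sketch under stated assumptions: glyphs are equal-sized grids of 0/1 pixels, and the names `similarity` and `classify`, the 3x3 templates and the threshold values are all hypothetical.

```python
import math

def similarity(f, g):
    """Normalized cross-correlation of two equal-sized 0/1 pixel grids."""
    num = sum(a * b for rf, rg in zip(f, g) for a, b in zip(rf, rg))
    den = math.sqrt(sum(a * a for row in f for a in row) *
                    sum(b * b for row in g for b in row))
    return num / den if den else 0.0

def classify(f, templates, threshold):
    """Return the best-matching character, or None if no score exceeds threshold."""
    scores = {label: similarity(f, g) for label, g in templates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None

# Hypothetical 3x3 templates for a three-character alphabet.
TEMPLATES = {
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}

# A "T" with its bottom pixel missing.
NOISY_T = [[1, 1, 1], [0, 1, 0], [0, 0, 0]]

print(classify(NOISY_T, TEMPLATES, threshold=0.8))   # best score is for "T"
print(classify(NOISY_T, TEMPLATES, threshold=0.95))  # no score is high enough
```

Raising the threshold trades misclassifications for rejections: the same noisy glyph is accepted as "T" at 0.8 but rejected at 0.95.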

The second technique presented by Mori, Nishida et al. (1999) is based on Boolean algebra. Assuming that the glyph being analysed has been normalized, a sample of its pixels can be compared with the corresponding pixels in the glyphs of known alphabet characters. The following equations define the logical method.

For the given character A,

$$A = W \wedge B$$

where

$$W = \left[\, \sum^{n} f(i,j)\, g_w(i,j) \le T_W \right]$$

$$B = \left[\, \sum^{m} f(i,j)\, g_b(i,j) \ge T_B \right]$$

TB and TW are threshold values for text and background pixels respectively.

f(i,j) is a function which represents the glyph being evaluated. It is equal to 1 if the pixel

in row i and column j in the glyph is a text pixel, -1 otherwise.
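
The logical test can be sketched as below. Several assumptions are made that are not stated in the text so far: the two conditions W and B are combined with a logical AND, g_w is taken to be a mask equal to 1 at the sampled pixels where background is expected for the character and 0 elsewhere, and g_b is the corresponding mask for expected text pixels. The function name, masks, glyphs and thresholds are hypothetical.

```python
def logical_match(f, white_mask, black_mask, t_w, t_b):
    """Evaluate A = W and B for one candidate character.

    f          : 2-D list, +1 for a text pixel and -1 for a background pixel
    white_mask : assumed g_w, 1 at sampled pixels expected to be background
    black_mask : assumed g_b, 1 at sampled pixels expected to be text
    """
    def masked_sum(mask):
        return sum(p * m for rp, rm in zip(f, mask) for p, m in zip(rp, rm))

    w = masked_sum(white_mask) <= t_w  # background samples should sum low
    b = masked_sum(black_mask) >= t_b  # text samples should sum high
    return w and b

# Hypothetical sample masks for the character "I" on a 3x3 grid.
I_WHITE = [[1, 0, 1], [0, 0, 0], [1, 0, 1]]  # four corner samples
I_BLACK = [[0, 1, 0], [0, 0, 0], [0, 1, 0]]  # two stroke samples

GLYPH_I = [[-1, 1, -1], [-1, 1, -1], [-1, 1, -1]]
GLYPH_T = [[1, 1, 1], [-1, 1, -1], [-1, 1, -1]]

print(logical_match(GLYPH_I, I_WHITE, I_BLACK, t_w=-3, t_b=2))  # corners sum to -4
print(logical_match(GLYPH_T, I_WHITE, I_BLACK, t_w=-3, t_b=2))  # corners sum to 0
```

Because f takes the value -1 on background pixels, a text pixel found where background is expected raises the white sum and can cause the W condition to fail, as happens for the "T" glyph here.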