Friday, November 6, 2009

Why is Vowel Reordering required?

Indic scripts have vowel signs (dependent vowels) that attach to a consonant. What makes them awkward for OCR is that some vowel signs are drawn to the left of the consonant they belong to: the text stores consonant + vowel sign, but in the rendered glyph the vowel sign appears first and the consonant second.
Here is one simple example.
In Bengali: ক + ে = কে

Now when we OCR কে, the OCR engine first encounters the vowel sign (ে without the dotted circle) and then the consonant ক. It concatenates the two characters in the order it saw them and produces this output: েক.
Since the OCR engine makes the same mistake every time, it's easy to write scripts that move every such vowel sign to its proper place after the consonant. This improves the OCR accuracy drastically.
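
A reordering script of this kind could look roughly like the sketch below. It is only an illustration, not the script used with any particular OCR engine: the name reorder_vowel_signs is made up for this example, it covers only the three Bengali vowel signs drawn to the left (ি, ে, ৈ), and it assumes the whole input is raw OCR output in visual order. Running it on text that is already in the correct order would wrongly move vowel signs.

```python
import re

# Bengali vowel signs drawn to the left of the consonant:
# U+09BF (i), U+09C7 (e), U+09C8 (ai).
PRE_BASE_VOWEL_SIGNS = "\u09BF\u09C7\u09C8"

# Bengali consonants: U+0995..U+09B9, plus U+09DC, U+09DD, U+09DF.
CONSONANT = "[\u0995-\u09B9\u09DC\u09DD\u09DF]"

# The hasanta (virama) joins consonants into a conjunct.
VIRAMA = "\u09CD"

# A vowel sign seen first (visual order), followed by a consonant
# or a conjunct (consonants joined by virama).
VISUAL_ORDER = re.compile(
    "([" + PRE_BASE_VOWEL_SIGNS + "])"
    "(" + CONSONANT + "(?:" + VIRAMA + CONSONANT + ")*)"
)


def reorder_vowel_signs(text):
    """Move each left-side vowel sign to just after its consonant or conjunct.

    Assumes `text` is OCR output in which these vowel signs always come
    before the consonant they belong to.
    """
    return VISUAL_ORDER.sub(r"\2\1", text)
```

With this sketch, reorder_vowel_signs("\u09C7\u0995") turns the OCR output েক back into কে. Two-part vowel signs such as ো and ৌ, which have a piece on each side of the consonant, would need extra handling that is not shown here.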
