Step one, getting the maatraa clipping code into Tesseract, is done. The following issues still need to be resolved before we can reach excellent recognition rates:
We need to split the following glyphs into separate consonant and vowel signs:
1) Consonant + descending vowel sign
2) Consonant + ascending vowel sign
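As a rough illustration of the two cases above, the sketch below cuts a binarized glyph into three horizontal zones: the part above the headline (where an ascending vowel sign would sit), the consonant body, and the part below the baseline (where a descending vowel sign would sit). It is only a sketch under stated assumptions and not Tesseract code: the glyph is assumed to be a list of rows of 0/1 pixels, the headline is assumed to be the row with the most ink, and `baseline_row` is a hypothetical value supplied by the caller. Real vowel signs do not always separate cleanly along horizontal cuts.

```python
# Minimal sketch: split a binarized glyph into ascender / core / descender
# zones. All names here are hypothetical and not part of Tesseract.

def row_ink(glyph):
    """Return the number of ink (1) pixels in each row."""
    return [sum(row) for row in glyph]


def find_headline_row(glyph):
    """Assume the headline is the row containing the most ink."""
    counts = row_ink(glyph)
    return counts.index(max(counts))


def split_zones(glyph, baseline_row):
    """Cut the glyph at the headline and at a caller-supplied baseline.

    Returns (ascender, core, descender) as lists of rows:
      ascender  - rows above the headline (ascending vowel sign, if any)
      core      - headline through baseline_row inclusive (consonant body)
      descender - rows below baseline_row (descending vowel sign, if any)
    """
    headline = find_headline_row(glyph)
    ascender = glyph[:headline]
    core = glyph[headline:baseline_row + 1]
    descender = glyph[baseline_row + 1:]
    return ascender, core, descender


# Toy 8x5 glyph: two rows of ascending sign, a full-width headline,
# three rows of consonant body, two rows of descending sign.
EXAMPLE_GLYPH = [
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
```

Calling `split_zones(EXAMPLE_GLYPH, baseline_row=5)` yields the three zones as separate row lists, which could then be handled as separate images. How the baseline is estimated in practice is left open here.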
In summary, we need to be able to do the following transformation before sending the image to Tesseract: