contextCorrection
After all the other modules have run, some characters will remain
unrecognized, and it is the task of this module to recognize them. These
characters can be divided into three groups (see footnote 3.6):
- merged characters. Due to imperfections in the original text, two
or more characters ended up touching each other, and should be separated.
Ligatures may fall into this group too.
- unsupported characters. There's not much to do with these; they are
simply not supported by any of the modules.
- unrecognizable characters. Bad printing, bad scanning or some accident
with the original document could have rendered some of the characters
unrecognizable. They can be recovered either by applying some filter and
reprocessing, or by using the context.
So, these are the issues you must consider.
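
As an illustration of what "using the context" could mean for the third group, here is a minimal sketch in Python. It is not this module's actual interface: the UNKNOWN placeholder, the candidates and correct functions, and the word list are all hypothetical names invented for the example. It assumes earlier stages emit a placeholder for each character they could not recognize, and it fills the gap only when a word list leaves exactly one possibility.

```python
UNKNOWN = "?"  # hypothetical placeholder for a character no module recognized


def candidates(word, lexicon):
    """Return lexicon entries that agree with `word` at every recognized position."""
    return [
        entry
        for entry in lexicon
        if len(entry) == len(word)
        and all(w == UNKNOWN or w == e for w, e in zip(word, entry))
    ]


def correct(word, lexicon):
    """Fill in unknown characters only when the context leaves a single choice."""
    matches = candidates(word, lexicon)
    return matches[0] if len(matches) == 1 else word


# Hypothetical word list; a real one would come from a dictionary file.
lexicon = ["character", "recognize", "context", "contest"]

print(correct("char?cter", lexicon))  # one match: the gap is filled
print(correct("conte?t", lexicon))    # two matches: the word is left unchanged
```

Because the comparison is position by position, this sketch only covers unrecognizable characters. Merged characters change the length of the word, so they would need a different matching strategy.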
root
2002-02-17