There are three things that could be called a module in GOCR, so here's a thorough specification:
Module type | Function | Examples |
imageLoader | Loads an image. | Load images. There can be only one image loader. |
imageFilter | Filter the image. | Dust removal, etc. |
blockFinder | Find blocks, i.e., groups of similar data, and add information about their content. | Find pictures, find columns of text, find mathematical expressions. |
charFinder | Frame characters, and add information about their content. | Frame characters, font recognition. |
charRecognizer | Recognize the framed characters. | Italic, bold, Greek specialized OCR. |
contextCorrection | Try to recognize the still unrecognized characters. | Spell checker, ligature checker. |
outputFormatter | Output data to some format and file. | HTML output, LaTeX output. |
All of the modules (except imageLoader) may be composed of several different functions, which may be in different module packages. The following sections explain how to load modules, set their order, and run them.
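To make the constraint concrete, here is a minimal sketch in C of a registry that holds several functions per module type, each possibly from a different package, while allowing only one imageLoader. It is not GOCR's actual API: the names `module_registry`, `registry_add`, `registry_run`, `MAX_FUNCS_PER_TYPE`, the `module_fn` signature, and the stand-in function `count_call` are all hypothetical, and running the types in the order of the table above is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Module types, in the order the table lists them. */
typedef enum {
    IMAGE_LOADER,
    IMAGE_FILTER,
    BLOCK_FINDER,
    CHAR_FINDER,
    CHAR_RECOGNIZER,
    CONTEXT_CORRECTION,
    OUTPUT_FORMATTER,
    MODULE_TYPE_COUNT
} module_type;

/* Hypothetical limit, chosen only to keep the sketch static. */
#define MAX_FUNCS_PER_TYPE 8

/* Hypothetical signature: returns 0 on success, nonzero on failure. */
typedef int (*module_fn)(void *data);

typedef struct {
    const char *package;   /* module package that provides fn */
    module_fn fn;
} module_entry;

typedef struct {
    module_entry entries[MODULE_TYPE_COUNT][MAX_FUNCS_PER_TYPE];
    size_t count[MODULE_TYPE_COUNT];
} module_registry;

/* Empty the registry. */
void registry_init(module_registry *r)
{
    *r = (module_registry){0};
}

/* Register fn under type t. Returns 0 on success, -1 if the type is
 * invalid, fn is NULL, or the slot limit for that type is reached.
 * imageLoader is limited to a single function. */
int registry_add(module_registry *r, module_type t,
                 const char *package, module_fn fn)
{
    if ((int)t < 0 || t >= MODULE_TYPE_COUNT || fn == NULL)
        return -1;
    size_t limit = (t == IMAGE_LOADER) ? 1 : MAX_FUNCS_PER_TYPE;
    if (r->count[t] >= limit)
        return -1;
    r->entries[t][r->count[t]++] = (module_entry){package, fn};
    return 0;
}

/* Run every registered function, type by type, in registration order
 * within each type. Returns the number of functions run, or -1 as soon
 * as one of them fails. */
int registry_run(const module_registry *r, void *data)
{
    int ran = 0;
    for (int t = 0; t < MODULE_TYPE_COUNT; t++) {
        for (size_t i = 0; i < r->count[t]; i++) {
            if (r->entries[t][i].fn(data) != 0)
                return -1;
            ran++;
        }
    }
    return ran;
}

/* Stand-in module function: increments the int that data points to. */
int count_call(void *data)
{
    assert(data != NULL);
    ++*(int *)data;
    return 0;
}
```

With this sketch, a second `registry_add` call for `IMAGE_LOADER` is rejected, while two `IMAGE_FILTER` functions from different packages are both accepted and run one after the other.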