

charFinder

This module should parse each block and frame every character it finds. It should also provide information about each character, such as whether it is bold or italic, its font, etc. The charRecognizer module functions use this information to check quickly whether they will be able to recognize the character or would just waste processing time. Prototype:

int gocr_charFinder ( gocrBlock *b, void *v );
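To make the purpose of these attributes concrete, here is a minimal sketch of how a charRecognizer function might use them as a cheap pre-check before attempting recognition. The names `char_attributes`, `recognizer_caps`, `recognizer_can_handle` and the flag constants are hypothetical illustrations, not part of the GOCR API; the real structures may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical attribute flags; GOCR's real representation may differ. */
enum {
    CHAR_BOLD   = 1 << 0,
    CHAR_ITALIC = 1 << 1
};

/* Hypothetical per-character information filled in by charFinder. */
typedef struct {
    unsigned flags;      /* combination of CHAR_BOLD, CHAR_ITALIC, ... */
    const char *font;    /* guessed font name, or NULL if unknown */
    int width, height;   /* size of the character's frame in pixels */
} char_attributes;

/* Hypothetical description of what one charRecognizer can handle. */
typedef struct {
    unsigned unsupported_flags; /* attributes this recognizer cannot handle */
    const char *font;           /* the only font it knows, or NULL for any */
    int min_height;             /* smallest glyph height it can work with */
} recognizer_caps;

/* Cheap test run before the expensive recognition step: returns false
   when trying to recognize the character would only waste time. */
bool recognizer_can_handle(const recognizer_caps *caps,
                           const char_attributes *attr)
{
    assert(caps != NULL && attr != NULL);
    if (attr->flags & caps->unsupported_flags)
        return false;
    if (attr->height < caps->min_height)
        return false;
    /* An unknown font (NULL) is not treated as a mismatch. */
    if (caps->font != NULL && attr->font != NULL &&
        strcmp(caps->font, attr->font) != 0)
        return false;
    return true;
}
```

The check deliberately errs on the side of "try anyway" when information is missing, since a false rejection loses a character while a false acceptance only costs time.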
In more detail, this module should do what the following pseudocode describes:

sweep the block
for each character {
    find pertinent pixels
    find pertinent attributes
}
return 0

The function should return 0 if it took care of the block, and -1 otherwise (for example, if it does not recognize the block type).
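The pseudocode and return convention above could be filled in roughly as follows. This is a minimal sketch under stated assumptions: the block is a plain bitmap (nonzero means black), a "character" is taken to be one 8-connected group of black pixels, and the attribute step is reduced to counting pixels. The types `block`, `char_box`, `BLOCK_TEXT` and the function `char_finder` are hypothetical stand-ins for `gocrBlock` and `gocr_charFinder`. Glyphs such as "i" or "j" consist of several pixel groups, so a real implementation would also have to merge them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for GOCR's types; the real gocrBlock differs. */
enum block_type { BLOCK_TEXT, BLOCK_PICTURE };
#define MAX_CHARS 64

typedef struct {
    int x0, y0, x1, y1;   /* inclusive frame (bounding box) of one character */
    int pixels;           /* simplest "attribute": number of black pixels */
} char_box;

typedef struct {
    enum block_type type;
    int width, height;
    const unsigned char *pix;    /* width * height pixels, nonzero = black */
    char_box chars[MAX_CHARS];   /* characters found, in the order added */
    int nchars;
} block;

/* Collect one 8-connected group of black pixels starting at (sx, sy) and
   grow its frame. An explicit stack avoids deep recursion on large glyphs. */
static int trace_char(const block *b, unsigned char *seen, int sx, int sy,
                      char_box *c)
{
    size_t n = (size_t)b->width * (size_t)b->height;
    int *stack = malloc(2 * n * sizeof *stack);  /* each pixel pushed once */
    size_t top = 0;

    if (stack == NULL)
        return -1;
    *c = (char_box){ sx, sy, sx, sy, 0 };
    seen[(size_t)sy * b->width + sx] = 1;
    stack[top++] = sx;
    stack[top++] = sy;
    while (top > 0) {
        int y = stack[--top];
        int x = stack[--top];
        c->pixels++;
        if (x < c->x0) c->x0 = x;
        if (x > c->x1) c->x1 = x;
        if (y < c->y0) c->y0 = y;
        if (y > c->y1) c->y1 = y;
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                int nx = x + dx, ny = y + dy;
                if (nx < 0 || ny < 0 || nx >= b->width || ny >= b->height)
                    continue;
                size_t i = (size_t)ny * b->width + nx;
                if (b->pix[i] && !seen[i]) {   /* unvisited black neighbour */
                    seen[i] = 1;
                    stack[top++] = nx;
                    stack[top++] = ny;
                    assert(top <= 2 * n);
                }
            }
    }
    free(stack);
    return 0;
}

/* Returns 0 if the block was handled, -1 otherwise. */
int char_finder(block *b)
{
    if (b == NULL || b->type != BLOCK_TEXT)
        return -1;                 /* block type not handled by this finder */
    unsigned char *seen = calloc((size_t)b->width * b->height, 1);
    if (seen == NULL)
        return -1;
    b->nchars = 0;
    /* Sweep row by row, left to right: characters are added in the order
       the scan first meets one of their pixels (topmost, then leftmost). */
    for (int y = 0; y < b->height; y++)
        for (int x = 0; x < b->width; x++) {
            size_t i = (size_t)y * b->width + x;
            if (!b->pix[i] || seen[i])
                continue;
            char_box c;
            if (b->nchars >= MAX_CHARS || trace_char(b, seen, x, y, &c) != 0) {
                free(seen);
                return -1;
            }
            b->chars[b->nchars++] = c;
        }
    free(seen);
    return 0;
}
```

A plain raster scan only approximates reading order: on a line where a taller character appears to the right of a shorter one, the taller one may be added first. A finder meant for real text would rather locate lines first and then sweep each line in the script's reading direction.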

How you sweep the block is entirely up to you, but it must be done in a way that the outputFormatter module will understand. At least when parsing text, it makes sense to sweep the block in the order one would read it (which means you are not restricted to left-to-right, top-to-bottom languages). GOCR saves the characters in the order you add them. (To do: describe how charRecognizer will receive the data and add it to a linked list, etc., and add some way to override this default behaviour of adding characters to the list.)
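Since the list handling is still open, the following is only a hypothetical illustration of the behaviour described: a linked list that by default keeps characters in the order they were added, plus an optional ordering hook as one possible way to override that default. None of the names (`char_node`, `char_list`, `char_order_fn`, `char_list_add`, `right_to_left`) come from GOCR; they are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node: one character and the position of its frame. */
typedef struct char_node {
    int x, y;
    struct char_node *next;
} char_node;

/* Hypothetical override hook: returns nonzero if `a` must come before `b`.
   NULL selects the default behaviour, i.e. keep the order of addition. */
typedef int (*char_order_fn)(const char_node *a, const char_node *b);

typedef struct {
    char_node *head, *tail;
    char_order_fn before;
} char_list;

/* Adds a character; returns 0 on success, -1 if memory runs out. */
int char_list_add(char_list *l, int x, int y)
{
    assert(l != NULL);
    char_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->x = x;
    n->y = y;
    n->next = NULL;
    if (l->before == NULL) {           /* default: append at the tail */
        if (l->tail != NULL)
            l->tail->next = n;
        else
            l->head = n;
        l->tail = n;
        return 0;
    }
    /* Override: insert before the first node that `n` must precede.
       Ties go after existing nodes, so equal characters keep their order. */
    char_node **p = &l->head;
    while (*p != NULL && !l->before(n, *p))
        p = &(*p)->next;
    n->next = *p;
    *p = n;
    if (n->next == NULL)
        l->tail = n;
    return 0;
}

void char_list_free(char_list *l)
{
    char_node *n = l->head;
    while (n != NULL) {
        char_node *next = n->next;
        free(n);
        n = next;
    }
    l->head = l->tail = NULL;
}

/* Example override for a right-to-left script: upper lines first
   (smaller y), and within a line larger x first. */
int right_to_left(const char_node *a, const char_node *b)
{
    return a->y < b->y || (a->y == b->y && a->x > b->x);
}
```

With `before` left as NULL, adding characters at x = 10, 30, 20 yields the list 10, 30, 20; with `right_to_left` installed, the same calls yield 30, 20, 10.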



root 2002-02-17