next up previous contents index
Next: Finding blocks Up: blockFinder Previous: blockFinder   Contents   Index


Block types

Besides finding each block, you should try to recognize what kind of information the block carries. This makes the work of subsequent modules much easier and speeds up processing.

GOCR automatically defines three types of blocks:

    Block type
    ---------------
    TEXT
    PICTURE
    MATH_EXPRESSION

You can also define new types, as explained below. The default type is TEXT.

Block types are objects that all derive from a common parent, gocrBlock. Because of this, any module can access a block regardless of its type, and you can create new block types on the fly. To create one, first define a struct for your new block type in the following format:

struct newblocktype {
    gocrBlock b;

    /* other fields */

};
It is absolutely necessary that the first field of your structure be gocrBlock b. C guarantees that a pointer to a structure, suitably converted, points to its first member, so this is what allows you to cast a pointer to your structure to a pointer to a plain gocrBlock. (If you are wondering why the hell I didn't use C++ instead of C, these are the reasons: it's easier to use C from C++ than the opposite; I have much more experience with C than with C++; there are several people who program in C but not in C++; using C as an object-oriented language, although slightly obfuscated, has proven possible and has been done in successful projects such as GTK; and C++ name mangling makes it more difficult to write modules and is not yet supported by libtool.)

You must register your block type to make GOCR aware of its existence. To do that, use the following function:

blockType gocr_blockTypeRegister ( char *name );
This function takes the name of your new block type, registers it, and returns a non-negative number, which is the block type id, or -1 if an error occurred. Save this id: it provides a quick way to check a block's type. Alternatively, you can use:

blockType gocr_blockTypeGetByName ( char *name );
which returns the id of an already registered block type, or -1 if none is found. Since this function is slow (it must compare the given string against the name of every registered block type), it's a good idea to save the id in a variable. Finally, a convenience function:

const char *gocr_blockTypeGetNameByType ( blockType t );
which, given a block type id, returns its name. Do not free the returned string.


root 2002-02-17