

Setting attributes

Setting text attributes can get quite complicated if you want to be fancy, so we decided to design a very simple yet powerful system that should handle most of what you will ever need. First, a reminder: these attributes should only be those applied directly to the text, such as bold, italic, font type, etc.

As usual in GOCR, the first thing to do is to create the attribute:

int gocr_charAttributeRegister ( char *name, gocrCharAttributeType t, char *format );
name
the attribute name; it must be unique. We recommend using capital letters, but it's up to you.
t
the attribute type; there are two possible values:

SETTABLE
the attribute works like a flag: either it's set, or not set. Example: boldness.
UNTIL_OVERRIDEN
the attribute is valid forever; you can only change its values. Example: font. There must always be a font type and size, but they may change during the text.
format
this field is used to store any attributes of the attribute (wow). It is explained below, with an example.
As usual, the function returns 0 if OK and -1 on error (registering an attribute that already exists is considered an error). Now that you have created your attributes, you are processing the text and find that you need to set an attribute. Do it with the following function (note: this function name may change):

int gocr_charAttributeInsert ( char *name, ... ); 
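name
the attribute name.
...
the attribute's values, passed as the arguments of its format string (none if the format is NULL).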
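Before the examples, it may help to see one possible shape of the machinery behind these two calls. This is a minimal sketch, not GOCR's actual code: the enum definition, the fixed-size table (MAX_ATTRS), the 128-byte value buffer, the helper find_attr, and the 0/-1 return value of gocr_charAttributeInsert are all assumptions made for illustration. Registration rejects duplicate names with -1, as described above, and insertion renders the extra arguments through the registered printf-like format with vsnprintf.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL sketch, not GOCR's code: the enum layout, table size and
 * value buffer are assumptions made for illustration. */
typedef enum { SETTABLE, UNTIL_OVERRIDEN } gocrCharAttributeType;

#define MAX_ATTRS 32

typedef struct {
    const char *name;            /* unique name; must outlive the table */
    gocrCharAttributeType type;  /* SETTABLE or UNTIL_OVERRIDEN */
    const char *format;          /* printf-like format, or NULL */
    char value[128];             /* last values passed to Insert, as text */
} attr_entry;

static attr_entry attrs[MAX_ATTRS];
static int n_attrs = 0;

/* Linear lookup by name; returns NULL if not registered. */
static attr_entry *find_attr(const char *name) {
    for (int i = 0; i < n_attrs; i++)
        if (strcmp(attrs[i].name, name) == 0)
            return &attrs[i];
    return NULL;
}

/* 0 if OK, -1 on error (duplicate name, NULL name, or full table). */
int gocr_charAttributeRegister(char *name, gocrCharAttributeType t, char *format) {
    if (name == NULL || find_attr(name) != NULL || n_attrs == MAX_ATTRS)
        return -1;
    attr_entry *a = &attrs[n_attrs++];
    a->name = name;
    a->type = t;
    a->format = format;
    a->value[0] = '\0';
    return 0;
}

/* Renders the variadic arguments with the registered format string.
 * The 0 / -1 return convention is an assumption of this sketch. */
int gocr_charAttributeInsert(char *name, ...) {
    attr_entry *a = find_attr(name);
    if (a == NULL)
        return -1;
    if (a->format != NULL) {
        va_list ap;
        va_start(ap, name);
        vsnprintf(a->value, sizeof a->value, a->format, ap);
        va_end(ap);
    }
    return 0;
}
```

Like GOCR, this sketch never validates the format string: it simply hands it to vsnprintf, so arguments that don't match the format lead to undefined behavior.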
I bet you're wondering how the hell this stuff works. Me too. Uh, I mean, it's easier to understand with an example. The first one is simple:

gocr_charAttributeRegister("BOLD", SETTABLE, NULL);

gocr_charAttributeInsert("BOLD");

/* insert some text */

gocr_charAttributeInsert("BOLD");

Quite easy: first you register the bold style. It's a settable attribute, and since you don't need any extra information, the format field is NULL. Then, when processing the text, you find a word in bold. What you do is simple: insert a bold, insert the text, insert another bold. Since it's a settable attribute, the second one cancels the effect.
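On the consuming side, for example in an output module, the toggle behavior of a SETTABLE attribute could be tracked as sketched below. The event stream, the event struct, the render function and the HTML-style <b> tags are all hypothetical, not GOCR's internal representation; the point is only that each BOLD marker flips the state, so the second one cancels the first.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL event stream: each item is either an attribute
 * marker or a run of text. Not GOCR's real data structure. */
typedef struct {
    bool is_attr;      /* true: attribute marker, false: text run */
    const char *data;  /* attribute name or text */
} event;

/* Renders events into out (outsz must be > 0), wrapping bold runs in
 * <b>...</b>. BOLD is SETTABLE, so every marker flips the state;
 * markers for other attributes are ignored in this sketch. */
void render(const event *ev, size_t n, char *out, size_t outsz) {
    bool bold = false;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (ev[i].is_attr && strcmp(ev[i].data, "BOLD") == 0) {
            bold = !bold;  /* the second BOLD cancels the first */
            strncat(out, bold ? "<b>" : "</b>", outsz - strlen(out) - 1);
        } else if (!ev[i].is_attr) {
            strncat(out, ev[i].data, outsz - strlen(out) - 1);
        }
    }
}
```

Feeding it the sequence from the example (text, BOLD, bold word, BOLD, text) yields the bold word wrapped in a single <b>...</b> pair.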

Let's do something fancier now:

gocr_charAttributeRegister("FONT", UNTIL_OVERRIDEN, "%s %d");

gocr_charAttributeInsert("FONT", "Arial", 18);

/* insert some text */

gocr_charAttributeInsert("FONT", "TimesNewRoman", 12);

/* insert some more text */

Now the explanation of the format field: it's just a printf-like format string! So you can save whatever you want in a form that anybody can easily read, even if they don't know what it means; this is especially useful when you are writing an outputFormatter module. When you insert the attribute, you pass the arguments for the format string. So, what happens in the example: we create an attribute "FONT", which is valid forever. Note that, although it's valid forever, it only takes effect when you first call gocr_charAttributeInsert, because you need to set its internal attributes (even if it doesn't have any). In the example, you are parsing a page and find that the title is typeset in Arial, size 18, while the body text is in Times New Roman, size 12.
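The UNTIL_OVERRIDEN behavior of FONT can be sketched as a single slot whose contents each insert replaces entirely, and which has no effect until the first insert. The override_attr struct and override_set function are hypothetical names, and returning whether the value changed is an extra convenience of this sketch (so an output formatter could skip redundant font switches), not something GOCR is documented to do.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL state for one UNTIL_OVERRIDEN attribute. */
typedef struct {
    const char *format;  /* printf-like format given at registration */
    bool active;         /* false until the first insert: no effect yet */
    char value[64];      /* current values, rendered with format */
} override_attr;

/* Replaces the attribute's values with the formatted arguments.
 * Returns true if the stored value changed (or was set for the first
 * time); this return value is an assumption of the sketch. */
bool override_set(override_attr *a, ...) {
    char next[sizeof a->value];
    va_list ap;
    va_start(ap, a);
    vsnprintf(next, sizeof next, a->format, ap);
    va_end(ap);
    bool changed = !a->active || strcmp(next, a->value) != 0;
    strcpy(a->value, next);  /* the new values override the old ones */
    a->active = true;
    return changed;
}
```

With the format "%s %d", inserting "Arial", 18 stores the text "Arial 18", and a later "TimesNewRoman", 12 overrides it with "TimesNewRoman 12".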

Always remember that this system is subject to all the limitations of printf and scanf. For example, in scanf, %s reads a string only up to the first whitespace, so you can't use spaces in a %s string, even though printf accepts them (this is why the example writes TimesNewRoman without spaces). And since GOCR does not check the format string, if you screw it up, you screw everything up.
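The %s pitfall is easy to demonstrate with sscanf. The read_font helper below is hypothetical, standing in for whatever an outputFormatter would use to read a FONT value back. It adds a field width (%63s) to the registered "%s %d" so the name cannot overflow its buffer, but the whitespace behavior is the same: a name with spaces stops the string conversion early, and the size then fails to parse.

```c
#include <stdio.h>

/* HYPOTHETICAL reader for a FONT value stored with the format "%s %d".
 * Returns how many fields sscanf converted: 2 on success. */
int read_font(const char *stored, char name[64], int *size) {
    /* %63s stops at the first whitespace and leaves room for the NUL. */
    return sscanf(stored, "%63s %d", name, size);
}
```

"TimesNewRoman 12" converts both fields, while "Times New Roman 12" converts only "Times": %d then meets "New" and fails, so the call returns 1.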


root 2002-02-17