Setting attributes
Setting attributes of the text can get quite complicated if you want
to be fancy. GOCR therefore provides a very simple, yet powerful,
system that should be able to handle most of what you will ever need.
First, a reminder: these attributes should only be those that
are applied directly to the text, such as bold, italic, font type,
etc.
As usual in GOCR, the first thing to do is to create the attribute:
    int gocr_charAttributeRegister ( char *name,
                                     gocrCharAttributeType t, char *format );
- name
  attribute name; must be unique. We recommend using capital
  letters, but it's up to you.
- type (the t argument)
  there are two possible values:
  - SETTABLE
    the attribute works like a flag: it is either set or not
    set. Example: boldness.
  - UNTIL_OVERRIDEN
    the attribute is always in effect; you can only change
    its values. Example: font. There must always be a font type and size,
    but they may change during the text.
- format
  this field is used to store any attributes of the attribute
  (wow). It is explained below, with an example.
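To make the registration rules concrete, here is a minimal sketch of a toy registry. It is not GOCR's implementation: every toy_ name, the fixed-size table, and the storage layout are assumptions made for illustration. It only mirrors what is described here: names must be unique, and the call returns 0 if OK, -1 on error.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; GOCR's real types and storage may differ. */
typedef enum { TOY_SETTABLE, TOY_UNTIL_OVERRIDEN } toy_attr_type;

typedef struct {
    const char   *name;    /* unique attribute name, e.g. "BOLD" */
    toy_attr_type type;    /* flag-like or always-in-effect */
    const char   *format;  /* printf-like format, or NULL if unused */
} toy_attr;

#define TOY_MAX_ATTRS 16
static toy_attr toy_table[TOY_MAX_ATTRS];
static size_t   toy_count = 0;

/* Returns 0 on success, -1 on error (NULL name, duplicate name, or
 * full table). Only the pointers are stored, so the strings must
 * outlive the table; string literals are fine. */
int toy_attr_register(const char *name, toy_attr_type type,
                      const char *format)
{
    size_t i;

    if (name == NULL || toy_count == TOY_MAX_ATTRS)
        return -1;
    for (i = 0; i < toy_count; i++)
        if (strcmp(toy_table[i].name, name) == 0)
            return -1;              /* name already registered */

    toy_table[toy_count].name   = name;
    toy_table[toy_count].type   = type;
    toy_table[toy_count].format = format;
    toy_count++;
    return 0;
}
```

In this sketch, registering "BOLD" twice makes the second call return -1, which is the duplicate-name error case.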
As usual, the function returns 0 if OK, -1 on error (inserting an
existing attribute is considered an error). Once you have created your
attributes and are processing the text, you will find that you need to
set an attribute. Do it with the following function (note that this
function name may be changed):
    int gocr_charAttributeInsert ( char *name, ... );
- name
  attribute name.
- ...
  the arguments for the attribute's format string, if it has one.
I bet you are probably wondering how the hell this stuff works. Me
too. Uh, I mean, it's easier to understand using an example. The first
one is simple:
    gocr_charAttributeRegister("BOLD", SETTABLE, NULL);
    gocr_charAttributeInsert("BOLD");
    /* insert some text */
    gocr_charAttributeInsert("BOLD");
Quite easy: first you register the bold style. It's a settable attribute,
and since you don't need any extra information, the format
field is NULL. Then, when processing the text, you find a word in
bold. What you do is simple: insert a bold, insert the text, insert
another bold. Since it's a settable attribute, the second one cancels
the effect.
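The toggling behaviour of a settable attribute can be sketched as a single flag that flips on every insertion. This is only a model of the behaviour described above, not GOCR code; toy_flag and toy_flag_insert are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical model of the state of one SETTABLE attribute. */
typedef struct {
    bool active;   /* true while the attribute applies to the text */
} toy_flag;

/* Each insertion flips the flag: the first one turns the attribute
 * on, the next one cancels it. Returns the state after the flip. */
bool toy_flag_insert(toy_flag *f)
{
    f->active = !f->active;
    return f->active;
}
```

Starting from an inactive flag, the first call corresponds to the opening "BOLD" insertion and the second call to the closing one.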
Let's do something fancier now:
    gocr_charAttributeRegister("FONT", UNTIL_OVERRIDEN, "%s %d");
    gocr_charAttributeInsert("FONT", "Arial", 18);
    /* insert some text */
    gocr_charAttributeInsert("FONT", "TimesNewRoman", 12);
    /* insert some more text */
Now the explanation of the format field: it's just a printf-like
format string! So you can save whatever you want in a format that
can be easily read by anybody, even if they do not know what it means.
This is especially useful when you are writing an outputFormatter
module. When you insert the attribute, you pass the arguments for the
format string. So, what happens in the example: we create an attribute
"FONT", which is valid forever. Note that, although it's valid
forever, it only takes effect when you first call gocr_charAttributeInsert,
because you need to set its internal attributes (even if it doesn't
have any). In the example, you are parsing a page and find that
the title is typeset in Arial, size 18. The text is in Times New Roman,
size 12.
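How the variable arguments might be combined with the registered format can be sketched with the standard vsnprintf. This is an assumption about the mechanism, not GOCR's actual code; toy_attr_format is a hypothetical helper and the way GOCR stores the result is not shown in this section.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: expands a printf-like format with the caller's
 * arguments into out. Returns 0 on success, -1 if the buffer is empty,
 * the result does not fit, or formatting fails. */
int toy_attr_format(char *out, size_t size, const char *format, ...)
{
    va_list ap;
    int n;

    if (out == NULL || size == 0)
        return -1;
    if (format == NULL) {       /* attribute with no extra data (BOLD) */
        out[0] = '\0';
        return 0;
    }

    va_start(ap, format);
    n = vsnprintf(out, size, format, ap);   /* C99: never overflows out */
    va_end(ap);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With the format "%s %d" and the arguments "Arial" and 18, the buffer ends up holding the readable string "Arial 18", which is the kind of value an outputFormatter module could pass along without interpreting it.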
Always remember that this system is subject to all the limitations
of printf and scanf. For example: in scanf, %s reads a string only up
to the first white space, so you can't use spaces in a %s string,
even though printf accepts them. And, since GOCR does not check the
format string, if you screw it up you are screwing everything.
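The white space limitation is easy to reproduce with the standard library alone. The sketch below writes a font name and size with "%s %d" and reads them back with sscanf; toy_round_trip is a hypothetical helper, and nothing here claims that GOCR reads attributes back in exactly this way.

```c
#include <stdio.h>

/* Writes name and size with "%s %d", then parses the result again.
 * Returns the number of fields sscanf converted (2 means a clean
 * round trip), or -1 if the intermediate buffer is too small.
 * name_out must have room for at least 64 bytes. */
int toy_round_trip(const char *name, int size,
                   char *name_out, int *size_out)
{
    char stored[128];
    int n = snprintf(stored, sizeof stored, "%s %d", name, size);

    if (n < 0 || (size_t)n >= sizeof stored)
        return -1;

    /* %63s stops at the first white space and never overruns name_out. */
    return sscanf(stored, "%63s %d", name_out, size_out);
}
```

Calling it with "TimesNewRoman" and 12 converts both fields. Calling it with "Times New Roman" converts only one: %s reads "Times", and %d then fails on "New". That is why the example above spells the font name without spaces.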
root
2002-02-17