man_db-2.3.x - the database cached manual pager suite
Graeme W. Wilford <eep2gw@ee.surrey.ac.uk>

This document describes the setup, maintenance and use of a generic online manual page system with special reference to the man_db-2.3.x package and it's advanced features.

UNIX is a registered trademark of the X/Open Company, Ltd.
NFS is a registered trademark of Sun Microsystems, Inc.
PostScript is a registered trademark of Adobe in the United States.

The general conventions used throughout this manual include file names and paths in italic, eg. /usr/man. variable strings (usually path components) enclosed within <> and in italic, eg. <sec>, program names in bold, eg. man. commands that can be typed at a shell prompt in a eg. environment variables denoted as follows: $ENV_VAR


Copyright © 1995 Graeme W. Wilford

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the copyright holder.
.

Introduction

   
.

man_db-2.3.x

 

man_db-2.3.x is a package that is designed to provide users with online information in a fast and friendly manner while at the same time offering flexibility to the system administrator.

It is made up of several user programs:
	       man	- an interface to the on-line reference manuals
	       whatis	- search the manual page names
	       apropos	- search the manual page names and descriptions
	       manpath	- determine search path for manual pages
several maintenance programs:
	       mandb	- create or update the manual page index caches
	       catman	- create or update the pre-formatted manual pages
and a special pre-formatter that knows about compressed manual pages
	       zsoelim	- satisfy .so requests in roff input
In addition to these compiled programs, there are two shell scripts, mkcatdirs and checkman in the tools subdirectory. These scripts aid the creation of cat directories and check for duplicated manual pages, respectively.

The following manual pages are provided with this package to explain correct format and usage. man(1), whatis(1), apropos(1), manpath(1), manpath(5), mandb(8), catman(8) and zsoelim(1). .

The concept

 

man_db-2.3.x originally started out life as program suite man-1.1B, written by John W. Eaton <jwe@che.utexas.edu> and maintained by Rik Faith <faith@cs.unc.edu> to which support proposed by the newly formed FSSTND committee regarding cat directories was added.

Since then, man_db-2.3.x's most innovative feature: the database cache scheme[1] has been significantly developed. The basic idea was to reduce manual page search times to a minimum. The following piece of text is included from the man_db-2.2 distribution:

The theory: If you go to a library to take a book out, what do you do?

a) Go and look where it might be on a micro-fiche/terminal, take a look where it is supposed to be on the shelf, and then go look at the new arrivals if it's not where it's supposed to be?

OR

b) Start at one end of the ground floor, look along every bookshelf until you've completed that floor, then go up a level and start again until you've found what you're looking for?

Since then the database index scheme has evolved greatly. Every manual page and stray cat page on the system is registered in an index database cache which stores various details about the file including the timestamp, the location and the whatis[2] information. This information is kept up to date by man which looks for filesystem changes each time it is invoked. .

The manual page system

 

The simplest manual page system will have a single manual page hierarchy. This will typically be

beneath which will be several subdirectories of the form man<sec> where <sec> is 1, 2, 3, 4, 5, 6, 7 or 8. These are referred to as sections of the manual. Others may exist and they are not restricted to single character names. eg.

is a valid section subdirectory. Other common sections include 9, n, l, p and o.

Within these section subdirectories reside the manual pages themselves. Their filenames follow the pattern

where in most cases <ext> is an empty string. An example is manual page cp

which resides in section 1 and has no special extension.

.

Sections of the manual

 

The manual is split up into sections to ease access and to cater for manual pages that share the same name. It is common for a program and function to share the same name. kill is a good example. This is both a program which can be used to send a process a signal and an operating system call with similar functionality. Their manual pages are stored under sections 1 and 2 respectively. Thus, sections are used to separate out the program manual pages from the function manual pages and so on. The table below shows the section numbers of the manual followed by the types of pages they contain.

+---------+------------------------------------------------------+
| Section |		   Section contents			 |
+---------+------------------------------------------------------+
|    1    | user executable programs or shell commands		 |
|    2    | system calls (functions provided by the kernel)	 |
|    3    | library calls (functions within system libraries)	 |
|    4    | special files (usually found in /dev)		 |
|    5    | file formats and conventions eg. /etc/passwd	 |
|    6    | games						 |
|    7    | macro packages and conventions eg. man(7), groff(7). |
|    8    | system administration commands			 |
|    9    | kernel routines [Non standard]			 |
|    n    | new [obsolete]					 |
|    l    | local [obsolete]					 |
|    p    | public [obsolete]					 |
|    o    | old [obsolete]					 |
+---------+------------------------------------------------------+
.

The format of manual pages

 

The format in which manual pages are stored is NROFF/TROFF or more generally ROFF. This is a typesetter style language[3] which requires formatting before being viewed. In fact some manual pages require pre-format processing to correctly format tables or equations.

If the page is to be viewed on screen in a text environment, NROFF is used as the primary formatter. If the page is to be printed or displayed in a graphical environment, TROFF is used. Traditionally, TROFF formatted files for a C/A/T (Computer aided Typesetter) which is now obsolete. The GNU ROFF (GROFF[4]) suite of programs offer a choice of output types including X, dvi and postscript. When configuring man_db-2.3.x, the preference is to use GROFF rather than TROFF. .

Arguments to configure

 

To allow the configuration program, configure, to be non-interactive, it can be passed various options to alter the default settings. Generic configure options are discussed in docs/INSTALL. Options that are specific to the man_db-2.3.x package are described below.

--enable-debug
By default, the configuration process creates production quality Makefiles. This option, which takes no argument, changes certain values to aid in debugging man_db-2.3.x. It does not alter the physical behaviour of any of the programs.
--enable-setuid[=ARG]
By default, man will be installed as a setuid program to user man. Use this option with an argument to change the setuid owner.
--disable-setuid
Use this option to install man as a non-setuid program and to change the default cat and database files' access flags to allow users to modify them.
--with-device=DEVICE
Use this flag to alter the default output device used by NROFF. DEVICE is passed to NROFF with the -T option. configure will test that NROFF will run with the supplied device argument.
--with-db=LIBRARY
configure will look for database interface libraries in the order Berkeley DB, gdbm and finally ndbm and will #define appropriate variables relative to the first one found. To override the built in order on platforms having a choice of interface library, use this option to specify which library to use. .

The specifics of Sections

   
.

Package specific manual page sections

 

The use of package specific manual page sections is discouraged as packages large enough to warrant their own section probably contain manual pages that span other sections. An example might be package foo that has it's own section

which contains manual pages describing it's programs, the library routines it offers and the format of several of its configuration files. These pages would normally be allocated to sections 1, 3 and 5 respectively and thus combining them all under section foo is misleading. Subtle problems will arise if there are any base name-space clashes with standard manual pages, eg. exit(3), exit(foo) and the order in which they should be shown.

There are two standard solutions to this problem.

1
Create a separate manual page hierarchy for the package's manual pages such as
/usr/local/packages/foo/man
2
Install the pages in their relevant sections, with a unique extension appended to the filename such that
/usr/man/manfoo/exit.foo
would instead be installed as
/usr/man/man1/exit.1foo

Only (2) offers a complete solution to manual page ordering problems and allows users to access the desired page directly. .

Selecting a section type

 
.

Specifying a section

 

This is done via use of the section argument to man

will look for exit.1* in section 1 of the manual. If exit.1 exists, it will be displayed in preference to exit.1foo

will look for exit.1foo* in section 1 of the manual. The asterisk (*) represents a wild-card of any type or length, including length zero.

For an argument to be interpreted as a section name rather than a page name, it must either begin with a digit, or be included in the standard section list. The default section list is defined in include/manconfig.h to be 1, n, l, 8, 3, 2, 5, 4, 9, 6 and 7. This should be modified in order and content to meet the local conventions.

Every subdirectory section name in the entire system must be in the list, including sections found in imported manual page hierarchies. The order is important because in normal operation, man will only display the first manual page it finds that meets the search criteria. Using the --all argument will cause man to attempt to display all manual pages that meet the criteria. See man(1) for further information.

Having an excess of sections listed will not slow man down. .

Specifying an extension

 

If the section is unknown, but the package extension is, it is possible to use the extension argument

to search in all sections for manual pages named exit from package foo. .

Filesystem structure

   
.

Manual page hierarchies

 

It is often common for manual page systems to have more than one manual page hierarchy. Indeed one of the systems I use has the following globally accessible hierarchies

A full system $MANPATH would be a colon separated list of these directories The order is important, and is observed by man_db's search algorithms. The order is very much related to the users $PATH environment variable, and should be set on a per user basis, or not set at all. If a user's $PATH causes

to be executed in preference to

it is essential that

displays the manual page located within

rather than within

To ensure correct order, the program manpath may be used to set the $MANPATH environment variable. See manpath(1) and manpath(5) for details. .

Setting the MANPATH

 

If using a Bourne style login shell such as bash, ksh, or zsh, the commands

can be added to $HOME/.profile

If using a C style login shell such as csh or tcsh, the commands

can be added to $HOME/.login

N.B. $PATH must be set prior to using manpath. The setting of $MANPATH is actually unnecessary as the man_db-2.3.x utilities will dynamically determine the manpath if $MANPATH is unset. .

Determination of the internal manpath

 

All man_db utilities, manpath included, will use the user's $MANPATH environment variable if set and not equal to "". Otherwise the user's $PATH environment variable is queried. If this is unset or is set to "", the determined manpath will simply be any

elements defined in the man_db config file.

Assuming that a $PATH exists, each path element it contains is scanned for in the config file. If found, the relative manpath element is appended to the internal manpath. However, if the element is not mentioned in the config file, a man directory relative to it will be sought. The subdirectories ../man or man relative to the path component are appended to the internal manpath if they exist. Finally, the internal manpath is stripped of duplicate paths before being processed by the NLS and `Other OS' routines. These may add to or modify the separate path elements giving priority to NLS manual pages or add OS-relative manpaths. .

Other OS's manual pages

 

It is common to have collections of heterogeneous computer systems linked together in a network. In some circumstances[5] it is advantageous to be able to access the manual pages of these other systems directly from your system. This feature is known as alternate system support. The accepted way to setup this support is to NFS mount the respective systems' manual page hierarchies under the native manual page hierarchies. An example:

		+--------+-----------------------+
		| System | Manual page hierarchy |
		+--------+-----------------------+
		|<local> | /usr/man              |
		| newOS  | /usr/man/newOS        |
		| userix | /usr/man/userix       |
		|<local> | /usr/local/man        |
		| newOS  | /usr/local/man/newOS  |
		| userix | /usr/local/man/userix |
		+--------+-----------------------+
Rather than have multiple NFS mounts from a single machine, this may be accomplished by NFS mounting

somewhere on the local system and using symbolic links within the manual hierarchies. To access these alternate systems using man use the -m option, eg.

would provide manual pages showing the structure of /etc/passwd on systems userix and newOS in that order. A manual page would not be displayed about the local systems conventions. Please read the relevant man_db utility's manual page for further and more specific information. .

NLS manual pages

 

NLS manual pages should be put in NLS subdirectories of a standard manual page hierarchy. A table illustrating the concept is reproduced from the ``Linux Filesystem Structure''[6] (FSSTND) manual from which further information may be obtained.

+----------+----------------+-----------------+----------------------+
| Language | Territory      | Character Set   | Directory            |
+----------+----------------+-----------------+----------------------+
| English  | --             | ASCII           | /usr/man/en          |
| English  | United Kingdom | ASCII           | /usr/man/en_GB       |
| English  | United States  | ASCII           | /usr/man/en_US       |
| French   | Canada         | ISO 8859-1      | /usr/man/fr_CA       |
| French   | France         | ISO 8859-1      | /usr/man/fr_FR       |
| German   | Germany        | ISO 646         | /usr/man/de_DE.646   |
| German   | Germany        | ISO 6937        | /usr/man/de_DE.6937  |
| German   | Germany        | ISO 8859-1      | /usr/man/de_DE.88591 |
| German   | Switzerland    | ISO 646         | /usr/man/de_CH.646   |
| Japanese | Japan          | JIS             | /usr/man/ja_JP.jis   |
| Japanese | Japan          | SJIS            | /usr/man/ja_JP.sjis  |
| Japanese | Japan          | UJIS (or EUC-J) | /usr/man/ja_JP.ujis  |
+----------+----------------+-----------------+----------------------+
Each of these directories are then interpreted as manual page hierarchies themselves and may contain the usual section subdirectories. Access to NLS manual pages is achieved via use of the setlocale(3) function which queries user environment variables to determine the current locale. Internally to the man_db utilities, this locale string is appended to each manpath element and the resultant NLS manpath element is searched before the standard manpath element. In this way, an NLS manual page that matches the search criteria will be shown before or in place of the standard American English page.

If a user's $MANPATH consists of or is determined as

and their locale is set to de_DE, the command

would produce the following internal man_db manpath elements

foobar would be searched for in the order of manual page hierarchies listed. .

ISO 8859-1 (latin1) manual pages

 

By default NROFF will format manual pages into a form suitable for a typewriter style device, e.g. a terminal screen. GNU NROFF is capable[7] of formatting ROFF into a form suitable for 8-bit latin1 capable output devices. To enable output for such a device, give the option

--with-device=DEVICE

to configure where DEVICE is the suitable and supported output format, in this case latin1. .

Displaying latin1 characters on a Linux virtual terminal

 

To enable console based viewing of latin1 characters on a Linux system, you must have the kbd[8] package installed. The following commands included within an initialisation file such as /etc/rc.d/rc.local will enable the display of latin1 fonts on the first 5 virtual terminals.
---< part of /etc/rc.d/rc.local >---
# sort out the vt font
if [ -x /bin/setfont ]; then
/bin/setfont /etc/kbd/consolefonts/lat1-16.psf
fi

# load the keymap transformation to do when activating new font
if [ -x /bin/mapscrn ]; then
/bin/mapscrn /etc/kbd/consoletrans/trivial
fi

# enable new font
for t in 1 2 3 4 5; do
echo -n -e "\033(K" > /dev/tty$t
done
---< part of /etc/rc.d/rc.local >---

For display under the ``X Window System'', a suitable 8 bit clean terminal emulator is required. .

Viewing ASCII pages formatted for latin1 output device

 

When formatting an ASCII manual page for a latin1 output device, GNU NROFF will take advantage of the extra characters available and will always produce a text page containing some latin1 (8-bit) symbols. The table[9] below, taken from man(1) illustrates the differences.

       +---------------------+-------+------------+-------+
       | Description         | Octal | ISO 8859-1 | ASCII |
       +---------------------+-------+------------+-------+
       | continuation hyphen |  255  |     ­      |   -   |
       | bullet (middle dot) |  267  |     ·      |   o   |
       | acute accent        |  264  |     ´      |   '   |
       | multiplication sign |  327  |     ×      |   x   |
       +---------------------+-------+------------+-------+
To display such symbols on a 7 bit terminal or terminal emulator, they must be translated back into standard ASCII. The -7 option with man will enable this simple reverse translation.

This option may be useful if your site has both 7 and 8-bit capable output devices and nroff is using the latin1 output device to format manual pages. .

Cat pages

 

It has become standard practice to store the formatted manual pages on disk so that subsequent requests for the manual page do not have to involve the formatting process. These pre-formatted manual pages are known as cat pages. Although cat pages require additional disk storage requirements, they provide a substantial speed increase and their use is recommended.

The automatic support of storing and using cat pages is brought about by simply creating suitable directories for them. .

Cat page hierarchies

 

Traditionally, cat pages were stored under the same manual hierarchy as their source manual pages, in cat<sec> subdirectories rather than man<sec>. This situation is rather limiting in several situations

To avoid all of these problems simultaneously, it was decided to support local cat page directory caches. .

Local cat page directory caches

 

Any location for cat page hierarchy may be specified in the man_db configuration file. The location of the database cache associated with each manual page hierarchy will always be at the root of the cat page hierarchy. By default, the cat page hierarchy shadows the manual page hierarchy. The FSSTND proposes /var/catman as the location for such directories although man_db-2.3.x allows any directory hierarchy to be used. The FSSTND path transformation rule is as follows

should be formatted into the cat file

where the <locale> directory component may be missing and <ext> may be an empty string.

The suggestion is that stray cats are located in the traditional hierarchy under /usr whereas re-creatable cat pages are stored under the local writable hierarchy /var/catman. man follows strict rules in determining which file is displayed.

As an example, the following route is taken if all three files exist.

1
Check relative time stamps of the manual file and the traditional cat file. If the cat file is up to date (has a more recent time stamp), display it.
2
The traditional cat file is out of date. Check relative time stamps of the manual file and the alternate cat file. If the cat file is up to date, display it.
3
The alternate cat file is out of date. Format the manual file and display the result in the foreground, while updating the alternate cat file in the background. .

Compression

   
.

Compressed manual pages

 

It is possible to maintain a system of compressed manual pages. The use of this feature is not recommended for systems that have adequate disk space to store their manual pages uncompressed. Subsequent decompression of these manual pages will cause several bottlenecks in the formatting process.

Presently, the compression extension/decompressor pairs must be known at compile time although any number may be defined and used. The following structure is predefined in man_db-2.3.x

                   +----------+--------------+
                   |Extension | Decompressor |
                   +----------+--------------+
                   |   gz     | gzip -dc     |
                   |   z      | gzip -dc     |
                   |   Z      | compress -dc |
                   +----------+--------------+
It is a relatively easy operation to include further pairs in this structure. See include/comp_src.h for details and an example.

Support for compressed manual pages is compiled into the man_db-2.3.x utilities by default. To completely disable this support, edit include/config.h and comment out the following line

#define COMP_SRC 1

This will enable a minor speed increase, but note that support for stray cats with any compression extension other than the default will also be disabled. .

Compressed cat pages

 

man_db-2.3.x compresses cat files by default. During configuration, configure will try to find gzip and if so, all cat files produced by man will be compressed with

and have a .gz extension appended. If gzip is not found,

is used as the compressor and the extension .Z is appended.

To store cat files in an uncompressed state and to disable compressed extension processing completely, edit include/config.h and comment out the following line

#define COMP_CAT 1

.

Stray cats

 

Normally, man will only look for cat files with the default compression extension. The default compression extension is dependent on the default compressor and may be an empty string if the support for compressed cats is disabled.

It is possible for a system to be supplied with stray cat files located in the traditional cat page hierarchy. To make matters worse, they may have compression extensions other than the default and reside on read-only media. In such circumstances, stray cat files will be accepted with any compression extension that is also supported for manual pages.

This special treatment of stray cat pages is removed if support for compressed manual pages is turned off or not available. .

Formatting

   

As already pointed out in the introduction, there are two primary formatters common to UNIX: NROFF and TROFF.

In the following sections, I will use the term TROFF to describe the typesetter formatter and NROFF to describe the typewriter formatter. The term ROFF will be used to describe a generic formatter. .

GROFF

 

If using the GROFF package, there is a further choice, GROFF itself. Essentially, GROFF forms a pipeline of processors including TROFF and an output processor which translates the ditroff produced by TROFF into the appropriate output format. The default output format, or device, for GROFF is PostScript. Anything else must be specified using the device argument. To illustrate GROFF, the command

will form the following pipeline

If GROFF is tied to man's -T option, it is still possible for man to produce ditroff via use of the -Z option.

In GROFF 1.09, NROFF is bundled as a shell script that calls GROFF, which in turn calls TROFF with the default options -Wall -mtty-char -Tascii, passing the result through grotty before it finally reaches the screen.

It is imperative that the script does not pass pre-processing options to GROFF command line as man takes care of this separately. The file tools/nroff_script may be used as a basis for an NROFF script if your system is without one. .

Devices

 

Both NROFF and GROFF may allow output device selection. As mentioned previously, classic NROFF produces output suitable for a typewriter device, classic TROFF produces output suitable for a C/A/T and GROFF produces output suitable for a PostScript interpreting device. .

Macros

 

There are several ROFF macros in existence that are suitable for manual pages. Unfortunately, they tend to be incompatible with each other.

During configuration, configure will attempt to determine a suitable macro for the local system's manual page collection. It attempts to use NROFF with the following three macro packages:

        +---------------+----------------+---------------+
        | macro package | macro filename | nroff command |
        +---------------+----------------+---------------+
        | andoc         | tmac.andoc     | nroff -mandoc |
        | an            | tmac.an        | nroff -man    |
        | doc           | tmac.doc       | nroff -mdoc   |
        +---------------+----------------+---------------+
The first that succeeds is used. Macro andoc is suitable for manual pages written using either an or doc macro commands, but not a combination of both.

.

Pre-format processors (pre-processors)

 

Manual pages may require pre-processing by any of the following

                +---------+----+------------------+
                | Program | ID | Pre-processes    |
                +---------+----+------------------+
                | eqn     | e  | equations        |
                | tbl     | t  | tables           |
                | grap    | g  | graphs           |
                | pic     | p  | pictures         |
                | refer   | r  | A bibliography   |
                | vgrind  | v  | program listings |
                +---------+----+------------------+
It is possible to assign a default pre-processor list that all manual pages will be passed through prior to the primary formatter. By default, this is empty. To define a default list, edit include/manconfig.h and un-comment the following line

/* #define DEFAULT_MANROFFSEQ "t" */

which will enable tbl processing by default. To change the list, replace the t with a suitable string of processor ID's.

Pre-process options may be provided at run time in various forms, but in general the pre-processors required by each manual page is indicated in the first line of the manual page itself. See man(1) for details. .

Format scripts

 

It is very likely that alternate systems manual pages may require non-standard macro packages or possibly even special pre-processors. To tackle such problems, special format scripts may be created on a per manual hierarchy basis.

If the file

exists and is executable, it is expected to be able to correctly format a manual page originating from <manual_hierarchy> to its standard output. It will be supplied with either two or three arguments:

+ manual page filename
+ pre-processor string
+ ouput device (optional)
Similarly, if the option -T<device> or -t was supplied to man and the file

exists and is executable, it will be used in the same way.

An example of such a script, supplied by Markus Armbruster <armbru@pond.sub.org>, who provided support for external formatter scripts, can be found as tools/mandb_[nt]fmt

The script can be used as both a NROFF and TROFF/GROFF format script and can be installed as mandb_nfmt and hard linked to mandb_tfmt after modification appropriate for your particular site. .

The index database caches

   

As mentioned in the introduction, man_db-2.3.x uses database lookups to search for manual page locations and information. When performing a manual page lookup or a basic whatis search, the databases are searched in

mode and are as fast as the underlying databases can be. When performing apropos or special whatis searches, the databases are searched in a linear way, which although far more expensive than keyed lookup, is no worse than traditional text based file searching. .

index database location

 

The databases are always located at the root of the cat page hierarchy, whether this is the same as the manual page hierarchy or not. As file locking mechanisms are employed to ensure that concurrent processes do not update a database simultaneously, it is almost imperative that the databases reside on a local filesystem since file locking across NFS filesystems may be unavailable or flaky. To avoid such problems, man can be compiled without database maintenance support. See the section titled "Modes of operation" for details. .

Manual hierarchies with no index database

 

It is possible for the man_db-2.3.x utilities to operate without aid from an index database. Under such circumstances, search methods will resort to file globbing and whatis type searches are performed on any traditional whatis text databases that may exist. Only the traditional cat hierarchy is searched for cat files. .

User manual page hierarchies

 

A user may have any number of personal manual page hierarchies listed in their $MANPATH. By default, man will maintain mandb created databases at the root of user manual page hierarchies. The definition of a user manual hierarchy is that it does not have an entry in the man_db configuration file. See manpath(5) for details. .

Contents of an index database

 

There are four kinds of entry in an index database.

1
A direct entry regarding a particular manual page. Manual pages that are unique in terms of name use just a single entry in the database and can be looked up by simply using the name as the key.
2
A common name index entry that lists the extensions of all of the manual pages sharing the common index entry name. Manual pages that share common names, but have differing extensions each have a single database entry, but this time they are looked up with a key comprised of their name and their extension. The entire set of common named pages also has an common name index entry that informs of the extensions available.
3
An indirect entry that has a pointer to the real entry. Manual pages that are whatis references to a particular page do not physically exist so they have a pointer to the entry containing the location of the real manual page.
4
Special identification entries. There are two special key names, ``$mtime$'' that references an integer describing the last modification time of the database and ``$version$'' that identifies the database storage scheme version.

In the following entries, the character ``|'' will be used to separate the fields. In reality a tab is used. Direct and indirect entries takes the form:

Common name index entries take the form:

and common name direct or indirect entries take the form:

where in each case the filename being represented is formed as

in the case of a manual page, or

in the case of a stray cat.

If any of the fields would be empty, a single ``-'' is stored in its place. <comp> represents the compression extension. <mtime> is an integer representing the last modification time of the manual page, <ref> points to the entry containing the location of the real page and <ID> is one of the following identification letters.

+---+------------+--------------------------------------------------------+
|ID | #define    | Description                                            |
+---+------------+--------------------------------------------------------+
| A | ULT_MAN    | ultimate manual page, the full source nroff file       |
| B | SO_MAN     | manual page containing a .so request to an ULT_MAN     |
| C | WHATIS_MAN | virtual whatis referenced page pointing to an ULT_MAN  |
| D | STRAY_CAT  | cat page with no source manual page                    |
| E | WHATIS_CAT | virtual whatis referenced page pointing to a STRAY_CAT |
+---+------------+--------------------------------------------------------+
The ID illustrates the precedence. Some types of manual page can be referenced by several means, e.g. .so requested and whatis referred. In such a case, only one reference must be stored in the database, the precedence level decides which. .

Favouring stray cats

 

With the above rules of precedence, it is possible for a valid stray cat page to be replaced by a whatis referred page sharing identical name-space.

If you would like to see the stray cat page kill(1) instead of the bash_builtins(1) page referenced by kill(1) edit include/manconfig.h and un-comment the following line

/* #define FAVOUR_STRAYCATS */ .

Accessdb

 

A simple program, accessdb is included with man_db-2.3.x. It will output the data contained within a man_db database in a human readable form. By default, it will dump the data from /var/catman/index.<db-type>, where <db-type> is dependent on the database library in use.

Supplying an argument to accessdb will override this default. Tabs are replaced in the output by a tilde ``~'' in the key field and a single space in the content field

accessdb is not compiled by default. Type

in the src directory to compile it. .

Example database

 

As an example of both accessdb and the database storage method, the output of

after first running

from the top level build directory is included below.

$mtime$ -> "795987034"
$version$ -> "2.3.1"
apropos -> "1 1 795981542 A - - search the manual page names and descriptions"
catman -> "8 8 795981544 A - - create or update the pre-formatted manual pages"
man -> "1 1 795981542 A - - an interface to the on-line reference manuals"
mandb -> "8 8 795981544 A - - create or update the manual page index caches"
manpath -> " 1 5"
manpath~1 -> "1 1 795981542 A - - determine search path for manual pages"
manpath~5 -> "5 5 795981543 A - - format of the /etc/man_db.config file"
whatis -> "1 1 795981543 A - - search the manual page names"
zsoelim -> "1 1 795981543 A - - satisfy .so requests in roff input"
.

Database types

 

man_db-2.3.x has support for various low level database libraries commonly in use today. The interfaces to the libraries are known as ndbm (UNIX) gdbm (GNU) btree (Berkeley DB)

man_db-2.3.x currently does not hold more than one database open at any time, so dbm (UNIX)

support could be added in the future. .

limitations

 

The general differences and limitations are best compared in a table.

+------+-------------+----------+-----------------+--------------+-----------+
|      |             |  File    | Content memory  |  Concurrent  |           |
|Name  |    Type     |          +---------+-------+              | Shareable |
|      |             |  name    |  type   | limit |    access    |           |
+------+-------------+----------+---------+-------+--------------+-----------+
|ndbm  | hash        | index    | static  |  1Kb  |     none     |    no     |
|gdbm  | hash        | index.db | dynamic |   -   | file locking |    no     |
|btree | binary tree | index.bt | static  |   -   |     none     |    yes    |
+------+-------------+----------+---------+-------+--------------+-----------+
Those types that have no built in concurrent access strategy, are provided with flock(2) based file locking by man_db-2.3.x.

As btree is noticeably faster when doing man searches, mainly due to the fast initialization of the databases, it is the preferred library interface. configure will look for btree, gdbm and then finally ndbm routines when configuring man_db-2.3.x. .

Sharing databases in a heterogeneous environment

 
It may be necessary or advantageous to share databases across platforms, regardless of the potential file locking problems.

An example would be a user having a personal manual page hierarchy in an NFS based home directory environment, whereby the home directory is held on and mounted from a single machine in a heterogeneous network.

In this context, the database cache will have the same name and reside in the same place on all machines. There are at least two ways to deal with this problem. Hack the include/manconfig.h file on each platform to provide a unique database name for each system. No databases will be shared. Install and use the Berkeley DB database interface library on each platform. These databases can be shared across big-endian/little-endian platforms although a database created on a big-endian platform will suffer a small access penalty when used by a litle-endian machine and vice-versa. .

Miscellaneous

   
.

Modes of operation

 

The man_db utilities can operate in many different modes, allowing varying degrees of freedom, functionality and security. No mode requires that the manual page hierarchies be writable.

(1) Default mode
man is setuid to the user MAN_OWNER which is `man' by default and is changeable via options to configure. mandb, if run by the superuser or MAN_OWNER, creates globally accessible index databases owned by MAN_OWNER. Once the databases are created, man will update entries in them if it finds newly installed manual pages or delete entries if manual pages are removed. In this mode it is possible for a malicious man user to deliberately lock a database as a writer, thus denying read access to other users.
If cat directories exist and have the correct permissions, man will take care of producing cat files. These will be owned by MAN_OWNER. The default permissions of both cat files and databases are 0644.
(2) No man database updates
This mode also requires man to be setuid, but is favoured where databases must be shared in an environment unfriendly to kernel locking procedures, eg. NFS. It also prevents possible `denial of service' attacks by malicious man users as man never opens the databases as a writer in this mode. To replace the functionality lost by disallowing man write access to the databases, mandb must be rerun whenever new manual pages are installed. Failure to do so will result in man being unable to find and display the newly added manual pages. As mandb lacks the ability to delete database entries for manual pages that have been removed, it is necessary to use the --create flag whenever manual pages are removed from the filesystem. Each index database may be owned by an arbitrary user who will have subsequent write access to the database. Cat files are created in the same way as for mode (1) above.
To use the man_db utilities in this mode, edit include/manconfig.h and comment out `#define MAN_DB_UPDATES'
(3) No man database updates or cat production
man is installed not setuid. This mode of operation probably offers the highest level of security but it requires higher levels of maintenance than other modes due to the restrictions imposed upon man. Each database is owned by an arbitrary user as in mode (2). Each cat hierarchy is also owned by an arbitrary user who is responsible for creating cat files using catman whenever new manual files are installed. man will be completely passive in it's action, ie. no index databases will be written to and no cat files are ever produced.
To use the man_db utilities in this mode, supply the option `--disable-setuid' to configure and edit include/manconfig.h, commenting out `#define MAN_DB_UPDATES' and `#define MAN_CATS' after running configure.
(4) Wide open
man is installed not setuid. This mode is similar in operation to the majority of vendor supplied, non setuid, cat file supporting manual pager suites. It is not recommended. The databases are owned by an arbitrary user who maintains them using mandb. man does not update the databases. Cat files are produced and stored in world writable cat directories and have world write access themselves.
To use the man_db utilities in this mode, supply the option `--disable-setuid' to configure, edit include/manconfig.h and comment out `#define MAN_DB_UPDATES' and change the definition of CATMODE from 0644 to 0666.

Other variations can also be used. In fact it is possible for man to actually create index databases, usually the job of mandb, for users private manual page hierarchies. This is enabled by editing include/manconfig.h and un-commenting the `/* #define MAN_DB_CREATES */' line. man_db-2.2 operated in this manner.

In summary, include/manconfig.h contains definitions for MAN_DB_CREATES MAN_DB_UPDATES MAN_CATS CATMODE DBMODE

and the setuid installation and operation of man is modified by supplying either of the following options to configure: --enable-setuid=USER --disable-setuid .

NFS root squash

 

If man is installed setuid to an arbitrary user and is run by root, instead of gaining the effective user id of the setuid user, man is run with both uid and euid as root. This is neccesary due to infelicities with the POSIX setuid() function call: All users except root may change to and from the effective (setuid) user, however once root has setuid(user), there is no way back.

A side effect of this is that NFS mounted cat hierarchies or databases will be unwritable if the following conditions exist: man/catman/mandb is run by root The NFS mount has the root squash flag set

To get around this problem, the root user must first attain the ID of the cat hierarchy or database owner before running man/catman/mandb whenever the databases need updating or cat files are to be produced. .

NLS message catalogues

 

man_db-2.3.x has built in support for native language message catalogues. That is, it can issue messages in the locale of the users choice. This will only occur if the locale's translation has been written. Before undertaking a translation, please contact the author who will be maintaining a list of such activity.

Currently, the following translations exist de_DE.88591 .

Credits

 

I would like to thank the following people for their time, effort, support, ideas and code which went into man_db-2.3.x

Markus Armbruster <armbru@pond.sub.org>
Lionel Cons & colleages <cons@dxcern.cern.ch>
Carl Edman <cedman@princeton.edu>
Caleb Epstein <epstein_caleb@jpmorgan.com>
Lars Fenneberg <lf@gimli.comlink.de>
Zoltan Hidvegi <hzoli@cs.elte.hu>
Nils Magnus <magnus@unix-ag.uni-kl.de>
Daniel Quinlan <quinlan@yggdrasil.com>


Glossary

manual page
A file containing descriptions related to the use of a function or program or the structure of a file. The name of the file is formed from the title of the manual page followed by a period followed by the name of the section that it resides in, optionally followed by an extension. The format of the file is NROFF and may be compressed, having a suitable compression extension appended.
cat page
A formatted manual page suitable for viewing on a vt100-type terminal.
stray cat page
A cat page that does not have a relative manual page on the system, ie. only the cat page was supplied or the manual page was removed after the cat page had been created.
section
Each manual page or cat page hierarchy is divided into sections, each section having it's own directory. manual page hierarchy section names begin with `man' and cat page sections with `cat'.
extension
A package may provide manual pages with filenames ending in a package-specific extension name. This allows manual pages with the same title to coexist in the same manual page hierarchy and section without sharing the same filename. It also provides a further mechanism for man to select the correct manual page.
manual page hierarchy
A directory tree divided into manual page sections, each containing a collection of manual pages.
cat page hierarchy
A directory tree divided into cat page sections, each containing a collection of cat pages.
traditional cat page hierarchy
The same location as the manual page hierarchy.
alternate cat page hierarchy
A separate location to that of the traditional cat page hierarchy.
traditional cat page
A cat page located in a traditional cat page hierarchy.
alternate cat page
A cat page located in an alternate cat page hierarchy.

Contents


1. Introduction
1.1 man_db-2.3.x
1.1.1 The concept
1.2 The manual page system
1.3 Sections of the manual
1.4 The format of manual pages
1.5 Arguments to configure


2. The specifics of Sections
2.1 Package specific manual page sections
2.2 Selecting a section type
2.2.1 Specifying a section
2.2.2 Specifying an extension


3. Filesystem structure
3.1 Manual page hierarchies
3.2 Setting the MANPATH
3.3 Determination of the internal manpath
3.4 Other OS's manual pages
3.5 NLS manual pages
3.5.1 ISO 8859-1 (latin1) manual pages
3.5.2 Displaying latin1 characters on a Linux virtual terminal
3.5.3 Viewing ASCII pages formatted for latin1 output device
3.6 Cat pages
3.7 Cat page hierarchies
3.8 Local cat page directory caches


4. Compression
4.1 Compressed manual pages
4.2 Compressed cat pages
4.2.1 Stray cats


5. Formatting
5.1 GROFF
5.2 Devices
5.3 Macros
5.4 Pre-format processors (pre-processors)
5.5 Format scripts


6. The index database caches
6.1 index database location
6.1.1 Manual hierarchies with no index database
6.1.2 User manual page hierarchies
6.2 Contents of an index database
6.2.1 Favouring stray cats
6.2.2 Accessdb
6.2.3 Example database
6.3 Database types
6.4 limitations
6.5 Sharing databases in a heterogeneous environment


7. Miscellaneous
7.1 Modes of operation
7.2 NFS root squash
7.3 NLS message catalogues
7.4 Credits

Footnotes


[1] originally conceived after observing the actions of the perl based manual pager suite, man-pl written by Tom Christiansen <tchrist@convex.com>
[2] one line description of the manual page
[3] similar in some aspects to TeX
[4] Written and maintained by James Clark <jjc@jclark.com>
[5] writing portable software instantly comes to mind
[6] written and maintained by Daniel Quinlan <quinlan@yggdrasil.com>
[7] see nroff(5) for the output device formats available with your NROFF
[8] written and maintained by Andries Brouwer <aeb@cwi.nl>. Version 0.90 and above does not require the use of mapscrn as illustrated in the script.
[9] The ISO 8859-1 and ASCII columns of this table will be identical if this manual was formatted for an ASCII based typewriter display, i.e. using NROFF in it's native mode.
[10] cat files that have no source manual page, i.e. they cannot be recreated.
[11] ndbm databases are physically represented by two files: index.dir and index.pag but are referred to simply as index by the interface routines.