Any editor may be used to construct or modify a table, since it is a regular UNIX file, and this 'direct editing' method is occasionally used, especially for small amounts of data. However, avoid using an editor that destroys TAB characters.
To generate a new table the best plan (and usually the safest one) is to first generate a template file, then convert it to table format and add the rows of data. Any convenient editor may be used to generate a template file. To convert it to a table the command 'maketable' may be used, which will produce an empty table. Next use the operator 'edittable' to enter rows of data.
A typical template file is shown below:
# These are lines of table documentation. They can be of any length,
# and any number of such lines may exist.
# Each line must start correctly, e.g with "#", surrounded by any
# number of spaces and/or TABs.
Name Name of item
Type Type: 1,2,3,7,9,A,B,X
Count Number of items
K Constant modifier
SS7 Special status for type 7
Size In Kilobytes
The above template file contains the necessary elements to describe a table of six columns: table documentation (the comment lines that each start with a sharp sign '#'), column name ("Name", "Type", "Count", ...), and column documentation for each column (the text at the end of each column line).
To build the final table header, use the command:
maketable < table.tpl
where table.tpl
is the template file described above.
The command will produce the correct table header to STDOUT,
(that can be redirected to a file as usual) :
Name Type Count K SS7 Size
---- ---- ----- - --- ----
I will now explain how NoSQL tables can be modified, both manually and with automated programs.
Basically there are two ways to manually modify an existing table: either direct editing with a TAB-conscious text editor, like vi(1), or by using the 'edittable' command. The latter method is recommended, especially in a multi-user environment where multiple concurrent editing sessions may occur against the same table file.
The 'edittable' utility can be used to add new rows, change or delete existing rows of data in a table. The optional '--list' switch tells 'edittable' to convert the table to 'list' format for editing, which makes working with wide tables much more comfortable. The editor called by 'edittable' defaults to vi, or to whatever is set in the environment variable EDITOR.
After editing, the table is automatically checked for validity by 'edittable', using 'istable'. If structure errors are detected, the program prompts you for what to do (re-edit, quit, etc.).
One thing to point out here is that the only way to update a table, both manually and with a program, is to capture the SDTOUT stream of an operator and write the results back to a table file, paying attention to not overwriting the original input file while it is being read (familiarity with the Unix shell is expected from the user). For instance, you have better not do:
operator < table > table
or you will end up with 'table' being zero'ed! The correct procedure is:
operator < table > table.tmp
mv table.tmp table
At the moment NoSQL does not provide a record-level update facility. This may be inefficient if you need to do frequent changes to a large table, for instance with a CGI program behind a Web server, as the whole table needs to be rewitten upon each change. But don't be disappointed by this. There are ways to circumvent this apparently major limitation of the NoSQL paradigm.
The obvious "trick" is to try and keep your tables small. Silly a suggestion as it may sound, before laughing at it please have a look at section 2.9 of the included 4gl.ps or 4gl.txt paper. Remember that NoSQL works with UNIX, not in addition to it. The underlying UNIX file system, that most commercial databases tend to disregard, if used creatively can provide an extremely powerful way of pre-organizing your tables (relations) efficiently and keep them small (where small means roughly within a few hundred kilobytes in size).
If you really must do frequent modifications to a big indexed table, then you can still do it efficiently by applying your changes to a separate file rather than to the actual table, and merging the changes back into the big table (as well as re-build its indices) only every now and again, with a batch job that can be run in the background, overnight or when the system activity is low. The following example will try to explain this better.
Suppose we have the large indexed table bigtable.rdb
and
we need to insert/change/delete one or more records. What we can
do is:
bigtable.tjb
, with exactly the same header as
bigtable
, but containing only those records that we want
to insert/update into or remove from bigtable.rdb
.
The entries in bigtable.tjb
will have to be in a format
suitable for the 'update' operator.bigtable.rdb
we will have
to do it in three logical steps. The first step is to
use 'search' on
bigtable
to take advantage of the indices. This will
produce an intermediate output that will then have to be merged
into bigtable.tjb
, and the final output will undergo the
original query statement again.bigtable.rdb
will rather be done
to bigtable.tjb
, with the sintax described in the
documentation of the 'update' operator. Unlike bigtable.rdb
,
where records must be kept sorted on the primary key field (i.e.
on the leftmost table column), the records in
bigtable.rdb
do not need to be sorted in any particular
order.This may seem complicated, but it isn't much, really.
Say bigtable.rdb
contains two columns, SSN (for Social Security No.) and SURNAME, where
the unique
table key is on SSN, and we have built a secondary index on the SURNAME
field. If we want to extract all the rows which SURNAME field is
equal to "Smith", the query necessary to take advantage of a fast
search method on bigtable.rdb
and to account for the
presence of bigtable.tjb
is:
printf 'SURNAME\n-------\nSmith\n' |
search --index bigtable.rdb.x.SURNAME | update bigtable.tjb |
row 'SURNAME=="Smith"'
As you can see, the trick is:
bigtable.rdb
to quickly
obtain a much smaller (possibly empty) subset of its data.bigtable.tjb
on-the-fly during the query.As shown, these logical steps can be performed in one single line of UNIX shell code!
The need to concatenate tables comes up every so often and although it is simple to do it may not be obvious. The UNIX 'cat' command can not be used as it would result in duplicating the header and thus make an invalid table. And of course, only tables with the same header, i.e. tables that are said to be ``union-compatible'', should be concatenated, otherwise an invalid table would result (in this case it could be a gross inconsistency if the number of columns were different). If we have two tables, TABA and TABB, then to concatenate TABB onto the end of TABA we use the command:
union TABA TABB