Next Previous Contents

5. Generating or modifing NoSQL tables

5.1 Generating new tables

Any editor may be used to construct or modify a table, since it is a regular UNIX file, and this 'direct editing' method is occasionally used, especially for small amounts of data. However, avoid using an editor that destroys TAB characters.

To generate a new table the best plan (and usually the safest one) is to first generate a template file, then convert it to table format and add the rows of data. Any convenient editor may be used to generate a template file. To convert it to a table the command 'maketable' may be used, which will produce an empty table. Next use the operator 'edittable' to enter rows of data.

A typical template file is shown below:


      # These are lines of table documentation. They can be of any length,
      # and any number of such lines may exist.
      # Each line must start correctly, e.g with "#", surrounded by any
      # number of spaces and/or TABs.

      Name      Name of item
      Type      Type: 1,2,3,7,9,A,B,X
      Count     Number of items
      K         Constant modifier
      SS7       Special status for type 7
      Size      In Kilobytes

    

The above template file contains the necessary elements to describe a table of six columns: table documentation (the comment lines that each start with a sharp sign '#'), column name ("Name", "Type", "Count", ...), and column documentation for each column (the text at the end of each column line).

To build the final table header, use the command:

      maketable < table.tpl
    

where table.tpl is the template file described above. The command will produce the correct table header to STDOUT, (that can be redirected to a file as usual) :

                Name    Type    Count   K   SS7 Size
                ----    ----    -----   -   --- ----
    

5.2 Modifying existing tables

I will now explain how NoSQL tables can be modified, both manually and with automated programs.

Basically there are two ways to manually modify an existing table: either direct editing with a TAB-conscious text editor, like vi(1), or by using the 'edittable' command. The latter method is recommended, especially in a multi-user environment where multiple concurrent editing sessions may occur against the same table file.

The 'edittable' utility can be used to add new rows, change or delete existing rows of data in a table. The optional '--list' switch tells 'edittable' to convert the table to 'list' format for editing, which makes working with wide tables much more comfortable. The editor called by 'edittable' defaults to vi, or to whatever is set in the environment variable EDITOR.

After editing, the table is automatically checked for validity by 'edittable', using 'istable'. If structure errors are detected, the program prompts you for what to do (re-edit, quit, etc.).

5.3 Big tables

One thing to point out here is that the only way to update a table, both manually and with a program, is to capture the SDTOUT stream of an operator and write the results back to a table file, paying attention to not overwriting the original input file while it is being read (familiarity with the Unix shell is expected from the user). For instance, you have better not do:

      operator < table > table
    

or you will end up with 'table' being zero'ed! The correct procedure is:

      operator < table > table.tmp
      mv table.tmp table
    

At the moment NoSQL does not provide a record-level update facility. This may be inefficient if you need to do frequent changes to a large table, for instance with a CGI program behind a Web server, as the whole table needs to be rewitten upon each change. But don't be disappointed by this. There are ways to circumvent this apparently major limitation of the NoSQL paradigm.

The obvious "trick" is to try and keep your tables small. Silly a suggestion as it may sound, before laughing at it please have a look at section 2.9 of the included 4gl.ps or 4gl.txt paper. Remember that NoSQL works with UNIX, not in addition to it. The underlying UNIX file system, that most commercial databases tend to disregard, if used creatively can provide an extremely powerful way of pre-organizing your tables (relations) efficiently and keep them small (where small means roughly within a few hundred kilobytes in size).

If you really must do frequent modifications to a big indexed table, then you can still do it efficiently by applying your changes to a separate file rather than to the actual table, and merging the changes back into the big table (as well as re-build its indices) only every now and again, with a batch job that can be run in the background, overnight or when the system activity is low. The following example will try to explain this better.

Suppose we have the large indexed table bigtable.rdb and we need to insert/change/delete one or more records. What we can do is:

This may seem complicated, but it isn't much, really. Say bigtable.rdb contains two columns, SSN (for Social Security No.) and SURNAME, where the unique table key is on SSN, and we have built a secondary index on the SURNAME field. If we want to extract all the rows which SURNAME field is equal to "Smith", the query necessary to take advantage of a fast search method on bigtable.rdb and to account for the presence of bigtable.tjb is:

      printf 'SURNAME\n-------\nSmith\n' |
        search --index bigtable.rdb.x.SURNAME | update bigtable.tjb |
        row 'SURNAME=="Smith"'
    

As you can see, the trick is:

As shown, these logical steps can be performed in one single line of UNIX shell code!

5.4 Concatenating tables

The need to concatenate tables comes up every so often and although it is simple to do it may not be obvious. The UNIX 'cat' command can not be used as it would result in duplicating the header and thus make an invalid table. And of course, only tables with the same header, i.e. tables that are said to be ``union-compatible'', should be concatenated, otherwise an invalid table would result (in this case it could be a gross inconsistency if the number of columns were different). If we have two tables, TABA and TABB, then to concatenate TABB onto the end of TABA we use the command:

      union TABA TABB
    

Next Previous Contents