11: Handling Data
Gri can handle many different sorts of data file formats, including
ASCII files, binary files in machine format, and the very powerful and
increasingly popular netCDF format.
(For more information on the netCDF format, see the netCDF documentation.)
This chapter concentrates on ASCII format. The overall message is that
you should not have to modify your data files to work with Gri.
For example, many oceanographic data files have header lines at the
start. With other plotting systems, users find themselves stripping off
these headers as a first step in data analysis. This is done to make
the data look like a tabular list, or matrix, for reading by MATLAB or
various spreadsheet-like programs. (It is not necessary to do this in
MATLAB, by the way; MATLAB can skip header lines when reading a file.)

The difficulty with stripping off header lines is that you can lose the header information unless you are careful to put it in a separate file with an appropriate filename, and then just as careful to archive the header along with the data, and to send both to your colleague who has requested the data, etc. Often the header information seems unimportant to you at the moment, but it may be crucial to you later on, or to the next person who looks at the data!

In Gri it is very easy to handle headers within files. It's also easy to handle data that are in somewhat odd formats, or that must be manipulated mathematically or textually to make sense.
11.1: Handling headers
11.1.1: Case 1 -- known number of header lines
This is easy. If you know that the file has, say, 10 header lines, you can just skip those 10 lines after opening the file and before reading the data columns.
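As an illustration of the logic only, here is a minimal sketch in Python (not Gri syntax). The function name `read_xy_after_header` and the sample contents are assumptions made up for this example; it skips a fixed number of lines and then parses two whitespace-separated columns.

```python
import io


def read_xy_after_header(stream, n_header=10):
    """Skip n_header lines, then parse whitespace-separated x y pairs."""
    for _ in range(n_header):
        stream.readline()  # discard one header line
    xs, ys = [], []
    for line in stream:
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or incomplete lines
        xs.append(float(fields[0]))
        ys.append(float(fields[1]))
    return xs, ys


# Made-up file contents: 10 header lines followed by two data rows.
sample = "".join(f"header line {i}\n" for i in range(1, 11)) + "1 10\n2 20\n"
x, y = read_xy_after_header(io.StringIO(sample))
print(x, y)
```

The header lines stay in the data file; they are simply passed over at read time.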
11.1.2: Case 2 -- header itself indicates number of header lines
Quite often the first line of a file will indicate the number of header lines. For example, suppose the first line contains a single number, indicating the number of header lines to follow. You can then read that number into a variable and skip that many lines before reading the data.
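The same idea can be sketched in Python (again, this is not Gri syntax). The sketch assumes that the count on the first line does not include the first line itself; the name `read_xy_counted_header` and the sample text are hypothetical.

```python
import io


def read_xy_counted_header(stream):
    """Read the header count from line 1, skip that many lines, parse x y."""
    n_header = int(stream.readline().split()[0])
    for _ in range(n_header):
        stream.readline()  # discard one header line
    pairs = []
    for line in stream:
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((float(fields[0]), float(fields[1])))
    return pairs


# Made-up file: the first line says that two header lines follow.
sample = "2\nstation: hypothetical\ncruise: hypothetical\n1 10\n2 20\n"
pairs = read_xy_counted_header(io.StringIO(sample))
print(pairs)
```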
11.1.3: Case 3 -- header lines marked by a textual key
Sometimes header lines are indicated by a textual key, for example, the characters `HEADER' at the start of the line in the file. The easy
way to skip such a header is to use a system command. Depending on your
familiarity with the operating system (here presumed to be Unix), you
might choose to use Grep, Awk, or Perl, with the output of the command piped into Gri.
For more on the pipe mechanism, see Open. A Grep command with the `-v' switch
prints only the lines which do not match the indicated string,
and the `^' character in the string stands for
the start of the line (see Grep). Thus, with the string `^HEADER', all lines with the key word at the
start of the line are skipped.
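To make the filtering rule concrete, here is a small Python sketch (not a Gri or shell listing) that keeps only the lines not starting with the key, which is the effect described above for Grep with `-v' and `^'. The function name `strip_keyed_header` and the sample lines are assumptions for illustration.

```python
def strip_keyed_header(lines, key="HEADER"):
    """Keep only the lines that do not start with key."""
    return [line for line in lines if not line.startswith(key)]


# Made-up lines: two keyed header lines, then data. The last line
# contains the key, but not at the start, so it is kept.
sample = [
    "HEADER station 12\n",
    "HEADER depth temperature\n",
    "0 10.5\n",
    "5 HEADER\n",
]
kept = strip_keyed_header(sample)
print(kept)
```

Anchoring the match at the start of the line matters: a data line that merely contains the key somewhere else is not discarded.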
11.1.4: Case 4 -- reading and using information in header
Consider a dataset in which the first line gives the time of observation, followed by a list of observations. This might be, for example, an indication of the data taken from a weather balloon released at a particular time from a fixed location, with the main data being air temperature as a function of elevation of the balloon. The time indication might be, for instance, the hour number. One might need to know the time to print a label on the diagram. You could do that by reading the time from the first line into a variable before reading the observations, and then using that variable when drawing the label.
(Some of you might know how many minutes there are in a day, but I'm silly so I kept the extra mathematical step -- nothing is lost by being straightforward!) Datasets often have information on observation location as well as time, and I hope the above suggests how to handle that.

It is very common, by the way, to want to draw so-called "waterfall" plots, in which many curves are drawn, each one offset by some amount. Sometimes you'll want the offset to be constant, but just as often you'll want the offset to depend on information in a header, such as the time of observation. I hope the above indicates roughly how you might handle this: read the time from the header of each dataset, multiply it by a scale factor (100, say) to convert time units into an offset in terms of y, and add that offset to y before drawing the curve.
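A minimal Python sketch of the waterfall offset follows (not Gri syntax). It assumes that the offset is the header time multiplied by the scale factor and added to every y value; the name `waterfall` and the sample profiles are hypothetical.

```python
def waterfall(profiles, scale=100.0):
    """Return one (xs, shifted_ys) curve per profile.

    profiles is a list of (time, xs, ys) tuples; each curve is shifted
    in y by scale * time, so later observations are drawn higher.
    """
    curves = []
    for time, xs, ys in profiles:
        offset = scale * time
        curves.append((list(xs), [y + offset for y in ys]))
    return curves


# Made-up profiles observed at times 0.0 and 1.5 (in the header's units).
profiles = [
    (0.0, [1, 2], [10.0, 20.0]),
    (1.5, [1, 2], [11.0, 19.0]),
]
curves = waterfall(profiles)
print(curves)
```

A constant offset per curve is the special case in which the "time" is replaced by the index of the curve.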
11.2: Ignoring columns that are not of interest
Quite often a dataset will have many columns, of which only a couple are of interest to you. Consider for example an oceanographic dataset which has columns storing, in order, these variables: (1) depth in water column, (2) "in situ" temperature, (3) "potential" temperature, (4) salinity, (5) conductivity, (6) density, (7) sigma-theta, (8) sound speed, and (9) oxygen concentration. But you might only be interested in plotting a graph of salinity on the x-axis and depth on the y-axis. There are several ways to do this.
When listing the columns to be read, a `*' is a place-keeper that indicates a column to be skipped.
For a large number of columns, or as an aesthetic choice, you might
prefer to give the column numbers explicitly instead.
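The column selection described above can be sketched in Python (not Gri syntax) by picking fields by their 1-based column number. The function name `read_two_columns` and the sample values are made up for illustration; the column order follows the list above (depth in column 1, salinity in column 4).

```python
import io


def read_two_columns(stream, x_col=4, y_col=1):
    """Return (xs, ys) taken from the 1-based columns x_col and y_col."""
    needed = max(x_col, y_col)
    xs, ys = [], []
    for line in stream:
        fields = line.split()
        if len(fields) < needed:
            continue  # ignore blank or short lines
        xs.append(float(fields[x_col - 1]))
        ys.append(float(fields[y_col - 1]))
    return xs, ys


# Made-up rows with nine columns; only columns 1 and 4 are used.
sample = (
    "0 12.1 12.1 31.2 3.9 1023.6 23.6 1495.0 6.1\n"
    "10 11.8 11.8 31.5 3.9 1024.0 24.0 1494.2 5.9\n"
)
salinity, depth = read_two_columns(io.StringIO(sample))
print(salinity, depth)
```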
Many users would just as soon not bother with this syntax, preferring instead to use system tools with which they are more familiar. So a Gawk user might have Gawk print only the two columns of interest and pipe the result into Gri. For more on the Gawk command, see Awk.

Suppose the file contains (x,y), but you wish to plot 2y versus x. You could do the doubling of y within Gri, or you could do it in a system command such as Gawk before the data reach Gri.

The latter is preferable in the sense that it is more powerful. The reason for this is that Gri allows you to manipulate the x and y columns, using so-called RPN mathematics (see RPN Mathematics), but you cannot blend the columns. For example, you cannot easily form the ratio y/x in Gri. (Actually, you can, by looping through your data and doing the calculation index by index, but if you knew that already you wouldn't need to be reading this section!) Such blending is trivial in the operating system, though, for example with Gawk (see Awk).
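The difference between scaling one column and blending two can be sketched in Python (this is neither Gri nor Gawk syntax). The function names and the sample rows are assumptions for illustration; rows with x equal to zero are dropped from the ratio as a choice made for this sketch.

```python
def double_y(rows):
    """Scale one column: return (x, 2y) for each (x, y) row."""
    return [(x, 2 * y) for x, y in rows]


def ratio_y_over_x(rows):
    """Blend two columns: return (x, y/x), dropping rows where x is zero."""
    return [(x, y / x) for x, y in rows if x != 0]


# Made-up (x, y) rows; the last one has x = 0 and is dropped by the ratio.
rows = [(1.0, 3.0), (2.0, 5.0), (0.0, 7.0)]
print(double_y(rows))
print(ratio_y_over_x(rows))
```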
11.4: Combining columns from different files
Suppose you want to plot a column (`y', say) from one file versus a
second column (`x') from a second data file. The easy way is to
use a system command to create a new file, for example the Unix command
`paste' -- but of course you don't want to clutter your filesystem
with such files, so you should do this within Gri, with the output of the system command piped directly into Gri rather than saved in a file.
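As a sketch of what the line-by-line pairing amounts to, the following Python (not Gri or shell syntax) joins corresponding lines of two inputs in memory, without writing an intermediate file. The name `paste_xy` and the sample contents are hypothetical; both inputs are assumed to have matching line order.

```python
import io


def paste_xy(y_stream, x_stream, y_col=1, x_col=1):
    """Pair line i of one input with line i of the other.

    Returns (x, y) tuples, with y taken from column y_col of y_stream
    and x from column x_col of x_stream (both 1-based).
    """
    pairs = []
    for y_line, x_line in zip(y_stream, x_stream):
        y_fields, x_fields = y_line.split(), x_line.split()
        if len(y_fields) < y_col or len(x_fields) < x_col:
            continue  # ignore blank or short lines
        pairs.append((float(x_fields[x_col - 1]), float(y_fields[y_col - 1])))
    return pairs


# Made-up contents standing in for the two data files.
file_with_y = io.StringIO("10\n20\n30\n")
file_with_x = io.StringIO("1\n2\n3\n")
pairs = paste_xy(file_with_y, file_with_x)
print(pairs)
```

Pairing stops at the end of the shorter input, so files of unequal length should be checked before plotting.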