This chapter discusses the six primitive netCDF data types, the kinds of data access supported by the netCDF interface, and how data structures other than arrays may be implemented in a netCDF file.
The current set of primitive types supported by the netCDF interface is:
character
byte
short
long
float
double
Except for the added byte and the lack of unsigned types, netCDF supports the same primitive data types as C. The names for the primitive data types are reserved words in CDL, so the names of variables, dimensions, and attributes must not be type names.
It is currently possible to interpret byte data as either signed (-128 to 127) or unsigned (0 to 255). The current version of the netCDF library simply reads and writes 8-bit bytes without needing to know whether they are signed. However, the addition of packed data in a future version of netCDF will require arithmetic operations on values, and for that purpose byte data will be interpreted as signed.
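The two readings of a byte value can be sketched in C as follows. This is only an illustration: the helper names are hypothetical and are not part of the netCDF library, and the signed case assumes the usual two's-complement reading of an 8-bit pattern.

```c
#include <assert.h>

/* Hypothetical helpers, not part of the netCDF library. */

/* Interpret an 8-bit pattern as an unsigned value in the range 0..255. */
int byte_as_unsigned(unsigned char raw)
{
    return (int)raw;
}

/* Interpret the same 8-bit pattern as a signed (two's-complement)
 * value in the range -128..127. */
int byte_as_signed(unsigned char raw)
{
    int value = (int)raw;

    if (value >= 128)
        value -= 256;
    assert(value >= -128 && value <= 127);
    return value;
}
```

The stored bits are the same in both cases; only the arithmetic interpretation differs.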
These types were chosen because they are familiar to C and FORTRAN programmers, they have well-defined external representations independent of any particular computer (using XDR), and they are sufficient for providing a reasonably wide range of trade-offs between data precision and number of bits required for each datum. See section Language Types Corresponding to NetCDF Data Types, for the correspondence between netCDF data types and the data types of a language.
There are plans for new data types, including 64-bit integers and n-bit packing.
To access (read or write) netCDF data you specify an open netCDF file, a netCDF variable, and information (e.g. indices) identifying elements of the variable. In addition, the netCDF interface supports a form of record-oriented data access.
Access to data is direct, which means you can access a small subset of data from a large dataset efficiently, without first accessing all the data that precedes it. Reading and writing data by specifying a variable, instead of a position in a file, makes data access independent of how many other variables are in the file, making programs immune to data format changes that involve adding more variables to the data.
In the C and FORTRAN interfaces, files are not specified by name every time you want to access data, but instead by a small integer called a file ID, obtained when the file is first created or opened. Similarly, a variable is not specified by name for every data access either, but by a variable ID, a small integer used to identify a variable in a netCDF file. (In the C++ interface, open netCDF files and variables are objects, so no IDs are needed.)
The netCDF interface supports several forms of direct access to data values in an open netCDF file. We describe each of these forms of access in order of increasing generality:
access to individual elements, specified with an index vector
access to array sections, specified with an index vector and a count vector
access to subsampled array sections, specified with an index vector, a count vector, and a stride vector
access to mapped array sections, specified with an index vector, a count vector, a stride vector, and an index mapping vector
These four types of vector (index vector, count vector, stride vector and index mapping vector) are all vectors with an element for each dimension. For an n-dimensional variable (rank = n), an n-element vector is needed. If the variable is a scalar (no dimensions), these vectors are ignored.
An array section is a "slab" or contiguous rectangular block that is specified by two vectors. The index vector gives the indices of the element in the corner closest to the origin. The count vector gives the lengths of the edges of the slab along each of the variable's dimensions, in order. The number of values accessed is the product of these edge lengths.
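As a small illustration of the rule that the number of values is the product of the edge lengths, the following sketch computes the size of an array section and its far corner. The function names are hypothetical and are not part of the netCDF library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of values in an array section, i.e. the
 * product of the edge lengths in the count vector. A scalar variable
 * (rank 0) holds exactly one value. */
long section_size(const long *count, size_t rank)
{
    long n = 1;
    size_t i;

    assert(count != NULL || rank == 0);
    for (i = 0; i < rank; i++) {
        assert(count[i] >= 0);
        n *= count[i];
    }
    return n;
}

/* Hypothetical helper: indices of the corner farthest from the origin,
 * computed from the index (corner) vector and the count vector. */
void section_far_corner(const long *start, const long *count,
                        size_t rank, long *far_corner)
{
    size_t i;

    for (i = 0; i < rank; i++) {
        assert(count[i] >= 1);
        far_corner[i] = start[i] + count[i] - 1;
    }
}
```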
A subsampled array section is similar to an array section, except that an additional stride vector is used to specify sampling. This vector has an element for each dimension giving the length of the strides to be taken along that dimension. For example, a stride of 4 means every fourth value along the corresponding dimension. The total number of values accessed is again the product of the elements of the count vector.
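The effect of a stride along a single dimension can be sketched as follows. Both helpers are hypothetical illustrations, not netCDF library functions; they assume zero-based (C) indices.

```c
#include <assert.h>

/* Hypothetical helper: index along one dimension of the k-th value
 * (k = 0, 1, ...) in a subsampled section. */
long sampled_index(long start, long stride, long k)
{
    assert(start >= 0 && stride >= 1 && k >= 0);
    return start + k * stride;
}

/* Hypothetical helper: largest count that stays inside a dimension of
 * length dim_len when sampling from start with the given stride. */
long max_sample_count(long dim_len, long start, long stride)
{
    assert(stride >= 1 && start >= 0 && start < dim_len);
    return (dim_len - start + stride - 1) / stride;
}
```

With a stride of 4 starting at index 0, the sampled indices are 0, 4, 8, and so on, so a dimension of length 10 admits a count of at most 3.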
A mapped array section is similar to a subsampled array section except that an additional index mapping vector allows one to specify how data values associated with the netCDF variable are arranged in memory. The offset, in bytes, of each value from the reference location, is given by the sum of the products of each index (of the imaginary internal array which would be used if there were no mapping) by the corresponding element of the index mapping vector. The number of values accessed is the same as for a subsampled array section.
The use of mapped array sections is discussed more fully below, but first we present an example of the more commonly used array-section access.
Assume that in our earlier example netCDF file (see section Components of a NetCDF File), we wish to read a cross-section of all the data for the temp variable at one level (say, the second), and assume that there are currently three records (time values) in the netCDF file. Recall that the dimensions are defined as
lat = 5, lon = 10, level = 4, time = unlimited;
and the variable temp is declared as
float temp(time, level, lat, lon);
in the CDL notation.
A corresponding C variable that holds data for only one level might be declared as
    #define LATS 5
    #define LONS 10
    #define LEVELS 1
    #define TIMES 3 /* currently */
       ...
    float temp[TIMES*LEVELS*LATS*LONS];
to keep the data in a one-dimensional array, or
       ...
    float temp[TIMES][LEVELS][LATS][LONS];
using a multidimensional array declaration.
In FORTRAN, the dimensions are reversed from the CDL declaration with the first dimension varying fastest and the record dimension as the last dimension of a record variable. Thus a FORTRAN declaration for the corresponding variable that holds data for only one level is
      INTEGER LATS, LONS, LEVELS, TIMES
      PARAMETER (LATS=5, LONS=10, LEVELS=1, TIMES=3)
         ...
      REAL TEMP(LONS, LATS, LEVELS, TIMES)
To specify the block of data that represents just the second level, all times, all latitudes, and all longitudes, we need to provide a corner and some edge lengths. The corner should be (0, 1, 0, 0) in C--or (1, 1, 2, 1) in FORTRAN--because we want to start at the beginning of each of the time, lon, and lat dimensions, but we want to begin at the second value of the level dimension. The edge lengths should be (3, 1, 5, 10) in C--or (10, 5, 1, 3) in FORTRAN--since we want to get data for all three time values, only one level value, all five lat values, and all 10 lon values. We should expect to get a total of 150 float values returned (3 * 1 * 5 * 10), and should provide enough space in our array for this many. The order in which the data will be returned is with the last dimension, lon, varying fastest for C, or with the first dimension, LON, varying fastest for FORTRAN:
    C                    FORTRAN
    temp[0][1][0][0]     TEMP( 1, 1, 2, 1)
    temp[0][1][0][1]     TEMP( 2, 1, 2, 1)
    temp[0][1][0][2]     TEMP( 3, 1, 2, 1)
    temp[0][1][0][3]     TEMP( 4, 1, 2, 1)
    ...                  ...
    temp[2][1][4][7]     TEMP( 8, 5, 2, 3)
    temp[2][1][4][8]     TEMP( 9, 5, 2, 3)
    temp[2][1][4][9]     TEMP(10, 5, 2, 3)
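The ordering above can be reproduced without the netCDF library by copying a section out of an ordinary in-memory array that stands in for the variable. The sketch below is only a model of array-section access for a rank-4 variable in C (row-major) order; the names copy_section and flat_offset are hypothetical and are not netCDF functions.

```c
#include <assert.h>
#include <stddef.h>

#define RANK 4

/* Row-major (C order) offset of the element at idx in an array with
 * the given shape; the last dimension varies fastest. */
size_t flat_offset(const long shape[RANK], const long idx[RANK])
{
    size_t off = 0;
    int d;

    for (d = 0; d < RANK; d++) {
        assert(idx[d] >= 0 && idx[d] < shape[d]);
        off = off * (size_t)shape[d] + (size_t)idx[d];
    }
    return off;
}

/* Copy the section described by a corner (start) and edge lengths
 * (count) from src, which models the whole variable, into dst.
 * Returns the number of values copied. */
size_t copy_section(const float *src, const long shape[RANK],
                    const long start[RANK], const long count[RANK],
                    float *dst)
{
    long idx[RANK];
    long t, v, y, x;
    size_t n = 0;
    int d;

    for (d = 0; d < RANK; d++)
        assert(start[d] >= 0 && start[d] + count[d] <= shape[d]);

    for (t = 0; t < count[0]; t++)
        for (v = 0; v < count[1]; v++)
            for (y = 0; y < count[2]; y++)
                for (x = 0; x < count[3]; x++) {
                    idx[0] = start[0] + t;
                    idx[1] = start[1] + v;
                    idx[2] = start[2] + y;
                    idx[3] = start[3] + x;
                    dst[n++] = src[flat_offset(shape, idx)];
                }
    return n;
}
```

For a model variable of shape (3, 4, 5, 10), a corner of (0, 1, 0, 0) and edge lengths of (3, 1, 5, 10) yield 150 values, beginning with element [0][1][0][0] and ending with element [2][1][4][9].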
Note that the different dimension orders for the C and FORTRAN interfaces do not reflect a different order for values stored on the disk, but merely different orders supported by the procedural interfaces to the two languages. In general, it does not matter whether a netCDF file is written using the C or FORTRAN interface; netCDF files written from either language may be read by programs written in the other language.
The use of mapped array sections allows non-trivial relationships between the disk addresses of variable elements and the addresses where they are stored in memory. For example, a matrix in memory could be the transpose of that on disk, giving a quite different order of elements. In a regular array section, the mapping between the disk and memory addresses is trivial: the structure of the in-memory values (i.e. the dimensional sizes and their order) is identical to that of the array section. In a mapped array section, however, an index mapping vector is used to define the mapping between indices of netCDF variable elements and their memory addresses.
The offset, in bytes, from the origin of a memory-resident array to a particular point is given by the inner product (1) of the index mapping vector with the point's index vector (2). The index mapping vector for a regular array section would have -- in order from most rapidly varying dimension to most slowly -- the byte size of a memory-resident datum (e.g. 4 for a floating-point value), then the product of that value with the edge length of the most rapidly varying dimension of the array section, then the product of that value with the edge length of the next most rapidly varying dimension, and so on. In a mapped array, however, the correspondence between netCDF variable disk locations and memory locations can be radically different. For example, the following C definitions
    struct vel {
        int flags;
        float u;
        float v;
    } vel[NX][NY];
    long imap[2] = { sizeof(struct vel), sizeof(struct vel)*NY };
where imap is the index mapping vector, can be used to access the memory-resident values of the netCDF variable, vel(NY,NX), even though the dimensions are transposed and the data is contained in a 2-D array of structures rather than a 2-D array of floating-point values.
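To make the offset rule concrete, the following sketch applies such an index mapping vector by hand, placing values that arrive in the variable's (NY, NX) order into the u members of a transposed array of structures. It is a model only and does not call the netCDF library. The names mapped_offset, scatter_u, and grid, and the small values chosen for NX and NY, are assumptions made for illustration; the reference location is taken to be the u member of the first structure.

```c
#include <assert.h>
#include <stddef.h>

#define NX 3   /* illustrative sizes, not taken from the text */
#define NY 2

struct vel {
    int flags;
    float u;
    float v;
};

/* Byte offset from the reference location: the inner product of the
 * index mapping vector with the point's index vector. */
long mapped_offset(const long *imap, const long *index, size_t rank)
{
    long off = 0;
    size_t d;

    for (d = 0; d < rank; d++) {
        assert(index[d] >= 0);
        off += imap[d] * index[d];
    }
    return off;
}

/* Store NY*NX values, given in the variable's (y, x) order, into the
 * u members of grid[NX][NY] using the index mapping vector. */
void scatter_u(const float *values, struct vel grid[NX][NY])
{
    long imap[2];
    long index[2];
    /* Reference location: the u member of grid[0][0]. */
    char *ref = (char *)grid + offsetof(struct vel, u);

    imap[0] = (long)sizeof(struct vel);        /* step for the NY dimension */
    imap[1] = (long)(sizeof(struct vel) * NY); /* step for the NX dimension */

    for (index[0] = 0; index[0] < NY; index[0]++)
        for (index[1] = 0; index[1] < NX; index[1]++) {
            float *dst = (float *)(ref + mapped_offset(imap, index, 2));
            *dst = values[index[0] * NX + index[1]];
        }
}
```

After the call, the value for variable element (y, x) is found in grid[x][y].u, which is the transposition described above.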
A more detailed example of mapped array access is presented in the description of the C and FORTRAN interfaces for mapped array access. See section Write a Subsampled Or Mapped Array of Values: ncvarputg, NCVPTG, and NCVPGC.
Note that, although the netCDF abstraction allows the use of subsampled or mapped array-section access if warranted by the situation, neither is required. If you do not need these more general forms of access, you may ignore these capabilities and use single value access or regular array section access instead.
Record-oriented access provides a more efficient alternative method in C (not FORTRAN) of reading or writing a whole record or part of a record. A record contains data for all the record variables, and any number of these can be read or written in a single record-oriented access call.
You specify a netCDF file, a record number (index of unlimited dimension) and an array of pointers to buffers (areas of memory) for each of the variables in the record. Those variables corresponding to NULL values in this array are ignored.
An example where the gain in speed could be considerable would be a file consisting of fifty variables, all of which have just one dimension which is the unlimited dimension. Thus each record contains a single value for each of fifty variables. It would be much faster to use a single record-oriented call, which reads or writes a whole record of fifty values, than to use fifty separate conventional calls, which each read or write a single value.
The only kind of data structure directly supported by the netCDF abstraction is a collection of named arrays with attached vector attributes. NetCDF is not particularly well-suited for storing linked lists, trees, sparse matrices, ragged arrays or other kinds of data structures requiring pointers. It is possible to build other kinds of data structures from sets of arrays by adopting various conventions regarding the use of data in one array as pointers into another array. The netCDF library won't provide much help or hindrance with constructing such data structures, but netCDF provides the mechanisms with which such conventions can be designed.
The following example stores a ragged array ragged_mat using an attribute row_index to name an associated index variable giving the index of the start of each row. The first row contains 12 (12-0) elements, the second 7 (19-12), etc.
    float ragged_mat(max_elements);
        ragged_mat:row_index = "row_start";
    int row_start(max_rows);
    data:
        row_start = 0, 12, 19, ...
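Once ragged_mat and row_start have been read into memory, a program can recover each row from the start indices. The sketch below shows one way this might be done; the function names are hypothetical, and it assumes that the last row extends to the end of the stored elements, which the convention above does not itself specify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of elements in row r, taken as the
 * difference between consecutive start indices. The last row is
 * assumed to run to n_elements, the total number of stored values. */
long ragged_row_length(const long *row_start, size_t n_rows,
                       long n_elements, size_t r)
{
    long end;

    assert(r < n_rows);
    end = (r + 1 < n_rows) ? row_start[r + 1] : n_elements;
    assert(end >= row_start[r]);
    return end - row_start[r];
}

/* Hypothetical helper: pointer to the first element of row r. */
const float *ragged_row(const float *ragged_mat, const long *row_start,
                        size_t n_rows, size_t r)
{
    assert(r < n_rows);
    return ragged_mat + row_start[r];
}
```

With start indices 0, 12, and 19, the first two rows have lengths 12 and 7, matching the description above.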
As another example, netCDF variables may be grouped within a netCDF file by defining attributes that list the names of the variables in each group, separated by a conventional delimiter such as a space or comma. A convention can be adopted to use particular sorts of attribute names for such groupings, so that an arbitrary number of named groups of variables can be supported. If needed, a particular conventional attribute for each variable might list the names of the groups of which it is a member. Use of attributes, or variables that refer to other attributes or variables, provides a flexible mechanism for representing some kinds of complex structures in netCDF files.
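A program that adopts such a grouping convention needs to test whether a variable name appears in a group attribute's value. The following sketch shows one possible check for a list delimited by spaces or commas. The function name group_contains is hypothetical, and the attribute value is assumed to have already been read into a null-terminated string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if name appears as a whole entry in
 * list, a string of variable names separated by spaces or commas,
 * and 0 otherwise. */
int group_contains(const char *list, const char *name)
{
    const char *delims = " ,";
    size_t name_len;

    assert(list != NULL && name != NULL);
    name_len = strlen(name);

    while (*list != '\0') {
        size_t tok_len;

        list += strspn(list, delims);      /* skip leading delimiters */
        tok_len = strcspn(list, delims);   /* length of the next name */
        if (tok_len > 0 && tok_len == name_len &&
            strncmp(list, name, tok_len) == 0)
            return 1;
        list += tok_len;
    }
    return 0;
}
```

Comparing whole entries rather than substrings keeps a name such as "te" from matching "temp".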