The Network Common Data Form, or netCDF, is an interface to a library of data access functions for storing and retrieving data in the form of arrays. An array is an n-dimensional (where n is 0, 1, 2, ...) rectangular structure containing items which all have the same data type (e.g. 8-bit character, 32-bit integer). A scalar (simple single value) is a 0-dimensional array.
NetCDF is an abstraction that supports a view of data as a collection of self-describing, network-transparent objects that can be accessed through a simple interface. Array values may be accessed directly, without knowing details of how the data are stored. Auxiliary information about the data, such as what units are used, may be stored with the data. Generic utilities and application programs can access netCDF files and transform, combine, analyze, or display specified fields of the data. The development of such applications may lead to improved accessibility of data and improved reusability of software for array-oriented data management, analysis, and display.
The netCDF software implements an abstract data type, which means that all operations to access and manipulate data in a netCDF file must use only the set of functions provided by the interface. The representation of the data is hidden from applications that use the interface, so that how the data are stored could be changed without affecting existing programs. The physical representation of netCDF data is designed to be independent of the computer on which the data were written.
Unidata supports the netCDF interfaces for C, FORTRAN, C++, and Perl, and for various UNIX operating systems. The software is also ported and tested on a few other operating systems, with assistance from users with access to these systems, before each major release. Unidata's netCDF software is freely available via FTP to encourage its widespread use.
Why not use an existing database management system for storing array-oriented data? Relational database software is not suitable for the kinds of data access supported by the netCDF interface.
First, existing database systems that support the relational model do not support multidimensional objects (arrays) as a basic unit of data access. Representing arrays as relations makes some useful kinds of data access awkward and provides little support for the abstractions of multidimensional data and coordinate systems. A quite different data model is needed for array-oriented data to facilitate its retrieval, modification, mathematical manipulation and visualization.
Related to this is a second problem with general-purpose database systems: their poor performance on large arrays. Collections of satellite images, scientific model outputs and long-term global weather observations are beyond the capabilities of most database systems to organize and index for efficient retrieval.
Finally, general-purpose database systems provide, at significant cost in terms of both resources and access performance, many facilities that are not needed in the analysis, management, and display of array-oriented data. For example, elaborate update facilities, audit trails, report formatting, and mechanisms designed for transaction-processing are unnecessary for most scientific applications.
To achieve network-transparency (machine-independence), netCDF is implemented in terms of XDR (eXternal Data Representation, see `ftp://ds.internic.net/rfc/rfc1832.txt'), a proposed standard protocol for describing and encoding data. XDR provides encoding of data into machine-independent sequences of bits. XDR has been implemented on a wide variety of computers, by assuming only that eight-bit bytes can be encoded and decoded in a consistent way. XDR uses the IEEE floating-point standard for floating-point data.
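The machine-independent encoding described above can be illustrated with a minimal sketch. It uses Python's standard `struct` module rather than the netCDF or XDR libraries, and the helper names (`xdr_int`, `xdr_float`, `xdr_double`) are hypothetical. It relies only on two properties of XDR: integers and floating-point values are written in big-endian byte order, and floating-point values use the IEEE format.

```python
import struct

# Hypothetical helpers illustrating XDR-style encoding; these are not part
# of the netCDF interface.

def xdr_int(value: int) -> bytes:
    """Encode a 32-bit signed integer as four big-endian bytes."""
    return struct.pack(">i", value)

def xdr_float(value: float) -> bytes:
    """Encode an IEEE single-precision float as four big-endian bytes."""
    return struct.pack(">f", value)

def xdr_double(value: float) -> bytes:
    """Encode an IEEE double-precision float as eight big-endian bytes."""
    return struct.pack(">d", value)

# The byte sequences are the same on every machine, whatever its native
# byte order, so a reader on another platform can decode them.
encoded = xdr_int(1) + xdr_float(1.0)
decoded_int, decoded_float = struct.unpack(">if", encoded)
```

Because the format strings name the byte order explicitly, the same bytes are produced on little-endian and big-endian hosts, which is the property the text calls network-transparency.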
The overall structure of netCDF files is described in section NetCDF File Structure and Performance.
The details of the format are described in section File Format Specification. However, users are discouraged from using the format specification to develop independent low-level software for reading and writing netCDF files, because this could lead to compatibility problems when the format is modified.
One of the goals of netCDF is to support efficient access to small subsets of large datasets. To support this goal, netCDF uses direct access rather than sequential access. This can be much more efficient when data are read in a different order from that in which they were written.
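The idea of direct access can be sketched as follows. This is an illustration only: it assumes a bare array of 32-bit integers stored in row-major order with no header, which is not the actual netCDF file layout, and the names `element_offset` and `read_element` are hypothetical.

```python
import io
import struct

ITEM_SIZE = 4  # bytes per 32-bit integer in this sketch

def element_offset(index, shape, item_size=ITEM_SIZE, base=0):
    """Byte offset of one element of a row-major array starting at `base`."""
    position = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("index out of range")
        position = position * n + i
    return base + position * item_size

def read_element(stream, index, shape):
    """Seek straight to one element instead of reading everything before it."""
    stream.seek(element_offset(index, shape))
    return struct.unpack(">i", stream.read(ITEM_SIZE))[0]

# A 3 x 4 array holding the values 0..11, kept in memory for the example.
shape = (3, 4)
stream = io.BytesIO(struct.pack(">12i", *range(12)))

last_first = read_element(stream, (2, 1), shape)   # read out of order
first = read_element(stream, (0, 0), shape)
```

With sequential access, reaching element (2, 1) would mean reading the nine values stored before it; with direct access the cost is one seek and one read, regardless of the order in which elements are requested.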
The amount of XDR overhead depends on many factors, including the data type, the type of computer, the granularity of data access, and how well the implementation has been tuned to the computer on which it is run. This overhead is typically small in comparison to the overall resources used by an application. In any case, the overhead of the XDR layer is usually a reasonable price to pay for portable, network-transparent data access.
Although efficiency of data access has been an important concern in designing and implementing netCDF, it is still possible to use the netCDF interface to access data in inefficient ways: for example, by requesting a slice of data that requires a single value from each record. Advice on how to use the interface efficiently is provided in section NetCDF File Structure and Performance.
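A rough way to see why such a slice is expensive is to count how many separate contiguous runs of bytes a request touches. The sketch below assumes a single array stored in row-major order with the record dimension first; the function `contiguous_runs` is hypothetical and ignores details of the real file layout, so it should be read as an estimate of the access pattern rather than a model of the library.

```python
from math import prod

def contiguous_runs(shape, count):
    """Number of separate contiguous runs needed to read a rectangular
    section with `count[d]` elements along each dimension of a row-major
    array of the given `shape`."""
    if len(shape) != len(count):
        raise ValueError("shape and count must have the same rank")
    if any(c < 1 or c > n for c, n in zip(count, shape)):
        raise ValueError("count must lie between 1 and the dimension size")
    # Trailing dimensions that are read in full merge into a single run
    # together with the dimension just before them.
    d = len(shape) - 1
    while d > 0 and count[d] == shape[d]:
        d -= 1
    return prod(count[:d])

shape = (1000, 50)                 # 1000 records of 50 values each
one_record = contiguous_runs(shape, (1, 50))        # all values of one record
one_per_record = contiguous_runs(shape, (1000, 1))  # one value per record
```

Reading one whole record is a single run, while taking one value from each of the 1000 records needs 1000 separate runs, each of which may cost a seek.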
NetCDF can be used as a general-purpose archive format for storing arrays. Compression of data is possible with netCDF (e.g., using arrays of eight-bit or 16-bit integers to encode low-resolution floating-point numbers instead of arrays of 32-bit numbers), but the current version of netCDF was not designed to achieve optimal compression of data. Hence, using netCDF may require more space than special-purpose archive formats that exploit knowledge of particular characteristics of specific datasets.
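The kind of packing mentioned above can be sketched as a linear mapping between floating-point values and 16-bit integers. The function names and the parameter names `scale_factor` and `add_offset` are chosen here for illustration, and the sketch does not use the netCDF library; it only shows the arithmetic and the resulting saving in space.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767

def pack_values(values, scale_factor, add_offset):
    """Map floats to 16-bit integers: packed = round((v - offset) / scale)."""
    packed = [round((v - add_offset) / scale_factor) for v in values]
    if any(p < INT16_MIN or p > INT16_MAX for p in packed):
        raise ValueError("value does not fit in 16 bits with this scaling")
    return packed

def unpack_values(packed, scale_factor, add_offset):
    """Recover approximate floats: v = packed * scale + offset."""
    return [p * scale_factor + add_offset for p in packed]

# Temperatures known to two decimal places, so a resolution of 0.01 suffices.
temperatures = [-12.34, 0.0, 25.5]
scale_factor, add_offset = 0.01, 0.0

packed = pack_values(temperatures, scale_factor, add_offset)
restored = unpack_values(packed, scale_factor, add_offset)

size_as_int16 = len(struct.pack(">3h", *packed))          # 2 bytes per value
size_as_float32 = len(struct.pack(">3f", *temperatures))  # 4 bytes per value
```

The packed form needs half the space of 32-bit floats, at the price of a maximum error of half the scale factor and a restricted range of representable values.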
The mere use of netCDF is not sufficient to make data "self-describing" and meaningful to both humans and machines. The names of variables and dimensions should be meaningful and conform to any relevant conventions. Dimensions should have corresponding coordinate variables where sensible.
Attributes play a vital role in providing ancillary information. It is important to use all the relevant standard attributes, following the relevant conventions. Section Attribute Conventions describes reserved attributes (used by the netCDF library) and attribute conventions for generic application software.
A number of groups have defined their own additional conventions and styles for netCDF data. Descriptions of these conventions, as well as examples incorporating them, can be accessed from the netCDF Conventions site (`http://www.unidata.ucar.edu/packages/netcdf/conventions.html').
These conventions should be used where suitable. Additional conventions are often needed for local use. These should be contributed to the above netCDF Conventions site if likely to interest other users in similar areas.
The development of the netCDF interface began with a modest goal related to Unidata's needs: to provide a common interface between Unidata applications and ingested real-time meteorological data. Since Unidata software was intended to run on multiple hardware platforms with access from both C and FORTRAN, achieving Unidata's goals had the potential for providing a package that was useful in a broader context. By making the package widely available and collaborating with other organizations with similar needs, we hoped to improve the then-current situation in which software for scientific data access was only rarely reused by others in the same discipline and almost never reused between disciplines (Fulker, 1988).
Important concepts employed in the netCDF software originated in a paper (Treinish and Gough, 1987) that described data-access software developed at the NASA Goddard National Space Science Data Center (NSSDC). The interface provided by this software was called the Common Data Format (CDF). The NASA CDF was originally developed as a platform-specific FORTRAN library to support an abstraction for storing arrays.
The NASA CDF package had been used for many different kinds of data in an extensive collection of applications. It had the virtues of simplicity (only 13 subroutines), independence from storage format, generality, ability to support logical user views of data, and support for generic applications.
Unidata held a workshop on CDF in Boulder in August 1987. We proposed exploring the possibility of collaborating with NASA to extend the CDF FORTRAN interface, to define a C interface, and to permit the access of data aggregates with a single call, while maintaining compatibility with the existing NASA interface.
Independently, Dave Raymond at the New Mexico Institute of Mining and Technology had developed a package of C software for UNIX that supported sequential access to self-describing array-oriented data and a "pipes and filters" (or "data flow") approach to processing, analyzing, and displaying the data. This package also used the "Common Data Format" name, later changed to C-Based Analysis and Display System (CANDIS). Unidata learned of Raymond's work (Raymond, 1988), and incorporated some of his ideas, such as the use of named dimensions and variables with differing shapes in a single data object, into the Unidata netCDF interface.
In early 1988, Glenn Davis of Unidata developed a prototype netCDF package in C that was layered on XDR. This prototype proved that a single-file, network-transparent implementation of the CDF interface could be achieved at acceptable cost and that the resulting programs could be implemented on both UNIX and VMS systems. However, it also demonstrated that providing a small, portable, and NASA CDF-compatible FORTRAN interface with the desired generality was not practical. NASA's CDF and Unidata's netCDF have since evolved separately, but recent CDF versions share many characteristics with netCDF.
In early 1988, Joe Fahle of SeaSpace, Inc. (a commercial software development firm in San Diego, California), a participant in the 1987 Unidata CDF workshop, independently developed a CDF package in C that extended the NASA CDF interface in several important ways (Fahle, 1989). Like Raymond's package, the SeaSpace CDF software permitted variables with unrelated shapes to be included in the same data object and permitted a general form of access to multidimensional arrays. Fahle's implementation was used at SeaSpace as the intermediate form of storage for a variety of steps in their image-processing system. This interface and format have subsequently evolved into the Terascan data format.
After studying Fahle's interface, we concluded that it solved many of the problems we had identified in trying to stretch the NASA interface to our purposes. In August 1988, we convened a small workshop to agree on a Unidata netCDF interface, and to resolve remaining open issues. Attending were Joe Fahle of SeaSpace, Michael Gough of Apple (an author of the NASA CDF software), Angel Li of the University of Miami (who had implemented our prototype netCDF software on VMS and was a potential user), and Unidata systems development staff. Consensus was reached at the workshop after some further simplifications were discovered. A document incorporating the results of the workshop into a proposed Unidata netCDF interface specification was distributed widely for comments before Glenn Davis and Russ Rew implemented the first version of the software. Comparison with other data-access interfaces and experience using netCDF are discussed in (Rew and Davis, 1990a), (Rew and Davis, 1990b), (Jenter and Signell, 1992), and (Brown, Folk, Goucher, and Rew, 1993).
In October 1991, we announced version 2.0 of the netCDF software distribution. Slight modifications to the C interface (declaring dimension sizes to be long rather than int) improved the usability of netCDF on inexpensive platforms such as MS-DOS computers, without requiring recompilation on other platforms. This change to the interface required no changes to the associated file format.
Release of netCDF version 2.3.2 in June 1993 preserved the same file format but added single call access to records, optimizations for accessing cross-sections involving non-contiguous data, subsampling along specified dimensions (using `strides'), accessing non-contiguous data (using `mapped array sections'), improvements to the ncdump and ncgen utilities, and an experimental C++ interface.
This Guide documents the February 1996 release of netCDF 2.4, which preserves the same file format as earlier versions but includes the following changes from version 2.3.2:
In order to support netCDF on new platforms where the size of a long is greater than the size of an int, the new release fully integrates the use of the nclong typedef into the C and C++ interfaces.
Additions and changes were made to the C++ interface to make it easier to step through records, coordinate concurrent access to netCDF files, and access single records.
The netCDF data model is widely applicable to data that can be organized into a collection of named array variables with named attributes, but there are some important limitations to the model and its implementation in software.
The data model does not support nested data structures. The netCDF interface provides little help in representing trees, nested arrays, or other recursive data structures, mostly because of the requirement that the FORTRAN interface should be able to read and write any netCDF dataset. Through use of indirection and conventions it is possible to represent some kinds of nested structures, but the result falls short of the netCDF goal of "self-describing data".
A significant limitation of the current implementation is that only one unlimited dimension is permitted for each netCDF dataset. Multiple variables can share an unlimited dimension, but then they must all grow together. Hence the netCDF model does not cater for variables with several changeable dimension sizes. It is also not possible to have different changeable dimensions in different variables within the same file. Variables that have non-rectangular shapes (e.g. "ragged arrays") cannot be represented conveniently.
The interface does not provide any facilities specific to coordinate variables, such as using them to specify position along dimensions as an alternative to normal indexing. There are no facilities yet for packing data in bit fields (XDR lacks this capability). Hence an array of 9-bit data must be stored in 16-bit arrays to be conveniently accessed. Dataset sizes are currently limited to 2 Gigabytes, because of the use of 32-bit signed offsets.
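The arithmetic behind these two limits can be sketched in a few lines. The helper names below are hypothetical and the sketch uses only Python's standard `struct` module; it shows why a 32-bit signed offset stops just short of 2 Gigabytes, and how much space is spent when 9-bit values occupy 16-bit slots.

```python
import struct

MAX_SIGNED_32 = 2**31 - 1  # largest value a 32-bit signed offset can hold

def offset_is_representable(offset):
    """True if `offset` fits in a 32-bit signed integer."""
    try:
        struct.pack(">i", offset)
    except struct.error:
        return False
    return True

def storage_bytes(n_values, bits_per_value):
    """Bytes needed for n_values stored at bits_per_value bits each,
    rounded up to a whole number of bytes."""
    return (n_values * bits_per_value + 7) // 8

# Space for 1000 nine-bit samples: stored in 16-bit slots versus the
# size an ideal bit-packed layout would need.
in_16_bit_slots = storage_bytes(1000, 16)
ideal_bit_packed = storage_bytes(1000, 9)
```

An offset of 2**31 bytes (exactly 2 Gigabytes) is the first one that cannot be represented, and the 16-bit storage of 9-bit data uses 2000 bytes where 1125 would suffice with bit-field packing.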
Finally, the current implementation limits concurrent access to a netCDF file. One writer and multiple readers may access data in a single file simultaneously, but there is no support for multiple concurrent writers.
XDR is to be replaced by new software under development. This will provide added functionality and greater efficiency.
Current plans are to add transparent data packing, improved concurrency support, access to data by key or coordinate value, support for efficient structure changes (e.g. new variables and attributes), new data types, and the addition of type-safe C and FORTRAN interfaces for accessing data as a specific type, independent of how it is stored. Other desirable extensions that may be added, if practical, include support for pointers to data cross-sections in other files, nested arrays (allowing representation of ragged arrays, trees and other recursive data structures), ability to access datasets larger than 2 Gigabytes, and multiple unlimited dimensions.