11. Special Topics

This chapter holds miscellaneous information which did not neatly fit in any of the other chapters.

Subclassing

Subclassing Numeric arrays is not possible due to a limitation of Python. The approach taken in the Masked Array facility (Masked Arrays) is one answer. UserArray.py, described below, can be subclassed, but this is often unsatisfactory unless you put in a similar effort to that in MA.

Code Organization
Numeric.py and friends

Numeric.py is the most commonly used interface to the Numeric extensions. It is a Python module which imports all of the exported functions and attributes from the multiarray module, and then defines some utility functions. As some of the functions defined in Numeric.py could someday be moved into a supporting C module, the utility functions and the multiarray object are documented together, in this section. The multiarray objects are the core of Numeric Python - they are extension types written in C which are designed to provide both space- and time-efficiency when manipulating large arrays of homogeneous data types, with special emphasis to numeric data types.

UserArray.py

In the tradition of UserList.py and UserDict.py , the UserArray.py module defines a class whose instances act in many ways like array objects.

Matrix.py

The Matrix.py python module defines a class Matrix which is a subclass of UserArray . The only differences between Matrix instances and UserArray instances is that the * operator on Matrix performs a matrix multiplication, as opposed to element-wise multiplication, and that the power operator ** is disallowed for Matrix instances.

Precision.py

The Precision.py module contains the code which is used to determine the mapping between typecode names and values, by building small arrays and looking at the number of bytes they use per element.

ArrayPrinter.py

The ArrayPrinter.py module defines the functions used for default printing of arrays. See the section on Textual Representations of arrays on Textual representations of arrays,

MLab.py

The MLab.py module provides some functions which are compatible with the functions of the same name in the MATLAB programming language. We have written these functions so that they will take Python sequences as arguments, such as lists, as well as Numeric arrays.

bartlett(M)

returns the M-point Bartlett window.

blackman(M)

returns the M-point Blackman window.

corrcoef(x, y=None)

The correlation coefficient

cov(m,y=None)

returns the covariance

cumprod(m, axis=0)

returns the cumulative product of the elments along the axis'th dimension of m.

cumsum(m, axis=0)

returns the cumulative sum of the elements along the axis'th dimension of m.

diag(v, k=0)

returns the k-th diagonal if v is a matrix or returns a matrix with v as the k-th diagonal if v is a vector.

diff(x, n=1)

calculates the first-order, discrete difference approximation to the derivative.

eig(m)

returns the the eigenvalues of m in x and the corresponding eigenvectors in the rows of v.

eye(N, M=N, k=0, typecode=None)

returns a N-by-M matrix where the k-th diagonal is all ones, and everything else is zeros.

fliplr(m)

returns a 2-D matrix m with the rows preserved and columns flipped in the left/right direction. Only works with 2-D arrays.

flipud(m)

returns a 2-D matrix with the columns preserved and rows flipped in the up/down direction. Only works with 2-D arrays.

hamming(M)

returns the M-point Hamming window.

hanning(M)

returns the M-point Hanning window.

kaiser(M, beta)

returns a Kaiser window of length M with shape parameter beta. It depends on the cephes module for the modified bessel function i0.

max(m, axis=0)

returns the maximum along the axis'th dimension of m.

mean(m, axis=0)

returns the mean along the axis'th dimension of m. Note: if m is an integer array, the result will be floating point. This was changed in release 10.1; previously, a meaningless integer divide was used.

median(m)

returns a mean of m along the first dimension of m.

min(m, axis=0)

returns the minimum along the axis'th dimension of m.

msort(m)

returns a sort along the first dimension of m as in MATLAB.

prod(m, axis=0)

returns the product of the elements along the axis'th dimension of m.

ptp(m, axis = 0)

returns the maximum - minimum along the axis'th dimension of m.

rand(d1, ..., dn)

returns a matrix of the given dimensions which is initialized to random numbers from a uniform distribution in the range [0,1).

rot90(m,k=1)

returns the matrix found by rotating m by k*90 degrees in the counterclockwise direction.

sinc(x)

returns sin(pi*x)/(pi*x) at all points of array x.

squeeze(a)

removes any ones from the shape of a

std(m, axis = 0)

returns the unbiased estimate of the population standard deviation from a sample along the axis'th dimension of m. (That is, the denominator for the calculation is n-1, not n.)

sum(m, axis=0)

returns the sum of the elements along the axis'th dimension of m.

svd(m)

return the singular value decomposition of m [u,x,v]

trapz(y,x=None)

integrates y = f(x) using the trapezoidal rule.

tri(N, M=N, k=0, typecode=None)

returns a N-by-M matrix where all the diagonals starting from lower left corner up to the k-th are all ones.

tril(m,k=0)

returns the elements on and below the k-th diagonal of m. k=0 is the main diagonal, k > 0 is above and k < 0 is below the main diagonal.

triu(m,k=0)

returns the elements on and above the k-th diagonal of m. k=0 is the main diagonal, k > 0 is above and k < 0 is below the main diagonal.

The multiarray object

The array objects which Numeric Python manipulates is actually a multiarray object, given this name to distinguish it from the one-dimensional array object defined in the standard array module. From here on, however, the terms array and multiarray will be used interchangeably to refer to the new object type. multiarray objects are homogeneous multidimensional sequences. Starting from the back, they are sequences. This means that they are container (compound) objects, which contain references to other objects. They are multidimensional, meaning that unlike standard Python sequences which define only a single dimension along which one can iterate through the contents, multiarray objects can have up to 40 dimensions.1 Finally, they are homogeneous. This means that every object in a multiarray must be of the same type. This is done for efficiency reasons -- storing the type of the contained objects once in the array means that the process of finding the type-specific operation to operate on each element in the array needs to be done only once per array, as opposed to once per element. Furthemore, as the main purpose of these arrays is to process numbers, the numbers can be stored directly, and not as full-fledged Python objects (PyObject *), thus yielding memory savings. It is however possible to make arrays of Python objects, which relinquish both the space and time efficiencies but allow heterogeneous contents (as we shall see, these arrays are still homogeneous from the Numeric perspective, they are just arrays of Python object references).

Typecodes

The kind of number stored in an array is described by its typecode. This code is stored internally as a single-character Python string, but more descriptive names corresponding to the typecodes are made available to the Python programmer in the Precision.py module. The typecodes are defined as follows:

Typecode Listing

Variable defined in
Typecode module

Typecode
character

Description

Char

'c'

Single-character strings

PyObject

'O'

Reference to Python object

UnsignedInt8

'b'

Unsigned integer using a single byte.

Int

'l'

Python standard integers (i.e. C long integers)

Float

'd'

Python standard floating point numbers
(i.e. C double-precision floats)

n/a

'f'

Single-precision floating point numbers

Complex

'D'

Complex numbers consisting of two double-precision floats

n/a

'F'

Complex numbers consisting of two single-precision floats

Int0, Int8, Int16, Int32, Int64, Int128

n/a

These correspond to machine-dependent typecodes: Int0 returns the typecode corresponding to the smallest available integer, Int8 that corresponding to the smallest available integer with at least 8 bits, Int16 that with at least 16 bits, etc. If a typecode is not available (e.g. Int64 on a 32-bit machine), the variable is not defined.

Float0, Float8, Float16, Float32, Float64, Float128

n/a

Same as Int0 , Int8 etc. except for floating point numbers.

Complex0, Complex8, Complex16, Complex32, Complex64, Complex128

n/a

Same as Float0 , etc., except that the number of bits refers to the precision of each of the two (real and imaginary) parts of the complex number.

Note on number fomat: the binary format used by Python is that of the underlying C library. [notes about IEEE formats, etc?]

Indexing in and out, slicing

Indexing arrays works like indexing of other Python sequences, but supports some extensions which are as of yet not implemented for other sequence types2. The standard [start:stop] notation is supported, with start defaulting to 0 (the first index position) and stop defaulting to the length of the sequence, as for lists and tuples. In addition, there is an optional stride argument, which specifies the stride size between successive indices in the slice. It is expressed by a integer following a second : immediately after the usual start:stop slice. Thus [0:11:2] will slice the array at indices 0, 2, 4, .. 10. The start and stop indices are optional, but the first : must be specified for the stride interpretation to occur. Therefore, [::2] means slice from beginning to end, with a stride of 2 (i.e. skip an index for each stride). If the start index is omitted and the stride is negative, the indexing starts from the end of the sequence and works towards the beginning of the sequence. If the stop index is omitted and the stride is negative, the indexing stops at the beginning of the sequence.

>>> print x

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]

>>> print x[10]

10

>>> print x[:10]

[0 1 2 3 4 5 6 7 8 9]

>>> print x[5:15:3]

[ 5 8 11 14]

>>> print x[:10:2]

[0 2 4 6 8]

>>> print x[10::-2]

[10 8 6 4 2 0]

>>> print x[::-1]

[19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0]

It is important to note that the out-of-bounds conditions follow the same rules as standard Python indexing, so that slices out of bounds are trimmed to the sequence boundaries, but element indexing with out-of-bound indices yields an IndexError:

>>> print x[:100]

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]

>>> print x[-200:4]

[0 1 2 3]

>>> x[100]

Traceback (innermost last):

File "<stdin>", line 1, in ?

IndexError: index out of bounds

The second difference between array indexing and other sequences is that arrays provide multidimensional indexing. An array of rank N can be indexed with up to N indices or slices (or combinations thereof. Indices should be integers (with negative integers indicating offsets from the end of the dimension, as for other Python sequences), and slices can have, as explained above, one or two :'s separating integer arguments. These indices and slies must be separated by commas, and correspond to sequential dimensions starting from the leftmost (first) index on. Thus a[3] means index 3 along dimension 0. a[3,:,-4] means the slice of a along three dimensions: index 3 along the first dimension, the entire range of indices along the second dimension, and the 4th from the end index along the third dimension. If the array being indexed has more dimensions than are specified in the multidimensional slice, those dimensions are assumed to be sliced from beginning to end. Thus, if a is a rank 3 array,

a[0] == a[0,:] == a[0,:,:]
Ellipses

A special slice element called Ellipses (and written ... ) is used to refer to a variable number of slices from beginning to end along the current dimension. It is a shorthand for a set of such slices, specifically the number of dimensions of the array being indexed minus those which are already specified. Only the first (leftmost) Ellipses in an multidimensional slice is expanded, while the others are single dimensional slices from beginning to end.

Thus, if a is a rank-6 array,

a[3,:,:,:,-1,:] == a[3,...,-1,:] == a[3,...,-1,...] .
NewAxis

There is another special symbol which can be used inside indexing operations to create new dimensions in the returned array. The reference NewAxis, used as one of the comma-separated slice elements, does not change the selection of the subset of the array being indexed, but changes the shape of the array returned by the indexing operation, so that an additional dimension (of length 1) is created, at the dimension position corresponding to the location of NewAxis within the indexing sequence. Thus, a[:,3,NewAxis,-3] will perform the indexing of a corresponding to the slice [a:,3,-3] , but will also modify the shape of a so that the new shape of a is (a.shape[0], a.shape[1], 1, a.shape[2]) . This operation is especially useful in conjunction with the broadcasting feature described next, as it replaces a lengthy but common operation with a simple notation (in the example above, the same effect can be had with

reshape(a[:,3,-1], (a.shape[0], a.shape[1], 1, a.shape[2])).

Set-indexing and Broadcasting

The indexing rules described so far specify exactly the behavior of get-indexing. For set-indexing, the rules are exactly the same, and describe the slice of the array on the left hand side of the assignment operator which is the target of the assignment. The only point left to mention is the process of assigning from the source (on the right hand side of the assignment) to the target (on the left hand side).

If both source and target have the same shape, then the assignment is done element by element. The typecode of the target specifies the casting which can be applied in the case of a typecode mismatch between source and target. If the typecode of the source is "lower" than that of the target, then an 'up-cast' is performed and no loss in precision results. If the typecode of the source is "higher" than that of the target, then a downcast is performed, which may lose precision (as discussed in the description of the array call, these casts are truncating casts, not rounding casts). Complex numbers cannot be cast to non-complex numbers.

If the source and the target have different shapes, Numeric Python attempts to broadcast the contents of the source over the range of the target. This broadcasting occurs for all dimensions where the source has dimension 1 or 0 (i.e., is absent). If there exists a dimension for which the two arrays have differing lengths, and the length of that dimension in the source is not 1, then the assignment fails and an exception (ValueError) is raised, notifying the user that the arrays are not aligned.

Axis specifications

In many of the functions defined in this document, indices are used to refer to axes. The numbering scheme is the same as that used by indexing in Python: the first (leftmost) axis is axis 0, the second axis is axis 1, etc. Axis -1 refers to the last axis, -2 refers to the next-to-last axis, etc.

Textual representations of arrays

The algorithm used to display arrays as text strings is defined in the file ArrayPrinter.py, which defines a function array2string (imported into Numeric's namespace) which offers considerable control over how arrays are output. The range of options to the array2string function will be described first, followed by a description of which options are used by default by str and repr .

Note that the optional package MA, if imported, modifies this process so that very long arrays are not printed; rather, a summary of their shape and type are shown. You may wish to import MA even if you do not use it otherwise, to get this effect, because without it accidentally attempting to print a very long array can take a very long time to convert, giving the appearance that the program has hung.

array2string(a, max_line_width = None, precision = None,
suppress_small = None, separator=' ', array_output=0):

The array2string function takes an array and returns a textual representation of it. Each dimension is indicated by a pair of matching square brackets ( [] ), within which each subset of the array is output. The orientation of the dimensions is as follows: the last (rightmost) dimension is always horizontal, so that the frequent rank-1 arrays use a minimum of screen real-estate. The next-to-last dimension is displayed vertically if present, and any earlier dimension is displayed with additional bracket divisions. For example:

>>> a = arange(24)

>>> print array2string(a)

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

>>> a.shape = (2,10)

>>> print array2string(a)

[[ 0 1 2 3 4 5 6 7 8 9 10 11]

[12 13 14 15 16 17 18 19 20 21 22 23]]

>>> a.shape = (2,3,4)

>>> print array2string(a)

[[[ 0 1 2 3]

[ 4 5 6 7]

[ 8 9 10 11]]

[[12 13 14 15]

[16 17 18 19]

[20 21 22 23]]]

The max_line_width argument specifies the maximum number of characters which the array2string routine uses in a single line. If it is set to None , then the value of the sys.output_line_width attribute is looked up. If it exists, it is used. If not, the default of 77 characters is used.

>>> print array2string(x)

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

26 27 28 29]

>>> sys.output_line_width = 30

>>> print array2string(x)

[ 0 1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17

18 19 20 21 22 23 24 25

26 27 28 29]

The precision argument specifies the number of digits after the decimal point which are used. If a value of None is used, the value of the sys.float_output_precision is looked up. If it exists, it is used. If not, the default of 8 digits is used.

>>> x = array((10.11111111111123123111, pi))

>>> print array2string(x)

[ 10.11111111 3.14159265]

>>> print array2string(x, precision=3)

[ 10.111 3.142]

>>> sys.float_output_precision = 2

>>> print array2string(x)

[ 10.11 3.14]

The suppress_small argument specifies whether small values should be suppressed (and output as 0). If a value of None is used, the value of the sys.float_output_suppress_small is looked up. If it exists, it is used (all that matters is whether it evaluates to true or false). If not, the default of 0 (false) is used. This variable also interacts with the precision parameters, as it can be used to suppress the use of exponential notation.

>>> print x

[ 1.00000000e-005 3.14159265e+000]

>>> print array2string(x)

[ 1.00000000e-005 3.14159265e+000]

>>> print array2string(x, suppress_small=1)

[ 0.00001 3.14159265]

>>> print array2string(x, precision=3)

[ 1.000e-005 3.142e+000]

>>> print array2string(x, precision=3, suppress_small=1)

[ 0. 3.142]

The separator argument is used to specify what character string should be placed between two numbers which do not straddle a dimension. The default is a single space.

>>> print array2string(x)

[ 0 100 200 300 400 500 600 700 800 900 100]

>>> print array2string(x, separator = ', ')

[ 0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 100]

Finally, the last attribute, array_output, specifies whether to prepend the string "array(" and append either the string ")" or ", 'X')" where X is a typecode for non-default typecodes (in other words, the typecode will only be displayed if it is not that corresponding to Float, Complex or Int, which are the standard typecodes associated with floating point numbers, complex numbers and integers respectively). The array() is so that an eval of the returned string will return an array object (provided a comma separator is also used).

>>> array2string(arange(3))

[0 1 2]

>>> eval(array2string(arange(3), array_output=1))

Traceback (innermost last):

File "<stdin>", line 1, in ?

File "<string>", line 1

array([0 1 2])

^

SyntaxError: invalid syntax

>>> type(eval(array2string(arange(3), array_output=1, separator=',')))

<type 'array'>

>>> array2string(arange(3), array_output=1)

'array([0, 1, 2])'

>>> array2string(zeros((3,), 'i') + arange(3), array_output=1)

"array([0, 1, 2],'i')"

The str and repr operations on arrays call array2string with the max_line_width , precision and suppress_small all set to None, meaning that the defaults are used, but that modifying the attributes in the sys module will affect array printing. str uses the default separator and does not use the array() text, while repr uses a comma as a separator and does use the array(...) text.

>>> x = arange(3)

>>> print x

[0 1 2]

>>> str(x)

'[0 1 2]'

>>> repr(x)

'array([0, 1, 2])' # note the array(...) and ,'s

>>> x = arange(0,.01,.001)

>>> print x

[ 0. 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009]

>>> import sys

>>> sys.float_output_precision = 2

>>> print x

[ 0. 0. 0. 0. 0. 0.01 0.01 0.01 0.01 0.01]

Comparisons

Comparisons of multiarray objects results using the normal comparison operators (such as == or >) result in array results. These comparisons use the routines for comparison describe in Logical Ufuncs. Note that the logical operators "and" and "or" cannot operate on arrays. The bit operation ufuncs & and | may be useful. The functions sometrue and alltrue do reduction using logical_or and logical_and.

Storing arrays on disk

Storing Numeric (and Masked) arrays to disk can be done using the pickle module. Consult standard Python documentation for details. There are also modules available for a variety of other formats, such as netCDF. See the web site for some helpful links.

Dealing with floating point exceptions

Attempts to use NaN's as missing values have proven frustrating and not very portable. Consider Masked Arrays instead. Python's facilty for floating point error control, fpectl, has not yet been incorporated into Numeric.

 


1. This limit is modifiable in the source code if higher dimensionality is needed.

2. The Python syntax can allow other Python datatypes to use both the stride notation and multidimensional indexing, and it is relatively simple to write Python classes which support these operations. See the Python Reference manual for details.

Go to Main Go to Previous Go to Next