Masked arrays are arrays that may have missing or invalid entries. Module MA provides a nearly work-alike replacement for Numeric that supports data arrays with masks.
MA uses Numeric and the optional package Properties. See Properties Reference.
A mask is either None or an array of ones and zeros that determines, for each element of the masked array, whether or not it contains an invalid entry. The package ensures that invalid entries are not used in calculations.
A particular element is said to be masked (invalid) if the mask is not None and the corresponding element of the mask is 1; otherwise it is unmasked (valid).
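The rule above can be modeled in a few lines of plain Python. This is only an illustrative sketch: the helper is_masked is hypothetical and not part of MA, and ordinary lists stand in for Numeric arrays.

```python
def is_masked(mask, i):
    """Return True if element i is masked (invalid) under the rule above."""
    # A mask of None means that no element is masked.
    if mask is None:
        return False
    # Otherwise the element is masked exactly when its mask entry is 1.
    return mask[i] == 1

mask = [0, 1, 0]
flags = [is_masked(mask, i) for i in range(3)]      # only the second element is masked
no_mask = [is_masked(None, i) for i in range(3)]    # nothing is masked
```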
This package was written by Paul F. Dubois at Lawrence Livermore National Laboratory. Please see the legal notice in the software and on License and disclaimer for packages MA, RNG, Properties.
MA is one of the optional Packages and installing it requires a separate step as explained in the Numeric README. To install just the MA package using Distutils, in the MA top directory enter:
Use MA as a replacement for Numeric:
To create an array with the second element invalid, we would do:
y = array([1, 2, 3], mask = [0, 1, 0])
To create a masked array where all values "near" 1.e20 are invalid, we can do:
z = masked_values ([1.0, 1.e20, 3.0, 4.0], 1.e20)
For a complete discussion of creation methods for masked arrays please see Constructing masked arrays.
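As a rough illustration of what masking values "near" a target could mean, the following plain-Python sketch builds a mask from a closeness test. The names near and masked_values_sketch and the tolerance rtol are assumptions made for this example; the test that MA actually applies may differ.

```python
def near(value, target, rtol=1.0e-5):
    # Hypothetical closeness test; the tolerance MA really uses may differ.
    return abs(value - target) <= rtol * abs(target)

def masked_values_sketch(data, target):
    """Return (data, mask) with mask 1 wherever data is near target."""
    mask = [1 if near(v, target) else 0 for v in data]
    return list(data), mask

data, mask = masked_values_sketch([1.0, 1.e20, 3.0, 4.0], 1.e20)
# Only the second element is flagged as invalid.
```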
The Numeric module is an attribute in MA, so to execute a method foo from Numeric, you can reference it as Numeric.foo(...).
Usually people use both MA and Numeric this way, but of course you can always fully-qualify the names:
The principal feature of module MA is class MaskedArray, the class whose instances are returned by the array constructors and most functions in module MA. We will discuss this class first, and later cover the attributes and functions in module MA. For now suffice it to say that among the attributes of the module are the constants from module Numeric including those for declaring typecodes, NewAxis, and the mathematical constants such as pi and e. An additional typecode, MaskType, is the typecode used for masks.
In module MA, an array is an instance of class MaskedArray. An instance of class MaskedArray can be thought of as containing the following parts:
We will use the terms "invalid value" and "invalid entry" to refer to the data value at a place corresponding to a mask value of 1. It should be emphasized that the invalid values are never used in any computation, and that the fill value is not used for any computational purpose. When an instance x of class MaskedArray is converted to its string representation, it is the result returned by filled (x) that is converted to a string.
flat: (deprecated) returns the masked array as one-dimensional. This is provided for compatibility with Numeric. ravel (x) is preferred. It can be assigned to: x.flat = value will change the values of x.
real: returns the real part of the array if complex. It can be assigned to: x.real = value will change the real parts of x.
imaginary: returns the imaginary part of the array if complex. It can be assigned to: x.imaginary = value will change the imaginary parts of x.
shape: The shape of a masked array can be accessed or changed by using the special attribute shape, as with Numerical arrays. It can be assigned to: x.shape = newshape will change the shape of x. The new shape must describe the same total number of elements.
shared_data: This read-only flag if true indicates that the masked array shared a reference with the original data used to construct it at the time of construction. Changes to the original array will affect the masked array. (This is not the default behavior; see Copying or not?.) This flag is informational only.
shared_mask: This read-only flag if true indicates that the masked array currently shares a reference to the mask used to create it. Unlike shared_data, this flag may change as the result of modifying the array contents, as the mask uses copy on write semantics if it is shared.
The following additional constructors are provided for convenience.
On entry to any of these constructors, data must be any object which the Numeric package can accept to create an array (with the desired typecode, if specified). The mask if given must be None or any object that can be turned into a Numeric array of integer type (it will be converted to typecode MaskType, if necessary), have the same shape as data, and contain only values of 0 or 1.
If the mask is not None but its shape does not match that of data, an exception will be thrown, unless one of the two is of length 1, in which case the scalar will be resized (using Numeric.resize) to match the other.
See Copying or not? for a discussion of whether or not the resulting array shares its data or its mask with the arguments given to these constructors.
filled is very important. It converts its argument to a plain Numeric array.
filled (x, value = None) returns x with any invalid locations replaced by a fill value. filled is guaranteed to return a plain Numeric array. The argument x does not have to be a masked array or even an array, just something that Numeric can turn into one.
Note that a new array is created only if necessary to create a correctly filled, contiguous, Numeric array.
The function filled plays a central role in our design. It is the "exit" back to Numeric, and is used whenever the invalid values must be replaced before an operation. For example, adding two masked arrays a and b is roughly:
masked_array(filled(a, 0)+filled(b, 0), mask_or(getmask(a), getmask(b)))
That is, fill the invalid entries of a and b with zeros, add them up, and declare any entry of the result invalid if either a or b was invalid at that spot. The functions getmask and mask_or are discussed later.
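The fill-then-combine recipe for addition can be tried out without Numeric. The sketch below uses plain lists for data and masks; the function names echo those of MA, but these are simplified stand-ins written for this example, not the library's implementation.

```python
def filled(values, mask, fill):
    """Replace masked entries by fill; a mask of None means nothing is masked."""
    if mask is None:
        return list(values)
    return [fill if m else v for v, m in zip(values, mask)]

def mask_or(m1, m2):
    """Element-wise logical or of two masks, treating None as all zeros."""
    if m1 is None:
        return m2
    if m2 is None:
        return m1
    return [1 if (a or b) else 0 for a, b in zip(m1, m2)]

def masked_add(a, amask, b, bmask):
    # Fill with zeros, add, and combine the masks, as in the expression above.
    data = [p + q for p, q in zip(filled(a, amask, 0), filled(b, bmask, 0))]
    return data, mask_or(amask, bmask)

total, total_mask = masked_add([1, 2, 3], [0, 1, 0], [10, 20, 30], [0, 0, 1])
```

Here total_mask marks the last two entries as invalid; the numbers stored at those positions of total are leftovers of the fill and carry no meaning.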
filled can also be used simply to be certain, at little cost, that some expression is a contiguous Numerical array. If its argument is a Numeric array already, it is returned without copying.
If you are certain that a masked array x contains a mask that is None or is all zeros, you can convert it to a Numeric array with the Numeric.array(x) constructor. If you turn out to be wrong, an MAError exception is raised.
fill_value (x), and the method x.fill_value () of the same name on masked arrays, return a value suitable for filling x based on its type. If x is a masked array, the result is x.fill_value (). The value returned for a given type can be changed by assigning to the following names in module MA; they should be set to scalars or one-element arrays:
default_real_fill_value = Numeric.array([1.0e20], Float32)
default_complex_fill_value = Numeric.array([1.0e20 + 0.0j], Complex32)
default_character_fill_value = masked
default_integer_fill_value = Numeric.array([0]).astype(UnsignedInt8)
default_object_fill_value = masked
The variable masked is a module variable of MA and is discussed in The constant masked. Calling filled with a fill_value of masked sometimes produces a useful printed representation of a masked array. The function fill_value works on any kind of object.
set_fill_value (a, fill_value) is the same as a.set_fill_value (fill_value) if a is a masked array; otherwise it does nothing. Please note that the fill value is mostly cosmetic; it is used when it is needed to convert the masked array to a plain Numeric array but is not involved in most operations. In particular, setting the fill value to 1.e20 will not, repeat not, cause elements of the array whose values are currently 1.e20 to be masked. For that sort of behavior use the masked_values constructor.
Masks are either None or 1-byte Numerical arrays of 1's and 0's. To avoid excessive performance penalties, mask arrays are never checked to be sure that the values are 1's and 0's, and supplying a mask= argument to a constructor with an illegal mask will have undefined consequences later.
Masks have the savespace attribute set. This attribute, discussed in the Numeric Python manual, may have surprising consequences if you attempt to do any operations on them other than those supplied by this package. In particular, do not add or multiply a quantity involving a mask. For example, if m is a mask consisting of 1080 1 values, sum(m) is 56, not 1080, because the 1-byte result wraps around modulo 256. Oops.
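The figure 56 is what one gets when a sum is kept in a single byte. The following plain-Python sketch reproduces the arithmetic; sum_8bit is a hypothetical helper that imitates an 8-bit accumulator and is not how Numeric is implemented.

```python
def sum_8bit(values):
    """Add values while keeping only the low 8 bits, as a 1-byte accumulator would."""
    total = 0
    for v in values:
        total = (total + v) & 0xFF   # wrap around modulo 256
    return total

m = [1] * 1080            # stand-in for a mask of 1080 ones
wrapped = sum_8bit(m)     # 1080 - 4 * 256
exact = sum(m)            # ordinary Python integers do not wrap
```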
is_mask (m) is true if m is of a type and precision that would be allowed as the mask field of a masked array (that is, it is an array of integers with Numeric's typecode MaskType, or it is None). To be a legal mask, m should contain only zeros or ones, but this is not checked.
make_mask (m, copy=0, flag=0) returns an object whose entries are equal to m and for which is_mask would return true. If m is already a mask or None, it returns m or a copy of it. Otherwise it will attempt to make a mask, so it will accept any sequence of integers for m. If flag is true, make_mask returns None if its return value otherwise would contain no true elements. To make a legal mask, m should contain only zeros or ones, but this is not checked.
make_mask_none (s) returns a mask of all zeros of shape s (deprecated form: create_mask).
getmask (x) returns x.mask(), the mask of x, if x is a masked array, and None otherwise. Note that getmask may return None if x is a masked array but has a mask of None. (Please see caution above about operating on the result).
getmaskarray (x) returns x.mask() if x is a masked array and has a mask that is not None; otherwise it returns a zero mask array of the same shape as x. Unlike getmask, getmaskarray always returns a Numeric array of typecode MaskType. (Please see the caution above about operating on the result.)
mask_or (m1, m2) returns an object which when used as a mask behaves like the element-wise "logical or" of m1 and m2, where m1 and m2 are either masks or None (e.g., they are the results of calling getmask). A None is treated as everywhere false. If both m1 and m2 are None, it returns None. If just one of them is None, it returns the other. If m1 and m2 refer to the same object, a reference to that object is returned.
Masked arrays support the operators +, *, /, -, **, and unary plus and minus. The other operand can be another masked array, a scalar, a Numeric array, or something Numeric.array() can convert to a Numeric array. The results are masked arrays.
In addition, masked arrays support the in-place operators +=, -=, *=, and /=. Implementation of in-place operators differs from Numeric semantics in being more generous about converting the right-hand side to the required type: any kind or lesser type is accepted via an astype conversion. In-place operators truly operate in-place when the target is not masked.
Depending on the arguments, the results of constructors may or may not contain a separate copy of the data or mask arguments. The easiest way to think about this is as follows: the given field, be it data or a mask, is required to be a Numerical array, possibly with a given typecode, and a mask's shape must match that of the data. If the copy argument is zero, and the candidate array otherwise qualifies, a reference will be made instead of a copy. If for any reason the data is unsuitable as is, an attempt will be made to make a copy that is suitable. Should that fail, an exception will be thrown. Thus, a copy=0 argument is more of a hope than a command.
If the basic array constructor is given a masked array as the first argument, its mask, typecode, spacesaver flag, and fill value will be used unless overridden by one of the remaining arguments. In particular, if d is a masked array, array(d, copy=0) is d.
Since the default behavior for masks is to use a reference if possible, rather than a copy, which produces a sizeable time and space savings, it is especially important not to modify something you used as a mask argument to a masked array creation routine, if it was a Numeric array of typecode MaskType.
A masked array defines the conversion operators str (x), repr (x), float (x), and int (x) by applying the corresponding operator to the Numeric array filled (x).
Indexing and slicing differ from Numeric: while generally the same, they return a copy, not a reference, when used in an expression that produces a non-scalar result. Consider this example:
This will print [1., 9., 3.] since x[1:] returns a reference to a portion of x. Doing the same operation using MA,
will print [1., 2., 3.], while y will be a separate array whose present value would be [9., 3.]. While sentiment on the correct semantics here is divided amongst the Numeric community as a whole, it is not divided amongst the author's community, on whose behalf this package is written.
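The difference between the two behaviors can be imitated with the Python standard library alone. This is an analogy rather than MA or Numeric code: a memoryview slice shares storage with its source, as a Numeric slice does, while a list slice is an independent copy, as an MA slice is.

```python
from array import array as std_array

# Reference semantics (the Numeric behavior): a memoryview slice shares storage.
x = std_array('d', [1., 2., 3.])
y = memoryview(x)[1:]
y[0] = 9.
numeric_like = list(x)     # the change made through y shows up in x

# Copy semantics (the MA behavior): a list slice is an independent copy.
x2 = [1., 2., 3.]
y2 = x2[1:]
y2[0] = 9.
ma_like = list(x2)         # x2 is untouched; only y2 changed
```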
Using multiple sets of square brackets on the left side of an assignment statement will not produce the desired result:
x[1][1] = 20. # Error, does not change x
x[1,1] = 20. # Correct, changes x
The reason is that x[1] is a copy, so changing it changes that copy, not x. Always use a single pair of square brackets for assignments.
If indexing or another operation on a masked array produces a scalar result, then a scalar value is returned rather than a one-element masked array. This raises the issue of what to return if that result is masked. The answer is that the module constant masked is returned. This constant is discussed in The constant masked. While this most frequently occurs from indexing, you can also get such a result from other functions. For example, averaging a 1-D array, all of whose values are invalid, would result in masked.
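The pitfall with double brackets follows directly from copy-on-index semantics, as this small stand-alone sketch shows. CopyingGrid is a hypothetical class written for this illustration; it is not MaskedArray, it only copies rows on indexing in the same spirit.

```python
class CopyingGrid:
    """Hypothetical 2-D container whose row indexing returns a copy, like MA."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __getitem__(self, i):
        return list(self.rows[i])          # a copy of row i, not a reference

    def __setitem__(self, index, value):
        i, j = index                       # one pair of brackets, tuple index
        self.rows[i][j] = value

x = CopyingGrid([[1., 2.], [3., 4.]])
x[1][1] = 20.          # modifies a temporary copy of the row; x is unchanged
before = x[1]
x[1, 1] = 20.          # goes through __setitem__; x is changed
after = x[1]
```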
Assignment of a normal value to a single element or slice of a masked array has the effect of clearing the mask in those locations. In this way previously invalid elements become valid. The value being assigned is filled first, so that you are guaranteed that all the elements on the left-hand side are now valid.
Assignment of None to a single element or slice of a masked array has the effect of setting the mask in those locations, and the locations become invalid.
Because these operations change the mask, and masks have copy-on-write semantics, the array will no longer share its mask afterwards.
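The assignment rules can be summarized in a short plain-Python model. The function assign is a hypothetical helper operating on a (values, mask) pair of lists; copying the mask before changing it is meant to mimic the copy-on-write behavior and is a simplification of what MA does.

```python
def assign(values, mask, i, value):
    """Assign to element i; None masks the element, anything else unmasks it."""
    values, mask = list(values), list(mask)   # copy, mimicking copy-on-write
    if value is None:
        mask[i] = 1            # the location becomes invalid
    else:
        values[i] = value
        mask[i] = 0            # the location becomes valid
    return values, mask

original_mask = [0, 1, 0]
v, m = assign([1., 2., 3.], original_mask, 1, 5.)   # unmask element 1
v2, m2 = assign(v, m, 0, None)                      # mask element 0
```

The mask that was originally supplied is left as it was, which is the point of copy-on-write.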
Constants e, pi, NewAxis from Numeric, and the constants from module Precision that define nice names for the typecodes.
The special variables masked and masked_print_option are discussed in The constant masked.
The module Numeric is an element of MA, so after from MA import *, you can refer to the functions in Numeric such as Numeric.ones.
Each of the operations discussed below returns an instance of class MaskedArray, having performed the desired operation element-wise. In most cases the array arguments can be masked arrays or Numeric arrays or something that Numeric can turn into a Numeric array, such as a list of real numbers.
In most cases, if Numeric has a function of the same name, the behavior of the one in MA is the same, except that it "respects" the mask.
The result of a unary operation will be masked wherever the original operand was masked. It may also be masked if the argument is not in the domain of the function. Functions available are:
sqrt, log, log10, exp, conjugate, sin, cos, tan, arcsin, arccos, arctan, sinh, cosh, tanh, absolute, fabs, negative (also as operator -x), nonzero, around, floor
fabs (x) is the absolute value of x as a Float32 array. The other functions have their standard meaning.
Binary functions return a result that is masked wherever either of the operands was masked; it may also be masked where the arguments are not in the domain of the function.
add (also as operator +), subtract (also as operator -), multiply (also as operator *), divide (also as operator /), power (also as operator **), remainder, fmod, hypot, arctan2, bitwise_and, bitwise_or, bitwise_xor.
To compare arrays, use the following binary functions. Each of them returns a masked array of 1's and 0's.
equal, not_equal, less_equal, greater_equal, less, greater
Note that, as in Numeric, you can use a scalar for one argument and an array for the other. Note the special caution in The operators and the comparison functions are not exactly equivalent.
Arrays of logical values can be manipulated with:
logical_not (unary), logical_or, logical_and, logical_xor.
alltrue (x) returns 1 if all elements of x are true. Masked elements are treated as true.
sometrue (x) returns 1 if any element of x is true. Masked elements are treated as false.
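The opposite treatment of masked elements by alltrue and sometrue is easy to misread, so here is a plain-Python sketch of the two rules. The functions carry a _sketch suffix to make clear they are stand-ins working on (values, mask) lists, not the MA functions themselves.

```python
def alltrue_sketch(values, mask):
    # Masked elements count as true, so they cannot make the result false.
    return int(all(m or bool(v) for v, m in zip(values, mask)))

def sometrue_sketch(values, mask):
    # Masked elements count as false, so they cannot make the result true.
    return int(any((not m) and bool(v) for v, m in zip(values, mask)))

a = alltrue_sketch([1, 0, 1], [0, 1, 0])    # the only zero is masked
b = sometrue_sketch([0, 1, 0], [0, 1, 0])   # the only one is masked
```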
isarray (x), isMA (x) return true if x is a masked array.
rank (x) is the number of dimensions in x.
shape (x) returns the shape of x, a tuple of array extents.
resize (x, new_shape) returns a new array with specified shape.
reshape (x, new_shape) returns a copy of x with the given new shape.
ravel (x) returns x as one-dimensional.
concatenate (arrays, axis=0) concatenates the arrays along the specified axis.
repeat (array, repeats, axis = 0) repeats the elements of array along axis. repeats is a sequence of length array.shape[axis] telling how many times to repeat each element.
identity (n) returns the identity matrix of shape n by n.
indices (dimensions, typecode = None) returns an array representing a grid of indices with row-only and column-only variation.
len (x) is defined to be the length of the first dimension of x. This definition, peculiar from the array point of view, is required by the way Python implements slicing. Use size (x) for the total length of x.
size (x, axis = None) is the total size of x, or the length of a particular dimension axis whose index is given. When axis is given the dimension of the result is one less than the dimension of x.
count (x, axis = None) counts the number of (non-masked) elements in the array, or in the array along a certain axis. When axis is given the dimension of the result is one less than the dimension of x.
arange, arrayrange, diagonal, fromfunction, fromstring, ones, and zeros are the same as in Numeric, but return masked arrays.
sum and product are called the same way as count; the difference is that the result is the sum or product, respectively, of the unmasked elements.
average (x, axis=0, weights=None, returned=0) computes the average value of the non-masked elements of x along the selected axis. If weights is given, it must match the size and shape of x, and the value returned is:
In computing these sums, elements that correspond to those that are masked in x or weights are ignored. If returned is true, a 2-tuple consisting of the average and the sum of the weights is returned.
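A one-dimensional sketch of a masked weighted average follows. It assumes the conventional weighted mean, sum(x * weights) / sum(weights) taken over the unmasked elements; this formula is an assumption of the example, as are the name average_sketch and the use of plain lists. The sketch does not handle masked weights or the case where every element is masked.

```python
def average_sketch(x, xmask, weights=None, returned=False):
    """Weighted mean of the unmasked elements of a 1-D list (assumed formula)."""
    if weights is None:
        weights = [1.0] * len(x)
    # Keep only the (value, weight) pairs whose value is not masked.
    pairs = [(v, w) for v, w, m in zip(x, weights, xmask) if not m]
    wsum = sum(w for _, w in pairs)
    avg = sum(v * w for v, w in pairs) / wsum
    return (avg, wsum) if returned else avg

avg, wsum = average_sketch([1., 2., 3., 4.], [0, 0, 1, 0],
                           weights=[1., 1., 5., 2.], returned=True)
# The masked third element, and its large weight, play no part in the result.
```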
allclose (x, y, fill_value = 1, rtol = 1.e-5, atol = 1.e-8) tests whether or not arrays x and y are equal subject to the given relative and absolute tolerances. If fill_value is 1, masked values are considered equal, otherwise they are considered different. The formula used for elements where both x and y have a valid value is:
This means essentially that both elements are small compared to atol or their difference divided by their value is small compared to rtol.
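The tolerance test can be sketched in plain Python. The example assumes the customary combined test abs(x - y) <= atol + rtol * abs(y); that exact form is an assumption here, and allclose_sketch is a hypothetical stand-in working on (values, mask) lists rather than the MA function.

```python
def allclose_sketch(x, xmask, y, ymask, fill_value=1, rtol=1.e-5, atol=1.e-8):
    for a, ma, b, mb in zip(x, xmask, y, ymask):
        if ma or mb:
            # A masked position counts as equal only when fill_value is 1.
            if not fill_value:
                return False
            continue
        if abs(a - b) > atol + rtol * abs(b):   # assumed tolerance test
            return False
    return True

close = allclose_sketch([1.0, 2.0], [0, 0], [1.0000001, 2.0], [0, 0])
lenient = allclose_sketch([1.0, 99.0], [0, 1], [1.0, 2.0], [0, 0], fill_value=1)
strict = allclose_sketch([1.0, 99.0], [0, 1], [1.0, 2.0], [0, 0], fill_value=0)
```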
allequal (x, y, fill_value = 1) is similar to allclose, except that exact equality is demanded.
take (a, indices, axis=0) returns a selection of items from a. See the documentation in the Numeric manual.
transpose (a, axes=None) performs a reordering of the axes depending on the tuple of indices axes; the default is to reverse the order of the axes.
put (a, indices, values) is the opposite of take. The values of the array a at the locations specified in indices are set to the corresponding value of values. The array a must be a contiguous array. The argument indices can be any integer sequence object with values suitable for indexing into the flat form of a. The argument values must be any sequence of values that can be converted to the typecode of a.
Note that the target array a is not required to be one-dimensional. Since it is contiguous and stored in row-major order, the array indices can be treated as indexing a's elements in storage order.
The wrinkle on this for masked arrays is that if the locations being set by put are masked, the mask is cleared in those locations.
choose (condition, t) has a result shaped like condition. t must be a tuple. Each element of the tuple can be an array, a scalar, or the constant element masked (see The constant masked). Each element of the result is the corresponding element of t[i] where condition has the value i. The result is masked where condition is masked or where the selected element is masked or the selected element of t is the constant masked.
where (condition, x, y) returns an array that is filled (x) where condition is true, filled (y) where the condition is false. One of x or y can be the constant element masked (see The constant masked). The result is masked where condition is masked, where the element selected from x or y is masked, or where x or y itself is the constant masked and it is selected.
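The masking rules of where can be followed in a simplified plain-Python model. MASKED is a stand-in object for the MA constant masked and where_sketch is a hypothetical helper; for brevity the sketch supports a masked condition and the constant as x or y, but not masks on x and y themselves.

```python
MASKED = object()   # stand-in for the MA constant masked

def where_sketch(condition, cmask, x, y):
    """Return (values, mask); x and y are lists or the MASKED stand-in."""
    values, mask = [], []
    for i, (c, cm) in enumerate(zip(condition, cmask)):
        chosen = x if c else y
        if cm or chosen is MASKED:
            values.append(0)       # placeholder; the entry is masked
            mask.append(1)
        else:
            values.append(chosen[i])
            mask.append(0)
    return values, mask

vals, msk = where_sketch([1, 0, 1, 0], [0, 0, 1, 0], [10, 20, 30, 40], MASKED)
# Only the first entry is valid: elsewhere the condition is false or masked.
```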
innerproduct (a, b) and dot (a, b) work as in Numeric, but missing values don't contribute. The result is always a masked array, possibly of length one, because of the possibility that one or more entries in it may be invalid since all the data contributing to that entry was invalid.
outerproduct (a, b) produces a masked array such that result[i, j] = a[i] * b[j]. The result will be masked where a[i] or b[j] is masked.
compress (condition, x, dimension=-1) compresses out only those valid values where condition is true. Masked values in condition are considered false.
maximum (x, y = None) and minimum (x, y = None) compute the maximum and minimum valid values of x, respectively, if y is None; with two arguments, they return the element-wise larger or smaller of valid values, and mask the result where either x or y is masked. If both arguments are scalars, a scalar is returned.
sort (x, axis=-1, value = None) returns the array x sorted along the given axis, with masked values treated as if they have a sort value of value but locations containing value are masked in the result if x had a mask to start with. Thus if x contains value at a non-masked spot, but has other spots masked, the result may not be what you want.
argsort (x, axis = -1, fill_value = None) is unusual in that it returns a Numeric array, equal to
Numeric.argsort (filled (x, fill_value), axis); this is an array of indices for sorting along a given axis.
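The fill-then-sort idea behind argsort can be shown for the one-dimensional case in plain Python. argsort_sketch is a hypothetical stand-in that works on a (values, mask) pair of lists and uses the built-in sorted in place of Numeric.argsort.

```python
def argsort_sketch(values, mask, fill_value):
    """Indices that sort the filled data (1-D case only)."""
    filled = [fill_value if m else v for v, m in zip(values, mask)]
    return sorted(range(len(filled)), key=lambda i: filled[i])

order = argsort_sketch([3., 1., 2., 0.], [0, 0, 0, 1], 1.e20)
# With a large fill value the masked last element sorts to the end.
```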
The functions get_print_limit () and set_print_limit (n=0) query and set the limit for converting arrays using str() or repr (). If an array is printed that is larger than this, the values are not printed; rather you are informed of the type and size of the array. If n is zero, the standard Numeric conversion functions are used.
When imported, MA sets this limit to 300, and the limit is also made to apply to standard Numeric arrays.
This section discusses other classes defined in module MA.
Class MAError inherits from Exception and is used to raise exceptions in the MA module. Other exceptions are possible, such as errors from the underlying Numeric module.
A constant named masked, in Module MA, serves several purposes.
Another constant, masked_print_option, controls what happens when masked arrays and the constant masked are printed:
Given a unary array function f (x), masked_unary_function (f, fill = 0, domain = None) is a function which when applied to an argument x returns f applied to the array filled (x, fill), with a mask equal to
mask_or (getmask (x), domain (x)).
The argument domain therefore should be a callable object that returns true where x is not in the domain of f. The following domains are also supplied as members of module MA:
Given a binary array function f (x, y), masked_binary_function (f, fillx=0, filly=0) defines a function whose value at (x, y) is f (filled (x, fillx), filled (y, filly)) with a resulting mask of mask_or (getmask (x), getmask (y)). The values fillx and filly must be chosen so that (fillx, filly) is in the domain of f.
In addition, an instance of masked_binary_function has two methods defined upon it:
These methods perform reduction, accumulation, and applying the function in an outer-product-like manner, as discussed in the section Ufuncs have special methods.
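The two wrappers described in this section, masked_unary_function and masked_binary_function, can be approximated with plain-Python closures. This is a sketch under stated assumptions, not the MA classes: data and masks are lists, f acts on scalars, and the unary wrapper also substitutes the fill value at out-of-domain positions so that a scalar function such as math.sqrt is never called with an argument it rejects. The reduction, accumulation, and outer methods are not modeled.

```python
import math
import operator

def mask_or(m1, m2):
    return [1 if (a or b) else 0 for a, b in zip(m1, m2)]

def masked_unary_function(f, fill=0, domain=None):
    def wrapped(values, mask):
        # Where domain(v) is true, v is outside the domain of f and is masked.
        bad = [1 if (domain is not None and domain(v)) else 0 for v in values]
        new_mask = mask_or(mask, bad)
        data = [f(fill if m else v) for v, m in zip(values, new_mask)]
        return data, new_mask
    return wrapped

def masked_binary_function(f, fillx=0, filly=0):
    def wrapped(x, xmask, y, ymask):
        data = [f(fillx if mx else a, filly if my else b)
                for a, mx, b, my in zip(x, xmask, y, ymask)]
        return data, mask_or(xmask, ymask)
    return wrapped

sqrt = masked_unary_function(math.sqrt, fill=0.0, domain=lambda v: v < 0.0)
roots, roots_mask = sqrt([4.0, -1.0, 9.0], [0, 0, 1])

multiply = masked_binary_function(operator.mul, 1, 1)
prods, prods_mask = multiply([1, 2, 3], [0, 1, 0], [10, 20, 30], [0, 0, 1])
```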
This class exists to implement division-related operations. It is the same as masked_binary_function, except that a new second argument is a domain which is used to mask operations that would otherwise cause failure, such as dividing by zero. The functions that are created from this class are divide, remainder (mod), and fmod.
The following domains are available for use as the domain argument:
Suppose we have read a one-dimensional list of elements named x. We also know that if any of the values are 1.e20, they represent missing data. We want to compute the average value of the data and the vector of deviations from average.
>>> x = array([0.,1.,2.,3.,4.])
>>> y = masked_values (x, 1.e20)
Suppose now that we wish to print that same data, but with the missing values replaced by the average value.
We can do numerical operations without worrying about missing values, dividing by zero, square roots of negative numbers, etc.
>>> x=array([1., -1., 3., 4., 5., 6.], mask=[0,0,0,0,1,0])
>>> y=array([1., 2., 0., 4., 5., 6.], mask=[0,0,0,0,0,1])
[ 1.00000000e+00, --, --, 1.00000000e+00, --, --,]
Note that four values in the result are invalid: one from a negative square root, one from a divide by zero, and two more where the two arrays x and y had invalid data. Since the result was of a real type, the print command printed str (filled (sqrt (x/y))).
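The bookkeeping behind this result can be checked by hand in plain Python. The loop below is an illustration only, not MA code: it marks an entry with '--' for each of the three reasons named above and otherwise computes the square root of the quotient.

```python
import math

x, xmask = [1., -1., 3., 4., 5., 6.], [0, 0, 0, 0, 1, 0]
y, ymask = [1., 2., 0., 4., 5., 6.], [0, 0, 0, 0, 0, 1]

result = []
for a, ma, b, mb in zip(x, xmask, y, ymask):
    if ma or mb:          # invalid input data in x or y
        result.append('--')
    elif b == 0.:         # division by zero
        result.append('--')
    elif a / b < 0.:      # square root of a negative number
        result.append('--')
    else:
        result.append(math.sqrt(a / b))
```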
There are various ways to see the mask. One is to print the array directly, another is to convert it to the repr representation, and a third is to get the mask itself. Use of getmask(x) is more robust than x.mask(), since it will work (returning None) if x is a Numeric array or list.
[0 ,1 ,2 ,-- ,-- ,5 ,6 ,7 ,8 ,9 ,]
*** Masked array, mask present ***
If we want to print the data with -1's where the elements are masked, we use filled.
Suppose we have an array d and we wish to compute the average of the values in d but ignore any data outside the range -100. to 100.
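A plain-Python sketch of this computation follows. The sample data are made up for the illustration, and lists with an explicit mask stand in for a masked array; it shows the idea of masking the out-of-range values and averaging the rest, not the MA calls one would use.

```python
d = [3., 250., -40., -1000., 57.]     # made-up sample data

# Mask every value outside the range -100. to 100.
mask = [1 if (v < -100. or v > 100.) else 0 for v in d]

# Average only the values that remain valid.
valid = [v for v, m in zip(d, mask) if not m]
mean = sum(valid) / len(valid)
```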