MissingValues {fMultivar}    R Documentation

Handling Missing Values

Description

A collection and description of functions for handling missing values in 'timeSeries' objects or in objects which can be transformed into a vector or a two-dimensional matrix.

The functions are listed by topic.

removeNA removes rows containing NAs from a matrix object,
substituteNA substitutes NAs by zero, the column mean, or the column median,
interpNA interpolates NAs using R's "approx" function,
knnNA imputes NAs with the k-nearest-neighbours ("knn") algorithm from R's EMV package.

Usage

removeNA(x, ...)
substituteNA(x, type = c("zeros", "mean", "median"), ...)
interpNA(x, method = c("linear", "before", "after"), ...)
knnNA(x, k = max(dim(as.matrix(x))[1]*0.01,2), correlation = FALSE, ...)

Arguments

correlation [knnNA] -
a logical value; if TRUE, the neighbours are selected by sample correlation, i.e., the rows with the highest correlations are chosen.
k [knnNA] -
the number of neighbours (rows) used to estimate the missing values.
method [interpNA] -
a character string specifying how the matrix is interpolated column by column: method="linear", method="before", or method="after". The interpolation is done with the function approx.
type [substituteNA] -
Three alternative methods are provided to replace NAs in the data: type="zeros" replaces the missing values by zeros, type="mean" replaces them by the column mean, and type="median" replaces them by the column median.
x a numeric matrix, or any other object which can be transformed into a matrix through x = as.matrix(x, ...). If x is a vector, it is transformed into a one-column matrix.
... arguments to be passed to the function as.matrix.

Details

Missing Values in Price and Index Series:

Applied to timeSeries objects, the function removeNA simply removes the rows containing NAs from the series. To interpolate missing points instead, use the function interpNA, which offers three methods: "linear" interpolates linearly, "before" uses the previous value, and "after" uses the following value. Note that the interpolation is done on the index scale and not on the time scale, i.e., consecutive rows are treated as equally spaced regardless of the gaps between their time stamps.
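Both operations can be sketched in base R. This is a minimal sketch, not the fMultivar source: drop_na_rows, interp_column and interp_matrix are hypothetical helpers, and mapping "before" and "after" onto the step-function mode of approx (method = "constant" with f = 0 or f = 1) is an assumption. Interpolation runs over the row indices 1..n, which is what "index scale" means here.

```r
## Sketch only: hypothetical helpers, not the fMultivar implementation.

# Keep only the rows that contain no NA.
drop_na_rows <- function(X) {
  X <- as.matrix(X)
  X[complete.cases(X), , drop = FALSE]
}

# Interpolate one column on the index scale 1..n with stats::approx().
# Assumption: "before" = step function taking the previous value (f = 0),
#             "after"  = step function taking the following value (f = 1).
# With rule = 1, positions before the first or after the last observed
# value stay NA, because there is nothing to interpolate from on that side.
interp_column <- function(x, method = c("linear", "before", "after")) {
  method <- match.arg(method)
  idx <- which(!is.na(x))
  if (length(idx) < 2) return(x)  # too few observed points to interpolate
  xout <- seq_along(x)
  if (method == "linear") {
    approx(idx, x[idx], xout = xout, method = "linear", rule = 1)$y
  } else {
    f <- if (method == "before") 0 else 1
    approx(idx, x[idx], xout = xout, method = "constant", f = f, rule = 1)$y
  }
}

# Apply the column rule to every column of a matrix.
interp_matrix <- function(X, method = "linear") {
  X <- as.matrix(X)
  apply(X, 2, interp_column, method = method)
}

v <- c(1, NA, 3, NA, NA, 6, NA)
interp_column(v, "linear")  # 1 2 3 4 5 6 NA
interp_column(v, "before")  # 1 1 3 3 3 6 NA
interp_column(v, "after")   # 1 3 3 6 6 6 NA
```

The trailing NA stays missing under every method, just as a missing value in the last row of a column cannot be filled by interpolation.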

The function knnNA estimates missing values of a timeSeries object or of a matrix with a k-nearest-neighbours algorithm. Missing values can be -Inf, Inf, NA, or NaN. For each row containing at least one missing value, the algorithm selects the k nearest rows that contain no missing values, where nearness is measured either by the Euclidean distance or by the sample correlation. The missing values are then replaced by the average of these neighbours. Note that if a row contains only missing values, no estimate is possible.
See also the function knn in the EMV package [EMV:knn].
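The neighbour search can be sketched in base R. This is a minimal sketch, not the EMV::knn code: knn_impute is a hypothetical helper, and it assumes that only complete rows are candidate neighbours and that distances or correlations are computed over the columns observed in the incomplete row. Its default k = 2 is simpler than the knnNA default shown under Usage.

```r
## Sketch only: 'knn_impute' is a hypothetical helper, not the EMV::knn code.
## Assumptions: only complete rows are candidate neighbours, and distances or
## correlations use just the columns observed in the incomplete row.
knn_impute <- function(X, k = 2, correlation = FALSE) {
  X <- as.matrix(X)
  miss <- !is.finite(X)                   # -Inf, Inf, NA and NaN count as missing
  complete <- which(rowSums(miss) == 0)   # rows eligible as neighbours
  if (length(complete) == 0) stop("no complete rows to use as neighbours")
  k <- min(k, length(complete))
  for (i in which(rowSums(miss) > 0)) {
    obs <- which(!miss[i, ])
    if (length(obs) == 0) next            # row has no observed value: leave it
    cand <- X[complete, obs, drop = FALSE]
    if (correlation && length(obs) > 1) {
      # rank candidates by sample correlation with row i, highest first
      score <- apply(cand, 1, function(r) cor(r, X[i, obs]))
      nn <- complete[order(score, decreasing = TRUE)[seq_len(k)]]
    } else {
      # rank candidates by Euclidean distance to row i, smallest first
      d <- sqrt(rowSums(sweep(cand, 2, X[i, obs])^2))
      nn <- complete[order(d)[seq_len(k)]]
    }
    # replace the missing entries by the neighbours' column averages
    X[i, miss[i, ]] <- colMeans(X[nn, miss[i, ], drop = FALSE])
  }
  X
}

M <- rbind(c(1.0, 1.0, 1.0),
           c(1.1, 1.2, 1.3),
           c(5.0, 5.0, 5.0),
           c(1.0, 1.0, NA))
knn_impute(M, k = 2)[4, 3]  # average of rows 1 and 2 in column 3: 1.15
```

The two selection rules can pick different neighbours: correlation rewards rows with the same shape, even at a very different level, while Euclidean distance rewards rows with similar values.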

Missing Values in Return Series:

For return series the function substituteNA may be useful. It fills missing values with zeros (type="zeros"), or with the mean (type="mean") or the median (type="median") of the corresponding column.
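The column-wise substitution can be sketched in base R. This is a minimal sketch, not the fMultivar source: substitute_na is a hypothetical helper that mirrors the three values of the type argument, computing each fill value from the observed entries of its own column.

```r
## Sketch only: 'substitute_na' is a hypothetical helper, not the fMultivar source.
substitute_na <- function(X, type = c("zeros", "mean", "median")) {
  type <- match.arg(type)
  X <- as.matrix(X)
  for (j in seq_len(ncol(X))) {
    na <- is.na(X[, j])
    if (!any(na)) next
    # the fill value uses only the observed entries of column j
    fill <- switch(type,
                   zeros  = 0,
                   mean   = mean(X[, j], na.rm = TRUE),
                   median = median(X[, j], na.rm = TRUE))
    X[na, j] <- fill
  }
  X
}

ret <- matrix(c(1, NA, 3, 8, 10, 20, NA, 60), ncol = 2)
substitute_na(ret, "mean")    # fills 4 in column 1 and 30 in column 2
substitute_na(ret, "median")  # fills 3 in column 1 and 20 in column 2
```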

Author(s)

Raphael Gottardo for the knn function,
Diethelm Wuertz for the Rmetrics R-port.

References

Troyanskaya O., Cantor M., Sherlock G., Brown P., Hastie T., Tibshirani R., Botstein D., Altman R.B. (2001); Missing Value Estimation Methods for DNA Microarrays, Bioinformatics 17, 520–525.

Examples

## Load the package:
   library(fMultivar)

## Create a Matrix with NAs:
   X = matrix(rnorm(100), ncol = 5)
   # a single NA inside:
   X[3, 5] = NA
   # three in a row inside:
   X[17, 2:4] = c(NA, NA, NA)
   # three in a column inside:
   X[13:15, 4] = c(NA, NA, NA)
   # two at the right border:
   X[11:12, 5] = c(NA, NA)
   # one in the lower left corner:
   X[20, 1] = NA
   print(X)
     
## Remove rows with NAs:
   removeNA(X)
   # Only 12 of the 20 rows are left!
   
## Substitute NAs by zeros, the column mean or the column median:
   substituteNA(X, type = "zeros")
   substituteNA(X, type = "mean")
   substituteNA(X, type = "median")
   
## Interpolate NAs linearly:
   interpNA(X, method = "linear")
   # Note that the missing value in the lower left corner cannot be interpolated!
   # Take the previous values in a column:
   interpNA(X, method = "before")
   # Here, too, the corner value is excluded
   
## Impute NAs using the knn algorithm:
   knnNA(X)

[Package fMultivar version 240.10068 Index]