MissingValues {fMultivar} | R Documentation |
A collection and description of functions
for handling missing values in 'timeSeries'
objects or in objects which can be transformed
into a vector or a two-dimensional matrix.
The functions are listed by topic.
removeNA | removes rows containing NAs from a matrix object, |
substituteNA | substitutes NAs by zeros, the column mean or the column median, |
interpNA | interpolates NAs using R's "approx" function, |
knnNA | imputes NAs by the "knn" algorithm from R's EMV package. |
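For a plain matrix, the row removal performed by removeNA can be sketched in base R with complete.cases(). This is a minimal sketch, not the package source; the helper name remove_na_sketch is hypothetical.

```r
# Hypothetical stand-in for removeNA() on a plain matrix (not the package code):
# keep only the rows that contain no missing value, via stats::complete.cases()
remove_na_sketch <- function(x) {
  x <- as.matrix(x)
  x[complete.cases(x), , drop = FALSE]
}

m <- rbind(c(1, 2), c(NA, 4), c(5, 6))
remove_na_sketch(m)   # keeps rows 1 and 3
```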
removeNA(x, ...)
substituteNA(x, type = c("zeros", "mean", "median"), ...)
interpNA(x, method = c("linear", "before", "after"), ...)
knnNA(x, k = max(dim(as.matrix(x))[1]*0.01, 2), correlation = FALSE, ...)
correlation |
[knnNA] - a logical value. If TRUE, the neighbours are selected by the sample correlation instead of the Euclidean distance; the rows with the highest correlations are chosen. |
k |
[knnNA] - the number of neighbours (rows) used to estimate the missing values. |
method |
[interpNA] - specifies how the matrix is interpolated, column by column. One of the strings method="linear", method="before" or
method="after".
The interpolation is done with the function approx.
|
type |
[substituteNA] - three alternative methods are provided to replace NAs in the data: type="zeros" replaces the missing values by zeros,
type="mean" replaces the missing values by the column mean,
type="median" replaces the missing values by the column
median.
|
x |
a numeric matrix, or any other object which can be transformed
into a matrix through x = as.matrix(x, ...). If x
is a vector, it is transformed into a single-column matrix.
|
... |
arguments to be passed to the function as.matrix .
|
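The default for the knnNA argument k, taken from the usage line above, is 1% of the number of rows with a lower bound of 2. The small sketch below only reproduces that arithmetic; default_k is a hypothetical helper, not part of the package.

```r
# Reproduces the default from the usage line: k = max(nrow * 0.01, 2)
default_k <- function(x) max(dim(as.matrix(x))[1] * 0.01, 2)

default_k(matrix(0, nrow = 20, ncol = 5))    # 2  (1% of 20 is 0.2, so the lower bound 2 applies)
default_k(matrix(0, nrow = 1000, ncol = 5))  # 10
default_k(1:500)                             # 5  (a vector becomes a one-column matrix)
```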
Missing Values in Price and Index Series:
Applied to timeSeries
objects, the function removeNA
simply removes the rows containing NAs from the series. To interpolate
missing time series points, use the function interpNA.
Three interpolation methods are offered: "linear"
does a linear interpolation, "before"
uses the previous value,
and "after"
uses the following value. Note that the
interpolation is done on the index scale, not on the time scale.
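The three interpolation rules can be sketched with stats::approx on the index scale. This is a minimal sketch, not the fMultivar source: the helper name interp_na_sketch is hypothetical, and mapping "before" and "after" to approx(method = "constant") with f = 0 and f = 1 is an assumption. Because approx returns NA outside the range of observed points by default (rule = 1), leading and trailing NAs stay missing, which matches the note in the examples that a corner value cannot be interpolated.

```r
# Hypothetical sketch, not the fMultivar source: column-wise NA interpolation
# on the index scale (row positions 1, 2, ..., n), using stats::approx().
interp_na_sketch <- function(x, method = c("linear", "before", "after")) {
  method <- match.arg(method)
  x <- as.matrix(x)
  idx <- seq_len(nrow(x))                 # index scale, not time stamps
  for (j in seq_len(ncol(x))) {
    y <- x[, j]
    ok <- !is.na(y)
    if (all(ok) || sum(ok) < 2) next      # nothing to fill, or too few points
    fit <- switch(method,
      linear = approx(idx[ok], y[ok], xout = idx),
      # assumption: "before"/"after" act as left/right-continuous step functions
      before = approx(idx[ok], y[ok], xout = idx, method = "constant", f = 0),
      after  = approx(idx[ok], y[ok], xout = idx, method = "constant", f = 1))
    # rule = 1 (the default) leaves points outside the observed range as NA
    x[, j] <- fit$y
  }
  x
}

m <- cbind(c(1, NA, 3, NA), c(10, NA, 30, 40))
interp_na_sketch(m, "linear")  # column 1: 1, 2, 3, NA; column 2: 10, 20, 30, 40
interp_na_sketch(m, "before")  # column 1: 1, 1, 3, NA; column 2: 10, 10, 30, 40
interp_na_sketch(m, "after")   # column 1: 1, 3, 3, NA; column 2: 10, 30, 30, 40
```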
The function knnNA
estimates missing values of a timeSeries
object or of a matrix with a k-nearest-neighbours algorithm. Missing
values can be -Inf, Inf, NA, or NaN.
For each row containing at least one missing value, the algorithm
selects the k nearest rows that contain no missing values, using either
the Euclidean distance or, if correlation = TRUE, the sample correlation.
The missing values are then replaced by the average of the neighbours.
Note that if a row contains only missing values, no estimate is possible.
For details see the function knn in the EMV package.
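The neighbour selection described above can be sketched in base R. This is a minimal sketch, not the EMV::knn code: the helper name knn_na_sketch is hypothetical, and it assumes that distances and correlations are computed only over the columns observed in the incomplete row, that only complete rows serve as neighbours, and that a fractional k is truncated.

```r
# Hypothetical k-nearest-neighbour imputation sketch; NOT the EMV::knn code.
# Assumptions: distances/correlations use only the columns observed in the
# incomplete row, and only complete rows can serve as neighbours.
knn_na_sketch <- function(x, k = 2, correlation = FALSE) {
  x <- as.matrix(x)
  x[!is.finite(x)] <- NA                  # treat -Inf, Inf and NaN like NA
  miss <- is.na(x)
  complete <- which(rowSums(miss) == 0)
  if (length(complete) == 0) stop("no complete rows to use as neighbours")
  k <- min(as.integer(k), length(complete))  # fractional k is truncated here
  for (i in which(rowSums(miss) > 0)) {
    obs <- !miss[i, ]
    if (!any(obs)) next                   # all-missing row: no estimate possible
    cand <- x[complete, obs, drop = FALSE]
    if (correlation) {
      # most correlated rows first (needs at least two observed columns)
      score <- apply(cand, 1, function(r) cor(r, x[i, obs]))
      nn <- complete[order(score, decreasing = TRUE)[seq_len(k)]]
    } else {
      # Euclidean distance to each complete row, closest first
      d <- sqrt(rowSums(sweep(cand, 2, x[i, obs])^2))
      nn <- complete[order(d)[seq_len(k)]]
    }
    # replace the missing entries by the neighbours' column averages
    x[i, !obs] <- colMeans(x[nn, !obs, drop = FALSE])
  }
  x
}

m <- rbind(c(1, 2, 3, 4), c(4, 3, 2, 1), c(2, 4, 6, 8), c(10, 20, 30, NA))
knn_na_sketch(m, k = 2)[4, 4]                      # 4.5: rows 3 and 2 are closest
knn_na_sketch(m, k = 2, correlation = TRUE)[4, 4]  # 6: rows 1 and 3 correlate best
```

The two calls show that the distance and the correlation criteria can pick different neighbours for the same incomplete row.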
Missing Values in Return Series:
For return series the function substituteNA
may be useful. It fills missing values with zeros (type="zeros"),
or with the mean (type="mean") or median (type="median")
of the corresponding column.
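The three substitution rules can be sketched column by column in base R. This is a minimal sketch, not the package source; the helper name substitute_na_sketch is hypothetical.

```r
# Hypothetical sketch (not the fMultivar source): fill NAs column by column
# with zeros, or with the column mean or median of the observed values.
substitute_na_sketch <- function(x, type = c("zeros", "mean", "median")) {
  type <- match.arg(type)
  x <- as.matrix(x)
  for (j in seq_len(ncol(x))) {
    miss <- is.na(x[, j])
    if (!any(miss)) next
    x[miss, j] <- switch(type,
      zeros  = 0,
      mean   = mean(x[, j], na.rm = TRUE),
      median = median(x[, j], na.rm = TRUE))
  }
  x
}

r <- matrix(c(1, NA, 2, 9, 4, 5, NA, 6), ncol = 2)
substitute_na_sketch(r, "mean")    # column 1 NA becomes 4, column 2 NA becomes 5
substitute_na_sketch(r, "median")  # column 1 NA becomes 2, column 2 NA becomes 5
```

For return series, filling with zeros treats a missing return as "no change", while the mean and median fill in a typical value of the column.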
Raphael Gottardo for the knn function,
Diethelm Wuertz for the Rmetrics R-port.
Troyanskaya O., Cantor M., Sherlock G., Brown P., Hastie T., Tibshirani R., Botstein D., Altman R.B. (2001); Missing Value Estimation Methods for DNA Microarrays, Bioinformatics 17, 520–525.
## SOURCE("fMultivar.6B-MissingValues")
library(fMultivar)

## Create a Matrix with NAs:
   X = matrix(rnorm(100), ncol = 5)
   # a single NA inside:
   X[3, 5] = NA
   # three in a row inside:
   X[17, 2:4] = c(NA, NA, NA)
   # three in a column inside:
   X[13:15, 4] = c(NA, NA, NA)
   # two at the right border:
   X[11:12, 5] = c(NA, NA)
   # one in the lower left corner:
   X[20, 1] = NA
   print(X)

## Remove rows with NAs:
   removeNA(X)
   # Now we have only 12 rows!

## Substitute NAs by zeros or the column mean:
   substituteNA(X, type = "zeros")
   substituteNA(X, type = "mean")

## Interpolate NAs linearly:
   interpNA(X, method = "linear")
   # Note the corner missing value cannot be interpolated!
   # Take previous values in a column:
   interpNA(X, method = "before")
   # Also here, the corner value is excluded

## Interpolate using the knn algorithm:
   knnNA(X)