For some cryptic reason I needed a function that evaluates another function on sliding windows of a vector. Googling around soon brought me to ‘rollapply’ (from the ‘zoo’ package), which turned out to be very versatile when I tested it. However, I wanted to code my own version for plain vectors only, in the hope that it might be somewhat faster.
This is what I came up with (wapply for “window apply”):
wapply <- function(x, width, by = NULL, FUN = NULL, ...)
{
  FUN <- match.fun(FUN)
  if (is.null(by)) by <- width

  lenX <- length(x)
  SEQ1 <- seq(1, lenX - width + 1, by = by)
  SEQ2 <- lapply(SEQ1, function(x) x:(x + width - 1))

  OUT <- lapply(SEQ2, function(a) FUN(x[a], ...))
  OUT <- base:::simplify2array(OUT, higher = TRUE)
  return(OUT)
}
It is much more restricted than ‘rollapply’ (no padding, no left/center/right alignment, etc.).
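To make the windowing concrete, here is a small usage sketch on a toy vector. The input values and the choice of sum, mean and range as FUN are illustrative assumptions and not part of the benchmark; wapply is repeated (same logic, more descriptive variable names) so that the snippet runs on its own.

```r
# same logic as wapply, repeated so the snippet is self-contained
wapply <- function(x, width, by = NULL, FUN = NULL, ...) {
  FUN <- match.fun(FUN)
  if (is.null(by)) by <- width
  starts <- seq(1, length(x) - width + 1, by = by)
  windows <- lapply(starts, function(s) s:(s + width - 1))
  out <- lapply(windows, function(idx) FUN(x[idx], ...))
  simplify2array(out, higher = TRUE)
}

x <- 1:10

# width 3, step 2: windows 1:3, 3:5, 5:7, 7:9
wapply(x, width = 3, by = 2, FUN = sum)    # 6 12 18 24

# 'by' defaults to 'width', giving the non-overlapping windows 1:5 and 6:10
wapply(x, width = 5, FUN = mean)           # 3 8

# a FUN returning several values yields one column per window
wapply(x, width = 5, FUN = range)          # 2 x 2 matrix, columns (1, 5) and (6, 10)
```

Trailing elements that do not fill a complete window are silently dropped, which is the "no padding" restriction in practice.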
Interestingly, though, it is much faster for some setups:
library(zoo)
x <- 1:200000
large window, small slides:
> system.time(RES1 <- rollapply(x, width = 1000, by = 50, FUN = fun))
user system elapsed
3.71 0.00 3.84
> system.time(RES2 <- wapply(x, width = 1000, by = 50, FUN = fun))
user system elapsed
1.89 0.00 1.92
> all.equal(RES1, RES2)
[1] TRUE
small window, small slides:
> system.time(RES1 <- rollapply(x, width = 50, by = 50, FUN = fun))
user system elapsed
2.59 0.00 2.67
> system.time(RES2 <- wapply(x, width = 50, by = 50, FUN = fun))
user system elapsed
0.86 0.00 0.89
> all.equal(RES1, RES2)
[1] TRUE
small window, large slides:
> system.time(RES1 <- rollapply(x, width = 50, by = 1000, FUN = fun))
user system elapsed
1.68 0.00 1.77
> system.time(RES2 <- wapply(x, width = 50, by = 1000, FUN = fun))
user system elapsed
0.06 0.00 0.06
> all.equal(RES1, RES2)
[1] TRUE
There is about a 2-3 fold gain in speed for the first two setups, but roughly a 30-fold gain in the small window/large slides setup (1.77 s vs. 0.06 s elapsed). Interesting…
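The definition of ‘fun’ is not shown in the post, so the timings cannot be rerun exactly as printed. The sketch below uses a hypothetical stand-in for ‘fun’ (the coefficient of variation of each window) and a shorter vector; both are assumptions, and absolute timings will differ by machine and zoo version. The zoo comparison only runs if the package is installed.

```r
# same logic as wapply, repeated so the snippet is self-contained
wapply <- function(x, width, by = NULL, FUN = NULL, ...) {
  FUN <- match.fun(FUN)
  if (is.null(by)) by <- width
  starts <- seq(1, length(x) - width + 1, by = by)
  windows <- lapply(starts, function(s) s:(s + width - 1))
  out <- lapply(windows, function(idx) FUN(x[idx], ...))
  simplify2array(out, higher = TRUE)
}

# hypothetical stand-in for the undefined 'fun' of the post
fun <- function(v) sd(v) / mean(v)

x <- 1:20000  # shorter than the post's 1:200000 to keep the run quick

# small window, large slides
t_wapply <- system.time(RES2 <- wapply(x, width = 50, by = 1000, FUN = fun))
print(t_wapply)

if (requireNamespace("zoo", quietly = TRUE)) {
  t_roll <- system.time(
    RES1 <- zoo::rollapply(x, width = 50, by = 1000, FUN = fun)
  )
  print(t_roll)
  print(all.equal(RES1, RES2))  # the post reports TRUE for its own 'fun'
}
```

Single system.time() calls are noisy for runs this short, so repeating each call several times gives a fairer comparison.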
I noticed that zoo:::rollapply.zoo uses mapply internally; maybe that adds some overhead for pure vector calculations…
Cheers,
Andrej
You might want to check out the RcppRoll package too. It was created because zoo’s rolling functions are slow.
Thanks for the notice! Didn’t know of that one…
I had a look at it and it’s blazing fast! However, it doesn’t have a ‘by =’ argument to define the step between windows,
so the function is always calculated on a window of size n that slides along the vector in steps of 1, which restricts it a bit for some purposes.
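When a rolling function only supports a step of 1, a ‘by’ step can be emulated by computing every window and keeping every by-th result. The sketch below shows the idea in base R, with embed() plus rowMeans() standing in for a fast rolling mean. It is not RcppRoll code, the data are made up, and the approach wastes work when the step is large.

```r
x <- c(5, 1, 4, 2, 8, 3, 9, 7, 6, 10)
width <- 3
by <- 2

# embed() puts each window of length 'width' into one row (values reversed),
# so rowMeans() gives the rolling mean for every step of 1
roll_all <- rowMeans(embed(x, width))

# keep only the windows starting at positions 1, 1 + by, 1 + 2 * by, ...
keep <- seq(1, length(roll_all), by = by)
roll_by <- roll_all[keep]

# direct computation of the same windows, for comparison
direct <- sapply(keep, function(s) mean(x[s:(s + width - 1)]))
all.equal(roll_by, direct)
```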
Maybe it is easy to implement that in the Rcpp code, a good time to learn it anyway!
Cheers,
Andrej
Using x from the post, rollapply(x, 1000, mean) is about 40x faster than wapply(x, 1000, 1, mean) on my machine, so you need to be careful about generalizations here.
Yes, you’re right…
I should have mentioned that for “mean”, “median” and “max”, ‘rollapply’ uses internal fast functions such as
zoo:::rollmean.zoo
But for other, user-defined functions, I think wapply is a bit faster [trying to avoid generalization here 😉 ].
Could you please give a hint on how to use wapply with width as a vector of length > 1? For now I’m stuck with mapply to apply a different window to each observation; length(x) is equal to length(width). Is there a better way to solve this than using mapply?
Regards
Just to follow up my question with the answer:
http://stackoverflow.com/questions/21368245/performance-of-rolling-window-functions-in-r/
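For readers who do not follow the link, here is a minimal base R sketch of a per-observation window width. It assumes right-aligned windows (window i ends at observation i) that are truncated at the start of the vector; the question does not specify the alignment, and the helper name wapply_varwidth is made up for illustration. It is a plain R loop, so it is not expected to be fast.

```r
# hypothetical helper: window i covers the 'width[i]' observations ending at i
wapply_varwidth <- function(x, width, FUN, ...) {
  FUN <- match.fun(FUN)
  stopifnot(length(x) == length(width), all(width >= 1))
  out <- lapply(seq_along(x), function(i) {
    start <- max(1, i - width[i] + 1)  # truncate windows that would start before 1
    FUN(x[start:i], ...)
  })
  simplify2array(out, higher = TRUE)
}

x <- c(1, 2, 3, 4, 5)
width <- c(1, 2, 3, 2, 1)
wapply_varwidth(x, width, mean)  # 1.0 1.5 2.0 3.5 5.0
```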
Hi MusX,
thanks for the question and your answer on SO! I think a good hybrid version would be one that does the windowing in C++ but lets you define the function in R. It should be possible, since minpack.lm (nonlinear regression) works this way: the function to be minimized is defined in R and the minimization runs in Fortran…
Anyone want to give it a go?
Cheers,
Andrej
Hi, I am looking for a version of rollapply that applies FUN only at pre-set, irregularly spaced positions, for instance at 20, 50, 55, 80, 200.
Is there anything available?
Thanks
Not that I know of…
But it’s easy in base R! Split your vector by a factor defining the intervals and apply the function on the splits:
x <- rnorm(100)
CUTS <- rep(1:4, c(10, 20, 30, 40))
SPLIT <- split(x, CUTS)
sapply(SPLIT, mean)
Cheers, Andrej.
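The positions 20, 50, 55, 80, 200 from the question can be read in two ways, and the sketch below covers both: as segment boundaries (the split approach, with the factor derived from the positions via cut()) or as end points of a fixed-width window. Both readings, the window width of 10 and the toy data 1:200 are assumptions.

```r
x <- 1:200                        # toy data; must be at least max(ends) long
ends <- c(20, 50, 55, 80, 200)    # pre-set, irregularly spaced positions

# Reading 1: positions are segment boundaries, i.e. 1-20, 21-50, 51-55, ...
CUTS <- cut(seq_along(x), breaks = c(0, ends))
seg_means <- sapply(split(x, CUTS), mean)
seg_means                         # 10.5 35.5 53.0 68.0 140.5

# Reading 2: positions are end points of a window with fixed (assumed) width
width <- 10
stopifnot(all(ends >= width), all(ends <= length(x)))
win_means <- sapply(ends, function(e) mean(x[(e - width + 1):e]))
names(win_means) <- ends
win_means                         # 15.5 45.5 50.5 75.5 195.5
```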
Hey Andrej,
Is it possible to extend this function to work with functions that use multiple columns, similar to the by.column=FALSE argument in rollapply?
Hi James,
I looked at the code of ‘rollapply’ and it does nothing special (do.call(merge…)) in case of a data.frame and by.column = TRUE.
So doing a simple apply(yourdata, 2, function(x) wapply(x, …))
should do the same.
Cheers,
Andrej
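For a function that needs several columns of the same window at once (the by.column = FALSE case from the question), one option is to slide over row indices and pass the corresponding rows to FUN. This is a sketch of that idea, not what rollapply does internally; the helper name wapply_rows and the example data are invented for illustration.

```r
# hypothetical helper: FUN receives the rows of 'data' belonging to each window
wapply_rows <- function(data, width, by = NULL, FUN, ...) {
  FUN <- match.fun(FUN)
  if (is.null(by)) by <- width
  starts <- seq(1, nrow(data) - width + 1, by = by)
  out <- lapply(starts, function(s) {
    FUN(data[s:(s + width - 1), , drop = FALSE], ...)
  })
  simplify2array(out, higher = TRUE)
}

set.seed(1)
d <- data.frame(a = rnorm(100), b = rnorm(100))

# correlation between the two columns in windows of 20 rows, moved by 10 rows
wapply_rows(d, width = 20, by = 10, FUN = function(w) cor(w$a, w$b))
```

Subsetting a data.frame for every window is slow; for large data, converting to a matrix first is likely to be faster.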