wapply: A faster (but less functional) ‘rollapply’ for vector setups

For some cryptic reason I needed a function that calculates function values on sliding windows of a vector. Googling around soon brought me to ‘rollapply’, which when I tested it seems to be a very versatile function. However, I wanted to code my own version just for vector purposes in the hope that it may be somewhat faster.
This is what turned out (wapply for “window apply”):

wapply <- function(x, width, by = NULL, FUN = NULL, ...)
{
FUN <- match.fun(FUN)
if (is.null(by)) by <- width

lenX <- length(x)
SEQ1 <- seq(1, lenX - width + 1, by = by)
SEQ2 <- lapply(SEQ1, function(x) x:(x + width - 1))

OUT <- lapply(SEQ2, function(a) FUN(x[a], ...))
OUT <- base:::simplify2array(OUT, higher = TRUE)
return(OUT)
}

It is much more restricted than ‘rollapply’ (no padding, left/center/right adjustment etc).
But interestingly, for some setups it is very much faster:

library(zoo)
x <- 1:200000

large window, small slides:

> system.time(RES1 <- rollapply(x, width = 1000, by = 50, FUN = fun))
       User      System verstrichen 
       3.71        0.00        3.84 
> system.time(RES2 <- wapply(x, width = 1000, by = 50, FUN = fun))
       User      System verstrichen 
       1.89        0.00        1.92 
> all.equal(RES1, RES2)
[1] TRUE

small window, small slides:

> system.time(RES1 <- rollapply(x, width = 50, by = 50, FUN = fun))
       User      System verstrichen 
       2.59        0.00        2.67 
> system.time(RES2 <- wapply(x, width = 50, by = 50, FUN = fun))
       User      System verstrichen 
       0.86        0.00        0.89 
> all.equal(RES1, RES2)
[1] TRUE

small window, large slides:

> system.time(RES1 <- rollapply(x, width = 50, by = 1000, FUN = fun))
       User      System verstrichen 
       1.68        0.00        1.77 
> system.time(RES2 <- wapply(x, width = 50, by = 1000, FUN = fun))
       User      System verstrichen 
       0.06        0.00        0.06 
> all.equal(RES1, RES2)
[1] TRUE

There is about a 2-3 fold gain in speed for the above two setups but a 35-fold gain in the small window/large slides setup. Interesting…
I noticed that zoo:::rollapply.zoo uses mapply internally, maybe there is some overhead for pure vector calculations…

Cheers,
Andrej

12 Responses to wapply: A faster (but less functional) ‘rollapply’ for vector setups

  1. Hansi says:

    Might want to check the RcppRoll package too. Created because of zoo’s rolling being slow.

    • anspiess says:

      Thanks for the notice! Didn’t know of that one…
      I had a look at it and it’s blazing fast! However, it doesn’t have a ‘by =’ argument which defines the sliding values,
      so the function is always calculated on a +1 sliding window of size n along the vector, which restricts it a bit for some purposes.
      Maybe it is easy to implement that in the Rcpp code, a good time to learn it anyway!

      Cheers,
      Andrej

  2. G. Grothendieck says:

    Using x from the post rollapply(x, 1000, mean) is about 40x faster than wapply(x, 1000, 1, mean) on my machine so you need to be careful about generalizations here.

    • anspiess says:

      Yes, you’re right…
      I should have mentioned that for “mean”, “median” and “max”, ‘rollapply’ uses internal fast functions such as
      zoo:::rollmean.zoo
      But for other user defined function setups, I think it is a bit faster [trying to avoid generalization here 😉 ].

  3. MusX says:

    Could you please hint how to use wapply with width as vector length > 1? For now I’m stick with mapply to apply different window for different observations. length(x) is equal to length(width). Is it possible to solve better than using mapply?
    Regards

  4. John says:

    hi I am looking for a version of rollapply which allows to apply FUN only at pre-set and not regularly spaced intervals. For instance, run it at for 20, 50, 55, 80, 200
    is there anything available?
    tx

    • anspiess says:

      Not that I know of…
      But it’s easy in base R! Split your vector by a factor defining the intervals and apply the function on the splits:

      x <- rnorm(100)
      CUTS <- rep(1:4, c(10, 20, 30, 40))
      SPLIT <- split(x, CUTS)
      sapply(SPLIT, mean)

      Cheers, Andrej.

  5. James says:

    Hey Andrej,

    Is it possible to edit this function to include compatibility with functions that use multiple columns? Similarly to the by.column=FALSE argument in rollapply?

    • anspiess says:

      Hi James,

      I looked at the code of ‘rollapply’ and it does nothing special (do.call(merge…)) in case of a data.frame and by.column = TRUE.
      So doing a simple apply(yourdata, 2, function(x) wapply(x, …))
      should do the same.

      Cheers,
      Andrej

  6. So useful….looking forwards to coming back. http://bit.ly/2f0xJ92

Leave a comment