wapply: A faster (but less functional) ‘rollapply’ for vector setups

April 23, 2013

For some cryptic reason I needed a function that calculates function values on sliding windows of a vector. Googling around soon brought me to ‘rollapply’, which when I tested it seems to be a very versatile function. However, I wanted to code my own version just for vector purposes in the hope that it may be somewhat faster.
This is what turned out (wapply for “window apply”):

wapply <- function(x, width, by = NULL, FUN = NULL, ...)
{
FUN <- match.fun(FUN)
if (is.null(by)) by <- width

lenX <- length(x)
SEQ1 <- seq(1, lenX - width + 1, by = by)
SEQ2 <- lapply(SEQ1, function(x) x:(x + width - 1))

OUT <- lapply(SEQ2, function(a) FUN(x[a], ...))
OUT <- base:::simplify2array(OUT, higher = TRUE)
return(OUT)
}

It is much more restricted than ‘rollapply’ (no padding, left/center/right adjustment etc).
But interestingly, for some setups it is very much faster:

library(zoo)
x <- 1:200000

large window, small slides:

> system.time(RES1 <- rollapply(x, width = 1000, by = 50, FUN = fun))
       User      System verstrichen 
       3.71        0.00        3.84 
> system.time(RES2 <- wapply(x, width = 1000, by = 50, FUN = fun))
       User      System verstrichen 
       1.89        0.00        1.92 
> all.equal(RES1, RES2)
[1] TRUE

small window, small slides:

> system.time(RES1 <- rollapply(x, width = 50, by = 50, FUN = fun))
       User      System verstrichen 
       2.59        0.00        2.67 
> system.time(RES2 <- wapply(x, width = 50, by = 50, FUN = fun))
       User      System verstrichen 
       0.86        0.00        0.89 
> all.equal(RES1, RES2)
[1] TRUE

small window, large slides:

> system.time(RES1 <- rollapply(x, width = 50, by = 1000, FUN = fun))
       User      System verstrichen 
       1.68        0.00        1.77 
> system.time(RES2 <- wapply(x, width = 50, by = 1000, FUN = fun))
       User      System verstrichen 
       0.06        0.00        0.06 
> all.equal(RES1, RES2)
[1] TRUE

There is about a 2-3 fold gain in speed for the above two setups but a 35-fold gain in the small window/large slides setup. Interesting…
I noticed that zoo:::rollapply.zoo uses mapply internally, maybe there is some overhead for pure vector calculations…

Cheers,
Andrej


Follow

Get every new post delivered to your Inbox.