## Introducing ‘propagate’

August 31, 2013

With this post, I want to introduce the new ‘propagate’ package on CRAN.
It has one single purpose: propagation of uncertainties (“error propagation”). There is already one package on CRAN available for this task, named ‘metRology’ (http://cran.r-project.org/web/packages/metRology/index.html).
‘propagate’ has some additional functionality that some may find useful. The most important functions are:

* propagate: A general function for the calculation of uncertainty propagation by first-/second-order Taylor expansion and Monte Carlo simulation including covariances. Input data can be any symbolic/numeric differentiable expression and data based on replicates, summaries (mean & s.d.) or sampled from a distribution. Uncertainty propagation is based completely on matrix calculus accounting for full covariance structure. Monte Carlo simulation is conducted using multivariate normal or t-distributions with covariance structure. The second-order Taylor approximation is the new aspect, because it is not based on the assumption of linearity around $f(x)$ but uses a second-order polynomial to account for nonlinearities, making heavy use of numerical or symbolical Hessian matrices. Interestingly, the second-order approximation gives results quite similar to the MC simulations!
* plot.propagate: Graphing error propagation with the histograms of the MC simulations and MC/Taylor-based confidence intervals.
* predictNLS: The propagate function is used to calculate the propagated error to the fitted values of a nonlinear model of type nls or nlsLM. Please refer to my post here: https://rmazing.wordpress.com/2013/08/26/predictnls-part-2-taylor-approximation-confidence-intervals-for-nls-models/.
* makeGrad, makeHess, numGrad, numHess are functions to create symbolical or numerical gradient and Hessian matrices from an expression containing first/second-order partial derivatives. These can then be evaluated in an environment with evalDerivs.
* fitDistr: This function fits 21 different continuous distributions by (weighted) NLS to the histogram or kernel density of the Monte Carlo simulation results as obtained by propagate or any other vector containing large-scale observations. Finally, the fits are sorted by ascending AIC.
* random samplers for 15 continuous distributions under one hood, some of them previously unavailable:
Skewed-normal distribution, Generalized normal distributionm, Scaled and shifted t-distribution, Gumbel distribution, Johnson SU distribution, Johnson SB distribution, 3P Weibull distribution, 4P Beta distribution, Triangular distribution, Trapezoidal distribution, Curvilinear Trapezoidal distribution, Generalized trapezoidal distribution, Laplacian distribution, Arcsine distribution, von Mises distribution.
Most of them sample from the inverse cumulative distribution function, but 11, 12 and 15 use a vectorized version of “Rejection Sampling” giving roughly 100000 random numbers/s.

An example (without covariance for simplicity): $\mu_a = 5, \sigma_a = 0.1, \mu_b = 10, \sigma_b = 0.1, \mu_x = 1, \sigma_x = 0.1$
$f(x) = a^{bx}$:
 >DAT <- data.frame(a = c(5, 0.1), b = c(10, 0.1), x = c(1, 0.1)) >EXPR <- expression(a^b*x) >res <- propagate(EXPR, DAT)

 Results from error propagation: Mean.1 Mean.2 sd.1 sd.2 2.5% 97.5% 9765625 10067885 2690477 2739850 4677411 15414333 

Results from Monte Carlo simulation: Mean sd Median MAD 2.5% 97.5% 10072640 2826027 9713207 2657217 5635222 16594123 

The plot reveals the resulting distribution obtained from Monte Carlo simulation:
 >plot(res) 

Seems like a skewed distributions. We can now use fitDistr to find out which comes closest:
 > fitDistr(res$resSIM) Fitting Normal distribution...Done. Fitting Skewed-normal distribution...Done. Fitting Generalized normal distribution...Done. Fitting Log-normal distribution...Done. Fitting Scaled/shifted t- distribution...Done. Fitting Logistic distribution...Done. Fitting Uniform distribution...Done. Fitting Triangular distribution...Done. Fitting Trapezoidal distribution...Done. Fitting Curvilinear Trapezoidal distribution...Done. Fitting Generalized Trapezoidal distribution...Done. Fitting Gamma distribution...Done. Fitting Cauchy distribution...Done. Fitting Laplace distribution...Done. Fitting Gumbel distribution...Done. Fitting Johnson SU distribution...........10.........20.........30.........40.........50 .........60.........70.........80.Done. Fitting Johnson SB distribution...........10.........20.........30.........40.........50 .........60.........70.........80.Done. Fitting 3P Weibull distribution...........10.........20.......Done. Fitting 4P Beta distribution...Done. Fitting Arcsine distribution...Done. Fitting von Mises distribution...Done.$aic Distribution AIC 4 Log-normal -4917.823 16 Johnson SU -4861.960 15 Gumbel -4595.917 19 4P Beta -4509.716 12 Gamma -4469.780 9 Trapezoidal -4340.195 1 Normal -4284.706 5 Scaled/shifted t- -4283.070 6 Logistic -4266.171 3 Generalized normal -4264.102 14 Laplace -4144.870 13 Cauchy -4099.405 2 Skewed-normal -4060.936 11 Generalized Trapezoidal -4032.484 10 Curvilinear Trapezoidal -3996.495 8 Triangular -3970.993 7 Uniform -3933.513 20 Arcsine -3793.793 18 3P Weibull -3783.041 21 von Mises -3715.034 17 Johnson SB -3711.034

 Log-normal wins, which makes perfect sense after using an exponentiation function... Have fun with the package. Comments welcome! Cheers, Andrej 
 17 Comments | General, R Internals | Tagged: confidence interval, first-order, fitting, Monte Carlo, nls, nonlinear, predict, second-order, Taylor approximation | Permalink Posted by anspiess 
 predictNLS (Part 2, Taylor approximation): confidence intervals for ‘nls’ models August 26, 2013 Initial Remark: Reload this page if formulas don’t display well! As promised, here is the second part on how to obtain confidence intervals for fitted values obtained from nonlinear regression via nls or nlsLM (package ‘minpack.lm’). I covered a Monte Carlo approach in https://rmazing.wordpress.com/2013/08/14/predictnls-part-1-monte-carlo-simulation-confidence-intervals-for-nls-models/, but here we will take a different approach: First- and second-order Taylor approximation around $f(x)$: $f(x) \approx f(a)+\frac {f'(a)}{1!} (x-a)+ \frac{f''(a)}{2!} (x-a)^2$. Using Taylor approximation for calculating confidence intervals is a matter of propagating the uncertainties of the parameter estimates obtained from vcov(model) to the fitted value. When using first-order Taylor approximation, this is also known as the “Delta method”. Those familiar with error propagation will know the formula $\displaystyle \sum_{i=1}^2 \rm{j_i}^2 \sigma_i^2 + 2\sum_{i=1\atop i \neq k}^n\sum_{k=1\atop k \neq i}^n \rm{j_i j_k} \sigma_{ik}$. Heavily underused is the matrix notation of the famous formula above, for which a good derivation can be found at http://www.nada.kth.se/~kai-a/papers/arrasTR-9801-R3.pdf: $\sigma_y^2 = \nabla_x\mathbf{C}_x\nabla_x^T$, where $\nabla_x$ is the gradient vector of first-order partial derivatives and $\mathbf{C}_x$ is the variance-covariance matrix. This formula corresponds to the first-order Taylor approximation. Now the problem with first-order approximations is that they assume linearity around $f(x)$. Using the “Delta method” for nonlinear confidence intervals in R has been discussed in http://thebiobucket.blogspot.de/2011/04/fit-sigmoid-curve-with-confidence.html or http://finzi.psych.upenn.edu/R/Rhelp02a/archive/42932.html. For highly nonlinear functions we need (at least) a second-order polynomial around $f(x)$ to realistically estimate the surrounding interval (red is linear approximation, blue is second-order polynomial on a sine function around $x = 5$): Interestingly, there are also matrix-like notations for the second-order mean and variance in the literature (see http://dml.cz/dmlcz/141418 or http://iopscience.iop.org/0026-1394/44/3/012/pdf/0026-1394_44_3_012.pdf): Second-order mean: $\rm{E}[y] = f(\bar{x}_i) + \frac{1}{2}\rm{tr}(\mathbf{H}_{xx}\mathbf{C}_x)$. Second-order variance: $\sigma_y^2 = \nabla_x\mathbf{C}_x\nabla_x^T + \frac{1}{2}\rm{tr}(\mathbf{H}_{xx}\mathbf{C}_x\mathbf{H}_{xx}\mathbf{C}_x)$, where $\mathbf{H}_{xx}$ is the Hessian matrix of second-order partial derivatives and $tr(\cdot)$ is the matrix trace (sum of diagonals). Enough theory, for wrapping this all up we need three utility functions: 1) numGrad for calculating numerical first-order partial derivatives. numGrad <- function(expr, envir = .GlobalEnv) { f0 <- eval(expr, envir) vars <- all.vars(expr) p <- length(vars) x <- sapply(vars, function(a) get(a, envir)) eps <- 1e-04 d <- 0.1 r <- 4 v <- 2 zero.tol <- sqrt(.Machine$double.eps/7e-07) h0 <- abs(d * x) + eps * (abs(x) < zero.tol) D <- matrix(0, length(f0), p) Daprox <- matrix(0, length(f0), r) for (i in 1:p) { h <- h0 for (k in 1:r) { x1 <- x2 <- x x1 <- x1 + (i == (1:p)) * h f1 <- eval(expr, as.list(x1)) x2 <- x2 - (i == (1:p)) * h f2 <- eval(expr, envir = as.list(x2)) Daprox[, k] <- (f1 - f2)/(2 * h[i]) h <- h/v } for (m in 1:(r - 1)) for (k in 1:(r - m)) { Daprox[, k] <- (Daprox[, k + 1] * (4^m) - Daprox[, k])/(4^m - 1) } D[, i] <- Daprox[, 1] } return(D) } 2) numHess for calculating numerical second-order partial derivatives. numHess <- function(expr, envir = .GlobalEnv) { f0 <- eval(expr, envir) vars <- all.vars(expr) p <- length(vars) x <- sapply(vars, function(a) get(a, envir)) eps <- 1e-04 d <- 0.1 r <- 4 v <- 2 zero.tol <- sqrt(.Machine$double.eps/7e-07) h0 <- abs(d * x) + eps * (abs(x) < zero.tol) Daprox <- matrix(0, length(f0), r) Hdiag <- matrix(0, length(f0), p) Haprox <- matrix(0, length(f0), r) H <- matrix(NA, p, p) for (i in 1:p) { h <- h0 for (k in 1:r) { x1 <- x2 <- x x1 <- x1 + (i == (1:p)) * h f1 <- eval(expr, as.list(x1)) x2 <- x2 - (i == (1:p)) * h f2 <- eval(expr, envir = as.list(x2)) Haprox[, k] <- (f1 - 2 * f0 + f2)/h[i]^2 h <- h/v } for (m in 1:(r - 1)) for (k in 1:(r - m)) { Haprox[, k] <- (Haprox[, k + 1] * (4^m) - Haprox[, k])/(4^m - 1) } Hdiag[, i] <- Haprox[, 1] } for (i in 1:p) { for (j in 1:i) { if (i == j) { H[i, j] <- Hdiag[, i] } else { h <- h0 for (k in 1:r) { x1 <- x2 <- x x1 <- x1 + (i == (1:p)) * h + (j == (1:p)) * h f1 <- eval(expr, as.list(x1)) x2 <- x2 - (i == (1:p)) * h - (j == (1:p)) * h f2 <- eval(expr, envir = as.list(x2)) Daprox[, k] <- (f1 - 2 * f0 + f2 - Hdiag[, i] * h[i]^2 - Hdiag[, j] * h[j]^2)/(2 * h[i] * h[j]) h <- h/v } for (m in 1:(r - 1)) for (k in 1:(r - m)) { Daprox[, k] <- (Daprox[, k + 1] * (4^m) - Daprox[, k])/(4^m - 1) } H[i, j] <- H[j, i] <- Daprox[, 1] } } } return(H) } And a small function for the matrix trace: tr <- function(mat) sum(diag(mat), na.rm = TRUE) 1) and 2) are modified versions of the genD function in the “numDeriv” package that can handle expressions. Now we need the predictNLS function that wraps it all up: predictNLS <- function( object, newdata, interval = c("none", "confidence", "prediction"), level = 0.95, ... ) { require(MASS, quietly = TRUE) interval <- match.arg(interval) ## get right-hand side of formula RHS <- as.list(object$call$formula)[[3]] EXPR <- as.expression(RHS) ## all variables in model VARS <- all.vars(EXPR) ## coefficients COEF <- coef(object) ## extract predictor variable predNAME <- setdiff(VARS, names(COEF)) ## take fitted values, if 'newdata' is missing if (missing(newdata)) { newdata <- eval(object$data)[predNAME] colnames(newdata) <- predNAME } ## check that 'newdata' has same name as predVAR if (names(newdata)[1] != predNAME) stop("newdata should have name '", predNAME, "'!") ## get parameter coefficients COEF <- coef(object) ## get variance-covariance matrix VCOV <- vcov(object) ## augment variance-covariance matrix for 'mvrnorm' ## by adding a column/row for 'error in x' NCOL <- ncol(VCOV) ADD1 <- c(rep(0, NCOL)) ADD1 <- matrix(ADD1, ncol = 1) colnames(ADD1) <- predNAME VCOV <- cbind(VCOV, ADD1) ADD2 <- c(rep(0, NCOL + 1)) ADD2 <- matrix(ADD2, nrow = 1) rownames(ADD2) <- predNAME VCOV <- rbind(VCOV, ADD2) NR <- nrow(newdata) respVEC <- numeric(NR) seVEC <- numeric(NR) varPLACE <- ncol(VCOV) outMAT <- NULL ## define counter function counter <- function (i) { if (i%%10 == 0) cat(i) else cat(".") if (i%%50 == 0) cat("\n") flush.console() } ## calculate residual variance r <- residuals(object) w <- weights(object) rss <- sum(if (is.null(w)) r^2 else r^2 * w) df <- df.residual(object) res.var <- rss/df ## iterate over all entries in 'newdata' as in usual 'predict.' functions for (i in 1:NR) { counter(i) ## get predictor values and optional errors predVAL <- newdata[i, 1] if (ncol(newdata) == 2) predERROR <- newdata[i, 2] else predERROR <- 0 names(predVAL) <- predNAME names(predERROR) <- predNAME ## create mean vector meanVAL <- c(COEF, predVAL) ## create augmented variance-covariance matrix ## by putting error^2 in lower-right position of VCOV newVCOV <- VCOV newVCOV[varPLACE, varPLACE] <- predERROR^2 SIGMA <- newVCOV ## first-order mean: eval(EXPR), first-order variance: G.S.t(G) MEAN1 <- try(eval(EXPR, envir = as.list(meanVAL)), silent = TRUE) if (inherits(MEAN1, "try-error")) stop("There was an error in evaluating the first-order mean!") GRAD <- try(numGrad(EXPR, as.list(meanVAL)), silent = TRUE) if (inherits(GRAD, "try-error")) stop("There was an error in creating the numeric gradient!") VAR1 <- GRAD %*% SIGMA %*% matrix(GRAD) ## second-order mean: firstMEAN + 0.5 * tr(H.S), ## second-order variance: firstVAR + 0.5 * tr(H.S.H.S) HESS <- try(numHess(EXPR, as.list(meanVAL)), silent = TRUE) if (inherits(HESS, "try-error")) stop("There was an error in creating the numeric Hessian!") valMEAN2 <- 0.5 * tr(HESS %*% SIGMA) valVAR2 <- 0.5 * tr(HESS %*% SIGMA %*% HESS %*% SIGMA) MEAN2 <- MEAN1 + valMEAN2 VAR2 <- VAR1 + valVAR2 ## confidence or prediction interval if (interval != "none") { tfrac <- abs(qt((1 - level)/2, df)) INTERVAL <- tfrac * switch(interval, confidence = sqrt(VAR2), prediction = sqrt(VAR2 + res.var)) LOWER <- MEAN2 - INTERVAL UPPER <- MEAN2 + INTERVAL names(LOWER) <- paste((1 - level)/2 * 100, "%", sep = "") names(UPPER) <- paste((1 - (1- level)/2) * 100, "%", sep = "") } else { LOWER <- NULL UPPER <- NULL } RES <- c(mu.1 = MEAN1, mu.2 = MEAN2, sd.1 = sqrt(VAR1), sd.2 = sqrt(VAR2), LOWER, UPPER) outMAT <- rbind(outMAT, RES) } cat("\n") rownames(outMAT) <- NULL return(outMAT) } With all functions at hand, we can now got through the same example as used in the Monte Carlo post: DNase1 <- subset(DNase, Run == 1) fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1) > predictNLS(fm1DNase1, newdata = data.frame(conc = 5), interval = "confidence") . mu.1 mu.2 sd.1 sd.2 2.5% 97.5% [1,] 1.243631 1.243288 0.03620415 0.03620833 1.165064 1.321511 The errors/confidence intervals are larger than with the MC approch (who knows why?) but it is very interesting to see how close the second-order corrected mean (1.243288) comes to the mean of the simulated values from the Monte Carlo approach (1.243293)! The two approach (MC/Taylor) will be found in the predictNLS function that will be part of the “propagate” package in a few days at CRAN… Cheers, Andrej 12 Comments | General, R Internals | Tagged: confidence interval, first-order, fitting, Monte Carlo, nls, nonlinear, predict, second-order, Taylor approximation | Permalink Posted by anspiess predictNLS (Part 1, Monte Carlo simulation): confidence intervals for ‘nls’ models August 14, 2013 Those that do a lot of nonlinear fitting with the nls function may have noticed that predict.nls does not have a way to calculate a confidence interval for the fitted value. Using confint you can obtain the error of the fit parameters, but how about the error in fitted values? ?predict.nls says: “At present se.fit and interval are ignored.” What a pity… This is largely to the fact that confidence intervals for nonlinear fits are not easily calculated and under some debate, see http://r.789695.n4.nabble.com/plotting-confidence-bands-from-predict-nls-td3505012.html or http://thr3ads.net/r-help/2011/05/1053390-plotting-confidence-bands-from-predict.nls. In principle, since calculating the error in the fitted values is a matter of “error propagation”, two different approaches can be used: 1) Error propagation using approximation by (first-order) Taylor expansion around $f(x)$, 2) Error propagation using Monte Carlo simulation. Topic 1) will be subject of my next post, today we will stick with the MC approach. When calculating the error in the fitted values, we need to propagate the error of all variables, i.e. the error in all predictor variables $x_m$ and the error of the fit parameters $\beta$, to the response $y = f(x_m, \beta)$. Often (as in the ‘Examples’ section of nls), there is no error in the $x_m$-values. The errors of the fit parameters $\beta$ are obtained, together with their correlations, in the variance-covariance matrix $\Sigma$ from vcov(object). A Monte Carlo approach to nonlinear error propagation does the following: 1) Use as input $\mu_m$ and $\sigma_m^2$ of all predictor variables and the vcov matrix $\Sigma$ of the fit parameters $\beta$. 2) For each variable $m$, we create $n$ samples from a multivariate normal distribution using the variance-covariance matrix: $x_{m, n} \sim \mathcal{N}(\mu, \Sigma)$. 3) We evaluate the function on each simulated variable: $y_n = f(x_{m, n}, \beta)$ 4) We calculate statistics (mean, s.d., median, mad) and quantile-based confidence intervals on the vector $y_n$. This is exactly what the following function does: It takes an nls object, extracts the variables/parameter values/parameter variance-covariance matrix, creates an “augmented” covariance matrix (with the variance/covariance values from the parameters and predictor variables included, the latter often being zero), simulates from a multivariate normal distribution (using mvrnorm of the ‘MASS’ package), evaluates the function (object$call$formula) on the values and finally collects statistics. Here we go: predictNLS <- function( object, newdata, level = 0.95, nsim = 10000, ... ) { require(MASS, quietly = TRUE) ## get right-hand side of formula RHS <- as.list(object$call$formula)[[3]] EXPR <- as.expression(RHS) ## all variables in model VARS <- all.vars(EXPR) ## coefficients COEF <- coef(object) ## extract predictor variable predNAME <- setdiff(VARS, names(COEF)) ## take fitted values, if 'newdata' is missing if (missing(newdata)) { newdata <- eval(object$data)[predNAME] colnames(newdata) <- predNAME } ## check that 'newdata' has same name as predVAR if (names(newdata)[1] != predNAME) stop("newdata should have name '", predNAME, "'!") ## get parameter coefficients COEF <- coef(object) ## get variance-covariance matrix VCOV <- vcov(object) ## augment variance-covariance matrix for 'mvrnorm' ## by adding a column/row for 'error in x' NCOL <- ncol(VCOV) ADD1 <- c(rep(0, NCOL)) ADD1 <- matrix(ADD1, ncol = 1) colnames(ADD1) <- predNAME VCOV <- cbind(VCOV, ADD1) ADD2 <- c(rep(0, NCOL + 1)) ADD2 <- matrix(ADD2, nrow = 1) rownames(ADD2) <- predNAME VCOV <- rbind(VCOV, ADD2) ## iterate over all entries in 'newdata' as in usual 'predict.' functions NR <- nrow(newdata) respVEC <- numeric(NR) seVEC <- numeric(NR) varPLACE <- ncol(VCOV) ## define counter function counter <- function (i) { if (i%%10 == 0) cat(i) else cat(".") if (i%%50 == 0) cat("\n") flush.console() } outMAT <- NULL for (i in 1:NR) { counter(i) ## get predictor values and optional errors predVAL <- newdata[i, 1] if (ncol(newdata) == 2) predERROR <- newdata[i, 2] else predERROR <- 0 names(predVAL) <- predNAME names(predERROR) <- predNAME ## create mean vector for 'mvrnorm' MU <- c(COEF, predVAL) ## create variance-covariance matrix for 'mvrnorm' ## by putting error^2 in lower-right position of VCOV newVCOV <- VCOV newVCOV[varPLACE, varPLACE] <- predERROR^2 ## create MC simulation matrix simMAT <- mvrnorm(n = nsim, mu = MU, Sigma = newVCOV, empirical = TRUE) ## evaluate expression on rows of simMAT EVAL <- try(eval(EXPR, envir = as.data.frame(simMAT)), silent = TRUE) if (inherits(EVAL, "try-error")) stop("There was an error evaluating the simulations!") ## collect statistics PRED <- data.frame(predVAL) colnames(PRED) <- predNAME FITTED <- predict(object, newdata = data.frame(PRED)) MEAN.sim <- mean(EVAL, na.rm = TRUE) SD.sim <- sd(EVAL, na.rm = TRUE) MEDIAN.sim <- median(EVAL, na.rm = TRUE) MAD.sim <- mad(EVAL, na.rm = TRUE) QUANT <- quantile(EVAL, c((1 - level)/2, level + (1 - level)/2)) RES <- c(FITTED, MEAN.sim, SD.sim, MEDIAN.sim, MAD.sim, QUANT[1], QUANT[2]) outMAT <- rbind(outMAT, RES) } colnames(outMAT) <- c("fit", "mean", "sd", "median", "mad", names(QUANT[1]), names(QUANT[2])) rownames(outMAT) <- NULL cat("\n") return(outMAT) } The input is an ‘nls’ object, a data.frame ‘newdata’ of values to be predicted with the value $x_{new}$ in the first column and (optional) “errors-in-x” (as $\sigma$) in the second column. The number of simulations can be tweaked with nsim as well as the alpha-level for the confidence interval. The output is $f(x_{new}, \beta)$ (fitted value), $\mu(y_n)$ (mean of simulation), $\sigma(y_n)$ (s.d. of simulation), $median(y_n)$ (median of simulation), $mad(y_n)$ (mad of simulation) and the lower/upper confidence interval. Ok, let’s go to it (taken from the ‘?nls’ documentation): DNase1 <- subset(DNase, Run == 1) fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1) ## usual predict.nls has no confidence intervals implemented predict(fm1DNase1, newdata = data.frame(conc = 5), interval = "confidence") [1] 1.243631 attr(,"gradient") Asym xmid scal [1,] 0.5302925 -0.5608912 -0.06804642 In the next post we will see how to use the gradient attribute to calculate a first-order Taylor expansion around $f(x)$… However, predictNLS gives us the error and confidence interval at $x = 5$: predictNLS(fm1DNase1, newdata = data.frame(conc = 5)) . fit mean sd median mad 2.5% 97.5% [1,] 1.243631 1.243293 0.009462893 1.243378 0.009637439 1.224608 1.261575 Interesting to see, how close the mean of the simulation comes to the actual fitted value… We could also add some error in $x$ to propagate to $y$: > predictNLS(fm1DNase1, newdata = data.frame(conc = 5, error = 0.1)) . fit mean sd median mad 2.5% 97.5% [1,] 1.243631 1.243174 0.01467673 1.243162 0.01488567 1.214252 1.272103 Have fun. If anyone know how to calculate a “prediction interval” (maybe quantile regression) give me hint… Cheers, Andrej 12 Comments | General, R Internals | Tagged: confidence interval, fitting, Monte Carlo, nls, nonlinear | Permalink Posted by anspiess The magic empty bracket January 30, 2013 I have been working with R for some time now, but once in a while, basic functions catch my eye that I was not aware of… For some project I wanted to transform a correlation matrix into a covariance matrix. Now, since cor2cov does not exist, I thought about “reversing” the cov2cor function (stats:::cov2cor). Inside the code of this function, a specific line jumped into my retina: r[] <- Is * V * rep(Is, each = p) What’s this [ ]? Well, it stands for every element $E_{ij}$ of matrix $E$. Consider this: mat <- matrix(NA, nrow = 5, ncol = 5) > mat [,1] [,2] [,3] [,4] [,5] [1,] NA NA NA NA NA [2,] NA NA NA NA NA [3,] NA NA NA NA NA [4,] NA NA NA NA NA [5,] NA NA NA NA NA With the empty bracket, we can now substitute ALL values by a new value: mat[] <- 1 > mat [,1] [,2] [,3] [,4] [,5] [1,] 1 1 1 1 1 [2,] 1 1 1 1 1 [3,] 1 1 1 1 1 [4,] 1 1 1 1 1 [5,] 1 1 1 1 1 Interestingly, this also works with lists: L <- list(a = 1, b = 2, c = 3) >L $a [1] 1$b [1] 2 $c [1] 3 L[] <- 5 > L$a [1] 5 $b [1] 5$c [1] 5 Cheers, Andrej   13 Comments | R Internals | Tagged: all elements, empty bracket, matrix | Permalink Posted by anspiess A weighting function for ‘nls’ / ‘nlsLM’ July 19, 2012 Standard nonlinear regression assumes homoscedastic data, that is, all response values $y_i$ are distributed normally.  In case of heteroscedastic data (i.e. when the variance is dependent on the magnitude of the data), weighting the fit is essential. In nls (or nlsLM of the minpack.lm package), weighting can be conducted by two different methods: 1) by supplying a vector of weighting values $w_i$ for every $y_i$ that is supplied to the ‘weights’ argument. 2) by modifying the objective function in a way that it has the weights included in the body of the function. An example is given for this one in the documentation to nls. 1) is easy of course, but all $w_i$ values have to be derived from somewhere, for example the reciproce of the response values $w_i = 1/y_i$. 2) is a bit more cumbersome and one does not always want to define a new weighted version of the objective function. I found it about time to design a weighting function that can be supplied to the ‘weights’ argument and that makes weighting by different regimes a breeze. wfct() returns a vector of weights that are calculated from a user-defined expression and transfers this vector within nls. wfct <- function(expr) { expr <- deparse(substitute(expr)) ## create new environment newEnv <- new.env() ## get call mc <- sys.calls()[[1]] mcL <- as.list(mc) ## get data and write to newEnv DATA <- mcL[["data"]] DATA <- eval(DATA) DATA <- as.list(DATA) NAMES <- names(DATA) for (i in 1:length(DATA)) assign(NAMES[i], DATA[[i]], envir = newEnv) ## get parameter, response and predictor names formula <- as.formula(mcL[[2]]) VARS <- all.vars(formula) RESP <- VARS[1] RHS <- VARS[-1] PRED <- match(RHS, names(DATA)) PRED <- names(DATA)[na.omit(PRED)] ## calculate variances for response values if "error" is in expression ## and write to newEnv if (length(grep("error", expr)) > 0) { y <- DATA[[RESP]] x <- DATA[[PRED]] ## test for replication if (!any(duplicated(x))) stop("No replicates available to calculate error from!") ## calculate error error <- tapply(y, x, function(e) var(e, na.rm = TRUE)) error <- as.numeric(sqrt(error)) ## convert to original repititions error <- rep(error, as.numeric(table(x))) assign("error", error, envir = newEnv) } ## calculate fitted or residual values if "fitted"/"resid" is in expression ## and write to newEnv if (length(grep("fitted", expr)) > 0 || length(grep("resid", expr)) > 0) { mc2 <- mc mc2$weights <- NULL MODEL <- eval(mc2) fitted <- fitted(MODEL) resid <- residuals(MODEL) assign("fitted", fitted, newEnv) assign("resid", resid, newEnv) } ## return evaluation in newEnv: vector of weights OUT <- eval(parse(text = expr), envir = newEnv) return(OUT) } The weighting function can take 5 different variable definitions and combinations thereof: the name of the predictor (independent) variable, the name of the response (dependent) variable, error: if replicates $y_{ij}$ exist, the error $\sigma(y_{ij})$, fitted: the fitted values $\hat{y}_i$ of the model, resid: the residuals $y_i - \hat{y}_i$ of the model, For the last two, the model is fit unweighted, fitted values and residuals are extracted and the model is refit by the defined weights. Let’s have some examples (taken from the nls doc): Treated <- Puromycin[Puromycin$state == "treated", ] Weighting by inverse of response $1/y_i$: nls(rate ~ Vm * conc/(K + conc), data = Treated, start = c(Vm = 200, K = 0.05), weights = wfct(1/rate)) Nonlinear regression model model: rate ~ Vm * conc/(K + conc) data: Treated Vm K 209.59669 0.06065 weighted residual sum-of-squares: 12.27 Number of iterations to convergence: 5 Achieved convergence tolerance: 5.2e-06 Weighting by square root of predictor $\sqrt{x_i}$: nls(rate ~ Vm * conc/(K + conc), data = Treated, start = c(Vm = 200, K = 0.05), weights = wfct(sqrt(conc))) Nonlinear regression model model: rate ~ Vm * conc/(K + conc) data: Treated Vm K 216.74134 0.07106 weighted residual sum-of-squares: 332.6 Number of iterations to convergence: 5 Achieved convergence tolerance: 2.403e-06 Weighting by inverse square of fitted values $1/\hat{y_i}^2$: nls(rate ~ Vm * conc/(K + conc), data = Treated, start = c(Vm = 200, K = 0.05), weights = wfct(1/fitted^2)) Nonlinear regression model model: rate ~ Vm * conc/(K + conc) data: Treated Vm K 200.84098 0.04943 weighted residual sum-of-squares: 0.2272 Number of iterations to convergence: 4 Achieved convergence tolerance: 1.47e-06 A common weighting regime is to use the inverse variance $1/\sigma^2(y_i)$: 6 Comments | R Internals | Permalink Posted by anspiess A better ‘nls’ (?) July 5, 2012 Those that do a lot of nonlinear regression will love the nls function of R. In most of the cases it works really well, but there are some mishaps that can occur when using bad starting values for the parameters. One of the most dreaded is the “singular gradient matrix at initial parameter estimates” which brings the function to a stop because the gradient check in stats:::nlsModel will terminate if the QR decomposition is not of full column rank. Nearly all nonlinear fitting programs out there use the Levenberg-Marquardt algorithm for nonlinear regression. This is because its switching between Gauss-Newton and gradient descent is highly robust against far-off-optimal starting values.  Unfortunately, the standard nls function has no LM implemented, instead it houses the Gauss-Newton type, the PORT routines and a partial linear fitter. The fabulous minpack.lm package from Katherine M. Mullen offers an R frontend to a Fortran LM implementation of the MINPACK package. The nls.lm function must be supplied with an objective function that returns a vector of residuals to be minimized. Together with Kate I developed a function nlsLM that has the gee wizz formulaic interface of nls, but it calls LM instead of Gauss-Newton. This has some advantages as we will see below.  The function returns the usual result of class ‘nls’, and due to some modifications, all standard generics work. The modifications were made so that the formula is transformed into a function that returns a vector of (weighted) residuals whose sum square is minimized by nls.lm. The optimized parameters are then transferred to stats:::nlsModel in order to obtain an object of class ‘nlsModel’. The internal C  functions C_nls_iter and stats:::nls_port_fit were removed to avoid subsequent “Gauss-Newton”, “port” or “plinear” optimization of nlsModel. So, what’s similar and what’s, well, better… library(minpack.lm) ### Examples from 'nls' doc ### DNase1 <- subset(DNase, Run == 1) fm1DNase1 <- nlsLM(density ~ Asym/(1 + exp((xmid - log(conc))/scal)), data = DNase1, start = list(Asym = 3, xmid = 0, scal = 1)) ## all generics are applicable coef(fm1DNase1) confint(fm1DNase1) deviance(fm1DNase1) df.residual(fm1DNase1) fitted(fm1DNase1) formula(fm1DNase1) logLik(fm1DNase1) predict(fm1DNase1) print(fm1DNase1) profile(fm1DNase1) residuals(fm1DNase1) summary(fm1DNase1) update(fm1DNase1) vcov(fm1DNase1) weights(fm1DNase1) nlsLM can fit zero noise data, when nls fails: x <- 1:10 y <- 2*x + 3 nls(y ~ a + b * x, start = list(a = 0.12345, b = 0.54321)) nlsLM(y ~ a + b * x, start = list(a = 0.12345, b = 0.54321)) Nonlinear regression model model: y ~ a + b * x data: parent.frame() a b 3 2 residual sum-of-squares: 0 Number of iterations to convergence: 3 Achieved convergence tolerance: 1.49e-08 nlsLM often converges when nls gives the dreaded "singular gradient" error. Example taken from here: x <- 0:140 y <- 200 / (1 + exp(17 - x)/2) * exp(-0.02*x) yeps <- y + rnorm(length(y), sd = 2) nls(yeps ~ p1 / (1 + exp(p2 - x)) * exp(p4 * x), start=list(p1=410,p2=18,p4=-.03)) Optimal starting parameters work: Nonlinear regression model model: yeps ~ p1/(1 + exp(p2 - x)) * exp(p4 * x) data: parent.frame() p1 p2 p4 200.67285 16.32430 -0.02005 residual sum-of-squares: 569 Number of iterations to convergence: 6 Achieved convergence tolerance: 2.611e-07 But off-optimal parameters give error: nls(yeps ~ p1 / (1 + exp(p2 - x)) * exp(p4 * x), start=list(p1=10,p2=18,p4=-.03)) Fehler in nls(yeps ~ p1/(1 + exp(p2 - x)) * exp(p4 * x), start = list(p1 = 10, : singulärer Gradient nlsLM is more robust with these starting parameters: nlsLM(yeps ~ p1 / (1 + exp(p2 - x)) * exp(p4 * x), start=list(p1=10,p2=18,p4=-.03)) Nonlinear regression model model: yeps ~ p1/(1 + exp(p2 - x)) * exp(p4 * x) data: parent.frame() p1 p2 p4 200.67285 16.32430 -0.02005 residual sum-of-squares: 569 Number of iterations to convergence: 10 Achieved convergence tolerance: 1.49e-08 Have fun! Cheers, Andrej     __ATA.cmd.push(function() { __ATA.initDynamicSlot({ id: 'atatags-26942-5fc4a8470f45e', location: 120, formFactor: '001', label: { text: 'Advertisements', }, creative: { reportAd: { text: 'Report this ad', }, privacySettings: { text: 'Privacy settings', } } }); }); 16 Comments | R Internals | Permalink Posted by anspiess Don’t recycle me! June 19, 2012 For me, one of the most annoying features of R is that by default, rbind,  cbind  and data.frame recycle the shorter vector to the length of the longer vector. I still don’t understand why the standard generics don’t have a parameter like cbind(1:10, 1:5, fill = TRUE) to fill up with ‘NA’s. There may be a deeper reason… > cbind(1:10, 1:5) [,1] [,2] [1,] 1 1 [2,] 2 2 [3,] 3 3 [4,] 4 4 [5,] 5 5 [6,] 6 1 [7,] 7 2 [8,] 8 3 [9,] 9 4 [10,] 10 5 > rbind(1:10, 1:5) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 1 2 3 4 5 6 7 8 9 10 [2,] 1 2 3 4 5 1 2 3 4 5 > data.frame(x = 1:10, y = 1:5) x y 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 1 7 7 2 8 8 3 9 9 4 10 10 5 Even more, using unequal size matrices gives an error: > mat1 <- matrix(rnorm(10), nrow = 5) > mat2 <- matrix(rnorm(10), nrow = 2) > cbind(mat1, mat2) Fehler in cbind(mat1, mat2) : Anzahl der Zeilen der Matrizen muss übereinstimmen (siehe Argument 2) (You can learn some german here. Well, I suppose you know the error…) Enter cbind.na, rbind.na, data.frame.na ! I modified the three generic functions so that all arguments are filled with ‘NA’s up to maximal occuring length (vector), rows (cbind) or columns (rbind). One must ‘source’ all three functions as cbind.na calls rbind.na etc: cbind.na <- function (..., deparse.level = 1) { na <- nargs() - (!missing(deparse.level)) deparse.level <- as.integer(deparse.level) stopifnot(0 <= deparse.level, deparse.level <= 2) argl <- list(...) while (na > 0 && is.null(argl[[na]])) { argl <- argl[-na] na <- na - 1 } if (na == 0) return(NULL) if (na == 1) { if (isS4(..1)) return(cbind2(..1)) else return(matrix(...)) ##.Internal(cbind(deparse.level, ...))) } if (deparse.level) { symarg <- as.list(sys.call()[-1L])[1L:na] Nms <- function(i) { if (is.null(r <- names(symarg[i])) || r == "") { if (is.symbol(r <- symarg[[i]]) || deparse.level == 2) deparse(r) } else r } } ## deactivated, otherwise no fill in with two arguments if (na == 0) { r <- argl[[2]] fix.na <- FALSE } else { nrs <- unname(lapply(argl, nrow)) iV <- sapply(nrs, is.null) fix.na <- identical(nrs[(na - 1):na], list(NULL, NULL)) ## deactivated, otherwise data will be recycled #if (fix.na) { # nr <- max(if (all(iV)) sapply(argl, length) else unlist(nrs[!iV])) # argl[[na]] <- cbind(rep(argl[[na]], length.out = nr), # deparse.level = 0) #} if (deparse.level) { if (fix.na) fix.na <- !is.null(Nna <- Nms(na)) if (!is.null(nmi <- names(argl))) iV <- iV & (nmi == "") ii <- if (fix.na) 2:(na - 1) else 2:na if (any(iV[ii])) { for (i in ii[iV[ii]]) if (!is.null(nmi <- Nms(i))) names(argl)[i] <- nmi } } ## filling with NA's to maximum occuring nrows nRow <- as.numeric(sapply(argl, function(x) NROW(x))) maxRow <- max(nRow, na.rm = TRUE) argl <- lapply(argl, function(x) if (is.null(nrow(x))) c(x, rep(NA, maxRow - length(x))) else rbind.na(x, matrix(, maxRow - nrow(x), ncol(x)))) r <- do.call(cbind, c(argl[-1L], list(deparse.level = deparse.level))) } d2 <- dim(r) r <- cbind2(argl[[1]], r) if (deparse.level == 0) return(r) ism1 <- !is.null(d1 <- dim(..1)) && length(d1) == 2L ism2 <- !is.null(d2) && length(d2) == 2L && !fix.na if (ism1 && ism2) return(r) Ncol <- function(x) { d <- dim(x) if (length(d) == 2L) d[2L] else as.integer(length(x) > 0L) } nn1 <- !is.null(N1 <- if ((l1 <- Ncol(..1)) && !ism1) Nms(1)) nn2 <- !is.null(N2 <- if (na == 2 && Ncol(..2) && !ism2) Nms(2)) if (nn1 || nn2 || fix.na) { if (is.null(colnames(r))) colnames(r) <- rep.int("", ncol(r)) setN <- function(i, nams) colnames(r)[i] <<- if (is.null(nams)) "" else nams if (nn1) setN(1, N1) if (nn2) setN(1 + l1, N2) if (fix.na) setN(ncol(r), Nna) } r }   rbind.na <- function (..., deparse.level = 1) { na <- nargs() - (!missing(deparse.level)) deparse.level <- as.integer(deparse.level) stopifnot(0 <= deparse.level, deparse.level <= 2) argl <- list(...) while (na > 0 && is.null(argl[[na]])) { argl <- argl[-na] na <- na - 1 } if (na == 0) return(NULL) if (na == 1) { if (isS4(..1)) return(rbind2(..1)) else return(matrix(..., nrow = 1)) ##.Internal(rbind(deparse.level, ...))) } if (deparse.level) { symarg <- as.list(sys.call()[-1L])[1L:na] Nms <- function(i) { if (is.null(r <- names(symarg[i])) || r == "") { if (is.symbol(r <- symarg[[i]]) || deparse.level == 2) deparse(r) } else r } } ## deactivated, otherwise no fill in with two arguments if (na == 0) { r <- argl[[2]] fix.na <- FALSE } else { nrs <- unname(lapply(argl, ncol)) iV <- sapply(nrs, is.null) fix.na <- identical(nrs[(na - 1):na], list(NULL, NULL)) ## deactivated, otherwise data will be recycled #if (fix.na) { # nr <- max(if (all(iV)) sapply(argl, length) else unlist(nrs[!iV])) # argl[[na]] <- rbind(rep(argl[[na]], length.out = nr), # deparse.level = 0) #} if (deparse.level) { if (fix.na) fix.na <- !is.null(Nna <- Nms(na)) if (!is.null(nmi <- names(argl))) iV <- iV & (nmi == "") ii <- if (fix.na) 2:(na - 1) else 2:na if (any(iV[ii])) { for (i in ii[iV[ii]]) if (!is.null(nmi <- Nms(i))) names(argl)[i] <- nmi } } ## filling with NA's to maximum occuring ncols nCol <- as.numeric(sapply(argl, function(x) if (is.null(ncol(x))) length(x) else ncol(x))) maxCol <- max(nCol, na.rm = TRUE) argl <- lapply(argl, function(x) if (is.null(ncol(x))) c(x, rep(NA, maxCol - length(x))) else cbind(x, matrix(, nrow(x), maxCol - ncol(x)))) ## create a common name vector from the ## column names of all 'argl' items namesVEC <- rep(NA, maxCol) for (i in 1:length(argl)) { CN <- colnames(argl[[i]]) m <- !(CN %in% namesVEC) namesVEC[m] <- CN[m] } ## make all column names from common 'namesVEC' for (j in 1:length(argl)) { if (!is.null(ncol(argl[[j]]))) colnames(argl[[j]]) <- namesVEC } r <- do.call(rbind, c(argl[-1L], list(deparse.level = deparse.level))) } d2 <- dim(r) ## make all column names from common 'namesVEC' colnames(r) <- colnames(argl[[1]]) r <- rbind2(argl[[1]], r) if (deparse.level == 0) return(r) ism1 <- !is.null(d1 <- dim(..1)) && length(d1) == 2L ism2 <- !is.null(d2) && length(d2) == 2L && !fix.na if (ism1 && ism2) return(r) Nrow <- function(x) { d <- dim(x) if (length(d) == 2L) d[1L] else as.integer(length(x) > 0L) } nn1 <- !is.null(N1 <- if ((l1 <- Nrow(..1)) && !ism1) Nms(1)) nn2 <- !is.null(N2 <- if (na == 2 && Nrow(..2) && !ism2) Nms(2)) if (nn1 || nn2 || fix.na) { if (is.null(rownames(r))) rownames(r) <- rep.int("", nrow(r)) setN <- function(i, nams) rownames(r)[i] <<- if (is.null(nams)) "" else nams if (nn1) setN(1, N1) if (nn2) setN(1 + l1, N2) if (fix.na) setN(nrow(r), Nna) } r }   data.frame.na <- function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, stringsAsFactors = FALSE) { data.row.names <- if (check.rows && is.null(row.names)) function(current, new, i) { if (is.character(current)) new <- as.character(new) if (is.character(new)) current <- as.character(current) if (anyDuplicated(new)) return(current) if (is.null(current)) return(new) if (all(current == new) || all(current == "")) return(new) stop(gettextf("mismatch of row names in arguments of 'data.frame', item %d", i), domain = NA) } else function(current, new, i) { if (is.null(current)) { if (anyDuplicated(new)) { warning("some row.names duplicated: ", paste(which(duplicated(new)), collapse = ","), " --> row.names NOT used") current } else new } else current } object <- as.list(substitute(list(...)))[-1L] mrn <- is.null(row.names) x <- list(...) n <- length(x) if (n < 1L) { if (!mrn) { if (is.object(row.names) || !is.integer(row.names)) row.names <- as.character(row.names) if (any(is.na(row.names))) stop("row names contain missing values") if (anyDuplicated(row.names)) stop("duplicate row.names: ", paste(unique(row.names[duplicated(row.names)]), collapse = ", ")) } else row.names <- integer(0L) return(structure(list(), names = character(0L), row.names = row.names, class = "data.frame")) } vnames <- names(x) if (length(vnames) != n) vnames <- character(n) no.vn <- !nzchar(vnames) vlist <- vnames <- as.list(vnames) nrows <- ncols <- integer(n) for (i in seq_len(n)) { xi <- if (is.character(x[[i]]) || is.list(x[[i]])) as.data.frame(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) else as.data.frame(x[[i]], optional = TRUE) nrows[i] <- .row_names_info(xi) ncols[i] <- length(xi) namesi <- names(xi) if (ncols[i] > 1L) { if (length(namesi) == 0L) namesi <- seq_len(ncols[i]) if (no.vn[i]) vnames[[i]] <- namesi else vnames[[i]] <- paste(vnames[[i]], namesi, sep = ".") } else { if (length(namesi)) vnames[[i]] <- namesi else if (no.vn[[i]]) { tmpname <- deparse(object[[i]])[1L] if (substr(tmpname, 1L, 2L) == "I(") { ntmpn <- nchar(tmpname, "c") if (substr(tmpname, ntmpn, ntmpn) == ")") tmpname <- substr(tmpname, 3L, ntmpn - 1L) } vnames[[i]] <- tmpname } } if (missing(row.names) && nrows[i] > 0L) { rowsi <- attr(xi, "row.names") nc <- nchar(rowsi, allowNA = FALSE) nc <- nc[!is.na(nc)] if (length(nc) && any(nc)) row.names <- data.row.names(row.names, rowsi, i) } nrows[i] <- abs(nrows[i]) vlist[[i]] <- xi } nr <- max(nrows) for (i in seq_len(n)[nrows < nr]) { xi <- vlist[[i]] if (nrows[i] > 0L) { xi <- unclass(xi) fixed <- TRUE for (j in seq_along(xi)) { ### added NA fill to max length/nrow xi1 <- xi[[j]] if (is.vector(xi1) || is.factor(xi1)) xi[[j]] <- c(xi1, rep(NA, nr - nrows[i])) else if (is.character(xi1) && class(xi1) == "AsIs") xi[[j]] <- structure(c(xi1, rep(NA, nr - nrows[i])), class = class(xi1)) else if (inherits(xi1, "Date") || inherits(xi1, "POSIXct")) xi[[j]] <- c(xi1, rep(NA, nr - nrows[i])) else { fixed <- FALSE break } } if (fixed) { vlist[[i]] <- xi next } } stop("arguments imply differing number of rows: ", paste(unique(nrows), collapse = ", ")) } value <- unlist(vlist, recursive = FALSE, use.names = FALSE) vnames <- unlist(vnames[ncols > 0L]) noname <- !nzchar(vnames) if (any(noname)) vnames[noname] <- paste("Var", seq_along(vnames), sep = ".")[noname] if (check.names) vnames <- make.names(vnames, unique = TRUE) names(value) <- vnames if (!mrn) { if (length(row.names) == 1L && nr != 1L) { if (is.character(row.names)) row.names <- match(row.names, vnames, 0L) if (length(row.names) != 1L || row.names < 1L || row.names > length(vnames)) stop("row.names should specify one of the variables") i <- row.names row.names <- value[[i]] value <- value[-i] } else if (!is.null(row.names) && length(row.names) != nr) stop("row names supplied are of the wrong length") } else if (!is.null(row.names) && length(row.names) != nr) { warning("row names were found from a short variable and have been discarded") row.names <- NULL } if (is.null(row.names)) row.names <- .set_row_names(nr) else { if (is.object(row.names) || !is.integer(row.names)) row.names <- as.character(row.names) if (any(is.na(row.names))) stop("row names contain missing values") if (anyDuplicated(row.names)) stop("duplicate row.names: ", paste(unique(row.names[duplicated(row.names)]), collapse = ", ")) } attr(value, "row.names") <- row.names attr(value, "class") <- "data.frame" value } So, let’s start! > cbind.na(1:8, 1:7, 1:5, 1:10) [,1] [,2] [,3] [,4] [1,] 1 1 1 1 [2,] 2 2 2 2 [3,] 3 3 3 3 [4,] 4 4 4 4 [5,] 5 5 5 5 [6,] 6 6 NA 6 [7,] 7 7 NA 7 [8,] 8 NA NA 8 [9,] NA NA NA 9 [10,] NA NA NA 10 > rbind.na(1:8, 1:7, 1:5, 1:10) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 1 2 3 4 5 6 7 8 NA NA [2,] 1 2 3 4 5 6 7 NA NA NA [3,] 1 2 3 4 5 NA NA NA NA NA [4,] 1 2 3 4 5 6 7 8 9 10 > data.frame.na(A = 1:7, B = 1:5, C = letters[1:3], + D = factor(c(1, 1, 2, 2))) A B C D 1 1 1 a 1 2 2 2 b 1 3 3 3 c 2 4 4 4 <NA> 2 5 5 5 <NA> NA 6 6 NA <NA> NA 7 7 NA <NA> NA System time overhead for NA-filling is not too much: > testList <- vector("list", length = 10000) > for (i in 1:10000) testList[[i]] <- sample(1:100) > system.time(res1 <- do.call(cbind, testList)) user system elapsed 0.01 0.00 0.02 > testList <- vector("list", length = 10000) > for (i in 1:10000) testList[[i]] <- sample(1:100, sample(1:100)) > system.time(res2 <- do.call(cbind.na, testList)) user system elapsed 0.23 0.00 0.24 Have fun! 12 Comments | R Internals | Permalink Posted by anspiess 
 You are currently browsing the archives for the R Internals category. Search Recent Posts Monte Carlo-based prediction intervals for nonlinear regression Linear regression with random error giving EXACT predefined parameter estimates Introducing: Orthogonal Nonlinear Least-Squares Regression in R Error propagation based on interval arithmetics I’ll take my NLS with weights, please… Archives May 2018 January 2016 January 2015 September 2014 January 2014 August 2013 July 2013 April 2013 February 2013 January 2013 November 2012 July 2012 June 2012 Categories General nonlinear regression R Internals Uncategorized Meta Register Log in Entries feed Comments feed WordPress.com Blogroll R-bloggers __ATA.cmd.push(function() { __ATA.initDynamicSlot({ id: 'atatags-286348-5fc4a847125e0', location: 140, formFactor: '003', label: { text: 'Advertisements', }, creative: { reportAd: { text: 'Report this ad', }, privacySettings: { text: 'Privacy settings', } } }); }); Blog at WordPress.com. 
 var WPGroHo = {"my_hash":""}; //initialize and attach hovercards to all gravatars jQuery( document ).ready( function( $) { if (typeof Gravatar === "undefined"){ return; } if ( typeof Gravatar.init !== "function" ) { return; } Gravatar.profile_cb = function( hash, id ) { WPGroHo.syncProfileData( hash, id ); }; Gravatar.my_hash = WPGroHo.my_hash; Gravatar.init( 'body', '#wp-admin-bar-my-account' ); }); ( function () { var setupPrivacy = function() { document.addEventListener( 'DOMContentLoaded', function() { // Minimal Mozilla Cookie library // https://developer.mozilla.org/en-US/docs/Web/API/Document/cookie/Simple_document.cookie_framework var cookieLib = window.cookieLib = {getItem:function(e){return e&&decodeURIComponent(document.cookie.replace(new RegExp("(?:(?:^|.*;)\\s*"+encodeURIComponent(e).replace(/[\-\.\+\*]/g,"\\$&")+"\\s*\\=\\s*([^;]*).*$)|^.*$"),"$1"))||null},setItem:function(e,o,n,t,r,i){if(!e||/^(?:expires|max\-age|path|domain|secure)$/i.test(e))return!1;var c="";if(n)switch(n.constructor){case Number:c=n===1/0?"; expires=Fri, 31 Dec 9999 23:59:59 GMT":"; max-age="+n;break;case String:c="; expires="+n;break;case Date:c="; expires="+n.toUTCString()}return"rootDomain"!==r&&".rootDomain"!==r||(r=(".rootDomain"===r?".":"")+document.location.hostname.split(".").slice(-2).join(".")),document.cookie=encodeURIComponent(e)+"="+encodeURIComponent(o)+c+(r?"; domain="+r:"")+(t?"; path="+t:"")+(i?"; secure":""),!0}}; var setDefaultOptInCookie = function() { var value = '1YNN'; var domain = '.wordpress.com' === location.hostname.slice( -14 ) ? '.rootDomain' : location.hostname; cookieLib.setItem( 'usprivacy', value, 365 * 24 * 60 * 60, '/', domain ); }; var setCcpaAppliesCookie = function( value ) { var domain = '.wordpress.com' === location.hostname.slice( -14 ) ? '.rootDomain' : location.hostname; cookieLib.setItem( 'ccpa_applies', value, 24 * 60 * 60, '/', domain ); } var maybeCallDoNotSellCallback = function() { if ( 'function' === typeof window.doNotSellCallback ) { return window.doNotSellCallback(); } return false; } var usprivacyCookie = cookieLib.getItem( 'usprivacy' ); if ( null !== usprivacyCookie ) { maybeCallDoNotSellCallback(); return; } var ccpaCookie = cookieLib.getItem( 'ccpa_applies' ); if ( null === ccpaCookie ) { var request = new XMLHttpRequest(); request.open( 'GET', 'https://public-api.wordpress.com/geo/', true ); request.onreadystatechange = function () { if ( 4 === this.readyState ) { if ( 200 === this.status ) { var data = JSON.parse( this.response ); var ccpa_applies = data['region'] && data['region'].toLowerCase() === 'california'; setCcpaAppliesCookie( ccpa_applies ); if ( ccpa_applies ) { if ( maybeCallDoNotSellCallback() ) { setDefaultOptInCookie(); } } } else { setCcpaAppliesCookie( true ); if ( maybeCallDoNotSellCallback() ) { setDefaultOptInCookie(); } } } }; request.send(); } else { if ( ccpaCookie === 'true' ) { if ( maybeCallDoNotSellCallback() ) { setDefaultOptInCookie(); } } } } ); }; if ( window.defQueue && defQueue.isLOHP && defQueue.isLOHP === 2020 ) { defQueue.items.push( setupPrivacy ); } else { setupPrivacy(); } } )(); ( function( $) {$( document.body ).on( 'post-load', function () { if ( typeof __ATA.insertInlineAds === 'function' ) { __ATA.insertInlineAds(); } } ); } )( jQuery ); window.addEventListener( "load", function( event ) { var link = document.createElement( "link" ); link.href = "https://s0.wp.com/wp-content/mu-plugins/actionbar/actionbar.css?v=20201002"; link.type = "text/css"; link.rel = "stylesheet"; document.head.appendChild( link ); var script = document.createElement( "script" ); script.src = "https://s0.wp.com/wp-content/mu-plugins/actionbar/actionbar.js?v=20201002"; script.defer = true; document.body.appendChild( script ); } ); Post to Cancel Privacy & Cookies: This site uses cookies. By continuing to use this website, you agree to their use. To find out more, including how to control cookies, see here: Cookie Policy (function(){ var corecss = document.createElement('link'); var themecss = document.createElement('link'); var corecssurl = "https://s1.wp.com/wp-content/plugins/syntaxhighlighter/syntaxhighlighter3/styles/shCore.css?ver=3.0.9b"; if ( corecss.setAttribute ) { corecss.setAttribute( "rel", "stylesheet" ); corecss.setAttribute( "type", "text/css" ); corecss.setAttribute( "href", corecssurl ); } else { corecss.rel = "stylesheet"; corecss.href = corecssurl; } document.head.appendChild( corecss ); var themecssurl = "https://s2.wp.com/wp-content/plugins/syntaxhighlighter/syntaxhighlighter3/styles/shThemeDefault.css?m=1363304414h&amp;ver=3.0.9b"; if ( themecss.setAttribute ) { themecss.setAttribute( "rel", "stylesheet" ); themecss.setAttribute( "type", "text/css" ); themecss.setAttribute( "href", themecssurl ); } else { themecss.rel = "stylesheet"; themecss.href = themecssurl; } document.head.appendChild( themecss ); })(); SyntaxHighlighter.config.strings.expandSource = '+ expand source'; SyntaxHighlighter.config.strings.help = '?'; SyntaxHighlighter.config.strings.alert = 'SyntaxHighlighter\n\n'; SyntaxHighlighter.config.strings.noBrush = 'Can\'t find brush for: '; SyntaxHighlighter.config.strings.brushNotHtmlScript = 'Brush wasn\'t configured for html-script option: '; SyntaxHighlighter.defaults['pad-line-numbers'] = false; SyntaxHighlighter.defaults['toolbar'] = false; SyntaxHighlighter.all(); // Infinite scroll support if ( typeof( jQuery ) !== 'undefined' ) { jQuery( function( $) {$( document.body ).on( 'post-load', function() { SyntaxHighlighter.highlight(); } ); } ); } var jetpackCarouselStrings = {"widths":[370,700,1000,1200,1400,2000],"is_logged_in":"","lang":"en","ajaxurl":"https:\/\/rmazing.wordpress.com\/wp-admin\/admin-ajax.php","nonce":"f2c9644429","display_exif":"1","display_comments":"1","display_geo":"1","single_image_gallery":"1","single_image_gallery_media_file":"","background_color":"black","comment":"Comment","post_comment":"Post Comment","write_comment":"Write a Comment...","loading_comments":"Loading Comments...","download_original":"View full size <span class=\"photo-size\">{0}<span class=\"photo-size-times\">\u00d7<\/span>{1}<\/span>","no_comment_text":"Please be sure to submit some text with your comment.","no_comment_email":"Please provide an email address to comment.","no_comment_author":"Please provide your name to comment.","comment_post_error":"Sorry, but there was an error posting your comment. Please try again later.","comment_approved":"Your comment was approved.","comment_unapproved":"Your comment is in moderation.","camera":"Camera","aperture":"Aperture","shutter_speed":"Shutter Speed","focal_length":"Focal Length","copyright":"Copyright","comment_registration":"0","require_name_email":"1","login_url":"https:\/\/rmazing.wordpress.com\/wp-login.php?redirect_to=https%3A%2F%2Frmazing.wordpress.com%2F2013%2F08%2F31%2Fintroducing-propagate%2F","blog_id":"37379885","meta_data":["camera","aperture","shutter_speed","focal_length","copyright"],"local_comments_commenting_as":"<fieldset><label for=\"email\">Email (Required)<\/label> <input type=\"text\" name=\"email\" class=\"jp-carousel-comment-form-field jp-carousel-comment-form-text-field\" id=\"jp-carousel-comment-form-email-field\" \/><\/fieldset><fieldset><label for=\"author\">Name (Required)<\/label> <input type=\"text\" name=\"author\" class=\"jp-carousel-comment-form-field jp-carousel-comment-form-text-field\" id=\"jp-carousel-comment-form-author-field\" \/><\/fieldset><fieldset><label for=\"url\">Website<\/label> <input type=\"text\" name=\"url\" class=\"jp-carousel-comment-form-field jp-carousel-comment-form-text-field\" id=\"jp-carousel-comment-form-url-field\" \/><\/fieldset>","reblog":"Reblog","reblogged":"Reblogged","reblog_add_thoughts":"Add your thoughts here... (optional)","reblogging":"Reblogging...","post_reblog":"Post Reblog","stats_query_args":"blog=37379885&v=wpcom&tz=1&user_id=0&subd=rmazing","is_public":"1","reblog_enabled":""}; // <![CDATA[ (function() { try{ if ( window.external &&'msIsSiteMode' in window.external) { if (window.external.msIsSiteMode()) { var jl = document.createElement('script'); jl.type='text/javascript'; jl.async=true; jl.src='/wp-content/plugins/ie-sitemode/custom-jumplist.php'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(jl, s); } } }catch(e){} })(); // ]]> _tkq = window._tkq || []; _stq = window._stq || []; _tkq.push(['storeContext', {'blog_id':'37379885','blog_tz':'1','user_lang':'en','blog_lang':'en','user_id':'0'}]); _stq.push(['view', {'blog':'37379885','v':'wpcom','tz':'1','user_id':'0','subd':'rmazing'}]); _stq.push(['extra', {'crypt':'UE5XaGUuOTlwaD85flAmcm1mcmZsaDhkV11YdWFnNncxc1tjZG9XVXhRREQ/V0xPZ1hKXy8xNFVXSj9jVnIteUlFc3ddRUw0SjZjfGJdQml5SlBYclRxaXguUEtnZFIxMWRVbC1HVFNycE9KZ3VVcXRpd3g2dz1RLWJoNnJ5WDg/VTh8fHNRZkg9ZUFKa1lLTHJRTU93OGJiUl1zYlpzdFt0cTYsP0xocT93RVR6RzRFdD1EKy53dkolN1BXbWdxbHhKcm5VfnE/VjNIcDJoSDNQYjNnNlhfTF90JVN2flBYaERbdl0xLXxwSnoyZjUlOGxzW2xjaw=='}]); _stq.push([ 'clickTrackerInit', '37379885', '0' ]); if ( 'object' === typeof wpcom_mobile_user_agent_info ) { wpcom_mobile_user_agent_info.init(); var mobileStatsQueryString = ""; if( false !== wpcom_mobile_user_agent_info.matchedPlatformName ) mobileStatsQueryString += "&x_" + 'mobile_platforms' + '=' + wpcom_mobile_user_agent_info.matchedPlatformName; if( false !== wpcom_mobile_user_agent_info.matchedUserAgentName ) mobileStatsQueryString += "&x_" + 'mobile_devices' + '=' + wpcom_mobile_user_agent_info.matchedUserAgentName; if( wpcom_mobile_user_agent_info.isIPad() ) mobileStatsQueryString += "&x_" + 'ipad_views' + '=' + 'views'; if( "" != mobileStatsQueryString ) { new Image().src = document.location.protocol + '//pixel.wp.com/g.gif?v=wpcom-no-pv' + mobileStatsQueryString + '&baba=' + Math.random(); } }