Those that do a lot of nonlinear fitting with the *nls* function may have noticed that *predict.nls* does not have a way to calculate a confidence interval for the fitted value. Using *confint* you can obtain the error of the **fit parameters**, but how about the error in **fitted values**? *?predict.nls* says: “At present se.fit and interval are ignored.” What a pity… This is largely to the fact that confidence intervals for nonlinear fits are not easily calculated and under some debate, see http://r.789695.n4.nabble.com/plotting-confidence-bands-from-predict-nls-td3505012.html or http://thr3ads.net/r-help/2011/05/1053390-plotting-confidence-bands-from-predict.nls. In principle, since calculating the error in the fitted values is a matter of “error propagation”, two different approaches can be used:

1) Error propagation using approximation by (first-order) Taylor expansion around ,

2) Error propagation using Monte Carlo simulation.

Topic 1) will be subject of my next post, today we will stick with the MC approach.

When calculating the error in the fitted values, we need to propagate the error of all variables, i.e. the error in all predictor variables and the error of the fit parameters , to the response . Often (as in the ‘Examples’ section of *nls*), there is no error in the -values. The errors of the fit parameters are obtained, together with their correlations, in the variance-covariance matrix from *vcov(object)*.

A Monte Carlo approach to nonlinear error propagation does the following:

1) Use as input and of all predictor variables and the vcov matrix of the fit parameters .

2) For each variable , we create samples from a multivariate normal distribution using the variance-covariance matrix: .

3) We evaluate the function on each simulated variable:

4) We calculate statistics (mean, s.d., median, mad) and quantile-based confidence intervals on the vector .

This is exactly what the following function does: It takes an *nls* object, extracts the variables/parameter values/parameter variance-covariance matrix, creates an “augmented” covariance matrix (with the variance/covariance values from the parameters and predictor variables included, the latter often being zero), simulates from a multivariate normal distribution (using *mvrnorm* of the ‘MASS’ package), evaluates the function (object$call$formula) on the values and finally collects statistics. Here we go:

predictNLS <- function( object, newdata, level = 0.95, nsim = 10000, ... ) { require(MASS, quietly = TRUE) ## get right-hand side of formula RHS <- as.list(object$call$formula)[[3]] EXPR <- as.expression(RHS) ## all variables in model VARS <- all.vars(EXPR) ## coefficients COEF <- coef(object) ## extract predictor variable predNAME <- setdiff(VARS, names(COEF)) ## take fitted values, if 'newdata' is missing if (missing(newdata)) { newdata <- eval(object$data)[predNAME] colnames(newdata) <- predNAME } ## check that 'newdata' has same name as predVAR if (names(newdata)[1] != predNAME) stop("newdata should have name '", predNAME, "'!") ## get parameter coefficients COEF <- coef(object) ## get variance-covariance matrix VCOV <- vcov(object) ## augment variance-covariance matrix for 'mvrnorm' ## by adding a column/row for 'error in x' NCOL <- ncol(VCOV) ADD1 <- c(rep(0, NCOL)) ADD1 <- matrix(ADD1, ncol = 1) colnames(ADD1) <- predNAME VCOV <- cbind(VCOV, ADD1) ADD2 <- c(rep(0, NCOL + 1)) ADD2 <- matrix(ADD2, nrow = 1) rownames(ADD2) <- predNAME VCOV <- rbind(VCOV, ADD2) ## iterate over all entries in 'newdata' as in usual 'predict.' functions NR <- nrow(newdata) respVEC <- numeric(NR) seVEC <- numeric(NR) varPLACE <- ncol(VCOV) ## define counter function counter <- function (i) { if (i%%10 == 0) cat(i) else cat(".") if (i%%50 == 0) cat("\n") flush.console() } outMAT <- NULL for (i in 1:NR) { counter(i) ## get predictor values and optional errors predVAL <- newdata[i, 1] if (ncol(newdata) == 2) predERROR <- newdata[i, 2] else predERROR <- 0 names(predVAL) <- predNAME names(predERROR) <- predNAME ## create mean vector for 'mvrnorm' MU <- c(COEF, predVAL) ## create variance-covariance matrix for 'mvrnorm' ## by putting error^2 in lower-right position of VCOV newVCOV <- VCOV newVCOV[varPLACE, varPLACE] <- predERROR^2 ## create MC simulation matrix simMAT <- mvrnorm(n = nsim, mu = MU, Sigma = newVCOV, empirical = TRUE) ## evaluate expression on rows of simMAT EVAL <- try(eval(EXPR, envir = as.data.frame(simMAT)), silent = TRUE) if (inherits(EVAL, "try-error")) stop("There was an error evaluating the simulations!") ## collect statistics PRED <- data.frame(predVAL) colnames(PRED) <- predNAME FITTED <- predict(object, newdata = data.frame(PRED)) MEAN.sim <- mean(EVAL, na.rm = TRUE) SD.sim <- sd(EVAL, na.rm = TRUE) MEDIAN.sim <- median(EVAL, na.rm = TRUE) MAD.sim <- mad(EVAL, na.rm = TRUE) QUANT <- quantile(EVAL, c((1 - level)/2, level + (1 - level)/2)) RES <- c(FITTED, MEAN.sim, SD.sim, MEDIAN.sim, MAD.sim, QUANT[1], QUANT[2]) outMAT <- rbind(outMAT, RES) } colnames(outMAT) <- c("fit", "mean", "sd", "median", "mad", names(QUANT[1]), names(QUANT[2])) rownames(outMAT) <- NULL cat("\n") return(outMAT) }

The input is an *‘nls’* object, a data.frame *‘newdata’* of values to be predicted with

the value in the first column and (optional) “errors-in-x” (as ) in the second column.

The number of simulations can be tweaked with *nsim* as well as the *alpha*-level for the

confidence interval.

The output is (fitted value), (mean of simulation), (s.d. of simulation), (median of simulation), (mad of simulation) and the lower/upper confidence interval.

Ok, let’s go to it (taken from the ‘?nls’ documentation):

DNase1 <- subset(DNase, Run == 1)

fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1)

## usual predict.nls has no confidence intervals implemented

predict(fm1DNase1, newdata = data.frame(conc = 5), interval = "confidence")

[1] 1.243631

attr(,"gradient")

Asym xmid scal

[1,] 0.5302925 -0.5608912 -0.06804642

In the next post we will see how to use the gradient attribute to calculate a first-order Taylor expansion around …

However, *predictNLS* gives us the error and confidence interval at :

predictNLS(fm1DNase1, newdata = data.frame(conc = 5))

.

fit mean sd median mad 2.5% 97.5%

[1,] 1.243631 1.243293 0.009462893 1.243378 0.009637439 1.224608 1.261575

Interesting to see, how close the mean of the simulation comes to the actual fitted value…

We could also add some error in to propagate to :

> predictNLS(fm1DNase1, newdata = data.frame(conc = 5, error = 0.1))

.

fit mean sd median mad 2.5% 97.5%

[1,] 1.243631 1.243174 0.01467673 1.243162 0.01488567 1.214252 1.272103

Have fun. If anyone know how to calculate a “prediction interval” (maybe quantile regression) give me hint…

Cheers,

Andrej

This is a very nice post. I suspect that many people have written code to do the same, and not shared it; you have written a good general-purpose solution. Thanks for taking the effort to share it.

Dear Andrej,

thanks a lot for this post. I attempted to use it for an nls in two variables with three parameters and it fails to work unfortunately. I haven’t fully understood the code yet but it seems to me like this is limited to models of one variable, correct? B.

The error occurs in the augmentation of the covariance matrix:

Error in `colnames<-`(`*tmp*`, value = c("x", "y")) :

length of 'dimnames' [2] not equal to array extent

Dear Bartek,

have you tried the ‘predictNLS* function in my ‘propagate’ package? At the end of the ‘Examples’ section, there is an example using multiple predictor values.

Please tell me if it works for you or otherwise please post your code so I can see what I can do…

Cheers,

Andrej

Dear Andrej,

Thank you for sharing your code, and indeed it is helpful for me. I’m trying to implement it in Splus, and having some trouble. I hope you could help me out.

This is the formula of my nls model, y ~ logist(log(x),xmid,scal). The error in calling this predictNLS function comes from the EVAL statement below. It seems to me there is a mismatch between EXPR (expression(logist, log(x), xmid, scal)) and the names of simMAT (xmid, scal, and x), and the “logist” and “log()” in the formula messes up the eval(). I don’t know if my understanding is correct, and would like to hear your advice how to fix the error.

## evaluate expression on rows of simMAT

EVAL <- try(eval(EXPR, envir = as.data.frame(simMAT)), silent = TRUE)

if (inherits(EVAL, "try-error")) stop("There was an error evaluating the simulations!")

thank you,

Yasong

This is fantastic, thanks for sharing! I was able to get this working for my example after a few minor tweaks.