predictNLS (Part 1, Monte Carlo simulation): confidence intervals for ‘nls’ models

Those that do a lot of nonlinear fitting with the nls function may have noticed that predict.nls does not have a way to calculate a confidence interval for the fitted value. Using confint you can obtain the error of the fit parameters, but how about the error in fitted values? ?predict.nls says: “At present and interval are ignored.” What a pity… This is largely to the fact that confidence intervals for nonlinear fits are not easily calculated and under some debate, see or In principle, since calculating the error in the fitted values is a matter of “error propagation”, two different approaches can be used:

1) Error propagation using approximation by (first-order) Taylor expansion around f(x),

2) Error propagation using Monte Carlo simulation.

Topic 1) will be subject of my next post, today we will stick with the MC approach.

When calculating the error in the fitted values, we need to propagate the error of all variables, i.e. the error in all predictor variables x_m and the error of the fit parameters \beta, to the response y = f(x_m, \beta). Often (as in the ‘Examples’ section of nls), there is no error in the x_m-values. The errors of the fit parameters \beta are obtained, together with their correlations, in the variance-covariance matrix \Sigma from vcov(object).

A Monte Carlo approach to nonlinear error propagation does the following:

1) Use as input \mu_m and \sigma_m^2 of all predictor variables and the vcov matrix \Sigma of the fit parameters \beta.

2) For each variable m, we create n samples from a multivariate normal distribution using the variance-covariance matrix: x_{m, n} \sim \mathcal{N}(\mu, \Sigma).

3) We evaluate the function on each simulated variable: y_n = f(x_{m, n}, \beta)

4) We calculate statistics (mean, s.d., median, mad) and quantile-based confidence intervals on the vector y_n.

This is exactly what the following function does: It takes an nls object, extracts the variables/parameter values/parameter variance-covariance matrix, creates an “augmented” covariance matrix (with the variance/covariance values from the parameters and predictor variables included, the latter often being zero), simulates from a multivariate normal distribution (using mvrnorm of the ‘MASS’ package), evaluates the function (object$call$formula) on the values and finally collects statistics. Here we go:

predictNLS <- function(
level = 0.95, 
nsim = 10000,
  require(MASS, quietly = TRUE)
  ## get right-hand side of formula
  RHS <- as.list(object$call$formula)[[3]]
  EXPR <- as.expression(RHS)
  ## all variables in model
  VARS <- all.vars(EXPR)
  ## coefficients
  COEF <- coef(object)
  ## extract predictor variable    
  predNAME <- setdiff(VARS, names(COEF))  
  ## take fitted values, if 'newdata' is missing
  if (missing(newdata)) {
    newdata <- eval(object$data)[predNAME]
    colnames(newdata) <- predNAME
  ## check that 'newdata' has same name as predVAR
  if (names(newdata)[1] != predNAME) stop("newdata should have name '", predNAME, "'!")
  ## get parameter coefficients
  COEF <- coef(object)
  ## get variance-covariance matrix
  VCOV <- vcov(object)
  ## augment variance-covariance matrix for 'mvrnorm' 
  ## by adding a column/row for 'error in x'
  NCOL <- ncol(VCOV)
  ADD1 <- c(rep(0, NCOL))
  ADD1 <- matrix(ADD1, ncol = 1)
  colnames(ADD1) <- predNAME
  VCOV <- cbind(VCOV, ADD1)
  ADD2 <- c(rep(0, NCOL + 1))
  ADD2 <- matrix(ADD2, nrow = 1)
  rownames(ADD2) <- predNAME
  VCOV <- rbind(VCOV, ADD2) 
  ## iterate over all entries in 'newdata' as in usual 'predict.' functions
  NR <- nrow(newdata)
  respVEC <- numeric(NR)
  seVEC <- numeric(NR)
  varPLACE <- ncol(VCOV)   
  ## define counter function
  counter <- function (i) 
    if (i%%10 == 0) 
    else cat(".")
    if (i%%50 == 0) 
  outMAT <- NULL  
  for (i in 1:NR) {
    ## get predictor values and optional errors
    predVAL <- newdata[i, 1]
    if (ncol(newdata) == 2) predERROR <- newdata[i, 2] else predERROR <- 0
    names(predVAL) <- predNAME  
    names(predERROR) <- predNAME  
    ## create mean vector for 'mvrnorm'
    MU <- c(COEF, predVAL)
    ## create variance-covariance matrix for 'mvrnorm'
    ## by putting error^2 in lower-right position of VCOV
    newVCOV <- VCOV
    newVCOV[varPLACE, varPLACE] <- predERROR^2
    ## create MC simulation matrix
    simMAT <- mvrnorm(n = nsim, mu = MU, Sigma = newVCOV, empirical = TRUE)
    ## evaluate expression on rows of simMAT
    EVAL <- try(eval(EXPR, envir =, silent = TRUE)
    if (inherits(EVAL, "try-error")) stop("There was an error evaluating the simulations!")
    ## collect statistics
    PRED <- data.frame(predVAL)
    colnames(PRED) <- predNAME   
    FITTED <- predict(object, newdata = data.frame(PRED))
    MEAN.sim <- mean(EVAL, na.rm = TRUE)
    SD.sim <- sd(EVAL, na.rm = TRUE)
    MEDIAN.sim <- median(EVAL, na.rm = TRUE)
    MAD.sim <- mad(EVAL, na.rm = TRUE)
    QUANT <- quantile(EVAL, c((1 - level)/2, level + (1 - level)/2))
    RES <- c(FITTED, MEAN.sim, SD.sim, MEDIAN.sim, MAD.sim, QUANT[1], QUANT[2])
    outMAT <- rbind(outMAT, RES)
  colnames(outMAT) <- c("fit", "mean", "sd", "median", "mad", names(QUANT[1]), names(QUANT[2]))
  rownames(outMAT) <- NULL

The input is an ‘nls’ object, a data.frame ‘newdata’ of values to be predicted with
the value x_{new} in the first column and (optional) “errors-in-x” (as \sigma) in the second column.
The number of simulations can be tweaked with nsim as well as the alpha-level for the
confidence interval.
The output is f(x_{new}, \beta) (fitted value), \mu(y_n) (mean of simulation), \sigma(y_n) (s.d. of simulation), median(y_n) (median of simulation), mad(y_n) (mad of simulation) and the lower/upper confidence interval.

Ok, let’s go to it (taken from the ‘?nls’ documentation):

DNase1 <- subset(DNase, Run == 1)
fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1)
## usual predict.nls has no confidence intervals implemented
predict(fm1DNase1, newdata = data.frame(conc = 5), interval = "confidence")
[1] 1.243631
Asym xmid scal
[1,] 0.5302925 -0.5608912 -0.06804642

In the next post we will see how to use the gradient attribute to calculate a first-order Taylor expansion around f(x)

However, predictNLS gives us the error and confidence interval at x = 5:

predictNLS(fm1DNase1, newdata = data.frame(conc = 5))
fit mean sd median mad 2.5% 97.5%
[1,] 1.243631 1.243293 0.009462893 1.243378 0.009637439 1.224608 1.261575

Interesting to see, how close the mean of the simulation comes to the actual fitted value…
We could also add some error in x to propagate to y:

> predictNLS(fm1DNase1, newdata = data.frame(conc = 5, error = 0.1))
fit mean sd median mad 2.5% 97.5%
[1,] 1.243631 1.243174 0.01467673 1.243162 0.01488567 1.214252 1.272103

Have fun. If anyone know how to calculate a “prediction interval” (maybe quantile regression) give me hint…


12 Responses to predictNLS (Part 1, Monte Carlo simulation): confidence intervals for ‘nls’ models

  1. Katharine Mullen says:

    This is a very nice post. I suspect that many people have written code to do the same, and not shared it; you have written a good general-purpose solution. Thanks for taking the effort to share it.

  2. Bartek says:

    Dear Andrej,

    thanks a lot for this post. I attempted to use it for an nls in two variables with three parameters and it fails to work unfortunately. I haven’t fully understood the code yet but it seems to me like this is limited to models of one variable, correct? B.

    • Bartek says:

      The error occurs in the augmentation of the covariance matrix:

      Error in `colnames<-`(`*tmp*`, value = c("x", "y")) :
      length of 'dimnames' [2] not equal to array extent

    • anspiess says:

      Dear Bartek,

      have you tried the ‘predictNLS* function in my ‘propagate’ package? At the end of the ‘Examples’ section, there is an example using multiple predictor values.
      Please tell me if it works for you or otherwise please post your code so I can see what I can do…


      • Jason says:

        I have had the same issue both with my own ns function (which has 2 predictor variables) and the example with multiple predictor variables in the propagate package. I am using R in Linux.

      • anspiess says:

        Could you supply some code example for me to check?

  3. Yasong Lu says:

    Dear Andrej,

    Thank you for sharing your code, and indeed it is helpful for me. I’m trying to implement it in Splus, and having some trouble. I hope you could help me out.

    This is the formula of my nls model, y ~ logist(log(x),xmid,scal). The error in calling this predictNLS function comes from the EVAL statement below. It seems to me there is a mismatch between EXPR (expression(logist, log(x), xmid, scal)) and the names of simMAT (xmid, scal, and x), and the “logist” and “log()” in the formula messes up the eval(). I don’t know if my understanding is correct, and would like to hear your advice how to fix the error.

    ## evaluate expression on rows of simMAT
    EVAL <- try(eval(EXPR, envir =, silent = TRUE)
    if (inherits(EVAL, "try-error")) stop("There was an error evaluating the simulations!")

    thank you,

  4. […] the argument interval="confidence", but that argument is not yet implemented in R (there is some uncertainty about how best to do it). Instead I can use a Monte Carlo procedure developed by A. N. […]

  5. beckmw says:

    This is fantastic, thanks for sharing! I was able to get this working for my example after a few minor tweaks.

  6. Dear Andrej,

    I apologize on my R-inexperience, but I am trying to use you package and you function for propagate my uncertainties in experimental data (the predictors ?) for getting the real uncertainty in the estimated parameters, more than just the standard error that gives the NLS summary. I do not how to deal with “…Error in predictNLS(M1b, df): newdata should have name(s) YNbifX1kw!…” (Y are my x-axis componentes, Nbif and kw are numeric values that that have an associated uncertainty as the X1 values does. X1 are mi data to be modeled).

  7. Sam H says:

    Hi, this function worked straight away and it will be many years until I can write something like that, so many thanks for posting.

    I used it with SSfpl() for some growth data, a nutritional response. Because of the real-world situation I dont have data that covers the regions of the curve for the lower asymtote (Parameter A). As such the 95CIs for lower values of X are large (at least i think this is the cause).

    What is the direction I should/could take here?
    My initial thoughts:
    1. Abandon the model (it passes the assumptions listed by Ritz and Streibig (2008)) and use one with simpler parameter structure, however these do not fit my data (based on RSS and AIC) as well as the fpl() as fpl has the scale parameter which alters the bending.
    2. Is it possible to say that the lower asymptote is known with certainty? (not ideal but for my data you would not be able to define it, so my logic says that the best lower asymptote for my data could be fixed with zero uncertainty).

    If anyone knows a paper or discussion with this function would be amazing, thank you.

  8. Tom Wenseleers says:

    You may be interested to know that the investr CRAN package has methods to calculate both confidence and prediction intervals for nls models, see

Leave a Reply to Tom Wenseleers Cancel reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: