A better ‘nls’ (?)

Those who do a lot of nonlinear regression will love the nls function of R. In most cases it works really well, but mishaps can occur when the starting values for the parameters are badly chosen. One of the most dreaded is the “singular gradient matrix at initial parameter estimates” error, which stops the fit because the gradient check in stats:::nlsModel terminates if the QR decomposition of the gradient matrix is not of full column rank.

Nearly all nonlinear fitting programs out there use the Levenberg-Marquardt (LM) algorithm for nonlinear regression, because its switching between Gauss-Newton and gradient descent makes it highly robust against starting values that are far from the optimum. Unfortunately, the standard nls function has no LM implementation; instead it houses the Gauss-Newton algorithm, the PORT routines and a partial-linear fitter. The fabulous minpack.lm package from Katharine M. Mullen offers an R frontend to the Fortran LM implementation of the MINPACK library. Its nls.lm function must be supplied with an objective function that returns the vector of residuals to be minimized.

Together with Kate I developed the function nlsLM, which has the gee-whiz formula interface of nls but calls LM instead of Gauss-Newton. This has some advantages, as we will see below. The function returns the usual result of class ‘nls’ and, thanks to some modifications, all standard generics work. The modifications transform the formula into a function that returns a vector of (weighted) residuals whose sum of squares is minimized by nls.lm. The optimized parameters are then passed to stats:::nlsModel in order to obtain an object of class ‘nlsModel’. Calls to the internal functions C_nls_iter and stats:::nls_port_fit were removed to avoid any subsequent “Gauss-Newton”, “port” or “plinear” optimization of the nlsModel object.
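To illustrate what this means in practice, here is a minimal sketch (not the actual internal code of nlsLM; the toy data and the name residFun are made up) showing the same fit done once through nls.lm with a hand-written residual function and once through the formula interface of nlsLM:

library(minpack.lm)

## toy data for the model y = a * exp(b * x)
set.seed(42)
x <- seq(0, 5, by = 0.1)
y <- 2 * exp(0.7 * x) + rnorm(length(x), sd = 0.5)

## nls.lm needs a function that returns the vector of residuals
residFun <- function(par, x, y) y - par$a * exp(par$b * x)
out <- nls.lm(par = list(a = 1, b = 1), fn = residFun, x = x, y = y)
out$par

## nlsLM builds such a residual function from the formula automatically
nlsLM(y ~ a * exp(b * x), start = list(a = 1, b = 1))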

So, what’s similar and what’s, well, better…

library(minpack.lm)
### Examples from 'nls' doc ###
DNase1 <- subset(DNase, Run == 1)
fm1DNase1 <- nlsLM(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
                   data = DNase1, start = list(Asym = 3, xmid = 0, scal = 1))
## all generics are applicable
coef(fm1DNase1)
confint(fm1DNase1)
deviance(fm1DNase1)
df.residual(fm1DNase1)
fitted(fm1DNase1)
formula(fm1DNase1)
logLik(fm1DNase1)
predict(fm1DNase1)
print(fm1DNase1)
profile(fm1DNase1)
residuals(fm1DNase1)
summary(fm1DNase1)
update(fm1DNase1)
vcov(fm1DNase1)
weights(fm1DNase1)

nlsLM can fit zero-noise data, where nls fails:

x <- 1:10
y <- 2*x + 3                           
nls(y ~ a + b * x, start = list(a = 0.12345, b = 0.54321))    ## fails: nls is documented not to work on artificial zero-residual data
nlsLM(y ~ a + b * x, start = list(a = 0.12345, b = 0.54321))  ## converges

Nonlinear regression model
model: y ~ a + b * x
data: parent.frame()
a b
3 2
residual sum-of-squares: 0

Number of iterations to convergence: 3
Achieved convergence tolerance: 1.49e-08

nlsLM often converges when nls fails with the dreaded "singular gradient" error, as in the following example:

x <- 0:140
y <- 200 / (1 + exp(17 - x)/2) * exp(-0.02*x)
yeps <- y + rnorm(length(y), sd = 2)
nls(yeps ~ p1 / (1 + exp(p2 - x)) * exp(p4 * x), start=list(p1=410,p2=18,p4=-.03))

With these near-optimal starting parameters, nls works:
Nonlinear regression model
model: yeps ~ p1/(1 + exp(p2 - x)) * exp(p4 * x)
data: parent.frame()
p1 p2 p4
200.67285 16.32430 -0.02005
residual sum-of-squares: 569

Number of iterations to convergence: 6
Achieved convergence tolerance: 2.611e-07

But off-optimal starting parameters give an error:

nls(yeps ~ p1 / (1 + exp(p2 - x)) * exp(p4 * x), start=list(p1=10,p2=18,p4=-.03))

Error in nls(yeps ~ p1/(1 + exp(p2 - x)) * exp(p4 * x), start = list(p1 = 10, :
singular gradient

nlsLM is more robust with these starting parameters:

nlsLM(yeps ~ p1 / (1 + exp(p2 - x)) * exp(p4 * x), start=list(p1=10,p2=18,p4=-.03))

Nonlinear regression model
model: yeps ~ p1/(1 + exp(p2 - x)) * exp(p4 * x)
data: parent.frame()
p1 p2 p4
200.67285 16.32430 -0.02005
residual sum-of-squares: 569

Number of iterations to convergence: 10
Achieved convergence tolerance: 1.49e-08
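For a quick visual check of the fit (this plotting snippet is not from the original example; 'fit' is just a name used here):

fit <- nlsLM(yeps ~ p1 / (1 + exp(p2 - x)) * exp(p4 * x),
             start = list(p1 = 10, p2 = 18, p4 = -0.03))
plot(x, yeps, pch = 16, cex = 0.6, xlab = "x", ylab = "yeps")
lines(x, predict(fit), col = 2, lwd = 2)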

Have fun!
Cheers, Andrej

16 Responses to A better ‘nls’ (?)

  1. Very useful contribution! Thanks!

  2. AlanP says:

    Great post! I’m off nls from now on. I’d kind of seen that this was “a good idea”, but never really got it until I read this. Anyone who wants to buy the Springer nls book, contact me :-)

  3. krzyklo says:

    Your library saved my day! Thank you!

  4. Alvaro says:

    Hi! I’m from Chile, working on determining the water retention curve with the van Genuchten model, and I’m trying to use the modified version of nls, the nlsLM function. I have a question about the ‘start’ option: how do I determine the initial parameters of this model? Right now I set random starting values and it works well (no warnings or error codes). Here is my code:

    muestra <- data.frame(h = c(61,82,173,337,683,2356,7283,15300), theta_h = c(0.481,0.41,0.374,0.235,0.24,0.155,0.084,0.096))

    mod <- nlsLM(theta_h ~ theta_r + (theta_s - theta_r)/(1 + (alpha*h)^n)^(1-1/n),
                 data = muestra,
                 control = nls.lm.control(maxiter = 200, options(warn = -1)),
                 start = c(theta_s = 0.45, theta_r = 0.067, alpha = 0.02, n = 1.41),
                 trace = TRUE)

    appreciate any help, thanks!

    • anspiess says:

      Hi Alvaro,

      I think it is safe to use theta_r as the min y value, theta_s as the max value and n = 1. The most important value is alpha, which is the slope of the curve. You might do some log transformation of your data and then regress lm(y ~ log(x)) and see if the exp(slope) of your linear regression correlates with alpha.
      But since nlsLM converges in 38 iterations, maybe this is not so important…
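      As a rough sketch of this suggestion (the object names here are mine, and whether exp(slope) is actually a useful guess for alpha has to be checked case by case):

      linFit <- lm(theta_h ~ log(h), data = muestra)   ## linear fit on log-transformed h
      exp(coef(linFit)[2])                             ## candidate value to compare with alpha

      startGuess <- list(theta_r = min(muestra$theta_h),
                         theta_s = max(muestra$theta_h),
                         alpha   = 0.02,               ## or informed by the slope above
                         n       = 1)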

      Cheers,
      Andrej

  5. Alvaro says:

    Hi, thanks for your response. I took note of your observations and tested initial parameter values with the ‘preview’ function from the nlstools package, and I noticed that the RSS value was 0.547 with alpha = 0.02:

    preview(formula = theta_h ~ theta_r + (theta_s - theta_r)/(1 + (alpha*h)^n)^(1-1/n),
            data = muestra,
            start = list(theta_r = min(muestra$theta_h),
                         theta_s = max(muestra$theta_h),
                         alpha = 0.02,
                         n = 1))

    Looking for new methods of estimating initial parameter values, I found that a brute-force search over multiple starting values with the nls2 package gives a better RSS of 0.00474 with

    theta_r theta_s alpha n
    0.00 0.90 0.15 1.30

    as initial parameter values, which helps nlsLM fit the VG model, but I don’t know whether this procedure will give a good estimation with other soil samples. My code is:

    # DEFINITION OF THE VAN GENUCHTEN MODEL (m = 1-1/n)

    VGm <- function(h, theta_r, theta_s, alpha, n) {
      theta_r + (theta_s - theta_r)/(1 + (alpha*h)^n)^(1-1/n)
    }

    # ESTIMATION OF INITIAL VALUES OF PARAMETERS

    # by the "preview" command
    preview(formula = theta_h ~ theta_r + (theta_s - theta_r)/(1 + (alpha*h)^n)^(1-1/n),
            data = sample,
            start = list(theta_r = min(sample$theta_h),
                         theta_s = max(sample$theta_h),
                         alpha = 0.02,
                         n = 1))

    # by grid search
    grid.constraints <- expand.grid(list(theta_r = seq(0, 1, by = 0.15),
                                         theta_s = seq(0, 1, by = 0.15),
                                         alpha = seq(0, 1, by = 0.15),
                                         n = seq(1, 10, by = 0.15)))

    est <- nls2(sample$theta_h ~ VGm(h, theta_r, theta_s, alpha, n),
                data = sample,
                start = grid.constraints,
                algorithm = "brute-force")

    est

    What are your insights about this code?
    thanks, cheers

    • anspiess says:

      You don’t need this approach, because nlsLM is much better than functions that build on ‘nls’, e.g. nls2. When I use your original code from the first post, I get the following:

      > library(minpack.lm)
      > muestra <- data.frame(h = c(61,82,173,337,683,2356,7283,15300), theta_h = c(0.481,0.41,0.374,0.235,0.24,0.155,0.084,0.096))
      > mod <- nlsLM(theta_h ~ theta_r + (theta_s - theta_r)/(1 + (alpha*h)^n)^(1-1/n),
      + data = muestra,
      + control = nls.lm.control(maxiter=200,options(warn=-1)),
      + start = c(theta_s = 0.45, theta_r = 0.067, alpha = 0.02, n = 1.41),
      + trace = TRUE)
      It. 0, RSS = 0.0279492, Par. = 0.45 0.067 0.02 1.41
      It. 1, RSS = 0.00964885, Par. = 0.673598 0.0152172 0.0410602 1.30656
      It. 2, RSS = 0.00443676, Par. = 0.776374 0.0120868 0.0669746 1.33437
      It. 3, RSS = 0.00432892, Par. = 0.849211 0.0106518 0.0884594 1.33656
      It. 4, RSS = 0.004295, Par. = 0.862772 0.0111738 0.0942901 1.33784
      It. 5, RSS = 0.00429473, Par. = 0.868621 0.0109935 0.0966623 1.33742
      It. 6, RSS = 0.00429456, Par. = 0.880399 0.010553 0.101397 1.33637
      It. 7, RSS = 0.00429436, Par. = 0.891801 0.0101741 0.106161 1.33546
      It. 8, RSS = 0.00429421, Par. = 0.898686 0.0099701 0.109139 1.33498
      It. 9, RSS = 0.00429412, Par. = 0.912448 0.00954101 0.115094 1.33397
      It. 10, RSS = 0.00429396, Par. = 0.925704 0.00917939 0.121086 1.33312
      It. 11, RSS = 0.00429383, Par. = 0.938612 0.00885276 0.1271 1.33235
      It. 12, RSS = 0.00429373, Par. = 0.951195 0.0085562 0.133136 1.33166
      It. 13, RSS = 0.00429364, Par. = 0.96347 0.00828622 0.139191 1.33104
      It. 14, RSS = 0.00429358, Par. = 0.975455 0.00803989 0.145263 1.33047
      It. 15, RSS = 0.00429352, Par. = 0.987163 0.00781404 0.151352 1.32995
      It. 16, RSS = 0.00429347, Par. = 0.99861 0.00760642 0.157455 1.32947
      It. 17, RSS = 0.00429344, Par. = 1.00981 0.00741522 0.163573 1.32904
      It. 18, RSS = 0.00429341, Par. = 1.02077 0.00723859 0.169703 1.32864
      It. 19, RSS = 0.00429338, Par. = 1.03151 0.00707512 0.175844 1.32826
      It. 20, RSS = 0.00429336, Par. = 1.04204 0.00692332 0.181997 1.32792
      It. 21, RSS = 0.00429334, Par. = 1.05236 0.00678232 0.18816 1.3276
      It. 22, RSS = 0.00429333, Par. = 1.06248 0.00665083 0.194333 1.3273
      It. 23, RSS = 0.00429332, Par. = 1.07242 0.00652803 0.200515 1.32703
      It. 24, RSS = 0.00429331, Par. = 1.08218 0.00641308 0.206705 1.32677
      It. 25, RSS = 0.0042933, Par. = 1.09177 0.00630543 0.212903 1.32652
      It. 26, RSS = 0.0042933, Par. = 1.10119 0.00620424 0.219109 1.3263
      It. 27, RSS = 0.00429329, Par. = 1.11045 0.00610895 0.225322 1.32608
      It. 28, RSS = 0.00429329, Par. = 1.11957 0.00601986 0.231541 1.32588
      It. 29, RSS = 0.00429329, Par. = 1.12853 0.00593534 0.237766 1.3257
      It. 30, RSS = 0.00429329, Par. = 1.13736 0.00585565 0.243998 1.32552
      It. 31, RSS = 0.00429328, Par. = 1.14051 0.00583095 0.24629 1.32546
      It. 32, RSS = 0.00429328, Par. = 1.14122 0.00582543 0.2468 1.32545
      It. 33, RSS = 0.00429328, Par. = 1.14132 0.00582482 0.246876 1.32545
      It. 34, RSS = 0.00429328, Par. = 1.14154 0.00582306 0.247027 1.32544
      It. 35, RSS = 0.00429328, Par. = 1.14154 0.00582299 0.247029 1.32544
      It. 36, RSS = 0.00429328, Par. = 1.14154 0.00582298 0.247027 1.32544
      It. 37, RSS = 0.00429328, Par. = 1.14153 0.00582306 0.247024 1.32544
      It. 38, RSS = 0.00429328, Par. = 1.14153 0.00582306 0.247024 1.32544

      The RSS at your last iteration is 0.00429328, which is close to (and even slightly better than) the value from your grid/brute-force approach.

      Cheers, Andrej.

  6. Alvaro says:

    The RSS of the ‘nlsLM’ approach was only possible with prior knowledge of the initial parameter values, but generally I don’t know them. So my idea is to use the result of nls2 to help define starting values for the nlsLM fit; the RSS obtained was 0.00429328, converging in 23 iterations. My additional code is:

    mod <- nlsLM(theta_h ~ theta_r + (theta_s - theta_r)/(1 + (alpha*h)^n)^(1-1/n),
                 data = sample,
                 control = nls.lm.control(maxiter = 200, options(warn = -1)),
                 start = coef(est),
                 trace = TRUE)

    I got almost the same result, but defining the grid costs more CPU resources. The grid was generated from regular sequences within the constraints of each parameter. Is there another way to obtain good initial values more efficiently? Thanks for your attention.

    Cheers

    • anspiess says:

      I know from logistic models that you can obtain fairly good starting parameters by linearizing the function using log(f) or log(log(f)) and then using the slope of a linear regression of the logged data.
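      A minimal sketch of that idea, using an illustrative exponential model rather than your actual one (all data and names here are made up):

      library(minpack.lm)

      ## simulated exponential decay y = A * exp(k * x)
      set.seed(123)
      x <- 1:30
      y <- 5 * exp(-0.1 * x) + rnorm(30, sd = 0.05)

      ## linearize: log(y) = log(A) + k * x, then fit by ordinary least squares
      linFit <- lm(log(y) ~ x)
      startVals <- list(A = exp(unname(coef(linFit)[1])),
                        k = unname(coef(linFit)[2]))

      ## feed the linearized estimates to nlsLM as starting values
      nlsLM(y ~ A * exp(k * x), start = startVals)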

      Cheers,
      Andrej

      • Alvaro says:

        I’m having a very hard time when I use this example data:
        sample <- data.frame(h = c(60,102,337,683,1020,3060,5100,15300), theta_h = c(0.493,0.455,0.344,0.300,0.282,0.158,0.137,0.120))

        mod <- nlsLM(theta_h ~ theta_r + (theta_s - theta_r)/(1 + (alpha*h)^n)^(1-1/n),
                     data = sample,
                     control = nls.lm.control(maxiter = 200, options(warn = -1)),
                     start = c(theta_r = 0.065, theta_s = 0.41, alpha = 0.075, n = 1.89),
                     trace = TRUE)

        Error in nlsModel(formula, mf, start, wts) :
        singular gradient matrix at initial parameter estimates

        any idea what happened? any advice would be really really appreciated, thanks.

      • anspiess says:

        Hi Alvaro,

        your theta_r seems to be far off the optimal starting value. If I set theta_r = 1 in your call and refit, I get:

        Nonlinear regression model
        model: theta_h ~ theta_r + (theta_s - theta_r)/(1 + (alpha * h)^n)^(1 - 1/n)
        data: sample
        theta_r theta_s alpha n
        16.6720 0.8565 3.1954 0.9955
        residual sum-of-squares: 0.004202

        Cheers,
        Andrej

  7. charles says:

    Thank you. Seriously, this package saved me a huge amount of work.

  8. Evans Ochiaha says:

    After 2 months of struggle, I have finally found a solution. Thank you for this package.

  9. Orestis Kopsacheilis says:

    Hi, and thanks in advance for this very useful package indeed. I have been trying to simultaneously estimate two parameters, w and a, for each of a total of 40 subjects.
    Although in most cases the algorithm returns sensible values, there are a few exceptions. In those exceptions I noticed that the algorithm converges after more iterations than average and returns unrealistic values for these parameters, typically higher than normal. I also noticed that the improvement in terms of RSS reduction is marginal. For example, if I stop after the 4th iteration I get a rather sensible value for “a” (e.g. 1.5) and an RSS of 11.5. If I let it converge (e.g. until the 31st iteration), however, I get a small reduction in RSS (11.3 from 11.5) at the cost of an unreasonable value for “a” (e.g. 64!).
    What puzzles me even further is that when I check the standard errors, I notice a positive correlation between their size and the number of iterations, as if the s.e.’s accumulated over the iterations. So, for example, when I specify maxiter = 4 I get an s.e. of 5 for “a”, but when I specify maxiter = 6 I get an s.e. of 10. This happens despite the RSS decreasing. The same holds for w, so it’s not that the errors are allocated differently; both s.e.’s increase with more iterations.

    Any thoughts as to why this happens?

    • anspiess says:

      Hi Orestis,

      unfortunately nlsLM, like any NLS method, aims to minimize the RSS and not the parameter s.e.’s… 😉
      If you call vcov() on your model, do you see that the two parameters have a high covariance (off-diagonal values)?
      I often had this in asymmetric (5-parameter) sigmoidal models, where the asymmetry parameter has a high covariance with the slope parameter, and the converged values were often extremely unrealistic. The only way to avoid this is to set reasonable bounds on the parameters, using the ‘lower’ and ‘upper’ arguments of nlsLM. However, in most cases, if you set e.g. lower = c(-Inf, 10) and upper = c(Inf, 20) for w in [-Inf, Inf] and a in [10, 20], then the iteration will stop when a reaches either 10 or 20. At that point it will have the lowest RSS of all values within the bounds, but not the minimum over the complete parameter space. Can you define “sensible” boundary values?
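      As a minimal sketch of bounded fitting (the model, data and bounds here are illustrative, not your actual model):

      library(minpack.lm)

      ## illustrative two-parameter model: y = w * exp(a * x)
      set.seed(1)
      dat <- data.frame(x = seq(0, 1, length.out = 25))
      dat$y <- 2 * exp(1.5 * dat$x) + rnorm(25, sd = 0.1)

      ## constrain a to [1, 3] while leaving w unbounded; the bounds are
      ## given in the same order as the parameters in 'start'
      fit <- nlsLM(y ~ w * exp(a * x), data = dat,
                   start = list(w = 1, a = 2),
                   lower = c(-Inf, 1), upper = c(Inf, 3))
      coef(fit)
      vcov(fit)   ## inspect the off-diagonal covariance mentioned above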

      Cheers,
      Andrej

      • Orestis says:

        Hi Andrej and apologies for what appears to be (and is in fact) an extremely belated response. I have since found several ways to deal with this problem. Setting reasonable bounds is one of them. Thank you for your response.
