Those that do a lot of nonlinear regression will love the *nls* function of R. In most of the cases it works really well, but there are some mishaps that can occur when using bad starting values for the parameters. One of the most dreaded is the “singular gradient matrix at initial parameter estimates” which brings the function to a stop because the gradient check in *stats:::nlsModel* will terminate if the QR decomposition is not of full column rank.

Nearly all nonlinear fitting programs out there use the Levenberg-Marquardt algorithm for nonlinear regression. This is because its switching between Gauss-Newton and gradient descent is highly robust against far-off-optimal starting values. Unfortunately, the standard *nls* function has no LM implemented, instead it houses the Gauss-Newton type, the PORT routines and a partial linear fitter. The fabulous minpack.lm package from Katherine M. Mullen offers an *R* frontend to a Fortran LM implementation of the MINPACK package. The *nls.lm* function must be supplied with an objective function that returns a vector of residuals to be minimized. Together with Kate I developed a function *nlsLM* that has the gee wizz formulaic interface of *nls*, but it calls LM instead of Gauss-Newton. This has some advantages as we will see below. The function returns the usual result of class ‘nls’, and due to some modifications, all standard generics work. The modifications were made so that the `formula`

is transformed into a function that returns a vector of (weighted) residuals whose sum square is minimized by * nls.lm*. The optimized parameters are then transferred to

*in order to obtain an object of class ‘nlsModel’. The internal C functions*

` stats:::nlsModel`

`C_nls_iter`

and `stats:::nls_port_fit`

were removed to avoid subsequent “Gauss-Newton”, “port” or “plinear” optimization of `nlsModel`

.So, what’s similar and what’s, well, better…

library(minpack.lm) ### Examples from 'nls' doc ### DNase1 <- subset(DNase, Run == 1) fm1DNase1 <- nlsLM(density ~ Asym/(1 + exp((xmid - log(conc))/scal)), data = DNase1, start = list(Asym = 3, xmid = 0, scal = 1)) ## all generics are applicable coef(fm1DNase1) confint(fm1DNase1) deviance(fm1DNase1) df.residual(fm1DNase1) fitted(fm1DNase1) formula(fm1DNase1) logLik(fm1DNase1) predict(fm1DNase1) print(fm1DNase1) profile(fm1DNase1) residuals(fm1DNase1) summary(fm1DNase1) update(fm1DNase1) vcov(fm1DNase1) weights(fm1DNase1)

nlsLMcan fit zero noise data, whennlsfails:x <- 1:10 y <- 2*x + 3 nls(y ~ a + b * x, start = list(a = 0.12345, b = 0.54321)) nlsLM(y ~ a + b * x, start = list(a = 0.12345, b = 0.54321))Nonlinear regression model

model: y ~ a + b * x

data: parent.frame()

a b

3 2

residual sum-of-squares: 0Number of iterations to convergence: 3

Achieved convergence tolerance: 1.49e-08

nlsLMoften converges whennlsgives the dreaded "singular gradient" error.

Example taken from here:x <- 0:140 y <- 200 / (1 + exp(17 - x)/2) * exp(-0.02*x) yeps <- y + rnorm(length(y), sd = 2) nls(yeps ~ p1 / (1 + exp(p2 - x)) * exp(p4 * x), start=list(p1=410,p2=18,p4=-.03))Optimal starting parameters work:

Nonlinear regression model

model: yeps ~ p1/(1 + exp(p2 - x)) * exp(p4 * x)

data: parent.frame()

p1 p2 p4

200.67285 16.32430 -0.02005

residual sum-of-squares: 569Number of iterations to convergence: 6

Achieved convergence tolerance: 2.611e-07But off-optimal parameters give error:

nls(yeps ~ p1 / (1 + exp(p2 - x)) * exp(p4 * x), start=list(p1=10,p2=18,p4=-.03))Fehler in nls(yeps ~ p1/(1 + exp(p2 - x)) * exp(p4 * x), start = list(p1 = 10, :

singulärer GradientnlsLM is more robust with these starting parameters:

nlsLM(yeps ~ p1 / (1 + exp(p2 - x)) * exp(p4 * x), start=list(p1=10,p2=18,p4=-.03))Nonlinear regression model

model: yeps ~ p1/(1 + exp(p2 - x)) * exp(p4 * x)

data: parent.frame()

p1 p2 p4

200.67285 16.32430 -0.02005

residual sum-of-squares: 569Number of iterations to convergence: 10

Achieved convergence tolerance: 1.49e-08Have fun!

Cheers, Andrej

Very useful contribution! Thanks!

Great post! I’m out of nls from now. I’d kind of seen that this was “a good idea”, but never really got it until I read this. Anyone who wants to buy the Springer nls book contact me :-);

Your library saved my day! ThankYou

Hi!, i’m from Chile, i’m working on determine water retention curve by van genuchten model and i’m trying to use the modified version of nls: nlsLM function, but I have a question about how to work with ‘start’ option, I mean, how to determine initial parameters of this model?, right now I set random starting values, and works well (no warnings or errors codes), here my code:

muestra <- data.frame(h = c(61,82,173,337,683,2356,7283,15300), theta_h = c(0.481,0.41,0.374,0.235,0.24,0.155,0.084,0.096))

mod <- nlsLM(theta_h ~ theta_r + (theta_s – theta_r)/(1 + (alpha*h)^n)^(1-1/n),

data = muestra,

control = nls.lm.control(maxiter=200,options(warn=-1)),

start = c(theta_s = 0.45, theta_r = 0.067, alpha = 0.02, n = 1.41),

trace = TRUE)

appreciate any help, thanks!

Hi Alvaro,

I think it is safe to use theta_r as the min y value, theta_s as the max value and n = 1. The most important value is alpha, which is the slope of the curve. You might do some log transformation of your data and then regress lm(y ~ log(x)) and see if the exp(slope) of your linear regression correlates with alpha.

But since nlsLM converges in 38 iterations, maybe this is not so important…

Cheers,

Andrej

Hi, thanks for your response, i noted your observations and i tested initials values of parameters with ‘preview’ functions from nlstools package and i noticed that RSS value was 0.547 with alpha = 0.02

preview(formula = theta_h ~ theta_r + (theta_s – theta_r)/(1 + (alpha*h)^n)^(1-1/n),

data = muestra,

start = list(theta_r = min(muestra$theta_h),

theta_s = max(muestra$theta_h),

alpha = 0.02,

n = 1)

)

Finding out new methods for estimating initial values for parameters I found that adding brute force and multiple starting values to nlsLM with nls2 package i could get a better RSS = 0.00474 with

theta_r theta_s alpha n

0.00 0.90 0.15 1.30

initial values parameters helping to fit VG model with nlsLM, but I don’t know if this procedure will have a good estimation with another sample soils. my code is:

# DEFINITION OF THE VAN GENUCHTEN MODEL (m = 1-1/n)

VGm <- function(h, theta_r, theta_s, alpha, n) {

theta_r + (theta_s – theta_r)/(1 + (alpha*h)^n)^(1-1/n)

}

# ESTIMATION OF INITIAL VALUES OF PARAMETERS

#by command "preview"

preview(formula = theta_h ~ theta_r + (theta_s – theta_r)/(1 + (alpha*h)^n)^(1-1/n),

data = sample,

start = list(theta_r = min(sample$theta_h),

theta_s = max(sample$theta_h),

alpha = 0.02,

n = 1)

)

#by grid

grid.constraints<-expand.grid(list(theta_r = seq(0,1,by=0.15),

theta_s = seq(0, 1, by=0.15),

alpha = seq(0, 1, by=0.15),

n = seq(1, 10, by=0.15))

)

est<-nls2(sample$theta_h ~ VGm(h, theta_r, theta_s, alpha, n),

data=sample,

start=grid.constraints,

algorithm="brute-force"

)

est

What are your insights about this code?

thanks, cheers

You don’t need to do this approach, because nlsLM is much better than those functions that use ‘nls’, i.e. nls2. When I use your original code from the first post I get the following:

> library(minpack.lm)

> muestra mod <- nlsLM(theta_h ~ theta_r + (theta_s – theta_r)/(1 + (alpha*h)^n)^(1-1/n),

+ data = muestra,

+ control = nls.lm.control(maxiter=200,options(warn=-1)),

+ start = c(theta_s = 0.45, theta_r = 0.067, alpha = 0.02, n = 1.41),

+ trace = TRUE)

It. 0, RSS = 0.0279492, Par. = 0.45 0.067 0.02 1.41

It. 1, RSS = 0.00964885, Par. = 0.673598 0.0152172 0.0410602 1.30656

It. 2, RSS = 0.00443676, Par. = 0.776374 0.0120868 0.0669746 1.33437

It. 3, RSS = 0.00432892, Par. = 0.849211 0.0106518 0.0884594 1.33656

It. 4, RSS = 0.004295, Par. = 0.862772 0.0111738 0.0942901 1.33784

It. 5, RSS = 0.00429473, Par. = 0.868621 0.0109935 0.0966623 1.33742

It. 6, RSS = 0.00429456, Par. = 0.880399 0.010553 0.101397 1.33637

It. 7, RSS = 0.00429436, Par. = 0.891801 0.0101741 0.106161 1.33546

It. 8, RSS = 0.00429421, Par. = 0.898686 0.0099701 0.109139 1.33498

It. 9, RSS = 0.00429412, Par. = 0.912448 0.00954101 0.115094 1.33397

It. 10, RSS = 0.00429396, Par. = 0.925704 0.00917939 0.121086 1.33312

It. 11, RSS = 0.00429383, Par. = 0.938612 0.00885276 0.1271 1.33235

It. 12, RSS = 0.00429373, Par. = 0.951195 0.0085562 0.133136 1.33166

It. 13, RSS = 0.00429364, Par. = 0.96347 0.00828622 0.139191 1.33104

It. 14, RSS = 0.00429358, Par. = 0.975455 0.00803989 0.145263 1.33047

It. 15, RSS = 0.00429352, Par. = 0.987163 0.00781404 0.151352 1.32995

It. 16, RSS = 0.00429347, Par. = 0.99861 0.00760642 0.157455 1.32947

It. 17, RSS = 0.00429344, Par. = 1.00981 0.00741522 0.163573 1.32904

It. 18, RSS = 0.00429341, Par. = 1.02077 0.00723859 0.169703 1.32864

It. 19, RSS = 0.00429338, Par. = 1.03151 0.00707512 0.175844 1.32826

It. 20, RSS = 0.00429336, Par. = 1.04204 0.00692332 0.181997 1.32792

It. 21, RSS = 0.00429334, Par. = 1.05236 0.00678232 0.18816 1.3276

It. 22, RSS = 0.00429333, Par. = 1.06248 0.00665083 0.194333 1.3273

It. 23, RSS = 0.00429332, Par. = 1.07242 0.00652803 0.200515 1.32703

It. 24, RSS = 0.00429331, Par. = 1.08218 0.00641308 0.206705 1.32677

It. 25, RSS = 0.0042933, Par. = 1.09177 0.00630543 0.212903 1.32652

It. 26, RSS = 0.0042933, Par. = 1.10119 0.00620424 0.219109 1.3263

It. 27, RSS = 0.00429329, Par. = 1.11045 0.00610895 0.225322 1.32608

It. 28, RSS = 0.00429329, Par. = 1.11957 0.00601986 0.231541 1.32588

It. 29, RSS = 0.00429329, Par. = 1.12853 0.00593534 0.237766 1.3257

It. 30, RSS = 0.00429329, Par. = 1.13736 0.00585565 0.243998 1.32552

It. 31, RSS = 0.00429328, Par. = 1.14051 0.00583095 0.24629 1.32546

It. 32, RSS = 0.00429328, Par. = 1.14122 0.00582543 0.2468 1.32545

It. 33, RSS = 0.00429328, Par. = 1.14132 0.00582482 0.246876 1.32545

It. 34, RSS = 0.00429328, Par. = 1.14154 0.00582306 0.247027 1.32544

It. 35, RSS = 0.00429328, Par. = 1.14154 0.00582299 0.247029 1.32544

It. 36, RSS = 0.00429328, Par. = 1.14154 0.00582298 0.247027 1.32544

It. 37, RSS = 0.00429328, Par. = 1.14153 0.00582306 0.247024 1.32544

It. 38, RSS = 0.00429328, Par. = 1.14153 0.00582306 0.247024 1.32544

The RSS of your last iteration is 0.00429329, which is close (and even better) than using your grid/brute-force approach.

Cheers, Andrej.

The RSS of ‘nlsLM’ approach was possible with prior knowledge of the initial values of the parameters but generally i don’t know so when I use this approach my idea is to use the result of the nls2 helping to define starting values for parameters for nlsLM approach, the RSS obtained was 0.00429328 converging in 23 iterations. My additional code is:

mod <- nlsLM(theta_h ~ theta_r + (theta_s – theta_r)/(1 + (alpha*h)^n)^(1-1/n),

data = sample,

control = nls.lm.control(maxiter=200,options(warn=-1)),

start = coef(est),

trace = TRUE)

I got almost the same result but when I define my grid I'm sacrificing more CPU resources. This grid was generated by regular sequences of constraints of each parameters. Will there be another way to get initial values of quality and more efficiently? Thanks for your attention.

Cheers

I know from logistic models that you can obtain fairly good starting parameters by linearizing the function using log(f) or log(log(f)) and then using the slope of a linear regression of the logged data.

Cheers,

Andrej

I’m having a very hard time when i use this example data:

sample <- data.frame(h = c(60,102,337,683,1020,3060,5100,15300), theta_h = c(0.493,0.455,0.344,0.300,0.282,0.158,0.137,0.120))

mod <- nlsLM(theta_h ~ theta_r + (theta_s – theta_r)/(1 + (alpha*h)^n)^(1-1/n),

data = sample,

control = nls.lm.control(maxiter=200,options(warn=-1)),

start = c(theta_r = 0.065, theta_s = 0.41, alpha = 0.075, n = 1.89),

trace = TRUE)

Error en nlsModel(formula, mf, start, wts) :

singular gradient matrix at initial parameter estimates

any idea what happened? any advice would be really really appreciated, thanks.

Hi Alvaro,

your theta_r seems to be far off the optimal starting value. If I set it to 1:

mod mod

Nonlinear regression model

model: theta_h ~ theta_r + (theta_s – theta_r)/(1 + (alpha * h)^n)^(1 – 1/n)

data: sample

theta_r theta_s alpha n

16.6720 0.8565 3.1954 0.9955

residual sum-of-squares: 0.004202

Cheers,

Andrej

Thank you. Seriously this packaged saved me a huge amount of work.

After 2 months of struggle, finally have found a solution. Thank you for this package

Hi and thanks in advance for this very useful indeed package. I have been trying to estimate simultaneously two parameters: w and a for each of a total of 40 subjects.

Although in most cases the algorithm returns sensible values there are a few exceptions. In those exceptions I noticed that the algorithm converges after more than on average iterations and returns non-realistic values for these parameter; typically higher than normal. I noticed also, that the improvement in terms of reduction in RSS is marginal. For example, if I was to stop after the 4th iteration I would get a rather sensible value for “a” (e.g. 1.5) and value of RSS=11.5. Iif I leave it converge (e.g. until the 31st iteration) however I get a small reduction in RSS (11.3 from 11.5) at a cost of an unreasonable value for “a”, (e.g. 64!).

What puzzles me even further is that when I check the standard errors, I notice a positive correlation between their size and the number of iterations. As if se accumulate over iterations. So for example, when I specify maxiter=4, I get a s.e=5 for “a” but when I specify maxiter=6, I get s.e=10. This happens despite RSS decrease. The same holds for w, so it’s not that there is a different allocation of errors, both se increase with more iterations.

Any thoughts as to why this happens?

Hi Orestis,

unfortunately nlsLM, as any NLS method, aims to minimize RSS and not the parameter s.e.’s… 😉

If you type vcov(your model), can you see that your both parameters have a high covariance (off-diagonal values)?

I often had this in asymmetric (5-parameter) sigmoidal models, where the asymmetry parameter has a high covariance to the slope parameter, and often the converged values were extremely unrealistic. The only way to avoid this is to set some reasonable bounds on the parameters, using the ‘lower’ and ‘upper’ arguments of nlsLM. However, in most of the cases if you set i.e. lower = c(-Inf, 10) and upper = c(Inf, 20) for w in [-Inf, Inf] and a in [10, 20] then the iteration will stop when a reaches either 10 or 20. At that point, it will have its lowest RSS of all values in the bounds, however not the minimal one over the complete parameter space. Can you define “sensible” boundary values?

Cheers,

Andrej