A better ‘nls’ (?)

Those who do a lot of nonlinear regression will love the nls function of R. In most cases it works really well, but mishaps can occur when the starting values for the parameters are badly chosen. One of the most dreaded is the “singular gradient matrix at initial parameter estimates” error, which stops the fit because the gradient check in stats:::nlsModel terminates if the QR decomposition of the gradient matrix is not of full column rank.

Nearly all nonlinear fitting programs out there use the Levenberg-Marquardt (LM) algorithm for nonlinear regression, because its switching between Gauss-Newton and gradient descent makes it highly robust against starting values that are far from the optimum. Unfortunately, the standard nls function has no LM implementation; instead it houses the Gauss-Newton algorithm, the PORT routines and a partial-linear fitter. The fabulous minpack.lm package from Katharine M. Mullen offers an R frontend to the Fortran LM implementation of the MINPACK library. Its nls.lm function must be supplied with an objective function that returns the vector of residuals to be minimized.

Together with Kate I developed the function nlsLM, which has the gee-whiz formula interface of nls but calls LM instead of Gauss-Newton. This has some advantages, as we will see below. The function returns the usual result of class ‘nls’ and, thanks to some modifications, all standard generics work. The modifications transform the formula into a function that returns a vector of (weighted) residuals whose sum of squares is minimized by nls.lm. The optimized parameters are then passed to stats:::nlsModel in order to obtain an object of class ‘nlsModel’. Calls to the internal functions C_nls_iter and stats:::nls_port_fit were removed to avoid any subsequent “Gauss-Newton”, “port” or “plinear” optimization of the nlsModel object.
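To illustrate what this means in practice, here is a minimal sketch (not the actual internal code of nlsLM; the toy data and the name residFun are made up) showing the same fit done once through nls.lm with a hand-written residual function and once through the formula interface of nlsLM:

library(minpack.lm)

## toy data for the model y = a * exp(b * x)
set.seed(42)
x <- seq(0, 5, by = 0.1)
y <- 2 * exp(0.7 * x) + rnorm(length(x), sd = 0.5)

## nls.lm needs a function that returns the vector of residuals
residFun <- function(par, x, y) y - par$a * exp(par$b * x)
out <- nls.lm(par = list(a = 1, b = 1), fn = residFun, x = x, y = y)
out$par

## nlsLM builds such a residual function from the formula automatically
nlsLM(y ~ a * exp(b * x), start = list(a = 1, b = 1))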

So, what’s similar and what’s, well, better…

library(minpack.lm)
### Examples from 'nls' doc ###
DNase1 <- subset(DNase, Run == 1)
fm1DNase1 <- nlsLM(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
                   data = DNase1, start = list(Asym = 3, xmid = 0, scal = 1))
## all generics are applicable
coef(fm1DNase1)
confint(fm1DNase1)
deviance(fm1DNase1)
df.residual(fm1DNase1)
fitted(fm1DNase1)
formula(fm1DNase1)
logLik(fm1DNase1)
predict(fm1DNase1)
print(fm1DNase1)
profile(fm1DNase1)
residuals(fm1DNase1)
summary(fm1DNase1)
update(fm1DNase1)
vcov(fm1DNase1)
weights(fm1DNase1)

nlsLM can fit zero-noise data, where nls fails:

x <- 1:10
y <- 2*x + 3                           
nls(y ~ a + b * x, start = list(a = 0.12345, b = 0.54321))    ## fails: nls is documented not to work on artificial zero-residual data
nlsLM(y ~ a + b * x, start = list(a = 0.12345, b = 0.54321))  ## converges

Nonlinear regression model
model: y ~ a + b * x
data: parent.frame()
a b
3 2
residual sum-of-squares: 0

Number of iterations to convergence: 3
Achieved convergence tolerance: 1.49e-08

nlsLM often converges when nls fails with the dreaded "singular gradient" error, as in the following example:

x <- 0:140
y <- 200 / (1 + exp(17 - x)/2) * exp(-0.02*x)
yeps <- y + rnorm(length(y), sd = 2)
nls(yeps ~ p1 / (1 + exp(p2 - x)) * exp(p4 * x), start=list(p1=410,p2=18,p4=-.03))

With these near-optimal starting parameters, nls works:
Nonlinear regression model
model: yeps ~ p1/(1 + exp(p2 - x)) * exp(p4 * x)
data: parent.frame()
p1 p2 p4
200.67285 16.32430 -0.02005
residual sum-of-squares: 569

Number of iterations to convergence: 6
Achieved convergence tolerance: 2.611e-07

But off-optimal starting parameters give an error:

nls(yeps ~ p1 / (1 + exp(p2 - x)) * exp(p4 * x), start=list(p1=10,p2=18,p4=-.03))

Error in nls(yeps ~ p1/(1 + exp(p2 - x)) * exp(p4 * x), start = list(p1 = 10, :
singular gradient

nlsLM is more robust with these starting parameters:

nlsLM(yeps ~ p1 / (1 + exp(p2 - x)) * exp(p4 * x), start=list(p1=10,p2=18,p4=-.03))

Nonlinear regression model
model: yeps ~ p1/(1 + exp(p2 - x)) * exp(p4 * x)
data: parent.frame()
p1 p2 p4
200.67285 16.32430 -0.02005
residual sum-of-squares: 569

Number of iterations to convergence: 10
Achieved convergence tolerance: 1.49e-08
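For a quick visual check of the fit (this plotting snippet is not from the original example; 'fit' is just a name used here):

fit <- nlsLM(yeps ~ p1 / (1 + exp(p2 - x)) * exp(p4 * x),
             start = list(p1 = 10, p2 = 18, p4 = -0.03))
plot(x, yeps, pch = 16, cex = 0.6, xlab = "x", ylab = "yeps")
lines(x, predict(fit), col = 2, lwd = 2)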

Have fun!
Cheers, Andrej

16 Responses to A better ‘nls’ (?)

  1. Very useful contribution! Thanks!

  2. AlanP says:

    Great post! I’m off nls from now on. I’d kind of seen that this was “a good idea”, but never really got it until I read this. Anyone who wants to buy the Springer nls book, contact me :-)

  3. krzyklo says:

    Your library saved my day! Thank you!

  4. Alvaro says:

    Hi! I’m from Chile, working on determining the water retention curve with the van Genuchten model, and I’m trying to use the modified version of nls, the nlsLM function. I have a question about the ‘start’ option: how do I determine the initial parameters of this model? Right now I set random starting values and it works well (no warnings or error codes). Here is my code:

    muestra <- data.frame(h = c(61,82,173,337,683,2356,7283,15300), theta_h = c(0.481,0.41,0.374,0.235,0.24,0.155,0.084,0.096))

    mod <- nlsLM(theta_h ~ theta_r + (theta_s - theta_r)/(1 + (alpha*h)^n)^(1-1/n),
                 data = muestra,
                 control = nls.lm.control(maxiter = 200, options(warn = -1)),
                 start = c(theta_s = 0.45, theta_r = 0.067, alpha = 0.02, n = 1.41),
                 trace = TRUE)

    appreciate any help, thanks!

    • anspiess says:

      Hi Alvaro,

      I think it is safe to use theta_r as the min y value, theta_s as the max value and n = 1. The most important value is alpha, which is the slope of the curve. You might do some log transformation of your data and then regress lm(y ~ log(x)) and see if the exp(slope) of your linear regression correlates with alpha.
      But since nlsLM converges in 38 iterations, maybe this is not so important…
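      As a rough sketch of this suggestion (the object names here are mine, and whether exp(slope) is actually a useful guess for alpha has to be checked case by case):

      linFit <- lm(theta_h ~ log(h), data = muestra)   ## linear fit on log-transformed h
      exp(coef(linFit)[2])                             ## candidate value to compare with alpha

      startGuess <- list(theta_r = min(muestra$theta_h),
                         theta_s = max(muestra$theta_h),
                         alpha   = 0.02,               ## or informed by the slope above
                         n       = 1)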

      Cheers,
      Andrej

  5. Alvaro says:

    Hi, thanks for your response. I took note of your observations and tested initial parameter values with the ‘preview’ function from the nlstools package, and I noticed that the RSS value was 0.547 with alpha = 0.02:

    preview(formula = theta_h ~ theta_r + (theta_s - theta_r)/(1 + (alpha*h)^n)^(1-1/n),
            data = muestra,
            start = list(theta_r = min(muestra$theta_h),
                         theta_s = max(muestra$theta_h),
                         alpha = 0.02,
                         n = 1))

    Looking for new methods of estimating initial parameter values, I found that a brute-force search over multiple starting values with the nls2 package gives a better RSS of 0.00474 with

    theta_r theta_s alpha n
    0.00 0.90 0.15 1.30

    as initial parameter values, which helps nlsLM fit the VG model, but I don’t know whether this procedure will give a good estimation with other soil samples. My code is:

    # DEFINITION OF THE VAN GENUCHTEN MODEL (m = 1-1/n)

    VGm <- function(h, theta_r, theta_s, alpha, n) {
      theta_r + (theta_s - theta_r)/(1 + (alpha*h)^n)^(1-1/n)
    }

    # ESTIMATION OF INITIAL VALUES OF PARAMETERS

    # by the "preview" command
    preview(formula = theta_h ~ theta_r + (theta_s - theta_r)/(1 + (alpha*h)^n)^(1-1/n),
            data = sample,
            start = list(theta_r = min(sample$theta_h),
                         theta_s = max(sample$theta_h),
                         alpha = 0.02,
                         n = 1))

    # by grid search
    grid.constraints <- expand.grid(list(theta_r = seq(0, 1, by = 0.15),
                                         theta_s = seq(0, 1, by = 0.15),
                                         alpha = seq(0, 1, by = 0.15),
                                         n = seq(1, 10, by = 0.15)))

    est <- nls2(sample$theta_h ~ VGm(h, theta_r, theta_s, alpha, n),
                data = sample,
                start = grid.constraints,
                algorithm = "brute-force")

    est

    What are your insights about this code?
    thanks, cheers

    • anspiess says:

      You don’t need this approach, because nlsLM is much better than functions that build on ‘nls’, e.g. nls2. When I use your original code from the first post, I get the following:

      > library(minpack.lm)
      > muestra <- data.frame(h = c(61,82,173,337,683,2356,7283,15300), theta_h = c(0.481,0.41,0.374,0.235,0.24,0.155,0.084,0.096))
      > mod <- nlsLM(theta_h ~ theta_r + (theta_s - theta_r)/(1 + (alpha*h)^n)^(1-1/n),
      + data = muestra,
      + control = nls.lm.control(maxiter=200,options(warn=-1)),
      + start = c(theta_s = 0.45, theta_r = 0.067, alpha = 0.02, n = 1.41),
      + trace = TRUE)
      It. 0, RSS = 0.0279492, Par. = 0.45 0.067 0.02 1.41
      It. 1, RSS = 0.00964885, Par. = 0.673598 0.0152172 0.0410602 1.30656
      It. 2, RSS = 0.00443676, Par. = 0.776374 0.0120868 0.0669746 1.33437
      It. 3, RSS = 0.00432892, Par. = 0.849211 0.0106518 0.0884594 1.33656
      It. 4, RSS = 0.004295, Par. = 0.862772 0.0111738 0.0942901 1.33784
      It. 5, RSS = 0.00429473, Par. = 0.868621 0.0109935 0.0966623 1.33742
      It. 6, RSS = 0.00429456, Par. = 0.880399 0.010553 0.101397 1.33637
      It. 7, RSS = 0.00429436, Par. = 0.891801 0.0101741 0.106161 1.33546
      It. 8, RSS = 0.00429421, Par. = 0.898686 0.0099701 0.109139 1.33498
      It. 9, RSS = 0.00429412, Par. = 0.912448 0.00954101 0.115094 1.33397
      It. 10, RSS = 0.00429396, Par. = 0.925704 0.00917939 0.121086 1.33312
      It. 11, RSS = 0.00429383, Par. = 0.938612 0.00885276 0.1271 1.33235
      It. 12, RSS = 0.00429373, Par. = 0.951195 0.0085562 0.133136 1.33166
      It. 13, RSS = 0.00429364, Par. = 0.96347 0.00828622 0.139191 1.33104
      It. 14, RSS = 0.00429358, Par. = 0.975455 0.00803989 0.145263 1.33047
      It. 15, RSS = 0.00429352, Par. = 0.987163 0.00781404 0.151352 1.32995
      It. 16, RSS = 0.00429347, Par. = 0.99861 0.00760642 0.157455 1.32947
      It. 17, RSS = 0.00429344, Par. = 1.00981 0.00741522 0.163573 1.32904
      It. 18, RSS = 0.00429341, Par. = 1.02077 0.00723859 0.169703 1.32864
      It. 19, RSS = 0.00429338, Par. = 1.03151 0.00707512 0.175844 1.32826
      It. 20, RSS = 0.00429336, Par. = 1.04204 0.00692332 0.181997 1.32792
      It. 21, RSS = 0.00429334, Par. = 1.05236 0.00678232 0.18816 1.3276
      It. 22, RSS = 0.00429333, Par. = 1.06248 0.00665083 0.194333 1.3273
      It. 23, RSS = 0.00429332, Par. = 1.07242 0.00652803 0.200515 1.32703
      It. 24, RSS = 0.00429331, Par. = 1.08218 0.00641308 0.206705 1.32677
      It. 25, RSS = 0.0042933, Par. = 1.09177 0.00630543 0.212903 1.32652
      It. 26, RSS = 0.0042933, Par. = 1.10119 0.00620424 0.219109 1.3263
      It. 27, RSS = 0.00429329, Par. = 1.11045 0.00610895 0.225322 1.32608
      It. 28, RSS = 0.00429329, Par. = 1.11957 0.00601986 0.231541 1.32588
      It. 29, RSS = 0.00429329, Par. = 1.12853 0.00593534 0.237766 1.3257
      It. 30, RSS = 0.00429329, Par. = 1.13736 0.00585565 0.243998 1.32552
      It. 31, RSS = 0.00429328, Par. = 1.14051 0.00583095 0.24629 1.32546
      It. 32, RSS = 0.00429328, Par. = 1.14122 0.00582543 0.2468 1.32545
      It. 33, RSS = 0.00429328, Par. = 1.14132 0.00582482 0.246876 1.32545
      It. 34, RSS = 0.00429328, Par. = 1.14154 0.00582306 0.247027 1.32544
      It. 35, RSS = 0.00429328, Par. = 1.14154 0.00582299 0.247029 1.32544
      It. 36, RSS = 0.00429328, Par. = 1.14154 0.00582298 0.247027 1.32544
      It. 37, RSS = 0.00429328, Par. = 1.14153 0.00582306 0.247024 1.32544
      It. 38, RSS = 0.00429328, Par. = 1.14153 0.00582306 0.247024 1.32544

      The RSS at your last iteration is 0.00429328, which is close to (and even slightly better than) the value from your grid/brute-force approach.

      Cheers, Andrej.

  6. Alvaro says:

    The RSS of the ‘nlsLM’ approach was only possible with prior knowledge of the initial parameter values, but generally I don’t know them. So my idea is to use the result of nls2 to help define starting values for the nlsLM fit; the RSS obtained was 0.00429328, converging in 23 iterations. My additional code is:

    mod <- nlsLM(theta_h ~ theta_r + (theta_s - theta_r)/(1 + (alpha*h)^n)^(1-1/n),
                 data = sample,
                 control = nls.lm.control(maxiter = 200, options(warn = -1)),
                 start = coef(est),
                 trace = TRUE)

    I got almost the same result, but defining the grid costs more CPU resources. The grid was generated from regular sequences within the constraints of each parameter. Is there another way to obtain good initial values more efficiently? Thanks for your attention.

    Cheers

    • anspiess says:

      I know from logistic models that you can obtain fairly good starting parameters by linearizing the function using log(f) or log(log(f)) and then using the slope of a linear regression of the logged data.
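      A minimal sketch of that idea, using an illustrative exponential model rather than your actual one (all data and names here are made up):

      library(minpack.lm)

      ## simulated exponential decay y = A * exp(k * x)
      set.seed(123)
      x <- 1:30
      y <- 5 * exp(-0.1 * x) + rnorm(30, sd = 0.05)

      ## linearize: log(y) = log(A) + k * x, then fit by ordinary least squares
      linFit <- lm(log(y) ~ x)
      startVals <- list(A = exp(unname(coef(linFit)[1])),
                        k = unname(coef(linFit)[2]))

      ## feed the linearized estimates to nlsLM as starting values
      nlsLM(y ~ A * exp(k * x), start = startVals)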

      Cheers,
      Andrej

      • Alvaro says:

        I’m having a very hard time when I use this example data:
        sample <- data.frame(h = c(60,102,337,683,1020,3060,5100,15300), theta_h = c(0.493,0.455,0.344,0.300,0.282,0.158,0.137,0.120))

        mod <- nlsLM(theta_h ~ theta_r + (theta_s - theta_r)/(1 + (alpha*h)^n)^(1-1/n),
                     data = sample,
                     control = nls.lm.control(maxiter = 200, options(warn = -1)),
                     start = c(theta_r = 0.065, theta_s = 0.41, alpha = 0.075, n = 1.89),
                     trace = TRUE)

        Error in nlsModel(formula, mf, start, wts) :
        singular gradient matrix at initial parameter estimates

        any idea what happened? any advice would be really really appreciated, thanks.

      • anspiess says:

        Hi Alvaro,

        your theta_r seems to be far off the optimal starting value. If I set theta_r = 1 in your call and refit, I get:

        Nonlinear regression model
        model: theta_h ~ theta_r + (theta_s - theta_r)/(1 + (alpha * h)^n)^(1 - 1/n)
        data: sample
        theta_r theta_s alpha n
        16.6720 0.8565 3.1954 0.9955
        residual sum-of-squares: 0.004202

        Cheers,
        Andrej

  7. charles says:

    Thank you. Seriously, this package saved me a huge amount of work.

  8. Evans Ochiaha says:

    After 2 months of struggle, I have finally found a solution. Thank you for this package.

  9. Orestis Kopsacheilis says:

    Hi, and thanks in advance for this very useful package indeed. I have been trying to simultaneously estimate two parameters, w and a, for each of a total of 40 subjects.
    Although in most cases the algorithm returns sensible values, there are a few exceptions. In those exceptions I noticed that the algorithm converges after more iterations than average and returns unrealistic values for these parameters, typically higher than normal. I also noticed that the improvement in terms of RSS reduction is marginal. For example, if I stop after the 4th iteration I get a rather sensible value for “a” (e.g. 1.5) and an RSS of 11.5. If I let it converge (e.g. until the 31st iteration), however, I get a small reduction in RSS (11.3 from 11.5) at the cost of an unreasonable value for “a” (e.g. 64!).
    What puzzles me even further is that when I check the standard errors, I notice a positive correlation between their size and the number of iterations, as if the s.e.’s accumulated over the iterations. So, for example, when I specify maxiter = 4 I get an s.e. of 5 for “a”, but when I specify maxiter = 6 I get an s.e. of 10. This happens despite the RSS decreasing. The same holds for w, so it’s not that the errors are allocated differently; both s.e.’s increase with more iterations.

    Any thoughts as to why this happens?

    • anspiess says:

      Hi Orestis,

      unfortunately nlsLM, like any NLS method, aims to minimize the RSS and not the parameter s.e.’s… 😉
      If you call vcov() on your model, do you see that the two parameters have a high covariance (off-diagonal values)?
      I often had this in asymmetric (5-parameter) sigmoidal models, where the asymmetry parameter has a high covariance with the slope parameter, and the converged values were often extremely unrealistic. The only way to avoid this is to set reasonable bounds on the parameters, using the ‘lower’ and ‘upper’ arguments of nlsLM. However, in most cases, if you set e.g. lower = c(-Inf, 10) and upper = c(Inf, 20) for w in [-Inf, Inf] and a in [10, 20], then the iteration will stop when a reaches either 10 or 20. At that point it will have the lowest RSS of all values within the bounds, but not the minimum over the complete parameter space. Can you define “sensible” boundary values?
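      As a minimal sketch of bounded fitting (the model, data and bounds here are illustrative, not your actual model):

      library(minpack.lm)

      ## illustrative two-parameter model: y = w * exp(a * x)
      set.seed(1)
      dat <- data.frame(x = seq(0, 1, length.out = 25))
      dat$y <- 2 * exp(1.5 * dat$x) + rnorm(25, sd = 0.1)

      ## constrain a to [1, 3] while leaving w unbounded; the bounds are
      ## given in the same order as the parameters in 'start'
      fit <- nlsLM(y ~ w * exp(a * x), data = dat,
                   start = list(w = 1, a = 2),
                   lower = c(-Inf, 1), upper = c(Inf, 3))
      coef(fit)
      vcov(fit)   ## inspect the off-diagonal covariance mentioned above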

      Cheers,
      Andrej

      • Orestis says:

        Hi Andrej and apologies for what appears to be (and is in fact) an extremely belated response. I have since found several ways to deal with this problem. Setting reasonable bounds is one of them. Thank you for your response.
