## A weighting function for ‘nls’ / ‘nlsLM’

Standard nonlinear regression assumes homoscedastic data, that is, all response values $y_i$ are distributed normally.  In case of heteroscedastic data (i.e. when the variance is dependent on the magnitude of the data), weighting the fit is essential. In nls (or nlsLM of the minpack.lm package), weighting can be conducted by two different methods:

1) by supplying a vector of weighting values $w_i$ for every $y_i$ that is supplied to the ‘weights’ argument.

2) by modifying the objective function in a way that it has the weights included in the body of the function. An example is given for this one in the documentation to nls.

1) is easy of course, but all $w_i$ values have to be derived from somewhere, for example the reciproce of the response values $w_i = 1/y_i$. 2) is a bit more cumbersome and one does not always want to define a new weighted version of the objective function.

I found it about time to design a weighting function that can be supplied to the ‘weights’ argument and that makes weighting by different regimes a breeze. wfct() returns a vector of weights that are calculated from a user-defined expression and transfers this vector within nls.

wfct <- function(expr)
{
expr <- deparse(substitute(expr))

## create new environment
newEnv <- new.env()

## get call
mc <- sys.calls()[]
mcL <- as.list(mc)

## get data and write to newEnv
DATA <- mcL[["data"]]
DATA <- eval(DATA)
DATA <- as.list(DATA)
NAMES <- names(DATA)
for (i in 1:length(DATA)) assign(NAMES[i], DATA[[i]], envir = newEnv)

## get parameter, response and predictor names
formula <- as.formula(mcL[])
VARS <- all.vars(formula)
RESP <- VARS
RHS <- VARS[-1]
PRED <- match(RHS, names(DATA))
PRED <- names(DATA)[na.omit(PRED)]

## calculate variances for response values if "error" is in expression
## and write to newEnv
if (length(grep("error", expr)) > 0) {
y <- DATA[[RESP]]
x <- DATA[[PRED]]
## test for replication
if (!any(duplicated(x))) stop("No replicates available to calculate error from!")
## calculate error
error <- tapply(y, x, function(e) var(e, na.rm = TRUE))
error <- as.numeric(sqrt(error))
## convert to original repititions
error <- rep(error, as.numeric(table(x)))
assign("error", error, envir = newEnv)
}

## calculate fitted or residual values if "fitted"/"resid" is in expression
## and write to newEnv
if (length(grep("fitted", expr)) > 0 || length(grep("resid", expr)) > 0) {
mc2 <- mc
mc2$weights <- NULL MODEL <- eval(mc2) fitted <- fitted(MODEL) resid <- residuals(MODEL) assign("fitted", fitted, newEnv) assign("resid", resid, newEnv) } ## return evaluation in newEnv: vector of weights OUT <- eval(parse(text = expr), envir = newEnv) return(OUT) }  The weighting function can take 5 different variable definitions and combinations thereof: the name of the predictor (independent) variable, the name of the response (dependent) variable, error: if replicates $y_{ij}$ exist, the error $\sigma(y_{ij})$, fitted: the fitted values $\hat{y}_i$ of the model, resid: the residuals $y_i - \hat{y}_i$ of the model, For the last two, the model is fit unweighted, fitted values and residuals are extracted and the model is refit by the defined weights. Let’s have some examples (taken from the nls doc): Treated <- Puromycin[Puromycin$state == "treated", ]

Weighting by inverse of response $1/y_i$:

nls(rate ~ Vm * conc/(K + conc), data = Treated,
start = c(Vm = 200, K = 0.05), weights = wfct(1/rate))
Nonlinear regression model
model:  rate ~ Vm * conc/(K + conc)
data:  Treated
Vm         K
209.59669   0.06065
weighted residual sum-of-squares: 12.27

Number of iterations to convergence: 5
Achieved convergence tolerance: 5.2e-06


Weighting by square root of predictor $\sqrt{x_i}$:

nls(rate ~ Vm * conc/(K + conc), data = Treated,
start = c(Vm = 200, K = 0.05), weights = wfct(sqrt(conc)))
Nonlinear regression model
model:  rate ~ Vm * conc/(K + conc)
data:  Treated
Vm         K
216.74134   0.07106
weighted residual sum-of-squares: 332.6

Number of iterations to convergence: 5
Achieved convergence tolerance: 2.403e-06


Weighting by inverse square of fitted values $1/\hat{y_i}^2$:

nls(rate ~ Vm * conc/(K + conc), data = Treated,
start = c(Vm = 200, K = 0.05), weights = wfct(1/fitted^2))
Nonlinear regression model
model:  rate ~ Vm * conc/(K + conc)
data:  Treated
Vm         K
200.84098   0.04943
weighted residual sum-of-squares: 0.2272

Number of iterations to convergence: 4
Achieved convergence tolerance: 1.47e-06


A common weighting regime is to use the inverse variance $1/\sigma^2(y_i)$:

### 6 Responses to A weighting function for ‘nls’ / ‘nlsLM’

1. ikuyasu says:

Great article! It is just a little bit difficult to read inline math symbols. Maybe the font is too small, so the line of letters gets thin.

2. René Wuttke says:

Fantastic article/code. That saved me a lot of time. Thanks a lot.

3. s says:

Hi Andrej, I am trying to use the wfct function in a non-linear regression analysis. However, the cost function should be the sum of squared error (SEE) weighted by the uncertainty which gives you something like weights= wfct(sum(((fitted-observation)/Uncertainty)^2))). The uncertainty term is included in my dataset. I got an error which says variable lengths differ (found for ‘(weights)’)

4. sbesnard says:

Hi Andrej, I am trying to use the wfct function in a non-linear regression analysis. However, the cost function should be the sum of squared error (SEE) weighted by the uncertainty which gives you something like weights= wfct(sum(((fitted-observation)/Uncertainty)^2))). The uncertainty term is included in my dataset. I got an error which says variable lengths differ (found for ‘(weights)’). Any ideas why?

• anspiess says:

Hi, with wfct(sum(((fitted-observation)/Uncertainty)^2))) you are creating ONE value (the sum of all residuals weighted by uncertainty squared). That’s why the error is thrown (length not the same as predictor vector length). You need a weighting vector of length(predictor), so if you have uncertainty for all y_i, the correct would be

wfct(sum((fitted-observation)^2)/Uncertainty)

Cheers,
Andrej

5. YUNG-JIN LEE says:

I find the function wfct() only works R console mode using copy-and paste or type it line by line. If I include two lines:

Treated <- Puromycin[Puromycin\$state == "treated"]
nlsLM(rate ~ Vm * conc/(K + conc), data = Treated,
start = c(Vm = 200, K = 0.05), weights = wfct(1/fitted^2))

in a R script file, called nlsLM.wfct.R. Then I open R console (Windows 10) and type ‘source(“nlsLM.wfct.R”) to run it. I would get the error messages as “Error in DATA[[i]] : subscript out of bounds” Any clue about this? Thanks in advanced.