I’ll take my NLS with weights, please…

Today I want to advocate weighted nonlinear regression. Why so?
Minimum-variance estimation of the adjustable parameters in linear and non-linear least squares requires that the data be weighted inversely as their variances w_i \propto \sigma^{-2}. Only then \hat{\beta} is the BLUE (Best Linear Unbiased Estimator) for linear regression and nonlinear regression with small errors (http://en.wikipedia.org/wiki/Weighted_least_squares#Weighted_least_squares), an important fact frequently neglected, especially in scenarios with heteroscedasticity.
The variance of a fit s^2 is also characterized by the statistic \chi^2 defined as followed:
\chi^2 \equiv \sum_{i=1}^n \frac{(y_i - f(x_i))^2}{\sigma_i^2}
The relationship between s^2 and \chi^2 can be seen most easily by comparison with the reduced \chi^2: \chi_\nu^2 = \frac{\chi^2}{\nu} = \frac{s^2}{\langle \sigma_i^2 \rangle}
whereas \nu = degrees of freedom (N – p), and \langle \sigma_i^2 \rangle is the weighted average of the individual variances. If the fitting function is a good approximation to the parent function, the value of the reduced chi-square should be approximately unity, \chi_\nu^2 = 1. If the fitting function is not appropriate for describing the data, the deviations will be larger and the estimated variance will be too large, yielding a value greater than 1. A value less than 1 can be a consequence of the fact that there exists an uncertainty in the determination of s^2, and the observed values of \chi_\nu^2 will fluctuate from experiment to experiment. To assign significance to the \chi^2 value, one can use the integral probability P_\chi(\chi^2;\nu) = \int_{\chi^2}^\infty P_\chi(x^2, \nu)dx^2 which describes the probability that a random set of n data points sampled from the parent distribution would yield a value of \chi^2 equal to or greater than the calculated one. This can be calculated by 1 - pchisq(chi^2, nu) in R.

To see that this actually works, we can Monte Carlo simulate some heteroscedastic data with defined variance as a function of y-magnitude and compare unweighted and weighted NLS.
First we take the example from the documentation to nls and fit an enzyme kinetic model:

DNase1 <- subset(DNase, Run == 1)
fm3DNase1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
data = DNase1,
start = list(Asym = 3, xmid = 0, scal = 1))

Then we take the fitted values \hat{y} (which are duplicated because of the initial replicates), create a new unique dataset on which we create 20 response values y_i for each concentration x sampled from a normal distribution with 2% random heteroscedastic gaussian noise as a function of the value’s magnitude y_i = \hat{y} + \mathcal{N}(0, 0.02 \cdot \hat{y}):

FITTED <- unique(fitted(fm3DNase1))
DAT <- sapply(FITTED, function(x) rnorm(20, mean = x, sd = 0.02 * x))
matplot(t(DAT), type = "p", pch = 16, lty = 1, col = 1)
lines(FITTED, col = 2)

Now we create the new dataframe to be fitted. For this we have to stack the unique x– and y_i-values into a 2-column dataframe:

CONC <- unique(DNase1$conc)
fitDAT <- data.frame(conc = rep(CONC, each = 20), density = matrix(DAT))

First we create the unweighted fit:

FIT1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
data = fitDAT,
start = list(Asym = 3, xmid = 0, scal = 1))

Then we fit the data with weights w = 1/var(y). IMPORTANT: we need to replicate the weight values by 20 in order to match the data length.

VAR <- tapply(fitDAT$density, fitDAT$conc, var)
VAR <- rep(VAR, each = 20)
FIT2 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
data = fitDAT, weights = 1/VAR,
start = list(Asym = 3, xmid = 0, scal = 1))

For calculation of \chi^2_\nu and its corresponding p-value, we use the fitchisq function of my ‘qpcR’ package:

> fitchisq(FIT1)
[1] 191.7566
[1] 1.22138
[1] 0.03074883

> fitchisq(FIT2)
[1] 156.7153
[1] 0.9981866
[1] 0.4913983

Now we see the benefit of weighted fitting: Only the weighted model shows us with it’s reduced chi-square value of almost exactly 1 and its high p-value that our fitted model approximates the parent model. And of course it does, because we simulated our data from it…


6 Responses to I’ll take my NLS with weights, please…

  1. huftis says:

    Two small comments:

    Instead of messing with tapply and rep, just use the little-known but very useful ave function:

    VAR = ave(fitDAT$density, fitDAT$conc, FUN=var)

    I haven’t actually used the gnls function before, but it seems better to use this to estimate the variance at each point in the actual fitting procedure than to manually estimate them beforehand and then treating the estimates as real/known parameters. Example:

    FIT3 <- gnls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
    data = fitDAT, weights = varIdent(form=~1|conc),
    start = list(Asym = 3, xmid = 0, scal = 1))

    But unfortunately fitchisq() doesn’t work with gnls objects.

    • anspiess says:

      Hi Karl,
      thanks for making me aware of ‘ave’, probably heavily underused because the name is not very intuitive 😉
      If you can tell me how to extract the ‘gnls’ calculated error estimates, a reduced chi-square could also be calculated from ‘gnls’ objects. There is the FIT3$apVar item (covariance matrix) however the diagonal does not even nearly match my calculated variance estimates. Any clue? Thanks in advance!

      • huftis says:

        How to extract the error estimates is not very intuitive, but here’s how to do it:

        First, let’s just check that the estimates from the summary() function agrees (approximately) with the ones calculated manually:

        SD <- tapply(fitDAT$density, fitDAT$conc, FUN=sd)

        (Look at the last line under heading Variance function:.)

        These are the relative error estimates (i.e., scaled so that the first value is 1). To extract the absolute error estimates, we can use:

        FIT3$sigma * coef(FIT3$modelStruct$varStruct, unconstrained=FALSE, allCoef=TRUE)

        The values are again very close to the values in the SD vector.

        Note that it’s possible to defined other variance functions, e.g., for letting the variance increase as a power of the mean. See ?varClasses for an overview.

  2. anspiess says:

    thanks for the description! If I use the ‘gnls’-derived variances to calculate reduced chi-square I get exactly 1, which surely comes from the estimation of the variances during the fitting procedure which minimizes residual-sum-squares so it must be 1. However, when I use my empirical variances in VAR on FIT3 I get 1.05 which is a bit inferior to FIT2. Or is a comparison this way out of line?

    • huftis says:

      I’m not sure I understand the question. The empirical variances and the variances from gnls are almost exactly the same (and the latter ones will of course change a bit too if you make some adjustments to the control argument – like for all optimisation problems), so in practice there is no problem.

      But the gnls estimates are in theory different (and if the model is correct, better) than the empirical ones. The empirical variances estimates the average square differences from the mean at each concentration, while the gnls estimates the average square differences from model mean (estimated by the fitted values). If the model is correct, the latter will be better estimators (since they always use all</em the data instead of only the data for the given concentration).

      If the model is wrong, the empirical variances might very well be better (but then the estimates means will be wrong, so what’s the use of the model?). Actually, if the model is wrong but not too wrong, I guess it’s still more useful to use the gnls estimates, as these measure the deviations from the fitted/predicted values (instead of from the ‘real’ values), which is what you’re really interested in.

      BTW, for this dataset, where we know the correct form of the variance function (it’s proportional to the mean), we could have just used varPower():

      FIT4 <- gnls(density ~ Asym/(1 + exp((xmid – log(conc))/scal)),
      data = fitDAT, weights = varPower(),
      start = list(Asym = 3, xmid = 0, scal = 1))

      The estimate of the coefficient 0.02 (from ‘sd = 0.02 * x’) is now listed as the ‘Residual standard error’.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: