Introducing: Orthogonal Nonlinear Least-Squares Regression in R

January 18, 2015

With this post I want to introduce my newly bred ‘onls’ package which conducts Orthogonal Nonlinear Least-Squares Regression (ONLS):
http://cran.r-project.org/web/packages/onls/index.html.
Orthogonal nonlinear least squares (ONLS) is a somewhat infrequently applied, and perhaps overlooked, regression technique that comes into play when one encounters an "errors-in-variables" problem. While classical nonlinear least squares (NLS) aims to minimize the sum of squared vertical residuals, ONLS minimizes the sum of squared orthogonal residuals. The method is based on finding, for each data point (x_i, y_i), the point (x_{0i}, y_{0i}) on the fitted curve that minimizes the Euclidean distance \|D_i\| between them. There is a 25-year-old FORTRAN implementation of ONLS (ODRPACK, http://www.netlib.org/toms/869.zip), which has also been included in the ‘scipy’ package for Python (http://docs.scipy.org/doc/scipy-0.14.0/reference/odr.html). onls was developed to allow easy future tweaking of the algorithm in R. The results obtained from onls are essentially identical to those of the original implementation [1, 2]. It is based on an inner loop that uses optimize for each (x_i, y_i) to find \min \|D_i\| within some window [x_{i-w}, x_{i+w}], and an outer loop that optimizes the fit parameters with nls.lm from the ‘minpack.lm’ package. Sensible starting parameters for onls are obtained by a prior fit with standard nls, as the ONLS parameter values are usually fairly similar to those from NLS.
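To illustrate the inner-loop idea, here is a minimal sketch (not the actual onls internals; the helper orth_dist and the trial parameter values are made up for illustration): for a fixed set of parameters, optimize is used to find the point on the curve with minimal Euclidean distance to a data point.

# Minimal sketch of the inner-loop idea (NOT the onls internals):
# for fixed parameters, find the point (x0, f(x0)) on the curve that
# minimizes the Euclidean distance to a data point (xi, yi).
f <- function(x, Asym, xmid, scal) Asym/(1 + exp((xmid - log(x))/scal))

orth_dist <- function(xi, yi, lower, upper, ...) {
  dfun <- function(x0) (x0 - xi)^2 + (f(x0, ...) - yi)^2   # squared Euclidean distance
  opt <- optimize(dfun, interval = c(lower, upper))
  c(x0 = opt$minimum, dist = sqrt(opt$objective))
}

# hypothetical example: distance of the point (1, 0.5) to the curve for some trial parameters
orth_dist(xi = 1, yi = 0.5, lower = 0.01, upper = 13, Asym = 2.4, xmid = 1.6, scal = 1.1)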

There is a package vignette with more details in the “/onls/inst” folder, especially on what to do if fitting fails or if not all points are orthogonal. I will work through one example here, the famous DNase 1 dataset from the nls documentation, with 10% added error. The semantics are exactly as in nls, albeit with a (somewhat) different output:


> library(onls)
> DNase1 <- subset(DNase, Run == 1)
> DNase1$density <- sapply(DNase1$density, function(x) rnorm(1, x, 0.1 * x))
> mod1 <- onls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
+              data = DNase1, start = list(Asym = 3, xmid = 0, scal = 1))

Obtaining starting parameters from ordinary NLS...
Passed...
Relative error in the sum of squares is at most `ftol'.
Optimizing orthogonal NLS...
Passed... Relative error in the sum of squares is at most `ftol'.

The print.onls method gives, as in nls, the parameter values and the vertical residual sum-of-squares. However, the orthogonal residual sum-of-squares is also returned and, MOST IMPORTANTLY, information on how many points (x_{0i}, y_{0i}) are actually orthogonal to (x_i, y_i) after fitting:

> print(mod1)
Nonlinear orthogonal regression model
model: density ~ Asym/(1 + exp((xmid - log(conc))/scal))
data: DNase1
Asym xmid scal
2.422 1.568 1.099
vertical residual sum-of-squares: 0.2282
orthogonal residual sum-of-squares: 0.2234
PASSED: 16 out of 16 fitted points are orthogonal.

Number of iterations to convergence: 2
Achieved convergence tolerance: 1.49e-08

Checking all points for orthogonality is accomplished with the independent checking routine check_o, which calculates the angle between the slope \mathrm{m}_i of the tangent (obtained from the first derivative at (x_{0i}, y_{0i})) and the slope \mathrm{n}_i of the onls-minimized Euclidean distance between (x_{0i}, y_{0i}) and (x_i, y_i):

\tan(\alpha_i) = \left|\frac{\mathrm{m}_i - \mathrm{n}_i}{1 + \mathrm{m}_i \cdot \mathrm{n}_i}\right|, \quad \mathrm{m}_i = \left.\frac{df(x, \beta)}{dx}\right|_{x_{0i}}, \quad \mathrm{n}_i = \frac{y_i - y_{0i}}{x_i - x_{0i}}

\Rightarrow \alpha_i[^{\circ}] = \tan^{-1}\left(\left|\frac{\mathrm{m}_i - \mathrm{n}_i}{1 + \mathrm{m}_i \cdot \mathrm{n}_i}\right|\right) \cdot \frac{360}{2\pi}

which should be 90^{\circ} if the Euclidean distance has been minimized.
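As a rough sketch of what such a check might look like (hypothetical code, not the actual check_o implementation), one can compare a numerical tangent slope at x_{0i} with the slope of the connecting segment:

# Rough sketch of the orthogonality check (hypothetical, not the check_o code):
# angle between the numerical tangent slope at x0 and the slope of the
# segment connecting (x0, y0) with the data point (xi, yi).
f <- function(x, Asym, xmid, scal) Asym/(1 + exp((xmid - log(x))/scal))  # same logistic model as above

angle_deg <- function(xi, yi, x0, Asym, xmid, scal, h = 1e-6) {
  y0 <- f(x0, Asym, xmid, scal)
  m  <- (f(x0 + h, Asym, xmid, scal) - f(x0 - h, Asym, xmid, scal)) / (2 * h)  # tangent slope
  n  <- (yi - y0) / (xi - x0)                                                  # segment slope
  atan(abs((m - n) / (1 + m * n))) * 180 / pi    # should be close to 90 degrees
}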

When plotting an ONLS model with the plot.onls function, it is important to know that orthogonality is only evident with equal scaling of both axes:

> plot(mod1, xlim = c(0, 0.5), ylim = c(0, 0.5))

[Figure: plot of mod1 with equal axis scaling; the residuals are orthogonal to the fitted curve]

As with nls, all generics work:
print(mod1), plot(mod1), summary(mod1), predict(mod1, newdata = data.frame(conc = 6)), logLik(mod1), deviance(mod1), formula(mod1), weights(mod1), df.residual(mod1), fitted(mod1), residuals(mod1), vcov(mod1), coef(mod1), confint(mod1).
However, deviance and residuals deliver the vertical, standard NLS values. To calculate orthogonal deviance and obtain orthogonal residuals, use deviance_o and residuals_o.
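For example, to compare the vertical and orthogonal quantities for the model fitted above:

deviance(mod1)      # vertical residual sum-of-squares, as in nls
deviance_o(mod1)    # orthogonal residual sum-of-squares
residuals(mod1)     # vertical residuals
residuals_o(mod1)   # orthogonal residuals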

[1] Boggs PT, Donaldson JR, Byrd RH and Schnabel RB. ALGORITHM 676 ODRPACK: Software for Weighted Orthogonal Distance Regression. ACM Trans Math Soft (1989), 15: 348-364.
[2] Boggs PT, Byrd RH, Rogers JE and Schnabel RB. User's Reference Guide for ODRPACK Version 2.01. Software for Weighted Orthogonal Distance Regression. NISTIR (1992), 4834: 1-113.

Cheers,
Andrej


I’ll take my NLS with weights, please…

January 13, 2014

Today I want to advocate weighted nonlinear regression. Why so?
Minimum-variance estimation of the adjustable parameters in linear and nonlinear least squares requires that the data be weighted inversely to their variances, w_i \propto \sigma_i^{-2}. Only then is \hat{\beta} the BLUE (Best Linear Unbiased Estimator) for linear regression, and for nonlinear regression with small errors (http://en.wikipedia.org/wiki/Weighted_least_squares#Weighted_least_squares), an important fact that is frequently neglected, especially in scenarios with heteroscedasticity.
The variance of a fit, s^2, is also characterized by the statistic \chi^2, defined as follows:
\chi^2 \equiv \sum_{i=1}^n \frac{(y_i - f(x_i))^2}{\sigma_i^2}
The relationship between s^2 and \chi^2 is seen most easily by comparison with the reduced \chi^2: \chi_\nu^2 = \frac{\chi^2}{\nu} = \frac{s^2}{\langle \sigma_i^2 \rangle}
where \nu is the number of degrees of freedom (N - p) and \langle \sigma_i^2 \rangle is the weighted average of the individual variances. If the fitting function is a good approximation to the parent function, the value of the reduced chi-square should be approximately unity, \chi_\nu^2 \approx 1. If the fitting function is not appropriate for describing the data, the deviations will be larger and the estimated variance will be too large, yielding a value greater than 1. A value less than 1 can result from the uncertainty in the determination of s^2 itself, and the observed values of \chi_\nu^2 will fluctuate from experiment to experiment. To assign significance to the \chi^2 value, one can use the integral probability P_\chi(\chi^2; \nu) = \int_{\chi^2}^\infty P_\chi(x^2, \nu)\,dx^2, which describes the probability that a random set of n data points sampled from the parent distribution would yield a value of \chi^2 equal to or greater than the calculated one. In R, this can be calculated by 1 - pchisq(chi^2, nu).
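As a small sketch of this (assuming the per-point standard deviations \sigma_i are known; chisq_stats is a hypothetical helper, not part of any package), \chi^2, \chi_\nu^2 and the p-value could be computed from a fitted nls model like this:

# Hypothetical helper: chi-square statistics for an nls fit, given the
# per-point standard deviations sigma (assumed known).
chisq_stats <- function(fit, sigma) {
  chi2 <- sum((residuals(fit) / sigma)^2)   # sum of squared standardized residuals
  nu   <- df.residual(fit)                  # degrees of freedom, N - p
  list(chi2 = chi2, chi2.red = chi2 / nu, p.value = 1 - pchisq(chi2, nu))
}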

To see that this actually works, we can Monte Carlo simulate some heteroscedastic data with defined variance as a function of y-magnitude and compare unweighted and weighted NLS.
First we take the example from the nls documentation and fit a sigmoidal (logistic) model:

DNase1 <- subset(DNase, Run == 1)
fm3DNase1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
                 data = DNase1,
                 start = list(Asym = 3, xmid = 0, scal = 1))

Then we take the fitted values \hat{y} (which are duplicated because of the initial replicates), create a new unique dataset and, for each concentration x, sample 20 response values y_i from a normal distribution with 2% heteroscedastic Gaussian noise proportional to the magnitude of the value, y_i = \hat{y} + \mathcal{N}(0, 0.02 \cdot \hat{y}):

FITTED <- unique(fitted(fm3DNase1))
DAT <- sapply(FITTED, function(x) rnorm(20, mean = x, sd = 0.02 * x))
matplot(t(DAT), type = "p", pch = 16, lty = 1, col = 1)
lines(FITTED, col = 2)

[Figure: simulated heteroscedastic data (black points) with the fitted values overlaid in red]
Now we create the new data frame to be fitted. For this we have to stack the unique x-values and the y_i-values into a two-column data frame:

CONC <- unique(DNase1$conc)
fitDAT <- data.frame(conc = rep(CONC, each = 20), density = matrix(DAT))

First we create the unweighted fit:

FIT1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
            data = fitDAT,
            start = list(Asym = 3, xmid = 0, scal = 1))

Then we fit the data with weights w = 1/var(y). IMPORTANT: we need to replicate each weight value 20 times in order to match the length of the data.

VAR <- tapply(fitDAT$density, fitDAT$conc, var)
VAR <- rep(VAR, each = 20)
FIT2 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
            data = fitDAT, weights = 1/VAR,
            start = list(Asym = 3, xmid = 0, scal = 1))

For calculation of \chi^2_\nu and its corresponding p-value, we use the fitchisq function of my ‘qpcR’ package:

> library(qpcR)
> fitchisq(FIT1)
$chi2
[1] 191.7566
$chi2.red
[1] 1.22138
$p.value
[1] 0.03074883

> fitchisq(FIT2)
$chi2
[1] 156.7153
$chi2.red
[1] 0.9981866
$p.value
[1] 0.4913983

Now we see the benefit of weighted fitting: only the weighted model, with its reduced chi-square value of almost exactly 1 and its high p-value, tells us that our fitted model approximates the parent model. And of course it does, because we simulated the data from it…

Cheers,
Andrej