Error propagation based on interval arithmetic

September 27, 2014

I added an interval function to my ‘propagate’ package (now on CRAN) that conducts error propagation based on interval arithmetic. It calculates the uncertainty of a model by using interval arithmetic based on (what I call) a “combinatorial sequence grid evaluation” approach, thereby avoiding the classical dependency problem that often inflates the result interval.
This is how it works:
For two variables $x, y$ with intervals $[x_1, x_2]$ and $[y_1, y_2]$, the four basic arithmetic operations $\langle \mathrm{op} \rangle \in \{+, -, \cdot, /\}$ are
$[x_1, x_2] \,\langle \mathrm{op} \rangle\, [y_1, y_2] = \left[ \min(x_1 \langle \mathrm{op} \rangle y_1, x_1 \langle \mathrm{op} \rangle y_2, x_2 \langle \mathrm{op} \rangle y_1, x_2 \langle \mathrm{op} \rangle y_2),\; \max(x_1 \langle \mathrm{op} \rangle y_1, x_1 \langle \mathrm{op} \rangle y_2, x_2 \langle \mathrm{op} \rangle y_1, x_2 \langle \mathrm{op} \rangle y_2) \right]$
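
To make the min/max-over-endpoints rule concrete, here is a minimal base-R sketch (int_op is a hypothetical helper for illustration, not the ‘propagate’ internals):

## evaluate [x1, x2] <op> [y1, y2] by taking min/max over
## all four endpoint combinations
int_op <- function(x, y, op) {
  vals <- c(op(x[1], y[1]), op(x[1], y[2]),
            op(x[2], y[1]), op(x[2], y[2]))
  c(min(vals), max(vals))
}
int_op(c(1, 2), c(-1, 1), `*`)   # => [-2, 2]
## note: division requires that [y1, y2] does not contain zero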

So for a function $f([x_1, x_2], [y_1, y_2], [z_1, z_2], \ldots)$ with $k$ variables, we have to create all $2^k$ combinations $C_i$ of the interval endpoints $\{x_1, x_2\}, \{y_1, y_2\}, \{z_1, z_2\}, \ldots$ (one endpoint per variable), evaluate their function values $R_i = f(C_i)$ and select $R = [\min R_i, \max R_i]$.
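
In base R, this combinatorial endpoint evaluation can be sketched with expand.grid (a simplified stand-in for what interval does with seq = 2; f is an arbitrary example function):

f <- function(x, y) x * y + x                  # example function
grid <- expand.grid(x = c(1, 2), y = c(-1, 1)) # all 2^k endpoint combinations
Ri <- with(grid, f(x, y))                      # evaluate f on each combination
c(min(Ri), max(Ri))                            # => R = [0, 4]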
The so-called dependency problem is a major obstacle to the application of interval arithmetic and arises when the same variable occurs in several terms of a complicated, often nonlinear function. In these cases, over-estimation can cover a range that is significantly too large, i.e. $\min R_i \ll \min f(x, y, z, \ldots)$ and $\max R_i \gg \max f(x, y, z, \ldots)$. For an example, see, e.g., the Wikipedia article on interval arithmetic under “Dependency problem”. A partial solution to this problem is to refine $R_i$ by dividing $[x_1, x_2]$ into $i$ smaller subranges, giving the sequence $(x_1, x_{1.1}, x_{1.2}, \ldots, x_{1.i}, \ldots, x_2)$. Again, all combinations are evaluated as described above, resulting in a larger number of $R_i$ in which $\min R_i$ and $\max R_i$ may be closer to $\min f(x, y, z, \ldots)$ and $\max f(x, y, z, \ldots)$, respectively. This is the “combinatorial sequence grid evaluation” approach, which works quite well in scenarios where monotonicity changes direction, obviating the need to create multivariate derivatives (Hessians) or use a multivariate minimization algorithm; a simplified sketch follows below.
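
Here is that simplified sketch of the refinement, again with expand.grid (an illustration only, not the actual ‘propagate’ code): each interval’s endpoints are replaced by a finer sequence of subdivision points before building the combinatorial grid.

f <- function(x, y) x^2 - x + y
grid <- expand.grid(x = seq(-2, 1, length.out = 50),   # subdivided [x1, x2]
                    y = seq(0, 1, length.out = 50))    # subdivided [y1, y2]
Ri <- with(grid, f(x, y))
c(min(Ri), max(Ri))   # approaches the true range [-0.25, 7] as the grid gets finer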
If the interval is of the type $[x_1 < 0, x_2 > 0]$, a zero is inserted into the middle of the sequence to avoid wrong results in the case of even powers, i.e. $[-1, 1]^2 = [-1, 1] \cdot [-1, 1] = [-1, 1]$, when the correct interval is actually $[0, 1]$, as exemplified by curve(x^2, -1, 1). Some examples to illustrate:

## Example 2: A complicated nonlinear model.
## Reduce sequence length to 2 => original interval
## for quicker evaluation.
EXPR2 <- expression(C * sqrt((520 * H * P)/(M * (t + 460))))
H <- c(64, 65)
M <- c(16, 16.2)
P <- c(361, 365)
t <- c(165, 170)
C <- c(38.4, 38.5)
DAT2 <- makeDat(EXPR2)
interval(DAT2, EXPR2, seq = 2)
[1317.494, 1352.277]
## Example 5: Overestimation from dependency problem.
## Original interval with seq = 2 => [1, 7]
EXPR5 <- expression(x^2 - x + 1)
x <- c(-2, 1)
DAT5 <- makeDat(EXPR5)
interval(DAT5, EXPR5, seq = 2)
[1, 7]
## Refine with large sequence => [0.75, 7]
interval(DAT5, EXPR5, seq = 100)
[0.7502296, 7]
## Tallies with the curve function:
curve(x^2 - x + 1, -2, 1)
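
The refined lower bound can also be checked against a standard one-dimensional minimizer:

## the true minimum of x^2 - x + 1 on [-2, 1] lies at x = 0.5, f(0.5) = 0.75
optimize(function(x) x^2 - x + 1, interval = c(-2, 1))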

Have fun!
Cheers,
-ans

I’ll take my NLS with weights, please…

January 13, 2014

Today I want to advocate weighted nonlinear regression. Why so?
Minimum-variance estimation of the adjustable parameters in linear and nonlinear least squares requires that the data be weighted inversely to their variances, $w_i \propto \sigma_i^{-2}$. Only then is $\hat{\beta}$ the BLUE (Best Linear Unbiased Estimator) for linear regression and for nonlinear regression with small errors (http://en.wikipedia.org/wiki/Weighted_least_squares#Weighted_least_squares), an important fact that is frequently neglected, especially in scenarios with heteroscedasticity.
The variance of a fit, $s^2$, is also characterized by the statistic $\chi^2$, defined as follows:
$\chi^2 \equiv \sum_{i=1}^n \frac{(y_i - f(x_i))^2}{\sigma_i^2}$
The relationship between $s^2$ and $\chi^2$ can be seen most easily by comparison with the reduced $\chi^2$: $\chi_\nu^2 = \frac{\chi^2}{\nu} = \frac{s^2}{\langle \sigma_i^2 \rangle}$
where $\nu = N - p$ is the number of degrees of freedom and $\langle \sigma_i^2 \rangle$ is the weighted average of the individual variances. If the fitting function is a good approximation to the parent function, the value of the reduced chi-square should be approximately unity, $\chi_\nu^2 \approx 1$. If the fitting function is not appropriate for describing the data, the deviations will be larger and the estimated variance will be too large, yielding a value greater than 1. A value less than 1 can be a consequence of the fact that there is an uncertainty in the determination of $s^2$, and the observed values of $\chi_\nu^2$ will fluctuate from experiment to experiment. To assign significance to the $\chi^2$ value, one can use the integral probability $P_\chi(\chi^2; \nu) = \int_{\chi^2}^\infty P_\chi(x^2, \nu)\,dx^2$, which describes the probability that a random set of $n$ data points sampled from the parent distribution would yield a value of $\chi^2$ equal to or greater than the calculated one. This can be calculated by 1 - pchisq(chi^2, nu) in R.
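
As a short sketch, these statistics can be computed by hand in R, assuming vectors y (observations), yhat (fitted values) and sigma2 (the individual variances) exist, with p the number of fitted parameters:

chi2     <- sum((y - yhat)^2 / sigma2)  # chi-square statistic
nu       <- length(y) - p               # degrees of freedom
chi2.red <- chi2 / nu                   # reduced chi-square, ideally ~1
p.value  <- 1 - pchisq(chi2, nu)        # P(chi^2 >= observed value)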

To see that this actually works, we can Monte Carlo simulate some heteroscedastic data with defined variance as a function of $y$-magnitude and compare unweighted and weighted NLS.
First we take the example from the documentation to nls and fit an enzyme kinetic model:
DNase1 <- subset(DNase, Run == 1)
fm3DNase1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
                 data = DNase1,
                 start = list(Asym = 3, xmid = 0, scal = 1))
Then we take the fitted values $\hat{y}$ (which are duplicated because of the initial replicates), create a new unique dataset and, for each concentration $x$, sample 20 response values $y_i$ from a normal distribution with 2% random heteroscedastic Gaussian noise as a function of the value’s magnitude, $y_i = \hat{y} + \mathcal{N}(0, 0.02 \cdot \hat{y})$:
FITTED <- unique(fitted(fm3DNase1))
DAT <- sapply(FITTED, function(x) rnorm(20, mean = x, sd = 0.02 * x))
matplot(t(DAT), type = "p", pch = 16, lty = 1, col = 1)
lines(FITTED, col = 2)

Now we create the new dataframe to be fitted. For this we have to stack the unique $x$- and $y_i$-values into a 2-column dataframe:
CONC <- unique(DNase1$conc)
fitDAT <- data.frame(conc = rep(CONC, each = 20), density = matrix(DAT))

First we create the unweighted fit:

FIT1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
            data = fitDAT,
            start = list(Asym = 3, xmid = 0, scal = 1))

Then we fit the data with weights $w = 1/\mathrm{var}(y)$. IMPORTANT: we need to replicate each weight value 20 times in order to match the data length.

VAR <- tapply(fitDAT$density, fitDAT$conc, var)
VAR <- rep(VAR, each = 20)
FIT2 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
            data = fitDAT, weights = 1/VAR,
            start = list(Asym = 3, xmid = 0, scal = 1))

For calculation of $\chi^2_\nu$ and its corresponding p-value, we use the fitchisq function of my ‘qpcR’ package:

library(qpcR)
fitchisq(FIT1)
$chi2
[1] 191.7566
$chi2.red
[1] 1.22138
$p.value
[1] 0.03074883