Linear regression with random error giving EXACT predefined parameter estimates

January 26, 2016

When simulating linear models based on some defined slope/intercept and added gaussian noise, the parameter estimates vary after least-squares fitting. Here is some code I developed that does a double transform of these models as to obtain a fitted model with EXACT defined parameter estimates a (intercept) and b (slope).

It does so by:
1) Fitting a linear model #1 Y_i = \beta_0 + \beta_1X_i + \varepsilon_i to the x,y data.
2) Correcting y by \beta_1: Y_i = Y_i \cdot (\mathrm{b}/\beta_1).
3) Refitting linear model #2: Y_i = \beta_0 + \beta_1X_i + \varepsilon_i.
4) Correcting y by \beta_0: Y_i = Y_i + (\mathrm{a} - \beta_0).
5) Refitting linear model #3: Y_i = \beta_0 + \beta_1X_i + \varepsilon_i, which is the final model with parameter estimates a and b.

Below is the code:

exactLM <- function(
x = 1:100, ## predictor values
b = 0.01, ## slope
a = 3, ## intercept
error = NULL, ## homoscedastic error
n = 1, ## number of replicates
weights = NULL, ## possible weights, i.e. when heteroscedastic
plot = TRUE, ## plot data and regression
... ## other parameters to 'lm'
)
{
if (is.null(error)) stop("Please define some error!")

## create x and y-values
x <- rep(x, n)
y <- a + b * x
if (!is.null(error) & length(error) != length(x)) stop("'x' and 'error' must be of same length!")
if (!is.null(weights) & length(weights) != length(x)) stop("'x' and 'weights' must be of same length!")

## add error
y <- y + error

## create linear model #1
LM1 <- lm(y ~ x, weights = weights, ...)
COEF1 <- coef(LM1)

## correct slope and create linear model #2
y <- y * (b/COEF1[2])
LM2 <- lm(y ~ x, weights = weights, ...)
COEF2 <- coef(LM2)

## correct intercept and create linear model #3
y <- y + (a - COEF2[1])
LM3 <- lm(y ~ x, weights = weights, ...)

## plot data and regression
plot(x, y, pch = 16)
abline(LM3, lwd = 2, col = "darkred")

return(list(model = LM3, x = x, y = y))
}

Here are some applications using replicates and weighted fitting:

############ Examples #################
## n = 1
exactLM(x = 1:100, a = 0.5, b = 0.2, error = rnorm(100, 0, 2))

## n = 3
exactLM(x = 1:100, a = 0.5, b = 0.2, error = rnorm(300, 0, 2), n = 3)

## weighted by exact 1/var
x <- 1:100
error <- rnorm(100, 0, 0.1 * x)
weights <- 1/(0.1 * x)^2
exactLM(x = x, a = 0.5, b = 0.2, error = error, weights = weights)

## weighted by empirical 1/var
x <- rep(1:100, 3)
error <- rnorm(300, 0, 0.1 * x)
weights <- rep(1/(tapply(error, x, mean)^2), 3)
exactLM(x = x, a = 0.5, b = 0.2, error = error, weights = weights)

I am curious on comments concerning simplification and more importantly, application (other than cheating data…)!

Cheers,
Andrej

Advertisements