Linear regression with random error giving EXACT predefined parameter estimates

When simulating linear models based on some defined slope/intercept and added gaussian noise, the parameter estimates vary after least-squares fitting. Here is some code I developed that does a double transform of these models as to obtain a fitted model with EXACT defined parameter estimates a (intercept) and b (slope).

It does so by:
1) Fitting a linear model #1 Y_i = \beta_0 + \beta_1X_i + \varepsilon_i to the x,y data.
2) Correcting y by \beta_1: Y_i = Y_i \cdot (\mathrm{b}/\beta_1).
3) Refitting linear model #2: Y_i = \beta_0 + \beta_1X_i + \varepsilon_i.
4) Correcting y by \beta_0: Y_i = Y_i + (\mathrm{a} - \beta_0).
5) Refitting linear model #3: Y_i = \beta_0 + \beta_1X_i + \varepsilon_i, which is the final model with parameter estimates a and b.

Below is the code:

exactLM <- function(
x = 1:100, ## predictor values
b = 0.01, ## slope
a = 3, ## intercept
error = NULL, ## homoscedastic error
n = 1, ## number of replicates
weights = NULL, ## possible weights, i.e. when heteroscedastic
plot = TRUE, ## plot data and regression
... ## other parameters to 'lm'
)
{
if (is.null(error)) stop("Please define some error!")

## create x and y-values
x <- rep(x, n)
y <- a + b * x
if (!is.null(error) & length(error) != length(x)) stop("'x' and 'error' must be of same length!")
if (!is.null(weights) & length(weights) != length(x)) stop("'x' and 'weights' must be of same length!")

## add error
y <- y + error

## create linear model #1
LM1 <- lm(y ~ x, weights = weights, ...)
COEF1 <- coef(LM1)

## correct slope and create linear model #2
y <- y * (b/COEF1[2])
LM2 <- lm(y ~ x, weights = weights, ...)
COEF2 <- coef(LM2)

## correct intercept and create linear model #3
y <- y + (a - COEF2[1])
LM3 <- lm(y ~ x, weights = weights, ...)

## plot data and regression
plot(x, y, pch = 16)
abline(LM3, lwd = 2, col = "darkred")

return(list(model = LM3, x = x, y = y))
}

Here are some applications using replicates and weighted fitting:

############ Examples #################
## n = 1
exactLM(x = 1:100, a = 0.5, b = 0.2, error = rnorm(100, 0, 2))

## n = 3
exactLM(x = 1:100, a = 0.5, b = 0.2, error = rnorm(300, 0, 2), n = 3)

## weighted by exact 1/var
x <- 1:100
error <- rnorm(100, 0, 0.1 * x)
weights <- 1/(0.1 * x)^2
exactLM(x = x, a = 0.5, b = 0.2, error = error, weights = weights)

## weighted by empirical 1/var
x <- rep(1:100, 3)
error <- rnorm(300, 0, 0.1 * x)
weights <- rep(1/(tapply(error, x, mean)^2), 3)
exactLM(x = x, a = 0.5, b = 0.2, error = error, weights = weights)

I am curious on comments concerning simplification and more importantly, application (other than cheating data…)!

Cheers,
Andrej

Advertisements

6 Responses to Linear regression with random error giving EXACT predefined parameter estimates

  1. jonocarroll says:

    Given that the “unknown” exact values of `a` and `b` appear in your solving code, isn’t it just as valid to use `lm(a + b*x ~ x)` and forego the correcting process entirely?

  2. You are using the true parameters (a and b) to help you find the true parameters, therefore it’s not surprising that you end up obtaining the true parameters:

    See these lines (it’s not surprising because you are using a and b)
    y <- y * (b / COEF1[2])
    y <- y + (a – COEF2[1])

    This is similar to:
    1. Think of a number
    2. Add 2 to the number
    3. Divide by 2
    4. Minus 1
    5. Multiply by 2
    6. Now you should have your original number

    • anspiess says:

      Very true, but the point is to create a set of response y-values with random noise that when fitted with lm(y ~ x) deliver a least-squares fit with exact slope/intercept that I can predefine.

      Cheers,
      Andrej

  3. OK, I see where you’re coming from now. I can imagine it might be useful for teaching where you want nice integer valued coefficients.

  4. Glen_b says:

    This is a lot more work than necessary. You can calculate residuals from a regression of noise on x and call that “noise 2”. If you add that new noise to a *noiseless* model (with the exact parameter values you want), you get data with the desired least squares fit.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: