Linear regression with random error giving EXACT predefined parameter estimates

When simulating linear models from some defined slope/intercept with added Gaussian noise, the parameter estimates vary after least-squares fitting. Here is some code I developed that applies a double transformation to these models so as to obtain a fitted model with EXACTLY the predefined parameter estimates a (intercept) and b (slope).

It does so by:
1) Fitting linear model #1, $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$, to the x,y data.
2) Correcting y by the fitted slope $\hat\beta_1$: $Y_i' = Y_i \cdot (\mathrm{b}/\hat\beta_1)$.
3) Refitting linear model #2: $Y_i' = \beta_0 + \beta_1X_i + \varepsilon_i$.
4) Correcting y by the fitted intercept $\hat\beta_0$: $Y_i'' = Y_i' + (\mathrm{a} - \hat\beta_0)$.
5) Refitting linear model #3: $Y_i'' = \beta_0 + \beta_1X_i + \varepsilon_i$, which is the final model with parameter estimates exactly a and b.

Below is the code:
    exactLM <- function(
      x = 1:100,       ## predictor values
      b = 0.01,        ## slope
      a = 3,           ## intercept
      error = NULL,    ## homoscedastic error
      n = 1,           ## number of replicates
      weights = NULL,  ## possible weights, i.e. when heteroscedastic
      plot = TRUE,     ## plot data and regression
      ...              ## other parameters to 'lm'
    )
    {
      if (is.null(error)) stop("Please define some error!")

      ## create x- and y-values
      x <- rep(x, n)
      y <- a + b * x
      if (length(error) != length(x)) stop("'x' and 'error' must be of same length!")
      if (!is.null(weights) && length(weights) != length(x)) stop("'x' and 'weights' must be of same length!")

      ## add error
      y <- y + error

      ## create linear model #1
      LM1 <- lm(y ~ x, weights = weights, ...)
      COEF1 <- coef(LM1)

      ## correct slope and create linear model #2
      y <- y * (b/COEF1[2])
      LM2 <- lm(y ~ x, weights = weights, ...)
      COEF2 <- coef(LM2)

      ## correct intercept and create linear model #3
      y <- y + (a - COEF2[1])
      LM3 <- lm(y ~ x, weights = weights, ...)

      ## plot data and regression
      if (plot) {
        plot(x, y, pch = 16)
        abline(LM3, lwd = 2, col = "darkred")
      }

      return(list(model = LM3, x = x, y = y))
    }

Here are some applications using replicates and weighted fitting:
    ############ Examples #################
    ## n = 1
    exactLM(x = 1:100, a = 0.5, b = 0.2, error = rnorm(100, 0, 2))

    ## n = 3
    exactLM(x = 1:100, a = 0.5, b = 0.2, error = rnorm(300, 0, 2), n = 3)

    ## weighted by exact 1/var
    x <- 1:100
    error <- rnorm(100, 0, 0.1 * x)
    weights <- 1/(0.1 * x)^2
    exactLM(x = x, a = 0.5, b = 0.2, error = error, weights = weights)

    ## weighted by empirical 1/var
    x <- rep(1:100, 3)
    error <- rnorm(300, 0, 0.1 * x)
    weights <- rep(1/tapply(error, x, var), 3)
    exactLM(x = x, a = 0.5, b = 0.2, error = error, weights = weights)

I am curious about comments concerning simplification and, more importantly, applications (other than cheating data…)!

Cheers,
Andrej

8 Responses to Linear regression with random error giving EXACT predefined parameter estimates

1. jonocarroll says:

Given that the “unknown” exact values of a and b appear in your solving code, isn’t it just as valid to use lm(a + b*x ~ x) and forego the correcting process entirely?

2. Richard Perry says:

You are using the true parameters (a and b) to help you find the true parameters, therefore it’s not surprising that you end up obtaining the true parameters:

See these lines (it’s not surprising because you are using a and b)
y <- y * (b / COEF1[2])
y <- y + (a - COEF2[1])

This is similar to:
1. Think of a number
2. Add 2 to the number
3. Divide by 2
4. Minus 1
5. Multiply by 2
6. Now you should have your original number

• anspiess says:

Very true, but the point is to create a set of response y-values with random noise that, when fitted with lm(y ~ x), delivers a least-squares fit with exactly the slope/intercept I predefine.

Cheers,
Andrej

3. Richard Perry says:

OK, I see where you’re coming from now. I can imagine it might be useful for teaching where you want nice integer valued coefficients.

4. Glen_b says:

This is a lot more work than necessary. You can calculate residuals from a regression of noise on x and call that “noise 2”. If you add that new noise to a *noiseless* model (with the exact parameter values you want), you get data with the desired least squares fit.
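
A minimal sketch of this residual trick (the values a = 0.5 and b = 0.2 are assumed here only to match the examples above). Because least-squares residuals are orthogonal to the design matrix, regressing the fitted model's y on x recovers the predefined coefficients exactly:

    ## Glen_b's shortcut: residuals of noise regressed on x become "noise 2"
    set.seed(42)
    x     <- 1:100
    noise <- rnorm(100, 0, 2)
    noise2 <- resid(lm(noise ~ x))   ## orthogonal to both intercept and x

    ## add "noise 2" to the noiseless model with the exact parameters
    y <- 0.5 + 0.2 * x + noise2
    coef(lm(y ~ x))                  ## 0.5 and 0.2, up to floating point

Note that `noise2` keeps (almost all of) the spread of the original noise, so the scatter still looks like genuine Gaussian error.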

• anspiess says:

Yip, good catch! Any ideas for an application of this method, other than “cheating” y-values, i.e. when some data with defined slope (effect) is desired?

• Simon Wotherspoon says:

Most efficient to use qr on cbind(model.matrix(~x,…), rnorm(…)) and take the final column of Q for your errors. It's about 5 lines.
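
A sketch of this QR suggestion, under the same assumed example values a = 0.5, b = 0.2: the final column of Q is orthogonal to the preceding columns, i.e. to the design matrix, so it serves directly as an error vector that leaves the least-squares coefficients untouched.

    set.seed(42)
    x <- 1:100
    X <- model.matrix(~ x)            ## design matrix (intercept + x)
    Z <- cbind(X, rnorm(length(x)))   ## append a column of random noise
    e <- qr.Q(qr(Z))[, ncol(Z)]       ## final Q column: orthogonal to X
    y <- 0.5 + 0.2 * x + e
    coef(lm(y ~ x))                   ## 0.5 and 0.2, up to floating point

One caveat: this `e` has unit Euclidean norm, so scale it (e.g. `e * sigma * sqrt(length(x))`) if a particular error variance is wanted.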
