## Linear regression with random error giving EXACT predefined parameter estimates

When simulating linear models with a predefined slope/intercept and added Gaussian noise, the parameter estimates vary after least-squares fitting. Here is some code I developed that applies a double transformation to these models so as to obtain a fitted model with EXACT predefined parameter estimates a (intercept) and b (slope).

It does so by:
1) Fitting linear model #1, $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$, to the x,y data.
2) Correcting y by the estimated slope $\hat{\beta}_1$: $Y_i = Y_i \cdot (\mathrm{b}/\hat{\beta}_1)$.
3) Refitting linear model #2: $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$.
4) Correcting y by the estimated intercept $\hat{\beta}_0$: $Y_i = Y_i + (\mathrm{a} - \hat{\beta}_0)$.
5) Refitting linear model #3: $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$, which is the final model with parameter estimates exactly a and b.

Below is the code:
    exactLM <- function(
      x = 1:100,        ## predictor values
      b = 0.01,         ## slope
      a = 3,            ## intercept
      error = NULL,     ## homoscedastic error
      n = 1,            ## number of replicates
      weights = NULL,   ## possible weights, i.e. when heteroscedastic
      plot = TRUE,      ## plot data and regression
      ...               ## other parameters to 'lm'
    )
    {
      if (is.null(error)) stop("Please define some error!")

      ## create x- and y-values
      x <- rep(x, n)
      y <- a + b * x
      if (length(error) != length(x)) stop("'x' and 'error' must be of same length!")
      if (!is.null(weights) && length(weights) != length(x)) stop("'x' and 'weights' must be of same length!")

      ## add error
      y <- y + error

      ## create linear model #1
      LM1 <- lm(y ~ x, weights = weights, ...)
      COEF1 <- coef(LM1)

      ## correct slope and create linear model #2
      y <- y * (b / COEF1[2])
      LM2 <- lm(y ~ x, weights = weights, ...)
      COEF2 <- coef(LM2)

      ## correct intercept and create linear model #3
      y <- y + (a - COEF2[1])
      LM3 <- lm(y ~ x, weights = weights, ...)

      ## plot data and regression
      if (plot) {
        plot(x, y, pch = 16)
        abline(LM3, lwd = 2, col = "darkred")
      }

      return(list(model = LM3, x = x, y = y))
    }

Here are some applications using replicates and weighted fitting:
    ############ Examples #################

    ## n = 1
    exactLM(x = 1:100, a = 0.5, b = 0.2, error = rnorm(100, 0, 2))

    ## n = 3
    exactLM(x = 1:100, a = 0.5, b = 0.2, error = rnorm(300, 0, 2), n = 3)

    ## weighted by exact 1/var
    x <- 1:100
    error <- rnorm(100, 0, 0.1 * x)
    weights <- 1/(0.1 * x)^2
    exactLM(x = x, a = 0.5, b = 0.2, error = error, weights = weights)

    ## weighted by empirical 1/var
    x <- rep(1:100, 3)
    error <- rnorm(300, 0, 0.1 * x)
    weights <- rep(1/(tapply(error, x, mean)^2), 3)
    exactLM(x = x, a = 0.5, b = 0.2, error = error, weights = weights)

I am curious about comments concerning simplification and, more importantly, applications (other than cheating data…)!

Cheers,
Andrej

### 8 Responses to Linear regression with random error giving EXACT predefined parameter estimates

1. jonocarroll says:

Given that the “unknown” exact values of a and b appear in your solving code, isn’t it just as valid to use lm(a + b*x ~ x) and forego the correcting process entirely?

2. You are using the true parameters (a and b) to help you find the true parameters, therefore it’s not surprising that you end up obtaining the true parameters:

See these lines (no surprise, since the target values a and b enter directly):
y <- y * (b / COEF1[2])
y <- y + (a - COEF2[1])

This is similar to:
1. Think of a number
2. Add 2 to the number
3. Divide by 2
4. Minus 1
5. Multiply by 2
6. Now you should have your original number
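In R terms, the circularity is easy to verify: scaling y by a constant scales the fitted OLS slope by the same constant, which is exactly why multiplying by b/COEF1[2] forces the slope to land on b. A minimal sketch with made-up numbers:

```r
## OLS slope scales linearly with y, so y * (b / slope) pins the slope to b
set.seed(1)
x <- 1:100
y <- 3 + 0.3 * x + rnorm(100)          ## arbitrary noisy data
b1 <- coef(lm(y ~ x))[2]               ## original slope estimate
b2 <- coef(lm(I(2 * y) ~ x))[2]        ## slope after doubling y
all.equal(unname(b2), unname(2 * b1))  ## TRUE: slope doubled as well
```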

• anspiess says:

Very true, but the point is to create a set of response y-values with random noise that when fitted with lm(y ~ x) deliver a least-squares fit with exact slope/intercept that I can predefine.

Cheers,
Andrej

3. OK, I see where you’re coming from now. I can imagine it might be useful for teaching where you want nice integer valued coefficients.

4. Glen_b says:

This is a lot more work than necessary. You can calculate residuals from a regression of noise on x and call that “noise 2”. If you add that new noise to a *noiseless* model (with the exact parameter values you want), you get data with the desired least squares fit.
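Glen_b's shortcut can be sketched in a few lines (illustrative values): the residuals of noise regressed on x are orthogonal to both the intercept column and x, so adding them to the noiseless line leaves the least-squares coefficients untouched.

```r
## Residualize the noise against the design, then add it to a noiseless line
set.seed(1)
x <- 1:100
a <- 0.5                                   ## desired intercept
b <- 0.2                                   ## desired slope
noise2 <- resid(lm(rnorm(100, 0, 2) ~ x))  ## "noise 2": orthogonal to (1, x)
y <- a + b * x + noise2                    ## noiseless model + new noise
coef(lm(y ~ x))                            ## exactly a and b (up to rounding)
```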

• anspiess says:

Yip, good catch! Any ideas for an application of this method, other than “cheating” y-values, i.e. when some data with defined slope (effect) is desired?

• Simon Wotherspoon says:

Most efficient is to use qr on cbind(model.matrix(~x, …), rnorm(…)) and take the final column of Q for your errors. It's about 5 lines.
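The QR idea can be sketched as follows (illustrative values): the last column of Q from a QR decomposition of the design matrix bound to a random vector is orthogonal to the design, so it serves directly as an error vector.

```r
## QR-based error vector orthogonal to the design matrix
set.seed(1)
x <- 1:100
a <- 0.5; b <- 0.2                     ## desired coefficients
X <- model.matrix(~ x)                 ## design: intercept + x
Q <- qr.Q(qr(cbind(X, rnorm(100))))    ## orthonormal basis; last column is orthogonal to X
e <- 10 * Q[, ncol(Q)]                 ## rescale the unit-norm column at will
y <- a + b * x + e
coef(lm(y ~ x))                        ## exactly a and b
```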
