When simulating linear models based on some defined slope/intercept and added gaussian noise, the parameter estimates vary after least-squares fitting. Here is some code I developed that does a double transform of these models as to obtain a fitted model with EXACT defined parameter estimates `a`

(intercept) and `b`

(slope).

It does so by:

1) Fitting a linear model #1 to the `x,y`

data.

2) Correcting `y`

by : .

3) Refitting linear model #2: .

4) Correcting `y`

by : .

5) Refitting linear model #3: , which is the final model with parameter estimates `a`

and `b`

.

Below is the code:

exactLM <- function(

x = 1:100, ## predictor values

b = 0.01, ## slope

a = 3, ## intercept

error = NULL, ## homoscedastic error

n = 1, ## number of replicates

weights = NULL, ## possible weights, i.e. when heteroscedastic

plot = TRUE, ## plot data and regression

... ## other parameters to 'lm'

)

{

if (is.null(error)) stop("Please define some error!")

```
``` ## create x and y-values

x <- rep(x, n)

y <- a + b * x

if (!is.null(error) & length(error) != length(x)) stop("'x' and 'error' must be of same length!")

if (!is.null(weights) & length(weights) != length(x)) stop("'x' and 'weights' must be of same length!")

## add error

y <- y + error

## create linear model #1

LM1 <- lm(y ~ x, weights = weights, ...)

COEF1 <- coef(LM1)

## correct slope and create linear model #2

y <- y * (b/COEF1[2])

LM2 <- lm(y ~ x, weights = weights, ...)

COEF2 <- coef(LM2)

## correct intercept and create linear model #3

y <- y + (a - COEF2[1])

LM3 <- lm(y ~ x, weights = weights, ...)

## plot data and regression

plot(x, y, pch = 16)

abline(LM3, lwd = 2, col = "darkred")

` return(list(model = LM3, x = x, y = y))`

}

Here are some applications using replicates and weighted fitting:

############ Examples #################

## n = 1

exactLM(x = 1:100, a = 0.5, b = 0.2, error = rnorm(100, 0, 2))

```
```## n = 3

exactLM(x = 1:100, a = 0.5, b = 0.2, error = rnorm(300, 0, 2), n = 3)

## weighted by exact 1/var

x <- 1:100

error <- rnorm(100, 0, 0.1 * x)

weights <- 1/(0.1 * x)^2

exactLM(x = x, a = 0.5, b = 0.2, error = error, weights = weights)

`## weighted by empirical 1/var`

x <- rep(1:100, 3)

error <- rnorm(300, 0, 0.1 * x)

weights <- rep(1/(tapply(error, x, mean)^2), 3)

exactLM(x = x, a = 0.5, b = 0.2, error = error, weights = weights)

I am curious on comments concerning simplification and more importantly, application (other than cheating data…)!

Cheers,

Andrej

Given that the “unknown” exact values of `a` and `b` appear in your solving code, isn’t it just as valid to use `lm(a + b*x ~ x)` and forego the correcting process entirely?

You are using the true parameters (a and b) to help you find the true parameters, therefore it’s not surprising that you end up obtaining the true parameters:

See these lines (it’s not surprising because you are using a and b)

y <- y * (b / COEF1[2])

y <- y + (a – COEF2[1])

This is similar to:

1. Think of a number

2. Add 2 to the number

3. Divide by 2

4. Minus 1

5. Multiply by 2

6. Now you should have your original number

Very true, but the point is to create a set of response y-values with random noise that when fitted with lm(y ~ x) deliver a least-squares fit with exact slope/intercept that I can predefine.

Cheers,

Andrej

OK, I see where you’re coming from now. I can imagine it might be useful for teaching where you want nice integer valued coefficients.

This is a lot more work than necessary. You can calculate residuals from a regression of noise on x and call that “noise 2”. If you add that new noise to a *noiseless* model (with the exact parameter values you want), you get data with the desired least squares fit.

Yip, good catch! Any ideas for an application of this method, other than “cheating” y-values, i.e. when some data with defined slope (effect) is desired?

Most efficient to use qr on chine( model.matrix(~x,…),rnorm(…)) and take the final column of Q for your errors. Is about 5 lines.

[…] Linear regression with random error giving EXACT predefined parameter estimates When simulating linear models based on some defined slope/intercept and added gaussian noise, the parameter estimates vary after least-squares fitting. Here is some code I developed that does a double transform of these models as to obtain a fitted model with EXACT defined parameter estimates a (intercept) and b (slope). […]