Weighted Least Squares (WLS) is an estimation technique used in regression settings where the error terms are heteroscedastic, i.e., have non-constant variance.

To get a better understanding of Weighted Least Squares, let's first see what Ordinary Least Squares is and how it differs from Weighted Least Squares.

## What is Ordinary Least Squares (OLS)?

Consider a simple linear regression model of the form

y_i = β0 + β1 x_i + ε_i

where

y_i is the dependent (outcome) variable,

x_i is the independent (predictor) variable,

β0 and β1 are the regression coefficients, and

ε_i is the random error or residual.

The goal is to find a line that best fits the relationship between the outcome variable y and the input variable x. With OLS, the linear regression model finds the line through these points such that the sum of the squares of the differences between the actual and predicted values is minimized.

i.e., to find β0 and β1 such that

Σ (y_i − β0 − β1 x_i)^2

is minimized.
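As a quick sanity check, the closed-form OLS estimates for a simple linear regression can be computed directly and compared against a library fit. A minimal NumPy sketch with made-up data (all values hypothetical):

```python
import numpy as np

# Toy data (hypothetical values, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# Closed-form OLS estimates for a simple linear regression:
# slope = S_xy / S_xx, intercept = mean(y) - slope * mean(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Cross-check against NumPy's least-squares polynomial fit
slope, intercept = np.polyfit(x, y, 1)
print(b0, b1)  # should match (intercept, slope)
```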

In such linear regression models, OLS assumes that the error terms, or residuals (the differences between actual and predicted values), are normally distributed with mean zero and constant variance. This constant-variance condition is called homoscedasticity.

If this assumption of homoscedasticity does not hold, the various inferences made with the model may not be valid.

To check for constant variance along the regression line, a plot of the residuals against the fitted values, together with a histogram of the residuals, can be used.

In an ideal case, with normally distributed error terms with mean zero and constant variance, the plots look like this.

From the above plots it is clearly seen that the error terms are evenly distributed on both sides of the zero reference line, suggesting that they are normally distributed with mean 0 and constant variance.

The histogram of the residuals is also roughly symmetric, supporting the normality assumption.

In some cases, the error terms may be heteroscedastic, i.e., the variance of the error terms may change as the predictor variable increases or decreases.

In those cases of non-constant variance, Weighted Least Squares (WLS) can be used to estimate the linear regression model.

Now let’s see in detail about WLS and how it differs from OLS.

## Weighted Least Squares

In a Weighted Least Squares model, instead of minimizing the residual sum of squares seen in Ordinary Least Squares,

Σ (y_i − β0 − β1 x_i)^2

it minimizes a weighted sum of squares,

Σ w_i (y_i − β0 − β1 x_i)^2

where w_i is the weight for the i-th observation.

The idea behind weighted least squares is that observations with higher weights influence the fit more: a residual of a given size is penalized more heavily when its observation carries a larger weight than when it carries a smaller one.

Note: OLS can be considered a special case of WLS with all weights w_i = 1.
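This special case is easy to verify numerically: solving the weighted normal equations with unit weights reproduces the ordinary least-squares fit. A minimal NumPy sketch with made-up data (all values hypothetical):

```python
import numpy as np

# Toy data (hypothetical, for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.3, 7.9, 10.2])

def wls_fit(x, y, w):
    """Solve the weighted normal equations (X'WX) b = X'Wy
    for a simple linear model with intercept."""
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# WLS with all weights equal to 1 ...
b_wls = wls_fit(x, y, np.ones_like(x))

# ... reproduces ordinary least squares
b_ols = np.polyfit(x, y, 1)[::-1]  # polyfit returns (slope, intercept)
print(np.allclose(b_wls, b_ols))  # True
```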

The weighted least squares estimates in this case are given as

β̂1 = Σ w_i (x_i − x̄_w)(y_i − ȳ_w) / Σ w_i (x_i − x̄_w)^2

β̂0 = ȳ_w − β̂1 x̄_w

where the weighted means are

x̄_w = Σ w_i x_i / Σ w_i and ȳ_w = Σ w_i y_i / Σ w_i.

Suppose we consider a model where the weights are taken as

w_i = 1/x_i^2

Then the residual sum of squares of the transformed model looks as below:

Σ (1/x_i^2)(y_i − β0 − β1 x_i)^2 = Σ (y_i/x_i − β0/x_i − β1)^2
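This equivalence between the weighted fit and the transformed model can be checked numerically: fitting WLS with weights 1/x^2 should give the same coefficients as an ordinary least-squares fit of y/x on 1/x and a constant. A minimal NumPy sketch with made-up data (all values hypothetical):

```python
import numpy as np

# Toy data (hypothetical, for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.5, 4.0, 6.5, 7.5, 10.5])
w = 1.0 / x**2  # weights w_i = 1/x_i^2

# Direct WLS: solve the weighted normal equations (X'WX) b = X'Wy
X = np.column_stack([np.ones_like(x), x])
b_wls = np.linalg.solve(X.T @ np.diag(w) @ X, X.T @ np.diag(w) @ y)

# Equivalent transformed OLS: regress y/x on [1/x, 1];
# the coefficient on 1/x recovers b0 and the "intercept" recovers b1
Xt = np.column_stack([1.0 / x, np.ones_like(x)])
b0_t, b1_t = np.linalg.lstsq(Xt, y / x, rcond=None)[0]

print(np.allclose(b_wls, [b0_t, b1_t]))  # True
```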

## Weighted Least Squares in R

To understand WLS better, let's implement it in R. Here we use the Computer Assisted Learning dataset, which contains records of students who completed computer-assisted lessons. The variables are:

- cost – the cost of used computer time (in cents)
- num.responses – the number of responses in completing the lesson

#### Downloading and exploring the dataset:

Let’s first download the dataset from the ‘HoRM’ package.

```r
install.packages('HoRM')
library(HoRM)
data(compasst)
attach(compasst)

> head(compasst, 6)
  num.responses cost
1            16   77
2            14   70
3            22   85
4            10   50
5            14   62
6            17   70
```

### Using the Ordinary Least Squares approach to predict the cost:

Let’s first use Ordinary Least Square in the lm function to predict the cost and visualize the results.

```r
learning.lm <- lm(cost ~ num.responses, data = compasst)
plot(compasst$num.responses, compasst$cost)
abline(learning.lm, col = 'red')
```

The scatter plot of residuals vs responses is

```r
plot(num.responses, learning.lm$residuals)
```

Clearly, from the above two plots there seems to be a linear relationship between the input and outcome variables, but the standard deviation of the residuals appears to increase with the response.

Also, the below histogram of residuals shows clear signs of a non-normally distributed error term.

```r
# plotting the histogram of residuals
hist(learning.lm$residuals, main = "histogram of residuals")
```

Hence, let's use WLS in the lm function instead.

### Using Weighted Least Squares to predict the cost:

As mentioned above, weighted least squares gives observations with higher weights more influence on the fit, while less reliable observations are given smaller weights.

Hence, weights inversely proportional to the error variance of the observations are normally used for better predictions. Possible weight choices include:

| Weights | Effect |
| --- | --- |
| No weights | Default; effectively equal weight for all observations, so high-cost or high-response observations dominate. |
| 1/cost | Nearly cancels out the influence of higher-cost observations. |
| 1/cost^2 | Gives proportionally more weight to smaller cost values. |
| 1/Response | Nearly cancels out the influence of observations with a higher number of responses. |
| 1/Response^2 | Gives proportionally more weight to smaller numbers of responses. |
| 1/RSD | Weighs values with small relative standard deviations more than those with large relative standard deviations. |
| 1/RSD^2 | Weighs values with small relative standard deviations much more than those with large relative standard deviations. |

So, in this case, since the standard deviation of the residuals is proportional to the response,

σ_i^2 ∝ Response_i^2

let's take the weights as

w_i = 1/Response_i^2

Using the above weights in the lm function gives the following results.

```r
w <- 1/(num.responses^2)

# predicting cost by using WLS in the lm function
learning.wlm <- lm(cost ~ num.responses, data = compasst, weights = w)

# results of learning.wlm
> summary(learning.wlm)

Call:
lm(formula = cost ~ num.responses, data = compasst, weights = w)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-0.3603 -0.2508 -0.0104  0.3052  0.3447 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    17.4530     4.8970   3.564  0.00515 ** 
num.responses   3.4100     0.3649   9.346 2.94e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2975 on 10 degrees of freedom
Multiple R-squared:  0.8973,    Adjusted R-squared:  0.887 
F-statistic: 87.34 on 1 and 10 DF,  p-value: 2.945e-06
```

Whereas the results of OLS look like this:

```r
# results of learning.lm
> summary(learning.lm)

Call:
lm(formula = cost ~ num.responses, data = compasst)

Residuals:
   Min     1Q Median     3Q    Max 
-6.389 -3.536 -0.334  3.319  6.418 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    19.4727     5.5162   3.530  0.00545 ** 
num.responses   3.2689     0.3651   8.955 4.33e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.598 on 10 degrees of freedom
Multiple R-squared:  0.8891,    Adjusted R-squared:  0.878 
F-statistic: 80.19 on 1 and 10 DF,  p-value: 4.33e-06
```

Comparing the residuals in the two cases, note that the weighted residuals from the WLS fit are much smaller than the residuals from the OLS model (keep in mind that weighted residuals are on a different scale, since each residual is multiplied by the square root of its weight).

```r
# OLS residuals
Residuals:
   Min     1Q Median     3Q    Max 
-6.389 -3.536 -0.334  3.319  6.418 

# WLS (weighted) residuals
Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-0.3603 -0.2508 -0.0104  0.3052  0.3447 
```

### Goodness of fit using R-squared:

Now let’s compare the R-Squared values in both the cases.

```r
> summary(learning.lm)$r.squared
[1] 0.8891177
> summary(learning.wlm)$r.squared
[1] 0.8972716
```

From the above R-squared values, it can be seen that adding weights to the lm model has slightly improved the model fit.

Now let’s implement the same example in Python.

## Weighted Least Squares in Python

Let’s now import the same dataset which contains records of students who had done computer assisted learning. The dataset can be found here.

```python
# importing libraries
import pandas as pd
import numpy as np
import statsmodels.api as sm  # for OLS, WLS
import seaborn as sns
import matplotlib.pyplot as plt

# importing the dataset
learning = pd.read_csv("learning.csv")
learning.head()

Out[1]:
   num.responses  cost
0             16    77
1             14    70
2             22    85
3             10    50
4             14    62
```

The goal here is to predict the cost which is the cost of used computer time given the num.responses which is the number of responses in completing the lesson.

Now let’s first use Ordinary Least Square method to predict the cost.

```python
# OLS
Y = learning.cost
X = learning["num.responses"]
# note: sm.OLS does not add an intercept automatically,
# so this fits a regression through the origin
learning_ols = sm.OLS(Y, X).fit()
```

Visualizing the results

```python
# cost vs num.responses
Y_pred = learning_ols.predict(X)
plt.scatter(X, Y)
plt.xlabel("num.responses")
plt.ylabel("cost")
plt.plot(X, Y_pred, color='red')
plt.show()
```

The above scatter plot shows a linear relationship between cost and number of responses. Now let's plot the residuals to check for constant variance (homoscedasticity).

```python
# residual plot
sns.residplot(x=X, y=Y)
```

The above residual plot shows that the spread of the residuals increases with the number of responses, indicating heteroscedasticity (non-constant variance).

Now let’s check the histogram of the residuals.

```python
# histogram of residuals
ax = plt.hist(learning_ols.resid)
plt.xlim(-40, 50)
plt.xlabel('Residuals')
```

The histogram of the residuals shows clear signs of non-normality. So, the above predictions, which were made under the assumption of normally distributed error terms with mean 0 and constant variance, are suspect.

Now let’s use Weighted Least Square method to predict the cost and see how the results vary.

```python
# WLS
# use ** for exponentiation (^ is bitwise XOR in Python)
w = 1 / (learning["num.responses"] ** 2)
learning_wls = sm.WLS(Y, X, weights=w).fit()
```

Comparing the R-squared values (note that these differ from the R results because no intercept was included here, so statsmodels reports uncentered R-squared):

```python
print('R2_ols: ', learning_ols.rsquared)
print('R2_WLS: ', learning_wls.rsquared)

R2_ols: 0.9915861646070941
R2_WLS: 0.9916229612643661
```

## Advantages of Weighted Least Squares

One of the biggest advantages of Weighted Least Squares is that it gives better predictions in regressions whose data points are of varying quality.

In a Weighted Least Squares regression, it is easy to remove an observation from the model by simply setting its weight to zero. Outliers or less reliable observations can likewise be down-weighted to improve the overall performance of the model.
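The zero-weight trick is easy to verify: fitting with one observation's weight set to zero gives exactly the same coefficients as refitting without that observation. A minimal NumPy sketch with made-up data (all values hypothetical):

```python
import numpy as np

# Toy data with one suspect observation (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.1, 8.0, 25.0])  # last point is an outlier

def wls_fit(x, y, w):
    """Weighted least squares for a line with intercept:
    solve (X'WX) b = X'Wy."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ np.diag(w) @ X, X.T @ np.diag(w) @ y)

# Setting the outlier's weight to zero ...
w = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
b_zero_weight = wls_fit(x, y, w)

# ... is equivalent to dropping that observation entirely
b_dropped = wls_fit(x[:-1], y[:-1], np.ones(4))

print(np.allclose(b_zero_weight, b_dropped))  # True
```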

## Disadvantages of Weighted Least Squares

One of the biggest disadvantages of weighted least squares is that it is based on the assumption that the weights are known exactly. Exact weights are almost never known in real applications, so estimated weights must be used instead.

The effect of using estimated weights is difficult to assess, but experience indicates that small variations in the weights due to estimation do not often affect a regression analysis or its interpretation.
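One common way to estimate unknown weights (sometimes called feasible, or two-stage, WLS) is to fit OLS first, model the spread of the residuals as a function of the predictor, and use the inverse of the squared fitted spread as weights. A minimal NumPy sketch on synthetic data (all values and the spread model are illustrative assumptions, not the article's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic heteroscedastic data: noise grows with x
x = np.linspace(1, 10, 200)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5 * x)

def fit_line(x, y, w):
    """Weighted least squares for a line with intercept."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# Stage 1: ordinary least squares (unit weights)
b_ols = fit_line(x, y, np.ones_like(x))
resid = y - (b_ols[0] + b_ols[1] * x)

# Stage 2: model |residuals| as a linear function of x,
# then weight by the inverse of the squared fitted spread
s = fit_line(x, np.abs(resid), np.ones_like(x))
sd_hat = np.maximum(s[0] + s[1] * x, 1e-6)  # guard against nonpositive values
b_fwls = fit_line(x, y, 1.0 / sd_hat**2)

print(b_fwls)  # estimates of (intercept, slope), near the true (2, 3)
```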

## Conclusion

So, in this article we have learned what Weighted Least Squares is, how it performs regression, when to use it, and how it differs from Ordinary Least Squares. We have also implemented it in R and Python on the Computer Assisted Learning dataset and analyzed the results.

Hope this article helped you get an understanding of Weighted Least Squares estimates.

Do let us know your comments and feedback about this article below.