3 Easy Ways to Test for Heteroscedasticity in R [Examples]

In this article, we discuss how to test for heteroscedasticity in R, especially in the context of regression models.

The absence of heteroscedasticity (i.e., homoscedasticity) is one of the main assumptions of linear regression. If heteroscedasticity does exist, the results of your analysis might be invalid. Therefore, it is vital to check this assumption.

In R, the easiest way to test for heteroscedasticity is with the “Residuals vs. Fitted”-plot. This plot shows the residuals against the fitted (i.e., predicted) values, which makes heteroscedasticity easy to detect. Alternatively, you can perform the Breusch-Pagan test or the White test.

In this article, we show how to create and interpret the “Residuals vs. Fitted”-plot, as well as how to perform both tests. To facilitate the discussion, we use examples and R code that you can use directly in your projects.

What is Heteroscedasticity?

Heteroscedasticity is the situation in which the variance of the residuals of a regression model is not the same across all values of the predicted variable. In other words, the variability of the residuals (i.e., error term) increases or decreases over the range of predictions.
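To make this definition concrete, the following sketch simulates data in which the standard deviation of the error term grows with x. This is an assumption chosen purely for illustration: the data, the seed, and names such as sim_model are hypothetical and not part of the examples later in this article. The sketch then compares the spread of the residuals for low and high fitted values.

```r
# Simulated data, purely illustrative: the error standard deviation grows with x
set.seed(42)
n <- 200
x <- runif(n, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(n, mean = 0, sd = 0.5 * x)  # sd increases with x -> heteroscedastic

sim_model <- lm(y ~ x)

# Split the observations at the median fitted value and compare the residual spread
half <- ifelse(fitted(sim_model) < median(fitted(sim_model)), "low", "high")
spread <- tapply(residuals(sim_model), half, sd)
print(spread)  # the "high" group should show a clearly larger standard deviation
```

With homoscedastic errors (e.g., a constant sd = 2 in rnorm()), the two standard deviations would be roughly equal.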

For example:

[Figure: Heteroscedasticity]

In a regression model, one assumes the absence of heteroscedasticity (i.e., homoscedasticity). That is, the residuals have the same variance across all fitted values.

[Figure: Homoscedasticity]
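Using standard notation (our own, not from the original text) for a linear model with error terms ε_i and k independent variables, the two situations can be written as:

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i \\
\text{Homoscedasticity:}\quad \operatorname{Var}(\varepsilon_i \mid x_{i1}, \dots, x_{ik}) &= \sigma^2 \quad \text{for all } i \\
\text{Heteroscedasticity:}\quad \operatorname{Var}(\varepsilon_i \mid x_{i1}, \dots, x_{ik}) &= \sigma_i^2 \quad \text{(varies with } i\text{)}
\end{aligned}
```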

If the homoscedasticity assumption is violated, the results of the regression model might be unreliable, especially the significance tests of the regression coefficients. With ordinary least squares, the coefficient estimates remain unbiased, but their usual standard errors are biased, so p-values and confidence intervals can be misleading.

Moreover, the impact of violating the assumption grows with the degree of heteroscedasticity. Therefore, you should always check whether the residuals of your regression model are homoscedastic.


3 Ways to Check for Heteroscedasticity

In the sections below, we show 3 ways to test for heteroscedasticity in R: the “Residuals vs. Fitted”-plot, the Breusch-Pagan test, and the White test. For each method, we include two examples.

In the examples, we test the homoscedasticity assumption for two regression models. For the first model, the residuals are homoscedastic, while for the second model, the variability of the residuals is not constant (i.e., heteroscedasticity). Comparing the two will help you draw the right conclusions in your own analysis.

The first model predicts a vehicle's miles per gallon (mpg) from its gross horsepower (hp). For this example, we use the mtcars dataset.

The second model predicts the volume of timber in a black cherry tree given its height in ft. The data for this example comes from the trees dataset.

1. Test for Heteroscedasticity with the “Residuals vs. Fitted”-Plot

The first way to test for heteroscedasticity in R is with the “Residuals vs. Fitted”-plot. This plot shows the residuals of a regression model against its fitted values.

You create a “Residuals vs. Fitted”-plot with the plot()-function, which requires just one argument, namely a fitted regression model. Calling par(mfrow = c(2, 2)) first arranges the four diagnostic plots that plot() draws in a 2 x 2 grid.

Syntax

par(mfrow = c(2, 2))  # arrange the four diagnostic plots in a 2 x 2 grid
plot(model)           # model: a fitted regression model, e.g. created with lm()

Besides the “Residuals vs. Fitted”-plot, the plot()-function draws three other plots, including the “Scale-Location”-plot. This plot shows the square root of the absolute standardized residuals against the fitted values.
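If you only need the two panels used to judge heteroscedasticity, the which argument of plot() for lm objects selects them (1 is the “Residuals vs. Fitted”-plot, 3 the “Scale-Location”-plot). The sketch below also computes the values on the y-axis of the Scale-Location panel by hand; the data frame name scale_location is our own choice.

```r
model <- lm(mpg ~ hp, data = mtcars)

# Show only the two panels used to judge heteroscedasticity:
# which = 1 -> "Residuals vs Fitted", which = 3 -> "Scale-Location"
par(mfrow = c(1, 2))
plot(model, which = c(1, 3))

# The y-values of the Scale-Location panel:
# the square root of the absolute standardized residuals
scale_location <- data.frame(
  fitted = fitted(model),
  sqrt_abs_std_resid = sqrt(abs(rstandard(model)))
)
head(scale_location)
```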

Example without heteroscedasticity

model <- lm(mpg~hp, data = mtcars)

par(mfrow = c(2, 2))
plot(model)

The image above shows the “Residuals vs. Fitted”-plot and the “Scale-Location”-plot for a regression model without heteroscedastic residuals. In other words, the variance of the residuals is the same for all fitted values.

Although the lines in both plots are not flat, the variability among the (square roots of the absolute standardized) residuals seems stable. The variability neither increases nor decreases with the fitted values. Therefore, we can assume that this regression model does not violate the homoscedasticity assumption.

However, a visual check is subjective. To test the assumption formally, you can perform the Breusch-Pagan test or the White test.

Example with heteroscedasticity

model <- lm(Volume~Height, data = trees)

par(mfrow = c(2, 2))
plot(model)

The image above shows the “Residuals vs. Fitted”-plot and the “Scale-Location”-plot for a regression model with heteroscedastic residuals.

The “Residuals vs. Fitted”-plot clearly shows an increase in variance across the fitted values. For fitted values lower than 30, the residuals mostly lie between -10 and 10, whereas for higher fitted values, the residuals spread out much more (i.e., their variability becomes bigger). This is a clear indicator of heteroscedasticity.
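The visual impression can be backed up with a few numbers. The sketch below splits the trees observations at a fitted value of 30, the threshold used in the text above, and compares the residuals in both groups. The grouping labels are our own; this is a rough descriptive check, not a formal test.

```r
model <- lm(Volume ~ Height, data = trees)

# Split the observations at a fitted value of 30
group <- ifelse(fitted(model) < 30, "fitted < 30", "fitted >= 30")

# Range and standard deviation of the residuals in each group
tapply(residuals(model), group, range)
spread <- tapply(residuals(model), group, sd)
spread
```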

Moreover, in the “Scale-Location”-plot, the increase in the square root of the standardized residuals across the fitted values is evident. Therefore, combining the “Residuals vs. Fitted”-plot and the “Scale-Location”-plot, we can conclude that this model violates the homoscedasticity assumption.

2. Perform the Breusch–Pagan Test to Check Heteroscedasticity

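The second method to check for heteroscedasticity among residuals in R is by performing the Breusch-Pagan test. This test checks whether the variance of the residuals depends on the values of the independent variables.

Therefore, the hypotheses of the Breusch-Pagan test are:

  • H0 (null hypothesis): the residuals are homoscedastic, i.e., their variance is constant.
  • H1 (alternative hypothesis): the residuals are heteroscedastic, i.e., their variance depends on the independent variables.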
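Formally, the studentized (Koenker) version of the test, which to our knowledge is the default in lmtest::bptest (studentize = TRUE), regresses the squared residuals on the k independent variables. It then compares n times the R² of this auxiliary regression with a χ² distribution with k degrees of freedom. The notation below (residuals \hat{ε}_i, auxiliary coefficients γ, error term v_i) is ours:

```latex
\begin{gathered}
\hat{\varepsilon}_i^2 = \gamma_0 + \gamma_1 x_{i1} + \dots + \gamma_k x_{ik} + v_i \\
H_0\colon \gamma_1 = \dots = \gamma_k = 0 \\
BP = n \, R^2_{\text{aux}} \ \overset{a}{\sim}\ \chi^2_k \ \text{under } H_0
\end{gathered}
```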

In R, you can perform the Breusch-Pagan test in different ways, for instance with:

  • The bptest function from the lmtest package,
  • The ncvTest function from the car package,
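  • The plmtest function from the plm package (note, however, that plmtest() with type = "bp" is the Breusch-Pagan Lagrange multiplier test for individual or time effects in panel data models, not a test for heteroscedasticity), or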
  • The breusch_pagan function from the skedastic package.

In this example, we use the bptest function from the lmtest package which requires just a fitted “lm”-object as its argument (i.e., a linear regression model).
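To see what bptest computes, the sketch below reproduces the studentized Breusch-Pagan statistic by hand for the trees model: it regresses the squared residuals on Height and multiplies the R² of that auxiliary regression by the number of observations. It assumes lmtest is installed; names such as bp_manual are our own.

```r
library(lmtest)

model <- lm(Volume ~ Height, data = trees)

# Auxiliary regression: squared residuals on the independent variable
u2  <- residuals(model)^2
aux <- lm(u2 ~ Height, data = trees)

# Studentized (Koenker) Breusch-Pagan statistic: n * R^2 of the auxiliary regression
bp_manual <- nrow(trees) * summary(aux)$r.squared
p_manual  <- pchisq(bp_manual, df = 1, lower.tail = FALSE)  # df = number of independent variables

bp_pkg <- bptest(model)
c(manual = bp_manual, bptest = unname(bp_pkg$statistic))
```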

Example without heteroscedasticity

library(lmtest)
model <- lm(mpg~hp, data = mtcars)
lmtest::bptest(model)

[Figure: Test for heteroscedasticity in R with the Breusch-Pagan test]

The image above shows the output of the Breusch-Pagan test.

The interpretation of the Breusch-Pagan test for heteroscedasticity is simple. Because the test statistic (BP) is small and the p-value is not significant (i.e., >0.05), we do not reject the null hypothesis. Therefore, we assume that the residuals are homoscedastic.
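When you run many models, you may prefer to apply this decision rule in code. bptest() returns a standard "htest" object, so the p-value is available as $p.value. The sketch below assumes the 0.05 significance level used throughout this article; alpha is our own variable name.

```r
library(lmtest)

model <- lm(mpg ~ hp, data = mtcars)
bp <- bptest(model)

# bptest() returns an "htest" object; its p-value is stored in bp$p.value
alpha <- 0.05  # significance level used throughout this article
if (bp$p.value < alpha) {
  message("Reject H0: evidence of heteroscedasticity (p = ", signif(bp$p.value, 3), ")")
} else {
  message("Do not reject H0: no evidence of heteroscedasticity (p = ", signif(bp$p.value, 3), ")")
}
```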

Example with heteroscedasticity

library(lmtest)
model <- lm(Volume~Height, data = trees)
lmtest::bptest(model)

[Figure: Test for homoscedasticity in R with the Breusch-Pagan test]

In contrast to the previous example, the output of this Breusch-Pagan test has a high test statistic (BP=12.207) and a low p-value (<0.05). Therefore, we reject the null hypothesis and conclude that this regression model violates the homoscedasticity assumption.

Instead of the bptest() function from the lmtest package, you can also use the ols_test_breusch_pagan() function from the olsrr package.

3. Perform the White Test to Check Heteroscedasticity

Lastly, the third method to detect heteroscedasticity in R is by performing the White test.

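The White test is a special case of the (general) Breusch-Pagan test. The only difference between the two is that the auxiliary regression of the White test also includes the squared independent variables and their cross-products (interactions), whereas the simple Breusch-Pagan test uses only the original independent variables. So, when should you use the White test?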

The White test is an asymptotic test and is therefore meant for large samples. As a consequence, results from smaller samples must be interpreted with caution. Also, when the regression model has many independent variables, the test can be hard to calculate because the number of terms in its auxiliary regression grows quickly.

So, you should only use the White test if you have a good reason, for example, when you suspect that the independent variables affect the variance of the residuals in a non-linear way or through interactions.

Because the White test is a special case of the Breusch-Pagan test, both test the same hypotheses.
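To see how quickly the auxiliary regression grows, count its regressors (excluding the intercept), assuming continuous independent variables: k original variables, k squares, and k(k - 1)/2 cross-products, i.e., k(k + 3)/2 in total. The helper white_aux_terms is hypothetical and only illustrates this count.

```r
# Hypothetical helper: number of regressors (excluding the intercept) in
# White's auxiliary regression with k continuous independent variables
white_aux_terms <- function(k) {
  linear  <- k
  squares <- k
  cross   <- k * (k - 1) / 2
  linear + squares + cross  # equals k * (k + 3) / 2
}

sapply(c(1, 2, 5, 10, 20), white_aux_terms)
# 2 5 20 65 230
```

With 20 independent variables, the auxiliary regression already needs 230 regressors, which is more than many datasets can support.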
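If you prefer not to install skedastic, a White test for a one-predictor model can also be run with lmtest::bptest() by passing the auxiliary terms as its varformula argument. The sketch below assumes lmtest is installed. With a single independent variable there are no cross-products, so the auxiliary regression contains only Height and Height squared. We have not checked that the result matches white_lm() in every setting, so treat this as an illustration of the technique rather than a drop-in replacement.

```r
library(lmtest)

model <- lm(Volume ~ Height, data = trees)

# White test via bptest(): auxiliary regression on Height and Height^2
white_bp <- bptest(model, ~ Height + I(Height^2), data = trees)
white_bp

# The same statistic by hand: n * R^2 of the auxiliary regression, df = 2
u2  <- residuals(model)^2
aux <- lm(u2 ~ Height + I(Height^2), data = trees)
white_manual <- nrow(trees) * summary(aux)$r.squared
white_manual
```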

In the example below, we use the white_lm()-function from the skedastic package to perform the White test.

Example without heteroscedasticity

library(skedastic)
model <- lm(mpg~wt, data = mtcars)
skedastic::white_lm(model)

[Figure: Test for homoscedasticity in R with the White test]

The image above shows the outcome of the White test for a model that predicts mpg from vehicle weight (wt) instead of horsepower. Because the p-value is not significant (i.e., >0.05), we do not reject the null hypothesis. Hence, we assume that the residuals are homoscedastic.

Example with heteroscedasticity

library(skedastic)
model <- lm(Volume~Height, data = trees)
skedastic::white_lm(model)

[Figure: Test for heteroscedasticity in R with the White test]

Unlike the previous example, the residuals of this regression model have unequal variance. We draw this conclusion based on the high test statistic and the significant p-value (<0.05). Hence, we reject the null hypothesis and conclude that heteroscedasticity exists.