How to Replace NA’s with the Mean in R [Examples]

In this article, we discuss how to replace NA’s with the mean in R. We also show how to replace missing values with the average per group.

If you work with data, then chances are that you have to deal with missing values. Usually, you either drop them or replace them with another value. Though there are many options to impute NA’s, in this article we focus solely on how to replace missing values with the column’s average.

The easiest way to replace NA’s in an R data frame is by combining the replace_na() function from the tidyr package with the mean() function. The mean() function calculates the column’s average, and the replace_na() function substitutes that value for the missing values. Moreover, both functions work well inside dplyr pipelines, which makes them convenient for replacing missing values in larger chunks of code.

In this article, we show how to impute NA’s with the column mean using the dplyr package. We do this for a single column, multiple columns, and all numeric columns in a data frame. In addition, we demonstrate how to replace missing values with the mean of each group.

Before we go into details, we first create a data frame that we will use in our examples. This data frame contains 3 columns (x1, x2, and x3), each of which has one missing value.

my_df <- data.frame(x1 = c(4, 2, NA, 9, 4),
                    x2 = c(-2, 0, 4, NA, 1),
                    x3 = c(1, NA, -3, -5, 2))

Replace NAs with the Mean using R Base Code

As said before, we will use the dplyr package to replace missing values. However, if you don’t want to (or can’t) install additional R packages, you can still impute NA’s with the mean by using only R Base code.

To replace NA’s with R Base code, you need the is.na() function, the mean() function, and square brackets []. You use the is.na() function and the square brackets to select the missing values, whereas the mean() function calculates the column’s mean. The assignment operator (<-) then writes that mean into the selected positions.

If you use the mean() function to calculate the mean of a column that contains missing values, it is important to use the na.rm = TRUE option. This option ensures that the NA’s are ignored while computing the average. If you don’t use this option, the mean() function does not throw an error but returns NA, so the missing values would simply be replaced by NA again.
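As a quick check of what na.rm does, the minimal sketch below computes the mean of the x1 values from our example with and without the option. The vector name x is only used for illustration.

```r
# The values of column x1, including one missing value
x <- c(4, 2, NA, 9, 4)

# Without na.rm, the missing value propagates into the result
mean_with_na <- mean(x)

# With na.rm = TRUE, the NA is dropped before averaging: (4 + 2 + 9 + 4) / 4
mean_without_na <- mean(x, na.rm = TRUE)

print(mean_with_na)     # NA
print(mean_without_na)  # 4.75
```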

The R code below shows an example of how to replace NA’s with only R Base code.

my_df$x1[is.na(my_df$x1)] <- mean(my_df$x1, na.rm = TRUE)
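If you need the same replacement in several columns while staying in R Base, one possible approach is to wrap the line above in a small helper and apply it to every numeric column. The sketch below assumes the example data frame from this article; the helper name impute_mean is hypothetical and not part of any package.

```r
my_df <- data.frame(x1 = c(4, 2, NA, 9, 4),
                    x2 = c(-2, 0, 4, NA, 1),
                    x3 = c(1, NA, -3, -5, 2))

# Hypothetical helper: replace the NA's of one vector with its mean
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Find the numeric columns and apply the helper to each of them
num_cols <- vapply(my_df, is.numeric, logical(1))
my_df[num_cols] <- lapply(my_df[num_cols], impute_mean)

print(my_df)
```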

Replace NAs with the Mean using dplyr

Although you can replace NA’s with R Base code, it is not the most convenient way, mainly because your code quickly becomes hard to read. A better way to impute missing values (with the mean) is by using the dplyr package.

The dplyr package provides simple “verb” functions that correspond to the most common data manipulation tasks, for example, mutate() and filter().

The simplest way to replace missing values with the mean, using the dplyr package, is by using the functions mutate(), replace_na(), and mean(). First, the mutate() function specifies which variable to modify. Then, the mean() function calculates the column’s average. Finally, the replace_na() function substitutes that average for the NA’s.

The replace_na() function provides an elegant way to replace missing values. Because you don’t need any additional if/else functions, nor the is.na() function, your code remains very readable.

To make the replace_na() function work, you only need to provide:

  1. The variable you want to modify, and
  2. The value that replaces any missing values (e.g., the mean)
replace_na(variable_name, replacement_value)

Note that to use the replace_na() function, you need to install and load the tidyr package. Instead of loading the dplyr and tidyr packages separately, you can also load the tidyverse package, which attaches both.
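To see the two arguments in isolation, the minimal sketch below calls replace_na() directly on a vector, outside of any data frame. The variable names are only illustrative.

```r
library(tidyr)

x <- c(4, 2, NA, 9, 4)

# First argument: the vector to modify; second argument: the replacement value
filled_zero <- replace_na(x, 0)
filled_mean <- replace_na(x, mean(x, na.rm = TRUE))

print(filled_zero)
print(filled_mean)
```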

Below we provide an example of how to replace NA’s with the column’s mean using dplyr.

library(dplyr)
library(tidyr)
my_df %>% 
  mutate(x1 = replace_na(x1, mean(x1, na.rm = TRUE)))

Besides imputing NA’s with the mean, you can also use the replace_na() function to replace missing values with the minimum, maximum, zero, mode, etc.
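As an illustration of swapping in another replacement value, the sketch below fills the same column once with its median and once with zero. The new column names x1_median and x1_zero are hypothetical and only serve to keep the original column visible for comparison.

```r
library(dplyr)
library(tidyr)

my_df <- data.frame(x1 = c(4, 2, NA, 9, 4))

result <- my_df %>%
  mutate(
    # Median of the observed values 2, 4, 4, 9
    x1_median = replace_na(x1, median(x1, na.rm = TRUE)),
    # A fixed value instead of a statistic
    x1_zero = replace_na(x1, 0)
  )

print(result)
```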

Replace NAs with the Mean in Multiple Columns

So far, we have replaced the NA’s in a single column. If you have many columns with missing values, you could create one line of code to treat each column individually. However, this is not the best way.

The easiest way to replace NA’s with the mean in multiple columns is by using the functions mutate_at() and vars(). These functions let you select the columns in which you want to replace the missing values. To actually replace the NA’s with the mean, you can use the replace_na() and mean() functions.

With the vars() function, you select multiple columns by specifying their names without quotes and separating them with commas.

Below we show an example of replacing the NA’s with the column’s mean in columns x1 and x2.

library(dplyr)
library(tidyr)
my_df %>% 
  mutate_at(vars(x1, x2), ~replace_na(., mean(., na.rm = TRUE)))
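In dplyr 1.0.0 and later, mutate_at() is marked as superseded in favor of across(). The function still works, but if you prefer the newer style, the same replacement can be sketched as follows, assuming a recent dplyr version is installed.

```r
library(dplyr)
library(tidyr)

my_df <- data.frame(x1 = c(4, 2, NA, 9, 4),
                    x2 = c(-2, 0, 4, NA, 1),
                    x3 = c(1, NA, -3, -5, 2))

# across() selects the columns; .x refers to the column being processed
result <- my_df %>%
  mutate(across(c(x1, x2), ~ replace_na(.x, mean(.x, na.rm = TRUE))))

print(result)
```

Column x3 is not selected, so its missing value is left untouched.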

Replace NAs with the Mean in All Columns

Although the mutate_at() function and the vars() function are useful to replace NA’s in some columns, they are not the best way to replace the missing values in all (numeric) columns.

In order to impute NA’s with the mean in all columns of an R data frame, you use the functions mutate_if() and is.numeric. First, by combining these functions, R identifies all the numeric columns and lets you modify them. Next, you can use the replace_na() function and the mean() function to replace the missing values with the column’s average.

Remember that you need to add the na.rm = TRUE option to the mean() function to correctly calculate the average.

The R code below shows an example of how to replace missing values in all (numeric) columns.

library(dplyr)
library(tidyr)
my_df %>% 
  mutate_if(is.numeric, ~replace_na(., mean(., na.rm = TRUE)))
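The is.numeric check matters when a data frame also holds non-numeric columns. The sketch below adds a hypothetical character column id to the example data to show that such a column is skipped. It uses across() with where(), which recent dplyr versions (1.0.0 and later) offer as the successor to mutate_if().

```r
library(dplyr)
library(tidyr)

# The id column is a hypothetical addition for illustration
my_df <- data.frame(id = c("a", "b", "c", "d", "e"),
                    x1 = c(4, 2, NA, 9, 4),
                    x2 = c(-2, 0, 4, NA, 1),
                    x3 = c(1, NA, -3, -5, 2))

# Only columns for which is.numeric() is TRUE are modified
result <- my_df %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, mean(.x, na.rm = TRUE))))

print(result)
```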

Replace NAs with the Mean by Group

So far, we have discussed how to impute NA’s with the overall column mean. However, if your data can be separated into groups, you might want to replace the missing value with the mean of each group instead.

Before we show how to do this, we first create a data frame that we will use in the examples. This data frame consists of four columns and six rows. The column type is used to divide the data into two groups, namely group A and group B.

my_df <- data.frame(type = c("A", "A", "A", "B", "B", "B"),
                    x1 = c(1, NA, 3, NA, 2, 4),
                    x2 = c(NA, 10, 15, -1, NA, 0),
                    x3 = c(3, 5, NA, 2, 4, NA))

The easiest way to replace missing values with the group’s average is by using the dplyr package. First, the group_by() function divides the data into groups. Then, the functions mutate_at() and vars() specify the variables to modify. Finally, the mean() function calculates the average within each group, and the replace_na() function substitutes it for the NA’s in that group.

In the example below, we use the group_by() function to separate our data into groups based on the value of the column type. Next, we use the functions mutate_at(), vars(), replace_na(), and mean() to replace the missing values with the average per group.

library(dplyr)
library(tidyr)
my_df %>% 
  group_by(type) %>% 
  mutate_at(vars(x1), ~replace_na(., mean(., na.rm = TRUE)))

In our example, the groups are defined by just one variable. However, if your groups are defined by multiple variables, you can specify them in the group_by() function (separated by a comma).
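As a sketch of grouping by more than one variable, the example below uses a small hypothetical data frame with a second grouping column called region. Neither the data nor the column name comes from the examples above; they are assumptions for illustration only.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with two grouping columns: type and region
df_two <- data.frame(type   = c("A", "A", "A", "A", "B", "B", "B", "B"),
                     region = c("N", "N", "S", "S", "N", "N", "S", "S"),
                     x1     = c(1, NA, 3, NA, NA, 2, 4, NA))

# Each combination of type and region forms its own group
result <- df_two %>%
  group_by(type, region) %>%
  mutate(x1 = replace_na(x1, mean(x1, na.rm = TRUE))) %>%
  ungroup()

print(result)
```

The final ungroup() call removes the grouping so that later steps operate on the whole data frame again.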

Like in a previous section, you can also replace the missing values in multiple columns at once. Again, you should use the mutate_at() function and the vars() function. For instance:

library(dplyr)
library(tidyr)
my_df %>% 
  group_by(type) %>% 
  mutate_at(vars(x1, x2), ~replace_na(., mean(., na.rm = TRUE)))

Lastly, you can also replace the missing values with the group’s mean in all (numeric) columns. To do so, you need the group_by() function and a combination of the functions mutate_if(), is.numeric, replace_na(), and mean().

library(dplyr)
library(tidyr)
my_df %>% 
  group_by(type) %>% 
  mutate_if(is.numeric, ~replace_na(., mean(., na.rm = TRUE)))
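One edge case is worth checking when you impute per group: if a group has no observed values at all in a column, mean() with na.rm = TRUE returns NaN for that group, so the gaps are not filled with a usable number. The sketch below uses a hypothetical data frame (df_gap) to illustrate this.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: every x1 value in group B is missing
df_gap <- data.frame(type = c("A", "A", "B", "B"),
                     x1   = c(1, NA, NA, NA))

filled <- df_gap %>%
  group_by(type) %>%
  mutate_if(is.numeric, ~replace_na(., mean(., na.rm = TRUE))) %>%
  ungroup()

# Group A is filled with its mean; group B has nothing to average,
# so its values become NaN instead of a number
print(filled$x1)
print(is.nan(filled$x1))
```

In such a case, you might fall back to the overall column mean or leave the values missing, depending on your analysis.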