How to Replace NA’s with the Mode (Most Frequent Value) in R

There are many ways to replace missing values in a data frame. In this article, we demonstrate how to replace NA’s in R with the mode, that is, the most frequent value.

Replacing missing values with the mode requires more work than replacing NA’s with a common value such as the average, the maximum, or zero. R doesn’t have a built-in function that calculates the statistical mode (the base mode() function returns the storage mode of an object instead), so you need to create one yourself first (or install a package).

Once we’ve shown how to create a function that calculates the mode in R, we will demonstrate how to replace NA’s with the most frequent value in both numeric and character columns. We also provide examples of how to replace missing values with the mode in multiple columns and per group.

How to Calculate the Mode

Although R doesn’t provide a standard function to calculate the mode, creating one is easy.

These are the steps to calculate the mode in R:

  1. List all unique values from a vector.
  2. Count the occurrences of each unique value in the vector.
  3. Return the value with the most occurrences.

In the R code below, we create the function calc_mode that combines these steps and returns the mode of a vector.

calc_mode <- function(x){
  
  # List the distinct / unique values
  distinct_values <- unique(x)
  
  # Count the occurrence of each distinct value
  distinct_tabulate <- tabulate(match(x, distinct_values))
  
  # Return the value with the highest occurrence
  distinct_values[which.max(distinct_tabulate)]
}

In the vector below, the number 5 occurs more than any other value (i.e., 3 times). Hence, the calc_mode function should return 5 as the mode.

a <- c(5, 7, 5, 2, 4, 2, 5, 8, 4, 1)
calc_mode(a)
# [1] 5

The calc_mode function also returns the mode for character vectors. For example, the letter B is the most frequent value in the following example.

b <- c("A", "C", "E", "B", "B", "E", "A", "F", "B")
calc_mode(b)
# [1] "B"
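One caveat worth knowing before using the function for imputation: calc_mode treats NA as just another value, so if NA is the most frequent entry, NA is returned as the mode. The sketch below shows this and a possible variant that drops missing values before counting. The name calc_mode_na_rm is hypothetical and not part of the original function.

```r
# Original function from this article, repeated so the block runs on its own
calc_mode <- function(x) {
  distinct_values <- unique(x)
  distinct_tabulate <- tabulate(match(x, distinct_values))
  distinct_values[which.max(distinct_tabulate)]
}

# Hypothetical variant: remove the NAs before calculating the mode
calc_mode_na_rm <- function(x) {
  x <- x[!is.na(x)]
  calc_mode(x)
}

mostly_missing <- c(NA, 4, NA, 4, NA, 7)

calc_mode(mostly_missing)        # NA occurs 3 times, so the result is NA
calc_mode_na_rm(mostly_missing)  # 4 is the most frequent non-missing value
```

If every value in the vector is missing, both versions return NA, so such a column would stay unchanged.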

Note that a vector can have multiple modes (i.e., different values that have the same, highest number of occurrences). In that case, calc_mode returns only the tied value that appears first in the vector. So, depending on your needs, you might need to modify the calc_mode function to handle these situations.
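As a rough illustration of such a modification, the sketch below returns every value that reaches the highest count instead of only one. The function name calc_modes is an assumption made for this example.

```r
# Hypothetical helper: return all values that share the highest count
calc_modes <- function(x) {
  distinct_values <- unique(x)
  counts <- tabulate(match(x, distinct_values))
  distinct_values[counts == max(counts)]
}

# Both 2 and 5 occur twice, so both are returned
tied <- c(2, 5, 2, 5, 9)
calc_modes(tied)

# With a single mode, the result has length one
b <- c("A", "C", "E", "B", "B", "E", "A", "F", "B")
calc_modes(b)
```

Because the result can have more than one element, you still have to decide which value to impute (for example, the first one) before using it to replace NA’s.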

Now that we have a function to calculate the mode, we’ll demonstrate how to replace missing values with the most frequent value in R.

How to Replace NA’s with the Mode in a Numeric Column

In this section, we show how to easily replace missing values in a numeric column with the most frequent value.

To replace NA’s with the mode in a numeric column, you need to:

  1. Specify in which column you want to replace the missing values. You can use the mutate() function from the dplyr package (part of the tidyverse) to do this.
  2. Check if the value in the specified column is missing and act accordingly. You can do so with the if_else() function and the is.na() function. The latter checks if a value is missing.
  3. Use the calc_mode() function to replace the NA’s with the mode.

In the example below, we create an R data frame with one column that has one missing value. The most frequent value of this column is 3.

We use the steps mentioned above to replace the missing value with 3.

# Install the tidyverse if it is missing, then load it
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
library(tidyverse)

my_df <- data.frame(var_1 = c(3, 8, 0, 1, 3, 6, NA, 2, 4))
my_df

my_df %>% 
  mutate(var_1 = if_else(is.na(var_1), 
                         calc_mode(var_1), 
                         var_1))
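If you prefer not to load the tidyverse for a single column, the same replacement can be sketched in base R with logical indexing. This is an alternative to the mutate() approach above, not a replacement for it.

```r
# calc_mode as defined earlier in this article
calc_mode <- function(x) {
  distinct_values <- unique(x)
  distinct_tabulate <- tabulate(match(x, distinct_values))
  distinct_values[which.max(distinct_tabulate)]
}

my_df <- data.frame(var_1 = c(3, 8, 0, 1, 3, 6, NA, 2, 4))

# Overwrite only the positions where var_1 is missing
my_df$var_1[is.na(my_df$var_1)] <- calc_mode(my_df$var_1)
my_df
```

Note that this version modifies my_df in place, whereas the mutate() pipeline returns a new data frame.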

How to Replace NA’s with the Mode in a Character Column

Similar to numeric columns, you can also replace missing values in a character column.

To replace NA’s with the mode in a character column, you first specify the name of the column that has the NA’s. Then, you use the is.na() function inside the if_else() function to find the missing values and replace them with the mode.

The is.na() function is a base R function, while mutate() and if_else() are part of the tidyverse. However, R doesn’t provide a native function to calculate the mode, so we use the calc_mode function that we created at the beginning of this article. It returns the most frequent value of a vector/column.

Below, we have a data frame with one character column in which one value is missing. The most frequent value (i.e., mode) of the non-missing values is the string "B". We use the following R code to replace NA’s with the mode.

# Install the tidyverse if it is missing, then load it
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
library(tidyverse)

my_df <- data.frame(var_1 = c("A", "C", "AA", "B", "B", "E", NA, "AA", "B"))
my_df


my_df %>% 
  mutate(var_1 = if_else(is.na(var_1), 
                         calc_mode(var_1), 
                         var_1))
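A shorter way to write the same replacement is the coalesce() function from dplyr, which returns the first non-missing value at each position. The sketch below assumes dplyr is installed and sets stringsAsFactors = FALSE so the column is a character vector in every R version.

```r
library(dplyr)

# calc_mode as defined earlier in this article
calc_mode <- function(x) {
  distinct_values <- unique(x)
  distinct_tabulate <- tabulate(match(x, distinct_values))
  distinct_values[which.max(distinct_tabulate)]
}

my_df <- data.frame(var_1 = c("A", "C", "AA", "B", "B", "E", NA, "AA", "B"),
                    stringsAsFactors = FALSE)

# Wherever var_1 is NA, fall back to the mode of the column
result <- my_df %>%
  mutate(var_1 = coalesce(var_1, calc_mode(var_1)))
result
```

Whether you use if_else() or coalesce() is mostly a matter of taste; both leave the non-missing values untouched.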

How to Replace NA’s with the Mode in Multiple Columns

In the two sections above, we demonstrated how to replace missing values in a single column. However, if you have a data frame with many columns and you want to replace the NA’s in all of them, repeating that method for every column becomes tedious.

Therefore, in this section, we provide a way to replace NA’s with the mode in all columns of an R data frame. Since this method doesn’t require specifying the column names explicitly, it is easy to reuse on other data frames.

These are the steps to replace NA’s in all columns with the mode:

  1. Use the mutate() function to modify the values in a data frame.
  2. Use the across() function to perform an operation on multiple columns.
  3. Specify the columns on which you want to perform the operation (i.e., replace missing values). If you want to replace the missing values in all columns, then you should use the everything() function.
  4. Use the replace_na() function from the tidyr package to identify and replace missing values.
  5. Specify the value that should replace the NA’s. If you want to replace the NA’s with the mode, you can use the calc_mode() function.

Note that the calc_mode function isn’t a native R function. Instead, we created this function at the beginning of this article to calculate the mode.

The R code below combines all steps to replace the NA’s in all columns. Notice that it isn’t necessary to explicitly specify the names of the columns in which you want to replace the NA’s.

# Install the tidyverse if it is missing, then load it
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
library(tidyverse)

my_df <- data.frame(var_1 = c(3, 8, 0, 1, 3, 6, NA, 2, 4),
                    var_2 = c("A", "C", "AA", "B", "B", "E", NA, "AA", "B"))
my_df

my_df %>% 
  mutate(across(everything(), ~replace_na(.x, calc_mode(.x))))
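If you only want to impute some of the columns, you can swap everything() for another selection. As a sketch, the code below uses where(is.numeric) inside across() to restrict the replacement to numeric columns; it assumes dplyr (version 1.0.0 or later) and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# calc_mode as defined earlier in this article
calc_mode <- function(x) {
  distinct_values <- unique(x)
  distinct_tabulate <- tabulate(match(x, distinct_values))
  distinct_values[which.max(distinct_tabulate)]
}

my_df <- data.frame(var_1 = c(3, 8, 0, 1, 3, 6, NA, 2, 4),
                    var_2 = c("A", "C", "AA", "B", "B", "E", NA, "AA", "B"),
                    stringsAsFactors = FALSE)

# Replace NA's with the mode in numeric columns only
numeric_only <- my_df %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, calc_mode(.x))))
numeric_only
```

Here the missing value in var_1 is replaced, while the NA in the character column var_2 is left as it is.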

How to Replace NA’s with the Mode per Group

Finally, we discuss how to replace missing values with the mode per group.

Replacing NA’s with the mode of a group is similar to replacing missing values in general. You only need to add the group_by() function to specify the variables that define the groups. After defining the groups, you replace the NA’s with the mode using a combination of the mutate() function, the if_else() function, the is.na() function, and the calc_mode() function.

In the example below, we have created an R data frame with two groups, namely group A and group B. The goal is to replace the missing values with the mode of each group.

We use the group_by() function to define the groups and the user-defined calc_mode() function to calculate the mode.

# Install the tidyverse if it is missing, then load it
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
library(tidyverse)

my_df <- data.frame(my_group = c(rep("A", 5), rep("B",5)),
                    var_1 = c(1, 2, NA, 3, 1, 8, 9, NA, 7, 8))
my_df

my_df %>% 
  group_by(my_group) %>% 
  mutate(var_1 = if_else(is.na(var_1), 
                         calc_mode(var_1), 
                         var_1))

The R code imputes each NA with the most frequent value of its group: 1 for group A and 8 for group B.
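To check which value is imputed in each group, you can calculate the mode per group separately. The sketch below does this with summarise(), assuming dplyr is installed; the column name group_mode is chosen for this example only.

```r
library(dplyr)

# calc_mode as defined earlier in this article
calc_mode <- function(x) {
  distinct_values <- unique(x)
  distinct_tabulate <- tabulate(match(x, distinct_values))
  distinct_values[which.max(distinct_tabulate)]
}

my_df <- data.frame(my_group = c(rep("A", 5), rep("B", 5)),
                    var_1 = c(1, 2, NA, 3, 1, 8, 9, NA, 7, 8))

# One row per group with the mode of var_1 in that group
group_modes <- my_df %>%
  group_by(my_group) %>%
  summarise(group_mode = calc_mode(var_1))
group_modes
```

Comparing this small table with the imputed data frame is a quick way to confirm that every NA received the mode of its own group.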
