How to Replace NA’s in R with the Maximum

Whether you are working with a small or large data frame, missing values can be bothersome when you carry out your analysis.

In this article, we take a look at missing numeric data, and how to replace NA’s with the column’s maximum.

We demonstrate how to substitute missing values in a single column, in multiple columns, and per group.

What are NA’s in R

In R, the NA symbol represents a missing value. You can find NA’s in both numeric and character data. For example,

num_data <- c(1, 2, NA, 4)
char_data <- c("A", NA, "C", "D")
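
Before replacing anything, it helps to know where the missing values are. The following is a minimal sketch, using only base R, that locates and counts the NA’s in the two vectors above. The variable names n_missing and missing_pos are our own and only serve as illustration.

```r
num_data <- c(1, 2, NA, 4)
char_data <- c("A", NA, "C", "D")

# is.na() returns TRUE for each missing entry, for numeric and character data alike
is.na(num_data)
is.na(char_data)

# Count the missing values in the numeric vector
n_missing <- sum(is.na(num_data))

# Find the position of the missing value in the character vector
missing_pos <- which(is.na(char_data))
```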

There are different ways to replace missing values, but in this article, we use the tidyverse, a collection of packages that includes dplyr and tidyr. If you haven’t installed or loaded it yet, please run the R code below.

# Install the tidyverse if it is missing, then load it
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
library(tidyverse)

How to Replace Missing Values in a Single Column

Replacing missing values isn’t difficult, especially if you want to impute NA’s in just one column.

In this section, we will discuss two methods to replace the missing values in the column below with the maximum.

R data frame with missing values

Note that the highest value of the my_values column is 11.

Replace NA’s with Basic R Code

The first method to replace NA’s in a single column with the maximum is with basic R code.

Firstly, you specify the column containing the NA’s with the $-operator. Secondly, you use the is.na() function to find the exact entries that are missing. Lastly, you replace these entries with the maximum value of the column with the max() function.

Note that, by default, the max() function returns NA if the column contains missing values. So, to ignore NA’s while calculating the maximum, you need to add the na.rm = TRUE option.
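
To make the effect of na.rm concrete, here is a small sketch that calls max() on the example values with and without the option. The names max_default and max_no_na are only used for this illustration.

```r
my_values <- c(3, 7, NA, 2, 11, 8, NA, 9, 4, 7)

# Without na.rm, a single NA is enough to make the result NA
max_default <- max(my_values)
max_default
# NA

# With na.rm = TRUE, the NA's are dropped before the maximum is taken
max_no_na <- max(my_values, na.rm = TRUE)
max_no_na
# 11
```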

The R code below shows how to impute the missing values with the column’s maximum.

my_values <- c(3, 7, NA, 2, 11, 8, NA, 9, 4, 7)
my_df <- data.frame(my_values)
my_df


my_df$my_values[is.na(my_df$my_values)] <- max(my_df$my_values, na.rm = TRUE)
my_df

Replace NA's with the maximum (basic R code)

Replace NA’s with Tidyverse

The second method to replace missing values in an R data frame uses the tidyverse package.

To replace missing values with the tidyverse package you need the mutate() function. With this function, you specify the column that contains the NA’s. Then, with the replace_na() function, you identify the missing values and replace them with the maximum.

The replace_na() function comes from the tidyr package, which is part of the tidyverse, and is a convenient tool to impute missing values. It has two mandatory arguments:

  1. The data argument.
  2. The replace argument.

While the data argument specifies the column in which to replace missing values, the replace argument defines the replacement value, for example, the column’s highest value.
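
As a quick illustration of the two arguments, the sketch below applies replace_na() to a plain vector outside of a data frame. The vector x and the result names are made up for this example.

```r
library(tidyr)

x <- c(3, NA, 11, NA)

# First argument: the data (here a plain vector); second argument: the replacement value
filled_fixed <- replace_na(x, 0)

# The replacement value can also be computed, for example the maximum of x
filled_max <- replace_na(x, max(x, na.rm = TRUE))
```

Inside mutate(), the column plays the role of x, which is how the function is used in the rest of this article.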

In the R code below, we use the output of the max() function as the replacement value to impute NA’s.

my_values <- c(3, 7, NA, 2, 11, 8, NA, 9, 4, 7)
my_df <- data.frame(my_values)
my_df

my_df <- my_df %>% 
  mutate(my_values = replace_na(my_values, max(my_values, na.rm = TRUE)))
my_df

As the image below shows, we’ve replaced the two missing values with the highest value of the my_values column.

Replace Missing Values with the maximum (tidyverse)
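
One edge case that the two methods above don’t cover is a column in which every value is missing. In that case, max() with na.rm = TRUE returns -Inf and raises a warning, so the NA’s would be replaced with -Inf. The sketch below shows one possible guard. Note that the helper replace_na_with_max() is hypothetical and written for this example; it is not part of tidyr.

```r
library(tidyr)

# Hypothetical helper (not part of tidyr): impute only when at least one
# non-missing value exists, otherwise return the input unchanged
replace_na_with_max <- function(x) {
  if (all(is.na(x))) {
    return(x)
  }
  replace_na(x, max(x, na.rm = TRUE))
}

all_missing <- c(NA_real_, NA_real_)
some_missing <- c(3, NA, 11)

# max() on an all-NA vector returns -Inf (the warning is silenced here)
naive_max <- suppressWarnings(max(all_missing, na.rm = TRUE))

kept <- replace_na_with_max(all_missing)
filled <- replace_na_with_max(some_missing)
```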

How to Replace Missing Values in Multiple Columns

So far, we’ve demonstrated how to replace NA’s in a single column. You could apply one of these methods column by column to a data frame with many columns. However, this quickly becomes tedious and requires many lines of code.

In this section, we discuss how to replace NA’s in multiple columns with a few lines of R code.

In the examples, we will use the data frame below. It has four columns, all of which have missing values.

R data frame with multiple columns with NA's

Replace All NA’s

The first option to replace NA’s in a data frame with multiple columns is to replace them in all columns.

You can replace the NA’s in all columns with these steps:

  1. Use the mutate() function to modify the values in your data frame.
  2. Take advantage of the across() function to apply the same operation on multiple columns.
  3. Use the everything() function to specify that you want to carry out the operation (i.e., replacing missing values) on all columns.
  4. Use the replace_na() function to replace the missing values in each column.
  5. Pass the column’s highest value, calculated with the max() function and na.rm = TRUE, as the replacement.

In the R code below, we show how to combine the steps above. With just one line of code, you can replace the missing values in all columns with each column’s maximum. In this example, all four columns are numeric.

my_values_1 <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_values_2 <- c(6, NA, 13, 8, 2, 11, 15, NA, 9, 10)
my_values_3 <- c(NA, 7, NA, 1, 2, 6, NA, 5, 9, 13)
my_values_4 <- c(8, 3, 2, NA, 10, 4, 6, 7, 9, NA)
my_df <- data.frame(my_values_1, my_values_2, my_values_3, my_values_4)
my_df

my_df <- my_df %>% 
  mutate(across(everything(), ~replace_na(.x, max(.x, na.rm = TRUE))))
my_df

The image below shows the outcome of running the R code. Note that the NA’s are replaced with the highest value of each column.

Replace all missing values with the maximum

Important: The code above (using the everything() function) runs on both numeric and character columns. In a character column, however, max() returns the value that sorts last alphabetically, so the imputed values might be unexpected. To replace missing values only in numeric columns, see the section “How to Replace All NA’s in Numeric Columns Only” below.
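
To illustrate what happens to a character column, here is a small sketch with a made-up data frame (mixed_df, with the columns num_col and chr_col). It sets stringsAsFactors = FALSE explicitly so that chr_col stays a character column in older R versions as well.

```r
library(dplyr)
library(tidyr)

mixed_df <- data.frame(
  num_col = c(4, NA, 10),
  chr_col = c("A", NA, "C"),
  stringsAsFactors = FALSE
)

# everything() selects both columns, so the character column is imputed too
mixed_filled <- mixed_df %>%
  mutate(across(everything(), ~ replace_na(.x, max(.x, na.rm = TRUE))))

mixed_filled$num_col
# 4 10 10

# The missing string becomes "C", the value that sorts last
mixed_filled$chr_col
# "A" "C" "C"
```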

Replace NA’s in Multiple Columns Based on their Names

Instead of replacing the missing values in all columns, you can also replace them in a selection of columns, for example, based on the names of the columns.

Again, you use the across() function. With its first argument, you specify the columns in which you want to replace the NA’s, for example, as an R vector of column names. With its second argument, you use the replace_na() function to impute the missing values, for example, with the maximum.

In the example below, the column names are written between quotes. The across() function also accepts unquoted column names.
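
The following sketch compares a selection with quoted and with unquoted column names on a small, made-up data frame. The data frame df, its columns x1 to x3, and the helper fill_max() are assumptions for this example only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  x1 = c(1, NA, 3),
  x2 = c(NA, 5, 6),
  x3 = c(7, NA, 9)
)

# Small helper for this example: replace NA's with the column's maximum
fill_max <- function(x) replace_na(x, max(x, na.rm = TRUE))

# Column names as strings
quoted <- df %>% mutate(across(c("x1", "x3"), fill_max))

# The same selection with unquoted column names
unquoted <- df %>% mutate(across(c(x1, x3), fill_max))
```

Both calls give the same result, and the column x2 keeps its missing value because it is not part of the selection.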

The R code below shows how to replace the NA’s in the columns my_values_1 and my_values_4 with their maximums.

my_values_1 <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_values_2 <- c(6, NA, 13, 8, 2, 11, 15, NA, 9, 10)
my_values_3 <- c(NA, 7, NA, 1, 2, 6, NA, 5, 9, 13)
my_values_4 <- c(8, 3, 2, NA, 10, 4, 6, 7, 9, NA)
my_df <- data.frame(my_values_1, my_values_2, my_values_3, my_values_4)
my_df

my_df <- my_df %>% 
  mutate(across(c("my_values_1","my_values_4"), ~replace_na(.x, max(.x, na.rm = TRUE))))
my_df

Replace NA's in multiple columns based on their names

Replace NA’s in Multiple Columns Based on their Positions

Similar to replacing NA’s in columns based on the column names, you can also replace missing values in several columns based on their positions in a data frame.

To substitute NA’s with the maximum in multiple columns based on their positions, you use the across() function. With the first argument of this function, you can specify the position of the columns. Then, with the second argument, you define how to impute missing values (e.g., the maximum).

In the example below, we use an R vector to replace the missing values in columns 1, 2, and 4.

my_values_1 <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_values_2 <- c(6, NA, 13, 8, 2, 11, 15, NA, 9, 10)
my_values_3 <- c(NA, 7, NA, 1, 2, 6, NA, 5, 9, 13)
my_values_4 <- c(8, 3, 2, NA, 10, 4, 6, 7, 9, NA)
my_df <- data.frame(my_values_1, my_values_2, my_values_3, my_values_4)
my_df

my_df <- my_df %>% 
  mutate(across(c(1:2,4), ~replace_na(.x, max(.x, na.rm = TRUE))))
my_df

Replace NA's in multiple columns based on their positions

Replace NA’s in Columns with a Common Prefix/Suffix

So far, we’ve demonstrated how to replace NA’s in all (numeric) columns and in columns based on their names or positions within a data frame. However, a frequently asked question is how to replace missing values in columns with a common prefix or suffix.

For example, we only want to impute NA’s in the columns that start with “my_”. Instead of explicitly specifying the column names or positions, we want to create more flexible R code.

R data frame with different column names

To replace NA’s in columns with a common prefix/suffix, you can follow the next steps:

  1. Use the grep() function to identify the columns with the prefix/suffix. If you provide the grep() function with the prefix/suffix and a list of column names, then the function returns the positions of these columns.
  2. Use the mutate() function to modify values in a data frame.
  3. Apply the across() function to perform the same operation on multiple columns.
  4. Define the positions of the columns with the prefix/suffix. You can use the outcome of the grep() function.
  5. Use the replace_na() function to replace the missing values with the column’s maximum.

In the R code below, we combine the steps above. First, we identify the columns with the prefix and save their positions in a variable. This variable will serve as the first argument of the across() function.

my_values_1 <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_values_2 <- c(6, NA, 13, 8, 2, 11, 15, NA, 9, 10)
your_values_1 <- c(NA, 7, NA, 1, 2, 6, NA, 5, 9, 13)
your_values_2 <- c(8, 3, 2, NA, 10, 4, 6, 7, 9, NA)
my_df <- data.frame(my_values_1, my_values_2, your_values_1, your_values_2)
my_df


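# "^my_" matches only names that start with "my_"; use a pattern such as "_1$" for a suffix
selected_columns <- grep("^my_", names(my_df))
selected_columns

# all_of() marks selected_columns as an external vector of column positions
my_df <- my_df %>% 
  mutate(across(all_of(selected_columns), ~replace_na(.x, max(.x, na.rm = TRUE))))
my_df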

As you can see in the image below, we’ve only replaced the missing values in the columns that start with “my_”. However, we didn’t explicitly specify the column names, which makes this code easier to reuse on other data frames.

Replace Missing Values in columns with a common prefix/suffix
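
As an alternative to grep(), the tidyverse offers the selection helpers starts_with() and ends_with(), which you can pass directly to across(). The sketch below uses a shortened version of the data frame above (three rows instead of ten). The result names by_prefix and by_suffix are our own.

```r
library(dplyr)
library(tidyr)

my_df <- data.frame(
  my_values_1 = c(4, NA, 10),
  my_values_2 = c(NA, 13, 8),
  your_values_1 = c(NA, 7, 1),
  your_values_2 = c(8, NA, 2)
)

# Impute only the columns whose names start with "my_"
by_prefix <- my_df %>%
  mutate(across(starts_with("my_"), ~ replace_na(.x, max(.x, na.rm = TRUE))))

# Impute only the columns whose names end with "_2"
by_suffix <- my_df %>%
  mutate(across(ends_with("_2"), ~ replace_na(.x, max(.x, na.rm = TRUE))))
```

With these helpers, you don’t need a separate variable to store the column positions.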

How to Replace All NA’s in Numeric Columns Only

In a previous section, we discussed a method to replace the NA’s in all columns with the maximum. However, if your data frame has both numeric and character columns, applying this method might result in unexpected values in the character columns.

For example, the data frame below has 5 columns with missing values, 4 of which are numeric.

To replace NA’s in the numeric columns only, you use the across() function and the is.numeric() function. With the across() function, you apply the replacement to several columns at once, while the is.numeric() function restricts this operation to the numeric columns.

In the following example, we show how to replace NA’s with the maximum value in all numeric columns.

my_values_1 <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_values_2 <- c(6, NA, 13, 8, 2, 11, 15, NA, 9, 10)
my_values_3 <- c(NA, 7, NA, 1, 2, 6, NA, 5, 9, 13)
my_values_4 <- c(8, 3, 2, NA, 10, 4, 6, 7, 9, NA)
my_values_5 <- c(NA, "A", "B", "A", "C", "C", NA, "B", "A", "A")
my_df <- data.frame(my_values_1, my_values_2, my_values_3, my_values_4,my_values_5)
my_df


# where() wraps the predicate; passing is.numeric directly is deprecated in tidyselect
my_df <- my_df %>% 
  mutate(across(where(is.numeric), ~replace_na(.x, max(.x, na.rm = TRUE))))
my_df
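
To check that only the numeric columns are touched, here is a small sketch with a made-up data frame (small_df) that contains a double column, an integer column, and a character column. It wraps the is.numeric() function in the where() helper, which is how current tidyverse versions expect a predicate function inside across().

```r
library(dplyr)
library(tidyr)

small_df <- data.frame(
  score = c(4, NA, 10),
  count = c(NA, 2L, 5L),
  label = c(NA, "A", "B"),
  stringsAsFactors = FALSE
)

# where(is.numeric) selects the double and the integer column, not the character column
small_filled <- small_df %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, max(.x, na.rm = TRUE))))

# The character column still contains its missing value
anyNA(small_filled$label)
# TRUE
```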

How to Replace Missing Values with the Maximum per Group

Finally, we demonstrate how to replace NA’s in R with the maximum per group.

To replace NA’s with the maximum of each group, you first need to specify the variable that defines the groups. You can do this with the group_by() function. Once you have defined the groups, you can replace the missing values with the maximum of each group by combining the replace_na() function and the max() function.

In the example below, we demonstrate how to replace the NA’s with the highest values per group in one column. Nevertheless, you can use all previously discussed methods in combination with the group_by() function.

However, if you want to replace missing values with the maximum of a group for columns with a common prefix/suffix, please read this article.

my_groups <- c(rep("A",5), rep("B",5))
my_values <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_df <- data.frame(my_groups, my_values)
my_df


my_df <- my_df %>% 
  group_by(my_groups) %>% 
  mutate(my_values = replace_na(my_values, max(my_values, na.rm = TRUE))) %>% 
  ungroup()  # drop the grouping so later operations act on the whole data frame
my_df

Replace NA’s with Maximum per Group in R
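
As a sketch of how the group_by() function combines with the multi-column methods, the example below replaces the NA’s in two numeric columns with the maximum of each group. The data frame is a shortened, made-up variant of the one above, and the name my_df_filled is our own.

```r
library(dplyr)
library(tidyr)

my_df <- data.frame(
  my_groups = c("A", "A", "A", "B", "B", "B"),
  my_values_1 = c(4, NA, 10, 12, NA, 7),
  my_values_2 = c(NA, 3, 2, 9, 15, NA)
)

# Within each group, replace the NA's in every numeric column with the group's maximum
my_df_filled <- my_df %>%
  group_by(my_groups) %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, max(.x, na.rm = TRUE)))) %>%
  ungroup()

my_df_filled$my_values_1
# 4 10 10 12 12 7
```

Keep in mind that a group in which all values are missing has no maximum. In that case, max() returns -Inf and raises a warning.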
