How to Replace Missing Values with the Minimum in R

Often, when you work with data, you need to deal with missing values. There are many values you can impute NA’s with, such as the mean or zero. But how do you replace missing values with the minimum in R?

To replace missing values in R with the minimum, you can use the tidyverse package. First, you use the mutate() function to specify the column in which you want to replace the missing values. Second, you call the replace() function to identify the NA’s and substitute them with the column’s lowest value.

This article discusses two options to impute missing values with the minimum, namely with basic R code and with the tidyverse package. We also explain how to replace the NA’s in multiple columns at once: in all numeric columns, in columns with a common prefix or suffix, and in a range of columns.

How to Replace Missing Values with the Minimum in a Single Column

We use the following table to demonstrate how to replace missing values in a single column with the lowest value.

This table has 10 rows, 8 with values and 2 NA’s. We want to replace the NA’s with the value 4, i.e., the lowest value of the my_values column.

Replace Missing Values with Basic R Code

The first method to substitute missing values with the lowest value is with basic R code. In other words, you don’t need to install additional packages. This might be useful if you run a basic R installation without permissions to install packages.

So, how do you replace missing values with basic R code?

To replace the missing values, you first select the column with the $ operator and identify its NA’s with the is.na() function. Then, you assign the column’s lowest value, calculated with the min() function, to those positions.

The R code below shows how to create a data frame with missing values and, subsequently, how to replace them with the lowest value.

my_values <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_df <- data.frame(my_values)
my_df

my_df$my_values[is.na(my_df$my_values)] <- min(my_df$my_values, na.rm = TRUE)
my_df

Remember, by default, the min() function doesn’t ignore missing values. As a result, it returns NA if the vector contains one or more NA’s. Therefore, you need to add the na.rm = TRUE option to the min() function. This option removes missing values before the minimum is calculated.

The image below shows the result of running the R code above.

Replace missing values in R with the minimum.
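To see why the na.rm option matters, here is a minimal check on the same values used above. It is only a sketch; the variable name x is ours.

```r
# Same values as the my_values column above
x <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)

min(x)                # NA: a single NA makes the minimum unknown
min(x, na.rm = TRUE)  # 4: the NA's are dropped before the minimum is computed
```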

Replace Missing Values with Tidyverse

Another method to replace missing values in R is with the tidyverse package.

The tidyverse package is one of the most widely used packages in R, especially in data science and data analysis. But how do you use tidyverse to replace missing values with the minimum?

To impute missing values in a data frame with the minimum, you use the mutate() and replace() functions. First, the mutate() function specifies the column with the missing values. Second, the replace() function defines the new value of the NA’s.

The replace() function takes three arguments:

  1. The column (i.e., vector) in which to replace values.
  2. The positions to replace. You use the is.na() function to select all missing values in the column.
  3. The replacement value. To replace the NA’s with the minimum, you use the min() function.
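The three arguments can be seen on their own by calling replace() on a plain vector, outside of mutate(). This is a minimal sketch with a vector named x of our choosing:

```r
x <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)

x_filled <- replace(x,                     # 1. the vector in which to replace values
                    is.na(x),              # 2. the positions to replace (the NA's)
                    min(x, na.rm = TRUE))  # 3. the replacement value (here 4)
x_filled
# 4 9 10 4 5 12 4 7 11 8
```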

As discussed before, you need to specify the na.rm = TRUE option so that the min() function ignores the NA’s when calculating the minimum.

The R code below shows how to replace the missing values with the minimum.

library(tidyverse)

my_values <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_df <- data.frame(my_values)
my_df

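# replace() takes the column, the positions of its NA's, and the replacement value
my_df %>% 
  mutate(my_values = replace(my_values, is.na(my_values), min(my_values, na.rm = TRUE)))

Replace NA's with the minimum in R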

How to Replace Missing Values with the Minimum in Multiple Columns

So far, we’ve discussed two methods to replace NA’s in a single column. However, if your data frame has many columns, it can be tedious to impute all missing values one column at a time.

Therefore, in this section, we discuss how to replace missing values in multiple columns at once.

How to Replace Missing Values with the Minimum in All Numeric Columns

The easiest way to impute every NA in all numeric columns is with the tidyverse package. First, you use the mutate_if() function together with the is.numeric() function to select the numeric columns. Then, you use the ifelse() function to find the missing values and replace them with the column’s lowest value.

As an example, the image below shows an R data frame with two columns. The goal is to replace all NA’s with only one simple line of code.

The following example shows how to apply the steps mentioned above.

my_values_1 <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_values_2 <- c(6, NA, 13, 8, 2, 11, 15, NA, 9, 10)
my_df <- data.frame(my_values_1, my_values_2)
my_df

# For every numeric column x: keep non-missing values, replace NA's with min(x)
my_df %>% 
  mutate_if(is.numeric, function(x) ifelse(is.na(x), min(x, na.rm = TRUE), x))

Replace missing values in all numeric columns
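Since dplyr 1.0.0, mutate_if() is superseded by across(). Assuming a recent dplyr version is installed, the same replacement could be written as follows. This is a sketch rather than the only option; it uses replace() instead of ifelse() so that it mirrors the single-column tidyverse example.

```r
library(dplyr)

my_values_1 <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_values_2 <- c(6, NA, 13, 8, 2, 11, 15, NA, 9, 10)
my_df <- data.frame(my_values_1, my_values_2)

# across() applies the function to every column selected by where(is.numeric);
# .x stands for the current column inside the ~ formula
result <- my_df %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), min(.x, na.rm = TRUE))))
result
```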

How to Replace Missing Values with the Minimum in Columns with a Specific Prefix or Suffix

Another frequently asked question is how to replace missing values in columns that have a common prefix or suffix.

The R data frame in the image below has 4 columns, two of which start with the prefix “my_”. The goal is to replace the missing values in only these two columns with the minimum.

These are the steps to replace missing values in columns with a common prefix/suffix:

  1. Create a logical vector to identify the columns with the prefix/suffix.

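    You can use the grepl() function to identify the column names with the desired prefix or suffix. The grepl() function requires two arguments: first, a pattern between quotes that describes the prefix or suffix, and second, the column names of your data frame. Because the pattern is a regular expression, use ^ to anchor a prefix (e.g., "^my_") and $ to anchor a suffix (e.g., "_1$").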

  2. Replace the missing values in the identified columns.

    You can use the lapply() function to apply a function to the columns with the prefix/suffix. For example, a function that replaces NA’s with the minimum value.
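An unanchored pattern such as "my_" matches anywhere in a name, so it would also select a column like "dummy_flag". The sketch below compares anchored and unanchored patterns on a set of column names; "dummy_flag" is a hypothetical name added only for illustration. Base R’s startsWith() and endsWith() are an alternative when the prefix or suffix is plain text.

```r
# Hypothetical column names; "dummy_flag" shows why anchoring matters
col_names <- c("my_values_1", "my_values_2", "your_values_1", "dummy_flag")

grepl("my_", col_names)       # TRUE TRUE FALSE TRUE  ("dummy_flag" matches too)
grepl("^my_", col_names)      # TRUE TRUE FALSE FALSE (prefix only)
grepl("_1$", col_names)       # TRUE FALSE TRUE FALSE (suffix only)
startsWith(col_names, "my_")  # TRUE TRUE FALSE FALSE
endsWith(col_names, "_1")     # TRUE FALSE TRUE FALSE
```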

In the block below, we show the R code that replaces the missing values with the minimum in the columns that start with “my_”.

my_values_1 <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_values_2 <- c(6, NA, 13, 8, 2, 11, 15, NA, 9, 10)
your_values_1 <- c(NA, 7, NA, 1, 2, 6, NA, NA, 9, 13)
your_values_2 <- c(8, 3, 2, NA, NA, NA, 6, 7, 9, NA)
my_df <- data.frame(my_values_1, my_values_2, your_values_1, your_values_2)
my_df

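# Logical vector: TRUE for the columns whose names start with "my_"
selected_columns <- grepl("^my_", names(my_df))

# Replace the NA's in each selected column with that column's minimum
my_df[selected_columns] <- lapply(my_df[selected_columns],
                                  function(x) replace(x, is.na(x), min(x, na.rm = TRUE)))
my_df

Impute missing values in columns with a common prefix/suffix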
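One edge case the examples above do not cover: if a selected column contains only NA’s, min(x, na.rm = TRUE) returns Inf with a warning, so the NA’s would be replaced by Inf. Below is a minimal sketch of a helper that leaves such columns unchanged; the name replace_na_min is hypothetical and not a function from base R or the tidyverse.

```r
# Hypothetical helper (not part of base R or the tidyverse):
# replaces NA's with the column minimum, but leaves all-NA columns unchanged
replace_na_min <- function(x) {
  if (all(is.na(x))) {
    return(x)  # no non-missing values, so there is no minimum to use
  }
  replace(x, is.na(x), min(x, na.rm = TRUE))
}

replace_na_min(c(3, NA, 1))            # 3 1 1
replace_na_min(c(NA_real_, NA_real_))  # NA NA (no Inf, no warning)
```

You could then pass the helper directly to lapply(), for example lapply(my_df[selected_columns], replace_na_min).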

How to Replace Missing Values with the Minimum in a Range of Columns

Lastly, we discuss how to replace missing values in a range of columns.

The data frame below has 4 columns. The goal is to substitute the NA’s in the second, third, and fourth columns.

You can easily replace missing values in a range of columns with the lapply() function. Start by specifying the columns in which you want to replace the missing values. You can do this with a numeric vector of column positions, such as 2:4. Then, use the replace() function to identify the NA’s and replace them with, for example, the minimum.

The R code below shows how to use the lapply() function to impute NA’s in the second, third, and fourth columns.

my_values_1 <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_values_2 <- c(6, NA, 13, 8, 2, 11, 15, NA, 9, 10)
your_values_1 <- c(NA, 7, NA, 1, 2, 6, NA, NA, 9, 13)
your_values_2 <- c(8, 3, 2, NA, NA, NA, 6, 7, 9, NA)
my_df <- data.frame(my_values_1, my_values_2, your_values_1, your_values_2)
my_df

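# Positions of the columns to impute (second, third, and fourth)
selected_columns <- 2:4

# Replace the NA's in each selected column with that column's minimum
my_df[selected_columns] <- lapply(my_df[selected_columns],
                                  function(x) replace(x, is.na(x), min(x, na.rm = TRUE)))
my_df

Replace missing values in R in a range of columns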
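If you prefer to define the range by column names rather than positions, one option (a sketch, not the only approach) is to look up the positions of the first and last column with match(). The column names below are those of the example data frame; the variables first_col and last_col are ours.

```r
my_df <- data.frame(
  my_values_1   = c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8),
  my_values_2   = c(6, NA, 13, 8, 2, 11, 15, NA, 9, 10),
  your_values_1 = c(NA, 7, NA, 1, 2, 6, NA, NA, 9, 13),
  your_values_2 = c(8, 3, 2, NA, NA, NA, 6, 7, 9, NA)
)

# Positions of the first and last column of the range, looked up by name
first_col <- match("my_values_2", names(my_df))
last_col  <- match("your_values_2", names(my_df))
selected_columns <- first_col:last_col  # 2:4 for this data frame

my_df[selected_columns] <- lapply(my_df[selected_columns],
                                  function(x) replace(x, is.na(x), min(x, na.rm = TRUE)))
my_df
```

Because the positions are looked up at run time, the code keeps working if columns are added before the range, as long as the range itself stays contiguous.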
