3 Ways to Replace NA’s with Zeros in R [Examples]

In this article, we discuss how to replace NA’s (i.e., missing values) with zeros in R.

There are many reasons why a dataset might have missing values, for example, missing responses in a survey or NA’s in imported data. You can replace missing values with many different values, such as the mean, median, or mode. Nevertheless, zeros are probably the most common option.

In R, you can replace NA’s with zeros using either basic R code, the COALESCE() function, or the REPLACE_NA() function. All are easy to understand, but COALESCE() (from the dplyr package) and REPLACE_NA() (from the tidyr package) have the advantage that they are part of the tidyverse.

In this article, we discuss all 3 methods using clear examples. We also show how to use the REPLACE_NA() function in combination with the MUTATE_ALL(), MUTATE_AT(), and MUTATE_IF() functions to replace missing values only in those columns that meet a specific condition.

3 Ways to Replace Missing Values with Zeros

1. Replace NA’s with Zeros using R Base Code

The classic way to replace NA’s in R is by using the IS.NA() function.

The IS.NA() function takes a vector or data frame as input and returns a logical object that indicates whether each value is missing (TRUE or FALSE). Next, you can use this logical object to select the missing values and assign them a zero.

For example:

vec <- c(1,2,NA,4,5)
vec[is.na(vec)] <- 0
vec
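To make the two steps explicit, the sketch below stores the logical vector returned by is.na() before using it. It also uses the base R replace() function, which returns a modified copy and leaves the original vector untouched. The names na_mask and vec_filled are only illustrative.

```r
vec <- c(1, 2, NA, 4, 5)

# is.na() flags each element: TRUE where the value is missing
na_mask <- is.na(vec)
na_mask
sum(na_mask)  # number of missing values

# replace() returns a copy in which the flagged elements are set to zero
vec_filled <- replace(vec, na_mask, 0)
vec_filled
```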

You can also use the IS.NA() function to identify and replace missing values in a data frame.

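set.seed(12345)
# Build a 10 x 8 data frame of random integers that contains some NA's
mat <- matrix(sample(c(1:10, NA), size = 80, replace = TRUE), nrow = 10)
df <- as.data.frame(mat)
# Overwrite every missing value with a zero
df[is.na(df)] <- 0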
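As a quick check, you can count the missing values per column before the replacement and confirm that none are left afterwards. The sketch below uses a small hand-made data frame (df_small is a made-up name) so that the result is easy to verify by eye.

```r
df_small <- data.frame(x = c(1, NA, 3), y = c(NA, 5, NA))

# Count the missing values per column before replacing them
na_counts <- colSums(is.na(df_small))
na_counts

df_small[is.na(df_small)] <- 0

# anyNA() returns FALSE once no missing values are left
anyNA(df_small)
```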

Although this approach is intuitive, it can be relatively slow on large data frames. Moreover, the assignment always overwrites the existing vector or data frame. On top of that, it does not fit naturally into tidyverse pipelines.

2. Replace NA’s with Zeros using the COALESCE() Function

Alternatively, you can use the COALESCE() function to substitute missing values.

The COALESCE() function is part of the dplyr package and returns the first non-missing value of its arguments. Therefore, you can combine the MUTATE_ALL() function with the COALESCE() function, whose second argument is a zero, to replace all NA’s with zeros.

This R code shows an example.

library(dplyr)

set.seed(12345)
mat <- matrix(sample(c(1:10,NA),replace = TRUE,80),10)
df <- as.data.frame(mat)

mutate_all(df, ~coalesce(.,0))

The COALESCE() function has two main benefits, namely, it’s quick and works with the pipe operator.
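The following sketch illustrates how coalesce() behaves on plain vectors and inside a pipe. The data frame df_small and the result names are made up for this illustration.

```r
library(dplyr)

# coalesce() keeps the first non-missing value at each position
filled_scalar <- coalesce(c(1, NA, 3), 0)
filled_scalar

# It can also fill gaps from a second vector of the same length
filled_vector <- coalesce(c(NA, 2, NA), c(10, 20, 30))
filled_vector

# Inside a pipe, applied to every column of a small data frame
df_small <- data.frame(x = c(1, NA, 3), y = c(NA, 5, NA))
df_filled <- df_small %>% mutate_all(~ coalesce(., 0))
df_filled
```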

3. Replace NA’s with Zeros using the REPLACE_NA() Function

The easiest and most versatile way to replace NA’s with zeros in R is by using the REPLACE_NA() function.

The REPLACE_NA() function is part of the tidyr package, takes a vector, column, or data frame as input, and replaces the missing values with the value you provide, in our case a zero. This function has the advantage that it is fast, explicit, and part of the tidyverse.

The next example shows how to apply the REPLACE_NA() function.

library(dplyr)
library(tidyr)
mutate_all(df, ~replace_na(.,0))

Although we’ve used REPLACE_NA() to replace missing values with zeros, its purpose is not limited to this operation. You can also use the REPLACE_NA() function to substitute NA’s with other values.
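As a minimal sketch of this flexibility, the code below replaces NA’s with other values in two vectors and then in a data frame. When the input is a data frame, replace_na() expects a named list with one replacement per column. The survey data is invented for this illustration.

```r
library(tidyr)

# On a vector: the replacement should have the same type as the vector
filled_num <- replace_na(c(1.5, NA, 3.5), 99)
filled_chr <- replace_na(c("a", NA, "c"), "unknown")

# On a data frame: a named list sets one replacement per column
survey <- data.frame(age = c(31, NA, 45),
                     city = c("Oslo", NA, "Lima"),
                     stringsAsFactors = FALSE)
survey_filled <- replace_na(survey, list(age = 0, city = "unknown"))
survey_filled
```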

Replace Missing Values with Zeros in Multiple Columns

Above, we have demonstrated how to replace all missing values in a data frame. However, this is not always necessary. Instead, you might want to substitute the NA’s in just some columns.

Therefore, in this section, we explain how to use the MUTATE_AT() function and REPLACE_NA() function to replace missing values in those columns that meet a specific condition.

Replace NA’s based on Column Position

First, we discuss how to replace NA’s in columns based on their position, for example, the first, third, or eighth column.

One Position

Replacing the missing values with zeros in one column given its position is easy. You use the MUTATE_AT() function and provide two arguments, namely:

  1. The position of the column in the data frame in which you want to replace the NA’s.
  2. The REPLACE_NA() function to replace the NA’s with a zero.

The R code below shows how to replace all missing values in the first column.

df %>% mutate_at(1, ~replace_na(.,0))

Multiple Positions

You can also use the MUTATE_AT() function to replace missing values in multiple columns given their positions.

These are the steps.

  • First, you create a vector with the positions of the columns with the c() function.
  • Then, you use this vector as the first argument of the MUTATE_AT() function.
  • Finally, you use the REPLACE_NA() function as the second argument to replace the NA’s with zeros.

In the R code below, we substitute the missing values in columns 1, 3, 4, 5, and 8.

df %>% mutate_at(c(1,3:5,8), ~replace_na(.,0))
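Since dplyr 1.0.0, the scoped verbs such as mutate_at() are superseded by the across() function, although they still work. As a sketch that assumes dplyr 1.0.0 or later, the same idea can be written as follows. The data frame df_pos and its column names are made up, and it has only four columns to keep the output short.

```r
library(dplyr)
library(tidyr)

df_pos <- data.frame(v1 = c(1, NA), v2 = c(NA, 2),
                     v3 = c(NA, 3), v4 = c(4, NA))

# Replace the NA's only in columns 1, 3, and 4; column 2 keeps its NA
out <- df_pos %>% mutate(across(c(1, 3:4), ~ replace_na(.x, 0)))
out
```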

Replace NA’s based on Column Name

Alternatively, you can also replace missing values in a data frame based on the column names.

Below, we show a variety of options to identify columns using their names. But first, we create a data frame and assign each column a name.

set.seed(12345)
mat <- matrix(sample(c(1:10,NA),replace = TRUE,80),10)
df <- as.data.frame(mat)
names(df) <- c("AAA",
               "AAB",
               "ABA",
               "BAA",
               "ABB",
               "BBA",
               "BAB",
               "BBB")

In general, to replace NA’s with zeros in columns given their names, you need the VARS() function. This function helps R to select columns based on their names.

One Name

To replace missing values in one column based on its name, you need to provide the column name to the VARS() function, for example, vars("AAA"). Then, you use this as the first argument of the MUTATE_AT() function. Finally, you use the REPLACE_NA() function as the second argument of the MUTATE_AT() function to replace the NA’s with zeros.

Below we show an example.

df %>% mutate_at(vars("AAA"), ~replace_na(.,0))
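For a single, known column, a plain mutate() call that overwrites the column is a simple alternative. The sketch below uses a small made-up data frame (df_one) to show that only the named column changes.

```r
library(dplyr)
library(tidyr)

df_one <- data.frame(AAA = c(1, NA, 3), AAB = c(NA, 2, NA))

# Overwrite column AAA with its NA-free version; AAB is left as it is
out <- df_one %>% mutate(AAA = replace_na(AAA, 0))
out
```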

Multiple Names

You can also provide the VARS() function with multiple column names and replace NA’s in different columns at once. A convenient way to do this is to wrap the column names in the c() function inside the VARS() function. A range of names selects all columns between the two names, including both ends.

For example, below we replace the missing values with zeros in five columns based on their names.

df %>% mutate_at(vars(c("AAA", "BAA":"BBA", "BBB")), ~replace_na(.,0))
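Before modifying the data, it can help to preview which columns a name-based selection picks. The sketch below does this with the select() function from dplyr, on a made-up data frame (df_names) that has the same column names as above. It uses unquoted column names, which select() accepts.

```r
library(dplyr)

col_names <- c("AAA", "AAB", "ABA", "BAA", "ABB", "BBA", "BAB", "BBB")
df_names <- as.data.frame(matrix(c(NA, 1), nrow = 2, ncol = 8))
names(df_names) <- col_names

# BAA:BBA covers every column from BAA up to and including BBA
selected <- names(select(df_names, AAA, BAA:BBA, BBB))
selected
```

The range is based on column position, not on alphabetical order, which is why ABB is part of the selection.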

Names Containing Specific Characters

Alternatively, you can also replace NA’s in those columns where the column name meets a specific condition.

For example, you can use the CONTAINS() function to identify all columns whose names contain a specific pattern of characters. In the R code below, we replace the NA’s with zeros only in those columns that contain the characters “AB”.

df %>% mutate_at(vars(contains("AB")), ~replace_na(.,0))

Note that the CONTAINS() function ignores case by default. In other words, “AB” and “ab” match the same columns unless you set the argument ignore.case to FALSE.
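The effect of the ignore.case argument can be seen in a small sketch. The data frame df_case is made up, and select() is used only to list the matching column names.

```r
library(dplyr)

df_case <- data.frame(AAB = 1, ABA = 2, BBB = 3)

# Default behaviour (ignore.case = TRUE): "ab" also matches "AB"
loose_match <- names(select(df_case, contains("ab")))
loose_match

# Case-sensitive matching: no column name contains lowercase "ab"
strict_match <- names(select(df_case, contains("ab", ignore.case = FALSE)))
strict_match
```

The STARTS_WITH() and ENDS_WITH() helpers have the same ignore.case argument.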

Names Starting with Specific Characters

Instead of looking for a pattern of characters throughout the entire column name, you can use the STARTS_WITH() function to replace NA’s with zeros in those columns whose names start with one or more specific characters.

In the example below, we use the STARTS_WITH() function to identify all columns that start with “BB”. Next, we wrap it in the VARS() function so that MUTATE_AT() knows which columns to modify. Lastly, we use the REPLACE_NA() function to replace the missing values with zeros.

df %>% mutate_at(vars(starts_with("BB")), ~replace_na(.,0))

Names Ending with Specific Characters

Like the STARTS_WITH() function, the tidyverse also offers the ENDS_WITH() function.

You can use the ENDS_WITH() function to identify column names that end with a specific pattern of characters and replace NA’s only in these columns.

For example, below we substitute missing values with zeros only in the columns whose names end with “BB”.

df %>% mutate_at(vars(ends_with("BB")), ~replace_na(.,0))
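Selection helpers can also be combined. As a sketch that assumes dplyr 1.0.0 or later, where across() and the Boolean operators for selections are available, the code below replaces NA’s only in columns whose names start with “A” and end with “B”. The data frame df_names is made up and reuses the column names from above.

```r
library(dplyr)
library(tidyr)

col_names <- c("AAA", "AAB", "ABA", "BAA", "ABB", "BBA", "BAB", "BBB")
df_names <- as.data.frame(matrix(c(NA, 1), nrow = 2, ncol = 8))
names(df_names) <- col_names

# & keeps only the columns that satisfy both conditions
out <- df_names %>%
  mutate(across(starts_with("A") & ends_with("B"), ~ replace_na(.x, 0)))

# Columns without any remaining NA's
filled_cols <- names(out)[colSums(is.na(out)) == 0]
filled_cols
```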

Replace NA’s based on Column Type

Instead of replacing NA’s in columns based on their position or name, you can also replace missing values in columns of a specific type. For example, integer columns, character columns, or factor columns. In this section, we explain how.

But first, we create a data frame with different types of columns.

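# Create one vector per column type; each contains one missing value
num_col <- c(1,2,NA,4,5)
int_col <- as.integer(num_col)
char_col <- c("A", "B", NA, "D", "E")
fct_col <- as.factor(num_col)
# stringsAsFactors = FALSE keeps char_col a character column in R versions before 4.0
df <- data.frame(num_col, int_col, char_col, fct_col, stringsAsFactors = FALSE)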
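To confirm that each column has the intended type, and that each contains exactly one missing value, you can inspect the data frame as sketched below. The names col_types and na_counts are only illustrative.

```r
num_col <- c(1, 2, NA, 4, 5)
int_col <- as.integer(num_col)
char_col <- c("A", "B", NA, "D", "E")
fct_col <- as.factor(num_col)
df <- data.frame(num_col, int_col, char_col, fct_col,
                 stringsAsFactors = FALSE)

# The class of each column
col_types <- sapply(df, class)
col_types

# The number of NA's in each column
na_counts <- colSums(is.na(df))
na_counts
```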

In contrast to previous sections, we won’t use the MUTATE_AT() function. Instead, we will use the MUTATE_IF() function.

Numeric Columns

To replace missing values only in the numeric columns, you can use the is.numeric function. This function checks whether a column is numeric and returns TRUE or FALSE.

You can use the is.numeric function as the first argument of the MUTATE_IF() function, which applies it to every column and selects those for which it returns TRUE. If you use the REPLACE_NA() function as the second argument, then you can replace the NA’s with a zero in all numeric columns.

Note that R distinguishes between double and integer data types. However, the is.numeric function returns TRUE for both, so both kinds of columns are modified.

Below we show an example.

df %>% mutate_if(is.numeric, ~replace_na(.,0))
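Like the other scoped verbs, mutate_if() is superseded in dplyr 1.0.0 and later by across() combined with where(). A minimal sketch of the equivalent call, on a made-up data frame (df_types), could look like this:

```r
library(dplyr)
library(tidyr)

df_types <- data.frame(num_col = c(1, NA, 3),
                       int_col = c(NA, 2L, 3L),
                       char_col = c("A", NA, "C"),
                       stringsAsFactors = FALSE)

# where(is.numeric) selects both the double and the integer column
out <- df_types %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
out
```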

Integer Columns

Instead, if you only want to replace the missing values in integer columns, you can use the is.integer function as the first argument of the MUTATE_IF() function.

The next R code shows how to use the is.integer function to replace missing values.

df %>% mutate_if(is.integer, ~replace_na(.,0))
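In R, a plain 0 is a double, whereas 0L is an integer. If you want to be explicit that the column stays an integer column, you can pass 0L as the replacement, as in the sketch below. The data frame df_types is made up for this illustration.

```r
library(dplyr)
library(tidyr)

df_types <- data.frame(num_col = c(1, NA), int_col = c(NA, 2L))

# 0L is an integer zero, so the column type cannot change
out <- df_types %>% mutate_if(is.integer, ~ replace_na(., 0L))
typeof(out$int_col)
out
```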

Character Columns

Like the is.numeric and is.integer functions, R also provides the is.character function.

The is.character function checks whether a column is of type character and can help you to replace NA’s only in character columns. These are the steps:

  • Start the MUTATE_IF() function.
  • Use the is.character function as the first argument.
  • Use the REPLACE_NA() function as the second argument to replace NA’s with zeros.

Because the columns are of type character, you pass the zero as the string "0". Recent versions of tidyr require the replacement to have the same type as the column and return an error if you pass the number 0.

For example:

df %>% mutate_if(is.character, ~replace_na(., "0"))

Factor Columns

Probably the most difficult case is replacing NA’s with zeros in a factor column so that the 0 becomes a factor level.

Nevertheless, these are the steps:

  1. Identify all factor columns in a data frame with the SAPPLY function.
  2. Convert all factor columns to character columns with the LAPPLY function.
  3. Replace the NA’s in the converted columns with the string "0" using the REPLACE_NA() function.
  4. Convert the original factor columns back into factor columns.

The R code below shows all the steps.

# Flag the factor columns
all_fct_column <- sapply(df,is.factor)
# Convert the factor columns to character columns
df[all_fct_column] <- lapply(df[all_fct_column], as.character)
# Replace the NA's with "0" and convert the columns back to factors
df %>% 
  mutate_at(which(all_fct_column), ~replace_na(., "0")) %>% 
  mutate_at(which(all_fct_column), factor)

Note that these steps don’t affect the character column.

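As an alternative sketch in base R, you can add "0" as a new level and then assign it to the missing entries, without converting the factor to character first. The helper fill_factor_na() is hypothetical and not part of any package.

```r
# Hypothetical helper (not part of any package): fills NA's in a factor
fill_factor_na <- function(f, value = "0") {
  # Add the replacement as a level only if it is not there yet
  if (!value %in% levels(f)) {
    levels(f) <- c(levels(f), value)
  }
  f[is.na(f)] <- value
  f
}

fct_col <- factor(c(1, 2, NA, 4, 5))
fct_filled <- fill_factor_na(fct_col)
fct_filled
levels(fct_filled)
```

With this approach, the new level is appended at the end of the existing levels, whereas rebuilding the column with the factor() function sorts the levels.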