How to Replace NA’s with Next Non-Missing Value in R [Examples]

In this article, we discuss how to replace NA’s with the next, non-missing value (per group) in R.

When you work with data, you will likely encounter missing values (i.e., NA’s). Because of missing values, functions might not work, the conclusions you draw might be incorrect, or your statistical model might be less powerful. Therefore, it is crucial to assess and handle them.

In general, you can treat NA’s in two ways. You can either remove them or replace them. Common options are to replace NA’s with zeros or the mean. However, another option is to substitute missing values with the next, non-missing value.
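To make the idea of the "next, non-missing value" concrete, here is a minimal base R sketch. The helper fill_next() is a hypothetical name that we introduce only for illustration; it is not part of any package discussed in this article.

```r
# Hypothetical helper: replace each NA with the next non-missing value.
fill_next <- function(x) {
  if (length(x) < 2) return(x)
  # Walk backwards from the second-to-last element, so that each NA can
  # borrow the (possibly already filled) value that follows it.
  for (i in (length(x) - 1):1) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}

example_result <- fill_next(c(1, NA, NA, 4, NA))
example_result
# 1 4 4 4 NA
```

Note that the trailing NA stays missing because no later value exists. The packaged functions discussed below face the same situation.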

In R, the easiest way to replace NA’s with the next, non-missing value (per group) is with the fill() function from the tidyr package. This function automatically detects and replaces missing values in a data frame. Alternatively, you can use the na.locf() function from the zoo package or the setnafill() function from the data.table package.

In the remainder of this article, we will discuss these 3 options and consider their advantages and disadvantages. We also provide examples with R code that you can use directly in your projects.

3 Ways to Replace NA’s with the Next Non-Missing Value in R

Before we start, we create an R data frame that we will use in the examples.

The data frame has 6 rows (i.e., observations) and 4 columns. The first 3 columns are numeric, whereas the last column contains character data.

Also, note that the second column ends with a missing value. This might be problematic as we want to replace NA’s with the next, non-missing value.

my_data <- data.frame(x1 = c(1,2,NA,4,NA,6),
                      x2 = c(0,NA,0,1,0,NA),
                      x3 = c(NA,4,6,NA,NA,10),
                      x4 = c("A", NA, "C", NA, "E", "F"))
my_data
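Before filling anything, it can help to check where the missing values are. The sketch below uses only base R; the object names na_per_column and trailing_na are our own choices for this illustration.

```r
my_data <- data.frame(x1 = c(1, 2, NA, 4, NA, 6),
                      x2 = c(0, NA, 0, 1, 0, NA),
                      x3 = c(NA, 4, 6, NA, NA, 10),
                      x4 = c("A", NA, "C", NA, "E", "F"))

# Number of missing values in each column
na_per_column <- colSums(is.na(my_data))
na_per_column

# Does a column end with an NA? Such values have no "next" value.
trailing_na <- sapply(my_data, function(col) is.na(col[length(col)]))
trailing_na
```

Only x2 ends with a missing value, which is the case that the methods below cannot fill from a next value.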

1. Replace NA’s with Next Non-Missing Value with the setnafill() Function from the data.table Package

The first option to replace NA’s in R with the next, non-missing value is by using the setnafill() function from the data.table package.

The setnafill() function needs 2 arguments to replace NA’s with the next, non-missing value:

  1. x: A data frame with only numeric columns.
  2. type: An instruction on how to replace missing values.

As the first argument indicates, the setnafill() function only works with numeric input. So, if your data frame contains numeric and character data, you need to subset your data first. This is a big disadvantage of the setnafill() function.

The second argument, i.e., the type=-argument, determines how the missing values are replaced. Besides filling with a constant (type="const"), it provides 2 options that are relevant here. Namely:

  1. Replacing NA’s with the next, non-missing value using type="nocb", or
  2. Replacing NA’s with the last, non-missing value using type="locf".

The fact that you can use the setnafill() function to replace NA’s both with the last and next, non-missing value is an advantage. For more information about replacing NA’s with the last, non-missing value (using the setnafill() function), see this article.
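The difference between the types is easiest to see on a single vector. The sketch below uses nafill(), the sibling of setnafill() in data.table that returns a new vector instead of modifying its input; the toy vector x is made up for this illustration.

```r
library(data.table)

x <- c(NA, 2, NA, 4, NA)

# Next observation carried backward: the trailing NA has no next value
filled_nocb <- nafill(x, type = "nocb")
filled_nocb
# 2 2 4 4 NA

# Last observation carried forward: the leading NA has no previous value
filled_locf <- nafill(x, type = "locf")
filled_locf
# NA 2 2 4 4

# Constant fill, shown only for comparison
filled_const <- nafill(x, type = "const", fill = 0)
filled_const
# 0 2 0 4 0
```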

The following code snippet shows how to use the setnafill() function on our data frame.

# 1.1 Replace NA with Next Non-Missing Value with 'setnafill()' function
library(data.table)

my_data_corrected <- setnafill(x = my_data[,c(1:3)], type = "nocb")
my_data_corrected

As you can see, we first needed to select only the numeric columns. Also, the missing value in the last observation of the second column is still missing since there was no next value.
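If you would rather keep the character column in the same object, one possible approach is to restrict setnafill() to the numeric columns with its cols argument. The sketch below assumes you are willing to work with a data.table (here called my_dt, a name we chose) and keep in mind that setnafill() changes the object in place rather than returning a copy.

```r
library(data.table)

my_dt <- data.table(x1 = c(1, 2, NA, 4, NA, 6),
                    x2 = c(0, NA, 0, 1, 0, NA),
                    x3 = c(NA, 4, 6, NA, NA, 10),
                    x4 = c("A", NA, "C", NA, "E", "F"))

# Only these columns are filled; x4 (character) is left untouched
num_cols <- c("x1", "x2", "x3")

# Modifies my_dt by reference, so no assignment is needed
setnafill(my_dt, type = "nocb", cols = num_cols)
my_dt
```

The character column x4 still contains its NA’s afterwards, so this is a workaround for mixed data rather than a full solution.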

2. Replace NA’s with Next Non-Missing Value with the na.locf() Function from the zoo Package

The second method to replace NA’s in R with the next, non-missing value is by using the na.locf() function from the zoo package.

Originally, the na.locf() function was designed to replace NA’s with the last, non-missing value. However, by adding an extra argument, you can also use this function to replace NA’s with the next, non-missing value.

These are the arguments of the na.locf() function:

  • A data frame with missing values.
  • The fromLast=-argument, which indicates whether to replace missing values with the next value.

Without the fromLast=-argument, the na.locf() function replaces NA’s with the last, non-missing values. However, by setting fromLast=TRUE, you can replace NA’s with the next, non-missing values.
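The effect of fromLast= can be sketched on a single toy vector (the vector x below is made up for this illustration). We set na.rm = FALSE in both calls so that the result keeps the same length as the input.

```r
library(zoo)

x <- c(1, NA, 3, NA)

# Default direction: carry the last non-missing value forward
filled_last <- na.locf(x, na.rm = FALSE)
filled_last
# 1 1 3 3

# fromLast = TRUE: carry the next non-missing value backward
filled_next <- na.locf(x, fromLast = TRUE, na.rm = FALSE)
filled_next
# 1 3 3 NA
```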

With respect to the first argument, i.e., the data frame, the na.locf() function has one big advantage over the setnafill() function. Namely, the na.locf() function works with both numeric and character data.

Below we provide an example.

# 1.2 Replace NA with Next Non-Missing Value with 'na.locf()' function
library(zoo)

my_data_corrected <- na.locf(my_data, fromLast = TRUE)
my_data_corrected

As the output shows, the resulting data frame has one observation fewer than the original. Because the last observation contains a missing value that has no next value, the na.locf() function removes this row.

You can change this behaviour by adding the na.rm=FALSE argument to the na.locf() function. This argument tells R to keep all rows, even if an NA does not have a next value.

For example:

# 1.2 Replace NA with Next Non-Missing Value with 'na.locf()' function (keep all rows)
library(zoo)

my_data_corrected <- na.locf(my_data, fromLast = TRUE, na.rm = FALSE)
my_data_corrected

3. Replace NA’s with Next Non-Missing Value with the fill() Function from the tidyr Package

The third (and best) method to replace NA’s in R with the next, non-missing value is with the fill() function from the tidyr package.

The tidyr package is part of the tidyverse collection and provides functions to clean (i.e., tidy) your data, for example, by replacing missing values.

The fill() function requires 3 arguments to replace NA’s with the next, non-missing value:

  1. A data frame.
  2. A vector of column names.
  3. An instruction on how to replace the NA’s.

In general, the first argument (i.e., the data=-argument) is omitted when you use the pipe operator (%>%), which comes from the magrittr package and becomes available when you load the tidyverse. With this operator, the output of the previous step is used as the input of the next step. Therefore, it is not necessary to specify this argument.

The second argument defines in which columns you want to replace the NA’s. You can either specify them one by one, or use the names() function to select all columns from the data frame.

Lastly, you use the .direction=-argument to specify how to replace the missing values. In order to replace the NA’s with the next, non-missing value, you use .direction="up". (Note the mandatory dot at the beginning of the argument name.)

For example:

# 1.3 Replace NA with Next Non-Missing Value with 'fill()' function
library(tidyverse)

my_data %>%
  fill(names(.), .direction = "up")
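As an aside, the column selection in fill() follows the tidyselect rules, so the everything() helper should be an equivalent way to select all columns. The sketch below compares both spellings without the pipe; the object names filled_names and filled_everything are our own.

```r
library(tidyr)

my_data <- data.frame(x1 = c(1, 2, NA, 4, NA, 6),
                      x2 = c(0, NA, 0, 1, 0, NA),
                      x3 = c(NA, 4, 6, NA, NA, 10),
                      x4 = c("A", NA, "C", NA, "E", "F"))

# Select all columns via their names
filled_names <- fill(my_data, names(my_data), .direction = "up")

# Select all columns via the tidyselect helper
filled_everything <- fill(my_data, everything(), .direction = "up")

filled_everything
```

Both calls fill numeric and character columns alike, which is the main practical difference from setnafill().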

Like the setnafill() function and the na.locf() function, the fill() function does not replace the last missing value in the second column. However, you can replace this value with the previous value by setting the .direction=-argument to "updown".

By using .direction="updown", R first tries to replace NA’s with the next, non-missing value. But, if such a value does not exist, the fill() function replaces the NA with the last, non-missing value instead.

For example:

# 1.3 Replace NA with Next (or else Last) Non-Missing Value with 'fill()' function
library(tidyverse)

my_data %>%
  fill(names(.), .direction = "updown")

Instead, if you want to replace the NA’s with the last, non-missing value, you can use .direction="down".
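To compare the three directions mentioned above side by side, here is a small sketch on a toy data frame. The data frame toy and its column value are made up for this illustration.

```r
library(tidyr)

toy <- data.frame(value = c(NA, 2, NA, 4, NA))

# "up": use the next value; the trailing NA stays missing
up <- fill(toy, value, .direction = "up")$value
up
# 2 2 4 4 NA

# "down": use the last value; the leading NA stays missing
down <- fill(toy, value, .direction = "down")$value
down
# NA 2 2 4 4

# "updown": use the next value first, then the last value for what remains
updown <- fill(toy, value, .direction = "updown")$value
updown
# 2 2 4 4 4
```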

How to Replace NA’s with the Next Non-Missing Value per Group in R

Lastly, we demonstrate how to replace NA’s in R with the next, non-missing value per group.

For this purpose, we first create a data frame that we will use in the examples. This time, the data frame has 6 observations and 5 columns: a grouping column plus the 4 columns from before. The grouping column splits the data frame into 2 groups of 3 rows (i.e., group A and B).

my_data <- data.frame(my_groups = c("A", "A", "A", "B", "B", "B"),
                      x1 = c(1,2,NA,4,NA,6),
                      x2 = c(0,NA,0,1,0,NA),
                      x3 = c(NA,4,6,NA,NA,10),
                      x4 = c("A", NA, "C", NA, "E", "F")) 
my_data

The goal is to replace the NA’s with the next, non-missing value of the same group. These are the steps:

  1. Specify the input data frame.
  2. Create a grouped data frame using the group_by() function. If the groups are defined by multiple columns, then the column names must be separated by commas.
  3. Use the fill() function to replace the NA’s.
  4. Ungroup the data with the ungroup() function.

The fill() function requires two additional arguments to replace the NA’s with the next, non-missing values, namely:

  1. The columns in which you want to replace the NA’s.
  2. An instruction on how to replace the missing values.

For the first argument, you can create a vector with the column names. However, if you want to replace the NA’s in all columns, you can use the names() function instead. This function creates a vector that contains all column names of your data frame.

The second argument specifies how to replace the NA’s. If you want to replace NA’s with the next, non-missing value, you use .direction="up".

For example:

# 2.1 Replace NA with Next Non-Missing Value by Group ("up")
library(tidyverse)

my_data %>% 
  group_by(my_groups) %>% 
  fill(names(.), .direction = "up") %>% 
  ungroup()

As the output demonstrates, all but 2 NA’s were replaced. The 2 missing values that remained are the last values of a group, and therefore do not have a next value. However, you can still replace them by setting the .direction=-argument to "updown".

By using .direction="updown", the fill() function first tries to replace the NA’s with the next, non-missing value. However, if this is not possible, it uses the last, non-missing value instead.

For instance:

# 2.2 Replace NA with Next (or else Last) Non-Missing Value by Group ("updown")
library(tidyverse)

my_data %>% 
  group_by(my_groups) %>% 
  fill(names(.), .direction = "updown") %>% 
  ungroup()
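For completeness, a per-group fill can also be sketched with the data.table package from the first method. This is only one possible approach, and it covers the numeric columns only, since the article's first section notes that the data.table fill functions work with numeric input. The names my_dt and num_cols are our own choices.

```r
library(data.table)

my_dt <- data.table(my_groups = c("A", "A", "A", "B", "B", "B"),
                    x1 = c(1, 2, NA, 4, NA, 6),
                    x2 = c(0, NA, 0, 1, 0, NA),
                    x3 = c(NA, 4, 6, NA, NA, 10))

num_cols <- c("x1", "x2", "x3")

# Apply nafill() to each numeric column separately within each group,
# so that values are never borrowed from a different group
my_dt[, (num_cols) := lapply(.SD, nafill, type = "nocb"),
      by = my_groups, .SDcols = num_cols]
my_dt[]
```

As with .direction = "up", the last value of a group stays missing when no later value exists within that group.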