How to Replace NA’s with Last Non-Missing Value in R [Examples]

This article discusses how to replace missing values (i.e., NA’s) in an R data frame with the last, non-missing value (by group).

Missing values are a nuisance. They can cause functions to fail, lead to incorrect conclusions, or reduce the statistical power of your model. Fortunately, R provides various methods that make dealing with NA’s a piece of cake.

There are two common ways to deal with missing values: you either remove the observations that contain one or more NA’s, or you replace the NA’s. In the latter case, you have a wide range of options.

For example, you can replace missing values with zeros or with the average. Another frequently used option is to replace NA’s with the last, non-missing value. This can be the value of the previous observation, or the value of an observation several rows above if all the observations in between are missing too.
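To make the idea concrete, here is a minimal base R sketch of "last observation carried forward" on a single vector. The helper name locf_base() is hypothetical: it is not part of any package and not one of the three methods discussed below. It records the position of every non-missing value, takes the running maximum of those positions with cummax(), and then looks the values up again.

```r
# locf_base() is a hypothetical helper for illustration only.
# It carries the last non-missing value forward and leaves leading NA's as NA.
locf_base <- function(x) {
  # Position of each non-missing value; 0 where the value is NA
  pos <- seq_along(x) * !is.na(x)
  # Running maximum: position of the most recent non-missing value so far
  last_pos <- cummax(pos)
  out <- x
  # Leading NA's have no earlier value (last_pos == 0), so they are skipped
  has_prev <- last_pos > 0
  out[has_prev] <- x[last_pos[has_prev]]
  out
}

locf_base(c(NA, 1, NA, NA, 4, NA))
# NA 1 1 1 4 4: both NA's after the 1 take the value from rows above
locf_base(c("A", NA, "C", NA))
# "A" "A" "C" "C"
```

Because it only uses positions, the same helper works for numeric and character vectors alike.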

In R, the easiest way to replace NA’s with the last, non-missing value is the fill() function from the tidyr package. This function detects missing values in a data frame and substitutes them with the last, non-missing value (per group). Alternatively, you can use the na.locf() function from the zoo package or the setnafill() function from the data.table package.

In this article, we discuss the fill(), na.locf(), and setnafill() functions and provide snippets with example code that you can use directly in your projects.

3 Ways to Replace NA’s with the Last Non-Missing Value in R

Before we start our discussion, we first create a data frame that we will use in the examples.

This data frame has 6 observations and 4 columns. The first 3 columns are numeric, while the last column contains character data. Also, unlike the other columns, the third column starts with a missing value. Because there is no earlier value, this NA cannot be replaced with the last, non-missing value.

my_data <- data.frame(x1 = c(1,2,NA,4,NA,6),
                      x2 = c(0,NA,0,1,NA,NA),
                      x3 = c(NA,4,6,NA,NA,10),
                      x4 = c("A", NA, "C", NA, "E", "F"))
An R data frame with missing values.

These are the 3 methods you can use to replace NA’s in R with the last, non-missing value.

1. Replace NA’s with the Last Non-Missing Value using the na.locf() Function from the zoo Package.

The first way to impute missing values with the last, non-missing value uses the na.locf() function from the zoo package.

Although this package focuses mainly on time series data, it also provides the na.locf() function, which stands for “Last Observation Carried Forward”. And, even though its name might not make it obvious, na.locf() carries forward the last non-missing observation, not simply the previous observation. Therefore, this function is perfect for replacing NA’s with the previous, non-missing value.
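A minimal sketch on a plain numeric vector (the vector x is made up for illustration) shows that each NA takes the most recent non-missing value, even when that value lies several positions back:

```r
library(zoo)

x <- c(3, NA, NA, 7, NA)

# Both NA's after 3 receive 3; the final NA receives 7
na.locf(x)
# 3 3 3 7 7
```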

The na.locf() function has one mandatory argument, namely the object=-argument, which can be a vector, a data frame, etc.

The code snippet below shows how to use the na.locf() function and replace missing values with the last, non-missing value.

library(zoo)

my_data_corrected <- na.locf(object = my_data)
my_data_corrected
Replace NA's in R with last, non-missing value using the na.locf() function (without na.rm=-option).

As you can see in the image above, the first observation has been removed. Note that this observation had a missing value in the third column.

By default (na.rm = TRUE), the na.locf() function removes leading observations that it cannot fill, that is, rows at the top of the data frame where at least one column has no earlier non-missing value. To keep all observations, set the na.rm=-argument to FALSE. With na.rm = FALSE, the new data frame contains all observations, and the missing values in the first observation are left unchanged.

For example:

library(zoo)

my_data_corrected <- na.locf(object = my_data, na.rm = FALSE)
my_data_corrected
Replace NA's in R with last, non-missing value using the na.locf() function (with na.rm=-option).
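The effect of na.rm= is easiest to see on a short vector that starts with an NA. In this small sketch (the vector x is made up for illustration), the default na.rm = TRUE drops the leading NA, so the result is shorter than the input, while na.rm = FALSE keeps the original length:

```r
library(zoo)

x <- c(NA, 2, NA, 5)

# Default (na.rm = TRUE): the leading NA cannot be filled and is removed
filled_drop <- na.locf(x)
filled_drop
# 2 2 5 (length 3)

# na.rm = FALSE: the leading NA is kept, so the length stays 4
filled_keep <- na.locf(x, na.rm = FALSE)
filled_keep
# NA 2 2 5
```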

2. Replace NA’s with the Last Non-Missing Value using the setnafill() Function from the data.table Package.

The second way to replace NA’s in R with the last, non-missing value uses the setnafill() function from the data.table package.

This package provides functions to modify data frames (especially large ones) efficiently. One of these functions is setnafill(), which fills missing values with a constant value, the last non-missing value, or the next non-missing value.

To replace NA’s with the last, non-missing value using the setnafill() function, you need to provide 2 arguments:

  1. x: a vector, list, data frame, or data table of numeric values.
  2. type: a string that specifies how to replace missing values. Use “locf” to replace NA’s with the last, non-missing value. If you omit it, the default “const” is used, which fills NA’s with a constant (the fill=-argument) instead.

A serious disadvantage of the setnafill() function is that it can only handle numeric data. Therefore, if your data frame has both numeric and character data, you need to restrict it to the numeric columns, either by subsetting the data or with the cols=-argument, before you can use this function.

On the other hand, the setnafill() function also has one big advantage: you can use it to replace NA’s not only with the last, non-missing value but also with the next, non-missing value. In a separate article, we discuss in more detail how to substitute NA’s with the next, non-missing value.

A second advantage of the setnafill() function is that, by default, it does not remove the first observation if it contains a missing value. In such cases, it keeps the first row and leaves the missing value(s) untouched.
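Both advantages can be seen with nafill(), the companion of setnafill() in data.table that returns a filled copy instead of working in place and accepts the same type= values. In this sketch (the vector x is made up for illustration), “locf” leaves the leading NA untouched, while “nocb” (next observation carried backward) fills it from the next value:

```r
library(data.table)

x <- c(NA, 2, NA, NA, 5)

# Last observation carried forward; the leading NA stays NA
locf_result <- nafill(x, type = "locf")
locf_result
# NA 2 2 2 5

# Next observation carried backward
nocb_result <- nafill(x, type = "nocb")
nocb_result
# 2 2 5 5 5
```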

The code snippet below shows how to use the setnafill() function. First, we subset our data set to select only the first three (numeric) columns. Then, we use the type=-argument to specify how to replace the missing values.

library(data.table)

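# Subset the numeric columns (1 to 3) first; setnafill() fills this copy
# by reference and returns it
my_data_corrected <- setnafill(x = my_data[, 1:3], type = "locf")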
my_data_corrected
Replace NA's in R with the last, non-missing value using the setnafill() function.
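If you would rather keep the character column x4 in the result, one option is to convert the data to a data.table and restrict setnafill() to the numeric columns with its cols= argument. Because setnafill() works by reference, this sketch fills a data.table copy (my_dt is an illustrative name) and leaves my_data itself unchanged:

```r
library(data.table)

my_data <- data.frame(x1 = c(1,2,NA,4,NA,6),
                      x2 = c(0,NA,0,1,NA,NA),
                      x3 = c(NA,4,6,NA,NA,10),
                      x4 = c("A", NA, "C", NA, "E", "F"))

# as.data.table() makes a copy, so my_data itself is not modified
my_dt <- as.data.table(my_data)

# Fill only the numeric columns in place; x4 keeps its NA's
setnafill(my_dt, type = "locf", cols = c("x1", "x2", "x3"))
my_dt
```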

3. Replace NA’s with the Last Non-Missing Value using the fill() Function from the tidyr Package.

The third and best way to replace NA’s in R with the last, non-missing value is by using the fill() function from the tidyr package.

The tidyr package is part of the tidyverse and helps you to tidy (i.e., clean) your data. One of its functions is fill(), which fills missing values with the previous or next, non-missing value. Because this function is part of the tidyverse, you can also combine it with functions from the dplyr package.

In order to use the fill() function, you typically provide 3 arguments:

  1. A data frame. You can omit this argument if you pass the data frame with the pipe operator (i.e., %>%), as in the example below.
  2. One or more column names from the data frame.
  3. A direction that defines how to replace NA’s. This can be “down” (the default), “up”, “downup”, or “updown”.

For the .direction=-argument (note the ‘.’), we use “down” (or “downup”) to replace NA’s with the last, non-missing value. However, if you want to replace NA’s with the next, non-missing value, you can use “up” (or “updown”).
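To see how the four directions differ, here is a sketch on a one-column data frame (df and v are made-up names). “downup” first fills down and then fills any remaining leading NA upward; “updown” does the reverse:

```r
library(tidyr)

df <- data.frame(v = c(NA, 1, NA, 3, NA))

down   <- fill(df, v, .direction = "down")$v    # NA 1 1 3 3
up     <- fill(df, v, .direction = "up")$v      # 1 1 3 3 NA
downup <- fill(df, v, .direction = "downup")$v  # 1 1 1 3 3
updown <- fill(df, v, .direction = "updown")$v  # 1 1 3 3 3
```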

Compared to the other methods, the fill() function is the best to replace NA’s in R with the last, non-missing value, because:

  • It works with both numeric and character data.
  • It does not remove the first row if it is incomplete (i.e., contains at least one NA).
  • It can replace NA’s with the last or next, non-missing value.

Moreover, it also provides a way to replace NA’s when there is no last (or next), non-missing value, for example, when the first observation is NA.
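The first two points can be checked on the example data. In this sketch, fill() is applied to the numeric column x3 and the character column x4. All 6 rows are kept, and the leading NA in x3 stays missing because there is no earlier value to copy:

```r
library(tidyr)

my_data <- data.frame(x1 = c(1,2,NA,4,NA,6),
                      x2 = c(0,NA,0,1,NA,NA),
                      x3 = c(NA,4,6,NA,NA,10),
                      x4 = c("A", NA, "C", NA, "E", "F"))

filled <- fill(my_data, x3, x4, .direction = "down")

nrow(filled)  # 6: the incomplete first row is kept
filled$x4     # "A" "A" "C" "C" "E" "F"
filled$x3     # NA 4 6 6 6 10
```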

The example below demonstrates how to use the fill() function. We use .direction = “down” to replace NA’s with the last, non-missing value. Instead of specifying all column names, we use the names() function to select all columns in an efficient way.

library(tidyverse)

my_data %>% 
  fill(names(.), .direction = "down")
Replace NA's with the last, non-missing value in R using the fill() function.

As you can see in the image above, the fill() function did the trick. However, the first observation in the third column is still missing.
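Instead of names(.), you can also select all columns with the tidyselect helper everything(), which does not depend on the magrittr placeholder “.”. A sketch, using the same example data, that should give the same result as the names(.) version above:

```r
library(dplyr)
library(tidyr)

my_data <- data.frame(x1 = c(1,2,NA,4,NA,6),
                      x2 = c(0,NA,0,1,NA,NA),
                      x3 = c(NA,4,6,NA,NA,10),
                      x4 = c("A", NA, "C", NA, "E", "F"))

# everything() selects all columns, like names(.)
my_data_corrected <- my_data %>%
  fill(everything(), .direction = "down")
my_data_corrected
```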

If you want to replace all missing values, even those without a last, non-missing value, use .direction = “downup” instead. With this setting, the fill() function first tries to replace NA’s with the last, non-missing value. If this value does not exist, it uses the next, non-missing value instead.

For example:

library(tidyverse)

my_data %>% 
  fill(names(.), .direction = "downup")
Replace NA's with the last, non-missing value in R using the fill() function with .direction = “downup”.

Replace NA’s with the Last Non-Missing Value by Group in R

To conclude this discussion, we demonstrate how to replace NA’s in R with the last, non-missing value per group.

First, we create a data frame that we will use in the examples. This data frame is the same as the one we’ve used to support the methods above. However, this time the column my_groups separates the data frame into 2 groups of 3 observations (group A and group B).

my_data <- data.frame(my_groups = c("A", "A", "A", "B", "B", "B"),
                      x1 = c(1,2,NA,4,NA,6),
                      x2 = c(0,NA,0,1,NA,NA),
                      x3 = c(NA,4,6,NA,NA,10),
                      x4 = c("A", NA, "C", NA, "E", "F")) 
A grouped R data frame.

As with ungrouped data, the best way to replace NA’s in grouped data is with the fill() function from the tidyr package.
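If your project already uses data.table, the same per-group idea can be sketched with nafill() inside a by= grouping. This is an alternative to the tidyverse steps in this section, not a replacement for them: like setnafill(), nafill() only handles numeric columns, so x4 and my_groups are left out. The names my_dt and num_cols are illustrative:

```r
library(data.table)

my_data <- data.frame(my_groups = c("A", "A", "A", "B", "B", "B"),
                      x1 = c(1,2,NA,4,NA,6),
                      x2 = c(0,NA,0,1,NA,NA),
                      x3 = c(NA,4,6,NA,NA,10),
                      x4 = c("A", NA, "C", NA, "E", "F"))

my_dt <- as.data.table(my_data)
num_cols <- c("x1", "x2", "x3")

# Within each group, carry the last non-missing value forward
# in the numeric columns only
my_dt[, (num_cols) := lapply(.SD, nafill, type = "locf"),
      by = my_groups, .SDcols = num_cols]
my_dt
```

Note that x3 in group B stays NA, NA, 10: values are never carried over from group A.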

These are the steps to replace NA’s in R with the last, non-missing value per group (using the tidyverse):

  1. Specify the name of the input data frame.

  2. Group your data.

    You use the group_by() function to create a grouped data frame. Pass it the column names that define the groups. If your groups are defined by multiple columns, separate the column names with a comma.

  3. Use the fill() function and set its arguments to replace the NA’s.

  4. Set the columns argument.

    With the first argument, you specify the columns in which you want to replace the missing values. To select all columns, you can use names(.).

  5. Set the .direction argument.

    With the .direction=-argument, you specify how you want to replace the NA’s. To replace NA’s with the last, non-missing value, use .direction = “down”.

  6. Ungroup your data.

    Use the ungroup() function to ungroup the data in your data frame.

For example:

my_data %>% 
  group_by(my_groups) %>% 
  fill(names(.), .direction = "down") %>% 
  ungroup()
Replace NA's with the last, non-missing value per group in R.

As the image above shows, some values are still missing because no last, non-missing value exists within their group. If you still want to replace these NA’s, set the .direction=-argument to “downup”.

By using .direction = “downup”, the fill() function first looks for the last, non-missing value within the group. If this value does not exist, it uses the next, non-missing value of the group instead.

For example:

my_data %>% 
  group_by(my_groups) %>% 
  fill(names(.), .direction = "downup") %>% 
  ungroup()
Replace NA's with the last, non-missing value per group in R with .direction = “downup”.
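Grouping matters most when a group starts with NA’s. The sketch below compares “downup” on column x3 with and without group_by(). Without grouping, rows 4 and 5 (group B) borrow the value 6 from group A; with grouping, they are filled from group B’s own value 10. The object names ungrouped and grouped are illustrative:

```r
library(dplyr)
library(tidyr)

my_data <- data.frame(my_groups = c("A", "A", "A", "B", "B", "B"),
                      x1 = c(1,2,NA,4,NA,6),
                      x2 = c(0,NA,0,1,NA,NA),
                      x3 = c(NA,4,6,NA,NA,10),
                      x4 = c("A", NA, "C", NA, "E", "F"))

# Without grouping: values can leak from group A into group B
ungrouped <- my_data %>%
  fill(x3, .direction = "downup")

# With grouping: each group is filled only from its own values
grouped <- my_data %>%
  group_by(my_groups) %>%
  fill(x3, .direction = "downup") %>%
  ungroup()

ungrouped$x3  # 4 4 6 6 6 10
grouped$x3    # 4 4 6 10 10 10
```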