3 Ways to Drop Rows with NA’s in One/Some/All Columns in R [Examples]

In this article, we discuss 3 ways to remove rows from an R data frame with NA’s (i.e., missing values) considering one, multiple, or all columns.

Normally, you first identify columns with missing values and then decide what to do. You either replace the NA’s (e.g., with a zero) or you remove the entire row. In this article, we demonstrate how to do the latter in 3 different ways.

The best way to drop rows with NA’s in R is by using the drop_na() function from the tidyr package. This function inspects one or more columns and removes an observation if at least one value is missing. Alternatively, you can use the complete.cases() function or a combination of functions from the dplyr package.

In this article, we discuss the 3 methods and their (dis)advantages. Moreover, we provide R code that you can easily reuse in your projects.

Sample Data

The data frame we use for the examples in this article has 5 rows and 6 columns. The first column contains just an identifier whereas the columns x1 to x5 contain some type of fictitious score. Most scores are available, but some are missing.

The goal is to drop the rows that contain NA’s considering one, multiple, or all columns.

``````my_data <- data.frame(id = LETTERS[1:5],
x1 = c(3, 6, 4, 2, NA),
x2 = c(5, NA, NA, 0, 7),
x3 = c(1, 4, NA, 9, 2),
x4 = c(7, 2, 9, 3, 1),
x5 = c(5, NA, 5, 8, NA))``````

1. Remove Rows with NA’s in R using complete.cases()

The first option to remove rows with missing values is by using the complete.cases() function.

The complete.cases() function is a standard R function that returns are logical vector indicating which rows are complete, i.e., have no missing values.

By default, the complete.cases() function considers all columns when assessing if a row is complete. However, by providing an additional argument, this function limits its analysis to just one or more columns. Typically, this argument is a vector of column names or a range of column positions that you don’t want to have NA’s.

By using the bracket notation [] and the logical vector that the complete.cases() function returns, you can easily select only the complete cases. Or, in other words, remove the rows with missing values.

Below we give some examples of how to use the complete.cases() function.

Remove Rows with NA’s in All Columns

In the first example, we want to keep only those rows that are complete. Put in another way, we want to remove the rows with at least one missing value.

By providing the complete.cases() function a data frame as its argument, the function checks for each row if the row is complete (i.e., has no missing values). If so, it returns TRUE, otherwise, it returns FALSE.

For example, in the data frame below, complete.cases() returns the logical vector (TRUE, FALSE, FALSE, TRUE, FALSE) because only the first row and the fourth row are complete.

Next, one can use the logical vector and the bracket notation [] to remove all the rows with at least one missing value.

``my_data[complete.cases(my_data),]``

Remove Rows with NA’s in Multiple Columns

Instead of considering all columns, you can also use the complete.cases() function to remove rows with NA’s in some columns.

Normally, the complete.cases() function takes a complete data frame as its argument. However, by proving just a subset of all columns, the complete.cases() function considers only those columns.

For instance, in the R code below, we check for complete rows (i.e., rows without NA’s) considering only columns x2 and x3. As a result, the complete.cases() function returns the vector (TRUE, FALSE, FALSE, TRUE, TRUE) because the second row and the third row contain a NA in column x2 or x3 (or both).

``my_data[complete.cases(my_data[,c("x2","x3")]),]``

Remove Rows with NA’s in a Specific Column

Similar to the example above, you can also remove rows that have a missing value in one specific column (without considering the values in the other columns).

Again, you can use the complete.cases() function. As its argument, you provide the column you don’t want to have missing values. Next, the function returns a logical vector which helps you to delete the rows with an NA in the evaluated column.

For example, in the data frame below, we consider only the column x1. The complete.cases() function returns the logical vector (TRUE, TRUE, TRUE, TRUE, FALSE) because only the fifth row has a missing value. Next, we use the square brackets [] to remove this row.

``my_data[complete.cases(my_data[,c("x1")]),]``

2. Drop Rows with NA’s using tidyr

The second method to drop rows with NA’s in R is by using the tidyr package.

The tidyr package provides tools to easily “tidy” your data. In other words, it helps you to create a clean data set. For example, by removing missing data with the drop_na() function.

The drop_na() function is the best way to remove rows from an R data frame with NA’s in any specified column. It inspects one or more columns for missing values and drops the corresponding row if it finds an NA. Besides its intuitiveness, the drop_na() function is also compatible with other tidyverse functions.

Before you can use the drop_na() function, you need to (install and) load the tidyr package.

``library("tidyr")``

Drop Rows with NA’s in All Columns

By default, the drop_na() function checks all columns for missing data. If it finds one or more NA’s in a specific row, then it removes that record from the data frame. In other words, it only keeps complete rows.

The data frame below has 5 observations of which 3 have one or more missing values. Therefore, the drop_na() function removes these rows from the data frame.

``````my_data %>%
drop_na()``````

Drop Rows with NA’s in Some Columns

Instead of considering all columns when deciding which rows to keep and which to remove, the drop_na() function can also limit its scope to a subset of all the columns.

To limit the scope of the drop_na() function, you provide a list of column names or column positions. For instance, if you only want to consider columns x2 and x3, then you can use the vector c(“x2”, “x3”) as the argument of the function.

By using the vector c(“x2”, “x3”) as its argument, the drop_na() checks only for NA’s in these columns and ignores any missing values in the other columns. Therefore, in the example below, it removes only rows 2 and 3. For all other rows, the columns x2 and x3 don’t have missing values.

``````my_data %>%
drop_na(c("x2", "x3"))``````

Drop Rows with NA’s in One Column

Similarly, you can also use the drop_na() function to check for missing values in just one column to remove rows.

As its argument, you need to provide the drop_na() function with the column (name or position) you want to inspect. For example, with the following R code, we base our decision to keep or remove a row whether it has an NA in column x1.

``````my_data %>%
drop_na(c("x1"))``````

As the image shows, only the fifth row has a missing value in column x1 and is therefore removed by the drop_na() function.

3. Drop Rows with NA’s using dplyr

The third method to remove rows based on missing values in one or more columns uses the dplyr package.

Like the tidyr package, the dplyr package is part of the tidyverse and helps you to clean your data in an easy-to-follow way. Also, you need to (install and) load the dplyr package before you can use it.

``library("dplyr")``

Although you can use dplyr to remove observations with NA’s, the tidyr remains our preferred method. The drop_na() function from the tidyr package is especially built to carry out this task. In contrast, if you use the dplyr package, you (might) need additional functions which makes your code harder to read.

Drop Rows with NA’s in All Columns

To remove observations with missing values in at least one column, you can use the na.omit() function.

The na.omit() function in the R language inspects all columns from a data frame and drops rows that have NA’s in one or more columns. Because this function doesn’t require additional argument, it’s easy to understand. Moreover, the na.omit() function is compatible with the dplyr pipe (i.e., %>%).

The example below shows how to use the na.omit() function. Because rows 2, 3, and 5 contain one or more missing values, the na.omit() function eliminates these observations from the data frame.

``````my_data %>%
na.omit()``````

Drop Rows with NA’s in Some Columns

Unlike the drop_na() function, the na.omit() function can’t inspect a subset of all columns. It always considers all column. Therefore, you need another approach to remove observations with missing values in just some columns, ignoring the values in the remaining ones.

To drop rows with NA’s in some specific columns, you can use the filter() function from the dplyr package and the in.na() function. First, the latter one determines if a value in a column is missing and returns a TRUE or FALSE. Next, the filter function drops all rows with an NA.

To check for missing values in multiple columns, you need the &-sign.

The R code below shows how to use the filter() function and the is.na() function to remove rows with missing values in the columns x2 and/or x3.

``````my_data %>%
filter(!is.na(x2) & !is.na(x3))``````

Drop Rows with NA’s in One Column

Instead, if you want to remove observations with an NA in one specific column with dplyr you can also use the functions is.na() and filter(). See the example below.

``````my_data %>%
filter(!is.na(x1))``````