In this article, we discuss 3 ways to count the number of NA’s per row in an R data frame.
If you work with data, then sooner or later you will encounter missing values (i.e., NA’s). Missing values can cause R to throw errors or, even worse, to return incorrect results. Therefore, it is crucial to identify the NA’s as soon as possible.
In a previous post, we showed how to find the columns with missing values. Now, we will focus on counting the NA’s.
There are two ways to count missing values: per column (column-wise) or per row (row-wise).
In R, the easiest way to find the number of missing values per row is a two-step process. First, the is.na() function checks every value in a data frame and returns TRUE if a value is missing. Then, the rowSums() function counts the number of TRUE’s (i.e., missing values) per row. Alternatively, you could use a user-defined function or the dplyr package.
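To make the difference between the two types of counting concrete, here is a minimal sketch on a small, hypothetical data frame called df_demo (it is not the data frame used in the examples below). Both directions start from the output of is.na(); colSums() then counts per column and rowSums() counts per row.

```r
# Hypothetical 3-row data frame, used only to illustrate the two directions
df_demo <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "z"))

# is.na() returns a logical matrix with TRUE where a value is missing
na_matrix <- is.na(df_demo)

# Column-wise: number of NA's per column (a has 1, b has 2)
na_per_col <- colSums(na_matrix)
na_per_col

# Row-wise: number of NA's per row (rows have 1, 2, and 0)
na_per_row <- rowSums(na_matrix)
na_per_row
```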
Next, we will show 3 ways to find the number of NA’s per row in a data frame. We support each method with an example and the R code.
3 Ways to Count the Number of Missing Values per Row
For the examples in this article, we use a simple data frame that has 5 rows and 5 columns of mixed data types (i.e., numeric and character).
my_df <- data.frame(x1 = c(1, 2, 3, 4, NA), x2 = c(1, 0, NA, 0, NA), x3 = c("A", NA, "B", "C", "A"), x4 = c(NA, NA, "A", "B", "B"), x5 = c(5, NA, 4, 2, 1))
As the image below shows, the 5 rows have 1, 3, 1, 0, and 2 missing values, respectively. The goal is to add a new column to the data frame that contains these counts.
1. Count the Number of NA’s per Row with rowSums()
The first method to find the number of NA’s per row in R uses the power of the functions is.na() and rowSums().
Both the is.na() function and the rowSums() function are R base functions. Therefore, it is not necessary to install additional packages. This makes this method ideal for those who are new to R.
These are the steps to find the number of missing values per row in an R data frame:
- Convert the original data frame into a TRUE/FALSE matrix
In this new matrix, the TRUEs and FALSEs represent missing and non-missing values, respectively. You can use the is.na() function for this purpose.
- Count the number of TRUEs (i.e., missing values) per row
You can use the rowSums() function to do this. As the name suggests, this function sums the values of all elements in a row. When you sum logical values, R treats TRUE as 1 and FALSE as 0, so summing the TRUEs is the same as counting the number of NA’s.
- (Optionally) Save the outcome in a new column
The rowSums() function returns a numeric vector. Therefore, you can save the values in a new column and add them to the original data frame. You can do this with the $-operator.
The R code below shows an example of the steps above.
my_df$count_na <- rowSums(is.na(my_df))
The image shows that this method correctly identifies the number of NA’s per row.
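If you want to look inside the two steps instead of relying on the image, the sketch below recreates my_df, prints the intermediate TRUE/FALSE matrix, and then checks the result against the counts 1, 3, 1, 0, and 2 mentioned earlier. The stopifnot() check is just one possible way to verify the outcome.

```r
# Recreate the example data frame from above
my_df <- data.frame(x1 = c(1, 2, 3, 4, NA), x2 = c(1, 0, NA, 0, NA),
                    x3 = c("A", NA, "B", "C", "A"), x4 = c(NA, NA, "A", "B", "B"),
                    x5 = c(5, NA, 4, 2, 1))

# Step 1: TRUE marks a missing value, FALSE a non-missing value
na_matrix <- is.na(my_df)
print(na_matrix)

# When summed, TRUE counts as 1 and FALSE as 0
sum(c(TRUE, FALSE, TRUE))  # 2

# Step 2: count the TRUEs per row and store them in a new column
my_df$count_na <- rowSums(na_matrix)

# Compare with the counts listed earlier in the article
stopifnot(my_df$count_na == c(1, 3, 1, 0, 2))
```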
If you want to count the number of missing values per row from a subset of all columns, you can use the bracket notation. For example, with the next R code, we count the number of NA’s in the first 3 columns.
my_df$count_na_x1tox3 <- rowSums(is.na(my_df[,1:3]))
As you can see, R takes only the first 3 columns into account and ignores the others.
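Column positions can shift when a data frame changes, for example after you add the count_na column. A slightly more robust variant, sketched below, selects the columns by name instead. The vector name cols_to_check is hypothetical, and the choice of x1, x2, and x3 simply mirrors the example above. The argument drop = FALSE keeps the selection a data frame even if you select a single column, so rowSums() still receives a two-dimensional input.

```r
# Recreate the example data frame from above
my_df <- data.frame(x1 = c(1, 2, 3, 4, NA), x2 = c(1, 0, NA, 0, NA),
                    x3 = c("A", NA, "B", "C", "A"), x4 = c(NA, NA, "A", "B", "B"),
                    x5 = c(5, NA, 4, 2, 1))

# Hypothetical helper vector: the columns to check, selected by name
cols_to_check <- c("x1", "x2", "x3")

# drop = FALSE keeps a data frame even when only one column is selected
my_df$count_na_x1tox3 <- rowSums(is.na(my_df[, cols_to_check, drop = FALSE]))
my_df$count_na_x1tox3
```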
2. Count the Number of NA’s per Row with apply()
The second method to count the number of NA’s uses a user-defined function and the apply() function.
First, you create your own function that counts the number of NA’s in a vector. Next, you use the apply() function to loop through the data frame row by row and pass each row, as a vector, to the user-defined function. As a result, R counts the number of missing values for each row.
The apply() function plays an important role in this method. Its three main arguments are:
- X: the input data.
- MARGIN: an indicator that specifies how to loop through the data (1 = row-wise, 2 = column-wise).
- FUN: the function to apply to each row or column.
In the example below, we show how to combine these steps.
count_na_func <- function(x) sum(is.na(x))
my_df$count_na <- apply(my_df, 1, count_na_func)
First, we create a user-defined function called count_na_func that counts the number of NA’s in a vector. Then, we use the apply() function to loop row-wise through our data frame my_df and pass each row to the count_na_func function.
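One detail worth knowing: apply() works on matrices, so it first converts a data frame with as.matrix(). Because my_df mixes numeric and character columns, the result is a character matrix. The sketch below checks that the missing values survive this conversion, so the counts stay the same as with the first method.

```r
# Recreate the example data frame from above
my_df <- data.frame(x1 = c(1, 2, 3, 4, NA), x2 = c(1, 0, NA, 0, NA),
                    x3 = c("A", NA, "B", "C", "A"), x4 = c(NA, NA, "A", "B", "B"),
                    x5 = c(5, NA, 4, 2, 1))

# apply() converts the data frame to a matrix before looping over the rows
converted <- as.matrix(my_df)
typeof(converted)  # "character", because of the mixed column types

# The NA's are still recognized after the conversion
count_na_func <- function(x) sum(is.na(x))
row_counts <- apply(my_df, 1, count_na_func)
row_counts
```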
Like the previous method, this method can also count the number of NA’s in a subset of all columns. See the example below.
my_df$count_na_x1tox3 <- apply(my_df[,1:3], 1, count_na_func)
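The same pattern also works column-wise: switching the second argument of apply() from 1 to 2 counts the NA’s per column. The sketch below additionally replaces count_na_func with an anonymous function, which is a matter of taste rather than a requirement; the variable name na_per_column is only for illustration.

```r
# Recreate the example data frame from above
my_df <- data.frame(x1 = c(1, 2, 3, 4, NA), x2 = c(1, 0, NA, 0, NA),
                    x3 = c("A", NA, "B", "C", "A"), x4 = c(NA, NA, "A", "B", "B"),
                    x5 = c(5, NA, 4, 2, 1))

# MARGIN = 2 loops over the columns instead of the rows;
# the anonymous function does the same job as count_na_func
na_per_column <- apply(my_df, 2, function(x) sum(is.na(x)))
na_per_column
```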
3. Count the Number of NA’s per Row with dplyr
The third method to count the number of NA’s per row in R requires the most code. However, it has the advantage that you can use the pipe operator (%>%), which dplyr re-exports from the magrittr package. Therefore, this method is the best option if you want to carry out other operations besides counting the number of NA’s.
These are the steps:
- Load the dplyr package.
- Create a user-defined function that counts the number of NA’s in a vector.
- Specify the name of your data frame and pass it through to the next step with the pipe operator.
- Use the mutate() function to create a new column.
- Use the apply() function and the user-defined function to define the new column.
The next R code shows how to combine these steps and count the NA’s per row.
library(dplyr)
count_na_func <- function(x) sum(is.na(x))
my_df %>% mutate(count_na = apply(., 1, count_na_func))
The dot as the first argument of the apply() function represents the input data, that is, the data frame my_df.
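Note that a pipeline like this only prints its result; my_df itself stays unchanged unless you assign the output. The sketch below, which assumes the dplyr package is installed, stores the result and then continues the pipe with filter() to keep only the rows without missing values. This is one example of the extra operations the pipe makes convenient; the object names my_df_counted and complete_rows are hypothetical.

```r
library(dplyr)

# Recreate the example data frame from above
my_df <- data.frame(x1 = c(1, 2, 3, 4, NA), x2 = c(1, 0, NA, 0, NA),
                    x3 = c("A", NA, "B", "C", "A"), x4 = c(NA, NA, "A", "B", "B"),
                    x5 = c(5, NA, 4, 2, 1))
count_na_func <- function(x) sum(is.na(x))

# Assign the result so that the new column is kept
my_df_counted <- my_df %>%
  mutate(count_na = apply(., 1, count_na_func))

# Continue with another operation: keep only the rows without NA's
complete_rows <- my_df_counted %>%
  filter(count_na == 0)
complete_rows
```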
Likewise, you can use this method to count the number of NA’s in a subset of all columns.
my_df %>% mutate(count_na_x1tox3 = apply(.[1:3], 1, count_na_func))