3 Ways to Count the Number of NA’s per Column in R [Examples]

In this article, we demonstrate 3 ways to count the number of NA’s per column in R.

Missing values can occur for various reasons. Often, you want to replace them (e.g., with zeros), but sometimes you just want to count them.

One way to count NA’s is row-wise, that is, to count the missing values in each row. Alternatively, you can count the number of NA’s per column (i.e., column-wise).
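To make the difference between the two directions concrete, here is a minimal sketch. The data frame toy_df is a hypothetical example that is not part of this article's data; rowSums() and colSums() are base R functions.

```r
# Hypothetical toy data, used only to contrast the two directions
toy_df <- data.frame(a = c(1, NA, 3),
                     b = c(NA, NA, "z"))

# Row-wise: number of NA's in each row
row_na <- rowSums(is.na(toy_df))

# Column-wise: number of NA's in each column
col_na <- colSums(is.na(toy_df))

print(row_na)
print(col_na)
```

The rest of this article focuses on the column-wise direction.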

Although there exist many ways to count the number of missing values per column in R, the easiest approach is by using the colSums() function and the is.na() function. Combining these functions will show for each column name the number of NA’s it contains. Alternatively, one can also use the sapply() function or functions from the dplyr (tidyverse) package.

In this article, besides the colSums() function, we demonstrate other methods to count the NA’s per column, for example, with the dplyr package. We support all methods with examples that you can use directly in your R projects.

For the examples in this article, we use the following data frame.

my_df <- data.frame(x1 = c(1, 2, NA, 4, NA),
                    x2 = c("A", NA, "C", "D", "E"),
                    x3 = c(NA, 20, NA, NA, 50))

This data frame has 5 rows and 3 columns, and each column has at least one missing value. Also, note that columns x1 and x3 are numeric, whereas column x2 contains characters.

Count the Number of Missing Values in a Specific Column

The easiest way to count the number of NA’s in R in a single column is by using the functions sum() and is.na().

The is.na() function takes one column as input and returns a logical vector that is TRUE for every missing value and FALSE for all other values. The sum() function treats TRUE as one and FALSE as zero, so summing this vector counts the number of NA’s in the column.
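The two steps can be made visible by splitting the call into parts. This is only an illustrative sketch; the variable names x1 and na_flags are chosen here for the example, and the values mirror column x1 of the data frame above.

```r
# Same values as column x1 of the example data frame
x1 <- c(1, 2, NA, 4, NA)

# Step 1: is.na() flags each element (TRUE = missing, FALSE = present)
na_flags <- is.na(x1)
print(na_flags)

# Step 2: sum() treats TRUE as 1 and FALSE as 0, so it counts the NA's
na_count <- sum(na_flags)
print(na_count)
```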

The advantage of this approach is that it’s easy to understand and that it works for all types of columns (numeric, character, etc.). However, if you want to know the number of missing values for many columns, this method requires one line of code per column (not optimal!).

Example

sum(is.na(my_df$x1))

3 Ways to Count the Number of NA’s per Column

In contrast to the section above, here we demonstrate 3 ways to find the number of NA’s of all columns in a data frame.

We briefly explain how each method works, discuss its (dis)advantages and show an example.

1. Count the number of Missing Values with summary

A quick way to find the number of NA’s per column in R is by using the summary() function.

The summary() function is a generic base R function that summarizes the most important information per column. For numeric columns, it shows (amongst others) the minimum, the maximum, and the number of missing values.

However, for character columns, it provides only the number of rows. Hence, the summary() function does not calculate the number of NA’s for character columns.

Another disadvantage of the summary() function is that it returns a table of character data. Therefore, you can’t easily use the results as input for other operations.
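The sketch below illustrates both limitations on the example data frame. It assumes that x2 is stored as a character column, which is made explicit here with stringsAsFactors = FALSE (the default since R 4.0.0); the variable name s is only used for this illustration.

```r
my_df <- data.frame(x1 = c(1, 2, NA, 4, NA),
                    x2 = c("A", NA, "C", "D", "E"),
                    x3 = c(NA, 20, NA, NA, 50),
                    stringsAsFactors = FALSE)

s <- summary(my_df)

# The result is a table whose cells are formatted strings, not numbers
print(class(s))
print(typeof(s))

# Only the cells that mention NA's; the character column x2 has none
print(s[grepl("NA's", s)])
```

To reuse the counts in later steps, you would have to parse them out of these strings, which is why the other methods in this article are usually more convenient.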

Nevertheless, the summary() function is easy to use and requires just one argument, namely a data frame.

summary(my_df)

2. Count the number of Missing Values with sapply

The second method to find the number of missing values in the columns of an R data frame is by using the sapply() function.

The sapply() function is part of the apply family and lets you iterate over the columns of a data frame, performing the same operation on each one, for example, counting the number of NA’s.

An advantage of the sapply() function is that it is more compact than its alternative (the for-loop). However, the syntax of the sapply() function might be difficult to read, especially for new R users.

The sapply() function needs two arguments, namely:

  1. A data frame
  2. An operation (i.e., function) to be performed on all columns of the data frame.

The second argument (i.e., the operation) might need some extra explanation.

The operation can be either a built-in R function (e.g., min, max, sum, etc.) or a user-defined function. Since base R has no single dedicated function that counts the NA’s in a column, you need to create this function yourself.

You can create this user-defined function either before calling the sapply() function or define it directly within the sapply() function.

We will use the anonymous function function(x) sum(is.na(x)), where x represents one column of the data frame.

See the example below.

sapply(my_df, function(x) sum(is.na(x)))

As the output shows, an advantage of this approach is that the sapply() function finds the number of NA’s in both numeric and character columns.
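As mentioned earlier, the counting function can also be defined before calling sapply(). The sketch below shows that variant; the helper name count_na is hypothetical and chosen for this example. It also shows vapply(), a base R relative of sapply() that additionally checks the type and length of each result.

```r
my_df <- data.frame(x1 = c(1, 2, NA, 4, NA),
                    x2 = c("A", NA, "C", "D", "E"),
                    x3 = c(NA, 20, NA, NA, 50))

# Hypothetical helper: counts the NA's in a single column
count_na <- function(x) sum(is.na(x))

# Same result as the inline version, but easier to read and reuse
na_counts <- sapply(my_df, count_na)
print(na_counts)

# vapply() requires every result to be exactly one integer
na_counts_checked <- vapply(my_df, count_na, FUN.VALUE = integer(1))
print(na_counts_checked)
```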

3. Count the number of Missing Values with colSums

The best way to count the number of NA’s in the columns of an R data frame is by using the colSums() function.

As the name suggests, the colSums() function calculates the sum of all elements per column. However, to count the number of missing values per column, we first need to mark which values are missing. For this, we can use the is.na() function, which turns the data frame into a logical matrix (TRUE for NA’s, FALSE otherwise). The colSums() function then treats TRUE as one and FALSE as zero.
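The intermediate step can be inspected directly. In this sketch, the variable name na_matrix is only used for illustration.

```r
my_df <- data.frame(x1 = c(1, 2, NA, 4, NA),
                    x2 = c("A", NA, "C", "D", "E"),
                    x3 = c(NA, 20, NA, NA, 50))

# is.na() on a data frame returns a logical matrix of the same shape
na_matrix <- is.na(my_df)
print(na_matrix)
print(dim(na_matrix))

# Summing each column of the logical matrix counts its TRUE values
print(colSums(na_matrix))
```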

Combining the functions is.na() and colSums() to find the number of NA’s per column has 3 advantages:

  1. It is easy to read and understand.
  2. It is fast.
  3. It works for both numeric and character columns.

Below we provide an example.

colSums(is.na(my_df))
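As a quick cross-check, the sketch below compares the colSums() result with the sapply() result from the previous method. The variable names via_sapply and via_colsums are chosen for this example. The counts agree, although sapply() returns them as integers and colSums() as doubles, so all.equal() is used instead of identical().

```r
my_df <- data.frame(x1 = c(1, 2, NA, 4, NA),
                    x2 = c("A", NA, "C", "D", "E"),
                    x3 = c(NA, 20, NA, NA, 50))

via_sapply  <- sapply(my_df, function(x) sum(is.na(x)))
via_colsums <- colSums(is.na(my_df))

# Same names and values, different storage types
print(typeof(via_sapply))
print(typeof(via_colsums))
print(all.equal(via_sapply, via_colsums))
```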

Count the Number of NA’s per Column with dplyr

Lastly, we show a way to count the number of NA’s per column using the dplyr package.

The dplyr package (part of the Tidyverse) provides tools to manipulate your data in a readable way. Moreover, with the pipe operator (i.e., %>%), you can combine multiple operations in a sequence.

To count the number of missing values per column with dplyr, you need the summarise_all() function. This function applies a summary (e.g., the number of NA’s) to every column. Besides the summarise_all() function, you also need the functions sum() and is.na().

The advantages of using the tidyverse language to calculate the number of NA’s per column are:

  1. It is easy to read.
  2. It works for both numeric and character columns.

On the other hand, you need to install and/or load the dplyr package first. So, if you don’t want to install additional packages, this method is not for you.

Example:

library(dplyr)
my_df %>% 
  summarise_all(~sum(is.na(.)))
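In dplyr 1.0.0 and later, summarise_all() is marked as superseded in favour of across(). Assuming such a version is installed, the same count could be sketched as follows; the variable name na_counts is only used for this example.

```r
library(dplyr)

my_df <- data.frame(x1 = c(1, 2, NA, 4, NA),
                    x2 = c("A", NA, "C", "D", "E"),
                    x3 = c(NA, 20, NA, NA, 50))

# across(everything(), ...) applies the function to every column
na_counts <- my_df %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

print(na_counts)
```

Unlike colSums() and sapply(), this returns a one-row data frame rather than a named vector, which fits well if the next step is also a dplyr operation.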