As a result of importing data or combining rows, you might have a data frame with empty columns. In general, you don’t need these columns. So, in this article, we discuss how to identify and remove empty columns from a data frame in R.
Before we continue and discuss 3 easy ways to recognize and delete empty columns, we need to define what an empty column is. In this article, an empty column is a column where every value is either NA or an empty string (""). However, you can easily modify all examples if your definition of an empty column is different.
In our example, we use the data frame below. It has 4 columns of which 2 follow our definition of an empty column, namely X2 and X4.
The goal is to write R code that removes the empty columns without explicitly specifying their names. This makes the code flexible and reusable.
You can create this data frame with the following code.
my_df <- data.frame(X1 = c(1:10),
                    X2 = rep(NA, 10),
                    X3 = letters[seq(from = 1, to = 10)],
                    X4 = rep("", 10))
my_df
How to Remove Empty Columns in R with colSums
The first method to delete all empty columns from a data frame uses only basic R code.
These are the steps to remove empty columns:
1. Identify the empty columns
You can identify the empty columns by comparing the number of empty values in each column with the total number of rows. If both are equal, then the column is empty.
You can use the colSums() function to count the empty values in a column. With the code below we count the rows with NA’s or “” from our example data frame.
> colSums(is.na(my_df) | my_df == "")
X1 X2 X3 X4
 0 10  0 10
Next, we use the nrow() function to compare the outcome with the total number of rows, and create a logical vector that indicates whether each column is empty (TRUE) or not (FALSE).
> empty_columns <- colSums(is.na(my_df) | my_df == "") == nrow(my_df)
> empty_columns
   X1    X2    X3    X4
FALSE  TRUE FALSE  TRUE
2. Remove empty columns
Now that we have identified the empty columns, we can easily delete them with the following code.
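As a minimal sketch of this removal step, the snippet below rebuilds the example data frame, recomputes `empty_columns`, and keeps only the columns that are not empty. The `drop = FALSE` argument and the name `cleaned_df` are additions for illustration; `drop = FALSE` keeps the result a data frame even if only one column remains.

```r
# Recreate the example data frame so the snippet runs on its own
my_df <- data.frame(X1 = 1:10,
                    X2 = rep(NA, 10),
                    X3 = letters[1:10],
                    X4 = rep("", 10))

# Logical vector: TRUE for columns where every value is NA or ""
empty_columns <- colSums(is.na(my_df) | my_df == "") == nrow(my_df)

# Keep the columns that are NOT empty; drop = FALSE (an addition here)
# prevents R from simplifying a single remaining column to a vector
cleaned_df <- my_df[, !empty_columns, drop = FALSE]
names(cleaned_df)
```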
The R code below shows all the steps in two simple lines of reusable code.
empty_columns <- colSums(is.na(my_df) | my_df == "") == nrow(my_df)
my_df[, !empty_columns]
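Since the goal is reusable code, one possible way to package these two lines is a small helper function. This is only a sketch: the function name `remove_empty_columns` and the example `single_df` are hypothetical, and `drop = FALSE` is added so the result stays a data frame when a single column survives.

```r
# Hypothetical helper: removes columns in which every value is NA or ""
remove_empty_columns <- function(df) {
  is_empty <- colSums(is.na(df) | df == "") == nrow(df)
  df[, !is_empty, drop = FALSE]
}

# Usage with the example data frame from this article
my_df <- data.frame(X1 = 1:10,
                    X2 = rep(NA, 10),
                    X3 = letters[1:10],
                    X4 = rep("", 10))
result <- remove_empty_columns(my_df)
names(result)

# With only one non-empty column, the result is still a data frame
single_df <- data.frame(A = c(1, 2, 3), B = c(NA, NA, NA))
single_result <- remove_empty_columns(single_df)
```

Without `drop = FALSE`, the last call would return a plain numeric vector instead of a one-column data frame.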
How to Remove Empty Columns in R with sapply
The second method to remove empty columns from an R data frame uses the sapply() function.
The sapply() function takes a data frame as input and applies a specific operation to all columns. In this case, the operation checks if all values of a column are missing.
You can check if a column is empty with the all() function. If every value in a column is either NA or an empty string, then the all() function returns TRUE.
Once you have identified all empty columns you can easily remove them. Note that you need the !-operator to keep the columns that are not empty.
empty_columns <- sapply(my_df, function(x) all(is.na(x) | x == ""))
my_df[, !empty_columns]
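As mentioned at the start, the definition of an empty column can be adjusted. The sketch below shows one possible variation of the sapply() approach, assuming that strings containing only whitespace should also count as empty. The helper name `is_empty_column` and the data frame `df_ws` are hypothetical and serve only as an illustration.

```r
# Hypothetical predicate: a value is "empty" if it is NA, "", or only whitespace
is_empty_column <- function(x) {
  all(is.na(x) | trimws(as.character(x)) == "")
}

# Illustrative data: column B holds only whitespace, "" and NA
df_ws <- data.frame(A = c(1, 2, 3),
                    B = c(" ", "", NA),
                    C = c("a", " ", "b"))

empty_columns <- sapply(df_ws, is_empty_column)
cleaned_ws <- df_ws[, !empty_columns, drop = FALSE]
names(cleaned_ws)
```

Only the predicate changes; the identification and removal steps stay the same as before.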
How to Remove Empty Columns in R with discard
The third method to delete all empty columns from a table in R uses the discard() function.
The discard() function is part of the purrr package, which is included in the tidyverse. Applied to a data frame, it removes all columns that meet a specific condition, for example, complete emptiness.
To remove all empty columns from an R data frame with the discard() function, you only need to identify the empty columns. You can recognize them with the all() function. Once you have identified them, the discard() function removes them automatically.
The R code below demonstrates how to apply the discard() function to remove empty columns. The first line installs the tidyverse package if it is not yet available, and the second line loads it.
if (!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)
my_df %>% discard(~all(is.na(.) | . == ""))
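As a small variation, offered as a sketch rather than as the method above: because discard() lives in purrr, it can also be called as `purrr::discard()` without attaching the whole tidyverse. The version below assumes purrr is installed and uses an ordinary function instead of the formula shorthand; the names `is_empty_column` and `cleaned_df` are hypothetical.

```r
my_df <- data.frame(X1 = 1:10,
                    X2 = rep(NA, 10),
                    X3 = letters[1:10],
                    X4 = rep("", 10))

# Hypothetical predicate: TRUE when every value is NA or ""
is_empty_column <- function(x) all(is.na(x) | x == "")

# Only run the purrr call if the package is available (assumption: it is installed)
if (requireNamespace("purrr", quietly = TRUE)) {
  # discard() drops the columns for which the predicate returns TRUE
  cleaned_df <- purrr::discard(my_df, is_empty_column)
  print(names(cleaned_df))
}
```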