3 Ways to Remove Columns by Name in R

This article discusses how to remove columns from an R data frame by their name.

There are many reasons why you might want to remove columns from a data frame. For example, you may have created a temporary column that you no longer need. Similarly, you might want to drop columns with duplicate information or columns that are empty.

Just as there are many reasons to remove columns, there are many ways to drop them in R. In general, you can remove columns in two ways, namely by their position or by their name. In this article, we focus on the latter.

In R, the easiest way to remove columns from a data frame based on their name is by using the %in% operator. This operator lets you specify the redundant column names and, in combination with the names() function, removes them from the data frame. Alternatively, you can use the subset() function or the dplyr package.

In this article, we show the details of how to use these 3 methods and provide examples and code snippets. For the examples, we will use the following data frame.

This data frame has 6 columns and 5 rows. As you can see, the column names share some similarities, which will prove useful when demonstrating how to drop columns based on a specific prefix or suffix.

1. Remove Columns by Name with the %in% Operator

The first method in R to remove columns by their name uses the %in% operator and the names() function.

First, you create a vector that contains the names of the columns you want to remove. You must write the names between (double) quotes and separate them with commas.

Then, you use the names() function to obtain all column names of your data frame. Alternatively, you can use the colnames() function.

Finally, you use the ! operator and the %in% operator to create a TRUE/FALSE vector that indicates which columns to keep. If you omit the ! operator, R does the opposite: it keeps only the columns you listed and drops all the others.

The R code below contains an example.

# 1. Remove columns with %in% operator
my_df <- data.frame(char1 = letters[1:5],
                    char2 = letters[22:26],
                    char3 = c("@", "$", "%", "&", "!"),
                    num1 = c(1:5),
                    num2 = seq(10,50, by = 10),
                    num3 = c(0,1,0,1,0))
my_df

to_remove <- c("char1", "char2")
my_df[ , !(names(my_df) %in% to_remove)]

In the code above, we first create a new variable to_remove, which contains the names of the columns we want to eliminate (i.e., char1 and char2). Then, we use the names() function and the %in% operator to drop them.
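
To see why this works, it can help to print the intermediate logical vector. The sketch below rebuilds the example data frame and inspects each step; the variable name keep is only an illustrative choice.

```r
# Rebuild the example data frame
my_df <- data.frame(char1 = letters[1:5],
                    char2 = letters[22:26],
                    char3 = c("@", "$", "%", "&", "!"),
                    num1 = 1:5,
                    num2 = seq(10, 50, by = 10),
                    num3 = c(0, 1, 0, 1, 0))
to_remove <- c("char1", "char2")

# TRUE marks a column whose name appears in to_remove
names(my_df) %in% to_remove
# [1]  TRUE  TRUE FALSE FALSE FALSE FALSE

# Negating the vector marks the columns to keep instead
keep <- !(names(my_df) %in% to_remove)
names(my_df)[keep]
# [1] "char3" "num1"  "num2"  "num3"
```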

Note that, in this case, we only print the outcome of the operation. We didn’t create a new data frame with the 4 remaining columns. If you want to do so, you need to store the result with the assignment operator (i.e., <-).
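
A minimal sketch of saving the result, again using the example data frame. The names my_df_small, all_but_num1, and one_col are hypothetical. The second half shows a base R detail worth knowing: when only one column remains, the bracket notation returns a plain vector unless you add drop = FALSE.

```r
my_df <- data.frame(char1 = letters[1:5],
                    char2 = letters[22:26],
                    char3 = c("@", "$", "%", "&", "!"),
                    num1 = 1:5,
                    num2 = seq(10, 50, by = 10),
                    num3 = c(0, 1, 0, 1, 0))
to_remove <- c("char1", "char2")

# Store the reduced data frame under a new (hypothetical) name
my_df_small <- my_df[ , !(names(my_df) %in% to_remove)]
ncol(my_df_small)   # 4
ncol(my_df)         # 6, the original is unchanged

# If only one column is left, drop = FALSE keeps the result a data frame
all_but_num1 <- setdiff(names(my_df), "num1")
one_col <- my_df[ , !(names(my_df) %in% all_but_num1), drop = FALSE]
is.data.frame(one_col)   # TRUE
```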

2. Remove Columns by Name with the subset() Function

The second option to drop columns from a data frame is by using the subset() function.

This function is primarily used to select (i.e., keep) columns from a data frame. However, with a simple modification, you can also use the subset() function to remove columns.

To remove columns, you pass two arguments to the subset() function:

  1. The data frame from which you want to remove columns.
  2. The names of the columns you want to remove, passed to the select argument.

You can select the columns that need to be eliminated with the select argument. To do so, you need to provide a vector with the column names (without quotes!). By adding a minus sign before this vector, subset() drops the columns instead of keeping them.

In the example below, we use the subset() function to remove the columns char1 and num1.

# 2. Remove columns with subset()
my_df <- data.frame(char1 = letters[1:5],
                    char2 = letters[22:26],
                    char3 = c("@", "$", "%", "&", "!"),
                    num1 = c(1:5),
                    num2 = seq(10,50, by = 10),
                    num3 = c(0,1,0,1,0))
my_df

subset(my_df, select = -c(char1, num1))

Like the first method, the subset() function does not change the original data frame. It returns a new data frame without the selected columns, so you need to assign the result (for example, back to my_df) if you want to keep it.
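
One way to check what subset() does to its input is to compare the column names before and after the call. The sketch below assumes the example data frame; my_df_subset is a hypothetical name.

```r
my_df <- data.frame(char1 = letters[1:5],
                    char2 = letters[22:26],
                    char3 = c("@", "$", "%", "&", "!"),
                    num1 = 1:5,
                    num2 = seq(10, 50, by = 10),
                    num3 = c(0, 1, 0, 1, 0))

# subset() returns a new data frame without char1 and num1
my_df_subset <- subset(my_df, select = -c(char1, num1))
names(my_df_subset)
# [1] "char2" "char3" "num2"  "num3"

# The input data frame still has all six columns
names(my_df)

# To replace the original, assign the result back to the same name:
# my_df <- subset(my_df, select = -c(char1, num1))
```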

3. Remove Columns by Name with the dplyr Package

The last method to remove columns in R based on their name uses the dplyr package.

The dplyr package (part of the tidyverse) is an extremely useful package for data manipulation. It provides functions with intuitive names to carry out all kinds of operations, such as selecting columns, filtering rows, and grouping observations.

Likewise, you can use the select() function from the dplyr package to remove columns. This function is primarily used to keep columns from a data frame. However, with a simple modification, you can also use it to remove columns (based on their name).

These are the steps to remove columns from a data frame based on their name using the dplyr package.

  1. First, you specify the name of your data frame.
  2. Then, you use the %>% operator to pass this data frame on to the select() function.
  3. Lastly, you provide the select() function with a vector of column names you want to remove, preceded by a minus sign.

The following R code snippet contains an example.

# 3. Remove columns with dplyr
my_df <- data.frame(char1 = letters[1:5],
                    char2 = letters[22:26],
                    char3 = c("@", "$", "%", "&", "!"),
                    num1 = c(1:5),
                    num2 = seq(10,50, by = 10),
                    num3 = c(0,1,0,1,0))
my_df

library(dplyr)
my_df %>% 
  select(-c("char3", "num3"))

This method provides the same kind of result as the two other methods. However, the dplyr package becomes extremely useful when you want to eliminate columns with a specific prefix or suffix.
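
If the names to drop are already stored in a character vector, as in the first method, the selection helpers all_of() and any_of() accept that vector directly. This is a minimal sketch, assuming dplyr 1.0.0 or later is installed; "not_a_column" is a made-up name that does not exist in the data frame.

```r
library(dplyr)

my_df <- data.frame(char1 = letters[1:5],
                    char2 = letters[22:26],
                    char3 = c("@", "$", "%", "&", "!"),
                    num1 = 1:5,
                    num2 = seq(10, 50, by = 10),
                    num3 = c(0, 1, 0, 1, 0))
to_remove <- c("char3", "num3")

# all_of() throws an error if any listed name is missing
my_df %>%
  select(-all_of(to_remove))

# any_of() silently skips names that are not in the data frame
my_df %>%
  select(-any_of(c(to_remove, "not_a_column")))
```

The stricter all_of() is usually the safer default, because a misspelled column name then fails loudly instead of being ignored.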

Remove Column Names that Start with a Specific String (Prefix)

The methods discussed above are handy if you want to drop a limited number of columns. However, because you must specify the name of each column explicitly, they become tedious when you need to remove many columns. Fortunately, the dplyr package provides some additional helper functions that make your life easier.

For example, the starts_with() function selects all columns whose names start with a specific pattern of characters (i.e., prefix). In combination with the select() function, you can efficiently remove many columns with little effort.

For instance, the code below uses starts_with("num") to eliminate all columns that start with the pattern num. By doing so, we don’t need to specify the column names num1, num2, and num3 separately to remove them.

# 3a. Remove columns that start with "num"
my_df <- data.frame(char1 = letters[1:5],
                    char2 = letters[22:26],
                    char3 = c("@", "$", "%", "&", "!"),
                    num1 = c(1:5),
                    num2 = seq(10,50, by = 10),
                    num3 = c(0,1,0,1,0))
my_df

library(dplyr)
my_df %>% 
  select(-starts_with("num"))
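
For comparison, a similar result is possible without dplyr by combining the first method with base R's startsWith() function. This is a sketch of an alternative, not part of the dplyr approach; has_prefix is a hypothetical variable name.

```r
my_df <- data.frame(char1 = letters[1:5],
                    char2 = letters[22:26],
                    char3 = c("@", "$", "%", "&", "!"),
                    num1 = 1:5,
                    num2 = seq(10, 50, by = 10),
                    num3 = c(0, 1, 0, 1, 0))

# startsWith() returns TRUE for every name beginning with "num"
has_prefix <- startsWith(names(my_df), "num")

# Keep only the columns without the prefix
my_df[ , !has_prefix, drop = FALSE]
```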

Remove Column Names that End with a Specific String (Suffix)

In addition to the starts_with() function, the dplyr package also provides the ends_with() function.

As the name suggests, this function creates a vector of all column names that end with a specific pattern of characters (i.e., suffix). By combining this function with the select() function, you can easily remove many columns without explicitly mentioning their names.

For example, with the code below, we remove all columns that end with the character 3.

# 3b. Remove columns that end with "3"
my_df <- data.frame(char1 = letters[1:5],
                    char2 = letters[22:26],
                    char3 = c("@", "$", "%", "&", "!"),
                    num1 = c(1:5),
                    num2 = seq(10,50, by = 10),
                    num3 = c(0,1,0,1,0))
my_df

library(dplyr)
my_df %>% 
  select(-ends_with("3"))
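
The ends_with() function also accepts a character vector with several suffixes and then matches columns that end with any of them. A minimal sketch, assuming dplyr is installed; the suffixes "1" and "3" are chosen only for illustration.

```r
library(dplyr)

my_df <- data.frame(char1 = letters[1:5],
                    char2 = letters[22:26],
                    char3 = c("@", "$", "%", "&", "!"),
                    num1 = 1:5,
                    num2 = seq(10, 50, by = 10),
                    num3 = c(0, 1, 0, 1, 0))

# Drop every column whose name ends with "1" or "3"
my_df %>%
  select(-ends_with(c("1", "3")))
# Remaining columns: char2 and num2
```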

Remove Column Names that Contain a Specific String

Lastly, you can also use the contains() function to drop columns effortlessly.

In contrast to the starts_with() function and the ends_with() function, the contains() function checks for a pattern of characters throughout the complete string. Once you have identified the column names that match this pattern, it’s easy to eliminate them with the select() function.

As an example, we remove all columns that contain the pattern ar.

# 3c. Remove columns that contain the string "ar"
my_df <- data.frame(char1 = letters[1:5],
                    char2 = letters[22:26],
                    char3 = c("@", "$", "%", "&", "!"),
                    num1 = c(1:5),
                    num2 = seq(10,50, by = 10),
                    num3 = c(0,1,0,1,0))
my_df

library(dplyr)
my_df %>% 
  select(-contains("ar"))
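
One detail to keep in mind is that contains(), like starts_with() and ends_with(), ignores case by default. The sketch below uses a small made-up data frame (case_df, with hypothetical column names) to show the effect of the ignore.case argument.

```r
library(dplyr)

# Hypothetical data frame with mixed-case column names
case_df <- data.frame(Total = 1:3, subtotal = 4:6, id = 7:9)

# Default (ignore.case = TRUE): drops both Total and subtotal
case_df %>%
  select(-contains("total"))

# Case-sensitive match: drops only subtotal
case_df %>%
  select(-contains("total", ignore.case = FALSE))
```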
